FASTA & FASTQ file formats

FASTA and FASTQ data formats

FASTA and FASTQ are two fundamental text-based file formats used in bioinformatics for storing biological sequences.

FASTA is a simple text-based format for nucleotide or protein sequences. Each record begins with a header line that starts with the '>' character, followed by the sequence identifier and an optional description. The lines after the header, up to the next '>' line, contain the sequence itself, which may be wrapped across several lines. FASTA is widely used as input for sequence alignment and database searches.

FASTQ extends the FASTA idea by storing a quality score for every base in the sequence. Each record conventionally spans four lines: a header line starting with '@' followed by the sequence identifier (and an optional description); the sequence itself; a separator line starting with '+' (which may repeat the identifier); and a quality line with one ASCII-encoded score character per base, so it is exactly as long as the sequence line. FASTQ is the de facto standard format for raw sequencing reads, because it keeps each read together with the per-base quality metrics that downstream analysis relies on.
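A four-line FASTQ record can be parsed in a similar way. This is again a hypothetical sketch: `read_fastq`, `FastqRecord` and `error_probability` are illustrative names, not a library API. It assumes one record per four lines (no line wrapping) and, by default, the Phred+33 quality encoding that most modern sequencing pipelines use: the ASCII code of each quality character minus 33 gives the Phred score Q, which relates to the estimated probability P that the base call is wrong by Q = -10 log10(P).

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str        # first word after '@'
    sequence: str
    qualities: List[int]   # decoded Phred scores, one per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator line, got {separator!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"record {header!r}: quality and sequence lengths differ")
        parts = header[1:].split(maxsplit=1)
        identifier = parts[0] if parts else ""
        # Assumed Phred+33 by default: score = ASCII code of the character minus offset.
        scores = [ord(ch) - offset for ch in quality]
        yield FastqRecord(identifier, sequence, scores)


def error_probability(q: int) -> float:
    """Convert a Phred score Q to an error probability P, using Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


fastq_text = """@read1 example read
ACGT
+
II5!
"""

for record in read_fastq(io.StringIO(fastq_text)):
    print(record.identifier, record.sequence, record.qualities)
```

Passing `offset=64` would decode files that use the older Phred+64 encoding, although this sketch does not detect the encoding automatically. With Phred+33, the character 'I' decodes to Q = 40 (an error probability of 1 in 10,000) and '!' decodes to Q = 0.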

Follow the link below for more details about the FASTA and FASTQ formats:

7.1 FASTA and FASTQ formats | Computational Genomics with R


