Read alignment: exact matching

Read alignment, a form of sequence alignment, is the process of mapping short sequence reads produced by high-throughput sequencing technologies to a reference genome or transcriptome. The goal is to determine where in the reference each read most likely originated. This information underpins many downstream applications, such as variant calling (identifying differences between the reads and the reference genome) and transcriptome analysis (identifying which genes are transcribed and at what levels).

Many algorithms exist for aligning reads to a genome. This section introduces the fundamental exact matching algorithms, which find the positions where a given string (the pattern, here a read) occurs exactly, with no mismatches or gaps, within a larger text (here the reference).
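To make the idea of exact matching concrete before the individual pages, here is a minimal sketch of the naive approach in Python. The function name `naive_exact_match` and the toy `reference` and `read` strings are illustrative assumptions, not taken from the guide's practicals.

```python
def naive_exact_match(pattern, text):
    """Return every 0-based offset where pattern occurs exactly in text."""
    occurrences = []
    # Try each possible alignment of pattern against text, left to right.
    for i in range(len(text) - len(pattern) + 1):
        # Compare character by character and stop at the first mismatch.
        for j in range(len(pattern)):
            if text[i + j] != pattern[j]:
                break
        else:
            # The inner loop finished without a mismatch: exact occurrence.
            occurrences.append(i)
    return occurrences


# Toy data (assumed for illustration): a short "reference" and a 4-base "read".
reference = "GCTAGCTAGGCTA"
read = "GCTA"
print(naive_exact_match(read, reference))  # prints [0, 4, 9]
```

In the worst case this sketch compares every character of the pattern at every alignment, which is the cost that Boyer-Moore is designed to reduce by skipping alignments that cannot match.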

Read alignment and why it's hard

Naive exact matching

Practical: Matching artificial reads

Practical: Matching real reads

Boyer-Moore basics

Boyer-Moore: putting it all together

Diversion: Repetitive elements

Practical: Implementing Boyer-Moore

Problems to solve

Try to solve these problems after completing the section. Some of them may be challenging :)

Congratulations!

If you made it here, then congratulations! You have successfully completed this section. Continue to the next part of the guide, Indexing before alignment.
