ABI Bioinformatics Guide 2024

Read alignment: exact matching


Last updated 10 months ago


Read alignment (also called read mapping) is the process of aligning the short sequence reads produced by high-throughput sequencing technologies to a reference genome or transcriptome. The goal is to determine where in the reference each read most likely originated. This information underpins many downstream analyses. Examples include variant calling, which identifies differences between the reads and the reference genome, and transcriptome analysis, which identifies which genes are being transcribed and at what levels.

Many algorithms have been designed for aligning reads to a genome. This section introduces the fundamental exact matching algorithms. These algorithms find every position where a given string of characters (the pattern) occurs exactly within a larger string (the text).

Read alignment and why it's hard

Naive exact matching
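The following is a minimal sketch of the idea in Python. It is our own illustration, not code from the guide, and the function name `naive_exact_match` is hypothetical. Naive exact matching slides the pattern along the text one offset at a time. At each offset it compares characters left to right until the first mismatch.

```python
def naive_exact_match(pattern, text):
    """Return every 0-based offset at which pattern occurs exactly in text."""
    occurrences = []
    # Try every alignment where the pattern fits entirely inside the text
    for i in range(len(text) - len(pattern) + 1):
        match = True
        # Compare left to right and stop at the first mismatch
        for j in range(len(pattern)):
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            occurrences.append(i)
    return occurrences


print(naive_exact_match("AG", "AGCCCTTTGATAGTTTCAG"))  # [0, 11, 17]
```

Consider a pattern of length m and a text of length n. In the worst case, this approach performs m(n - m + 1) character comparisons, because every one of the n - m + 1 alignments can be checked all the way to the end.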

Practical: Matching artificial reads
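One way to set up this kind of practical is to cut reads out of a genome at known positions. You then check that naive matching finds each read at its origin. This sketch assumes a synthetic, uniformly random genome rather than a real reference. The helpers `random_genome` and `sample_reads`, the seed, and the sizes are illustrative choices, not part of the guide.

```python
import random


def naive_exact_match(pattern, text):
    """Return every 0-based offset at which pattern occurs exactly in text."""
    occurrences = []
    for i in range(len(text) - len(pattern) + 1):
        # all() stops at the first mismatching character
        if all(text[i + j] == pattern[j] for j in range(len(pattern))):
            occurrences.append(i)
    return occurrences


def random_genome(length, rng):
    """Hypothetical helper: a uniformly random DNA string."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_reads(genome, n_reads, read_len, rng):
    """Hypothetical helper: cut reads at random offsets, remembering each origin."""
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, len(genome) - read_len)  # randint is inclusive
        reads.append((start, genome[start:start + read_len]))
    return reads


rng = random.Random(42)  # fixed seed so the run is reproducible
genome = random_genome(10_000, rng)
reads = sample_reads(genome, 50, 100, rng)

# Count reads whose true origin is among the offsets found by naive matching
n_matched = sum(1 for start, read in reads if start in naive_exact_match(read, genome))
print(f"{n_matched} / {len(reads)} reads matched at their true origin")
```

Every read is copied verbatim from the genome, so all 50 should match at their true offset. Real reads are different. They contain sequencing errors and come from both strands, so they will not all match exactly.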

Practical: Matching real reads
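Real reads arrive in FASTQ files and may come from either strand of the genome. A read can therefore match the reference directly, or only through its reverse complement. This sketch makes several assumptions. It expects a well-formed FASTQ with exactly four lines per record and no wrapped sequence lines. It also uses a tiny, made-up genome and FASTQ held in memory so that it runs on its own. With a real file you would pass an open file handle instead of `io.StringIO`. All function names here are hypothetical.

```python
import io


def read_fastq(handle):
    """Parse FASTQ into lists of sequences and quality strings.

    Assumes 4 lines per record: header, sequence, '+', qualities.
    """
    sequences, qualities = [], []
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        sequences.append(handle.readline().rstrip())
        handle.readline()  # skip the '+' separator line
        qualities.append(handle.readline().rstrip())
    return sequences, qualities


def reverse_complement(seq):
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def naive_exact_match(pattern, text):
    return [i for i in range(len(text) - len(pattern) + 1)
            if text[i:i + len(pattern)] == pattern]


def match_both_strands(read, genome):
    """Offsets where the read or its reverse complement matches the genome."""
    hits = naive_exact_match(read, genome)
    rc = reverse_complement(read)
    if rc != read:  # a reverse-complement palindrome would give the same hits twice
        hits += naive_exact_match(rc, genome)
    return sorted(hits)


# Toy data made up for this example
genome = "GTACGATTGCAGCTTAGC"
fastq_text = (
    "@read1\nGATTGC\n+\nIIIIII\n"
    "@read2\nAAGCTG\n+\nIIIIII\n"
    "@read3\nTTTTTT\n+\nIIIIII\n"
)
sequences, qualities = read_fastq(io.StringIO(fastq_text))
for seq in sequences:
    print(seq, match_both_strands(seq, genome))
# GATTGC [4]
# AAGCTG [9]   (matches via its reverse complement, CAGCTT)
# TTTTTT []
```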

Boyer-Moore basics
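Here is an illustrative Python version of the bad character rule on its own. It is our own sketch and assumes a DNA alphabet of A, C, G, T. The pattern is compared with the text right to left. On a mismatch, the pattern is shifted so that the nearest occurrence of the mismatched text character to the left in the pattern lines up with it. If there is no such occurrence, the pattern moves past that character entirely. To keep the example small, this simplified version shifts by only one after a full match.

```python
def bad_character_table(pattern, alphabet="ACGT"):
    """table[j][c]: how far to shift when text char c mismatches pattern[j].

    The shift lines up the nearest occurrence of c to the left of j; if there is
    none, the pattern moves past the mismatched character (shift j + 1).
    """
    chars = set(alphabet) | set(pattern)
    last_seen = {c: -1 for c in chars}  # rightmost index of c within pattern[:j]
    table = []
    for j, p_char in enumerate(pattern):
        table.append({c: j - last_seen[c] for c in chars})
        last_seen[p_char] = j
    return table


def boyer_moore_bad_char(pattern, text, alphabet="ACGT"):
    """Exact matching using only the bad character rule."""
    table = bad_character_table(pattern, alphabet)
    m, n = len(pattern), len(text)
    occurrences = []
    i = 0
    while i <= n - m:
        j = m - 1
        # Compare right to left
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            occurrences.append(i)
            i += 1  # simplified: no skipping after a full match
        else:
            # Characters absent from the pattern let us jump past the mismatch
            i += table[j].get(text[i + j], j + 1)
    return occurrences


print(boyer_moore_bad_char("CCTTTTGC", "GCTTCTGCTACCTTTTGCGCGCGCGCGGAA"))  # [10]
```

On this example, the rule examines 8 of the 23 possible alignments. Several mismatches let it skip ahead by 3, 4, 7 or 8 positions at once.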

Boyer-Moore: putting it all together
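When the two rules are combined, each mismatch yields two safe shifts, and Boyer-Moore takes the larger of the two. The sketch below is our own Python illustration, not the guide's code. It repeats the bad character table and adds a good suffix table for the strong good suffix rule. That table is built with a standard textbook, border-based preprocessing, which is swapped in here; the course may compute it differently. The sketch also counts how many alignments it actually examined.

```python
def bad_character_table(pattern, alphabet="ACGT"):
    """table[j][c]: shift that lines up the nearest c left of j (or skips past it)."""
    chars = set(alphabet) | set(pattern)
    last_seen = {c: -1 for c in chars}
    table = []
    for j, p_char in enumerate(pattern):
        table.append({c: j - last_seen[c] for c in chars})
        last_seen[p_char] = j
    return table


def good_suffix_table(pattern):
    """shift[j]: safe shift when pattern[j:] matched and pattern[j-1] mismatched.

    shift[0] is the shift to use after a full match.
    """
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)  # border[i]: start of the widest border of pattern[i:]
    i, j = m, m + 1
    border[i] = j
    # Case 1: the matched suffix reoccurs, preceded by a different character
    while i > 0:
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    # Case 2: only a prefix of the pattern matches part of the matched suffix
    j = border[0]
    for i in range(m + 1):
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore(pattern, text, alphabet="ACGT"):
    """Return (occurrences, number of alignments examined)."""
    bad_char = bad_character_table(pattern, alphabet)
    good_suffix = good_suffix_table(pattern)
    m, n = len(pattern), len(text)
    occurrences, alignments_tried = [], 0
    i = 0
    while i <= n - m:
        alignments_tried += 1
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:  # right-to-left comparison
            j -= 1
        if j < 0:
            occurrences.append(i)
            i += good_suffix[0]
        else:
            bc_shift = bad_char[j].get(text[i + j], j + 1)
            i += max(bc_shift, good_suffix[j + 1])  # both shifts are safe; take the larger
    return occurrences, alignments_tried


print(boyer_moore("CCTTTTGC", "GCTTCTGCTACCTTTTGCGCGCGCGCGGAA"))  # ([10], 6)
```

On the same example used for the bad character rule alone, the combined version examines 6 of the 23 possible alignments rather than 8. It still finds the single occurrence at offset 10.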

Diversion: Repetitive elements
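Repeats matter for alignment because a read can fall in more than one place. To see this, consider a toy genome, made up for illustration, that contains the same element twice. A read that lies entirely inside the repeat matches equally well at both copies, so exact matching alone cannot tell which copy it came from. Such reads are often called multi-mapping reads. The repeat sequence and read names below are hypothetical.

```python
def find_all(pattern, text):
    """All 0-based offsets of pattern in text (simple exact matching)."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text[i:i + len(pattern)] == pattern]


# Hypothetical 12-base element that appears twice in a toy genome
repeat = "GGCCTTAAGGCC"
genome = "ATATCG" + repeat + "TTGACA" + repeat + "CAGT"

reads = {
    "unique_read": "ATATCGGG",  # overlaps the unique start of the genome
    "repeat_read": "CTTAAGGC",  # lies entirely inside the repeat
}
for name, read in reads.items():
    hits = find_all(read, genome)
    if not hits:
        status = "unmapped"
    elif len(hits) == 1:
        status = "unique"
    else:
        status = f"multi-mapping ({len(hits)} hits)"
    print(name, hits, status)
# unique_read [0] unique
# repeat_read [9, 27] multi-mapping (2 hits)
```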

Practical: Implementing Boyer-Moore

Problems to solve

Try to solve these problems after completing the section. Some of them may be challenging :)

  • Finding a Motif in DNA
  • Open Reading Frames
  • RNA Splicing

Congratulations!

If you made it here, congratulations! You have successfully completed this section. Move on to the next part of the guide, Indexing before alignment.