ABI Bioinformatics Guide 2024
Read alignment: approximate matching


Last updated 11 months ago


Reads often do not match the reference genome exactly, because of natural genomic variation or errors introduced during sequencing. In such cases, approximate matching algorithms find places where a read (the query) is similar, though not identical, to the reference, so that the read can still be aligned. This is especially useful when reads from one organism are aligned to the reference genome of a closely related organism, since the two genomes will have diverged to some extent over time.

In this section you will learn how approximate matching algorithms work.

Approximate matching, Hamming and edit distance
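Hamming distance counts the positions at which two strings of equal length differ, so it only accounts for substitutions; edit distance also allows insertions and deletions. As a minimal sketch (the function names are our own, chosen for illustration), a naive approximate matcher slides the pattern along the text and reports every offset whose Hamming distance to the pattern is within a chosen limit:

```python
from typing import List


def hamming_distance(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    # Illustrative sketch; function names are hypothetical.
    if len(x) != len(y):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(1 for a, b in zip(x, y) if a != b)


def naive_approximate(p: str, t: str, max_mismatches: int) -> List[int]:
    """Every offset in t where p matches with at most max_mismatches substitutions."""
    occurrences = []
    # Try every offset at which p fits entirely inside t.
    for i in range(len(t) - len(p) + 1):
        if hamming_distance(p, t[i:i + len(p)]) <= max_mismatches:
            occurrences.append(i)
    return occurrences
```

For example, `naive_approximate("ACG", "ACGTTACT", 1)` returns `[0, 5]`: an exact hit at offset 0 and `ACT`, one substitution away, at offset 5. Because every offset is checked, this approach is slow on a genome-sized text.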

Pigeonhole principle
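The principle can be stated compactly for mismatches. Assume the pattern P is split into k + 1 non-overlapping pieces p_1, ..., p_{k+1}, where piece p_j starts at position s_j of P, and that P occurs in the text T at offset o with at most k mismatches (d_H denotes Hamming distance). Each mismatch lies in exactly one piece, so k mismatches can spoil at most k pieces, and at least one piece must match T exactly:

```latex
P = p_1 \, p_2 \cdots p_{k+1}, \qquad
d_H\!\left(P,\; T[o : o + |P|]\right) \le k
\;\Longrightarrow\;
\exists\, j \in \{1, \dots, k+1\} :\;
p_j = T[o + s_j : o + s_j + |p_j|]
```

The same argument extends to edits (insertions and deletions), since each edit can also be charged to a single piece. This is why finding exact hits for the pieces and then verifying the full pattern around each hit cannot miss a true occurrence.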

Practical: Implementing the pigeonhole principle
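A minimal sketch of the approach, assuming mismatches only (no insertions or deletions): split the pattern into n + 1 pieces, find exact hits for each piece, and verify the full pattern around each hit. Python's `str.find` stands in here for a genome index, and `exact_matches` and `pigeonhole_approximate` are hypothetical names used only for this example.

```python
from typing import List, Set


def exact_matches(p: str, t: str) -> List[int]:
    """Every offset where p occurs exactly in t (a stand-in for an index lookup)."""
    # Illustrative sketch; function names are hypothetical.
    hits = []
    start = t.find(p)
    while start != -1:
        hits.append(start)
        start = t.find(p, start + 1)
    return hits


def pigeonhole_approximate(p: str, t: str, n: int) -> List[int]:
    """Offsets where p occurs in t with at most n mismatches, using n + 1 exact-match pieces."""
    if len(p) < n + 1:
        raise ValueError("p must have at least n + 1 characters to split into n + 1 pieces")
    segment_length = len(p) // (n + 1)
    occurrences: Set[int] = set()
    for i in range(n + 1):
        start = i * segment_length
        # The last piece absorbs any characters left over by the integer division.
        end = len(p) if i == n else start + segment_length
        for hit in exact_matches(p[start:end], t):
            offset = hit - start  # where the whole of p would begin in t
            if offset < 0 or offset + len(p) > len(t):
                continue  # p would run off either end of t
            # Verification: count mismatches over the full length of p.
            mismatches = sum(1 for a, b in zip(p, t[offset:offset + len(p)]) if a != b)
            if mismatches <= n:
                occurrences.add(offset)  # a set, so hits from several pieces count once
    return sorted(occurrences)
```

For example, `pigeonhole_approximate("ACG", "ACGTTACT", 1)` returns `[0, 5]`, the same offsets a brute-force scan finds, but only offsets suggested by exact piece hits are verified.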

Solving the edit distance problem
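One way to approach the problem is as a recursion on prefixes: the edit distance of two strings depends on how their last characters are handled (matched or substituted, deleted, or inserted) plus the edit distance of the shorter prefixes that remain. A direct, illustrative translation into Python (`edit_distance_recursive` is a name chosen for this sketch) is correct but recomputes the same subproblems many times, so its running time grows exponentially:

```python
def edit_distance_recursive(a: str, b: str) -> int:
    """Edit distance via direct recursion on prefixes (exponential time; small inputs only)."""
    # Illustrative sketch; the function name is hypothetical.
    if not a:
        return len(b)  # insert every remaining character of b
    if not b:
        return len(a)  # delete every remaining character of a
    # Last characters aligned to each other: cost 0 if equal, 1 for a substitution.
    diagonal = edit_distance_recursive(a[:-1], b[:-1]) + (a[-1] != b[-1])
    # Last character of a deleted.
    vertical = edit_distance_recursive(a[:-1], b) + 1
    # Last character of b inserted into a.
    horizontal = edit_distance_recursive(a, b[:-1]) + 1
    return min(diagonal, vertical, horizontal)
```

For example, `edit_distance_recursive("kitten", "sitting")` returns 3. It works for short strings but is impractical for reads of realistic length.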

Using dynamic programming for edit distance
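For strings x and y, let D[i, j] be the edit distance between the prefix of x of length i and the prefix of y of length j. The standard recurrence, which a dynamic-programming table fills cell by cell, is:

```latex
\begin{aligned}
D[i, 0] &= i, \qquad D[0, j] = j, \\
D[i, j] &= \min \begin{cases}
D[i-1, j-1] + \delta(x_i, y_j) & \text{(match or substitution)} \\
D[i-1, j] + 1 & \text{(deletion from } x\text{)} \\
D[i, j-1] + 1 & \text{(insertion into } x\text{)}
\end{cases}
\qquad
\delta(a, b) = \begin{cases} 0 & a = b \\ 1 & a \neq b \end{cases}
\end{aligned}
```

The edit distance of x and y is D[|x|, |y|]. Because each cell depends only on its left, upper and upper-left neighbours, the whole table can be filled in O(|x| |y|) time, instead of the exponential time of the direct recursion.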

Practical: Implementing dynamic programming for edit distance
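A minimal sketch of filling the table, assuming unit cost for every substitution, insertion and deletion (`edit_distance` is an illustrative name, not a library function):

```python
from typing import List


def edit_distance(x: str, y: str) -> int:
    """Edit distance between x and y by filling a (len(x) + 1) x (len(y) + 1) DP matrix."""
    # Illustrative sketch; the function name is hypothetical.
    # D[i][j] holds the edit distance between prefixes x[:i] and y[:j].
    D: List[List[int]] = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        D[i][0] = i  # x[:i] against the empty string: i deletions
    for j in range(len(y) + 1):
        D[0][j] = j  # empty string against y[:j]: j insertions
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            dist_hor = D[i][j - 1] + 1   # insertion into x
            dist_ver = D[i - 1][j] + 1   # deletion from x
            dist_diag = D[i - 1][j - 1] + (0 if x[i - 1] == y[j - 1] else 1)  # match/substitution
            D[i][j] = min(dist_hor, dist_ver, dist_diag)
    return D[-1][-1]
```

`edit_distance("kitten", "sitting")` returns 3, as the recurrence predicts, and each of the (|x| + 1)(|y| + 1) cells is computed exactly once.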

Edit distance for approximate matching
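For approximate matching, the pattern P does not need to align to the whole text T, only to some substring of it. A common adjustment, assumed in this sketch, is to put P along the rows and T along the columns, initialise the first row with zeros so that the match may start at any offset in T for free, and read the answer as the smallest value in the last row. `best_approximate_match` is a hypothetical name, and the returned end offset is only one of possibly several equally good ones.

```python
from typing import List, Tuple


def best_approximate_match(p: str, t: str) -> Tuple[int, int]:
    """Smallest edit distance between p and any substring of t, and an end offset (exclusive) achieving it."""
    # Illustrative sketch; the function name is hypothetical.
    # D[i][j]: fewest edits between p[:i] and some substring of t ending just before offset j.
    D: List[List[int]] = [[0] * (len(t) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        D[i][0] = i  # p[:i] against an empty substring: i deletions
    # Row 0 stays all zeros: the match may begin at any offset of t at no cost.
    for i in range(1, len(p) + 1):
        for j in range(1, len(t) + 1):
            diag = D[i - 1][j - 1] + (0 if p[i - 1] == t[j - 1] else 1)
            D[i][j] = min(diag, D[i - 1][j] + 1, D[i][j - 1] + 1)
    last_row = D[len(p)]
    best = min(last_row)
    return best, last_row.index(best)  # first end offset with the minimum
```

For instance, `best_approximate_match("ACG", "TTACGTT")` returns `(0, 5)`: an exact occurrence ending just before offset 5. Recovering where the match starts would require a traceback through the table, which this sketch leaves out.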

Problems to solve

Try to solve these problems after completing the section.

If these were too easy for you, try unlocking the following set of advanced problems.

Congratulations!

If you made it here, then congratulations! You have successfully completed this section. Move to the next portion of the guide with the arrow buttons below.

Counting Point Mutations
Finding a Shared Motif
Enumerating Gene Orders
Enumerating k-mers Lexicographically
Longest Increasing Subsequence
k-Mer Composition
Finding a Shared Spliced Motif
Edit Distance
Edit Distance Alignment
Counting Optimal Alignments