ABI Bioinformatics Guide 2024

Variant Calling with GATK


Last updated 5 months ago


In this section, you will learn how to perform variant calling, that is, how to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from NGS data, using GATK, one of the most widely used tools for this task. Below are the main steps involved in the variant calling pipeline.
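To make the distinction between SNPs and indels concrete, here is a minimal Python sketch that classifies a reference/alternative allele pair written in VCF style (where indels are written together with one shared anchor base, e.g. `T` → `TCA`). The function name `classify_variant` and the `MNP` label for same-length multi-base changes are illustrative assumptions, not part of GATK.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a VCF-style REF/ALT allele pair as a SNP, insertion, deletion or MNP."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT alleles are identical: not a variant")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"        # one base replaced by another
    if len(alt) > len(ref):
        return "insertion"  # ALT is longer than REF: extra bases were inserted
    if len(alt) < len(ref):
        return "deletion"   # REF is longer than ALT: bases were deleted
    return "MNP"            # same length, several bases changed (multi-nucleotide polymorphism)


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("T", "TCA"), ("GAT", "G"), ("AC", "GT")]:
        print(f"{ref:>4} -> {alt:<4} {classify_variant(ref, alt)}")
```

Real VCF records can also contain complex or multi-allelic entries; this sketch only covers the simple cases named in the text.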

GATK

The Genome Analysis Toolkit (GATK) is a software suite developed by the Broad Institute for analyzing DNA sequencing data. It is widely used to identify genetic variants, such as single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels), in genomic data. GATK also includes tools that improve data accuracy before variant calling, for example base quality score recalibration (BQSR), which corrects systematic errors in the base quality scores reported by the sequencer, and correct alignment of reads around indels. Its workflows are designed to make variant calling reliable and accurate, which makes GATK valuable for both research and clinical studies. Even if you are new to genomics, GATK's documentation and step-by-step guides make it easier to start working with your own data.
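Base quality scores, the values that BQSR adjusts, are Phred-scaled error probabilities: Q = -10·log10(P), so Q20 means a 1% chance that the base call is wrong and Q30 means 0.1%. The short Python sketch below converts between the two and decodes a FASTQ quality string, assuming the common Phred+33 encoding; the function names are illustrative.

```python
import math


def decode_quality_string(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score Q is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    scores = decode_quality_string("I5+")  # 'I' -> 40, '5' -> 20, '+' -> 10
    for q in scores:
        print(q, phred_to_error_prob(q))
```

Older Illumina data may use a Phred+64 offset, which is why the offset is a parameter rather than a constant.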

Learn more about GATK at the following link:

For a deeper understanding of the variant-calling algorithm, you can watch the following video.
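As a simplified illustration of what a variant caller computes, the Python sketch below scores the three diploid genotypes (ref/ref, ref/alt, alt/alt) at a single position from the observed bases and their Phred qualities. It uses a textbook pileup model (each chromosome copy is equally likely to produce a read; a sequencing error turns the true base into one of the other three bases with equal probability). This is NOT GATK HaplotypeCaller's actual algorithm, which first reassembles candidate haplotypes in active regions and scores reads against them with a pair-HMM; the function names and the example pileup are hypothetical.

```python
import math


def base_likelihood(base: str, allele: str, error_prob: float) -> float:
    """P(observed base | true allele): correct with prob 1-e, else one of 3 other bases."""
    return 1 - error_prob if base == allele else error_prob / 3


def genotype_log10_likelihoods(pileup: list[tuple[str, int]], ref: str, alt: str) -> dict[str, float]:
    """Return log10 P(data | genotype) for the diploid genotypes 0/0, 0/1 and 1/1.

    pileup: (base, phred_quality) pairs observed at one reference position.
    """
    genotypes = {"0/0": (ref, ref), "0/1": (ref, alt), "1/1": (alt, alt)}
    result = {}
    for label, (a1, a2) in genotypes.items():
        total = 0.0
        for base, q in pileup:
            e = 10 ** (-q / 10)  # Phred score -> error probability
            # Each of the two chromosome copies is equally likely to be the read's source.
            p = 0.5 * base_likelihood(base, a1, e) + 0.5 * base_likelihood(base, a2, e)
            total += math.log10(p)  # sum of logs = product of per-read likelihoods
        result[label] = total
    return result


if __name__ == "__main__":
    pileup = [("A", 30), ("G", 30), ("G", 35), ("A", 25), ("G", 30)]
    gl = genotype_log10_likelihoods(pileup, ref="A", alt="G")
    best = max(gl, key=gl.get)
    print(gl)
    print("most likely genotype:", best)
```

With two high-quality `A` reads and three `G` reads, the heterozygous genotype 0/1 has by far the highest likelihood; a real caller additionally combines such likelihoods with priors and reports genotype qualities.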

Practice

Materials

The files are stored in the following directory on the server. Copy it to your home directory:

variant_calling_files/
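On the command line this copy is a single recursive `cp -r` of the folder into your home directory. For completeness, here is an equivalent Python sketch using `shutil.copytree`; the server location in it is a HYPOTHETICAL placeholder, because the actual path is the one given on the server, and the function name is illustrative.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_course_files(src: Path, dest_parent: Optional[Path] = None) -> Path:
    """Copy the folder `src` into `dest_parent` (the home directory by default)."""
    src = Path(src)
    dest_parent = Path.home() if dest_parent is None else Path(dest_parent)
    dest = dest_parent / src.name
    if dest.exists():
        # Refuse to overwrite an earlier copy that may already contain your own results.
        raise FileExistsError(f"{dest} already exists; rename or remove it first")
    shutil.copytree(src, dest)
    return dest


if __name__ == "__main__":
    # HYPOTHETICAL location: replace with the server directory given in the course.
    server_dir = Path("/path/to/shared/variant_calling_files")
    if server_dir.is_dir():
        print("Copied to", copy_course_files(server_dir))
    else:
        print(f"{server_dir} not found; edit the path first")
```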

The folder contains the following files:

# Reference FASTA file

variant_calling_files/ref_genome/ecoli_rel606.fasta

# Trimmed paired-end FASTQ files (read 1 and read 2)

variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq

variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq

# Scripts with all the commands (Markdown and PDF versions)

variant_calling_files/variant_calling_script.md  # Markdown file

variant_calling_files/variant_calling_script.pdf  # PDF file
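After copying, it is worth checking that all files are in place and that the two paired-end FASTQ files contain the same number of reads, since mate 1 and mate 2 of each pair are stored at the same position in the `_1` and `_2` files. The Python sketch below does both; it assumes the folder was copied into your home directory and that each FASTQ record spans exactly four lines (no wrapped sequences). The helper names are illustrative.

```python
from pathlib import Path

# Files listed in the materials, relative to the directory that contains variant_calling_files/.
EXPECTED_FILES = [
    "variant_calling_files/ref_genome/ecoli_rel606.fasta",
    "variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq",
    "variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq",
    "variant_calling_files/variant_calling_script.md",
    "variant_calling_files/variant_calling_script.pdf",
]


def missing_files(base: Path, expected: list[str] = EXPECTED_FILES) -> list[str]:
    """Return the expected relative paths that do not exist as files under `base`."""
    return [rel for rel in expected if not (Path(base) / rel).is_file()]


def count_fastq_reads(path: Path) -> int:
    """Count records in an uncompressed FASTQ file, assuming exactly 4 lines per record."""
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    home = Path.home()
    missing = missing_files(home)
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        r1, r2 = (home / p for p in EXPECTED_FILES[1:3])
        n1, n2 = count_fastq_reads(r1), count_fastq_reads(r2)
        # Mates of a pair sit at the same position in _1 and _2, so the counts must match.
        print(f"R1: {n1} reads, R2: {n2} reads, paired counts match: {n1 == n2}")
```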

Note: all paths in the commands in variant_calling_script.md are relative to the variant_calling_files folder, so run the commands from inside that folder.
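The sketch below shows one way a typical short-read variant calling pipeline (index the reference, align with BWA, sort and index with SAMtools, call variants with GATK HaplotypeCaller) can be driven from Python with paths relative to variant_calling_files, by setting the working directory with `subprocess.run(..., cwd=...)`. It is NOT the content of variant_calling_script.md, which may differ or include extra steps such as duplicate marking or variant filtering; it assumes `bwa`, `samtools` and `gatk` are on your PATH, and the `results/` folder name and read-group values are hypothetical. By default it only prints the commands (dry run).

```python
import subprocess
from pathlib import Path
from typing import Optional

REF = "ref_genome/ecoli_rel606.fasta"
SAMPLE = "SRR2584863"


def pipeline_commands(sample: str = SAMPLE, ref: str = REF) -> list[tuple[list[str], Optional[str]]]:
    """Return (command, stdout_file) pairs; every path is relative to variant_calling_files/."""
    r1 = f"trimmed_fastq_small/{sample}_1.trim.sub.fastq"
    r2 = f"trimmed_fastq_small/{sample}_2.trim.sub.fastq"
    sam = f"results/{sample}.sam"
    bam = f"results/{sample}.sorted.bam"
    # GATK requires reads to carry a read group with a sample (SM) name; bwa expands the \t escapes.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}"
    return [
        (["bwa", "index", ref], None),                                   # BWA index files
        (["samtools", "faidx", ref], None),                              # .fai index needed by GATK
        (["gatk", "CreateSequenceDictionary", "-R", ref], None),         # .dict needed by GATK
        (["bwa", "mem", "-R", read_group, ref, r1, r2], sam),            # bwa mem writes SAM to stdout
        (["samtools", "sort", "-o", bam, sam], None),                    # coordinate-sorted BAM
        (["samtools", "index", bam], None),                              # .bai index
        (["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", f"results/{sample}.vcf.gz"], None),
    ]


def run_pipeline(workdir: Path, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is True, execute it inside `workdir`."""
    workdir = Path(workdir)
    if not dry_run:
        (workdir / "results").mkdir(exist_ok=True)
    for cmd, out in pipeline_commands():
        print(" ".join(cmd) + (f" > {out}" if out else ""))
        if dry_run:
            continue
        if out:
            with open(workdir / out, "w") as fh:
                subprocess.run(cmd, cwd=workdir, stdout=fh, check=True)
        else:
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    run_pipeline(Path.home() / "variant_calling_files", dry_run=True)
```

Keeping every path relative and passing `cwd=` mirrors the note above: the same command list works wherever the folder was copied.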

To perform variant calling yourself, use the … and the presentation below as supporting material.

Variant Calling with GATK webpage
Getting started with GATK4 (GATK)
The photo is adapted from here.