Variant Calling with GATK

In this section, you will learn how to perform variant calling to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from next-generation sequencing (NGS) data using GATK, one of the most widely used tools for this task. The figure below shows the main steps of the variant calling pipeline.

The figure is adapted from here.
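As a rough, text-only outline of such a pipeline, the sketch below lists the commands of a typical GATK4 short-variant workflow (reference indexing, alignment, duplicate marking, variant calling) and prints them without running anything. The command-line options shown are standard for bwa, samtools and GATK4, but the output file names, the read-group values and the haploid `-ploidy 1` setting are assumptions for this E. coli dataset. Base quality score recalibration is left out because it needs a database of known variant sites. The exact commands used in this course are in variant_calling_script.md.

```python
"""Dry-run outline of a typical GATK4 short-variant calling workflow.

Assumptions (not taken from the course script): output file names,
read-group values and the haploid ploidy setting. Nothing is executed.
"""
from pathlib import Path


def build_pipeline(ref: str, r1: str, r2: str, sample: str) -> list:
    """Return (step description, shell command) pairs in execution order."""
    ref_dict = str(Path(ref).with_suffix(".dict"))  # e.g. ecoli_rel606.dict
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    # GATK requires a read group (@RG); bwa mem expands the literal backslash-t itself.
    read_group = f"'@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA'"
    return [
        ("Index reference for BWA", f"bwa index {ref}"),
        ("Create FASTA index (.fai)", f"samtools faidx {ref}"),
        ("Create sequence dictionary (.dict)",
         f"gatk CreateSequenceDictionary -R {ref} -O {ref_dict}"),
        ("Align paired-end reads and sort by coordinate",
         f"bwa mem -R {read_group} {ref} {r1} {r2} | samtools sort -o {sorted_bam} -"),
        ("Mark PCR/optical duplicates",
         f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam} -M {sample}.dup_metrics.txt"),
        ("Index the deduplicated BAM", f"samtools index {dedup_bam}"),
        # Assumption: E. coli is treated as haploid, hence -ploidy 1.
        ("Call SNPs and indels",
         f"gatk HaplotypeCaller -R {ref} -I {dedup_bam} -O {sample}.vcf.gz -ploidy 1"),
    ]


steps = build_pipeline(
    ref="ref_genome/ecoli_rel606.fasta",
    r1="trimmed_fastq_small/SRR2584863_1.trim.sub.fastq",
    r2="trimmed_fastq_small/SRR2584863_2.trim.sub.fastq",
    sample="SRR2584863",
)
for number, (description, command) in enumerate(steps, start=1):
    print(f"# Step {number}: {description}\n{command}\n")
```

Returning plain strings keeps the outline easy to compare with the course script; piping `bwa mem` into `samtools sort` avoids writing a large intermediate SAM file.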

GATK

The Genome Analysis Toolkit (GATK) is a software suite developed by the Broad Institute for analyzing high-throughput DNA sequencing data. It is commonly used to identify genetic variants such as single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). GATK also includes tools that improve data accuracy before calling, such as base quality score recalibration (BQSR), which corrects systematic errors in the base quality scores reported by the sequencer, and local realignment or reassembly of reads around indels. Its Best Practices workflows are designed to make variant calling reliable and accurate, which makes GATK valuable for both research and clinical studies. Even if you are new to genomics, GATK's documentation and step-by-step guides make it easier to start working with your data.
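Base quality scores are Phred-scaled: a score Q corresponds to an estimated error probability p = 10^(-Q/10), so Q20 means a 1-in-100 chance that the base call is wrong. The minimal sketch below decodes a FASTQ quality string, assuming the common Phred+33 encoding (Sanger / Illumina 1.8+), and converts each score to an error probability. It only illustrates what the scores mean; it does not perform GATK's recalibration.

```python
def decode_phred33(quality: str) -> list:
    """Decode a FASTQ quality string assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


scores = decode_phred33("II5+#")
print(scores)  # [40, 40, 20, 10, 2]
for q in scores:
    print(f"Q{q}: error probability {error_probability(q):.4g}")
```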

Learn more about GATK using the following link.

For a deeper understanding of the variant calling algorithm, you can watch the following video.
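To build intuition, here is a deliberately simplified pileup model of how a caller can weigh the evidence at a single position. It assumes a haploid genome (reasonable for E. coli), independent reads, and that a wrong base call is equally likely to be any of the three other bases. This is not GATK's algorithm: HaplotypeCaller reassembles candidate haplotypes in active regions and scores reads against them with a pair hidden Markov model (PairHMM). The function name and the pileup are hypothetical.

```python
import math


def allele_log10_likelihoods(bases: str, quals: list) -> dict:
    """Return log10 P(observed bases | allele) for each haploid allele A/C/G/T.

    Simplified model: each base is correct with probability 1 - e, where
    e = 10^(-Q/10), and otherwise equally likely to be any other base.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    if any(q <= 0 for q in quals):
        raise ValueError("Phred scores must be positive in this model")
    likelihoods = {}
    for allele in "ACGT":
        total = 0.0
        for base, q in zip(bases, quals):
            e = 10 ** (-q / 10)
            total += math.log10(1 - e if base == allele else e / 3)
        likelihoods[allele] = total
    return likelihoods


# Hypothetical pileup at one position: four reads show G, one shows A.
pileup_bases = "GGGGA"
pileup_quals = [30, 30, 30, 30, 30]
lls = allele_log10_likelihoods(pileup_bases, pileup_quals)
best = max(lls, key=lls.get)
print(best, {allele: round(value, 2) for allele, value in lls.items()})
```

If the reference base at this position were A, the most likely allele G would suggest an A-to-G SNP; real callers combine such likelihoods with priors and further quality checks before reporting a variant.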

Practice

To perform variant calling yourself, use the Variant Calling with GATK webpage and the presentation below as supporting material.

Materials

The files are stored in the following directory on the server.

Copy it to your home directory.

variant_calling_files/
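If you prefer to script the copy step, the sketch below copies a directory tree into your home directory with Python's standard `shutil.copytree`. The server path is not reproduced in this section, so the source path is a parameter you must fill in; the function name `copy_materials` is hypothetical. From a shell, `cp -r` does the same job.

```python
import shutil
from pathlib import Path
from typing import Optional


def copy_materials(source: Path, home: Optional[Path] = None) -> Path:
    """Copy the course folder (e.g. variant_calling_files/) into the home directory.

    `source` is the server path given in the course materials (not reproduced here).
    Refuses to overwrite an existing copy so that your own work is not lost.
    """
    home = Path.home() if home is None else home
    destination = home / source.name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists; remove or rename it first")
    shutil.copytree(source, destination)
    return destination
```

For example, `copy_materials(Path("<server path>/variant_calling_files"))` would create `~/variant_calling_files`, where `<server path>` stands for the location given by your instructors.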

The folder contains the following files:

#Reference genome (FASTA)

variant_calling_files/ref_genome/ecoli_rel606.fasta

#Trimmed and subsampled paired-end reads (FASTQ)

variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq

variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq

#Scripts with all the commands (Markdown and PDF versions)

variant_calling_files/variant_calling_script.md

variant_calling_files/variant_calling_script.pdf
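Before aligning, it can be worth checking that the two FASTQ files belong together: paired-end files should contain the same number of reads, in the same order, with matching read names. The sketch below does this in plain Python, assuming standard four-line FASTQ records and that mate names differ at most by a trailing `/1` or `/2`; the helper names are hypothetical.

```python
from itertools import islice, zip_longest
from pathlib import Path
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality) from four-line FASTQ records."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed or truncated FASTQ record: {record[:1]}")
        header, sequence, _, quality = record
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ for {header}")
        # Keep only the first whitespace-separated token of the header as the name.
        yield header[1:].split()[0], sequence, quality


def mate_id(name: str) -> str:
    """Strip a trailing /1 or /2 so that mates share the same identifier."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def check_pair(path1: Path, path2: Path) -> int:
    """Return the number of read pairs, raising ValueError if the files disagree."""
    pairs = 0
    with path1.open() as f1, path2.open() as f2:
        for rec1, rec2 in zip_longest(read_fastq(f1), read_fastq(f2)):
            if rec1 is None or rec2 is None:
                raise ValueError("the files contain different numbers of reads")
            if mate_id(rec1[0]) != mate_id(rec2[0]):
                raise ValueError(f"read names do not match: {rec1[0]} vs {rec2[0]}")
            pairs += 1
    return pairs
```

You could call `check_pair` on the two `SRR2584863_*.trim.sub.fastq` files listed above; reading records lazily keeps memory use low even for large files.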

Note: all paths in the commands in variant_calling_script.md are relative to the variant_calling_files folder, so run the commands from inside that folder.
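Because the commands use relative paths, they resolve correctly only when your working directory is the variant_calling_files folder. A small, hypothetical pre-flight check like the one below resolves the input paths listed in this section against a base folder and reports any that are missing. The list of expected paths is taken from this section, not from the script itself.

```python
from pathlib import Path
from typing import List, Optional

# Inputs listed in this section, written relative to variant_calling_files/
EXPECTED_INPUTS = [
    "ref_genome/ecoli_rel606.fasta",
    "trimmed_fastq_small/SRR2584863_1.trim.sub.fastq",
    "trimmed_fastq_small/SRR2584863_2.trim.sub.fastq",
]


def missing_inputs(base: Optional[Path] = None) -> List[str]:
    """Return the expected relative paths that do not exist under `base` (default: cwd)."""
    base = Path.cwd() if base is None else base
    return [rel for rel in EXPECTED_INPUTS if not (base / rel).is_file()]


missing = missing_inputs()
if missing:
    print("Not inside variant_calling_files/, or files are missing:", ", ".join(missing))
else:
    print("All inputs found; the relative paths in the script should resolve.")
```

Run it from inside the folder (for example after `cd ~/variant_calling_files`); an empty result means the relative paths in the commands should resolve.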
