Variant Calling with GATK
In this section, you will learn how to perform variant calling to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from NGS data using GATK, one of the most widely used tools for this task. Below are the main steps involved in the variant calling pipeline.
The Genome Analysis Toolkit (GATK) is a software suite created by the Broad Institute for analyzing DNA sequencing data. It is commonly used to find genetic variants, such as single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels), in genomic data. GATK includes tools that improve the accuracy of the data, for example by recalibrating base quality scores and handling alignments around indels correctly. Its workflows are designed to make variant calling reliable and precise, which makes GATK a valuable tool for both research and clinical studies. Even if you are new to genomics, GATK's documentation and step-by-step guides make it easier to start working with your data.
Learn more about GATK using the following link.
For an advanced understanding of the variant caller algorithm, you can watch the following video.
To perform variant calling yourself, use the Variant Calling with GATK webpage and the presentation below as supporting material.
The files are stored in the following directory on the server.
Copy it to your home directory:
variant_calling_files/
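The copy step can be sketched as a small shell function. This is only a sketch: `copy_course_files` is a hypothetical helper name, and the full server path to `variant_calling_files/` is not given here, so it is passed in as an argument.

```shell
# Copy the course directory into a destination folder (default: $HOME).
# Usage: copy_course_files <source_dir> [dest_dir]
# Note: copy_course_files is an illustrative helper, not part of the course files.
copy_course_files() {
    # Strip a trailing slash so the directory itself is copied, not only its contents
    local src="${1%/}"
    local dest="${2:-$HOME}"
    if [ ! -d "$src" ]; then
        echo "Source directory not found: $src" >&2
        return 1
    fi
    cp -r "$src" "$dest"/
}
```

For example, `copy_course_files /path/to/variant_calling_files/` would place a copy in your home directory, where `/path/to/` is a placeholder for the actual location on the server.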
The folder contains the following files:
#Reference fasta file
variant_calling_files/ref_genome/ecoli_rel606.fasta
#Trimmed, subsampled fastq files (paired-end reads)
variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq
variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq
#Files with all commands
variant_calling_files/variant_calling_script.md #Markdown version
variant_calling_files/variant_calling_script.pdf
Note! All paths in the commands in variant_calling_script.md are relative to the variant_calling_files folder.
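Because the paths are relative, it can help to confirm that the copy is complete before running any commands. The sketch below checks for the files listed above; `check_course_files` is a hypothetical helper name, and it assumes the copied folder keeps the layout shown in the listing.

```shell
# Report which of the expected course files are present under a base folder.
# Usage: check_course_files [base_dir]   (default: variant_calling_files)
# Returns 0 if every file is found, 1 if any is missing.
# Note: check_course_files is an illustrative helper, not part of the course files.
check_course_files() {
    local base="${1:-variant_calling_files}"
    local missing=0
    local f
    for f in \
        ref_genome/ecoli_rel606.fasta \
        trimmed_fastq_small/SRR2584863_1.trim.sub.fastq \
        trimmed_fastq_small/SRR2584863_2.trim.sub.fastq \
        variant_calling_script.md \
        variant_calling_script.pdf
    do
        if [ -f "$base/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_course_files ~/variant_calling_files` prints one line per expected file. If everything is reported as ok, change into that folder with `cd ~/variant_calling_files` so that the relative paths in the script resolve as intended.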