Variant Calling with GATK

This page takes an estimated 45 minutes to complete.

In this section, you will learn how to perform variant calling to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from NGS data using GATK, one of the most common tools for this task.

First, scroll through this presentation to get familiar with the general idea.

Practice

To perform variant calling yourself, use the Variant Calling with GATK webpage and the presentation below as reference material.

Data

The files are stored in the following archive on the server.

Copy it to your home directory and then extract it.

/home/shared/ngs_data/variant_calling_files.tar.gz
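
The copy-and-extract step above might look like the following minimal sketch. The helper name fetch_and_extract is illustrative and not part of the course material; the only assumptions are that the archive is a gzip-compressed tar file (as the .tar.gz extension suggests) and that standard cp and tar are available.

```shell
# Copy a .tar.gz archive into a destination directory and unpack it there.
# fetch_and_extract is an illustrative helper, not part of the course script.
fetch_and_extract() {
    fe_archive="$1"   # path to the .tar.gz archive
    fe_dest="$2"      # directory to copy into and extract in

    if [ ! -f "$fe_archive" ]; then
        echo "archive not found: $fe_archive" >&2
        return 1
    fi

    cp "$fe_archive" "$fe_dest"/ || return 1
    # -x extract, -z decompress gzip, -f archive file, -C extract into this directory
    tar -xzf "$fe_dest/$(basename "$fe_archive")" -C "$fe_dest"
}
```

On the server this could be called as: fetch_and_extract /home/shared/ngs_data/variant_calling_files.tar.gz "$HOME". Judging by the paths listed on this page, the files should then appear under a variant_calling_files folder in your home directory.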

In the folder you will have several files:

#Reference fasta file

variant_calling_files/ref_genome/ecoli_rel606.fasta

#Trimmed fastq files (paired-end reads)

variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq

variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq

#Script with all commands

variant_calling_files/variant_calling_script.sh

Note! All paths in variant_calling_script.sh are relative to the variant_calling_files folder, so run the commands from inside that folder.
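
Because the script relies on relative paths, it can help to confirm that the extracted files are where you expect before running anything. The sketch below checks that the files listed above exist under a given base directory; check_inputs is a hypothetical helper name, and the file list is copied from this page.

```shell
# Verify that the input files listed on this page exist under a base
# directory (the parent of variant_calling_files). check_inputs is a
# hypothetical helper, not part of variant_calling_script.sh.
check_inputs() {
    ci_base="${1:-.}"   # defaults to the current directory
    ci_missing=0
    for ci_path in \
        variant_calling_files/ref_genome/ecoli_rel606.fasta \
        variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq \
        variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq \
        variant_calling_files/variant_calling_script.sh
    do
        if [ -f "$ci_base/$ci_path" ]; then
            echo "ok       $ci_path"
        else
            echo "MISSING  $ci_path"
            ci_missing=$((ci_missing + 1))
        fi
    done
    # Succeed only when nothing is missing.
    [ "$ci_missing" -eq 0 ]
}
```

Assuming the archive was extracted in your home directory, check_inputs "$HOME" should report every file as ok; after that, change into the variant_calling_files folder before running commands from the script.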

Tutorial

  1. Copy the variant_calling_files.tar.gz archive from the shared location given in the Data section to your home directory and extract the files from the archive.

  2. In the Pre-Processing step, skip the “Setting up your environment..” parts and start from “1) Alignment”, since the environment is already set up on our server. Also skip the “6) Base Quality Score Recalibration” section.

  3. Read the tutorial and run the commands from the variant_calling_script.sh file, following the steps in the tutorial. (The commands shown in the tutorial itself will not work on our server, as the software and input data are different.)

  4. In the Variant Discovery section, skip the “3) Annotation” part. To visualize the results in the IGV (Integrative Genomics Viewer) browser, download and install IGV on your computer.

  5. After successfully running all commands from the variant_calling_script.sh file, you should have filtered .vcf files for SNPs and indels. To visualize the results, download these files to your computer:

    • dedup_reads.bam

    • dedup_reads.bam.bai

    • filtered_snps.vcf

    • filtered_snps.vcf.idx

    • ecoli_rel606.fasta

  6. Open IGV on your computer and load the downloaded ecoli_rel606.fasta genome via Genomes > “Load Genome from File…”.

  7. Now you can load the .bam and .vcf files via File > “Load from File…”. You can inspect mutations and navigate through the genome by entering coordinates in the box at the top of the window (example coordinates: CP000819.1:2,446,706-2,447,373).
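
As a cross-check on what you see in IGV, the variants in a region can also be listed directly from the VCF file. The sketch below relies only on the standard VCF layout (tab-separated records, header lines starting with #, chromosome in column 1, 1-based position in column 2); variants_in_region is a hypothetical helper name, and it assumes an uncompressed .vcf file.

```shell
# Print CHROM, POS, REF, ALT and FILTER for every VCF record that falls
# inside chrom:start-end (inclusive). variants_in_region is a hypothetical
# helper and expects a plain-text (uncompressed) VCF file.
variants_in_region() {
    vr_vcf="$1"      # path to the .vcf file
    vr_chrom="$2"    # chromosome / contig name, e.g. CP000819.1
    vr_start="$3"    # region start, without thousands separators
    vr_end="$4"      # region end, without thousands separators

    awk -F '\t' -v OFS='\t' \
        -v chrom="$vr_chrom" -v start="$vr_start" -v end="$vr_end" '
        /^#/ { next }                       # skip header lines
        $1 == chrom && $2 + 0 >= start + 0 && $2 + 0 <= end + 0 {
            print $1, $2, $4, $5, $7        # CHROM POS REF ALT FILTER
        }
    ' "$vr_vcf"
}
```

For the example coordinates above, the call would be: variants_in_region filtered_snps.vcf CP000819.1 2446706 2447373. Note that the commas shown in the IGV coordinate box have to be removed from the numbers.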

Congratulations!

If you made it here, you have successfully completed this section. Move to the next part of the guide with the arrow buttons below.
