# Variant Calling with GATK

In this section, you will learn how to perform variant calling to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from NGS data using GATK, one of the most widely used tools for this task. The figure below shows the main steps of the variant calling pipeline.

<figure><img src="https://3514673221-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FdDE3NSiCXcu5YQDQmgdU%2Fuploads%2FDohkfZPGSPsyd8XqUMoh%2Fvariant_calling_workflow.png?alt=media&#x26;token=d0d34f32-2009-4373-99cc-23decf3a2705" alt=""><figcaption><p>Figure adapted from the <a href="https://datacarpentry.github.io/wrangling-genomics/instructor/04-variant_calling.html">Data Carpentry Wrangling Genomics lesson</a></p></figcaption></figure>

## GATK

The Genome Analysis Toolkit (GATK) is a software suite developed by the Broad Institute for analyzing DNA sequencing data. It is commonly used to find genetic variants, such as single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels), in genomic data. GATK also includes tools that improve the accuracy of the data, for example by recalibrating base quality scores and handling alignments around indels correctly. Its workflows are designed to make variant calling reliable and precise, which makes GATK a valuable tool for both research and clinical studies. Even if you are new to genomics, GATK's documentation and step-by-step guides make it easier to start working with your data.

Learn more about GATK at the following link:

{% embed url="<https://gatk.broadinstitute.org/hc/en-us/articles/360036194592-Getting-started-with-GATK4>" %}

For an advanced understanding of the variant caller algorithm, you can watch the following video.

{% embed url="<https://www.youtube.com/watch?v=TwxzDHKLM58>" %}

## Practice

To perform variant calling yourself, use the [Variant Calling with GATK webpage](https://learn.gencore.bio.nyu.edu/variant-calling/) and the presentation below as supporting material.

### Materials

The files are stored in the following directory on the server. Copy it to your home directory:

> variant\_calling\_files/

The folder contains the following files:

> \# Reference FASTA file
>
> variant\_calling\_files/ref\_genome/ecoli\_rel606.fasta
>
> \# Trimmed FASTQ files
>
> variant\_calling\_files/trimmed\_fastq\_small/SRR2584863\_1.trim.sub.fastq
>
> variant\_calling\_files/trimmed\_fastq\_small/SRR2584863\_2.trim.sub.fastq
>
> \# Files with all the commands
>
> variant\_calling\_files/variant\_calling\_script.md (Markdown file)
>
> variant\_calling\_files/variant\_calling\_script.pdf

Note: All paths in the commands in **variant\_calling\_script.md** are relative to the **variant\_calling\_files** folder.
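
Before running any commands, it can help to confirm that the copy is complete. The following is a minimal sketch of such a check, not part of the course script. It assumes the folder was copied directly into your home directory, and the names `EXPECTED_FILES` and `find_missing_files` are made up for this illustration. The file paths are the ones listed above, written relative to the **variant\_calling\_files** folder.

```python
from pathlib import Path

# Paths listed in this section, relative to the variant_calling_files folder.
EXPECTED_FILES = [
    "ref_genome/ecoli_rel606.fasta",
    "trimmed_fastq_small/SRR2584863_1.trim.sub.fastq",
    "trimmed_fastq_small/SRR2584863_2.trim.sub.fastq",
    "variant_calling_script.md",
    "variant_calling_script.pdf",
]


def find_missing_files(base_dir):
    """Return the expected relative paths that are not regular files under base_dir."""
    base = Path(base_dir)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    # Assumption: the folder was copied directly into your home directory.
    folder = Path.home() / "variant_calling_files"
    missing = find_missing_files(folder)
    if missing:
        print("Missing files in", folder)
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present in", folder)
```

If the script reports missing files, repeat the copy step before continuing. If you copied the folder somewhere else, change `folder` accordingly.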
