# Variant Calling with GATK

In this section, you will learn how to perform variant calling to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from NGS data using GATK, one of the most widely used variant calling tools. The figure below shows the main steps of the variant calling pipeline.

<figure><img src="/files/JyG0FUyxdauDunpF6ZDn" alt=""><figcaption><p>The figure is adapted from <a href="https://datacarpentry.github.io/wrangling-genomics/instructor/04-variant_calling.html">here</a></p></figcaption></figure>

## GATK

The Genome Analysis Toolkit (GATK) is a software suite created by the Broad Institute to help analyze DNA sequencing data. It is commonly used to find genetic variants, like single nucleotide changes (SNPs) and small insertions or deletions (indels), in genomic data. GATK includes tools to improve the accuracy of the data, such as fixing errors in base quality scores and aligning indels correctly. Its workflows are designed to make variant calling reliable and precise, making GATK a valuable tool for both research and clinical studies. Even if you're new to genomics, GATK's features and step-by-step guides make it easier to start working with your data.

Learn more about GATK at the following link:

{% embed url="<https://gatk.broadinstitute.org/hc/en-us/articles/360036194592-Getting-started-with-GATK4>" %}

For an advanced understanding of the variant caller algorithm, you can watch the following video.

{% embed url="<https://www.youtube.com/watch?v=TwxzDHKLM58>" %}

## Practice

To perform variant calling yourself, use the [Variant Calling with GATK webpage](https://learn.gencore.bio.nyu.edu/variant-calling/) and the presentation below as supporting material.

### Materials

The files are stored in the following directory on the server. Copy it to your home directory:

> variant\_calling\_files/
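
The copy step can be sketched as below. This is a minimal example, not the course's official command: the full server-side location of the folder is not stated on this page, so `SOURCE_DIR` is a placeholder you would need to set yourself, and the helper name `copy_materials` is purely illustrative.

```bash
# Copy the course materials into a destination directory (default: $HOME).
# ASSUMPTION: the caller supplies the real location of variant_calling_files;
# this page does not give the full server path.
copy_materials() {
    local source_dir="$1"          # folder to copy, e.g. .../variant_calling_files
    local dest_dir="${2:-$HOME}"   # where the copy should go

    if [ ! -d "$source_dir" ]; then
        echo "Source directory not found: $source_dir" >&2
        return 1
    fi

    # -r copies the folder recursively, including its subfolders
    cp -r "$source_dir" "$dest_dir"/
}

# Example (SOURCE_DIR is a hypothetical placeholder; adjust before use):
# copy_materials "$SOURCE_DIR/variant_calling_files" "$HOME"
```

Passing the folder name without a trailing slash keeps the behavior the same across `cp` implementations: the folder itself, not only its contents, ends up in the destination.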

The folder contains the following files:

> \#Reference fasta file
>
> variant\_calling\_files/ref\_genome/ecoli\_rel606.fasta
>
> \#Trimmed FASTQ files (paired-end)
>
> variant\_calling\_files/trimmed\_fastq\_small/SRR2584863\_1.trim.sub.fastq
>
> variant\_calling\_files/trimmed\_fastq\_small/SRR2584863\_2.trim.sub.fastq
>
> \#Files with all the commands
>
> variant\_calling\_files/variant\_calling\_script.md #which is a markdown file
>
> variant\_calling\_files/variant\_calling\_script.pdf
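
After copying, it can help to confirm that every file listed above is in place. The following is a small sketch under the assumption that the folder was copied into some base directory (for example your home directory); the function name `check_materials` is illustrative and not part of the course materials.

```bash
# Report which of the listed course files exist under a base directory.
# Returns 0 if all files are present, 1 if any is missing.
check_materials() {
    local base_dir="${1:-.}"   # directory that contains variant_calling_files/
    local missing=0
    local f

    for f in \
        variant_calling_files/ref_genome/ecoli_rel606.fasta \
        variant_calling_files/trimmed_fastq_small/SRR2584863_1.trim.sub.fastq \
        variant_calling_files/trimmed_fastq_small/SRR2584863_2.trim.sub.fastq \
        variant_calling_files/variant_calling_script.md \
        variant_calling_files/variant_calling_script.pdf
    do
        if [ -f "$base_dir/$f" ]; then
            echo "OK       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done

    return "$missing"
}

# Example: check the copy in your home directory
# check_materials "$HOME"
```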

Note! All paths in the commands in **variant\_calling\_script.md** are relative to the **variant\_calling\_files** folder.
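
Because the paths are relative, commands are meant to be run from inside the folder (after `cd variant_calling_files`). The dry-run sketch below only prints a typical sequence of alignment and variant calling commands using such relative paths; it does not execute them. It is an illustration, not the content of **variant\_calling\_script.md**, which remains the authoritative source. The `results` output folder, the read group values, the use of `bwa` and `samtools` for alignment, and the function name `print_pipeline_sketch` are all assumptions made for this example.

```bash
# Print (do not run) an illustrative pipeline with paths relative to
# the variant_calling_files folder.
# ASSUMPTIONS: output folder "results", read group fields, and the choice of
# bwa/samtools for alignment are hypothetical; check the course script.
print_pipeline_sketch() {
    local ref="ref_genome/ecoli_rel606.fasta"
    local r1="trimmed_fastq_small/SRR2584863_1.trim.sub.fastq"
    local r2="trimmed_fastq_small/SRR2584863_2.trim.sub.fastq"
    local sample="SRR2584863"
    local out_dir="results"

    cat <<EOF
# 1. Index the reference for bwa, samtools and GATK
bwa index $ref
samtools faidx $ref
gatk CreateSequenceDictionary -R $ref
# 2. Align the paired reads (with a read group, which GATK requires) and sort
mkdir -p $out_dir
bwa mem -R '@RG\tID:$sample\tSM:$sample\tPL:ILLUMINA' $ref $r1 $r2 | samtools sort -o $out_dir/$sample.sorted.bam -
samtools index $out_dir/$sample.sorted.bam
# 3. Call SNPs and indels (E. coli is haploid, hence ploidy 1)
gatk HaplotypeCaller -R $ref -I $out_dir/$sample.sorted.bam -O $out_dir/$sample.vcf --sample-ploidy 1
EOF
}

# Example: print the commands for review
# print_pipeline_sketch
```

Printing the commands first makes it easy to compare them with the course script before running anything on the server.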

