ABI Bioinformatics Guide 2024

Some Common Distributions


Last updated 11 months ago


Uniform distribution for discrete random variable

A uniform distribution for a discrete random variable signifies a scenario where each possible outcome has an equal probability of occurring. This implies that the probability mass function (PMF) assigns the same value to each outcome within a finite set.

For instance, consider rolling a fair six-sided die: each face appears with probability 1/6, so the six possible outcomes follow a discrete uniform distribution.

Mathematically, if $X$ represents the discrete random variable and its possible outcomes are $x_1, x_2, \ldots, x_n$, then the PMF is given by the following formula

$$P(X = x_i) = \frac{1}{n} \quad \text{for all } i,$$

where n is the total number of possible outcomes.
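
As a small illustration of the formula above, the following Python sketch builds the PMF of a fair die. The function name `discrete_uniform_pmf` is a hypothetical helper chosen for this example, not part of any library; exact fractions are used so that the probabilities sum to exactly 1.

```python
from fractions import Fraction


def discrete_uniform_pmf(outcomes):
    """Return a dict mapping each outcome x_i to its probability 1/n.

    Hypothetical helper for illustration: n is the number of outcomes.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("at least one outcome is required")
    return {x: Fraction(1, n) for x in outcomes}


# A fair six-sided die: every face has probability 1/6.
die = discrete_uniform_pmf([1, 2, 3, 4, 5, 6])
print(die[3])             # 1/6
print(sum(die.values()))  # 1
```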

Uniform distribution for continuous random variable

A uniform distribution for a continuous random variable describes a situation where the probability density function (PDF) is constant within a specified interval, so sub-intervals of equal length are equally likely. Mathematically, if $X$ is a continuous random variable distributed uniformly over the interval $[a, b]$, then the probability density function is

$$f(x) = \frac{1}{b - a} \quad \text{for } a \le x \le b,$$

and $f(x) = 0$ outside the interval.
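
The sketch below evaluates this density in Python and uses it to compute the probability of a sub-interval, which for a constant density is simply the length of the overlap divided by $b - a$. The function names and the interval $[2, 6]$ are assumptions made for the example.

```python
def uniform_pdf(x, a, b):
    """Density of Uniform[a, b]: 1/(b - a) inside the interval, 0 outside."""
    if not a < b:
        raise ValueError("a must be smaller than b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_interval_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ Uniform[a, b]: overlap length divided by (b - a)."""
    if not a < b:
        raise ValueError("a must be smaller than b")
    lo, hi = max(a, c), min(b, d)
    return max(0.0, hi - lo) / (b - a)


# Illustrative interval [2, 6]: the density is 1/4 everywhere inside it.
print(uniform_pdf(3.0, 2.0, 6.0))                 # 0.25
print(uniform_pdf(7.0, 2.0, 6.0))                 # 0.0
print(uniform_interval_prob(3.0, 5.0, 2.0, 6.0))  # 0.5
```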

Binomial distribution

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, where each trial has the same probability of success, denoted by $p$. Commonly used in scenarios such as coin flips or product defect rates, the binomial distribution's probability mass function (PMF) gives the likelihood of observing a specific number of successes in a fixed number of trials. The key parameters are the number of trials $n$ and the probability of success $p$ on each trial.

Mathematically, if $X$ represents the number of successes in $n$ trials, then the probability mass function is:

$$P(X = k) = \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k},$$

where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ denotes the binomial coefficient and $k = 0, 1, \ldots, n$.
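
The PMF above can be computed directly with the Python standard library, as in this minimal sketch. The helper name `binomial_pmf` and the coin-flip numbers (10 flips of a fair coin) are assumptions chosen for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), following the formula above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 0 or k > n:
        return 0.0  # impossible numbers of successes have probability 0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 flips of a fair coin: 252 / 1024.
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

# The probabilities over k = 0..n add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(total)
```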

P.S. if you're unfamiliar with statistical tests, feel free to skip the tests part of the video for now and return to it later :)

Normal distribution

The normal distribution, also known as the Gaussian distribution, is one of the most widely used distributions in statistics. It is characterized by a symmetric, bell-shaped curve and is entirely defined by its mean (μ) and standard deviation (σ). Many natural phenomena tend to follow, at least approximately, a normal distribution.

Its probability density function is described by the formula

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$$
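
To make the formula concrete, here is a short Python sketch of the normal PDF. It also includes the cumulative distribution function, written with the standard error function `math.erf`; the CDF expression is not given on this page and is added here only as an illustrative assumption, to show that about 68% of the probability lies within one standard deviation of the mean.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x), expressed through the error function (assumption, see lead-in)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# The density peaks at the mean; for the standard normal the peak is 1/sqrt(2*pi).
print(round(normal_pdf(0.0), 4))  # 0.3989

# Probability of falling within one standard deviation of the mean.
print(round(normal_cdf(1.0) - normal_cdf(-1.0), 4))  # 0.6827
```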

If you feel comfortable with mathematics and want to watch something more advanced consider this video⬇️

Student's t-Distribution

The Student's t-distribution (often referred to simply as the t-distribution) is a probability distribution that arises in statistical inference when the mean of a normally distributed population is estimated from a small sample and the population variance is unknown, so it has to be estimated from the same data. It is widely used in hypothesis testing, confidence interval estimation, and linear regression analysis.

The shape of the t-distribution depends on a parameter known as the degrees of freedom, which is related to the sample size (for a single sample of size n, it equals n − 1). As the degrees of freedom increase, the t-distribution approaches the standard normal distribution. For smaller sample sizes, the t-distribution has heavier tails than the normal distribution: extreme values are more probable, which accounts for the extra uncertainty introduced by estimating the variance. Understanding the t-distribution is crucial for conducting accurate statistical analyses, especially when dealing with small samples or unknown population variances.
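
The page gives no formula for the t-distribution, so the sketch below relies on the standard textbook density $f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$, where $\nu$ is the degrees of freedom. It is meant only to illustrate the two claims above: the tails are heavier than those of the standard normal distribution, and the gap shrinks as the degrees of freedom grow. The function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom.

    Computed on the log scale with lgamma so that large df do not overflow.
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def std_normal_pdf(z):
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


# Density three units from the centre: larger for few degrees of freedom,
# approaching the normal value as the degrees of freedom increase.
for df in (2, 10, 100):
    print(f"df={df:>3}: {t_pdf(3.0, df):.5f}")
print(f"normal: {std_normal_pdf(3.0):.5f}")
```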

Poisson Distribution

The Poisson distribution is used to model the number of events occurring within a fixed interval of time (or space), given a known average rate of occurrence. It is particularly useful in scenarios where events happen independently and at a constant average rate, such as the number of customer arrivals in a queue or the number of emails received per day. The Poisson distribution's probability mass function gives the likelihood of observing a specific number of events in a given interval. Its key parameter is $\lambda$ (lambda), representing the average number of events per interval. Mathematically, if $X$ represents the number of events occurring in a fixed interval, then the probability mass function is:

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$$

where k is a non-negative integer.
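
A minimal Python sketch of this PMF follows. The helper name `poisson_pmf` and the rate of 2 events per interval are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), following the formula above."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k < 0:
        return 0.0  # negative counts are impossible
    return math.exp(-lam) * lam**k / math.factorial(k)


# With an average of 2 events per interval (illustrative value):
print(round(poisson_pmf(0, 2.0), 4))  # 0.1353, probability of no events
print(round(poisson_pmf(2, 2.0), 4))  # 0.2707, probability of exactly two events

# Summing over enough values of k recovers (almost exactly) a total probability of 1.
print(sum(poisson_pmf(k, 2.0) for k in range(50)))
```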