ABI Bioinformatics Guide 2024

Introduction to Probability

Probability theory is a fundamental branch of mathematics concerned with quantifying the likelihood of events. It answers the question: "How likely is it that a particular outcome will happen?"

A probability is expressed as a number between 0 and 1, where 0 indicates that an event is impossible and 1 indicates that it is certain.

In probability theory, a probability space includes three key components:

  1. Sample Space (S): the set of all possible outcomes of a particular experiment.

  2. Sigma-Algebra (F): a collection of subsets of the sample space (events) for which probabilities are defined.

  3. Probability Function (P): a function that assigns a probability to each of these subsets.

The sample space encompasses all possible outcomes of a random experiment. The sigma-algebra is a collection of subsets of the sample space that contains the sample space itself and is closed under complementation and countable unions, so that the probability of any combination of events is always defined. Lastly, the probability function assigns a probability to each of these subsets, reflecting the likelihood of the corresponding event occurring. These three components form the foundational framework for analyzing uncertainty and randomness in statistics.
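The closure properties described above can be checked directly for a small, finite sample space. The following is a minimal sketch, not a general implementation: the helper names `power_set` and `is_sigma_algebra` are hypothetical and introduced only for illustration, and the check assumes a finite collection, where closure under pairwise unions is enough to give closure under countable unions.

```python
from itertools import chain, combinations


def power_set(sample_space):
    """Return every subset of a finite sample space as a set of frozensets."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def is_sigma_algebra(collection, sample_space):
    """Check the sigma-algebra properties for a finite collection of events."""
    space = frozenset(sample_space)
    events = {frozenset(e) for e in collection}
    # The whole sample space must itself be an event.
    if space not in events:
        return False
    # The complement of every event must also be an event.
    closed_under_complement = all(space - e in events for e in events)
    # The union of any two events must also be an event.
    closed_under_union = all(a | b in events for a in events for b in events)
    return closed_under_complement and closed_under_union


S = {"Heads", "Tails"}
F = power_set(S)
print(len(F))                                   # 4
print(is_sigma_algebra(F, S))                   # True
print(is_sigma_algebra([set(), {"Heads"}], S))  # False
```

The last call returns `False` because the collection contains neither the sample space itself nor the complement of `{"Heads"}`.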

Example 1: Flipping a Coin

Sample Space (S): The set of all possible outcomes.

  • $S = \{\text{Heads}, \text{Tails}\}$

Sigma-Algebra (F): The collection of all subsets of the sample space including the empty set.

  • $F = \{ \emptyset, \{\text{Heads}\}, \{\text{Tails}\}, \{\text{Heads}, \text{Tails}\} \}$

Probability Function (P): Assigns a probability to each subset in the sigma-algebra.

  • $P(\emptyset) = 0$

  • $P(\{\text{Heads}\}) = \frac{1}{2}$

  • $P(\{\text{Tails}\}) = \frac{1}{2}$

  • $P(\{\text{Heads}, \text{Tails}\}) = 1$
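As a quick sanity check, the coin's probability function can be written as a lookup table and tested against the usual requirements for a probability measure: no negative values, total probability 1 for the whole sample space, and additivity over disjoint events. This is an illustrative sketch only; the variable names are assumptions and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

HEADS, TAILS = "Heads", "Tails"
S = frozenset({HEADS, TAILS})

# The probability function from the example, one entry per event in F.
P = {
    frozenset(): Fraction(0),
    frozenset({HEADS}): Fraction(1, 2),
    frozenset({TAILS}): Fraction(1, 2),
    S: Fraction(1),
}

# Every probability is non-negative.
non_negative = all(p >= 0 for p in P.values())

# The whole sample space has probability 1.
normalised = P[S] == 1

# For disjoint events A and B, P(A or B) = P(A) + P(B).
additive = all(
    P[a | b] == P[a] + P[b]
    for a in P
    for b in P
    if not (a & b)  # keep only disjoint pairs
)

print(non_negative, normalised, additive)  # True True True
```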

Example 2: Rolling a Die

Sample Space (S): The set of all possible outcomes.

  • $S = \{1, 2, 3, 4, 5, 6\}$

Sigma-Algebra (F): The collection of all subsets of the sample space including the empty set.

  • $F = \{ \emptyset, \{1\}, \{2\}, \ldots, \{6\}, \{1, 2\}, \{1, 3\}, \ldots, \{1, 2, 3, 4, 5, 6\} \}$

Probability Function (P): Assigns a probability to each subset in the sigma-algebra.

  • $P(\emptyset) = 0$

  • $P(\{1\}) = \frac{1}{6}$

  • $P(\{2\}) = \frac{1}{6}$

  • $\vdots$

  • $P(\{1, 2\}) = \frac{2}{6}$

  • $\vdots$

  • $P(\{1, 2, 3, 4, 5, 6\}) = 1$
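Because the elided entries follow a single pattern, the full list can be generated rather than written out. The sketch below assumes a fair die, so that the probability of an event is the number of outcomes it contains divided by 6; the function name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))


def prob(event):
    """P(A) = |A| / |S|, assuming all six faces are equally likely."""
    event = frozenset(event)
    if not event <= S:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(S))


# Enumerate the sigma-algebra: all subsets of S, from the empty set to S itself.
F = [
    frozenset(c)
    for r in range(len(S) + 1)
    for c in combinations(sorted(S), r)
]

print(len(F))        # 64, i.e. 2**6 events
print(prob({1, 2}))  # 1/3, the reduced form of 2/6

even = {2, 4, 6}
print(prob(even))                        # 1/2
print(1 - prob(even) == prob(S - even))  # True: complement rule
```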

Example 3: Detecting the Presence of a Gene Variant

Sample Space (S): The set of all possible genotypes at a particular genetic locus.

  • $S = \{ \text{AA}, \text{Aa}, \text{aa} \}$

Sigma-Algebra (F): The collection of all subsets of the sample space.

  • $F = \{ \emptyset, \{\text{AA}\}, \{\text{Aa}\}, \{\text{aa}\}, \{\text{AA}, \text{Aa}\}, \{\text{AA}, \text{aa}\}, \{\text{Aa}, \text{aa}\}, \{\text{AA}, \text{Aa}, \text{aa}\} \}$

Probability Function (P): Assigns a probability to each subset in the sigma-algebra, typically based on population genetics models such as Hardy-Weinberg equilibrium.

  • $P(\emptyset) = 0$

  • $P(\{\text{AA}\}) = p^2$

  • $P(\{\text{Aa}\}) = 2pq$

  • $P(\{\text{aa}\}) = q^2$

  • $P(\{\text{AA}, \text{Aa}\}) = p^2 + 2pq$

  • $P(\{\text{AA}, \text{aa}\}) = p^2 + q^2$

  • $P(\{\text{Aa}, \text{aa}\}) = 2pq + q^2$

  • $P(\{\text{AA}, \text{Aa}, \text{aa}\}) = 1$

where $p$ is the frequency of the dominant allele $A$ and $q$ is the frequency of the recessive allele $a$ in the population, with $p + q = 1$.
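To make the formulas concrete, the sketch below evaluates them for an assumed allele frequency of p = 7/10. That value is chosen purely for illustration and does not come from any dataset; the function names `hardy_weinberg` and `event_probability` are likewise hypothetical.

```python
from fractions import Fraction


def hardy_weinberg(p):
    """Genotype probabilities under Hardy-Weinberg equilibrium for allele frequency p."""
    p = Fraction(p)
    if not 0 <= p <= 1:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def event_probability(event, genotype_probs):
    """P of an event (a set of genotypes) is the sum over the genotypes it contains."""
    return sum((genotype_probs[g] for g in event), Fraction(0))


# Assumed frequency of allele A, for illustration only.
probs = hardy_weinberg(Fraction(7, 10))

print(probs["AA"], probs["Aa"], probs["aa"])   # 49/100 21/50 9/100
print(event_probability({"AA", "Aa"}, probs))  # 91/100
print(sum(probs.values()))                     # 1
```

Here the event {AA, Aa} corresponds to carrying at least one copy of the dominant allele, and the three genotype probabilities sum to 1 because $p^2 + 2pq + q^2 = (p + q)^2 = 1$.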


Last updated 11 months ago
