ABI Bioinformatics Guide 2024

Mean, Variance of a Random Variable

Mean and variance are essential statistical measures that provide a concise summary of a random variable's distribution.

Mean

The mean, also called the expected value or expectation of a random variable, is a measure of the central tendency of its probability distribution. For a discrete random variable, it is the sum of every possible value multiplied by its probability from the PMF. For a continuous random variable, the sum becomes an integral: every possible value is weighted by its probability density from the PDF. The mean is a fundamental summary of how a random variable behaves on average. The expectation of a discrete and of a continuous random variable X is given below, respectively:

$$
E(X) = \mu = \sum_{x} x \, P(X = x)
$$

$$
E(X) = \mu = \int_{-\infty}^{\infty} x f(x) \, dx
$$

Examples:

  • Income Distribution:

    • Context: In economics, the mean income of a population is crucial for understanding the economic well-being of individuals and households.

    • Mean Interpretation: The mean income provides an average income level across a population, aiding in policy-making, taxation, and economic planning.

  • Gene Expression Levels:

    • Context: In genetics and molecular biology, gene expression levels can be measured across samples.

    • Mean Interpretation: The mean gene expression level indicates the average amount of mRNA or protein produced by a gene in a population of cells or organisms.
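The two expectation formulas can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the PMF of mRNA copy numbers, the exponential density, the integration bounds, and the function names `discrete_mean` and `continuous_mean` are all assumptions made up for the example. The integral is approximated with a simple midpoint rule.

```python
from fractions import Fraction
import math


def discrete_mean(pmf):
    """E(X) = sum over x of x * P(X = x), for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def continuous_mean(pdf, lower, upper, steps=100_000):
    """Approximate E(X) = integral of x * f(x) dx on [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th slice
        total += x * pdf(x) * width
    return total


# Hypothetical PMF: number of mRNA copies of one gene in a randomly chosen cell.
copies_pmf = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
    3: Fraction(2, 10),
}
# 0*(1/10) + 1*(3/10) + 2*(4/10) + 3*(2/10) = 17/10
mean_copies = discrete_mean(copies_pmf)

# Hypothetical continuous variable: exponential density with rate 2, whose
# exact mean is 1 / rate. The upper bound 50 truncates a negligible tail.
rate = 2.0


def exponential_pdf(x):
    return rate * math.exp(-rate * x)


mean_wait = continuous_mean(exponential_pdf, 0.0, 50.0)

print("discrete mean:", mean_copies)
print("continuous mean (approx.):", mean_wait)
```

Using `Fraction` keeps the discrete result exact, while the continuous result is only a numerical approximation whose accuracy depends on the chosen bounds and number of steps.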

Variance

Variance measures the spread, or dispersion, of a probability distribution around its mean: it quantifies how far values tend to deviate from the expected value. For a discrete random variable, variance is the sum of the squared differences between each value and the mean, weighted by the corresponding probabilities from the PMF. For a continuous random variable, it is the integral of the squared differences between each value and the mean, weighted by the probability density from the PDF. The variance of a discrete and of a continuous random variable X is given below, respectively:

$$
\text{Var}(X) = \sum_{x} (x - \mu)^2 \, P(X = x)
$$

$$
\text{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx
$$

Example: Cell Growth Rates

  • Random Variable: W represents the growth rate of cells under specific conditions.

  • Variance Interpretation: The variance of W, Var(W), measures how growth rates vary among cells.

  • Biological Context:

    • Mean: E(W) could be the average growth rate observed in a cell culture.

    • Variance: A higher variance Var(W) indicates greater variability in cell growth rates, which could be influenced by genetic mutations, nutrient availability, or environmental factors.
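To make the interpretation concrete, the sketch below applies the discrete variance formula to two made-up distributions that share the same mean but differ in spread. The PMFs (growth rate discretized as whole divisions per day) and the function names are assumptions for illustration only; real growth rates would usually be modelled as continuous.

```python
from fractions import Fraction


def discrete_mean(pmf):
    """E(X) = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def discrete_variance(pmf):
    """Var(X) = sum over x of (x - mu)^2 * P(X = x)."""
    mu = discrete_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Hypothetical culture A: growth rates cluster tightly around 2 divisions/day.
tight_pmf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}

# Hypothetical culture B: same mean, but the rates are more spread out.
spread_pmf = {0: Fraction(1, 4), 2: Fraction(1, 2), 4: Fraction(1, 4)}

for name, pmf in (("A", tight_pmf), ("B", spread_pmf)):
    print(name, "mean:", discrete_mean(pmf), "variance:", discrete_variance(pmf))
```

Both cultures have a mean of 2, yet culture B has four times the variance of culture A (2 versus 1/2), which is exactly the kind of difference the mean alone cannot reveal.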

The relationship between the mean (expected value) and variance of a random variable X can be expressed as follows:

$$
\text{Var}(X) = E(X^2) - [E(X)]^2
$$

Try to prove it!
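A numerical check is not a proof, but it can confirm that the identity behaves as expected before you attempt the derivation. The sketch below, which assumes a fair six-sided die as the random variable and uses illustrative function names, computes the variance once from the definition and once from E(X^2) − [E(X)]^2, using exact fractions so that the two results can be compared without rounding error.

```python
from fractions import Fraction


def expectation(pmf, transform=lambda x: x):
    """E(g(X)) = sum over x of g(x) * P(X = x); g defaults to the identity."""
    return sum(transform(x) * p for x, p in pmf.items())


# Assumed example distribution: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expectation(die_pmf)  # E(X) = 7/2

# Variance from the definition: E((X - mu)^2).
var_definition = expectation(die_pmf, lambda x: (x - mu) ** 2)

# Variance from the shortcut: E(X^2) - [E(X)]^2.
var_shortcut = expectation(die_pmf, lambda x: x ** 2) - mu ** 2

print(var_definition, var_shortcut)  # both are 35/12
```

A hint for the proof itself: expand (x − μ)^2 inside the sum or integral and use the fact that the probabilities sum (or the density integrates) to 1.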

Quantiles

Quantiles are cut points that divide a probability distribution into intervals of equal probability. Common quantiles include the median (the 50th percentile), the quartiles (the 25th, 50th, and 75th percentiles), and percentiles in general, which divide the distribution into 100 parts. Quantiles make it easy to compare and summarize distributions and to identify extreme values. They are particularly useful for understanding the spread and shape of a distribution, which in turn supports decision-making and risk assessment.
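For a discrete random variable, one common convention defines the quantile at level p as the smallest value x whose cumulative probability P(X ≤ x) reaches p. The sketch below implements that convention for a PMF stored as a dictionary; the function name `quantile` and the fair-die example are assumptions for illustration, and other conventions (for example, interpolating between neighbouring values) can give slightly different answers.

```python
from fractions import Fraction


def quantile(pmf, p):
    """Return the smallest value x such that P(X <= x) >= p.

    `pmf` maps each possible value to its probability.
    """
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= p:
            return x
    raise ValueError("p must lie in (0, 1] and the PMF must sum to 1")


# Assumed example distribution: a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Quartiles are the quantiles at levels 1/4, 1/2 and 3/4.
quartiles = [quantile(die_pmf, Fraction(k, 4)) for k in (1, 2, 3)]
median = quantile(die_pmf, Fraction(1, 2))

print("quartiles:", quartiles)  # [2, 3, 5]
print("median:", median)        # 3
```

Under this convention the median of the die is 3, although any value between 3 and 4 splits the probability evenly; this ambiguity is the reason several quantile conventions coexist.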


Last updated 11 months ago
