ABI Bioinformatics Guide 2024

Comparison tests, p-value, z-score


Last updated 10 months ago


Comparison tests are statistical methods used to determine whether there are significant differences between groups. They are central tools in inferential statistics: they let researchers draw conclusions about the populations from which their samples were drawn and make informed decisions based on statistical evidence.

Student's t-test

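The Student's t-test compares the means of two groups. It assumes that the observations are independent, that the data in each group are approximately normally distributed, and that the two groups have equal variances. When the variances clearly differ, Welch's t-test, a variant that drops the equal-variance assumption, is the usual alternative.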
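A minimal sketch of the two-sample Student's t statistic in pure Python, assuming independent samples and equal variances (so the two sample variances are pooled). The helper name `student_t` and the group values are hypothetical, chosen only for illustration. Turning the statistic into a p-value needs the t-distribution CDF, which the Python standard library does not provide, so the sketch stops at the statistic and its degrees of freedom.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Return (t, df) for a two-sample Student's t-test with pooled variance.

    Hypothetical helper: assumes independent samples from normal
    populations with equal variances.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the sample variances (n - 1 denominators).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df


# Hypothetical measurements for two groups.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, df = student_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")  # t = -2.000, df = 8
```

If SciPy is installed, `scipy.stats.ttest_ind(group_a, group_b, equal_var=True)` returns the same statistic together with a two-sided p-value, and `equal_var=False` switches to Welch's t-test.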

Non-parametric tests

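Non-parametric tests do not assume a normal distribution and are used when the data do not meet the assumptions of parametric tests. The Mann-Whitney U test is the non-parametric alternative to the two-sample t-test, and the Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA for three or more groups. Both work on the ranks of the observations rather than on their raw values.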
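The rank-based idea behind the Mann-Whitney U test can be sketched in pure Python, assuming two independent samples; tied values receive the average of the ranks they span. The helpers `average_ranks` and `mann_whitney_u` and the sample values are hypothetical, and the sketch computes only the U statistic for the first sample, not a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return the U statistic for sample a (hypothetical helper, no p-value)."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[: len(a)])
    # Subtract the smallest possible rank sum for a sample of size len(a).
    return rank_sum_a - len(a) * (len(a) + 1) / 2


control = [1.2, 3.4, 2.2, 0.9]
treated = [3.9, 4.1, 2.8, 5.0]
u = mann_whitney_u(control, treated)
print(u, len(control) * len(treated) - u)  # 1.0 15.0
```

U counts how often a value from the first sample exceeds a value from the second (ties count one half), so values near 0 or near n1·n2 point to a shift between the groups. With SciPy, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` report these tests' statistics together with p-values.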

ANOVA

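ANOVA (Analysis of Variance) is used to compare the means of three or more groups. It compares the variance between the group means with the variance within the groups (the ratio is the F statistic) to determine whether at least one group mean differs significantly from the others. A significant result does not say which groups differ.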
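A minimal sketch of the one-way ANOVA F statistic in pure Python, assuming independent groups with roughly normal data and similar variances. The helper `one_way_anova_f` and the example groups are hypothetical. A large F means the spread between group means is large relative to the spread of observations inside each group.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA (hypothetical helper)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df_b, df_w = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df_b, df_w)  # 27.0 2 6
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic along with its p-value.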

P-values

A p-value measures the strength of evidence against the null hypothesis in a statistical test. It is the probability of obtaining test results at least as extreme as those observed, assuming that the null hypothesis is true. In hypothesis testing, a low p-value (typically ≤ 0.05, the conventional significance level) suggests that the null hypothesis can be rejected, indicating a statistically significant difference between groups. Conversely, a high p-value means there is insufficient evidence to reject the null hypothesis: the observed differences are consistent with random chance. It does not prove that the null hypothesis is true.
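An exact permutation test (a swapped-in technique, not one the text itself describes) makes this definition concrete. Under the null hypothesis of no group difference, the group labels are interchangeable, so the p-value is the fraction of all possible relabelings whose mean difference is at least as extreme as the observed one. The helper `permutation_p_value` and the example values are hypothetical, and enumerating every split is only feasible for small samples.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in means.

    Hypothetical helper: every split of the pooled data into groups of the
    original sizes is treated as equally likely under the null hypothesis.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as extreme as the observed difference
        # (small tolerance guards against floating-point round-off).
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # 0.1
```

Here only 2 of the 20 possible splits (the observed one and its mirror image) are as extreme as the data, giving p = 0.1, which would not be significant at the 0.05 level.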

Z-score

A z-score, also known as a standard score, is a statistical measurement that describes a data point's relation to the mean of a group of values. It is expressed in terms of standard deviations from the mean.

The z-score of a data point x is calculated using the formula:

$$z = \frac{x - \mu}{\sigma}$$

where x is the value of the data point, μ is the mean of the dataset, and σ is the standard deviation of the dataset.

A positive z-score indicates that the data point is above the mean, while a negative z-score indicates that the data point is below the mean. The magnitude of the z-score reflects the number of standard deviations the data point is from the mean. A larger absolute value indicates the data point is further from the mean.

Z-scores are used for standardization, making different datasets comparable by converting data into a common scale. They also help identify outliers, as data points with z-scores beyond a certain threshold (commonly ±2 or ±3) are considered unusual. Additionally, in a normal distribution, z-scores correspond to probabilities, helping determine the likelihood of a data point occurring within a certain range.

For example, suppose we have a dataset with a mean (μ) of 50 and a standard deviation (σ) of 10. For a data point x = 70, the z-score is calculated as:

$$z = \frac{70 - 50}{10} = \frac{20}{10} = 2$$

This z-score of 2 means that the data point is 2 standard deviations above the mean.
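The three uses of z-scores described in this section (computing a score, flagging outliers, and converting a score to a probability) can be sketched with the Python standard library. The helper `z_score` and the dataset are hypothetical; the sketch uses the population standard deviation (`statistics.pstdev`) to match σ in the formula, and `statistics.NormalDist` for the standard normal CDF.

```python
from statistics import NormalDist, mean, pstdev


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


# Worked example from the text: mean 50, standard deviation 10, x = 70.
print(z_score(70, 50, 10))  # 2.0

# Standardize a hypothetical dataset and flag |z| > 2 as unusual.
values = [48, 52, 50, 47, 53, 49, 51, 80]
mu, sigma = mean(values), pstdev(values)
z = [z_score(v, mu, sigma) for v in values]
outliers = [v for v, zi in zip(values, z) if abs(zi) > 2]
print(outliers)  # [80]

# Under a standard normal distribution, z maps to a cumulative probability:
# about 97.7% of values fall below z = 2.
print(round(NormalDist().cdf(2.0), 4))  # 0.9772
```

Note that an extreme value also inflates the mean and standard deviation it is measured against, so with small datasets a single outlier can partly mask itself.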