UMAP (Uniform Manifold Approximation and Projection)
Overview:
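UMAP is another non-linear dimensionality reduction technique. It primarily preserves local neighborhood structure, while often retaining more of the data's global structure than t-SNE does.
It builds a weighted nearest-neighbor graph of the data in the original high-dimensional space, then optimizes the positions of the points in a low-dimensional space so that the graph there is as similar as possible to the high-dimensional one.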
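The graph-construction step can be illustrated with a small, pure-Python sketch. It is a simplified illustration rather than the reference implementation: it uses brute-force nearest neighbors and Euclidean distance, and it assumes all points are distinct. The function names (knn, smooth_sigma, fuzzy_graph) and the toy data are made up for this example.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(points, k):
    """For each point, return its k nearest neighbors as (index, distance) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = [(j, euclidean(p, q)) for j, q in enumerate(points) if j != i]
        dists.sort(key=lambda t: t[1])
        result.append(dists[:k])
    return result

def smooth_sigma(dists, rho, k, n_iter=64):
    """Bisect for sigma so that sum(exp(-(d - rho) / sigma)) is close to log2(k)."""
    target = math.log2(k)
    lo, hi = 1e-12, 1e6
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        total = sum(math.exp(-max(d - rho, 0.0) / mid) for d in dists)
        if total > target:
            hi = mid  # weights too large: shrink sigma
        else:
            lo = mid  # weights too small: grow sigma
    return (lo + hi) / 2

def fuzzy_graph(points, k):
    """Build a symmetric weighted k-nearest-neighbor graph as {(i, j): weight}."""
    directed = {}
    for i, nbrs in enumerate(knn(points, k)):
        dists = [d for _, d in nbrs]
        rho = dists[0]  # distance to the nearest neighbor
        sigma = smooth_sigma(dists, rho, k)
        for j, d in nbrs:
            directed[(i, j)] = math.exp(-max(d - rho, 0.0) / sigma)
    # Symmetrize: combine the two directed weights as a probabilistic union.
    graph = {}
    for (i, j), w in directed.items():
        w_rev = directed.get((j, i), 0.0)
        graph[(i, j)] = graph[(j, i)] = w + w_rev - w * w_rev
    return graph

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
graph = fuzzy_graph(points, k=3)
```

Each point's nearest neighbor gets a weight of 1, and weights decay with distance at a rate adapted to the local density. In this toy example, edges inside a group end up much stronger than edges between the two groups. The low-dimensional layout is then optimized to reproduce these edge weights.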
Key Characteristics:
Generally faster and more scalable than t-SNE, making it suitable for larger datasets.
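Often preserves more of the global structure, such as the relative placement of clusters, in the low-dimensional representation than t-SNE does.
Has two main parameters to tune: n_neighbors (the size of the local neighborhood used to build the graph) and min_dist (how tightly points may be packed in the embedding). It is often considered less sensitive to these than t-SNE is to perplexity.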
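To give a feel for what min_dist controls, here is a rough sketch of how it shapes the similarity curve used in the low-dimensional space. The target curve stays at 1 up to min_dist and then decays exponentially; a smooth curve of the form 1 / (1 + a * d^(2b)) is fitted to it. The coarse grid search below is a stand-in for a proper nonlinear least-squares fit, and the function names, grid ranges, and the spread value of 1.0 are assumptions made for illustration.

```python
import math

def target_similarity(d, min_dist, spread=1.0):
    """Desired similarity: flat at 1 up to min_dist, exponential decay after."""
    if d <= min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)

def smooth_similarity(d, a, b):
    """Smooth, differentiable curve used to approximate the target."""
    return 1.0 / (1.0 + a * d ** (2 * b))

def fit_ab(min_dist, spread=1.0, samples=100):
    """Coarse grid search for (a, b) minimizing squared error to the target."""
    xs = [3.0 * spread * i / (samples - 1) for i in range(samples)]
    ys = [target_similarity(x, min_dist, spread) for x in xs]
    best = None
    for ai in range(1, 61):        # a from 0.1 to 6.0 in steps of 0.1
        a = ai / 10
        for bi in range(5, 31):    # b from 0.25 to 1.5 in steps of 0.05
            b = bi / 20
            err = sum((smooth_similarity(x, a, b) - y) ** 2
                      for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

a_tight, b_tight = fit_ab(min_dist=0.1)  # points allowed to pack closely
a_loose, b_loose = fit_ab(min_dist=0.5)  # points kept further apart
```

A larger min_dist keeps the curve near 1 over a wider range of distances, so the optimizer has less incentive to pull neighboring points tightly together and the embedding looks more evenly spread out. A smaller min_dist produces denser, more clumped clusters.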
Applications:
Similar to t-SNE, UMAP is used for visualizing high-dimensional data in fields like genomics, image analysis, and natural language processing.
It is also used as a preprocessing step for clustering and classification algorithms.
Comparison

| Aspect | PCA | t-SNE | UMAP |
| --- | --- | --- | --- |
| Algorithm Type | Linear | Non-linear | Non-linear |
| Parameters | None | Perplexity, learning rate | n_neighbors, min_dist |
| Scalability | Efficient, handles large datasets | Computationally intensive | Fast, scalable |
| Output | Linear combinations of original features | 2D or 3D embedding for visualization | 2D or 3D embedding for visualization |
| Strengths | Simple, fast, captures variance | Reveals clusters, good for visualization | Fast, captures both local and global structure |
| Weaknesses | Only captures linear relationships | Computationally expensive, parameter-sensitive | Slightly complex, needs parameter tuning |