# UMAP (Uniform Manifold Approximation and Projection)

**Overview:**

* **UMAP** is another non-linear dimensionality reduction technique that focuses on preserving both the local and global structure of the data.
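* It builds a weighted k-nearest-neighbor graph of the data in the original high-dimensional space. It then optimizes a low-dimensional layout so that the layout's own graph is as similar as possible to the high-dimensional one, measuring the mismatch with a cross-entropy loss.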
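As a rough illustration of the two steps above, here is a small from-scratch sketch in plain Python, simplified from the description in the UMAP paper. It uses brute-force nearest neighbors and a per-point offset `rho` (distance to the nearest neighbor). It also uses a per-point scale `sigma`, chosen by binary search so that each point's neighbor weights sum to log2(k). A fuzzy union symmetrizes the graph, and a cross-entropy scores a candidate low-dimensional layout against it. All function names are hypothetical, and `k` plays the role of n\_neighbors. The curve constants `a=1.577, b=0.895` are an assumed approximation of the values umap-learn fits for its default min\_dist of 0.1. This is not umap-learn's implementation, which uses approximate nearest neighbors and stochastic gradient descent with negative sampling.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def knn(points, k):
    """Brute-force k nearest neighbors: for each point, a sorted list of (index, distance)."""
    result = []
    for i, p in enumerate(points):
        dists = [(j, euclidean(p, q)) for j, q in enumerate(points) if j != i]
        dists.sort(key=lambda t: t[1])
        result.append(dists[:k])
    return result


def find_sigma(dists, rho, k, n_iter=64, tol=1e-5):
    """Binary-search sigma so that sum_j exp(-(d_j - rho) / sigma) is close to log2(k)."""
    target = math.log2(k)
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(n_iter):
        total = sum(math.exp(-max(d - rho, 0.0) / sigma) for d in dists)
        if abs(total - target) < tol:
            break
        if total > target:  # weights too large: shrink sigma
            hi = sigma
            sigma = (lo + hi) / 2
        else:  # weights too small: grow sigma (double until bracketed, then bisect)
            lo = sigma
            sigma = sigma * 2 if hi == math.inf else (lo + hi) / 2
    return sigma


def fuzzy_graph(points, k):
    """High-dimensional graph as {(i, j): weight} with i < j and weights in [0, 1]."""
    directed = {}
    for i, nbrs in enumerate(knn(points, k)):
        dists = [d for _, d in nbrs]
        rho = dists[0]  # distance to the nearest neighbor
        sigma = find_sigma(dists, rho, k)
        for j, d in nbrs:
            directed[(i, j)] = math.exp(-max(d - rho, 0.0) / sigma)
    graph = {}
    for (i, j), w_ij in directed.items():
        w_ji = directed.get((j, i), 0.0)
        # Fuzzy set union makes the graph symmetric: a + b - a*b
        graph[(min(i, j), max(i, j))] = w_ij + w_ji - w_ij * w_ji
    return graph


def low_dim_similarity(p, q, a=1.577, b=0.895):
    """Low-dimensional similarity 1 / (1 + a * d^(2b)); a, b are assumed approximations."""
    return 1.0 / (1.0 + a * euclidean(p, q) ** (2 * b))


def cross_entropy(graph, embedding, eps=1e-12):
    """Fuzzy-set cross-entropy over all pairs; non-edges (w = 0) only contribute repulsion."""
    n, total = len(embedding), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = graph.get((i, j), 0.0)
            v = low_dim_similarity(embedding[i], embedding[j])
            total -= w * math.log(v + eps) + (1 - w) * math.log(1 - v + eps)
    return total
```

A layout that keeps each point near its high-dimensional neighbors scores a lower `cross_entropy` than one that scatters those neighbors. UMAP's optimizer moves the embedded points to reduce this quantity.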

**Key Characteristics:**

* Generally faster and more scalable than t-SNE, making it suitable for larger datasets.
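* Often preserves more of the global structure (such as the relative placement of clusters) than t-SNE, although distances between clusters in the embedding should still be interpreted with caution.
* Often considered less sensitive to hyperparameters than t-SNE. The two main parameters are n\_neighbors and min\_dist. n\_neighbors sets the size of the local neighborhood used to build the graph, so it trades off local against global structure. min\_dist controls how tightly points can be packed together in the embedding.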
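To make the role of min\_dist concrete: UMAP models low-dimensional similarity with the smooth curve 1 / (1 + a·d^(2b)). The constants a and b are fitted so the curve approximates a target that equals 1 for distances up to min\_dist and decays exponentially beyond it, with scale `spread`. umap-learn performs this fit with SciPy's curve fitting. The sketch below substitutes a crude, dependency-free grid search, so the helper names and grid ranges are assumptions, not the library's code.

```python
import math


def target_curve(d, min_dist, spread=1.0):
    """Target similarity: 1 within min_dist, then exponential decay with scale `spread`."""
    if d <= min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def smooth_curve(d, a, b):
    """Smooth low-dimensional similarity 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * d ** (2 * b))


def fit_ab(min_dist, spread=1.0, n_points=100):
    """Least-squares fit of (a, b) by brute-force grid search (hypothetical helper)."""
    ds = [3.0 * spread * i / (n_points - 1) for i in range(n_points)]
    targets = [target_curve(d, min_dist, spread) for d in ds]
    best_err, best_a, best_b = math.inf, None, None
    for ia in range(60):
        a = 0.05 * 100.0 ** (ia / 59)  # log-spaced grid from 0.05 to 5.0
        for ib in range(55):
            b = 0.3 + 0.05 * ib  # linear grid from 0.3 to 3.0
            err = sum((smooth_curve(d, a, b) - t) ** 2 for d, t in zip(ds, targets))
            if err < best_err:
                best_err, best_a, best_b = err, a, b
    return best_a, best_b


# Compare how the fitted curve changes with min_dist
for md in (0.0, 0.1, 0.5):
    a_fit, b_fit = fit_ab(md)
    print(f"min_dist={md}: a={a_fit:.3f}, b={b_fit:.3f}, "
          f"similarity at d=1: {smooth_curve(1.0, a_fit, b_fit):.3f}")
```

A larger min\_dist yields a curve that stays near 1 for longer, so points that are close in the embedding are not pulled as tightly together. n\_neighbors, by contrast, acts on the high-dimensional graph rather than on this curve.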

**Applications:**

* Similar to t-SNE, UMAP is used for visualizing high-dimensional data in fields like genomics, image analysis, and natural language processing.
* It is also used as a preprocessing step for clustering and classification algorithms.

{% embed url="https://www.youtube.com/watch?v=eN0wFzBA4Sc" %}

{% embed url="https://www.youtube.com/watch?v=jth4kEvJ3P8" %}

### Comparison

| Feature            | PCA                                      | t-SNE                                          | UMAP                                           |
| ------------------ | ---------------------------------------- | ---------------------------------------------- | ---------------------------------------------- |
| **Algorithm Type** | Linear                                   | Non-linear                                     | Non-linear                                     |
| **Parameters**     | Number of components                     | Perplexity, learning rate                      | n\_neighbors, min\_dist                        |
| **Scalability**    | Efficient, handles large datasets        | Computationally intensive                      | Fast, scalable                                 |
| **Output**         | Linear combinations of original features | 2D or 3D embedding for visualization           | Embedding of any dimension; usually 2D or 3D   |
| **Strengths**      | Simple, fast, captures variance          | Reveals clusters, good for visualization       | Fast, captures both local and global structure |
| **Weaknesses**     | Only captures linear relationships       | Computationally expensive, parameter-sensitive | Results depend on n\_neighbors and min\_dist   |
