Using dimensionality reduction techniques in machine learning to simplify models
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of input variables (features) in a dataset while preserving as much important information as possible. It creates new features by transforming the original ones into a lower-dimensional space.
In data analytics and machine learning, datasets can have dozens, hundreds, or even thousands of features, but not all of them are equally important. Too many features can lead to:
- High computational cost (slower training, more memory usage).
- Overfitting (the model learns noise instead of patterns).
- Difficulty in visualization and interpretation (especially beyond 3D).
In short, dimensionality reduction simplifies models, removes redundancy, reduces noise, and helps visualization.
PCA / t-SNE / UMAP (especially for high-dimensional data)
· Principal Component Analysis (PCA) - Transforms the features into new uncorrelated components that retain maximum variance. PCA does not use the target variable (y or labels) when reducing dimensionality; it only considers the features (X). In other words, PCA looks for directions (principal components) in the feature space that capture the most variance.
o When to use?
§ High-dimensional data.
§ When you want to reduce dimensionality without losing much information.
Python Example:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)  # sample dataset so the snippet is self-contained

X_scaled = StandardScaler().fit_transform(X)  # Important step: PCA is sensitive to feature scale

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print("Explained Variance Ratio:", pca.explained_variance_ratio_)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.title("PCA projection")
plt.show()
· t-SNE (t-Distributed Stochastic Neighbor Embedding) - Visualizes complex high-dimensional data in 2D or 3D by preserving local structure.
o When to use?
§ High-dimensional data that you want to inspect visually in 2D or 3D.
§ When preserving local structure (which points are neighbors) matters more than exact global distances.
Python Example:
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)  # sample dataset so the snippet is self-contained

X_scaled = StandardScaler().fit_transform(X)  # Important step

tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y)
plt.title("t-SNE visualization")
plt.show()
· UMAP (Uniform Manifold Approximation and Projection) - Dimensionality reduction like t-SNE, but faster and preserves more global structure. Great for clustering or visualization.
o When to use?
§ Visualization or clustering of high-dimensional data.
§ Works better than t-SNE for larger datasets.
Python Example:
import umap  # requires the umap-learn package
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # sample dataset so the snippet is self-contained

reducer = umap.UMAP(n_components=2, random_state=42)
X_umap = reducer.fit_transform(X)

plt.scatter(X_umap[:, 0], X_umap[:, 1], c=y)
plt.title("UMAP visualization")
plt.show()
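Because the UMAP embedding tends to separate groups well, it can feed directly into a clustering algorithm. A minimal sketch, reusing X_umap from the example above; the choice of k-means with three clusters is just an illustrative assumption:

from sklearn.cluster import KMeans

# Cluster in the 2D embedding; n_clusters=3 is an illustrative assumption
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_umap)

plt.scatter(X_umap[:, 0], X_umap[:, 1], c=labels)
plt.title("K-Means clusters on UMAP embedding")
plt.show()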