Summary of Stanford's Machine Learning class by Andrew Ng
- Part 1
- Supervised vs. Unsupervised learning, Linear Regression, Logistic Regression, Gradient Descent
- Part 2
- Regularization, Neural Networks
- Part 3
- Debugging and Diagnostic, Machine Learning System Design
- Part 4
- Support Vector Machine, Kernels
- Part 5
- K-means algorithm, Principal Component Analysis (PCA) algorithm
- Part 6
- Anomaly detection, Multivariate Gaussian distribution
- Part 7
- Recommender Systems, Collaborative filtering algorithm, Mean normalization
- Part 8
- Stochastic gradient descent, Mini batch gradient descent, Map-reduce and data parallelism
K-means Algorithm
- Clustering
- K-means algorithm
- Randomly initialize the cluster centroids
- Assign each point to its nearest centroid, then move each centroid to the mean of its assigned points
- Repeat until the assignments stop changing…
- Note: if a centroid has NO POINTS assigned to it, then eliminate that cluster centroid
- If the random initialization is unlucky, K-means can get stuck in a bad “local optimum”
- To avoid this random-initialization problem, run K-means ~100 times from different random initializations and pick the clustering with the lowest cost (distortion)
- Note:
- if K is small (between 2 and 10), then multiple random initializations will usually find a better local optimum
- if K is large, then multiple random initializations may not help or make a huge difference
- Note:
- How to choose the number of clusters K?
- Elbow method
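The loop above, including the multiple-random-initialization fix, can be sketched in NumPy. This is an illustrative sketch, not the course's Octave code; the function names `kmeans` and `kmeans_best_of` and the use of the mean squared distance as the cost are my own choices:

```python
import numpy as np

def kmeans(X, k, n_iters=100):
    """One run of K-means from a random initialization.

    Centroids are initialized to k randomly chosen training points.
    """
    rng = np.random.default_rng()
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for each point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        # A centroid with no points assigned is eliminated, per the note above.
        keep = [j for j in range(len(centroids)) if np.any(labels == j)]
        centroids = np.array([X[labels == j].mean(axis=0) for j in keep])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cost = (dists.min(axis=1) ** 2).mean()  # distortion J
    return centroids, labels, cost

def kmeans_best_of(X, k, n_runs=100):
    """Run K-means n_runs times and keep the clustering with the lowest cost."""
    return min((kmeans(X, k) for _ in range(n_runs)), key=lambda r: r[2])
```

Plotting the final cost of `kmeans_best_of` for K = 1, 2, 3, … is exactly what the elbow method inspects.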
Dimensionality Reduction
- Reduce data from 2D to 1D, or from 3D to 2D
Principal Component Analysis (PCA) Algorithm
- Data Preprocessing
- Algorithm
- Reduce data from n dimensions to k dimensions
- Compute the “covariance matrix” Sigma
- Compute the “eigenvectors” of the matrix Sigma
- Summary
- Choosing K (number of principal components)
- Typically choose k to be the smallest value that retains most of the variance:
- 95% to 99% retained variance is a common target
- Keep increasing K and pick the smallest value that still retains e.g. 99% of the variance
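These steps can be sketched in NumPy as follows. This is a sketch under my own naming: the course computes the eigenvectors of Sigma with `svd`, while here `np.linalg.eigh` is used since Sigma is symmetric; the `pca` function name and its return values are illustrative:

```python
import numpy as np

def pca(X, variance_retained=0.99):
    """PCA via the covariance matrix, choosing the smallest k that
    retains the requested fraction of the variance."""
    # Data preprocessing: mean-normalize each feature (feature scaling
    # would also go here if the features have very different ranges).
    mu = X.mean(axis=0)
    Xc = X - mu
    # Covariance matrix: Sigma = (1/m) * X^T X  (n x n).
    sigma = Xc.T @ Xc / len(X)
    # Eigenvectors of the symmetric matrix Sigma; eigh returns
    # eigenvalues in ascending order, so flip to descending.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Smallest k whose cumulative variance ratio meets the target.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, variance_retained) + 1)
    U = eigvecs[:, :k]   # principal components (n x k)
    Z = Xc @ U           # data reduced from n to k dimensions (m x k)
    return Z, U, mu, k
```

Reconstructing an approximation of the original data is then `Z @ U.T + mu`.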
Applying Principal Component Analysis (PCA)
- Supervised Learning Speedup
- Extract inputs
- Apply PCA
- Get New Training Set
- Use logistic regression or other algorithms
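The speedup recipe above can be sketched as follows; the key point is that the PCA mapping (the mean `mu` and the components `U`) is fitted on the training inputs only and then reused unchanged for new examples. This is an illustrative NumPy sketch: `fit_pca` and `train_logreg` are hypothetical helper names, and a plain batch-gradient-descent logistic regression stands in for whichever classifier you use:

```python
import numpy as np

def fit_pca(X_train, k):
    """Fit the PCA mapping on the training inputs ONLY; the returned
    function maps any x to its reduced representation z."""
    mu = X_train.mean(axis=0)
    sigma = (X_train - mu).T @ (X_train - mu) / len(X_train)
    _, eigvecs = np.linalg.eigh(sigma)          # ascending eigenvalues
    U = eigvecs[:, ::-1][:, :k]                 # top-k components
    return lambda X: (X - mu) @ U

def train_logreg(Z, y, lr=0.1, n_iters=2000):
    """Batch-gradient-descent logistic regression on the reduced inputs."""
    Z1 = np.c_[np.ones(len(Z)), Z]              # add intercept term
    theta = np.zeros(Z1.shape[1])
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(Z1 @ theta)))  # sigmoid hypothesis
        theta -= lr * Z1.T @ (h - y) / len(y)    # gradient step
    return lambda X: (np.c_[np.ones(len(X)), X] @ theta) > 0

# Usage sketch:
#   project = fit_pca(X_train, k=10)
#   clf = train_logreg(project(X_train), y_train)
#   predictions on new data reuse the SAME mapping: clf(project(X_test))
```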
- Application of PCA
- Compression
- Reduce memory/disk needed to store data
- Speed up learning algorithm
- Visualization
- DO NOT use PCA to prevent overfitting (i.e. to reduce the number of features); use regularization instead
- Before implementing PCA, first try whatever you want to do with the original/raw data; only if that doesn’t give what you want, implement PCA