Summary of Stanford's Machine Learning class by Andrew Ng
- Part 1
- Supervised vs. Unsupervised learning, Linear Regression, Logistic Regression, Gradient Descent
- Part 2
- Regularization, Neural Networks
- Part 3
- Debugging and Diagnostic, Machine Learning System Design
- Part 4
- Support Vector Machine, Kernels
- Part 5
- K-means algorithm, Principal Component Analysis (PCA) algorithm
- Part 6
- Anomaly detection, Multivariate Gaussian distribution
- Part 7
- Recommender Systems, Collaborative filtering algorithm, Mean normalization
- Part 8
- Stochastic gradient descent, Mini batch gradient descent, Map-reduce and data parallelism
K-means Algorithm
- Clustering
- K-means algorithm
- Randomly initialize the cluster centroids
- Assign each point to its nearest centroid, then move each centroid to the mean of its assigned points
- Repeat until the assignments stop changing…
- Note: if a centroid has NO POINTS assigned to it, then eliminate that cluster centroid
- If the random initialization is unlucky, K-means can get stuck in a bad “local optimum”
- To avoid this random-initialization problem, run K-means ~100 times from different random initializations and pick the clustering with the lowest cost (distortion)
- Note:
- if K is small (between 2 and 10), then multiple random initializations will usually find a better local optimum
- if K is large, then multiple random initializations may not help or make a huge difference
- Note:
- How to choose the number of clusters K?
- Elbow method
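The loop above, including the multiple-random-initialization fix, can be sketched in NumPy. This is an illustrative sketch, not the course's Octave code; the function names `kmeans` and `kmeans_best_of` and the use of the mean squared distance as the cost are my own choices:

```python
import numpy as np

def kmeans(X, k, n_iters=100):
    """One run of K-means from a random initialization.

    Centroids are initialized to k randomly chosen training points.
    """
    rng = np.random.default_rng()
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for each point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        # A centroid with no points assigned is eliminated, per the note above.
        keep = [j for j in range(len(centroids)) if np.any(labels == j)]
        centroids = np.array([X[labels == j].mean(axis=0) for j in keep])
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cost = (dists.min(axis=1) ** 2).mean()  # distortion J
    return centroids, labels, cost

def kmeans_best_of(X, k, n_runs=100):
    """Run K-means n_runs times and keep the clustering with the lowest cost."""
    return min((kmeans(X, k) for _ in range(n_runs)), key=lambda r: r[2])
```

Plotting the final cost of `kmeans_best_of` for K = 1, 2, 3, … is exactly what the elbow method inspects.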
Dimensionality Reduction
- Reduce data from 2D to 1D, or from 3D to 2D
Principal Component Analysis (PCA) Algorithm
- Data Preprocessing
- Algorithm
- Reduce data from n dimensions to k dimensions
- Compute the “covariance matrix” Sigma
- Compute the “eigenvectors” of the matrix Sigma
- Summary
- Choosing K (number of principal components)
- Typically choose k to be the smallest value that retains most of the variance:
- 95% to 99% retained variance is a common target
- Keep increasing K and pick the smallest value that still retains e.g. 99% of the variance
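These steps can be sketched in NumPy as follows. This is a sketch under my own naming: the course computes the eigenvectors of Sigma with `svd`, while here `np.linalg.eigh` is used since Sigma is symmetric; the `pca` function name and its return values are illustrative:

```python
import numpy as np

def pca(X, variance_retained=0.99):
    """PCA via the covariance matrix, choosing the smallest k that
    retains the requested fraction of the variance."""
    # Data preprocessing: mean-normalize each feature (feature scaling
    # would also go here if the features have very different ranges).
    mu = X.mean(axis=0)
    Xc = X - mu
    # Covariance matrix: Sigma = (1/m) * X^T X  (n x n).
    sigma = Xc.T @ Xc / len(X)
    # Eigenvectors of the symmetric matrix Sigma; eigh returns
    # eigenvalues in ascending order, so flip to descending.
    eigvals, eigvecs = np.linalg.eigh(sigma)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Smallest k whose cumulative variance ratio meets the target.
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, variance_retained) + 1)
    U = eigvecs[:, :k]   # principal components (n x k)
    Z = Xc @ U           # data reduced from n to k dimensions (m x k)
    return Z, U, mu, k
```

Reconstructing an approximation of the original data is then `Z @ U.T + mu`.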
Applying Principal Component Analysis (PCA)
- Supervised Learning Speedup
- Extract inputs
- Apply PCA
- Get New Training Set
- Use logistic regression or other algorithms
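The speedup recipe above can be sketched as follows; the key point is that the PCA mapping (the mean `mu` and the components `U`) is fitted on the training inputs only and then reused unchanged for new examples. This is an illustrative NumPy sketch: `fit_pca` and `train_logreg` are hypothetical helper names, and a plain batch-gradient-descent logistic regression stands in for whichever classifier you use:

```python
import numpy as np

def fit_pca(X_train, k):
    """Fit the PCA mapping on the training inputs ONLY; the returned
    function maps any x to its reduced representation z."""
    mu = X_train.mean(axis=0)
    sigma = (X_train - mu).T @ (X_train - mu) / len(X_train)
    _, eigvecs = np.linalg.eigh(sigma)          # ascending eigenvalues
    U = eigvecs[:, ::-1][:, :k]                 # top-k components
    return lambda X: (X - mu) @ U

def train_logreg(Z, y, lr=0.1, n_iters=2000):
    """Batch-gradient-descent logistic regression on the reduced inputs."""
    Z1 = np.c_[np.ones(len(Z)), Z]              # add intercept term
    theta = np.zeros(Z1.shape[1])
    for _ in range(n_iters):
        h = 1.0 / (1.0 + np.exp(-(Z1 @ theta)))  # sigmoid hypothesis
        theta -= lr * Z1.T @ (h - y) / len(y)    # gradient step
    return lambda X: (np.c_[np.ones(len(X)), X] @ theta) > 0

# Usage sketch:
#   project = fit_pca(X_train, k=10)
#   clf = train_logreg(project(X_train), y_train)
#   predictions on new data reuse the SAME mapping: clf(project(X_test))
```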
- Application of PCA
- Compression
- Reduce memory/disk needed to store data
- Speed up learning algorithm
- Visualization
- DO NOT use PCA to prevent overfitting (i.e. to reduce the number of features); use regularization instead
- Before implementing PCA, first try whatever you want to do with the original/raw data; only if that doesn’t give what you want, implement PCA