Summary of Stanford's Machine Learning class by Andrew Ng
 Part 1
 Supervised vs. Unsupervised learning, Linear Regression, Logistic Regression, Gradient Descent
 Part 2
 Regularization, Neural Networks
 Part 3
 Debugging and Diagnostics, Machine Learning System Design
 Part 4
 Support Vector Machine, Kernels
 Part 5
 K-means algorithm, Principal Component Analysis (PCA) algorithm
 Part 6
 Anomaly detection, Multivariate Gaussian distribution
 Part 7
 Recommender Systems, Collaborative filtering algorithm, Mean normalization
 Part 8
 Stochastic gradient descent, Mini-batch gradient descent, MapReduce and data parallelism
K-means Algorithm
 Clustering
 K-means algorithm
 Place K cluster centroids at random positions (e.g., at K randomly chosen training examples)
 Assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it
 Repeat until the assignments stop changing
 *Note – If a centroid has NO POINTS assigned to it, then eliminate that cluster centroid
 If the initial centroids are chosen badly, K-means can get stuck in a “local optimum”
 To guard against bad random initialization, run the algorithm many times (e.g., 100) with different random starts and keep the clustering with the lowest cost (see the sketch below)
 Note:
 if K is small (between 2 and 10), multiple random initializations will usually find a better local optimum
 if K is large, multiple random initializations may not make much of a difference
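A minimal sketch of K-means with multiple random initializations, assuming a NumPy array X of shape (m examples, n features); the function names and structure are illustrative, not from the course materials.

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """One run of K-means; returns (centroids, assignments, cost J)."""
    rng = rng or np.random.default_rng()
    # Initialize: place the K centroids at K randomly chosen training examples.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points;
        # a centroid with NO points assigned is re-seeded here (it could also be dropped).
        for k in range(K):
            pts = X[assign == k]
            centroids[k] = pts.mean(axis=0) if len(pts) else X[rng.integers(len(X))]
    # Cost (distortion) J: average squared distance of each point to its centroid.
    assign = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    cost = np.mean(np.sum((X - centroids[assign]) ** 2, axis=1))
    return centroids, assign, cost

def kmeans_best_of(X, K, n_init=100):
    """Run K-means n_init times with different random starts; keep the lowest-cost run."""
    return min((kmeans(X, K) for _ in range(n_init)), key=lambda run: run[2])
```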
 Note:
 How to choose the number of clusters K?
 Elbow method: plot the cost J for increasing values of K and pick the K at the “elbow”, where the cost stops dropping sharply
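A sketch of the elbow method, building on the kmeans_best_of helper above; matplotlib and the synthetic data are my additions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.random.default_rng(0).normal(size=(300, 2))        # stand-in data
Ks = range(1, 11)
costs = [kmeans_best_of(X, K, n_init=20)[2] for K in Ks]   # cost J for each K

plt.plot(list(Ks), costs, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("cost J (distortion)")
plt.title("Elbow method")
plt.show()
```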
Dimensionality Reduction

Reduce data from 2D to 1D and 3D to 2D
Principal Component Analysis (PCA) Algorithm
 Data preprocessing: mean normalization (subtract the mean of each feature) and, if features are on different scales, feature scaling
 Algorithm
 Reduce data from n dimensions to k dimensions
 Compute the “covariance matrix”: Sigma = (1/m) * X^T * X
 Compute the “eigenvectors” of matrix Sigma (in practice via singular value decomposition, [U, S, V] = svd(Sigma)); the first k columns of U form U_reduce, and each example is projected as z = U_reduce^T * x
 Summary
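A minimal PCA sketch following the steps above (NumPy only; the variable names are mine, not the course's). X is an m x n data matrix; the function returns the k-dimensional projection plus everything needed to map new examples.

```python
import numpy as np

def pca(X, k):
    """Reduce X (m examples x n features) from n to k dimensions."""
    # Data preprocessing: mean normalization and feature scaling
    # (assumes no constant features, so the standard deviation is nonzero).
    mu = X.mean(axis=0)
    scale = X.std(axis=0)
    X_norm = (X - mu) / scale
    m = X_norm.shape[0]
    # Covariance matrix: Sigma = (1/m) * X^T * X   (n x n).
    Sigma = (X_norm.T @ X_norm) / m
    # Eigenvectors of Sigma via SVD; the columns of U are the principal components.
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]            # first k eigenvectors, n x k
    Z = X_norm @ U_reduce          # projected data: z = U_reduce^T * x per example
    return Z, U_reduce, S, mu, scale
```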
 Choosing K (number of principal components)
 Typically choose k to be the smallest value such that the desired fraction of the variance is retained:
 retaining 95% to 99% of the variance is a common choice
 Increase k and keep the smallest value for which the retained variance (sum of the first k singular values of Sigma divided by the sum of all of them) reaches 99% (see the sketch below)
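A sketch of choosing k from the singular values S returned by the pca() sketch above: keep the smallest k that retains the desired fraction (e.g., 99%) of the variance.

```python
def choose_k(S, variance_retained=0.99):
    """Smallest k such that the first k singular values cover the target variance."""
    total = S.sum()
    running = 0.0
    for k, s in enumerate(S, start=1):
        running += s
        if running / total >= variance_retained:
            return k
    return len(S)
```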
Applying Principal Component Analysis (PCA)
 Supervised Learning Speedup
 Extract the inputs x (drop the labels y)
 Apply PCA to the training inputs only; the learned mapping x → z is then reused for cross-validation and test data
 Get a new, lower-dimensional training set (z, y)
 Train logistic regression (or another learning algorithm) on it, as sketched below
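A sketch of the speedup pipeline above, reusing the pca() helper; scikit-learn's LogisticRegression is my choice here (the course just says "logistic regression"), and the data is a synthetic stand-in.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 50))            # stand-in high-dimensional inputs
y_train = (X_train[:, 0] > 0).astype(int)       # stand-in labels
X_test = rng.normal(size=(50, 50))

# Extract inputs, apply PCA to the TRAINING inputs only, get the new training set.
Z_train, U_reduce, S, mu, scale = pca(X_train, k=10)
# Train logistic regression on the lower-dimensional z instead of the raw x.
clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)

# New examples are mapped with the SAME mu/scale/U_reduce learned on the training data.
Z_test = ((X_test - mu) / scale) @ U_reduce
predictions = clf.predict(Z_test)
```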
 Application of PCA
 Compression
 Reduce memory/disk needed to store data
 Speed up learning algorithm
 Visualization
 DO NOT use PCA to prevent overfitting (i.e., as a way to reduce the number of features); use regularization instead.
 Before implementing PCA, first try whatever you want to do with the original/raw data; only if that doesn't give the results you want should you implement PCA
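For the compression application, an approximate original example can be recovered from its compressed representation; a one-line sketch using the names from the pipeline above:

```python
# Decompress: x_approx = U_reduce * z, then undo the feature scaling and mean normalization.
X_approx = (Z_train @ U_reduce.T) * scale + mu   # back to the original n dimensions
```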