Machine Learning
Purpose
The purpose of machine learning is to generalize from data.
Classic Problem
Normal Programming: "Hello world"
Machine Learning: MNIST (handwritten digit classification)
Approach
Problems --> Tools --> Metrics (apply to all problems?)
Data to generalize --> Use different algorithms --> Monitor performance of algorithms and adjust
Key Words
Classification
Discrete output
Regression
Continuous numeric output
Clustering
Gradient descent, Backpropagation, Cost function
Cross-entropy
Any loss consisting of the negative log-likelihood between the empirical distribution
defined by the training set and the probability distribution defined by the model. For example,
mean squared error is the cross-entropy between the empirical distribution and a Gaussian model
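A minimal NumPy sketch of binary cross-entropy as a negative log-likelihood; the label and probability arrays are illustrative:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Negative log-likelihood of the true labels under the
    model's predicted probabilities (binary case)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Empirical labels vs. model probabilities
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.7])
print(cross_entropy(y_true, y_pred))  # small value: distributions are close
```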
Activation function
Step function
discrete 0, 1
Sigmoid function
Tanh function
Rectified Linear function (ReLU)
Exponential linear unit (ELU)
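A minimal NumPy sketch of the activation functions listed above:

```python
import numpy as np

# Plain-NumPy versions of the activation functions above.
step    = lambda z: np.where(z >= 0, 1.0, 0.0)           # discrete 0/1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))             # squashes to (0, 1)
tanh    = np.tanh                                        # squashes to (-1, 1)
relu    = lambda z: np.maximum(0.0, z)                   # max(0, z)
elu     = lambda z, a=1.0: np.where(z > 0, z, a * (np.exp(z) - 1))

z = np.linspace(-3, 3, 7)
for name, f in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh),
                ("relu", relu), ("elu", elu)]:
    print(name, np.round(f(z), 3))
```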
Training data set
Used to train parameters
Validation data set
Used to tune hyperparameters
Test data set
Bias, Variance
Linked to capacity, underfitting, overfitting
Closed-form solution
Parameter
Learned
Weight, Bias
HyperParameter
Tuned
Learning rate
number of layers
number of neurons in each layer
number of iterations
Accuracy
Sensitivity
Specificity
F1-score
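A quick sketch of these metrics computed from binary confusion-matrix counts; the tp/fp/tn/fn values are illustrative:

```python
# Assumed binary confusion-matrix counts (values are made up).
tp, fp, tn, fn = 40, 10, 45, 5

accuracy    = (tp + tn) / (tp + fp + tn + fn)  # fraction of correct predictions
sensitivity = tp / (tp + fn)                   # recall / true positive rate
specificity = tn / (tn + fp)                   # true negative rate
precision   = tp / (tp + fp)
f1_score    = 2 * precision * sensitivity / (precision + sensitivity)

print(accuracy, sensitivity, specificity, f1_score)
```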
Kernel trick
Maximum likelihood estimation
Point estimate of variables
Bayesian estimation
Full distribution of variables
Optimization
Hill Climbing
One step along one axis at a time
Achieves the optimal solution for convex problems
Problems: local maxima, ridges and alleys, plateaus
Good when the function is complex and/or not differentiable
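A minimal sketch of coordinate-wise hill climbing on a toy concave objective; the step size and objective function are illustrative:

```python
import numpy as np

def hill_climb(f, x, step=0.1, max_iters=1000):
    """Coordinate-wise hill climbing: try one step along one axis at a
    time, keep any move that improves f. Can get stuck at a local
    maximum or on a plateau."""
    for _ in range(max_iters):
        improved = False
        for axis in range(len(x)):
            for delta in (+step, -step):
                cand = x.copy()
                cand[axis] += delta
                if f(cand) > f(x):
                    x, improved = cand, True
        if not improved:
            break  # no neighbor improves: a (possibly local) maximum
    return x

# Concave toy objective with its global maximum at (1, -2)
f = lambda p: -(p[0] - 1) ** 2 - (p[1] + 2) ** 2
print(hill_climb(f, np.array([5.0, 5.0])))  # approaches [1, -2]
```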
Gradient Descent
Vanishing/exploding gradients problems
Approaches to mitigate: He initialization, Batch Normalization
Momentum
AdaGrad
RMSProp
Adam
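A minimal NumPy sketch of gradient descent with a momentum term, applied to a toy quadratic loss; the learning rate and momentum values are illustrative:

```python
import numpy as np

def gradient_descent(grad, w, lr=0.1, momentum=0.9, n_steps=100):
    """Gradient descent with a momentum term.
    grad: function returning the gradient of the loss at w."""
    v = np.zeros_like(w)
    for _ in range(n_steps):
        v = momentum * v - lr * grad(w)  # accumulate a velocity term
        w = w + v                        # step downhill
    return w

# Toy loss L(w) = (w - 3)^2 with gradient 2(w - 3); minimum at w = 3
grad = lambda w: 2 * (w - 3)
print(gradient_descent(grad, np.array([0.0])))  # close to [3.]
```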
Regularization
A modification to a learning algorithm intended to reduce generalization error but not training error
Example: weight decay for linear regression
Early stopping, L1, L2, Dropout, Max-Norm, Data Augmentation
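A minimal NumPy sketch of the weight-decay example above: L2 weight decay for linear regression is ridge regression, which still has a closed-form solution. The synthetic data and alpha value are illustrative:

```python
import numpy as np

# Ridge regression: minimize ||Xw - y||^2 + alpha * ||w||^2.
# Closed form: w = (X^T X + alpha * I)^{-1} X^T y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

alpha = 1.0  # regularization strength (a hyperparameter)
w = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)
print(w)  # shrunk toward zero relative to the unregularized fit
```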
Generalize
To have a small gap between training error and test error
Supervised Learning
features + labels
Nonprobabilistic SL
K-Nearest Neighbor
Decision Tree
Unsupervised Learning
features without labels
Reinforcement Learning
Learning by getting feedback from the environment
Transformer
Modify or filter data before feeding it to learning algorithms
Preprocessing
Feature selection
Feature extraction
Dimension reduction (PCA, manifold learning)
Kernel approximation
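A minimal scikit-learn sketch chaining transformers ahead of a learning algorithm; the dataset and estimator choices are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Transformers (scaling, PCA) modify the data before the final estimator sees it.
pipe = Pipeline([
    ("scale", StandardScaler()),              # preprocessing
    ("reduce", PCA(n_components=2)),          # dimension reduction
    ("clf", LogisticRegression(max_iter=1000)),  # learning algorithm
])
pipe.fit(X, y)
print(pipe.score(X, y))
```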
Cross-validation schemes
K-fold
Stratified K-fold
Leave-one-out (for small data sets)
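A minimal scikit-learn sketch of the three schemes; the dataset and classifier are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (cross_val_score, KFold,
                                     StratifiedKFold, LeaveOneOut)
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

for cv in (KFold(n_splits=5, shuffle=True, random_state=0),
           StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
           LeaveOneOut()):  # LOO: one fold per sample, for small data
    scores = cross_val_score(clf, X, y, cv=cv)
    print(type(cv).__name__, scores.mean())
```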
Dimension Reduction
PCA
KPCA
LLE
Math behind ML
z = wx + b
σ(z) = 1/(1 + e^(-z))
...
Concepts
Model--Train--Evaluate--Predict
Classification, Regression, Clustering, Dimension reduction
Algorithms
Linear Regression
Find optimal weights by solving the normal equations
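A minimal NumPy sketch of solving the normal equations; the synthetic data is illustrative:

```python
import numpy as np

# Normal equations: w = (X^T X)^{-1} X^T y, with the bias absorbed
# by prepending a column of ones to X.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 + 0.01 * rng.normal(size=50)

Xb = np.c_[np.ones(len(X)), X]            # bias column
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)  # solve, don't invert
print(w)  # approximately [0.5, 3.0, -1.0]
```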
Logistic Regression
No closed-form solution; maximize the log-likelihood (equivalently, minimize the negative log-likelihood) using gradient descent.
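A minimal NumPy sketch of minimizing the negative log-likelihood by gradient descent; the synthetic data, learning rate, and iteration count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For logistic regression the gradient of the negative log-likelihood
# is X^T (sigmoid(Xw) - y) / n.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)

w, lr = np.zeros(2), 0.5
for _ in range(500):
    w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)

pred = (sigmoid(X @ w) > 0.5).astype(float)
print((pred == y).mean())  # training accuracy close to 1.0
```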
Neural Network
RNN (Recurrent Neural Network)
CNN (Convolutional Neural Network)
Decision Tree
Identification Tree
Naive Bayes
Assumes features are independent of each other
Conditional probability model
Highly scalable; requires only a small amount of training data
Runs in linear time
Generally outperformed by other algorithms such as SVMs
Support Vector Machines
For both classification and regression
Finds the widest "street" separating instances of different classes
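A minimal scikit-learn sketch; the dataset and kernel choices are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# SVC maximizes the margin (the "widest street"); the kernel trick
# lets it do so in an implicit high-dimensional feature space.
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))
```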
Random Forest
Decision Tree ensemble
Test Methodologies
Leave-one-out (LOO)
For small data sets
Data split (80/20)
Practical Guidelines for DNN
Initialization: He
Activation: ELU
Normalization: Batch Normalization
Regularization: Dropout
Optimizer: Adam
Learning rate schedule: None
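A minimal Keras sketch following these guidelines (one common layer ordering; the layer sizes and the MNIST-shaped input are illustrative):

```python
import tensorflow as tf

# One hidden block per guideline: He initialization, ELU activation,
# Batch Normalization, Dropout; Adam optimizer, no LR schedule.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),  # e.g. flattened MNIST images
    tf.keras.layers.Dense(300, kernel_initializer="he_normal"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("elu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),  # default LR, no schedule
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```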
Software
TensorFlow, scikit-learn
Spark MLlib, Spark ML, Weka
Use cases
Linear Regression
House size --> house price in a community
Naive Bayes
Document classification: separate legitimate emails from spam emails
For example, based on keywords such as "cheap" and "free"
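A minimal scikit-learn sketch of such a keyword-based spam filter; the tiny corpus and labels are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus; real spam filters train on far more data.
emails = ["cheap meds buy now", "free offer click here",
          "meeting agenda attached", "lunch tomorrow?"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

vec = CountVectorizer()
X = vec.fit_transform(emails)        # keyword counts as features
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free cheap offer"])))  # -> [1] (spam)
```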
Questions
When to use which algorithm(s)?
Classic Applications
AlphaGo vs. Lee Sedol
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
Autonomous Car
Netflix movie recommendations
Image recognition
Natural language processing
Summary
No ML algorithm is universally better than any other (the "no free lunch" theorem).
Understand the data distribution and pick the proper algorithm(s).
References
Books
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (one of my favorite books, highly recommended)
Algorithms for Reinforcement Learning
Deep Learning (Adaptive Computation and Machine Learning series)
http://neuralnetworksanddeeplearning.com/
TensorFlow
scikit-learn
https://www.kaggle.com/
Reinforcement Learning - David Silver
http://www.wildml.com/
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/
Machine learning series from Luis Serrano (Very good explanations for beginners)
https://www.youtube.com/watch?v=aDW44NPhNw0
https://www.youtube.com/watch?v=BR9h47Jtqyw&t=24s
https://www.youtube.com/watch?v=2-Ol7ZB0MmU&t=7s
https://www.youtube.com/watch?v=IpGxLWOIZy4
http://scikit-learn.org/stable/tutorial/machine_learning_map/
https://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf
https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf
https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf
https://aws.amazon.com/blogs/aws/sagemaker/ (AWS machine learning service)
https://stanford.edu/~rezab/sparkworkshop/slides/xiangrui.pdf (Spark MLlib example)
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/
https://biomedical-engineering-online.biomedcentral.com/articles/10.1186/s12938-017-0378-z
https://iknowfirst.com/rsar-machine-learning-trading-stock-market-and-chaos
Mastering the game of Go without human knowledge