Java + Cloud + AI: 2017

Machine Learning

Purpose
The purpose of machine learning is to generalize: to perform well on new, previously unseen data, not just on the training data.

Classic Problem
Normal Programming: "Hello world"
Machine Learning: MNIST

Approach
Problems --> Tools --> Metrics (apply to all problems?)
Data to generalize --> Use different algorithms --> Monitor performance of algorithms and adjust

Key Words
Classification
Discrete output
Regression
Continuous numeric output
Clustering

Gradient descent, Backpropagation, Cost function
Cross-entropy
Any loss consisting of a negative log-likelihood is a cross-entropy between the empirical distribution
defined by the training set and the probability distribution defined by the model. For example,
mean squared error is the cross-entropy between the empirical distribution and a Gaussian model.
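As a rough numeric check of the mean squared error example (a sketch, not a derivation): with a Gaussian model of fixed variance, the negative log-likelihood equals a scaled mean squared error plus a constant. The function names, the unit variance, and the sample values are assumptions made for illustration.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def gaussian_nll(y_true, y_pred, sigma=1.0):
    """Negative log-likelihood of the targets under N(prediction, sigma^2)."""
    return sum(
        0.5 * math.log(2 * math.pi * sigma ** 2) + (y - p) ** 2 / (2 * sigma ** 2)
        for y, p in zip(y_true, y_pred)
    )

# Made-up targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 1.5, 3.5]

n = len(y_true)
mse = mean_squared_error(y_true, y_pred)
nll = gaussian_nll(y_true, y_pred)
constant = 0.5 * n * math.log(2 * math.pi)

# With sigma = 1, the two printed values should agree:
# nll == (n / 2) * mse + constant
print(nll, 0.5 * n * mse + constant)
```

Since the constant does not depend on the predictions, minimizing one quantity minimizes the other.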

Activation function
Step function
discrete 0, 1
Sigmoid function

Tanh function

Rectified Linear function (ReLU)

Exponential linear unit (ELU)
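The activation functions listed above can be sketched in a few lines of plain Python. This is a minimal illustration using the standard textbook definitions; the function names, the convention that the step function returns 1 at z = 0, and the default alpha = 1.0 for ELU are assumptions.

```python
import math

def step(z):
    """Step function: discrete output, 0 or 1 (1 at z == 0 by assumption)."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Sigmoid: squashes z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: 0 for negative z, identity otherwise."""
    return max(0.0, z)

def elu(z, alpha=1.0):
    """Exponential linear unit: smooth negative part that saturates at -alpha."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sigmoid(z), tanh(z), relu(z), elu(z))
```

Note that this naive sigmoid overflows for very negative z (roughly below -709); libraries typically use a numerically stable variant.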

Training data set
Train parameters
Validation data set
Tune hyperparameters
Test data set
Estimate the generalization error of the final model

Bias, Variance
Linked to capacity, underfitting, overfitting

Closed-form solution

Parameter
Learned
Weight, Bias
Hyperparameter
Tuned
Learning rate
Number of layers
Number of neurons in each layer
Number of iterations

Accuracy
Sensitivity
Specificity
F1-score
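These four metrics can all be derived from the counts of true/false positives and negatives. A minimal sketch for binary labels, assuming 1 marks the positive class and 0 the negative class; the function names and the sample labels are made up for illustration.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }

# Made-up labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```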

Kernel trick
Maximum likelihood estimation
Point estimate of variables

Bayesian estimation
Full distribution of variables

Optimization
Hill Climbing
Move one step along one axis at a time
Reaches the optimal solution for convex problems
Problems: local maxima, ridges and alleys, plateaus
Good for functions that are complex and/or not differentiable
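The "one step along one axis at a time" idea can be sketched as follows. This is an illustrative version that maximizes a function with a fixed step size; the function name, the step size, and the example objective are assumptions, and real implementations often shrink the step or restart from several points.

```python
def hill_climb(f, start, step=0.1, max_iters=10_000):
    """Maximize f by trying +/- step along each axis and keeping improvements."""
    current = list(start)
    best = f(current)
    for _ in range(max_iters):
        improved = False
        for axis in range(len(current)):
            for delta in (step, -step):
                candidate = list(current)
                candidate[axis] += delta
                value = f(candidate)
                if value > best:  # accept only strictly better neighbors
                    current, best = candidate, value
                    improved = True
        if not improved:  # no neighbor is better: local maximum or plateau
            break
    return current, best

# Example objective with a single maximum at (1, -2); no gradient is needed.
def objective(p):
    return -(p[0] - 1.0) ** 2 - (p[1] + 2.0) ** 2

point, value = hill_climb(objective, start=[0.0, 0.0])
print(point, value)
```

Because only function values are compared, the same loop works when the objective is not differentiable, but it stops at the first point where no single-axis step improves the value.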

Gradient Descent
Vanishing/exploding gradient problems
Approaches to mitigate: He initialization, Batch Normalization
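For reference, plain gradient descent repeatedly steps against the gradient of the cost function. A minimal one-dimensional sketch; the function name, the learning rate, and the quadratic example cost are assumptions chosen for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, n_iterations=100):
    """Minimize a function given its gradient, starting from `start`."""
    w = start
    for _ in range(n_iterations):
        w -= learning_rate * grad(w)  # step in the direction of steepest descent
    return w

# Example cost: f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def cost_gradient(w):
    return 2.0 * (w - 3.0)

w_min = gradient_descent(cost_gradient, start=0.0)
print(w_min)  # close to 3.0, the minimizer of the example cost
```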

Momentum

AdaGrad

RMSProp

Adam

Regularization
Modification to ML algorithms, intending to reduce generalization error, not training error
Example: weight decay for linear regression
Early stopping, L1, L2, Dropout, Max-Norm, Data Augmentation
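The weight decay example can be illustrated on the simplest possible case: linear regression with one weight and no bias term. Minimizing sum((y - w*x)^2) + lam * w^2 gives w = sum(x*y) / (sum(x*x) + lam). The function name, the parameter name `lam`, and the data are assumptions for this sketch.

```python
def fit_weight_decay(xs, ys, lam=0.0):
    """Closed-form weight for y ~ w * x with an L2 penalty lam * w^2."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)

# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_weight_decay(xs, ys, lam=0.0)     # no regularization
w_decayed = fit_weight_decay(xs, ys, lam=14.0)  # strong penalty shrinks the weight
print(w_plain, w_decayed)
```

The penalized weight fits the training data worse; the intent of regularization is that it may still generalize better when the training data is noisy.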

Generalize
To have small gap between training error and test error
Supervised Learning
features + labels
Nonprobabilistic SL
K-Nearest Neighbor
Decision Tree
Unsupervised Learning
features without labels
Reinforcement Learning
Learning by getting feedback from the environment

Transformer
Modify or filter data before feeding it to learning algorithms
Preprocessing
Feature selection
Feature extraction
Dimension reduction (PCA, manifold learning)
Kernel approximation

Cross-validation schemes
K-fold
Stratified K-fold
Leave-one-out (small amount of data)
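A sketch of how K-fold splitting can work, using index lists only. Leave-one-out is the special case where k equals the number of samples. The function name is hypothetical; this version does not shuffle and does not stratify by label.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)

# Leave-one-out: one sample per validation fold.
loo_folds = list(k_fold_indices(5, 5))
```

Each sample is used for validation exactly once and for training in the remaining folds.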

Dimension Reduction
PCA
KPCA
LLE

Math behind ML
z = wx + b
σ(z) = 1 / (1 + e^(-z))
...

Concepts
Model--Train--Evaluate--Predict
Classification, Regression, Clustering, Dimension reduction

Algorithms
Linear Regression
Find optimal weights by solving normal equations
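For a single feature plus a bias term, the normal equations reduce to two closed-form expressions. A minimal sketch under that assumption; the function name and the data are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Solve the normal equations for y ~ w * x + b with one feature."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    w = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - w * sum_x) / n
    return w, b

# Made-up data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit_simple_linear_regression(xs, ys)
print(w, b)  # w = 2.0, b = 1.0 for this data
```

With many features the same equations are usually solved in matrix form by a linear algebra library.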

Logistic Regression
No closed-form solution. Maximizing the log-likelihood, or minimizing the negative log-likelihood using gradient descent.
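A sketch of minimizing the negative log-likelihood with batch gradient descent, for one feature and a bias term. The function names, the learning rate, the iteration count, and the data are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def negative_log_likelihood(w, b, xs, ys):
    """Sum of -log p(y | x) under the logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def train_logistic_regression(xs, ys, learning_rate=0.1, n_iterations=5000):
    """Fit w and b by gradient descent on the average negative log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of the loss w.r.t. z
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Made-up, slightly overlapping classes.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
print(w, b, sigmoid(w * 0.5 + b), sigmoid(w * 4.0 + b))
```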

Neural Network
RNN (Recurrent Neural Network)
CNN (Convolutional Neural Network)

Decision Tree

Identification Tree

Naive Bayes
Assumes features are independent of each other given the class
Conditional probability model
Highly scalable; requires only a small amount of training data
Linear-time training
Generally outperformed by other algorithms such as SVM

Support Vector Machines
For both classification and regression
Widest street to separate instances of different classes

Random Forest
Decision Tree ensemble

Test Methodologies
Leave-one-out (LOO)
For small amounts of data

Data split (80/20)
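An 80/20 split can be sketched with the standard library alone. The function name, the fixed seed, and the sample data are assumptions; shuffling first avoids a split that depends on the original ordering.

```python
import random

def split_data(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(10))
train, test = split_data(samples)
print(len(train), len(test))  # 8 and 2 for ten samples
```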

Practical Guidelines for DNN
Initialization: He initialization
Activation: ELU
Normalization: Batch Normalization
Regularization: Dropout
Optimizer: Adam
Learning rate schedule: None

Software
TensorFlow, Scikit-learn
Spark MLlib, Spark ML, Weka

Use cases
Linear Regression
House size --> house price in a community

Naive Bayes
Document classification: separate legitimate emails from spam emails
For example, based on key words: cheap, free
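A toy sketch of keyword-based spam filtering with Naive Bayes, using word counts and add-one (Laplace) smoothing. The function names, the labels "spam" and "ham", and the tiny training set are all made up for illustration; a real filter would need far more data and preprocessing.

```python
import math
from collections import Counter

def train_naive_bayes(documents):
    """documents: list of (list_of_words, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for words, label in documents:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(model, words):
    """Return the label with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        log_prob = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][word]
            # Add-one smoothing so unseen (word, class) pairs are not zero.
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)

training_data = [
    (["cheap", "pills", "free"], "spam"),
    (["free", "offer", "cheap"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["project", "report", "free", "time"], "ham"),
]
model = train_naive_bayes(training_data)
print(predict(model, ["cheap", "free"]))
print(predict(model, ["meeting", "agenda"]))
```

Multiplying per-word probabilities (adding their logs) is exactly where the independence assumption is used.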

Questions
When to use which algorithm(s)?

Classic Applications
AlphaGo vs Lee Sedol
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol

Autonomous Car

Netflix movie recommendations

Image recognition

Natural language processing

Summary
No ML algorithm is universally better than any other.
Understand the data distribution, then pick suitable algorithm(s).

References

Books
(One of my favorite books, highly recommended)
Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

Algorithms for Reinforcement Learning

Deep Learning (Adaptive Computation and Machine Learning series)

http://neuralnetworksanddeeplearning.com/

TensorFlow

scikit-learn

https://www.kaggle.com/

Reinforcement Learning - David Silver

http://www.wildml.com/

https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/

Machine learning series from Luis Serrano (Very good explanations for beginners)
https://www.youtube.com/watch?v=aDW44NPhNw0
https://www.youtube.com/watch?v=BR9h47Jtqyw&t=24s
https://www.youtube.com/watch?v=2-Ol7ZB0MmU&t=7s
https://www.youtube.com/watch?v=IpGxLWOIZy4

http://scikit-learn.org/stable/tutorial/machine_learning_map/

https://blogs.sas.com/content/subconsciousmusings/2017/04/12/machine-learning-algorithm-use/

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf

https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

(AWS machine learning service)
https://aws.amazon.com/blogs/aws/sagemaker/

(Spark MLlib example)
https://stanford.edu/~rezab/sparkworkshop/slides/xiangrui.pdf

https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/

https://biomedical-engineering-online.biomedcentral.com/articles/10.1186/s12938-017-0378-z

https://iknowfirst.com/rsar-machine-learning-trading-stock-market-and-chaos

Mastering the game of Go without human knowledge

Friday, December 1, 2017
