Friday, December 1, 2017

Machine Learning

Machine Learning Notes
(It is a work in progress.)

Purpose
    The goal of machine learning is to generalize: to perform well on new, unseen data, not just on the data it was trained on.

Classic Problem
    Normal Programming: "Hello world"
    Machine Learning: MNIST (handwritten digit recognition)

Approach
    Problem --> Tools --> Metrics (does this apply to all problems?)
    Data to generalize from --> try different algorithms --> monitor the algorithms' performance and adjust
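The "Metrics" step, together with the Accuracy, Sensitivity and Specificity key words, can be made concrete with a small sketch. It assumes a binary classifier whose results have already been counted into a confusion matrix (true/false positives and negatives). The class name and the counts in main are hypothetical, chosen only for illustration.

```java
public class ClassificationMetrics {

    // Fraction of all predictions that were correct.
    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    // Sensitivity (true positive rate, recall): share of actual positives that were detected.
    static double sensitivity(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    // Specificity (true negative rate): share of actual negatives that were correctly rejected.
    static double specificity(int tn, int fp) {
        return (double) tn / (tn + fp);
    }

    public static void main(String[] args) {
        // Hypothetical confusion-matrix counts, for illustration only.
        int tp = 40, fn = 10, tn = 45, fp = 5;
        System.out.println("accuracy:    " + accuracy(tp, tn, fp, fn));
        System.out.println("sensitivity: " + sensitivity(tp, fn));
        System.out.println("specificity: " + specificity(tn, fp));
    }
}
```

With these counts, accuracy is 85/100, sensitivity is 40/50 and specificity is 45/50. Looking at sensitivity and specificity separately matters when one class is rare, because a model that always predicts the majority class can still have high accuracy.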

Key Words
     Classification, Regression, Clustering,
     Gradient descent, Backpropagation, Cost function,
     Cross-entropy

     Training data set
           Used to learn the parameters
     Validation data set
           Used to tune the hyperparameters
     Test data set
           Used to estimate the generalization error

     Weight, Bias, Learning rate
     Parameter
     Hyperparameter
     Accuracy
     Sensitivity
     Specificity

    Optimization

    Regularization
        Any modification to a learning algorithm that is intended to reduce its generalization error, but not its training error
        Example: weight decay for linear regression

     Generalize
            Having a small gap between training error and test error
     Supervised Learning
            features + labels
     Unsupervised Learning
            features without labels
     Reinforcement Learning
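Several of the key words above (cost function, gradient descent, weight, bias, learning rate, and weight decay as a form of regularization) fit together in one small example. The sketch below is a minimal illustration, assuming one-dimensional linear regression y ≈ wx + b with a mean-squared-error cost plus an L2 penalty λw² on the weight (the bias is not penalized). The class name, data and hyperparameter values are hypothetical.

```java
public class LinearRegressionGD {

    // Cost = mean squared error + lambda * w^2 (weight decay on the weight only).
    static double cost(double[] x, double[] y, double w, double b, double lambda) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double error = (w * x[i] + b) - y[i];
            sum += error * error;
        }
        return sum / x.length + lambda * w * w;
    }

    // Batch gradient descent on the cost above; returns {w, b}.
    static double[] fit(double[] x, double[] y, double learningRate, double lambda, int epochs) {
        double w = 0.0, b = 0.0;
        int n = x.length;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double gradW = 0.0, gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double error = (w * x[i] + b) - y[i];   // prediction minus label
                gradW += 2.0 * error * x[i] / n;
                gradB += 2.0 * error / n;
            }
            gradW += 2.0 * lambda * w;                  // derivative of lambda * w^2
            w -= learningRate * gradW;                  // step against the gradient
            b -= learningRate * gradB;
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        // Hypothetical data lying exactly on y = 2x + 1.
        double[] x = {0, 1, 2, 3, 4};
        double[] y = {1, 3, 5, 7, 9};
        double[] plain = fit(x, y, 0.05, 0.0, 2000);
        double[] decayed = fit(x, y, 0.05, 0.5, 2000);
        System.out.printf("no decay:   w=%.3f b=%.3f%n", plain[0], plain[1]);
        System.out.printf("with decay: w=%.3f b=%.3f%n", decayed[0], decayed[1]);
        System.out.printf("cost with decay: %.4f%n", cost(x, y, decayed[0], decayed[1], 0.5));
    }
}
```

Without decay the fit recovers w = 2, b = 1. With λ = 0.5 the penalty pulls the weight toward zero (w = 1.6, b = 1.8 for this data), trading a slightly worse fit on the training data for a smaller weight, which is the effect regularization aims for.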
         
   

Math behind ML
    $z = wx + b$
    $\sigma(z) = \frac{1}{1 + e^{-z}}$
    ...
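As a quick numeric check of the two formulas, the sketch below computes the linear part z = wx + b and then the logistic sigmoid σ(z) = 1/(1 + e^(-z)), which squashes any real z into the interval (0, 1). The class name and the values of w, x and b are hypothetical.

```java
public class Sigmoid {

    // Linear part of logistic regression / a single neuron: z = w*x + b.
    static double linear(double w, double x, double b) {
        return w * x + b;
    }

    // Logistic sigmoid: 1 / (1 + e^(-z)), always strictly between 0 and 1.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        double w = 2.0, x = 1.5, b = -3.0;   // hypothetical values
        double z = linear(w, x, b);          // 2.0 * 1.5 - 3.0 = 0.0
        System.out.println("z = " + z + ", sigmoid(z) = " + sigmoid(z)); // z = 0.0, sigmoid(z) = 0.5
    }
}
```

At z = 0 the output is exactly 0.5, which is why 0.5 is the usual decision threshold when the sigmoid output is read as a probability.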


Concepts


Algorithms
    Linear Regression
    Logistic Regression
    Neural Network
    RNN (Recurrent Neural Network)
    CNN (Convolutional Neural Network)

    Decision Tree

    Identification Tree

    Naive Bayes
           Assumes features are conditionally independent of each other, given the class
           Conditional probability model
           Highly scalable; requires only a small amount of training data
           Training time is linear in the size of the data
           Generally outperformed by other algorithms, such as SVMs
    Support Vector Machines

    Random Forest

Test Methodologies
   Leave-one-out (LOO)
       For small data sets

   Data split (80/20)
       80% of the data for training, 20% for testing
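Both test methodologies can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the 80/20 split shuffles the example indices with a fixed seed before cutting them, and leave-one-out is demonstrated with a hypothetical baseline "model" that always predicts the mean of its training labels, so that the held-out error can be computed without any ML library. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DataSplits {

    // Shuffles the indices 0..n-1 with a fixed seed, then returns two lists:
    // the first trainFraction of them for training and the rest for testing.
    static List<List<Integer>> trainTestSplit(int n, double trainFraction, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(n * trainFraction);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, cut)));   // training indices
        parts.add(new ArrayList<>(indices.subList(cut, n)));   // test indices
        return parts;
    }

    // Leave-one-out: for each example, "train" on the other n-1 examples, test on the
    // one left out, and average the squared errors. The model is a hypothetical baseline
    // that predicts the mean of its training labels. Requires at least 2 examples.
    static double leaveOneOutMse(double[] y) {
        int n = y.length;
        double total = 0.0;
        for (double v : y) {
            total += v;
        }
        double sumSquaredError = 0.0;
        for (int i = 0; i < n; i++) {
            double prediction = (total - y[i]) / (n - 1);   // mean of the other n-1 labels
            double error = prediction - y[i];
            sumSquaredError += error * error;
        }
        return sumSquaredError / n;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = trainTestSplit(10, 0.8, 42L);
        System.out.println("train: " + parts.get(0));
        System.out.println("test:  " + parts.get(1));
        System.out.println("LOO MSE of mean baseline: " + leaveOneOutMse(new double[] {1, 2, 3}));
    }
}
```

Leave-one-out trains n models for n examples, which is why it is usually reserved for small data sets; a single 80/20 split trains only once.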
     

Software
    Spark MLlib, Spark ML, Weka,
    TensorFlow

Use cases
    Linear Regression
          House size --> house price in a community
 
    Naive Bayes
          Document classification: separating legitimate emails from spam
          For example, based on keywords such as "cheap" and "free"
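The spam use case maps directly onto the Naive Bayes notes above. The sketch below is a minimal multinomial Naive Bayes text classifier, assuming whitespace tokenization, add-one (Laplace) smoothing, and log probabilities to avoid numeric underflow. The class name, labels and the tiny training set are hypothetical; it is not a production spam filter.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSpam {

    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();             // label -> number of words
    private final Map<String, Integer> docCounts = new HashMap<>();              // label -> number of documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Records one labeled training document, e.g. label "spam" or "ham".
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : text.toLowerCase().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Picks the label maximizing log P(label) + sum of log P(word | label).
    // The "naive" independence assumption lets word probabilities simply multiply
    // (add in log space); add-one smoothing keeps unseen words from zeroing a class.
    String classify(String text) {
        String bestLabel = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : docCounts.entrySet()) {
            String label = entry.getKey();
            double score = Math.log((double) entry.getValue() / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.get(label);
            for (String word : text.toLowerCase().split("\\s+")) {
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        NaiveBayesSpam nb = new NaiveBayesSpam();
        // Tiny hypothetical training set; a real filter needs far more data.
        nb.train("spam", "cheap pills free offer");
        nb.train("spam", "free money cheap loans");
        nb.train("ham", "meeting agenda for monday");
        nb.train("ham", "project status and meeting notes");
        System.out.println(nb.classify("free cheap offer"));        // spam
        System.out.println(nb.classify("monday project meeting"));  // ham
    }
}
```

Training is a single pass that only counts words, which is where the "linear time" and "small amount of training data" properties in the notes come from.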
       
   

Questions
    When to use which algorithm(s)?

Famous Applications
       AlphaGo vs. Lee Sedol
       https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol

       Netflix movie recommendations

Summary
     No ML algorithm is universally better than all others.
     Understand the data distribution, and pick the appropriate algorithm(s).

References
 
    Machine learning series from Luis Serrano  (best explanations)
    https://www.youtube.com/watch?v=aDW44NPhNw0
    https://www.youtube.com/watch?v=BR9h47Jtqyw&t=24s
    https://www.youtube.com/watch?v=2-Ol7ZB0MmU&t=7s
    https://www.youtube.com/watch?v=IpGxLWOIZy4

    http://www.deeplearningbook.org/

    https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-0002-introduction-to-computational-thinking-and-data-science-fall-2016/

    https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/

    http://neuralnetworksanddeeplearning.com/

    https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

    (AWS machine learning service)
    https://aws.amazon.com/blogs/aws/sagemaker/

    (Spark MLlib example)
    https://stanford.edu/~rezab/sparkworkshop/slides/xiangrui.pdf

    https://biomedical-engineering-online.biomedcentral.com/articles/10.1186/s12938-017-0378-z

    https://iknowfirst.com/rsar-machine-learning-trading-stock-market-and-chaos

 

Tuesday, January 31, 2017

String valueOf() pitfalls

What will this program print to the console?


public class TestStringValueOf {

    public static void main(String[] args) {
        testStringValueOfChar();
    }

    public static void testStringValueOfChar() {
        char a = 'a';
        String str1 = String.valueOf(a);
        String str2 = String.valueOf(a);
        System.out.println("char comparison:" + (str1 == str2));

        double d = 12.3d;
        String str3 = String.valueOf(d);
        String str4 = String.valueOf(d);
        System.out.println("double comparison:" + (str3 == str4));

        boolean b = false;
        String str5 = String.valueOf(b);
        String str6 = String.valueOf(b);
        System.out.println("boolean comparison:" + (str5 == str6));

        Object o = null;
        String str7 = String.valueOf(o);
        String str8 = String.valueOf(o);
        System.out.println("Object null comparison:" + (str7 == str8));

        Object notNull = new Object();
        String str9 = String.valueOf(notNull);
        String str10 = String.valueOf(notNull);
        System.out.println("Object Not null comparison:" + (str9 == str10));
    }
}

See the end of this article for the output.

Overall, strings should be compared with 'equals', no matter how the String objects were created.


-------console output----------

char comparison:false
double comparison:false
boolean comparison:true
Object null comparison:true
Object Not null comparison:false
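Why the mixed results? In the JDK's implementation, String.valueOf(boolean) returns one of the literals "true" or "false", and String.valueOf(Object) returns the literal "null" for a null reference. String literals are interned, so both calls hand back the very same object and == happens to be true. The char and double overloads, and Object.toString() for a non-null object, build a new String on every call. This is an implementation detail rather than a guarantee, which is one more reason to rely on equals(). The sketch below (class and method names are hypothetical) prints both comparisons side by side; equals() is true in every case.

```java
public class StringValueOfEquals {

    // Prints reference equality (==) next to content equality (equals)
    // and returns the content comparison.
    static boolean compare(String label, String s1, String s2) {
        boolean sameContent = s1.equals(s2);
        System.out.println(label + " ==: " + (s1 == s2) + ", equals: " + sameContent);
        return sameContent;
    }

    public static void main(String[] args) {
        Object o = null;
        Object notNull = new Object();
        compare("char", String.valueOf('a'), String.valueOf('a'));
        compare("double", String.valueOf(12.3d), String.valueOf(12.3d));
        compare("boolean", String.valueOf(false), String.valueOf(false));
        compare("Object null", String.valueOf(o), String.valueOf(o));
        compare("Object not null", String.valueOf(notNull), String.valueOf(notNull));
    }
}
```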


Monday, July 18, 2016

Spring MVC UTF-8

Key points

web.xml:
<filter>
    <filter-name>encodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>encodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>


Maven pom.xml:
  <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
       ...
  </properties>

JSP:
   <%@ page language="java" pageEncoding="UTF-8"%>
  <%@ page contentType="text/html;charset=UTF-8" %>
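To see what these settings guard against: per the Servlet specification, when a request does not declare a charset the container typically decodes its parameters as ISO-8859-1. If the browser actually sent UTF-8, every non-ASCII character turns into garbage, for example "café" becomes "cafÃ©". The plain-Java sketch below reproduces that mismatch without a servlet container; the class and method names are hypothetical, and the source uses Unicode escapes so it does not depend on its own file encoding (the concern behind the Maven sourceEncoding setting).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encodes text as UTF-8 (as a browser submitting a UTF-8 form would), then decodes
    // the bytes with the given charset (as a server would when reading the request).
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                                  // "cafe" with e-acute
        String wrong = roundTrip(original, StandardCharsets.ISO_8859_1); // e-acute becomes U+00C3 U+00A9
        String right = roundTrip(original, StandardCharsets.UTF_8);
        System.out.println("decoded as ISO-8859-1 matches: " + original.equals(wrong));
        System.out.println("decoded as UTF-8 matches:      " + original.equals(right));
    }
}
```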


Friday, June 17, 2016

Compile xsl files and store in cache to improve XSLT performance


Here is common code found online for doing an XSLT transformation (non-essential pieces removed for brevity):

------------------
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer(new StreamSource(new File(xsltPath)));
transformer.transform(new StreamSource(new File(sourceFilePath)), new StreamResult(new File(resultPath)));
----------------

The code works, but if an XSLT file is relatively large and needs to be used over and over again to transform a lot of files, for example in batch mode, it may not perform well, because each call to newTransformer(Source) processes (parses and compiles) the stylesheet again.


The following shows a way to cache the compiled version of an XSL file, which is a 'Templates' object. Templates objects are thread-safe.

Code snippet to cache the 'Templates' object.

import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class TemplatesCache {   // enclosing class name is illustrative

    // Compiled stylesheets keyed by XSL file path.
    static final Map<String, Templates> cacheTemplates = new ConcurrentHashMap<String, Templates>();

    static final TransformerFactory transformFactory = TransformerFactory.newInstance();

    // Compile the XSL file once and save the resulting Templates object in the cache.
    // TransformerFactory is not guaranteed to be thread-safe, so compilation is synchronized.
    public static synchronized void cacheCompiled(String xsl) {
        try {
            Templates templates = transformFactory.newTemplates(new StreamSource(new File(xsl)));
            cacheTemplates.put(xsl, templates);
        } catch (TransformerConfigurationException e) {
            throw new RuntimeException(e);
        }
    }
}

The 'templates' object above is essentially a compiled version of the original XSL file. If the original file is relatively large, for example 20KB, transforming even a small file takes more than 2 seconds on my local machine. Without caching the templates, every transformation takes more than 2 seconds; with caching, each transformation after the first takes about 0.1 seconds.



The basic code is like this:

// Get the Templates object from the cache by XSL file name, then create a Transformer from it.
Templates templates = cacheTemplates.get(xsl);
Transformer transformer = templates.newTransformer();

transformer.transform(new StreamSource(new File(sourceFilePath)),
        new StreamResult(new File(resultPath)));

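The 'transformer' object above is not thread-safe, so create a new Transformer from the cached Templates for each thread (or each transformation) instead of sharing one.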
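A self-contained version of the same idea can be run without any files. The sketch below is an illustration under assumptions: it uses the JDK's built-in XSLT processor, a small hypothetical stylesheet held in a String instead of an XSL file, and a hypothetical cache key. Each stylesheet is compiled once into a Templates object, and every transformation gets its own Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesCacheDemo {

    // Only used inside the synchronized compile step below.
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();

    // Compiled stylesheets keyed by a caller-chosen name; Templates objects are thread-safe.
    static final Map<String, Templates> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stylesheet used only for this demo.
    static final String GREETING_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">Hello, <xsl:value-of select=\"/person/name\"/></xsl:template>"
            + "</xsl:stylesheet>";

    // Compile the stylesheet the first time it is requested, then reuse the cached Templates.
    static synchronized Templates templatesFor(String key, String xsl) throws TransformerException {
        Templates templates = CACHE.get(key);
        if (templates == null) {
            templates = FACTORY.newTemplates(new StreamSource(new StringReader(xsl)));
            CACHE.put(key, templates);
        }
        return templates;
    }

    // Each call creates its own Transformer, because Transformer objects are not thread-safe.
    static String transform(String key, String xsl, String xml) {
        try {
            Transformer transformer = templatesFor(key, xsl).newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("greeting", GREETING_XSL, "<person><name>Ada</name></person>"));
        System.out.println(transform("greeting", GREETING_XSL, "<person><name>Bob</name></person>"));
        System.out.println("compiled stylesheets in cache: " + CACHE.size());
    }
}
```

The second call reuses the compiled stylesheet, so the cache still holds a single entry. Synchronizing only the compile step keeps the shared factory safe, while the transformations themselves can run in parallel.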

The Saxon processor seems to be becoming more popular, while Xalan seems to be fading away.

The free Home Edition of Saxon (Saxon-HE) may be good enough for a lot of applications.

Friday, November 15, 2013

First impressions on open source ESBs

I have used commercial ESBs and BPM products for a couple of years, and recently had a chance to evaluate some open source ESBs.

WSO2: not easy to use; I had difficulty even getting the sample projects to work. There is no DataMapper tool, which is a deal-breaker for my projects.

MuleSoft ESB: nice documentation and easy-to-follow instructions; the sample projects can be built and run in a couple of minutes, and version 3.4 has a nice DataMapper tool. I have not yet had a chance to build a relatively complex application with it, and I am not sure whether the Community Edition is good enough for production use.


Monday, September 2, 2013

String.getBytes() can lead to hard-to-find bugs

If you execute the following function, what do you think the size of the 'def' byte array will be?

The logic is really simple: take an input byte array with two elements, create a String from it using UTF-8 encoding, then create another byte array from that String using the same UTF-8 encoding.

public static void testStringUTF8() {
    byte[] abc = new byte[2];
    abc[0] = 31;
    abc[1] = -117;

    try {
        String stringAbc = new String(abc, "UTF-8");
        byte[] def = stringAbc.getBytes("UTF-8");
        if (def != null) {
            System.out.println("size of output byte array:" + def.length);  // print the array size
        }

        System.out.println(def[1]);  // print the second element of the output byte array

        System.out.println(abc[1]);  // print the second element of the input byte array

    } catch (Exception e) {
        e.printStackTrace();
    }
}
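The round trip is not lossless. In practice the JDK's UTF-8 decoder treats -117 (0x8B) as a continuation byte with no lead byte, which is malformed, and new String(...) silently replaces it with U+FFFD. Re-encoding U+FFFD produces three bytes (EF BF BD), so 'def' has 4 elements, def[1] is -17, and abc[1] is still -117. (Incidentally, 0x1F 0x8B are the first two bytes of every GZIP stream, a hint that this bug tends to appear when binary data is pushed through a String.) The sketch below, with hypothetical class and method names, shows the lossy round trip, a strict decoder that reports malformed input instead of replacing it, and Base64 as one binary-safe way to carry bytes as text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Utf8RoundTrip {

    // Same as the post: malformed bytes are silently replaced with U+FFFD.
    static byte[] lenientRoundTrip(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict decoding: report malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Binary-safe alternative: Base64 text preserves every byte.
    static byte[] base64RoundTrip(byte[] input) {
        String text = Base64.getEncoder().encodeToString(input);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] abc = {31, -117};
        System.out.println(Arrays.toString(lenientRoundTrip(abc)));  // [31, -17, -65, -67]
        System.out.println(isValidUtf8(abc));                         // false
        System.out.println(Arrays.equals(abc, base64RoundTrip(abc))); // true
    }
}
```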

---------------

Wednesday, April 11, 2012

How to invoke local EJB session beans in WebLogic

Sometimes you may need to invoke a LOCAL EJB session bean from a plain Java class, for example a Business Delegate class, by using a ServiceLocator to look up the local session bean proxy by its JNDI name. This is relatively easy for a REMOTE EJB session bean, because you can use the value of 'name' or 'mappedName' in the bean class definition, but it is a little tricky for LOCAL session beans.

Here is what you need to do.

For example, here is an interface:

package com.play;

import javax.ejb.Local;

@Local
public interface PlayFacadeInf {
    public void play(String var);
}


Here is the implementation bean class.

package com.play;

import javax.ejb.Stateless;

@Stateless
public class PlayFacadeImpl implements PlayFacadeInf {
    public void play(String var) {
        // ... do something
    }
}


Here is part of ejb-jar.xml:


<display-name>myEJB</display-name>
<enterprise-beans>
    <session>
        <ejb-name>PlayFacadeImpl</ejb-name>
        <ejb-class>com.play.PlayFacadeImpl</ejb-class>
        <ejb-local-ref>
            <ejb-ref-name>ejb/PlayFacadeInf</ejb-ref-name>
            <ejb-ref-type>Session</ejb-ref-type>
            <local>com.play.PlayFacadeInf</local>
        </ejb-local-ref>
    </session>
</enterprise-beans>

Here is part of web.xml:


<ejb-local-ref>
    <ejb-ref-name>ejb/PlayFacadeInf</ejb-ref-name>
    <ejb-ref-type>Session</ejb-ref-type>
    <local>com.play.PlayFacadeInf</local>
</ejb-local-ref>

Here is part of ServiceLocator.java:



private static InitialContext ctx = null;

static {
    try {
        ctx = new InitialContext();
    } catch (NamingException e) {
        //... throw some exception
    }
}

private static InitialContext getInitialContext() throws NamingException {
    return ctx;
}

public static PlayFacadeInf getPlayFacade() throws NamingException {
    PlayFacadeInf playFacadeInf = (PlayFacadeInf)
            ServiceLocator.getInitialContext().lookup("java:/comp/env/ejb/PlayFacadeInf");
    return playFacadeInf;
}

Any plain Java class can then use the ServiceLocator to obtain the local EJB session bean proxy.