Friday, December 30, 2011

Stanford's Online Courses: ml-class, ai-class and db-class

About three months ago, I signed up for Stanford's first-of-its-kind, free, online courses on Artificial Intelligence, Machine Learning and Databases. I have now successfully completed all three courses!
  • The Artificial Intelligence (ai-class) class, led by Sebastian Thrun and Peter Norvig, covered probability, Bayes networks, machine learning, planning, Markov decision processes (MDPs), particle filters, game theory, computer vision, robotics and natural language processing. There were in-class quizzes, homework exercises and two exams.
  • The Machine Learning (ml-class) class, led by Professor Andrew Ng, covered supervised learning (linear regression, logistic regression, neural networks, support vector machines), unsupervised learning (k-Means clustering), anomaly detection and recommender systems. There were in-class quizzes, review questions and programming exercises.
  • The Introduction to Databases (db-class) class, led by Professor Jennifer Widom, covered relational databases, relational algebra, XML, XPath, SQL, UML, constraints, triggers, transactions, authorization, recursion and NoSQL systems. There were in-class quizzes, assignments with practical exercises (such as creating triggers, or writing SQL and XPath queries that would run on a set of data) and two exams.
Even though I had studied these topics a long time ago at university, I found it really useful to refresh my memory. The courses took up quite a bit of time (about 2 hours a week for each class), but it was definitely worth it.

I am now looking forward to starting some new classes in 2012!

Tuesday, December 27, 2011

Args4j vs JCommander for Parsing Command Line Parameters

In the past, I've always used Apache Commons CLI for parsing command line options passed to programs and have found it quite tedious because of all the boilerplate code involved. Just take a look at their Ant Example and you will see how much code is required to create each option.

As an alternative, there are two annotation-based command line parsing frameworks which I have been evaluating recently: Args4j and JCommander.

I'm going to use the Ant example to illustrate how to parse command line options using these two libraries. I'm only going to use a few options in my example because they are all very similar. Here is an extract of the help output for Ant, which I will be aiming to replicate:
ant [options] [target [target2 [target3] ...]]
Options:
  -help, -h              print this message
  -lib <path>            specifies a path to search for jars and classes
  -buildfile <file>      use given buildfile
    -file    <file>              ''
    -f       <file>              ''
  -D<property>=<value>   use value for given property
  -nice  number          A niceness value for the main thread:
Args4j (v2.0.12)
The class below demonstrates how to parse command line options for Ant using Args4j. The main method parses some sample arguments and also prints out the usage of the command.
import static org.junit.Assert.*;
import java.io.File;
import java.util.*;
import org.kohsuke.args4j.*;

/**
 * Example of using Args4j for parsing
 * Ant command line options
 */
public class AntOptsArgs4j {

  @Argument(metaVar = "[target [target2 [target3] ...]]", usage = "targets")
  private List<String> targets = new ArrayList<String>();

  @Option(name = "-h", aliases = "-help", usage = "print this message")
  private boolean help = false;

  @Option(name = "-lib", metaVar = "<path>",
          usage = "specifies a path to search for jars and classes")
  private String lib;

  @Option(name = "-f", aliases = { "-file", "-buildfile" }, metaVar = "<file>",
          usage = "use given buildfile")
  private File buildFile;

  @Option(name = "-nice", metaVar = "number",
          usage = "A niceness value for the main thread:\n"
          + "1 (lowest) to 10 (highest); 5 is the default")
  private int nice = 5;

  private Map<String, String> properties = new HashMap<String, String>();
  @Option(name = "-D", metaVar = "<property>=<value>",
          usage = "use value for given property")
  private void setProperty(final String property) throws CmdLineException {
    // split on the first '=' only, so that values may contain '='
    String[] arr = property.split("=", 2);
    if(arr.length != 2) {
        throw new CmdLineException("Properties must be specified in the form: "+
                                   "<property>=<value>");
    }
    properties.put(arr[0], arr[1]);
  }

  public static void main(String[] args) throws CmdLineException {
    final String[] argv = { "-D", "key=value", "-f", "build.xml",
                            "-D", "key2=value2", "clean", "install" };
    final AntOptsArgs4j options = new AntOptsArgs4j();
    final CmdLineParser parser = new CmdLineParser(options);
    parser.parseArgument(argv);

    // print usage
    parser.setUsageWidth(Integer.MAX_VALUE);
    parser.printUsage(System.err);

    // check the options have been set correctly
    assertEquals("build.xml", options.buildFile.getName());
    assertEquals(2, options.targets.size());
    assertEquals(2, options.properties.size());
  }
}
Running this program prints:
 [target [target2 [target3] ...]] : targets
 -D <property>=<value>            : use value for given property
 -f (-file, -buildfile) <file>    : use given buildfile
 -h (-help)                       : print this message
 -lib <path>                      : specifies a path to search for jars and classes
 -nice number                     : A niceness value for the main thread:
                                    1 (lowest) to 10 (highest); 5 is the default
JCommander (v1.13)
Similarly, here is a class which demonstrates how to parse command line options for Ant using JCommander.
import static org.junit.Assert.*;
import java.io.File;
import java.util.*;
import com.beust.jcommander.*;

/**
 * Example of using JCommander for parsing
 * Ant command line options
 */
public class AntOptsJCmdr {

  @Parameter(description = "targets")
  private List<String> targets = new ArrayList<String>();

  @Parameter(names = { "-help", "-h" }, description = "print this message")
  private boolean help = false;

  @Parameter(names = { "-lib" },
             description = "specifies a path to search for jars and classes")
  private String lib;

  @Parameter(names = { "-buildfile", "-file", "-f" },
             description = "use given buildfile")
  private File buildFile;

  @Parameter(names = "-nice", description = "A niceness value for the main thread:\n"
        + "1 (lowest) to 10 (highest); 5 is the default")
  private int nice = 5;

  @Parameter(names = { "-D" }, description = "use value for given property")
  private List<String> properties = new ArrayList<String>();

  public static void main(String[] args) {
    final String[] argv = { "-D", "key=value", "-f", "build.xml",
                            "-D", "key2=value2", "clean", "install" };
    final AntOptsJCmdr options = new AntOptsJCmdr();
    final JCommander jcmdr = new JCommander(options, argv);

    // print usage
    jcmdr.setProgramName("ant");
    jcmdr.usage();

    // check the options have been set correctly
    assertEquals("build.xml", options.buildFile.getName());
    assertEquals(2, options.targets.size());
    assertEquals(2, options.properties.size());
  }
}
Running this program prints:
Usage: ant [options]
 targets
  Options:
    -D                      use value for given property
                            Default: [key=value, key2=value2]
    -buildfile, -file, -f   use given buildfile
    -help, -h               print this message
                            Default: false
    -lib                    specifies a path to search for jars and classes
    -nice                   A niceness value for the main thread:
1 (lowest) to
                            10 (highest); 5 is the default
                            Default: 5
Args4j vs JCommander
As you can see from the implementations above, both frameworks are very similar. There are a few differences though:
  1. JCommander does not have an equivalent to Args4j's metaVar, which allows you to display the value that an option takes. For example, if you have an option called "-f" which takes a file, you can set metaVar="<file>" and Args4j will display -f <file> when it prints the usage. This is not possible in JCommander, so it is difficult to see which options take values and which ones don't.

  2. JCommander's @Parameter annotation can only be applied to fields, not methods, which is slightly restrictive. In Args4j, you can put the annotation on a "setter" method, which allows you to tweak the value before it is set. In JCommander, you would have to create a custom converter.

  3. In the example above, JCommander was unable to place -D property=value options into a map. It was able to save them into a list and then you would have to do some post-processing to convert the elements in the list to key-value pairs in a map. On the other hand, Args4j was able to put the properties straight into the map by applying the annotation on a setter method.

  4. JCommander's usage output is not as pretty as Args4j's. In particular, the description of the "nice" option is not aligned correctly.
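
As a rough sketch of the post-processing mentioned in point 3, the class below turns a list of "key=value" strings (like the one JCommander fills for -D) into a map. PropertyArgs and toMap are names made up for this illustration; they are not part of JCommander or Args4j, and the code uses only the JDK.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of JCommander or Args4j) that converts
 * "key=value" strings, such as those collected by a -D option, into a map.
 */
class PropertyArgs {

  /** Converts each "key=value" element into a map entry, keeping insertion order. */
  static Map<String, String> toMap(List<String> properties) {
    Map<String, String> map = new LinkedHashMap<String, String>();
    for (String property : properties) {
      // Split on the first '=' only, so values may themselves contain '='.
      String[] arr = property.split("=", 2);
      if (arr.length != 2 || arr[0].isEmpty()) {
        throw new IllegalArgumentException(
            "Properties must be specified in the form: <property>=<value>, got: "
            + property);
      }
      map.put(arr[0], arr[1]);
    }
    return map;
  }

  public static void main(String[] args) {
    // The same values that the -D options carry in the examples above.
    Map<String, String> map = toMap(Arrays.asList("key=value", "key2=value2"));
    System.out.println(map); // prints {key=value, key2=value2}
  }
}
```

With JCommander you would call something like this after parsing, passing in the properties list. With Args4j the setter method already does the equivalent work during parsing.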

Based purely on this example, the winner is Args4j. However, note that there are other features present in JCommander which are not available in Args4j, such as parameter validation and password-type parameters. Please read the documentation to find out which one is better suited to your needs. But one thing is quite clear: annotation-based command line parsing is the way forward!

Saturday, December 24, 2011

Guava Cache

Google's Guava Cache is a lightweight, thread-safe Java cache that provides some nice features such as:
  • eviction of least recently used entries when a maximum size is breached
  • eviction of entries based on time since last access or last write
  • notification of evicted entries
  • performance statistics e.g. hit and miss counts
In order to create the Cache, you need to use a CacheBuilder. This allows you to specify the eviction policy and other features such as concurrency level, soft or weak values etc. You also need to specify a CacheLoader, which the cache invokes automatically to load the value whenever a requested key is not present.

The following code demonstrates how to create a cache:

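// Create the cache. Only allow a max of 10 entries.
// Old entries will be evicted.
// Note: in newer Guava releases, build(CacheLoader) returns a LoadingCache,
// and stats() only reports counts if recordStats() was called on the builder.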
final Cache<String, String> cache = CacheBuilder.newBuilder()
    .maximumSize(10)
    .removalListener(new RemovalListener<String, String>() {
        @Override
        public void onRemoval(RemovalNotification<String, String> n) {
            System.out.println("REMOVED: " + n);
        }
    })
    .build(new CacheLoader<String, String>() {
        @Override
        public String load(String key) throws Exception {
            System.out.println("LOADING: " + key);
            return key + "-VALUE";
        }
    });

// Get values from the cache.
// If a key does not exist, it will be loaded.
for (int i = 0; i < 10; i++) {
  System.out.println(cache.get("Key" + i));
}
for (int i = 9; i >= 0; i--) {
  System.out.println(cache.get("Key" + i));
}

// Print out the cache statistics.
System.out.println(cache.stats());
The output of this program is:
LOADING: Key0
LOADING: Key1
LOADING: Key2
LOADING: Key3
LOADING: Key4
LOADING: Key5
LOADING: Key6
REMOVED: Key0=Key0-VALUE
LOADING: Key7
REMOVED: Key3=Key3-VALUE
LOADING: Key8
LOADING: Key9
LOADING: Key3
REMOVED: Key7=Key7-VALUE
LOADING: Key0
REMOVED: Key6=Key6-VALUE
CacheStats{hitCount=8, missCount=12, loadSuccessCount=12, loadExceptionCount=0, totalLoadTime=563806, evictionCount=4}
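It is important to note that entries were evicted BEFORE the maximum size of 10 was reached; the CacheBuilder documentation warns that the cache may evict an entry before the limit is exceeded. In this case, an entry was removed when the cache had 7 entries in it.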

The cache stats show that out of 20 calls to the cache, 12 missed and had to be loaded. However, 8 were successfully retrieved from the cache. 4 entries were evicted.
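
For comparison, the sketch below is a strict, single-threaded LRU cache built on the JDK's LinkedHashMap. It is not Guava's implementation, and StrictLruCache, runExample and the counter fields are names made up for this illustration. Run against the same 20 lookups, it only evicts once the size limit is actually exceeded, so every key requested in the second loop is still cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, single-threaded LRU cache built on LinkedHashMap.
 * This is NOT how Guava implements its cache; it only illustrates
 * size-based eviction and hit/miss counting.
 */
class StrictLruCache {
  int hits, misses, evictions;
  private final Map<String, String> map;

  StrictLruCache(final int maxSize) {
    // accessOrder = true keeps entries ordered from least to most recently used
    this.map = new LinkedHashMap<String, String>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        // Called after each put: evict only once the limit is exceeded.
        boolean evict = size() > maxSize;
        if (evict) {
          evictions++;
        }
        return evict;
      }
    };
  }

  /** Returns the cached value, "loading" it on a miss. */
  String get(String key) {
    String value = map.get(key);
    if (value != null) {
      hits++;
      return value;
    }
    misses++;
    value = key + "-VALUE"; // stands in for the CacheLoader
    map.put(key, value);
    return value;
  }

  /** Repeats the 20 lookups from the Guava example against a limit of 10. */
  static StrictLruCache runExample() {
    StrictLruCache cache = new StrictLruCache(10);
    for (int i = 0; i < 10; i++) {
      cache.get("Key" + i);
    }
    for (int i = 9; i >= 0; i--) {
      cache.get("Key" + i);
    }
    return cache;
  }

  public static void main(String[] args) {
    StrictLruCache c = runExample();
    // prints hits=10, misses=10, evictions=0
    System.out.println("hits=" + c.hits + ", misses=" + c.misses
        + ", evictions=" + c.evictions);
  }
}
```

The difference from the Guava output above (8 hits, 12 misses, 4 evictions) comes purely from Guava evicting early; the trade-off is that this sketch is not thread-safe.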

Saturday, December 03, 2011

Using XStream to Map a Single Element

Let's say you have the following XML which has a single element containing an attribute and some text:
<error code="99">This is an error message</error>
and you would like to convert it, using XStream, into an Error object:
public class Error {
    String message;
    int code;

    public String getMessage() {
        return message;
    }

    public int getCode() {
        return code;
    }
}
It took me a while to figure this out. It was easy getting the code attribute set in the Error object, but it just wasn't picking up the message.

Eventually, I found the ToAttributedValueConverter class which "supports the definition of one field member that will be written as value and all other field members are written as attributes."

The following code shows how the ToAttributedValueConverter is used. You specify which instance variable maps to the value of the XML element (in this case message). All other instance variables automatically map to attributes, so you don't need to explicitly annotate them with XStreamAsAttribute.

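import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.annotations.XStreamAlias;
import com.thoughtworks.xstream.annotations.XStreamConverter;
import com.thoughtworks.xstream.converters.extended.ToAttributedValueConverter;

@XStreamAlias("error")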
@XStreamConverter(value=ToAttributedValueConverter.class, strings={"message"})
public class Error {

  String message;

  @XStreamAlias("code")
  int code;

  public String getMessage() {
      return message;
  }

  public int getCode() {
      return code;
  }

  public static void main(String[] args) {
      XStream xStream = new XStream();
      xStream.processAnnotations(Error.class);

      String xmlResponse="<error code=\"99\">This is an error message</error>";

      Error error = (Error)xStream.fromXML(xmlResponse);
      System.out.println(error.getCode());
      System.out.println(error.getMessage());
  }
}
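
To make the attribute/value distinction concrete without XStream, here is a small sketch that reads the same element with the JDK's built-in DOM parser. It is not a replacement for the XStream mapping above; ErrorElementDemo and parse are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical demo: reads the attribute and the text of a single element. */
class ErrorElementDemo {

  /** Returns { code attribute, element text } for a single XML element. */
  static String[] parse(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
      // The attribute and the text content are separate parts of the element,
      // which is why they need separate mappings (code vs. message).
      return new String[] { root.getAttribute("code"), root.getTextContent() };
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse: " + xml, e);
    }
  }

  public static void main(String[] args) {
    String[] parts = parse("<error code=\"99\">This is an error message</error>");
    System.out.println(parts[0]); // prints 99
    System.out.println(parts[1]); // prints This is an error message
  }
}
```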