Saturday, July 22, 2017

Java: Splitting a Pipe-delimited String - Fast!

This post shows how you can efficiently split a pipe-delimited string e.g. "foo|bar|baz". There are many ways to do this - I could even write my own - but I will only use those that are available in the JDK (or commonly used libraries) and will measure the performance of each.

Remember that, since the pipe symbol (|) is a special character in regular expressions, it needs to be escaped if necessary.

1. String.split

The most obvious way to split a string on the pipe character is to use Java's String.split:

public static String[] split(String s) {
  return s.split("\\|");
}

2. String.split with Pattern.quote

Instead of escaping the pipe ourselves, we can use Pattern.quote to do it for us. (Note: Pattern.quote("|") returns "\Q|\E".)

public static String[] splitWithPatternQuote(String s) {
  return s.split(Pattern.quote("|"));
}

3. Pattern.split

Create a static Pattern and use it to split the string.

private static final Pattern SPLITTER = Pattern.compile("\\|");

public static String[] splitWithPattern(String s) {
  return SPLITTER.split(s);
}

4. StringUtils.split

Apache Commons provides StringUtils.split, which splits a string on a single character:

import org.apache.commons.lang3.StringUtils;

public static String[] splitWithStringUtils(String s) {
  return StringUtils.split(s, '|');
}

So, which one is fastest?

I ran each method on 1 million pipe-delimited strings of different lengths - RandomStringUtils.randomAlphabetic is great for generating random strings - and the table below shows how long each one took:

MethodTime (ms)
split485
splitWithStringUtils520
splitWithPattern643
splitWithPatternQuote936

An interesting observation is that splitWithPatternQuote is so much slower than split, even though they both call String.split internally! If we delve into the source code for String.split, we can see that there is an optimisation (a "fastpath") if the provided regex has two-chars and the first char is a backslash. This applies to "\\|" but, since Pattern.quote produces \Q|\E, it does not use the fastpath and instead creates a new Pattern object for every split. This also explains why it is slower than splitWithPattern, which re-uses the same Pattern object.

5 comments:

  1. Excellent post, now a day’s huge demand for the certified java professionals in IT industry. Java gives more career opportunity for the fresher’s as well as experienced experts.
    Regards,
    JAVA Training in Chennai|JAVA Course in Chennai

    ReplyDelete
  2. Appreciation for really being thoughtful and also for deciding on certain marvelous guides most people really want to be aware of.
    <a href="http://www.traininginmarathahalli.in/core-java-training-in-bangalore/”> Java Training in Marathahalli </a>|

    ReplyDelete
  3. Thanks for one marvelous posting! I enjoyed reading it; you are a great author. I will make sure to bookmark your blog and may come back someday. I want to encourage that you continue your great posts, have a nice weekend!


    Java Training in Chennai


    Java Training in Bangalore


    Java Training in Bangalore

    ReplyDelete