Rational Java: jmh

Recently I was asked this question - Is it bad for performance to use the + operator to concatenate Strings in Java?

This got me thinking about the different ways in Java to concatenate Strings and how they would all perform against each other. These are the methods I'm going to investigate:

Using the + operator
Using a StringBuilder
Using a StringBuffer
Using String.concat()
Using String.join() (new in Java 8)

I also experimented with String.format() but that is so hideously slow that I will leave it out of this post for now.
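For reference, here is a minimal sketch of the five approaches applied to two Strings. The class and method names are my own, chosen for illustration, and are not taken from the benchmark code.

```java
final class ConcatMethods {
    // 1. The + operator.
    static String plus(String a, String b) {
        return a + b;
    }

    // 2. StringBuilder (not synchronized).
    static String builder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    // 3. StringBuffer (same API as StringBuilder, but synchronized).
    static String buffer(String a, String b) {
        return new StringBuffer().append(a).append(b).toString();
    }

    // 4. String.concat() (throws NullPointerException if b is null).
    static String concat(String a, String b) {
        return a.concat(b);
    }

    // 5. String.join() with an empty delimiter, so it acts as plain concatenation.
    static String join(String a, String b) {
        return String.join("", a, b);
    }

    public static void main(String[] args) {
        // Every line prints "foobar".
        System.out.println(plus("foo", "bar"));
        System.out.println(builder("foo", "bar"));
        System.out.println(buffer("foo", "bar"));
        System.out.println(concat("foo", "bar"));
        System.out.println(join("foo", "bar"));
    }
}
```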

Before we go any further we should separate two use cases:

Concatenating two Strings together as a single call, for example in a logging message. Because this is only one call you would have thought that performance is hardly an issue but the results are still interesting and shed light on the subject.
Concatenating two Strings in a loop. Here performance is much more of an issue especially if your loops are large.

My initial thoughts and questions were as follows:

The + operator is implemented with StringBuilder, so at least in the case of concatenating two Strings it should produce similar results to StringBuilder. What exactly is going on under the covers?
StringBuilder should be the most efficient method, after all the class was designed for the very purpose of concatenating Strings and supersedes StringBuffer. But what is the overhead of creating the StringBuilder when compared with String.concat()?
StringBuffer was the original class for concatenating Strings - unfortunately its methods are synchronized. There really is no need for the synchronization and it was subsequently replaced by StringBuilder which is not synchronized. The question is, does the JIT optimise away the synchronisation?
String.concat() ought to work well for 2 strings but does it work well in a loop?
String.join() has more functionality than StringBuilder, how does it affect performance if we instruct it to join Strings using an empty delimiter?

The first question I wanted to get out of the way was how the + operator works. I'd always understood that it used a StringBuilder under the covers but to prove this we need to examine the byte code.

The easiest way to look at byte code these days is with JITWatch which is a really excellent tool created to understand how your code is compiled by the JIT. It has a great view where you can view your source code side by side with byte code (also machine code if you want to go to that level).

Here's the byte code for a really simple method plus2() and we can see that indeed on line 6 a StringBuilder is created and appends the variables a (line 14) and b (line 18).

I thought it would be interesting to compare this against a handcrafted use of the StringBuilder so I created another method build2() with results below.

The byte code generated here is not quite as compact as for the plus2() method. The StringBuilder is stored into the variable cache (line 13) rather than just left on the stack. I'm not sure why this should be but the JIT might be able to do something with this, we'll have to see how the timings look. In any case it would be very surprising if the results of concatenating 2 strings with the plus operator and the StringBuilder were significantly different.
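The source for the two methods is not reproduced here, so the following is only a guess at their shape: plus2() as a single + expression and build2() holding its builder in a local variable. That local variable is one plausible reason for the extra store in the byte code. Note too that this byte code is what a Java 8 compiler emits; from JDK 9 onwards javac compiles + via invokedynamic instead, so the output will look different.

```java
final class PlusVersusBuilder {
    // Assumed shape of plus2(): a single expression using the + operator.
    static String plus2(String a, String b) {
        return a + b;
    }

    // Assumed shape of build2(): the builder is assigned to a local variable,
    // which the compiler has to store and reload between calls.
    static String build2(String a, String b) {
        StringBuilder sb = new StringBuilder();
        sb.append(a);
        sb.append(b);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both calls produce the same String.
        System.out.println(plus2("left", "right").equals(build2("left", "right"))); // prints true
    }
}
```

Running javap -c on the compiled class is another way to see the byte code for each method if you don't have JITWatch to hand.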

I wrote a small JMH test to determine how the different methods performed. Let's first look at the two Strings test. See code below:

The results look like this:

Benchmark	Score	Score Error (99.9%)	Unit
testPlus	15750720.32	957703.6388	ops/s
testStringBuffer	14545063.2	812623.9396	ops/s
testStringBuilder	15671930.21	436265.5796	ops/s
testStringConcat	24124041.47	2498000.326	ops/s
testStringJoiner	10749530.45	388130.9845	ops/s

The clear winner here is String.concat(). That is not really surprising, as it doesn't have to pay the performance penalty of creating a StringBuilder / StringBuffer for each call. It does, though, have to create a new String each time (which will be significant later), but for the very simple case of joining two Strings it is faster.

Another point is that as we expected plus and StringBuilder are equivalent despite the extra byte code produced. StringBuffer is only marginally slower than StringBuilder which is interesting and shows that the JIT must be doing some magic to optimise away the synchronisation.

The next test creates an array of 100 Strings with 10 characters each. The benchmark compares how long it takes for the different methods to concatenate the 100 Strings together. See code below:
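The benchmark listing is not reproduced here, so as a stand-in this is a sketch of the kind of method bodies such a test might time, with the JMH annotations left out. The class name, method names and input contents are assumptions for illustration only.

```java
import java.util.Arrays;

final class LoopConcatMethods {
    // 100 Strings of 10 characters each, matching the test described above.
    static String[] makeInput() {
        String[] parts = new String[100];
        Arrays.fill(parts, "0123456789");
        return parts;
    }

    static String plus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part; // a fresh String on every iteration
        }
        return result;
    }

    static String builder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    static String buffer(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    static String concat(String[] parts) {
        String result = "";
        for (String part : parts) {
            result = result.concat(part); // also a fresh String each time
        }
        return result;
    }

    static String join(String[] parts) {
        // Empty delimiter: joins all elements with nothing in between.
        return String.join("", parts);
    }

    public static void main(String[] args) {
        String[] parts = makeInput();
        // Each result is 100 * 10 = 1000 characters long.
        System.out.println(plus(parts).length());
        System.out.println(builder(parts).length());
        System.out.println(buffer(parts).length());
        System.out.println(concat(parts).length());
        System.out.println(join(parts).length());
    }
}
```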

The results look quite different this time:

Benchmark	Score	Score Error (99.9%)	Unit
testPlus	82297.2646	1496.838588	ops/s
testStringBuffer	501613.3375	14461.60235	ops/s
testStringBuilder	507697.9058	9510.921128	ops/s
testStringConcat	403378.159	17458.6318	ops/s
testStringJoiner	381805.4569	6572.704663	ops/s

Here the plus method really suffers. The overhead of creating a StringBuilder every time you go round the loop is crippling. You can see this clearly in the byte code:

You can see that a new StringBuilder is created (line 30) every time the loop is executed. It is arguable that the JIT ought to spot this and be able to optimise, but it doesn't and using + becomes very slow.
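To make the cost concrete, this sketch writes out by hand roughly what a Java 8 compiler generates for += inside a loop. The names are hypothetical; the point is that the second method does the same work as the first, allocating a new StringBuilder and a new String on every pass.

```java
final class PlusLoopDesugared {
    // What you write.
    static String plusLoop(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // Roughly what the compiler turns it into on Java 8: a new builder per
    // iteration, which copies everything accumulated so far each time round.
    static String plusLoopExpanded(String[] parts) {
        String result = "";
        for (String part : parts) {
            result = new StringBuilder().append(result).append(part).toString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] parts = {"ab", "cd", "ef"};
        System.out.println(plusLoop(parts));         // prints abcdef
        System.out.println(plusLoopExpanded(parts)); // prints abcdef
    }
}
```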

Again StringBuilder and StringBuffer perform the same to within the margin of error, but this time they are both faster than String.concat(). The price that String.concat() pays for creating a new String on each iteration of the loop eventually mounts up and a StringBuilder becomes more efficient.

String.join() does pretty well given all the extra functionality you can add to this method but, as expected, for pure concatenation it is not the best option.

Summary

If you are concatenating Strings in a single line of code I would use the + operator as it is the most readable and performance really doesn't matter that much for a single call. Also beware of String.concat() as you will almost certainly need to carry out a null check which is not necessary with the other methods.
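The null check point is easy to demonstrate. In this small sketch (names are mine, for illustration) the + operator turns a null reference into the text "null", while String.concat() throws.

```java
final class NullConcat {
    static String plus(String a, String b) {
        return a + b;
    }

    static String concat(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        String missing = null;

        // The + operator converts null to the four characters "null".
        System.out.println(plus("value=", missing)); // prints value=null

        // String.concat() dereferences its argument, so null blows up.
        try {
            concat("value=", missing);
        } catch (NullPointerException e) {
            System.out.println("concat threw NullPointerException");
        }
    }
}
```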

When you are concatenating Strings in a loop you should use a StringBuilder. You could use a StringBuffer but I wouldn't necessarily trust the JIT in all circumstances to optimise away the synchronization as efficiently as it would in a benchmark.

All my results were achieved using JMH and they come with the usual health warning.

Health Warning!
This post describes how to set up and run a simple JMH benchmark. Micro benchmarks are notoriously difficult to get right and even when you do get them right (by using tools such as JMH) they can still be misleading. Just because your code runs in a certain way in an extremely isolated artificial situation does not mean it will run in the same way inside your production code. To name but a few issues: in a real program the CPU caches will be subject to pressures from other parts of your code, any object creation will have a downstream effect on GC, and the JIT may have inlined and compiled code from other parts of your code that conflicts with the code you have benchmarked. Nevertheless micro benchmarks do have their place and if you are going to do them you might as well do them properly with JMH.

In a recent post I was asked to execute my tests as a JMH performance benchmark.

JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM. See full documentation here.

Amongst other things, JMH is great because it takes care of warm up iterations, forking JVM processes so that benchmarks don't interfere with each other, collating results and presenting them in a uniform manner. And there's much much more.

I'd heard a lot about JMH and seen many JMH results but never actually run one myself. It was surprisingly easy! This is how I did it.

There are two ways to run a benchmark:

Add the JMH Maven dependencies to your pom file and then add a main method to your code using the Runner object. This is useful if you want to run in your IDE. There is some discussion about whether running inside your IDE makes a difference to the results; see here for more details. The reported difference is 2.2%.
The recommended way is to generate a pom file and use that to create a jar. The mvn install uses the shade plugin to create an executable jar file, so you don't have to create a main method.

Method 1 - For running in your IDE

Add these dependencies to your Maven pom.xml file:

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.5.1</version>
</dependency>

Then decide which methods you want benchmarked and add the annotation @Benchmark to them. If you need any initialisation code add it in a method which should be marked @Setup.

The easiest way to run the benchmark is by adding this implementation to your main method. (See here for other ways to run your tests).

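import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Goes inside your benchmark class (MyBenchmark here).
public static void main(String[] args) throws RunnerException {
    // Run only the benchmarks whose names match MyBenchmark, in one forked JVM.
    Options opt = new OptionsBuilder()
            .include(MyBenchmark.class.getSimpleName())
            .forks(1)
            .build();

    new Runner(opt).run();
}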

Alternatively, even more simply use this code:

// Delegates to JMH's own command line entry point, org.openjdk.jmh.Main.
public static void main(String[] args) throws Exception {
    org.openjdk.jmh.Main.main(args);
}

Then just run as you would any normal program and you will get all the JMH goodness!

Method 2 - The recommended way

Generate a pom file using this mvn command.

$ mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-java-benchmark-archetype \
          -DgroupId=org.sample \
          -DartifactId=test \
          -Dversion=1.0

This will create a project called test with an empty benchmark in it called MyBenchmark.
To build the project just use mvn clean install. This will build a jar called benchmark.jar. To run the benchmark, run benchmark.jar itself, not any of the other jars that the build leaves in your target folder.

To run use the command java -jar benchmark.jar - that's it.

As an example of the format of JMH output, this is what my results looked like:

Benchmark                                        Mode  Cnt     Score    Error  Units
CompTestBenchmark.bmCustomComparator            thrpt   20  2598.617 ± 67.634  ops/s
CompTestBenchmark.bmJDKComparator               thrpt   20   751.110 ± 14.835  ops/s
CompTestBenchmark.bmNoVTLComparator             thrpt   20  1348.750 ± 30.382  ops/s
CompTestBenchmark.bmNoVTLOrAutoBoxingComparator thrpt   20  2202.995 ± 43.673  ops/s

There are an enormous number of bells and whistles to fine tune your benchmarks which I'm not going into here but hopefully this will get you up and running.

For a full code listing of my test see here.

To see all the options you have available use the command:

java -jar benchmark.jar -h

One of the most useful options is to run the benchmarks with a profiler. To list the profilers available on your system (for example, perf is only available on Unix) use the command:

java -jar benchmark.jar -lprof

To run the test with a simple stack profiler (available on all systems) use

java -jar benchmark.jar -prof stack

There are loads of options including annotations to switch off inlining, to vary iterations and many more. I encourage you to have a look at them - here's some documentation to start with.

Tuesday, 17 February 2015

The Optimum Method to Concatenate Strings in Java

Monday, 2 February 2015

JMH: How to setup and run a JMH benchmark