This is the first of a series of posts that shows how certain language features and compiler optimisations are implemented on the JVM; on bytecode level and the Assembly emitted by the JIT. There are many blogposts/articles/presentation telling that these optimisations are done, but don't explain what is happening under the hood. So if you don't like to get your hands dirty, please move along.
I have been inspired to start blogging again by people like Gil Tene, Martin Thompson, Nits Wakart,
Peter Lawrey, Cliff Click and many others. They provide so much insights of low level performance issues to the community, that I want to contribute as well.
Java assert
Java asserts have been added to Java 1.4 and are an easy way to check preconditions, postconditions
and invariants. A nice feature of asserts is that they can easily be enabled/disabled either globally
using `-da` but can also be disabled on package or class level. For more information see the following
page.
Lets have a look at the following example:
public class Assert {
public static void main(String[] args) {
long l = 0;
for (int k = 0; k < 100_000; k++) {
l += twice(k);
}
System.out.println(l);
}
public static int twice(int a) {
assert a != 0 : "a can't be 0";
return a * 2;
}
}
We have a loop that does a simple calculation. We also store the result and print it so that the JIT doesn't apply dead code elimination. In the 'twice' method there is a simple assert that checks if 'a' is not 0. The assert condition isn't very important, its primary function is to give easy to understand Assembly.
If we compile the code and output the content of the class it using 'javap -c Assert.class', we get the following bytecode:
public class com.asserts.Assert {
static final boolean $assertionsDisabled;
public com.asserts.Assert();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."":()V
4: return
public static void main(java.lang.String[]);
Code:
0: lconst_0
1: lstore_1
2: iconst_0
3: istore_3
4: iload_3
5: ldc #2 // int 100000
7: if_icmpge 24
10: lload_1
11: iload_3
12: invokestatic #3 // Method twice:(I)I
15: i2l
16: ladd
17: lstore_1
18: iinc 3, 1
21: goto 4
24: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
27: lload_1
28: invokevirtual #5 // Method java/io/PrintStream.println:(J)V
31: return
public static int twice(int);
Code:
0: getstatic #6 // Field $assertionsDisabled:Z
3: ifne 20
6: iload_0
7: ifeq 20
10: new #7 // class java/lang/AssertionError
13: dup
14: ldc #8 // String a can't be 0
16: invokespecial #9 // Method java/lang/AssertionError."":(Ljava/lang/Object;)V
19: athrow
20: iload_0
21: iconst_2
22: imul
23: ireturn
static {};
Code:
0: ldc #10 // class com/asserts/Assert
2: invokevirtual #11 // Method java/lang/Class.desiredAssertionStatus:()Z
5: ifne 12
8: iconst_1
9: goto 13
12: iconst_0
13: putstatic #6 // Field $assertionsDisabled:Z
16: return
}
It is interesting to see how the assert has been implemented. A new synthetic static final field `$assertionsDisabled` has been added to the class which gets initialized when the class is loaded. See the 'static {}' for more detail.
The first instruction of the `twice` method loads this static field and then there is a check if this field is set. If `$assertionsDisabledis` false, it continues with the assert statement on #6. If `$assertionsDisabled` is true (so assert disabled), then there is a jump to the actual logic on #20.
If we convert the 'twice' bytecode back to Java, we would get:
public class Assert {
static final boolean $assertionsDisabled;
public static int twice(int a) {
if(!$assertionsDisabled){
if(a == 0) {
throw new AssertError("a can't be 0")
}
}
return a * 2;
}
}
So the assert doesn't require any special new bytecode instructions. It is translated to a few simple checks.
Assert disabled
As a performance engineer, I'm always worried if we need to pay the price for something that isn't used. In this case, on the bytecode level, there is one extra branch to deal with when assertions are disabled and 2 branches when assertions are enabled. Till so far I'm not terribly excited since branching and high performance code, is not really a good mixture.
So lets see if the JIT can optimize it if assertions are disabled. For this I'm using the following commands:
-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:CompileCommand=print,*Assert.twice
The TieredCompilation is disabled so we only get the C2 Assembly output. Inlining is disabled so that the 'twice' method doesn't get inlined in the main loop.
This provides us the following output:
Compiled method (c2) 190 8 com.asserts.Assert::twice (24 bytes)
total in heap [0x0000000101d7f310,0x0000000101d7f4e0] = 464
relocation [0x0000000101d7f430,0x0000000101d7f438] = 8
main code [0x0000000101d7f440,0x0000000101d7f460] = 32
stub code [0x0000000101d7f460,0x0000000101d7f478] = 24
oops [0x0000000101d7f478,0x0000000101d7f480] = 8
metadata [0x0000000101d7f480,0x0000000101d7f488] = 8
scopes data [0x0000000101d7f488,0x0000000101d7f498] = 16
scopes pcs [0x0000000101d7f498,0x0000000101d7f4d8] = 64
dependencies [0x0000000101d7f4d8,0x0000000101d7f4e0] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000101d7f310:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x00000001ca2ba4f8} 'twice' '(I)I' in 'com/asserts/Assert'
# parm0: rsi = int
# [sp+0x20] (sp of caller)
0x0000000101d7f440: sub rsp,0x18
0x0000000101d7f447: mov QWORD PTR [rsp+0x10],rbp ;*synchronization entry
; - com.asserts.Assert::twice@-1 (line 14)
0x0000000101d7f44c: mov eax,esi
0x0000000101d7f44e: shl eax,1 ;*imul
; - com.asserts.Assert::twice@22 (line 15)
0x0000000101d7f450: add rsp,0x10
0x0000000101d7f454: pop rbp
0x0000000101d7f455: test DWORD PTR [rip+0xfffffffffe87fba5],eax # 0x00000001005ff000
; {poll_return}
0x0000000101d7f45b: ret
0x0000000101d7f45c: hlt
0x0000000101d7f45d: hlt
0x0000000101d7f45e: hlt
0x0000000101d7f45f: hlt
[Exception Handler]
[Stub Code]
0x0000000101d7f460: jmp 0x0000000101d70f60 ; {no_reloc}
[Deopt Handler Code]
0x0000000101d7f465: call 0x0000000101d7f46a
0x0000000101d7f46a: sub QWORD PTR [rsp],0x5
0x0000000101d7f46f: jmp 0x0000000101d4bd00 ; {runtime_call}
0x0000000101d7f474: hlt
0x0000000101d7f475: hlt
0x0000000101d7f476: hlt
0x0000000101d7f477: hlt
OopMapSet contains 0 OopMaps
There is quite a lot of output. If we only focus on the logic of the method and strip all ceremony,
the following instructions remain:
0x0000000101d7f44c: mov eax,esi
0x0000000101d7f44e: shl eax,1 ;*imul
; - com.asserts.Assert::twice@22 (line 15)
The 'a' argument (stored in register esi) is copied into the eax register. Then we multiply the eax register by shifting the bits one to the left and store the result back into eax register. The eax register will be used to access the return value.
Replacing the multiply by a shift left is a peephole optimization since bitshifting is faster than doing a multiply.
The conclusion we can make is that if assert is disabled, the JIT completely removes the '$assertionsDisabled' check. That is good news; I hate to pay the price for something not used.
Assert enabled
What if assertions are enabled? For this we run with the following settings
-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:-BackgroundCompilation
-XX:CompileCommand=print,*Assert.twice
-ea
And we get the following Assembly
Compiled method (c2) 168 8 com.asserts.Assert::twice (24 bytes)
total in heap [0x0000000109e7c310,0x0000000109e7c548] = 568
relocation [0x0000000109e7c430,0x0000000109e7c440] = 16
main code [0x0000000109e7c440,0x0000000109e7c480] = 64
stub code [0x0000000109e7c480,0x0000000109e7c498] = 24
oops [0x0000000109e7c498,0x0000000109e7c4a0] = 8
metadata [0x0000000109e7c4a0,0x0000000109e7c4a8] = 8
scopes data [0x0000000109e7c4a8,0x0000000109e7c4d0] = 40
scopes pcs [0x0000000109e7c4d0,0x0000000109e7c540] = 112
dependencies [0x0000000109e7c540,0x0000000109e7c548] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000109e7c310:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000000011217a4f8} 'twice' '(I)I' in 'com/asserts/Assert'
# parm0: rsi = int
# [sp+0x20] (sp of caller)
0x0000000109e7c440: mov DWORD PTR [rsp-0x14000],eax
0x0000000109e7c447: push rbp
0x0000000109e7c448: sub rsp,0x10 ;*synchronization entry
; - com.asserts.Assert::twice@-1 (line 14)
0x0000000109e7c44c: test esi,esi
0x0000000109e7c44e: je 0x0000000109e7c460 ;*ifne
; - com.asserts.Assert::twice@7 (line 14)
0x0000000109e7c450: mov eax,esi
0x0000000109e7c452: shl eax,1 ;*imul
; - com.asserts.Assert::twice@22 (line 15)
0x0000000109e7c454: add rsp,0x10
0x0000000109e7c458: pop rbp
0x0000000109e7c459: test DWORD PTR [rip+0xffffffffff6ecba1],eax # 0x0000000109569000
; {poll_return}
0x0000000109e7c45f: ret
0x0000000109e7c460: mov esi,0x7
0x0000000109e7c465: xchg ax,ax
0x0000000109e7c467: call 0x0000000109e47120 ; OopMap{off=44}
;*new ; - com.asserts.Assert::twice@10 (line 14)
; {runtime_call}
0x0000000109e7c46c: call 0x000000010898fe44 ;*new
; - com.asserts.Assert::twice@10 (line 14)
; {runtime_call}
0x0000000109e7c471: hlt
0x0000000109e7c472: hlt
0x0000000109e7c473: hlt
0x0000000109e7c474: hlt
0x0000000109e7c475: hlt
0x0000000109e7c476: hlt
0x0000000109e7c477: hlt
0x0000000109e7c478: hlt
0x0000000109e7c479: hlt
0x0000000109e7c47a: hlt
0x0000000109e7c47b: hlt
0x0000000109e7c47c: hlt
0x0000000109e7c47d: hlt
0x0000000109e7c47e: hlt
0x0000000109e7c47f: hlt
[Exception Handler]
[Stub Code]
0x0000000109e7c480: jmp 0x0000000109e6bf60 ; {no_reloc}
[Deopt Handler Code]
0x0000000109e7c485: call 0x0000000109e7c48a
0x0000000109e7c48a: sub QWORD PTR [rsp],0x5
0x0000000109e7c48f: jmp 0x0000000109e46d00 ; {runtime_call}
0x0000000109e7c494: hlt
0x0000000109e7c495: hlt
0x0000000109e7c496: hlt
0x0000000109e7c497: hlt
OopMapSet contains 1 OopMaps
#0
OopMap{off=44}
Lets remove a lot of the clutter:
0x0000000109e7c44c: test esi,esi
0x0000000109e7c44e: je 0x0000000109e7c460 ;*ifne
; - com.asserts.Assert::twice@7 (line 14)
0x0000000109e7c450: mov eax,esi
0x0000000109e7c452: shl eax,1 ;*imul
; - com.asserts.Assert::twice@22 (line 15)
0x0000000109e7c454: add rsp,0x10
0x0000000109e7c458: pop rbp
0x0000000109e7c459: test DWORD PTR [rip+0xffffffffff6ecba1],eax # 0x0000000109569000
; {poll_return}
0x0000000109e7c45f: ret
0x0000000109e7c460: mov esi,0x7
0x0000000109e7c465: xchg ax,ax
0x0000000109e7c467: call 0x0000000109e47120 ; OopMap{off=44}
;*new ; - com.asserts.Assert::twice@10 (line 14)
; {runtime_call}
0x0000000109e7c46c: call 0x000000010898fe44 ;*new
; - com.asserts.Assert::twice@10 (line 14)
; {runtime_call}
This one is a bit more complicated. Lets start at `0x0000000109e7c44c` where a 'test esi,esi' is done. 'test esi,esi' checks if 'a' is 0 and updates the 'ZF' (Zero Flag) in the flags register. If ZF is 0, then we continue with the regular logic; we move 'a' in the 'eax' register and multiply it by 2 and then the procedure exit ceremony is executed. This is no different then when assert is disabled.
If ZF is 1 (so a==0), then we jump to `0x0000000109e7c460`. We are totally bypassing the regular logic and even fail to execute the exit ceremony for procedure. Everything we have done so far is frozen and then we shoot into space. The mechanism we are seeing is called the uncommon trap and required for speculative optimizations. In this case, we have never called `twice` with 0, so for the JIT there was no reason to emit the 'throw new AssertError' code. As soon as I figured out the handling of this uncommon trap, I'll create a blogpost about it.
If we would translate this back to Java, we would get something like:
public class Assert {
public static int twice(int a) {
if(a == 0) {
uncommonTrap();
}
return a * 2;
}
}
If 'twice' would be called with 0, the uncommon trap gets executed, the code deoptimized and the interpreter will continue execution with the 'throw new AssertError' in place.
The conclusion we can make is: if assert is enabled, then the '$assertionsDisabled' check is removed.
Size matters
Till so far we have seen that the JIT completely removed all assert logic if assert is disabled. So it looks like assert is completely free if disabled. Unfortunately that isn't the case. The extra instructions inserted in the bytecode, can prevent inlining because inline limits are determined on the size of the bytecode.
Lets proof this by inflating the assert logic with some bogus additional conditions:
public class AssertFat {
public static void main(String[] args) {
long l = 0;
for (int k = 1; k < 100_000; k++) {
l += twice(k);
}
System.out.println(l);
}
public static int twice(int a) {
assert a != 0 && a != -1 && a != -2 && a != -3 : "bad a";
return a * 2;
}
}
We have inflated the size of the assert artificially by adding some additional bogus checks.
If we would set the FreqInlineSize to 50, the method will not get inlined even when asserts are disabled:
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-XX:FreqInlineSize=50
-da
In the logging we'll see the following in the inlining output:
@ 12 com.asserts.AssertFat::twice (53 bytes) callee is too large
This means that the `twice` method, 53 bytes, was too fat to get inlined since the maximum size for a frequently called method was set to 50 bytes.
To proof that the twice method would have been inlined without assert, lets remove the assert and run again:
@ 12 com.asserts.AssertFat::twice (4 bytes)
Now the method does get inlined.
This means that even though the JIT is able to completely remove the assert code if assert is disabled, it can still prevent inlining. Therefor the assert feature isn't completely free. It doesn't mean that you should remove assert from your code; but it is good to keep in the back of your mind.