zaterdag 21 januari 2017

Assert & Bytecode & Assembly

This is the first of a series of posts that shows how certain language features and compiler optimisations are implemented on the JVM; on bytecode level and the Assembly emitted by the JIT. There are many blogposts/articles/presentation telling that these optimisations are done, but don't explain what is happening under the hood. So if you don't like to get your hands dirty, please move along.

I have been inspired to start blogging again by people like Gil Tene, Martin Thompson, Nits Wakart, Peter Lawrey, Cliff Click and many others. They provide so much insights of low level performance issues to the community, that I want to contribute as well.

Java assert

Java asserts have been added to Java 1.4 and are an easy way to check preconditions, postconditions and invariants. A nice feature of asserts is that they can easily be enabled/disabled either globally using `-da` but can also be disabled on package or class level. For more information see the following page.

Lets have a look at the following example:

public class Assert {

    public static void main(String[] args) {
        long l = 0;
        for (int k = 0; k < 100_000; k++) {
            l += twice(k);
        }
        System.out.println(l);
    }

    public static int twice(int a) {
        assert a != 0 : "a can't be 0";
        return a * 2;
    }
}
We have a loop that does a simple calculation. We also store the result and print it so that the JIT doesn't apply dead code elimination. In the 'twice' method there is a simple assert that checks if 'a' is not 0. The assert condition isn't very important, its primary function is to give easy to understand Assembly.

If we compile the code and output the content of the class it using 'javap -c Assert.class', we get the following bytecode:

public class com.asserts.Assert {
  static final boolean $assertionsDisabled;

  public com.asserts.Assert();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: lconst_0
       1: lstore_1
       2: iconst_0
       3: istore_3
       4: iload_3
       5: ldc           #2                  // int 100000
       7: if_icmpge     24
      10: lload_1
      11: iload_3
      12: invokestatic  #3                  // Method twice:(I)I
      15: i2l
      16: ladd
      17: lstore_1
      18: iinc          3, 1
      21: goto          4
      24: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
      27: lload_1
      28: invokevirtual #5                  // Method java/io/PrintStream.println:(J)V
      31: return

  public static int twice(int);
    Code:
       0: getstatic     #6                  // Field $assertionsDisabled:Z
       3: ifne          20
       6: iload_0
       7: ifeq          20
      10: new           #7                  // class java/lang/AssertionError
      13: dup
      14: ldc           #8                  // String a can't be 0
      16: invokespecial #9                  // Method java/lang/AssertionError."":(Ljava/lang/Object;)V
      19: athrow
      20: iload_0
      21: iconst_2
      22: imul
      23: ireturn

  static {};
    Code:
       0: ldc           #10                 // class com/asserts/Assert
       2: invokevirtual #11                 // Method java/lang/Class.desiredAssertionStatus:()Z
       5: ifne          12
       8: iconst_1
       9: goto          13
      12: iconst_0
      13: putstatic     #6                  // Field $assertionsDisabled:Z
      16: return
}
It is interesting to see how the assert has been implemented. A new synthetic static final field `$assertionsDisabled` has been added to the class which gets initialized when the class is loaded. See the 'static {}' for more detail.

The first instruction of the `twice` method loads this static field and then there is a check if this field is set. If `$assertionsDisabledis` false, it continues with the assert statement on #6. If `$assertionsDisabled` is true (so assert disabled), then there is a jump to the actual logic on #20.

If we convert the 'twice' bytecode back to Java, we would get:

public class Assert {

    static final boolean $assertionsDisabled;
 
    public static int twice(int a) {
        if(!$assertionsDisabled){
          if(a == 0) {
            throw new AssertError("a can't be 0")
          }
        }
        return a * 2;
    }
}
So the assert doesn't require any special new bytecode instructions. It is translated to a few simple checks.

Assert disabled

As a performance engineer, I'm always worried if we need to pay the price for something that isn't used. In this case, on the bytecode level, there is one extra branch to deal with when assertions are disabled and 2 branches when assertions are enabled. Till so far I'm not terribly excited since branching and high performance code, is not really a good mixture.

So lets see if the JIT can optimize it if assertions are disabled. For this I'm using the following commands:

-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:CompileCommand=print,*Assert.twice
The TieredCompilation is disabled so we only get the C2 Assembly output. Inlining is disabled so that the 'twice' method doesn't get inlined in the main loop.

This provides us the following output:

Compiled method (c2)     190    8             com.asserts.Assert::twice (24 bytes)
 total in heap  [0x0000000101d7f310,0x0000000101d7f4e0] = 464
 relocation     [0x0000000101d7f430,0x0000000101d7f438] = 8
 main code      [0x0000000101d7f440,0x0000000101d7f460] = 32
 stub code      [0x0000000101d7f460,0x0000000101d7f478] = 24
 oops           [0x0000000101d7f478,0x0000000101d7f480] = 8
 metadata       [0x0000000101d7f480,0x0000000101d7f488] = 8
 scopes data    [0x0000000101d7f488,0x0000000101d7f498] = 16
 scopes pcs     [0x0000000101d7f498,0x0000000101d7f4d8] = 64
 dependencies   [0x0000000101d7f4d8,0x0000000101d7f4e0] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000101d7f310:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000001ca2ba4f8} 'twice' '(I)I' in 'com/asserts/Assert'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  0x0000000101d7f440: sub    rsp,0x18
  0x0000000101d7f447: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                ; - com.asserts.Assert::twice@-1 (line 14)

  0x0000000101d7f44c: mov    eax,esi
  0x0000000101d7f44e: shl    eax,1              ;*imul
                                                ; - com.asserts.Assert::twice@22 (line 15)

  0x0000000101d7f450: add    rsp,0x10
  0x0000000101d7f454: pop    rbp
  0x0000000101d7f455: test   DWORD PTR [rip+0xfffffffffe87fba5],eax        # 0x00000001005ff000
                                                ;   {poll_return}
  0x0000000101d7f45b: ret    
  0x0000000101d7f45c: hlt    
  0x0000000101d7f45d: hlt    
  0x0000000101d7f45e: hlt    
  0x0000000101d7f45f: hlt    
[Exception Handler]
[Stub Code]
  0x0000000101d7f460: jmp    0x0000000101d70f60  ;   {no_reloc}
[Deopt Handler Code]
  0x0000000101d7f465: call   0x0000000101d7f46a
  0x0000000101d7f46a: sub    QWORD PTR [rsp],0x5
  0x0000000101d7f46f: jmp    0x0000000101d4bd00  ;   {runtime_call}
  0x0000000101d7f474: hlt    
  0x0000000101d7f475: hlt    
  0x0000000101d7f476: hlt    
  0x0000000101d7f477: hlt    
OopMapSet contains 0 OopMaps  
There is quite a lot of output. If we only focus on the logic of the method and strip all ceremony, the following instructions remain:
  0x0000000101d7f44c: mov    eax,esi
  0x0000000101d7f44e: shl    eax,1              ;*imul
                                                 ; - com.asserts.Assert::twice@22 (line 15)
The 'a' argument (stored in register esi) is copied into the eax register. Then we multiply the eax register by shifting the bits one to the left and store the result back into eax register. The eax register will be used to access the return value. Replacing the multiply by a shift left is a peephole optimization since bitshifting is faster than doing a multiply. The conclusion we can make is that if assert is disabled, the JIT completely removes the '$assertionsDisabled' check. That is good news; I hate to pay the price for something not used.

Assert enabled

What if assertions are enabled? For this we run with the following settings
-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:-BackgroundCompilation
-XX:CompileCommand=print,*Assert.twice
-ea

And we get the following Assembly

Compiled method (c2)     168    8             com.asserts.Assert::twice (24 bytes)
 total in heap  [0x0000000109e7c310,0x0000000109e7c548] = 568
 relocation     [0x0000000109e7c430,0x0000000109e7c440] = 16
 main code      [0x0000000109e7c440,0x0000000109e7c480] = 64
 stub code      [0x0000000109e7c480,0x0000000109e7c498] = 24
 oops           [0x0000000109e7c498,0x0000000109e7c4a0] = 8
 metadata       [0x0000000109e7c4a0,0x0000000109e7c4a8] = 8
 scopes data    [0x0000000109e7c4a8,0x0000000109e7c4d0] = 40
 scopes pcs     [0x0000000109e7c4d0,0x0000000109e7c540] = 112
 dependencies   [0x0000000109e7c540,0x0000000109e7c548] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000109e7c310:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000011217a4f8} 'twice' '(I)I' in 'com/asserts/Assert'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  0x0000000109e7c440: mov    DWORD PTR [rsp-0x14000],eax
  0x0000000109e7c447: push   rbp
  0x0000000109e7c448: sub    rsp,0x10           ;*synchronization entry
                                                ; - com.asserts.Assert::twice@-1 (line 14)

  0x0000000109e7c44c: test   esi,esi
  0x0000000109e7c44e: je     0x0000000109e7c460  ;*ifne
                                                ; - com.asserts.Assert::twice@7 (line 14)

  0x0000000109e7c450: mov    eax,esi
  0x0000000109e7c452: shl    eax,1              ;*imul
                                                ; - com.asserts.Assert::twice@22 (line 15)

  0x0000000109e7c454: add    rsp,0x10
  0x0000000109e7c458: pop    rbp
  0x0000000109e7c459: test   DWORD PTR [rip+0xffffffffff6ecba1],eax        # 0x0000000109569000
                                                ;   {poll_return}
  0x0000000109e7c45f: ret    
  0x0000000109e7c460: mov    esi,0x7
  0x0000000109e7c465: xchg   ax,ax
  0x0000000109e7c467: call   0x0000000109e47120  ; OopMap{off=44}
                                                ;*new  ; - com.asserts.Assert::twice@10 (line 14)
                                                ;   {runtime_call}
  0x0000000109e7c46c: call   0x000000010898fe44  ;*new
                                                ; - com.asserts.Assert::twice@10 (line 14)
                                                ;   {runtime_call}
  0x0000000109e7c471: hlt    
  0x0000000109e7c472: hlt    
  0x0000000109e7c473: hlt    
  0x0000000109e7c474: hlt    
  0x0000000109e7c475: hlt    
  0x0000000109e7c476: hlt    
  0x0000000109e7c477: hlt    
  0x0000000109e7c478: hlt    
  0x0000000109e7c479: hlt    
  0x0000000109e7c47a: hlt    
  0x0000000109e7c47b: hlt    
  0x0000000109e7c47c: hlt    
  0x0000000109e7c47d: hlt    
  0x0000000109e7c47e: hlt    
  0x0000000109e7c47f: hlt    
[Exception Handler]
[Stub Code]
  0x0000000109e7c480: jmp    0x0000000109e6bf60  ;   {no_reloc}
[Deopt Handler Code]
  0x0000000109e7c485: call   0x0000000109e7c48a
  0x0000000109e7c48a: sub    QWORD PTR [rsp],0x5
  0x0000000109e7c48f: jmp    0x0000000109e46d00  ;   {runtime_call}
  0x0000000109e7c494: hlt    
  0x0000000109e7c495: hlt    
  0x0000000109e7c496: hlt    
  0x0000000109e7c497: hlt    
OopMapSet contains 1 OopMaps

#0 
OopMap{off=44}
Lets remove a lot of the clutter:
  0x0000000109e7c44c: test   esi,esi
  0x0000000109e7c44e: je     0x0000000109e7c460  ;*ifne
                                                ; - com.asserts.Assert::twice@7 (line 14)

  0x0000000109e7c450: mov    eax,esi
  0x0000000109e7c452: shl    eax,1              ;*imul
                                                ; - com.asserts.Assert::twice@22 (line 15)

  0x0000000109e7c454: add    rsp,0x10
  0x0000000109e7c458: pop    rbp
  0x0000000109e7c459: test   DWORD PTR [rip+0xffffffffff6ecba1],eax        # 0x0000000109569000
                                                ;   {poll_return}
  0x0000000109e7c45f: ret    
  0x0000000109e7c460: mov    esi,0x7
  0x0000000109e7c465: xchg   ax,ax
  0x0000000109e7c467: call   0x0000000109e47120  ; OopMap{off=44}
                                                ;*new  ; - com.asserts.Assert::twice@10 (line 14)
                                                ;   {runtime_call}
  0x0000000109e7c46c: call   0x000000010898fe44  ;*new
                                                ; - com.asserts.Assert::twice@10 (line 14)
                                                ;   {runtime_call}
This one is a bit more complicated. Lets start at `0x0000000109e7c44c` where a 'test esi,esi' is done. 'test esi,esi' checks if 'a' is 0 and updates the 'ZF' (Zero Flag) in the flags register. If ZF is 0, then we continue with the regular logic; we move 'a' in the 'eax' register and multiply it by 2 and then the procedure exit ceremony is executed. This is no different then when assert is disabled.

If ZF is 1 (so a==0), then we jump to `0x0000000109e7c460`. We are totally bypassing the regular logic and even fail to execute the exit ceremony for procedure. Everything we have done so far is frozen and then we shoot into space. The mechanism we are seeing is called the uncommon trap and required for speculative optimizations. In this case, we have never called `twice` with 0, so for the JIT there was no reason to emit the 'throw new AssertError' code. As soon as I figured out the handling of this uncommon trap, I'll create a blogpost about it.

If we would translate this back to Java, we would get something like:

public class Assert {

    public static int twice(int a) {
        if(a == 0) {
            uncommonTrap();
        }
        return a * 2;
    }
}
If 'twice' would be called with 0, the uncommon trap gets executed, the code deoptimized and the interpreter will continue execution with the 'throw new AssertError' in place.

The conclusion we can make is: if assert is enabled, then the '$assertionsDisabled' check is removed.

Size matters

Till so far we have seen that the JIT completely removed all assert logic if assert is disabled. So it looks like assert is completely free if disabled. Unfortunately that isn't the case. The extra instructions inserted in the bytecode, can prevent inlining because inline limits are determined on the size of the bytecode.

Lets proof this by inflating the assert logic with some bogus additional conditions:

public class AssertFat {

    public static void main(String[] args) {
        long l = 0;
        for (int k = 1; k < 100_000; k++) {
            l += twice(k);
        }
        System.out.println(l);
    }

    public static int twice(int a) {
        assert a != 0 && a != -1 && a != -2 && a != -3 : "bad a";
        return a * 2;
    }
}
We have inflated the size of the assert artificially by adding some additional bogus checks.

If we would set the FreqInlineSize to 50, the method will not get inlined even when asserts are disabled:

-XX:+UnlockDiagnosticVMOptions 
-XX:+PrintInlining 
-XX:FreqInlineSize=50 
-da
In the logging we'll see the following in the inlining output:
    @ 12   com.asserts.AssertFat::twice (53 bytes)   callee is too large
This means that the `twice` method, 53 bytes, was too fat to get inlined since the maximum size for a frequently called method was set to 50 bytes.

To proof that the twice method would have been inlined without assert, lets remove the assert and run again:

@ 12   com.asserts.AssertFat::twice (4 bytes)
Now the method does get inlined.

This means that even though the JIT is able to completely remove the assert code if assert is disabled, it can still prevent inlining. Therefor the assert feature isn't completely free. It doesn't mean that you should remove assert from your code; but it is good to keep in the back of your mind.

1 opmerking:

  1. Thanks for your very good summary about optimization of asserts. It was a very good read. The inlining part has shocked me a little. I thought the inlining size is measured differently. But it is really bytecode instructions. Unfortunatly there seems to be that a useful param is missing in jdk 9 jlink. I have not found a --strip-assertions. That would be very cool. So that there is no excuse anymore for not using assertions.

    BeantwoordenVerwijderen

How does hyperthreading work.

Introduction In the last few months, there were 2 occurrences where people were talking about the implementation of hyperthreading; the In...