zondag 23 april 2017

JIT and instanceof

One of the things that always wondered me is how expensive an instanceof check is. In most cases this is never a concern, but out of professional curiosity I wondered how well the JIT is able to optimize it. Lets start with the simplest case, an instanceof of check using a final class as second argument. For this we use the following example:
public class InstanceOfCheck {

    public static void main(String[] args) {
        for (int k = 0; k < 100000; k++) {
            doSomething("");
        }
        System.out.println(doSomething(""));
    }

    public static String doSomething(Object o) {
        if (o.getClass() instanceof String.class) {
            return "string";
        } else {
            return "nostring";
        }
    }
}
For this post we use the following Java version:
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
I'm too lazy to update the JDK for my MacBook which I hardly use. And the following JVM parameters:
-XX:+UnlockDiagnosticVMOptions 
-XX:PrintAssemblyOptions=intel 
-XX:-TieredCompilation 
-XX:-Inline 
-XX:-BackgroundCompilation 
-XX:CompileCommand=print,*InstanceOfCheck.doSomething
-XX:+PrintCompilation 
This will give us the Assembly of the 'doSomething' method. When I run the above example, I get the following output:
    125    1    b        java.lang.String::indexOf (70 bytes)
    128    2 %  b        java.lang.String::indexOf @ 37 (70 bytes)
    137    3    b        java.lang.String::hashCode (55 bytes)
    192    4    b        sun.nio.cs.UTF_8$Encoder::encode (359 bytes)
    213    5    b        java.lang.String::charAt (29 bytes)
    216    6    b        java.io.UnixFileSystem::normalize (75 bytes)
    220    7    b        java.lang.String::lastIndexOf (52 bytes)
    243    8    b        com.InstanceOfCheck::doSomething (13 bytes)
Compiled method (c2)     244    8             com.InstanceOfCheck::doSomething (13 bytes)
 total in heap  [0x0000000106479e10,0x000000010647a090] = 640
 relocation     [0x0000000106479f30,0x0000000106479f48] = 24
 main code      [0x0000000106479f60,0x0000000106479fc0] = 96
 stub code      [0x0000000106479fc0,0x0000000106479fd8] = 24
 oops           [0x0000000106479fd8,0x0000000106479fe0] = 8
 metadata       [0x0000000106479fe0,0x0000000106479ff0] = 16
 scopes data    [0x0000000106479ff0,0x000000010647a018] = 40
 scopes pcs     [0x000000010647a018,0x000000010647a078] = 96
 dependencies   [0x000000010647a078,0x000000010647a080] = 8
 nul chk table  [0x000000010647a080,0x000000010647a090] = 16
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x0000000106479e10:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x000000010e77a480} 'doSomething' '(Ljava/lang/Object;)Ljava/lang/String;' in 'com/InstanceOfCheck'
  # parm0:    rsi:rsi   = 'java/lang/Object'
  #           [sp+0x20]  (sp of caller)
  0x0000000106479f60: mov    DWORD PTR [rsp-0x14000],eax
  0x0000000106479f67: push   rbp
  0x0000000106479f68: sub    rsp,0x10           ;*synchronization entry
                                                ; - com.InstanceOfCheck::doSomething@-1 (line 14)

  0x0000000106479f6c: mov    r11d,DWORD PTR [rsi+0x8]  ; implicit exception: dispatches to 0x0000000106479fa1
  0x0000000106479f70: cmp    r11d,0xf80002da    ;   {metadata('java/lang/String')}
  0x0000000106479f77: jne    0x0000000106479f8f
  0x0000000106479f79: movabs rax,0x7955dfce0    ;   {oop("string")}
  0x0000000106479f83: add    rsp,0x10
  0x0000000106479f87: pop    rbp
  0x0000000106479f88: test   DWORD PTR [rip+0xffffffffff73f072],eax        # 0x0000000105bb9000
                                                ;   {poll_return}
  0x0000000106479f8e: ret    
  0x0000000106479f8f: mov    rbp,rsi
  0x0000000106479f92: mov    esi,0xffffffde
  0x0000000106479f97: call   0x0000000106447120  ; OopMap{rbp=Oop off=60}
                                                ;*instanceof
                                                ; - com.InstanceOfCheck::doSomething@1 (line 14)
                                                ;   {runtime_call}
  0x0000000106479f9c: call   0x0000000104fdfe44  ;   {runtime_call}
  0x0000000106479fa1: mov    esi,0xfffffff4
  0x0000000106479fa6: nop
  0x0000000106479fa7: call   0x0000000106447120  ; OopMap{off=76}
                                                ;*instanceof
                                                ; - com.InstanceOfCheck::doSomething@1 (line 14)
                                                ;   {runtime_call}
  0x0000000106479fac: call   0x0000000104fdfe44  ;*instanceof
                                                ; - com.InstanceOfCheck::doSomething@1 (line 14)
                                                ;   {runtime_call}
  0x0000000106479fb1: hlt    
  0x0000000106479fb2: hlt    
  0x0000000106479fb3: hlt    
  0x0000000106479fb4: hlt    
  0x0000000106479fb5: hlt    
  0x0000000106479fb6: hlt    
  0x0000000106479fb7: hlt    
  0x0000000106479fb8: hlt    
  0x0000000106479fb9: hlt    
  0x0000000106479fba: hlt    
  0x0000000106479fbb: hlt    
  0x0000000106479fbc: hlt    
  0x0000000106479fbd: hlt    
  0x0000000106479fbe: hlt    
  0x0000000106479fbf: hlt    
[Exception Handler]
[Stub Code]
  0x0000000106479fc0: jmp    0x000000010646bf60  ;   {no_reloc}
[Deopt Handler Code]
  0x0000000106479fc5: call   0x0000000106479fca
  0x0000000106479fca: sub    QWORD PTR [rsp],0x5
  0x0000000106479fcf: jmp    0x0000000106446d00  ;   {runtime_call}
  0x0000000106479fd4: hlt    
  0x0000000106479fd5: hlt    
  0x0000000106479fd6: hlt    
  0x0000000106479fd7: hlt    
OopMapSet contains 2 OopMaps

#0 
OopMap{rbp=Oop off=60}
#1 
OopMap{off=76}
    247    9 %  b        com.InstanceOfCheck::main @ 2 (32 bytes)
    248    9 %           com.InstanceOfCheck::main @ -2 (32 bytes)   made not entrant
string
After filtering out the non relevant parts, the following remains:
  0x0000000106479f6c: mov    r11d,DWORD PTR [rsi+0x8]  ; implicit exception: dispatches to 0x0000000106479fa1
  0x0000000106479f70: cmp    r11d,0xf80002da    ;   {metadata('java/lang/String')}
  0x0000000106479f77: jne    0x0000000106479f8f
  0x0000000106479f79: movabs rax,0x7955dfce0    ;   {oop("string")}
  0x0000000106479f83: add    rsp,0x10
  0x0000000106479f87: pop    rbp
  0x0000000106479f88: test   DWORD PTR [rip+0xffffffffff73f072],eax        # 0x0000000105bb9000
                                                ;   {poll_return}
  0x0000000106479f8e: ret    
The first instruction loads the class address of the 'o' argument in register r11d. And the second instruction it is compared with the String class address. If the address is not equal, we jump to the uncommon trap since we have only passed String arguments to the 'doSomething' method. So it seems that doing an instanceof check using a final class is very cheap. And we get exactly the same Assembly as direct address comparison in the Java code:
    public static String doSomething(Object o) {
        if (o.getClass() == String.class) {
            return "string";
        } else {
            return "nostring";
        }
    }
Which gives the following relevant Assembly:
 0x000000010e0b296c: mov    r11d,DWORD PTR [rsi+0x8]  ;*invokevirtual getClass
                                                ; - com.InstanceOfCheck::doSomething@1 (line 14)
                                                ; implicit exception: dispatches to 0x000000010e0b29b1
  0x000000010e0b2970: cmp    r11d,0xf80002da    ;   {metadata('java/lang/String')}
  0x000000010e0b2977: jne    0x000000010e0b298f  ;*if_acmpne
                                                ; - com.InstanceOfCheck::doSomething@6 (line 14)

  0x000000010e0b2979: movabs rax,0x7955db818    ;   {oop("string")}
  0x000000010e0b2983: add    rsp,0x10
  0x000000010e0b2987: pop    rbp
  0x000000010e0b2988: test   DWORD PTR [rip+0xffffffffff72a672],eax        # 0x000000010d7dd000
                                                ;   {poll_return}
  0x000000010e0b298e: ret    

Non final but concrete class

In the previous example we saw that the JIT perfectly optimized an instanceof check comparing with a final class. But what about a non final concrete class like e.g. LinkedList? Will the JIT be able to optimize that? For this we modify the example:
public class InstanceOfCheck {
    
    public static void main(String[] args) {
        List l = new LinkedList();
        for (int k = 0; k < 100000; k++) {
            doSomething(l);
        }
        System.out.println(doSomething(l));
    }

    public static String doSomething(Object o) {
        if (o.getClass() == LinkedList.class) {
            return "linkedlist";
        } else {
            return "nolinkedlist";
        }
    }
}
If we extract only the relevant Assembly:
  0x000000010307ccec: mov    r11d,DWORD PTR [rsi+0x8]  ;*invokevirtual getClass
                                                ; - com.InstanceOfCheck::doSomething@1 (line 15)
                                                ; implicit exception: dispatches to 0x000000010307cd31
  0x000000010307ccf0: cmp    r11d,0xf8008c83    ;   {metadata('java/util/LinkedList')}
  0x000000010307ccf7: je     0x000000010307cd0f  ;*if_acmpne
                                                ; - com.InstanceOfCheck::doSomething@6 (line 15)

  0x000000010307ccf9: movabs rax,0x7955dd8d0    ;   {oop("nostring")}
  0x000000010307cd03: add    rsp,0x10
  0x000000010307cd07: pop    rbp
  0x000000010307cd08: test   DWORD PTR [rip+0xffffffffff6642f2],eax        # 0x00000001026e1000
                                                ;   {poll_return}
  0x000000010307cd0e: ret    
This Assembly looks very similar to the String version. The JIT sees that we only have passed arguments of type LinkedList and therefor it optimizes this with a cheap class address comparison and adds an uncommon trap in case of a different argument being passed.

Non concrete class check?

What would happen if we would check instanceof using a non concrete class like List. In this case it isn't possible to do a cheap class comparison:
public class InstanceOfCheck {
    
    public static void main(String[] args) {
        List l = new LinkedList();
        for (int k = 0; k < 100000; k++) {
            doSomething(l);
        }
        System.out.println(doSomething(l));
    }

    public static String doSomething(Object o) {
        if (o instanceof List) {
            return "list";
        } else {
            return "nolist";
        }
    }
}
When we extract the relevant Assembly we get the following:
  0x000000010807e66c: mov    r11d,DWORD PTR [rsi+0x8]  ; implicit exception: dispatches to 0x000000010807e6a1
  0x000000010807e670: cmp    r11d,0xf8008c83    ;   {metadata('java/util/LinkedList')}
  0x000000010807e677: jne    0x000000010807e68f
  0x000000010807e679: movabs rax,0x7955ddf58    ;   {oop("list")}
  0x000000010807e683: add    rsp,0x10
  0x000000010807e687: pop    rbp
  0x000000010807e688: test   DWORD PTR [rip+0xffffffffff61f972],eax        # 0x000000010769e000
                                                ;   {poll_return}
  0x000000010807e68e: ret    
This is a nice surprise! Because we have only passed LinkedLists as arguments, the instanceof with List as argument is implemented as a cheap pointer comparison with LinkedList class-address. If anything else than a LinkedList is passed, an uncommon trap is executed. So this means that even in this case the JIT is able to optimize!

How does hyperthreading work.

Introduction In the last few months, there were 2 occurrences where people were talking about the implementation of hyperthreading; the In...