When a non static method is called on an interface or class, an `invokeinterface` or
`invokevirtual` bytecode is emitted. This makes it possible to do polymorphic calls;
so at runtime determining which method implementation should be executed.
The following useless method is used in this post.
public static int sizePlusOne(Collection c) {
return c.size() + 1;
}
The sizePlusOne function calls the size method on the passed collection and increments
the result by one.
If we view the bytecode using 'javap -c SomeClass' then we get the following output:
public static int sizePlusOne(java.util.Collection);
Code:
0: aload_0
1: invokeinterface #18, 1 // InterfaceMethod java/util/Collection.size:()I
6: iconst_1
7: iadd
8: ireturn
The problem is that a `invokeinterface` or `invokevirtual` is a dynamic dispatch instead
of a static dispatch. A dynamic dispatch involves more complexity because a vtable lookup
is required and it prevents inlining since the concrete method implementation isn't known.
What the JIT does is speculate that there will only be 1 or 2 types at most and instead of
doing a dynamic dispatch, there are some cheap guards and static dispatching. To test this
we'll be using Hotspot 1.8.0_91. Keep in mind that in the future the optimizations could be
implemented differently.
For this post we assume that:
- sizePlusOne method is called equally with the different types. There is not a dominant type.
- there are many implementations loaded for interface. Having only a single loaded implementation
deserves a post on its own.
- only the C2 code is relevant
Monomorphic callsite
The simplest version of the optimization is a monomorphic call site: so only a single type
being used.
To demonstrate the monomorphic callsite, we use the following example:
public class Devirtualization_MonoMorphic {
public static void main(String[] args) {
ArrayList arrayList = new ArrayList();
int result = 0;
for (int k = 0; k < 100_000; k++) {
result += sizePlusOne(arrayList);
}
System.out.println("result:" + result);
}
public static int sizePlusOne(Collection c) {
return c.size() + 1;
}
}
So we loop 100k times and execute the 'sizePlusOne' method with an ArrayList.
And we run it with the following settings to get the Assembly:
-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:-BackgroundCompilation
-XX:CompileCommand=print,*Devirtualization_MonoMorphic.sizePlusOne
Inlining is disabled so we can focus on the content of the sizePlusOne method. This will not prevent inlining
of simple accessors like ArrayList/LinkedList.size. For more information see '-XX:-InlineAccessors' and
credits go to Andrew Haley for pointing this out. Background compilation is disabled so that compilation isn't
done asynchronous and tiered compilation disabled since for this blogpost only the C2 emitted Assembly will
be discussed.
When the program is run, the following Assembly is printed.
CompilerOracle: print *Devirtualization_MonoMorphic.sizePlusOne
Compiled method (c2) 247 9 com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne (9 bytes)
total in heap [0x000000010be7e3d0,0x000000010be7e678] = 680
relocation [0x000000010be7e4f0,0x000000010be7e508] = 24
main code [0x000000010be7e520,0x000000010be7e580] = 96
stub code [0x000000010be7e580,0x000000010be7e598] = 24
oops [0x000000010be7e598,0x000000010be7e5a0] = 8
metadata [0x000000010be7e5a0,0x000000010be7e5b0] = 16
scopes data [0x000000010be7e5b0,0x000000010be7e5e0] = 48
scopes pcs [0x000000010be7e5e0,0x000000010be7e660] = 128
dependencies [0x000000010be7e660,0x000000010be7e668] = 8
nul chk table [0x000000010be7e668,0x000000010be7e678] = 16
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x000000010be7e3d0:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000000011417a560} 'sizePlusOne' '(Ljava/util/Collection;)I' in 'com/devirtualization/Devirtualization_MonoMorphic'
# parm0: rsi:rsi = 'java/util/Collection'
# [sp+0x20] (sp of caller)
0x000000010be7e520: mov DWORD PTR [rsp-0x14000],eax
0x000000010be7e527: push rbp
0x000000010be7e528: sub rsp,0x10 ;*synchronization entry
; - com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne@-1 (line 21)
0x000000010be7e52c: mov r11d,DWORD PTR [rsi+0x8] ; implicit exception: dispatches to 0x000000010be7e55d
0x000000010be7e530: cmp r11d,0xf8003231 ; {metadata('java/util/ArrayList')}
0x000000010be7e537: jne 0x000000010be7e54a ;*invokeinterface size
; - com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne@1 (line 21)
0x000000010be7e539: mov eax,DWORD PTR [rsi+0x10]
0x000000010be7e53c: inc eax ;*iadd
; - com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne@7 (line 21)
0x000000010be7e53e: add rsp,0x10
0x000000010be7e542: pop rbp
0x000000010be7e543: test DWORD PTR [rip+0xffffffffff6fdab7],eax # 0x000000010b57c000
; {poll_return}
0x000000010be7e549: ret
0x000000010be7e54a: mov rbp,rsi
0x000000010be7e54d: mov esi,0xffffffde
0x000000010be7e552: nop
0x000000010be7e553: call 0x000000010be47120 ; OopMap{rbp=Oop off=56}
;*invokeinterface size
; - com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne@1 (line 21)
; {runtime_call}
0x000000010be7e558: call 0x000000010a9a2e44 ; {runtime_call}
0x000000010be7e55d: mov esi,0xfffffff6
0x000000010be7e562: nop
0x000000010be7e563: call 0x000000010be47120 ; OopMap{off=72}
;*invokeinterface size
; - com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne@1 (line 21)
; {runtime_call}
0x000000010be7e568: call 0x000000010a9a2e44 ;*invokeinterface size
; - com.devirtualization.Devirtualization_MonoMorphic::sizePlusOne@1 (line 21)
; {runtime_call}
0x000000010be7e56d: hlt
0x000000010be7e56e: hlt
0x000000010be7e56f: hlt
0x000000010be7e570: hlt
0x000000010be7e571: hlt
0x000000010be7e572: hlt
0x000000010be7e573: hlt
0x000000010be7e574: hlt
0x000000010be7e575: hlt
0x000000010be7e576: hlt
0x000000010be7e577: hlt
0x000000010be7e578: hlt
0x000000010be7e579: hlt
0x000000010be7e57a: hlt
0x000000010be7e57b: hlt
0x000000010be7e57c: hlt
0x000000010be7e57d: hlt
0x000000010be7e57e: hlt
0x000000010be7e57f: hlt
[Exception Handler]
[Stub Code]
0x000000010be7e580: jmp 0x000000010be6bf60 ; {no_reloc}
[Deopt Handler Code]
0x000000010be7e585: call 0x000000010be7e58a
0x000000010be7e58a: sub QWORD PTR [rsp],0x5
0x000000010be7e58f: jmp 0x000000010be46d00 ; {runtime_call}
0x000000010be7e594: hlt
0x000000010be7e595: hlt
0x000000010be7e596: hlt
0x000000010be7e597: hlt
OopMapSet contains 2 OopMaps
#0
OopMap{rbp=Oop off=56}
#1
OopMap{off=72}
That is a lot output. Lets remove everything not relevant and add some comments:
0x000000010be7e52c: mov r11d,DWORD PTR [rsi+0x8]
;; Loads the address of the class of 'c' into register r11
0x000000010be7e530: cmp r11d,0xf8003231
;; Compare that address with the address of the ArrayList class
0x000000010be7e537: jne 0x000000010be7e54a
;; if the classes are different, jump to an uncommon trap
0x000000010be7e539: mov eax,DWORD PTR [rsi+0x10]
;; copies the size field of the ArrayList object into eax
0x000000010be7e53c: inc eax ;*iadd
;; Add one to eax.
If this would be converted to pseudo Java code, we would get something like this:
if(collection.class != ArrayList.class){
uncommonTrap()
}
int result = c.size;
result++;
Conclusion
With only a single type at the call site, the Assembly only has a simple type-guard and then
a static dispatch. So the dynamic dispatch is replaced by a static dispatch and inlining of
the method is possible.
Bimorphic callsite
The previous example has only 1 concrete type at the callsite. But what happens when a second
type is added.
public class Devirtualization_BiMorphic {
public static void main(String[] args) {
ArrayList arrayList = new ArrayList();
LinkedList linkedList = new LinkedList();
int result = 0;
for (int k = 0; k < 100_000; k++) {
result += sizePlusOne(arrayList);
}
for (int k = 0; k < 100_000; k++) {
result += sizePlusOne(linkedList);
}
System.out.println("result:" + result);
}
public static int sizePlusOne(Collection c) {
return c.size() + 1;
}
}
And we run with the following settings:
-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:-BackgroundCompilation
-XX:CompileCommand=print,*Devirtualization_BiMorphic.sizePlusOne
First the Assembly of the the monomorphic version is emited, and as soon as the LinkedList is encountered,
the uncommon trap is executed and eventually new Assembly emitted:
Compiled method (c2) 291 11 com.devirtualization.Devirtualization_BiMorphic::sizePlusOne (9 bytes)
total in heap [0x000000010c339510,0x000000010c3397f8] = 744
relocation [0x000000010c339630,0x000000010c339648] = 24
main code [0x000000010c339660,0x000000010c3396c0] = 96
stub code [0x000000010c3396c0,0x000000010c3396d8] = 24
oops [0x000000010c3396d8,0x000000010c3396e0] = 8
metadata [0x000000010c3396e0,0x000000010c339700] = 32
scopes data [0x000000010c339700,0x000000010c339730] = 48
scopes pcs [0x000000010c339730,0x000000010c3397e0] = 176
dependencies [0x000000010c3397e0,0x000000010c3397e8] = 8
nul chk table [0x000000010c3397e8,0x000000010c3397f8] = 16
Decoding compiled method 0x000000010c339510:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x00000001146395f0} 'sizePlusOne' '(Ljava/util/Collection;)I' in 'com/devirtualization/Devirtualization_BiMorphic'
# parm0: rsi:rsi = 'java/util/Collection'
# [sp+0x20] (sp of caller)
0x000000010c339660: mov DWORD PTR [rsp-0x14000],eax
0x000000010c339667: push rbp
0x000000010c339668: sub rsp,0x10 ;*synchronization entry
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@-1 (line 28)
0x000000010c33966c: mov r10d,DWORD PTR [rsi+0x8] ; implicit exception: dispatches to 0x000000010c3396ad
0x000000010c339670: cmp r10d,0xf8003231 ; {metadata('java/util/ArrayList')}
0x000000010c339677: je 0x000000010c339687
0x000000010c339679: cmp r10d,0xf8008c83 ; {metadata('java/util/LinkedList')}
0x000000010c339680: jne 0x000000010c339698 ;*invokeinterface size
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@1 (line 28)
0x000000010c339682: mov eax,DWORD PTR [rsi+0x10] ;*getfield size
; - java.util.LinkedList::size@1 (line 326)
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@1 (line 28)
0x000000010c339685: jmp 0x000000010c33968a ;*invokeinterface size
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@1 (line 28)
0x000000010c339687: mov eax,DWORD PTR [rsi+0x10] ;*synchronization entry
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@-1 (line 28)
0x000000010c33968a: inc eax ;*iadd
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@7 (line 28)
0x000000010c33968c: add rsp,0x10
0x000000010c339690: pop rbp
0x000000010c339691: test DWORD PTR [rip+0xfffffffffe841969],eax # 0x000000010ab7b000
; {poll_return}
0x000000010c339697: ret
0x000000010c339698: mov rbp,rsi
0x000000010c33969b: mov esi,0xffffffc6
0x000000010c3396a0: data32 xchg ax,ax
0x000000010c3396a3: call 0x000000010c306120 ; OopMap{rbp=Oop off=72}
;*invokeinterface size
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@1 (line 28)
; {runtime_call}
0x000000010c3396a8: call 0x000000010b840e44 ; {runtime_call}
0x000000010c3396ad: mov esi,0xfffffff6
0x000000010c3396b2: nop
0x000000010c3396b3: call 0x000000010c306120 ; OopMap{off=88}
;*invokeinterface size
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@1 (line 28)
; {runtime_call}
0x000000010c3396b8: call 0x000000010b840e44 ;*invokeinterface size
; - com.devirtualization.Devirtualization_BiMorphic::sizePlusOne@1 (line 28)
; {runtime_call}
0x000000010c3396bd: hlt
0x000000010c3396be: hlt
0x000000010c3396bf: hlt
[Exception Handler]
[Stub Code]
0x000000010c3396c0: jmp 0x000000010c32af60 ; {no_reloc}
[Deopt Handler Code]
0x000000010c3396c5: call 0x000000010c3396ca
0x000000010c3396ca: sub QWORD PTR [rsp],0x5
0x000000010c3396cf: jmp 0x000000010c305d00 ; {runtime_call}
0x000000010c3396d4: hlt
0x000000010c3396d5: hlt
0x000000010c3396d6: hlt
0x000000010c3396d7: hlt
OopMapSet contains 2 OopMaps
#0
OopMap{rbp=Oop off=72}
#1
OopMap{off=88}
Again a lot of output. Lets trim it down a bit:
0x000000010c33966c: mov r10d,DWORD PTR [rsi+0x8]
;; Loads the address of the class of 'c' into register r10d
0x000000010c339670: cmp r10d,0xf8003231
;; Compare that address with the address of the ArrayList class
0x000000010c339677: je 0x000000010c339687
;; if it is an ArrayList, jump to 0x000000010c339687
0x000000010c339679: cmp r10d,0xf8008c83
;; it was not an ArrayList, lets compare with address of LinkedList class
0x000000010c339680: jne 0x000000010c339698
;; if jump to uncommon trap
0x000000010c339682: mov eax,DWORD PTR [rsi+0x10]
;; copies the size field of the LinkedList instance into eax
0x000000010c339685: jmp 0x000000010c33968a
0x000000010c339687: mov eax,DWORD PTR [rsi+0x10] ;*synchronization entry
;; copes the size field of the ArrayList instance into eax
0x000000010c33968a: inc eax
;; Add one to eax.
If we would convert this back pseudo Java we would get something like this:
int result
if(c.class == ArrayList.class){
result = c.size;
} else if(c.class == LinkedList.class){
result = c.size;
} else {
uncommonTrap();
}
result++;
Conclusion
In case of a bimorphic callsite, we don't need to pay the price for a dynamic dispatch
and inlining isn't prevented. However we do get 1 type-guard for one of the types and 2
type-guards for the other.
Megamorpic callsite
Lets add a third type of Collection and see what happens:
public class Devirtualization_MegaMorphic {
public static void main(String[] args) {
ArrayList arrayList = new ArrayList();
LinkedList linkedList = new LinkedList();
HashSet hashSet = new HashSet();
int result = 0;
for (int k = 0; k < 100_000; k++) {
result += sizePlusOne(arrayList);
}
for (int k = 0; k < 100_000; k++) {
result += sizePlusOne(linkedList);
}
for (int k = 0; k < 100_000; k++) {
result += sizePlusOne(hashSet);
}
System.out.println("result:" + result);
}
public static int sizePlusOne(Collection c) {
return c.size() + 1;
}
}
If we run with the following settings
-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:-BackgroundCompilation
-XX:CompileCommand=print,*Devirtualization_MegaMorphic.sizePlusOne
Then the following Assembly is eventually emitted:
Compiled method (c2) 268 14 com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne (9 bytes)
total in heap [0x000000010ea7d550,0x000000010ea7d788] = 568
relocation [0x000000010ea7d670,0x000000010ea7d680] = 16
main code [0x000000010ea7d680,0x000000010ea7d6c0] = 64
stub code [0x000000010ea7d6c0,0x000000010ea7d6d8] = 24
oops [0x000000010ea7d6d8,0x000000010ea7d6e0] = 8
metadata [0x000000010ea7d6e0,0x000000010ea7d6e8] = 8
scopes data [0x000000010ea7d6e8,0x000000010ea7d708] = 32
scopes pcs [0x000000010ea7d708,0x000000010ea7d768] = 96
dependencies [0x000000010ea7d768,0x000000010ea7d770] = 8
handler table [0x000000010ea7d770,0x000000010ea7d788] = 24
Decoding compiled method 0x000000010ea7d550:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000116d7a678} 'sizePlusOne' '(Ljava/util/Collection;)I' in 'com/devirtualization/Devirtualization_MegaMorphic'
# parm0: rsi:rsi = 'java/util/Collection'
# [sp+0x20] (sp of caller)
0x000000010ea7d680: mov DWORD PTR [rsp-0x14000],eax
0x000000010ea7d687: push rbp
0x000000010ea7d688: sub rsp,0x10 ;*synchronization entry
; - com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne@-1 (line 33)
0x000000010ea7d68c: nop
0x000000010ea7d68d: movabs rax,0xffffffffffffffff
0x000000010ea7d697: call 0x000000010ea45f60 ; OopMap{off=28}
;*invokeinterface size
; - com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne@1 (line 33)
; {virtual_call}
0x000000010ea7d69c: inc eax ;*iadd
; - com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne@7 (line 33)
0x000000010ea7d69e: add rsp,0x10
0x000000010ea7d6a2: pop rbp
0x000000010ea7d6a3: test DWORD PTR [rip+0xffffffffff742957],eax # 0x000000010e1c0000
; {poll_return}
0x000000010ea7d6a9: ret ;*invokeinterface size
; - com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne@1 (line 33)
0x000000010ea7d6aa: mov rsi,rax
0x000000010ea7d6ad: add rsp,0x10
0x000000010ea7d6b1: pop rbp
0x000000010ea7d6b2: jmp 0x000000010ea6ece0 ; {runtime_call}
0x000000010ea7d6b7: hlt
0x000000010ea7d6b8: hlt
0x000000010ea7d6b9: hlt
0x000000010ea7d6ba: hlt
0x000000010ea7d6bb: hlt
0x000000010ea7d6bc: hlt
0x000000010ea7d6bd: hlt
0x000000010ea7d6be: hlt
0x000000010ea7d6bf: hlt
[Exception Handler]
[Stub Code]
0x000000010ea7d6c0: jmp 0x000000010ea6bf60 ; {no_reloc}
[Deopt Handler Code]
0x000000010ea7d6c5: call 0x000000010ea7d6ca
0x000000010ea7d6ca: sub QWORD PTR [rsp],0x5
0x000000010ea7d6cf: jmp 0x000000010ea46d00 ; {runtime_call}
0x000000010ea7d6d4: hlt
0x000000010ea7d6d5: hlt
0x000000010ea7d6d6: hlt
0x000000010ea7d6d7: hlt
Again a lot of output. If we extract only the relevant part, we get:
0x000000010ea7d68d: movabs rax,0xffffffffffffffff
0x000000010ea7d697: call 0x000000010ea45f60 ; OopMap{off=28}
;*invokeinterface size
; - com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne@1 (line 33)
; {virtual_call}
0x000000010ea7d69c: inc eax ;*iadd
; - com.devirtualization.Devirtualization_MegaMorphic::sizePlusOne@7 (line 33)
The code above contains the expensive virtual call to Collection.size.
Conclusion
The JIT has given up on devirtualizing the callsite after it has 3 encountered different implementations.
Recommended reads
I would start out with
this excellent post of Aleksey Shipilëv. He goes into much more detail.
Very good
post from Richard Warburton.
And very informative
post from Martin Thompson.
Geen opmerkingen:
Een reactie posten