Wednesday, 25 January 2017

final static boolean & JIT

In this post we are going to look at the cost of having a final static boolean in the code. Such flags can be very useful for enabling or disabling certain behavior, e.g. tracing or logging. The question is what performance implications they have.

I wrote this post because I didn't know the implications, so I asked the question on the Mechanical Sympathy mailing list. I would like to thank the people on this mailing list for answering my question.

For this post we have the following assumptions:

  • we only care about the output of the C2 compiler
  • we are using Java HotSpot 1.8.0_91

Constant expression

Let's start with the most basic case where the final static field is initialized using a constant expression:
public class StaticFinal_ConstantExpression {

    public static void main(String[] args) {
        int result = 0;
        for (int k = 0; k < 100_000; k++) {
            result += doMath(k);
        }
        System.out.println(result);
    }

    final static boolean ENABLED = true;

    public static int doMath(int a) {
        if (ENABLED) {
            return a + 1;
        } else {
            return a - 1;
        }
    }
}
The actual logic in 'doMath' isn't terribly exciting. Its main purpose is to produce easy-to-understand bytecode and Assembly.

When we check the bytecode for the 'doMath' method using 'javap -c StaticFinal_ConstantExpression.class' we get the following:

  public static int doMath(int);
    Code:
       0: iload_0
       1: iconst_1
       2: iadd
       3: ireturn
If we convert this back to Java, we get:
public static int doMath(int a) {
    return a + 1;
}
Javac has propagated the ENABLED constant and completely removed the dead code. We don't even need to look at the Assembly.

Be careful with final statics and constant expressions; if the value is changed and one or more classes that read this value are not recompiled, they will not see the new value.
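The reason is that javac copies the value of a constant variable into every class that reads it, so the declaring class is not consulted at runtime at all. Below is a minimal sketch (the class names are made up for this illustration) that makes this visible in a single file. According to the Java Language Specification, reading a constant variable does not initialize the class that declares it, while reading a final static initialized by a method call does, so the static initializer of Flags only runs on the second read.

```java
public class ConstantInliningDemo {

    // Records when the nested Flags class gets initialized.
    static final StringBuilder LOG = new StringBuilder();

    static class Flags {
        // Constant expression: javac copies 'true' into every class that reads it.
        static final boolean CONSTANT = true;
        // Not a constant expression: readers emit a real getstatic.
        static final boolean NON_CONSTANT = Boolean.parseBoolean("true");

        static {
            LOG.append("Flags initialized");
        }
    }

    // Reads both flags and snapshots the log right after each read.
    static String[] readFlags() {
        boolean constant = Flags.CONSTANT;        // compiled to a literal, Flags is not initialized
        String afterConstant = LOG.toString();
        boolean nonConstant = Flags.NON_CONSTANT; // getstatic, triggers the Flags static initializer
        String afterNonConstant = LOG.toString();
        return new String[] {
            "CONSTANT=" + constant + ", log='" + afterConstant + "'",
            "NON_CONSTANT=" + nonConstant + ", log='" + afterNonConstant + "'"
        };
    }

    public static void main(String[] args) {
        for (String line : readFlags()) {
            System.out.println(line);
        }
    }
}
```

Running it should print CONSTANT=true with an empty log, followed by NON_CONSTANT=true with log='Flags initialized'. If Flags lived in its own source file and CONSTANT were changed to false, this class would keep using true until it is recompiled as well.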

Non constant expression

In the previous example ENABLED had a hard-coded constant value. In practice you often want something more flexible, e.g. a value taken from a system property. So let's change the ENABLED initialization so it gets its value from the system property 'enabled'.
public class StaticFinal_NonConstantExpression {

    public static void main(String[] args) {
        int result = 0;
        for (int k = 0; k < 100_000; k++) {
            result += doMath(k);
        }
        System.out.println(result);
    }

    final static boolean ENABLED = Boolean.getBoolean("enabled");

    public static int doMath(int a) {
        if (ENABLED) {
            return a + 1;
        } else {
            return a - 1;
        }
    }
}
And if we display the relevant bytecode using 'javap -c StaticFinal_NonConstantExpression.class', we get the following.
  static final boolean ENABLED;
 
  public static int doMath(int);
    Code:
       0: getstatic     #6                  // Field ENABLED:Z
       3: ifeq          10
       6: iload_0
       7: iconst_1
       8: iadd
       9: ireturn
      10: iload_0
      11: iconst_1
      12: isub
      13: ireturn

  static {};
    Code:
       0: ldc           #7                  // String enabled
       2: invokestatic  #8                  // Method java/lang/Boolean.getBoolean:(Ljava/lang/String;)Z
       5: putstatic     #6                  // Field ENABLED:Z
       8: return
We can see that 'doMath' still contains the check and the logic for both branches. Javac has not made any optimizations, since it can't know which value ENABLED is going to have at runtime.

Let's go a level deeper and see what kind of Assembly we are going to get. To display the Assembly, we'll use the following JVM options:

-XX:+UnlockDiagnosticVMOptions
-XX:PrintAssemblyOptions=intel
-XX:-TieredCompilation
-XX:-Inline
-XX:CompileCommand=print,*.doMath
-Denabled=true
Tiered compilation is disabled since we are only interested in the C2 output. Inlining is disabled to prevent 'doMath' from being inlined into the main loop. We also set the 'enabled' system property to true.
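A detail worth knowing about this initializer: Boolean.getBoolean(name) does not parse its argument. It looks up the system property called name and returns true only if that property exists and is equal to "true", ignoring case. A small sketch (the class and method names are hypothetical) of how the flag reacts to different values of the 'enabled' property:

```java
public class GetBooleanDemo {

    // Same expression that initializes ENABLED in the example above.
    static boolean readEnabled() {
        return Boolean.getBoolean("enabled");
    }

    public static void main(String[] args) {
        System.clearProperty("enabled");
        System.out.println("unset -> " + readEnabled()); // false: property missing

        System.setProperty("enabled", "TRUE");
        System.out.println("TRUE  -> " + readEnabled()); // true: comparison ignores case

        System.setProperty("enabled", "yes");
        System.out.println("yes   -> " + readEnabled()); // false: only "true" counts

        System.setProperty("enabled", "true");
        System.out.println("true  -> " + readEnabled()); // true
    }
}
```

Because ENABLED is a final static, the property is read only once, when the class is initialized; calling System.setProperty afterwards does not change the field.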

When we run it, we get the following Assembly:

Compiled method (c2)     248    8             com.constant_folding.StaticFinal_NonConstantExpression::doMath (14 bytes)
 total in heap  [0x00000001083a7a90,0x00000001083a7c60] = 464
 relocation     [0x00000001083a7bb0,0x00000001083a7bb8] = 8
 main code      [0x00000001083a7bc0,0x00000001083a7be0] = 32
 stub code      [0x00000001083a7be0,0x00000001083a7bf8] = 24
 oops           [0x00000001083a7bf8,0x00000001083a7c00] = 8
 metadata       [0x00000001083a7c00,0x00000001083a7c08] = 8
 scopes data    [0x00000001083a7c08,0x00000001083a7c18] = 16
 scopes pcs     [0x00000001083a7c18,0x00000001083a7c58] = 64
 dependencies   [0x00000001083a7c58,0x00000001083a7c60] = 8
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x00000001083a7a90:
Code:
[Disassembling for mach='i386:x86-64']
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000001d08e24b8} 'doMath' '(I)I' in 'com/constant_folding/StaticFinal_NonConstantExpression'
  # parm0:    rsi       = int
  #           [sp+0x20]  (sp of caller)
  0x00000001083a7bc0: sub    rsp,0x18
  0x00000001083a7bc7: mov    QWORD PTR [rsp+0x10],rbp  ;*synchronization entry
                                                ; - com.constant_folding.StaticFinal_NonConstantExpression::doMath@-1 (line 16)

  0x00000001083a7bcc: mov    eax,esi
  0x00000001083a7bce: inc    eax                ;*iadd
                                                ; - com.constant_folding.StaticFinal_NonConstantExpression::doMath@8 (line 17)

  0x00000001083a7bd0: add    rsp,0x10
  0x00000001083a7bd4: pop    rbp
  0x00000001083a7bd5: test   DWORD PTR [rip+0xfffffffffff74425],eax        # 0x000000010831c000
                                                ;   {poll_return}
  0x00000001083a7bdb: ret    
  0x00000001083a7bdc: hlt    
  0x00000001083a7bdd: hlt    
  0x00000001083a7bde: hlt    
  0x00000001083a7bdf: hlt    
[Exception Handler]
[Stub Code]
  0x00000001083a7be0: jmp    0x000000010839af60  ;   {no_reloc}
[Deopt Handler Code]
  0x00000001083a7be5: call   0x00000001083a7bea
  0x00000001083a7bea: sub    QWORD PTR [rsp],0x5
  0x00000001083a7bef: jmp    0x0000000108375d00  ;   {runtime_call}
  0x00000001083a7bf4: hlt    
  0x00000001083a7bf5: hlt    
  0x00000001083a7bf6: hlt    
  0x00000001083a7bf7: hlt    
OopMapSet contains 0 OopMaps
That's a lot of output. Let's remove everything that isn't relevant:
  0x00000001083a7bcc: mov    eax,esi
   ;; copy the content of 'a' into eax
  0x00000001083a7bce: inc    eax        
   ;; increase eax by one
The JIT has propagated the ENABLED constant and removed the dead code.

If we run with '-Denabled=false', we'll get similar Assembly:

  0x000000010b4eb7cc: mov    eax,esi
  0x000000010b4eb7ce: dec    eax                ;*isub
                                                ; - com.constant_folding.StaticFinal_NonConstantExpression::doMath@12 (line 19)
So in this case too, the JIT has propagated the constant and removed the dead code.
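This is possible because, once the class has been initialized, the value of a final static field cannot be changed by regular Java code (low-level tricks such as sun.misc.Unsafe aside). A quick sketch (the class names are hypothetical) showing that even reflection with setAccessible(true) refuses to overwrite a final static field initialized the same way as ENABLED:

```java
import java.lang.reflect.Field;

public class StaticFinalReflectionDemo {

    static class Config {
        // Same style of initializer as in the post: not a constant expression.
        static final boolean ENABLED = Boolean.getBoolean("enabled");
    }

    // Tries to flip Config.ENABLED via reflection.
    // Returns true if the write was rejected and the value is unchanged.
    static boolean tryToFlip() {
        boolean before = Config.ENABLED;
        try {
            Field field = Config.class.getDeclaredField("ENABLED");
            field.setAccessible(true);
            field.setBoolean(null, !before);
            return false; // reaching this line would mean the write succeeded
        } catch (IllegalAccessException e) {
            // Expected: reflection does not allow writing a static final field.
            return Config.ENABLED == before;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write rejected: " + tryToFlip());
    }
}
```

It should print "write rejected: true": according to the Field documentation, a final field is only writable through reflection if setAccessible(true) succeeded and the field is non-static.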

Original size of bytecode matters

So it seems that we can use a final static with a non-constant expression to enable or disable certain behavior for free. Unfortunately this isn't entirely true: inlining can still be prevented, because the decision whether to inline is based on the original bytecode size. To demonstrate this we'll use the following code:
public class StaticFinal_OriginalSizeMatters {

    public static void main(String[] args) {
        int result = 0;
        for (int k = 0; k < 1_000_000; k++) {
            result += doMath(k);
        }
        System.out.println(result);
    }

    final static boolean ENABLED = Boolean.getBoolean("enabled");

    public static int doMath(int a) {
        if (ENABLED) {
            System.out.print("n");
            System.out.print("e");
            System.out.print("v");
            System.out.print("e");
            System.out.print("r");
            return a + 1;
        } else {
            return a - 1;
        }
    }
}
When we run it using:
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintInlining
-XX:FreqInlineSize=50
-Denabled=false
We'll see the following output:
@ 12   com.constant_folding.StaticFinal_OriginalSizeMatters::doMath (54 bytes)   callee is too large
@ 27  java/io/PrintStream::println (not loaded)   not inlineable
@ 12   com.constant_folding.StaticFinal_OriginalSizeMatters::doMath (54 bytes)   callee is too large
@ 27  java/io/PrintStream::println (not loaded)   not inlineable
@ 12   com.constant_folding.StaticFinal_OriginalSizeMatters::doMath (54 bytes)   hot method too big
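So even though ENABLED is false, the method is still considered too large to be inlined, because the inlining decision is based on the original 54 bytes of bytecode, not on the much smaller code the JIT would produce for it.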
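One way to work around this, assuming the enabled branch is the rarely used one, is to move the bulky branch into a separate method so it no longer counts toward the bytecode size of 'doMath'. The class name and the 'doMathEnabled' helper below are hypothetical; whether 'doMath' actually gets inlined still depends on the JIT's heuristics, so it is worth checking with -XX:+PrintInlining.

```java
public class StaticFinal_SplitSlowPath {

    public static void main(String[] args) {
        int result = 0;
        for (int k = 0; k < 1_000_000; k++) {
            result += doMath(k);
        }
        System.out.println(result);
    }

    final static boolean ENABLED = Boolean.getBoolean("enabled");

    // Only the check, a call and the cheap branch remain here,
    // so the bytecode stays well below a limit like -XX:FreqInlineSize=50.
    public static int doMath(int a) {
        if (ENABLED) {
            return doMathEnabled(a);
        }
        return a - 1;
    }

    // Hypothetical helper holding the bulky branch of the original doMath.
    static int doMathEnabled(int a) {
        System.out.print("n");
        System.out.print("e");
        System.out.print("v");
        System.out.print("e");
        System.out.print("r");
        return a + 1;
    }
}
```

The check stays inside 'doMath', so callers don't need their own if/else. When ENABLED is true, 'doMathEnabled' is a separate call whose inlining is decided on its own size, which is usually acceptable for a tracing or logging path.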

Conclusion

A final static boolean with a constant expression is completely free. Javac performs the constant propagation and dead code elimination, so there is no price to pay at runtime.

A final static boolean with a non-constant expression is fully optimized by the JIT. However, inlining can still be prevented, because the original bytecode size determines whether a method gets inlined, not what the JIT makes of it.

2 comments:

  1. If I understood your point correctly, you can divide your "decider" method ('doMath' in your case) into two: one for ENABLED=true (let's call it 'doMathEnabled') and one for ENABLED=false ('doMathDisabled').

    Thus, when ENABLED=true, 'doMath' and 'doMathEnabled' will be inlined without being affected by the size of 'doMathDisabled'.

  2. Thank you for your reply.

    Good point, it should allow the inlining of the 'doMath' methods.

    However, now the calling method could itself be prevented from being inlined into another method. Effectively the problem has been moved.

    Another problem is that you don't want to litter the caller code with if (enabled) then this else that. If the 'doMath' method is used in more than one place, you need to solve the problem more than once.

