So is there any performance difference between the following 2 snippets:
int a = triple.getA();
int b = triple.getB();
int c = triple.getC();
return a + b + c;

and:
return triple.getA() + triple.getB() + triple.getC();

The JIT is pretty clever. If there is any performance gain to be had by removing temporary assignments to local variables, it will happily get rid of them. However, there is one caveat: the inlining threshold for frequently executed methods depends on the bytecode size of the method, that is, the original size of the bytecode. So if you make the bytecode of a method larger than needed, it can prevent inlining. And inlining is one of the most crucial optimizations, because it enables further optimizations, as we'll see.
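If you want to check the threshold and the actual inlining decisions on your own JVM, HotSpot exposes a couple of diagnostic flags. A minimal sketch (the application to run is left as a placeholder):

# Show the default threshold for inlining frequently executed methods
java -XX:+PrintFlagsFinal -version | grep FreqInlineSize

# Print the JIT's inlining decisions while your application runs
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining <your-application>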
Benchmark
The following JMH benchmark will determine if there is any performance impact.

package org.sample;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.CompilerControl;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OperationsPerInvocation;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

import java.util.concurrent.TimeUnit;

import static org.openjdk.jmh.annotations.CompilerControl.Mode.DONT_INLINE;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 1)
@Measurement(iterations = 1)
@Fork(warmups = 1, value = 1)
@OperationsPerInvocation(InlineBenchmark.OPERATIONS)
public class InlineBenchmark {

    public static final int OPERATIONS = 1000 * 1000;

    @State(Scope.Benchmark)
    static public class Triple {
        int a;
        int b;
        int c;

        int getA() { return a; }

        int getB() { return b; }

        int getC() { return c; }
    }

    static int large(Triple triple) {
        int a = triple.getA();
        int b = triple.getB();
        int c = triple.getC();
        return a + b + c;
    }

    static int small(Triple triple) {
        return triple.getA() + triple.getB() + triple.getC();
    }

    @CompilerControl(DONT_INLINE)
    @Benchmark
    @Fork(jvmArgs = "-XX:FreqInlineSize=20")
    public int benchmark_small(Triple triple) {
        int v = 0;
        for (int k = 0; k < OPERATIONS; k++) {
            v = small(triple);
        }
        return v;
    }

    @CompilerControl(DONT_INLINE)
    @Benchmark
    @Fork(jvmArgs = "-XX:FreqInlineSize=20")
    public long benchmark_large_with_low_inline_size(Triple triple) {
        int v = 0;
        for (int k = 0; k < OPERATIONS; k++) {
            v = large(triple);
        }
        return v;
    }

    @CompilerControl(DONT_INLINE)
    @Benchmark
    @Fork(jvmArgs = "-XX:FreqInlineSize=21")
    public long benchmark_large_with_high_inline_size(Triple triple) {
        int v = 0;
        for (int k = 0; k < OPERATIONS; k++) {
            v = large(triple);
        }
        return v;
    }
}

There are 2 important methods. The 'small' method that does not use temporary local variables:
static int small(Triple triple) {
    return triple.getA() + triple.getB() + triple.getC();
}

And the 'large' method that does use temporary local variables:
static int large(Triple triple) {
    int a = triple.getA();
    int b = triple.getB();
    int c = triple.getC();
    return a + b + c;
}
There are 3 benchmark methods:
- benchmark_small: this calls the 'small' method in a loop. It is configured with the same inline size as benchmark_large_with_low_inline_size.
- benchmark_large_with_low_inline_size: this calls the 'large' method in a loop. The inline size has been set just below the bytecode size of the method, which prevents inlining of the 'large' method.
- benchmark_large_with_high_inline_size: this calls the 'large' method in a loop. The inline size has been set to the bytecode size of the method, which allows the 'large' method to be inlined.
The bytecode of both methods (as shown by javap -c) makes the size difference clear:

static int large(org.sample.InlineBenchmark$Triple);
  Code:
     0: aload_0
     1: invokevirtual #2  // Method org/sample/InlineBenchmark$Triple.getA:()I
     4: istore_1
     5: aload_0
     6: invokevirtual #3  // Method org/sample/InlineBenchmark$Triple.getB:()I
     9: istore_2
    10: aload_0
    11: invokevirtual #4  // Method org/sample/InlineBenchmark$Triple.getC:()I
    14: istore_3
    15: iload_1
    16: iload_2
    17: iadd
    18: iload_3
    19: iadd
    20: ireturn

static int small(org.sample.InlineBenchmark$Triple);
  Code:
     0: aload_0
     1: invokevirtual #2  // Method org/sample/InlineBenchmark$Triple.getA:()I
     4: aload_0
     5: invokevirtual #3  // Method org/sample/InlineBenchmark$Triple.getB:()I
     8: iadd
     9: aload_0
    10: invokevirtual #4  // Method org/sample/InlineBenchmark$Triple.getC:()I
    13: iadd
    14: ireturn

So there is definitely a difference, because javac does not optimize the code (that is the task of the JIT): the 'large' method is 21 bytes of bytecode, while the 'small' method is only 15.
In the above benchmark, benchmark_large_with_high_inline_size has been configured with -XX:FreqInlineSize=21, which allows the 21-byte 'large' method to be inlined, while benchmark_large_with_low_inline_size has been configured with -XX:FreqInlineSize=20, which prevents the 'large' method from being inlined.
Results:
Benchmark                                               Mode  Cnt   Score   Error  Units
InlineBenchmark.benchmark_large_with_high_inline_size  avgt       ≈ 10⁻⁶          ns/op
InlineBenchmark.benchmark_large_with_low_inline_size   avgt        1.306          ns/op
InlineBenchmark.benchmark_small                         avgt       ≈ 10⁻⁶          ns/op

As can be seen, benchmark_large_with_high_inline_size is a lot faster than benchmark_large_with_low_inline_size. We can also see that the 'small' method could still be inlined with FreqInlineSize=20, where the 'large' method could not.
The reason why benchmark_large_with_high_inline_size and benchmark_small are so fast is inlining. The JIT could see what is happening inside the method called in the loop, and this provided the opportunity for further optimization. In this case there is no reason to execute the logic of the inlined method 1M times, since the result is the same for every iteration; executing it once is sufficient.
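Conceptually, after inlining the JIT can reduce the benchmark loop to something like the following. This is only an illustrative sketch of the effect (the method name is made up), not actual JIT output:

// Sketch: what the loop effectively reduces to once 'large' and the getters
// are inlined. The loop body has no side effects and produces the same value
// every iteration, so only one evaluation is needed.
static int benchmark_large_inlined(Triple triple) {
    int v = triple.a + triple.b + triple.c; // loop-invariant result
    return v;                               // the 1M-iteration loop is gone
}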
Conclusions
Temporarily assigning values to local variables can reduce performance when it pushes the bytecode size of the method over the inlining threshold. If the threshold isn't exceeded, there doesn't seem to be a performance impact. This is not an excuse to remove all temporary assignments to local variables, because local variables can increase clarity. Only when you have determined that a method is a bottleneck should you optimize it; otherwise, clean code is preferred over premature optimization.