- Leading Netty effort at Red Hat
- Vert.x / NIO / Performance
- Author of Netty in Action
- @normanmaurer
- github.com/normanmaurer
https://www.flickr.com/photos/st3f4n/3805014022/
May be hard for the JIT to unroll
int i = 0;
for (;;) {
if (array.length == i) {
break;
}
doSomething(array[i++]);
}
array.length == 4
Easy to unroll
int i = 0;
for (int i = 0; i < array.length; i++) {
doSomething(array[i]);
}
unrolled loop
doSomething(array[0]);
doSomething(array[1]);
doSomething(array[2]);
doSomething(array[3]);
Limit for inline hot methods is 325 byte operations per default… |
Before inlined
public void methodA() {
... // Do some work A (1)
methodB();
}
private void methodB() {
... // Do some more work B (2)
}
methodB is inlined into methodA
public void methodA() {
... // Do some work A (3)
... // Do some more work B (4)
}
Record inline tasks
java \
-XX:+PrintCompilation \ (1)
-XX:+UnlockDiagnosticVMOptions \ (2)
-XX:+PrintInlining \ (3)
.... > inline.log
1 | Prints out when JIT compilation happens |
2 | Is needed to use flags like -XX:+PrintInlining |
3 | Prints what methods were inlined |
inline.log content example
@ 42 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe::read (191 bytes) inline (hot) (1)
@ 42 io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe::read (327 bytes) hot method too big (2)
@ 4 io.netty.channel.socket.nio.NioSocketChannel::config (5 bytes) inline (hot)
@ 1 io.netty.channel.socket.nio.NioSocketChannel::config (5 bytes) inline (hot)
@ 12 io.netty.channel.AbstractChannel::pipeline (5 bytes) inline (hot)
1 | Method is hot and was inlined |
2 | Method is hot and was too big for inlining :( |
You want to have your hot methods inlined, for max performance when possible! |
Original code which was cause of too big for inlining
private final class NioMessageUnsafe extends AbstractNioUnsafe {
public void read() {
final SelectionKey key = selectionKey();
if (!config().isAutoRead()) {
int interestOps = key.interestOps();
if ((interestOps & readInterestOp) != 0) {
// only remove readInterestOp if needed
key.interestOps(interestOps & ~readInterestOp);
}
}
... // rest of the method
}
...
}
Factor out less likely branch out in new method
private final class NioMessageUnsafe extends AbstractNioUnsafe {
private void removeReadOp() {
SelectionKey key = selectionKey();
int interestOps = key.interestOps();
if ((interestOps & readInterestOp) != 0) {
// only remove readInterestOp if needed
key.interestOps(interestOps & ~readInterestOp);
}
}
public void read() {
if (!config().isAutoRead()) {
removeReadOp();
}
... // rest of the method
}
...
}
inline.log content after change
@ 42 io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe::read (288 bytes) inline (hot) (1)
1 | Method is now small enough for inlining :) |
https://www.flickr.com/photos/brears/2685223351
http://www.cliffc.org/blog/2007/11/06/clone-or-not-clone/
You can inline by hand |
https://www.flickr.com/photos/tracheotomy_bob/5430166369
Only do this if you really need the last percent of performance. This can result in a maintainance nightmare! |
https://www.flickr.com/photos/stavos52093/13807616194/
Ouch :(
public class YourClass {
@Override
public void finalize() {
// cleanup resources and destroy them
}
}
Better :)
public class YourClass {
public void destroy() {
// cleanup resources and destroy them
}
}
Overriding finalize() is 99% of the time a very bad idea! |
https://www.flickr.com/photos/counteragent/2190576349
https://www.flickr.com/photos/tracheotomy_bob/5430166369
Also pre-fetch works out quite well for array based datastructures, also cache lines etc. |
Roll your own linked-list datastructure
public class Entry {
Entry next; (1)
Entry prev;
public void methodA() {}
}
public class EntryList {
Entry head; (2)
Entry tail;
public void add(Entry entry) {}
}
1 | Entry itself can be used as node |
2 | head and tail of the list |
Each Entry can hold be added to one EntryList. |
https://www.flickr.com/photos/ryanr/142455033/
There is @Contended in Java8 but needs to be enabled via switch and not allow to pack multiple objects in one cache-line. This is part of JEP-142 |
The following code snippets and diagram is based on Martin Thompsons blog posts False Sharing and False Sharing && Java 7. |
https://twitter.com/normanmaurer/status/465821479012401152
Credit where credit is due. |
False-sharing Test
public final class FalseSharing implements Runnable {
private static AtomicLong[] longs = new AtomicLong[NUM_THREADS];
....
public static void runTest() throws InterruptedException {
Thread[] threads = new Thread[NUM_THREADS];
for (int i = 0; i < threads.length; i++)
threads[i] = new Thread(new FalseSharing(i));
for (Thread t : threads) { t.start(); }
for (Thread t : threads) { t.join(); }
}
public void run() {
long i = ITERATIONS + 1;
while (0 != --i) { longs[arrayIndex].set(i); }
}
}
AtomicLong without padding
public final class FalseSharing implements Runnable {
private static AtomicLong[] longs = new AtomicLong[NUM_THREADS];
static {
for (int i = 0; i < longs.length; i++) {
longs[i] = new AtomicLong();
}
}
...
}
AtomicLong with padding
public final class FalseSharing implements Runnable {
private static PaddedAtomicLong[] longs = new PaddedAtomicLong[NUM_THREADS];
static {
for (int i = 0; i < longs.length; i++) {
longs[i] = new PaddedAtomicLong();
}
}
...
public static long sumPaddingToPreventOptimisation(final int index) {
PaddedAtomicLong v = longs[index];
return v.p1 + v.p2 + v.p3 + v.p4 + v.p5 + v.p6;
}
public static class PaddedAtomicLong extends AtomicLong {
public volatile long p1, p2, p3, p4, p5, p6 = 7L;
}
}
http://mechanical-sympathy.blogspot.de/2011/07/false-sharing.html
public static Unsafe getUnsafe() {
Class cc = sun.reflect.Reflection.getCallerClass(2);
if (cc.getClassLoader() != null)
throw new SecurityException("Unsafe");
return theUnsafe;
}
https://www.flickr.com/photos/oldpatterns/6100020538
Can do a lot of things faster then abstractions because of eliminate range checks and allow to access stuff directly via memory addresses etc. |
Caution: @lagergren may hunt you down ;)
http://zeroturnaround.com/rebellabs/dangerous-code-how-to-be-unsafe-with-java-classes-objects-in-memory/
SPSC ⇒ Single Producer Single Consumer |
SPMC ⇒ Single Producer Multi Consumer |
MPSC ⇒ Multi Producer Single Consumer |
MPMC ⇒ Multi Producer Multi Consumer |
Usage pattern is imprtant! |
All concurrent queues in the JDK are MPMC |
https://www.flickr.com/photos/kirisryche/4350193154
JAQ-InABox provides some nice alternatives. Say thanks to @nitsanw. |
Cheap volatile writes, we all love cheap when it comes to performance! |
https://www.flickr.com/photos/vmabney/9431857312
Only works for Single Writers a.k.a Single Producers. More details on Atomic*.lazySet is a performance win for single writers post by @nitsanw. |
Slow nextIdx(…) method based on modulo operation
private final Object[] array = new Object[128];
private int nextIdx(int index) {
return (index + 1) % array.length;
}
Fast nextIdx(…) method based on bitwise operation
private final Object[] array = new Object[128];
private final int mask = array.length - 1;
private int nextIdx(int index) {
return (index + 1) & mask;
}
Bitwise operations are most of the times faster then Arithmetic. |
Only works for Single Writers a.k.a Single Producers. More details on Atomic*.lazySet is a performance win for single writers post by @nitsanw. |
https://www.flickr.com/photos/dolmansaxlil/5347064183
Slides generated with Asciidoctor and DZSlides backend |
Original slide template - Dan Allen & Sarah White |
All pictures licensed with Creative Commons Attribution orCreative Commons Attribution-Share Alike |