High Performance Networking on the JVM - Lessons learned

Norman Maurer, Principal Software Engineer @ Red Hat Inc

  • Leading Netty effort at Red Hat
  • Vert.x / NIO / Performance
  • Author of Netty in Action
  • @normanmaurer
  • github.com/normanmaurer

General

Blocking-IO ?

For high performance with many concurrent connections, you WANT to use NIO or NIO.2!
lazy
https://www.flickr.com/photos/theleetgeeks/3110958031

What is the issue with blocking-IO ?

Socket options general…

Socket options that are often the most interesting

There are others, but some are not exposed by Java NIO itself.
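A minimal sketch of setting socket options through the NIO.2 `setOption` API (Java 7+). Which options the slide listed is not preserved here, so the selection below and the 64 KiB buffer sizes are assumptions for illustration, not recommendations.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

final class SocketOptionsSketch {
    /** Opens an unconnected, non-blocking channel and applies a few commonly tuned options. */
    static SocketChannel openTuned() throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        // Disable Nagle's algorithm so small writes are not delayed.
        ch.setOption(StandardSocketOptions.TCP_NODELAY, true);
        ch.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        ch.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        // Buffer sizes are hints only; the OS may round or cap them.
        ch.setOption(StandardSocketOptions.SO_RCVBUF, 64 * 1024);
        ch.setOption(StandardSocketOptions.SO_SNDBUF, 64 * 1024);
        return ch;
    }
}
```

Reading a value back with `getOption(...)` shows what the OS actually applied, which matters most for the buffer sizes.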

GC-Pressure - Run collector, run…

gc pressure
http://25.media.tumblr.com/tumblr_me2eq0PnBx1rtu0cpo1_1280.jpg
Every time I hear that allocation / deallocation of objects is a no-brainer, a kitten dies!

Solve (partial) GC-Pressure

But only cache / pool where it really makes sense, as long-living objects can hurt the GC too.
Rule of thumb: use static if it’s immutable and used often. If it’s mutable, only pool / cache if allocation costs are high!

GC-Pressure - For real ?

If you have never seen GC-Pressure, you have not pushed your system hard enough


Famous words of myself

Source of GC-Pressure

Bad
channelIdle(ctx, new IdleStateEvent(
    IdleState.READER_IDLE, readerIdleCount ++, currentTime - lastReadTime));
Good
channelIdle(ctx, IdleStateEvent.READER_IDLE_EVENT);
See Netty issue #973 for more details.

Garbage-Collector matters

Stop the world == worst enemy!

Buffers - allocate and deallocate

But zeroing out the byte[] of heap buffers is not free either…

Buffers - Memory fragmentation

memory
Can't insert int here as we need 4 slots :(
A good pool will handle this!

Pool buffers for rescue…

Pooling pays off for direct and heap buffers!
pooled buffers
https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead
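To make the pooling idea concrete, here is a deliberately naive sketch. `SimpleBufferPool` is a hypothetical class, it is not thread-safe (think one pool per I/O thread), and it only handles a single buffer size. Production pools such as Netty's pooled allocator are far more elaborate.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical single-size pool of direct buffers; not thread-safe. */
final class SimpleBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    SimpleBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Reuses a pooled buffer if one is available, otherwise allocates a new one. */
    ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
    }

    /** Returns a buffer to the pool; foreign or surplus buffers are left to the GC. */
    void release(ByteBuffer buf) {
        if (buf.capacity() != bufferSize || free.size() >= maxPooled) {
            return;
        }
        buf.clear(); // resets position and limit; the content is not zeroed
        free.addFirst(buf);
    }
}
```

The cap on pooled buffers is there because an unbounded pool just moves the memory problem from the collector to the pool.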

Read / Write the right way

gathering scattering
Especially useful if "message" is assembled out of header and payload.
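A small sketch of the header-plus-payload case using `GatheringByteChannel.write(ByteBuffer[])` and `ScatteringByteChannel.read(ByteBuffer[])`. The class and method names are made up for illustration, and the loops assume a blocking channel; with a non-blocking channel you would return and wait for readiness instead of spinning.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.nio.channels.ScatteringByteChannel;

final class GatherScatterSketch {
    /** Writes header and payload without first merging them into one buffer. */
    static void writeMessage(GatheringByteChannel ch, ByteBuffer header, ByteBuffer payload)
            throws IOException {
        ByteBuffer[] parts = {header, payload};
        while (header.hasRemaining() || payload.hasRemaining()) {
            ch.write(parts); // a single call covers both buffers
        }
    }

    /** Fills the header buffer first, then the payload buffer. */
    static void readMessage(ScatteringByteChannel ch, ByteBuffer header, ByteBuffer payload)
            throws IOException {
        ByteBuffer[] parts = {header, payload};
        while (header.hasRemaining() || payload.hasRemaining()) {
            if (ch.read(parts) < 0) {
                throw new EOFException("channel closed mid-message");
            }
        }
    }
}
```

`SocketChannel` implements both interfaces, so the same calls work on sockets.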

Direct buffers for sockets

Use direct buffers for operations on SocketChannel

Why?


Confused developer
Internally OpenJDK/Oracle JDK will copy the buffer to a direct buffer if you use a heap buffer
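A sketch of the usual pattern: allocate one direct buffer up front and reuse it for every read, so the JDK does not need a temporary direct copy per call. `DirectReadSketch` and the 8 KiB size are assumptions for illustration; the benefit applies to real socket channels, not to every `ReadableByteChannel`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class DirectReadSketch {
    // Allocated once and reused; direct buffers are comparatively costly to create.
    private final ByteBuffer in = ByteBuffer.allocateDirect(8 * 1024);

    /**
     * Performs one read into the reusable direct buffer and returns it flipped,
     * ready for consumption. The buffer is empty if nothing was read or at EOF.
     */
    ByteBuffer readOnce(ReadableByteChannel ch) throws IOException {
        in.clear();
        ch.read(in);
        in.flip();
        return in;
    }
}
```

The returned buffer is only valid until the next `readOnce` call, so the caller must consume or copy what it needs before reading again.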

Syscalls - Huh why should I care?

Syscalls are expensive, use them with care.
Many methods on the SocketChannel map directly to a syscall.
syscall
https://www.flickr.com/photos/theshadowknows/2995004692

Memory copies - Not free either

Use ByteBuffer.slice() and ByteBuffer.duplicate() whenever possible.
copy
https://www.flickr.com/photos/pasukaru76/4350792315
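For illustration, a helper that hands out a window onto an existing buffer without copying any bytes. The helper name `view` is made up; `duplicate()` and `slice()` are the standard `ByteBuffer` methods.

```java
import java.nio.ByteBuffer;

final class BufferViewsSketch {
    /** Returns a view of buf[offset, offset + length) that shares memory with buf. */
    static ByteBuffer view(ByteBuffer buf, int offset, int length) {
        // duplicate(): same content, but independent position and limit,
        // so the caller's buffer state is left untouched.
        ByteBuffer dup = buf.duplicate();
        dup.limit(offset + length);
        dup.position(offset);
        // slice(): index 0 of the result corresponds to 'offset' in buf.
        return dup.slice();
    }
}
```

Because the memory is shared, a write through the view is visible in the original buffer and vice versa; that is the price of avoiding the copy.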

Zero-Memory-Copy - FileChannel

Only possible if you do not need to transform the data during transfer!
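A sketch of sending a file with `FileChannel.transferTo(...)`. The class name is hypothetical, and the loop assumes a blocking target; whether the transfer really avoids copies depends on the OS and the target channel type.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ZeroCopySketch {
    /** Sends the whole file to target without reading it into a Java buffer first. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so keep going.
                // With a non-blocking target this loop would spin; wait for
                // OP_WRITE there instead.
                pos += src.transferTo(pos, size - pos, target);
            }
            return pos;
        }
    }
}
```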

Back-pressure - otherwise fun with OOM!

interestOps(…) == queue on network stack

back_pressure

http://memecrunch.com/meme/270NI/sparta-bird/image.png
But do not call interestOps(…) too often, it’s expensive. See #1024
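One way to read the slide: when too many outbound bytes are queued, drop OP_READ so incoming data stays in the kernel buffers and TCP flow control slows the peer down. The sketch below is an assumption about how that could look; `BackPressureSketch` and its water marks are hypothetical names, and it only computes the desired interest set so the caller can skip the expensive `interestOps(…)` call when nothing changed.

```java
import java.nio.channels.SelectionKey;

/** Hypothetical helper: decides when to stop reading based on queued outbound bytes. */
final class BackPressureSketch {
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean readSuspended;

    BackPressureSketch(long lowWaterMark, long highWaterMark) {
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /** Call when bytes are queued for writing; returns the interest ops to use. */
    int onQueued(long bytes, int currentOps) {
        pendingBytes += bytes;
        if (!readSuspended && pendingBytes > highWaterMark) {
            readSuspended = true;
            return currentOps & ~SelectionKey.OP_READ;
        }
        return currentOps;
    }

    /** Call when queued bytes reached the socket; returns the interest ops to use. */
    int onFlushed(long bytes, int currentOps) {
        pendingBytes -= bytes;
        if (readSuspended && pendingBytes < lowWaterMark) {
            readSuspended = false;
            return currentOps | SelectionKey.OP_READ;
        }
        return currentOps;
    }
}
```

A caller would read `int ops = key.interestOps()` once, pass it in, and call `key.interestOps(next)` only if the returned value differs from `ops`.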

OP_WRITE

Remember: most of the time the Channel is writable.
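Since the channel is usually writable, a common approach is to write right away and register OP_WRITE only when the write could not complete. A sketch under that assumption (the class name is hypothetical, and the channel is expected to be non-blocking):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;

final class OptimisticWriteSketch {
    /**
     * Tries to write immediately and registers OP_WRITE only if data is left over.
     * Returns true if the buffer was written completely.
     */
    static boolean write(WritableByteChannel ch, SelectionKey key, ByteBuffer buf)
            throws IOException {
        ch.write(buf); // non-blocking: may write fewer bytes than remaining
        int ops = key.interestOps();
        if (buf.hasRemaining()) {
            // Send buffer is full: ask the selector to tell us when there is room.
            if ((ops & SelectionKey.OP_WRITE) == 0) {
                key.interestOps(ops | SelectionKey.OP_WRITE);
            }
            return false;
        }
        // Everything was written: make sure OP_WRITE is not left registered.
        if ((ops & SelectionKey.OP_WRITE) != 0) {
            key.interestOps(ops & ~SelectionKey.OP_WRITE);
        }
        return true;
    }
}
```

Leaving OP_WRITE registered permanently makes `select()` return immediately almost every time, which turns the event loop into a busy loop.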

Don’t block the IO Thread

DNS lookup will block :(
InetSocketAddress remote = ...
remote.getAddress().getHostName();
Logging can be a culprit too … Async logging may be the key.
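A sketch of moving potentially blocking calls off the IO thread. `OffloadSketch` and the pool size of 2 are assumptions for illustration, and it uses `CompletableFuture`, so it needs Java 8 rather than Java 7.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OffloadSketch {
    // Separate pool for calls that may block (DNS, file access, synchronous logging).
    private final ExecutorService blockingPool = Executors.newFixedThreadPool(2);

    /** Runs the task on the blocking pool and exposes the outcome as a future. */
    <T> CompletableFuture<T> offload(Callable<T> task) {
        CompletableFuture<T> result = new CompletableFuture<>();
        blockingPool.execute(() -> {
            try {
                result.complete(task.call());
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        });
        return result;
    }

    void shutdown() {
        blockingPool.shutdown();
    }
}
```

Usage could look like `offload(() -> remote.getAddress().getHostName())`. Callbacks attached to the future may run on the pool thread, so a real event loop would hand the result back to the IO thread before touching channel state. If only a printable name is needed, `InetSocketAddress.getHostString()` (Java 7+) avoids the reverse lookup altogether.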

Blocking in action

blocking

RED == BAD

SelectionKey

… Whether or not it blocks, and for how long, is implementation-dependent…


Javadocs of SelectionKey

SelectionKey - Huh ?

homerfacepalm

Shit just got real...

SelectionKey usage optimized

Bad
public void suspendRead() {
  if ((key.interestOps() & OP_READ) != 0) {
    key.interestOps(key.interestOps() & ~OP_READ);
  }
}
Good
int ops = key.interestOps();
if ((ops & OP_READ) != 0) {
  key.interestOps(ops & ~OP_READ);
}

Be memory efficient

When writing a system that handles 100k concurrent connections, every byte saved on long-living objects counts


Hint of myself

Atomic*FieldUpdater helps

Ugly but helps
private static final AtomicLongFieldUpdater<TheDeclaringClass> ATOMIC_UPDATER =
        AtomicLongFieldUpdater.newUpdater(TheDeclaringClass.class, "atomic");

private volatile long atomic;

public void yourMethod() {
    ATOMIC_UPDATER.compareAndSet(this, 0, 1);
}
More details on the topic in my blog post Lesser-known-concurrent-classes-Part-1

Datastructures / algorithms matter

Volatile

Volatile access - Optimized

Bad
private volatile Selector selector;

public void method() {
  selector.select();
  ....
  selector.selectNow();
}
Good
private volatile Selector selector;

public void method() {
  Selector selector = this.selector;
  selector.select();
  ....
  selector.selectNow();
}

Use Java7 or newer

Allocation / Deallocation of ByteBuffers is a lot faster these days

Well defined Thread-Model

Choose correct protocol

There is always a trade-off.

Pipelining is awesome

If you write a custom protocol, think about pipelining.

Prefer Binary protocol over Text protocol

binary

https://www.flickr.com/photos/chiselwright/5169469959/
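One reason binary framing tends to be cheaper to parse: a length prefix tells the decoder exactly how many bytes to wait for, with no scanning for delimiters. The frame layout below (4-byte length, 1-byte type, payload) is a made-up example, not a protocol from the talk.

```java
import java.nio.ByteBuffer;

/** Hypothetical frame layout: 4-byte length, 1-byte type, then the payload. */
final class BinaryFrameSketch {
    static ByteBuffer encode(byte type, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + payload.length);
        buf.putInt(1 + payload.length); // length of everything after this field
        buf.put(type);
        buf.put(payload);
        buf.flip();
        return buf;
    }

    /** Returns the payload of the next frame, or null if the frame is still incomplete. */
    static byte[] decode(ByteBuffer in) {
        if (in.remaining() < 4) {
            return null;
        }
        int length = in.getInt(in.position()); // peek, nothing consumed yet
        if (length < 1) {
            throw new IllegalArgumentException("bad frame length: " + length);
        }
        if (in.remaining() - 4 < length) {
            return null; // wait for more bytes; position is unchanged
        }
        in.getInt(); // consume the length field
        in.get();    // consume the type (ignored in this sketch)
        byte[] payload = new byte[length - 1];
        in.get(payload);
        return payload;
    }
}
```

A real decoder would also cap the accepted frame length so a bogus prefix cannot force a huge allocation.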

Too hard ? There are abstractions.

Want to know more about performance?

Attend my other talk later today!

win

http://memegenerator.net/instance/43005548....

[.topic.source]
== References

NOTE: Slides generated with Asciidoctor and DZSlides backend

NOTE: Original slide template - Dan Allen & Sarah White

NOTE: All pictures licensed with `Creative Commons Attribution` or +
`Creative Commons Attribution-Share Alike`

[.topic.ending, hrole="name"]
== Norman Maurer
