High performance networking on the JVM

Lessons Learned

Norman Maurer

  • Red Hat (JBoss) - EAP Core Team
  • Former contractor for Apple Inc
  • Author - Netty in Action
  • Apache Software Foundation Member
  • Netty  / Vert.x Core-Developer and all things NIO
  • Java and Scala
  • Twitter: @normanmaurer
  • Github: https://github.com/normanmaurer

    General

    • As always, only optimize if you really need to!
    • 1000 concurrent connections != high-scale
    • If you only need to handle a few hundred connections, use blocking IO!
    • Use a profiler to find issues instead of guessing...
    • Always test before and after changes, and don't forget to warm up!

    What you want


    For high performance with many concurrent connections you WANT to use NIO or NIO.2

    What you do NOT want

    Create one Thread per connection and let the OS try to deal with thousands of threads



    If you do want that... good luck ;)

    Socket Options General

    • Some socket options can have a big impact
    • That impact can be good or bad
    • Only touch them if you know what they do

    Most interesting Socket Options

    • TCP_NODELAY
    • SO_SNDBUF
    • SO_RCVBUF
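
    A minimal sketch of setting these three options on an NIO SocketChannel, assuming Java 7 or later (StandardSocketOptions). The class name SocketOptionsSketch and the 64 KiB buffer sizes are illustrative assumptions, not recommendations; measure before and after changing them.

```java
import java.io.IOException;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;

final class SocketOptionsSketch {
    // Opens an unconnected channel and applies the three options named above.
    // The 64 KiB sizes are arbitrary illustrative values.
    static SocketChannel openConfigured() throws IOException {
        SocketChannel ch = SocketChannel.open();
        // Disable Nagle's algorithm so small writes are sent without delay.
        ch.setOption(StandardSocketOptions.TCP_NODELAY, true);
        // Size hints for the kernel send / receive buffers; the OS may adjust them.
        ch.setOption(StandardSocketOptions.SO_SNDBUF, 64 * 1024);
        ch.setOption(StandardSocketOptions.SO_RCVBUF, 64 * 1024);
        return ch;
    }
}
```

    Reading the values back with getOption(...) shows what the OS actually applied, which may differ from what was requested.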

    GC - Pressure

    Allocate / Deallocate the shit out of it!

    Solve GC-Pressure 

    • Try to minimize allocation / deallocation of objects
    • Use static instances wherever possible
    • Ask yourself: do I really need to create this instance?
    • BUT only cache / pool where it makes sense, as long-living objects may have a bad impact on GC as well


    Rule of thumb: use static if it's immutable and used often. If it's mutable, only pool / cache if allocation costs are high!

    GC-Pressure

    Every time I hear "allocation / deallocation of objects is a no-brainer", a kitten dies!

    GC-Pressure

    But I never had GC-Pressure ....




    Well, you have not pushed your system hard enough!

    Source of GC-Pressure in Action

     https://github.com/netty/netty/issues/973

    BAD
    channelIdle(ctx, new IdleStateEvent(IdleState.READER_IDLE,
            readerIdleCount++, currentTime - lastReadTime));


    BETTER!
    channelIdle(ctx, IdleStateEvent.READER_IDLE_EVENT);
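
    The same idea outside Netty, as a minimal sketch: a hypothetical immutable IdleEvent class (not Netty's IdleStateEvent) that hands out one shared instance per kind instead of allocating on every timeout. The trade-off is that a shared event can no longer carry per-call data such as a counter or a duration.

```java
final class IdleEvent {
    enum Kind { READER_IDLE, WRITER_IDLE }

    // One shared instance per kind; safe to share because the object is immutable.
    static final IdleEvent READER_IDLE_EVENT = new IdleEvent(Kind.READER_IDLE);
    static final IdleEvent WRITER_IDLE_EVENT = new IdleEvent(Kind.WRITER_IDLE);

    private final Kind kind;

    private IdleEvent(Kind kind) {
        this.kind = kind;
    }

    Kind kind() {
        return kind;
    }

    // Returns the cached instance; nothing is allocated on this path.
    static IdleEvent of(Kind kind) {
        return kind == Kind.READER_IDLE ? READER_IDLE_EVENT : WRITER_IDLE_EVENT;
    }
}
```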

    Garbage-Collector matters

    • The Garbage-Collector really matters
    • Use a CMS-based collector or G1 if you want high throughput
    • Size different areas depending on your application / access pattern

    Stop-the-world GC is your worst enemy if you want to push data hard

    Garbage Collector 

    • Tuning the GC is kind of a "black art"
    • Be sure you understand what you are doing
    • GC tuning parameters differ per application

    Buffers

    • Allocating / deallocating direct buffers is expensive
    • Allocating / deallocating heap buffers is cheap

    Freeing the memory of direct buffers is expensive
    Unfortunately, zeroing out the byte array of heap buffers is not free either

    BufferPooling to the rescue

    • Pool buffers if you need to create a lot of them
    • This is especially true for direct buffers
    • There is also Unsafe... but it's "unsafe" ;)
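
    A minimal sketch of pooling fixed-size direct buffers. DirectBufferPool and its constructor parameters are hypothetical names for illustration; production pools are considerably more elaborate (size classes, per-thread caches, leak detection).

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    DirectBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    // Reuses a pooled buffer if one is available, otherwise allocates a new direct buffer.
    synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
    }

    // Resets the buffer and keeps it for reuse; surplus or foreign buffers are left to the GC.
    synchronized void release(ByteBuffer buf) {
        if (buf.isDirect() && buf.capacity() == bufferSize && free.size() < maxPooled) {
            buf.clear();
            free.addFirst(buf);
        }
    }
}
```

    The cap on pooled buffers reflects the earlier warning: long-living cached objects have a cost too, so the pool should not grow without bound.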

    Memory fragmentation

    • Memory fragmentation is bad, as you will waste memory
    • The GC has to run more often to remove the fragmentation


    Can't insert an int here as we need 4 contiguous slots!

    Gathering writes / Scattering Reads

    • Use Gathering writes / Scattering reads
    • Especially useful for protocols that can be assembled out of multiple buffers


    IMPORTANT: Gathering writes only work without a memory leak since Java 7 and late Java 6 releases.
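
    A minimal sketch of a gathering write: a header and a body buffer go out through one write(ByteBuffer[]) call instead of being copied into a single buffer first. GatheringWriteSketch and writeMessage are hypothetical names; the loop assumes a blocking channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

final class GatheringWriteSketch {
    // Writes header and body without merging them into one buffer.
    // A single write(...) may be partial, so loop until both buffers are drained.
    // With a non-blocking channel you would register for OP_WRITE instead of spinning here.
    static long writeMessage(GatheringByteChannel ch, ByteBuffer header, ByteBuffer body)
            throws IOException {
        ByteBuffer[] parts = { header, body };
        long written = 0;
        while (header.hasRemaining() || body.hasRemaining()) {
            written += ch.write(parts);
        }
        return written;
    }
}
```

    SocketChannel, FileChannel and DatagramChannel all implement GatheringByteChannel, so the same method works for each of them.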

    Use direct buffers for sockets

    • Use direct buffers when you do operations on sockets

    WHY?

    Internally the JDK* will copy the buffer content to a direct buffer if you do not use one

    Minimize syscalls


    Only call Channel.write(...) / Channel.read(...) if you really need to!


    • Also true for other operations that directly hit the OS
    • Batch up things, but as always there is a tradeoff.

    Memory copies are NOT free

    ByteBuffer exposes operations like slice() and duplicate() for a good reason


    USE THEM!
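
    A minimal sketch of handing out part of a buffer without copying it, using only duplicate() and slice(). BufferViewSketch and view are hypothetical names.

```java
import java.nio.ByteBuffer;

final class BufferViewSketch {
    // Returns a view of buf[offset, offset + length) that shares the underlying memory.
    // The source buffer's own position and limit are left untouched.
    static ByteBuffer view(ByteBuffer buf, int offset, int length) {
        ByteBuffer dup = buf.duplicate();   // same content, independent position / limit
        dup.limit(offset + length);         // set the limit first so the position below is valid
        dup.position(offset);
        return dup.slice();                 // index 0 of the slice is index 'offset' of buf
    }
}
```

    Because the memory is shared, a write through the view is visible in the source buffer and vice versa; that is the price of not copying.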

    Zero-memory-copy a.k.a. FileChannel


    • Many operating systems support it
    • Helps to write file content to a Channel in an efficient way

    Only possible if you do not need to transform the data during transfer!
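
    A minimal sketch using FileChannel.transferTo(...), the JDK entry point for this. Whether the transfer is really zero-copy depends on the OS and on the target channel type. FileTransferSketch and sendFile are hypothetical names, and the loop assumes a blocking target.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class FileTransferSketch {
    // Sends the whole file to the target channel and returns the number of bytes sent.
    // transferTo may move fewer bytes than requested, so loop until everything is out.
    // With a non-blocking target a return value of 0 means "try again later", not "done".
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = src.size();
            long pos = 0;
            while (pos < size) {
                pos += src.transferTo(pos, size - pos, target);
            }
            return pos;
        }
    }
}
```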

    Throttle reads / writes / accepts

    • Otherwise you will have fun with OOM
    • Updating interestOps(..) to the rescue!
    • This will push the "burden" to the network stack

    But do not call interestOps(...) too often, it's expensive!


    https://github.com/netty/netty/issues/1024

    Don't register for OP_WRITE

    • Don't register for OP_WRITE on the Selector by default
    • Only do so if you could not write the complete buffer
    • Remove OP_WRITE from interestOps() after you were able to write

    Remember: most of the time the Channel is writable!
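
    A minimal sketch of that rule: write first, and only touch OP_WRITE when the outcome requires it. WriteInterestSketch and writeOrRegister are hypothetical names; the key is assumed to belong to a non-blocking channel that implements WritableByteChannel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;

final class WriteInterestSketch {
    // Tries to write directly. Returns true if the buffer was written completely.
    static boolean writeOrRegister(SelectionKey key, ByteBuffer buf) throws IOException {
        WritableByteChannel ch = (WritableByteChannel) key.channel();
        ch.write(buf);
        int ops = key.interestOps();
        if (buf.hasRemaining()) {
            // Partial write: ask the Selector to tell us when the channel is writable again.
            if ((ops & SelectionKey.OP_WRITE) == 0) {
                key.interestOps(ops | SelectionKey.OP_WRITE);
            }
            return false;
        }
        // Everything went out: make sure we are no longer registered for OP_WRITE.
        if ((ops & SelectionKey.OP_WRITE) != 0) {
            key.interestOps(ops & ~SelectionKey.OP_WRITE);
        }
        return true;
    }
}
```

    Both branches check the current ops before calling interestOps(int), so the common case (full write, OP_WRITE not set) never updates the key.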

    Don't Block!

    • Don't block the Thread that handles the IO
    • You may be surprised what can block

    I'm looking at you, DNS resolution!


    If you really need to block, move it to an extra ThreadPool
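
    A minimal sketch of moving a potentially blocking DNS lookup off the IO thread. BlockingOffloadSketch, the pool size of 4 and the daemon-thread setting are illustrative assumptions.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BlockingOffloadSketch {
    // Separate pool for calls that may block; daemon threads so the pool never keeps the JVM alive.
    static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "blocking-pool");
        t.setDaemon(true);
        return t;
    });

    // InetAddress.getByName may block on a DNS lookup, so it runs on the extra pool.
    // The IO thread only receives a Future and can pick up the result later.
    static Future<InetAddress> resolve(String host) {
        Callable<InetAddress> lookup = () -> InetAddress.getByName(host);
        return BLOCKING_POOL.submit(lookup);
    }
}
```

    The IO thread should not call get() on the returned Future without a plan, since that would block it again; checking isDone() or handing the result back as an event avoids that.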

    Blocking in Action




    RED != Good!

    SelectionKey operations



    SelectionKey.interestOps(....);

    "This method may be invoked at any time. Whether or not it blocks, and for how long, is implementation-dependent."

    Optimize 


    BAD
    public void suspendRead() {
        key.interestOps(key.interestOps() & ~OP_READ);
    }



    BETTER!
    public void suspendRead() {
        int ops = key.interestOps();
        // Only update the key if OP_READ is actually set; otherwise skip the expensive call
        if ((ops & OP_READ) != 0) {
            key.interestOps(ops & ~OP_READ);
        }
    }

    Be memory efficient

    When you write a system that handles 100k concurrent connections, every byte saved in long-living objects counts

    Memory Efficient Atomic

    • AtomicReference => AtomicReferenceFieldUpdater
    • AtomicBoolean => AtomicIntegerFieldUpdater
    • AtomicLong => AtomicLongFieldUpdater
    • AtomicInteger => AtomicIntegerFieldUpdater

    It's ugly, but sometimes you just have to do it!
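
    A minimal sketch of the AtomicBoolean => AtomicIntegerFieldUpdater replacement. ConnectionState is a hypothetical class; the point is that each instance holds a plain volatile int instead of a reference to a separate atomic object.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

final class ConnectionState {
    // One updater shared by all instances; no per-connection AtomicBoolean object is needed.
    private static final AtomicIntegerFieldUpdater<ConnectionState> CLOSED =
            AtomicIntegerFieldUpdater.newUpdater(ConnectionState.class, "closed");

    // Must be a volatile int for the updater to accept it; 0 = open, 1 = closed.
    private volatile int closed;

    // Returns true only for the first caller that closes the connection.
    boolean close() {
        return CLOSED.compareAndSet(this, 0, 1);
    }

    boolean isClosed() {
        return closed != 0;
    }
}
```

    With 100k connections this saves one extra object (header plus field) per connection for every atomic replaced this way.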

    Data structures matter

    • Think about which data structure fits best
    • Linked vs. Array based
    • Access pattern ?!?

    volatile

    • Volatile reads are cheap... but still not free
    • Cache volatile variables in a local if possible to minimize access

    Optimize

    BAD
    private volatile Selector selector;

    public void method() .... {
        selector.select();   // every use of the field is a volatile read
        ....
    }


    BETTER

    private volatile Selector selector;

    public void method() .... {
        Selector selector = this.selector;   // read the volatile field once into a local
        selector.select();
        ....
    }

    Minimize stack depth

    • Deep stacks are our enemies, because they are expensive
    • Use tail-recursive calls if possible

    WHY?


    Everything that needs to be stored until the call is done needs memory...
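
    A minimal sketch of the cost, assuming plain Java: the Java compiler and HotSpot do not eliminate tail calls, so even a tail-recursive method adds one frame per step, while the equivalent loop keeps the stack flat. StackDepthSketch and both method names are hypothetical.

```java
final class StackDepthSketch {
    // Tail-recursive form: in Java every step still adds a stack frame.
    static long sumRecursive(int n, long acc) {
        return n == 0 ? acc : sumRecursive(n - 1, acc + n);
    }

    // Same computation as a loop: constant stack depth regardless of n.
    static long sumIterative(int n) {
        long acc = 0;
        for (int i = n; i > 0; i--) {
            acc += i;
        }
        return acc;
    }
}
```

    For small n both forms give the same result; for large n the recursive form can fail with a StackOverflowError while the loop does not.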

    Use JDK7 if possible


    Allocation / Deallocation of ByteBuffers is a lot faster now...


    • Also has some other goodies like NIO.2, UDP multicast, SCTP

    Well-defined thread model


    • It makes development easier
    • Reduces context switching
    • Reduces the need for synchronization in many cases

    Choose the correct Protocol

    • UDP
    • TCP
    • SCTP
    • UDT

    It's always a trade-off!

    Pipelining is Awesome

    • Allows sending / receiving more than one message before the response
    • This minimizes send / receive operations
    • Popular protocols which support pipelining: HTTP, SMTP, IMAP


    If you write your own protocol, think about pipelining!

    Don't want the hassle?

    There are a few frameworks to the rescue...
    • Netty
    • Vert.x
    • Xnio
    • Grizzly
    • Apache Mina

    Want to Learn More

    Attend my talk about Netty 4 tomorrow ;)

    Questions?

    Thanks