Netty - Best Practices a.k.a Faster == Better

Agenda

HTTP / Pipelining
Writing gracefully
Buffer best practices
EventLoop

No Pipelining Optimization

public class HttpHandler extends SimpleChannelInboundHandler<HttpRequest> {
  @Override
  public void channelRead0(ChannelHandlerContext ctx, HttpRequest req) {
    ChannelFuture future = ctx.writeAndFlush(createResponse(req)); (1)
    if (!isKeepAlive(req)) {
      future.addListener(ChannelFutureListener.CLOSE); (2)
    }
  }
}

1	Write to the Channel and flush out to the Socket.
2	Close the socket once the response has been written.

Pipelining to save syscalls!

public class HttpPipeliningHandler extends SimpleChannelInboundHandler<HttpRequest> {
  @Override
  public void channelRead0(ChannelHandlerContext ctx, HttpRequest req) {
    ChannelFuture future = ctx.write(createResponse(req)); (1)
    if (!isKeepAlive(req)) {
      future.addListener(ChannelFutureListener.CLOSE); (2)
    }
  }
  @Override
  public void channelReadComplete(ChannelHandlerContext ctx) {
    ctx.flush(); (3)
  }
}

1	Write to the `Channel` (No syscall!) but don’t flush yet
2	Close socket when done writing
3	Flush out to the socket.
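
The effect of deferring the flush can be illustrated without Netty. The sketch below is a plain-JDK model, not Netty code: a `BufferedOutputStream` stands in for the channel's outbound buffer, and the hypothetical `CountingStream` counts how often data reaches the underlying stream, which here plays the role of a socket write syscall.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FlushBatchingDemo {

  /** Hypothetical stand-in for a socket: counts the writes that reach it. */
  static final class CountingStream extends OutputStream {
    int writes;

    @Override
    public void write(int b) {
      writes++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      writes++;
    }
  }

  /** Writes small responses and flushes either after each one or once at the end. */
  static int underlyingWrites(int responses, boolean flushEach) {
    CountingStream socket = new CountingStream();
    byte[] response = "HTTP/1.1 200 OK\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    try (BufferedOutputStream out = new BufferedOutputStream(socket, 8192)) {
      for (int i = 0; i < responses; i++) {
        out.write(response);   // buffered only, comparable to ctx.write(...)
        if (flushEach) {
          out.flush();         // comparable to ctx.writeAndFlush(...)
        }
      }
      out.flush();             // comparable to ctx.flush() in channelReadComplete(...)
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return socket.writes;
  }

  public static void main(String[] args) {
    System.out.println("flush per response: " + underlyingWrites(10, true));
    System.out.println("single flush:       " + underlyingWrites(10, false));
  }
}
```

With ten pipelined responses, flushing each one reaches the underlying stream ten times, while a single flush at the end reaches it once.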

Validate headers or not ?

Validate headers for US-ASCII

ChannelPipeline pipeline = channel.pipeline();
pipeline.addLast(new HttpRequestDecoder(4096, 8192, 8192));

HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK);

Do not validate headers for US-ASCII

ChannelPipeline pipeline = channel.pipeline();
pipeline.addLast(new HttpRequestDecoder(4096, 8192, 8192, false));

HttpResponse response = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK, false);

Validation takes time, and most of the time it is not needed directly in the decoder/encoder.

Static header names and values via HttpHeaders.newEntity(…).

private static final CharSequence X_HEADER_NAME = HttpHeaders.newEntity("X-Header"); (1)
private static final CharSequence X_VALUE = HttpHeaders.newEntity("Value");

pipeline.addLast(new SimpleChannelInboundHandler<FullHttpRequest>() {
  @Override
  public void channelRead0(ChannelHandlerContext ctx, FullHttpRequest req) {
    FullHttpResponse response = new DefaultFullHttpResponse(HTTP_1_1, OK);
    response.headers().set(X_HEADER_NAME, X_VALUE); (2)
    ...
    ctx.writeAndFlush(response);
  }
});

1	Create a `CharSequence` for frequently used header names and values.
2	Add it to the `HttpHeaders` of the `FullHttpResponse`.

The created CharSequence is faster to encode and faster to find in HttpHeaders.

write(msg) , flush() and writeAndFlush(msg)

write(msg) ⇒ pass through pipeline

flush() ⇒ gathering write of previously written msgs

writeAndFlush() ⇒ short-cut for write(msg) and flush()

Limit flushes as much as possible as syscalls are quite expensive.

But also limit write(...) as much as possible, as it needs to traverse the whole pipeline.
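
A gathering write hands several queued buffers to the operating system in one call. The sketch below shows the JDK primitive for it, `GatheringByteChannel.write(ByteBuffer[])`, which `SocketChannel` also implements. It writes to a temporary file so that it runs without a network peer; the class and method names are made up for illustration and this is not Netty's own flush code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatheringWriteDemo {

  private static boolean anyRemaining(ByteBuffer[] buffers) {
    for (ByteBuffer buffer : buffers) {
      if (buffer.hasRemaining()) {
        return true;
      }
    }
    return false;
  }

  /** Queues one buffer per message, writes them together, returns what landed in the file. */
  static String writeGathered(String... messages) {
    ByteBuffer[] pending = new ByteBuffer[messages.length];
    for (int i = 0; i < messages.length; i++) {
      // "write(msg)": the message is only queued, nothing is sent yet
      pending[i] = ByteBuffer.wrap(messages[i].getBytes(StandardCharsets.US_ASCII));
    }
    Path file = null;
    try {
      file = Files.createTempFile("gathering-write", ".bin");
      try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
        // "flush()": all queued buffers go out in as few calls as possible
        while (anyRemaining(pending)) {
          channel.write(pending);
        }
      }
      return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (file != null) {
        file.toFile().delete();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(writeGathered("first|", "second|", "third"));
  }
}
```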

May write too fast!

public class BlindlyWriteHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) throws Exception {
    while(needsToWrite) { (1)
        ctx.writeAndFlush(createMessage());
    }
  }
}

1	Keeps writing until `needsToWrite` becomes `false`.

Risk of OutOfMemoryError when writing too fast to a slow receiver!

Correctly write with respect to slow receivers

public class GracefulWriteHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) {
    writeIfPossible(ctx.channel());
  }
  @Override
  public void channelWritabilityChanged(ChannelHandlerContext ctx) {
    writeIfPossible(ctx.channel());
  }

  private void writeIfPossible(Channel channel) {
    while(needsToWrite && channel.isWritable()) { (1)
      channel.writeAndFlush(createMessage());
    }
  }
}

1	Make proper use of `Channel.isWritable()` to prevent OutOfMemoryError

Configure high and low write watermarks

Set sane WRITE_BUFFER_HIGH_WATER_MARK and WRITE_BUFFER_LOW_WATER_MARK

Server

ServerBootstrap bootstrap = new ServerBootstrap();
bootstrap.childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 32 * 1024);
bootstrap.childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 8 * 1024);

Client

Bootstrap bootstrap = new Bootstrap();
bootstrap.option(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, 32 * 1024);
bootstrap.option(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, 8 * 1024);
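
How the two watermarks interact can be sketched with a small model. This is a simplified illustration under the assumption that writability turns off once the pending bytes exceed the high mark and turns back on only after they drop below the low mark; `WatermarkModel` is a hypothetical class, not Netty's implementation.

```java
final class WatermarkModel {
  private final int low;
  private final int high;
  private int pendingBytes;
  private boolean writable = true;

  WatermarkModel(int low, int high) {
    if (low < 0 || high < low) {
      throw new IllegalArgumentException("need 0 <= low <= high");
    }
    this.low = low;
    this.high = high;
  }

  /** Queues bytes; the model becomes unwritable once pending bytes exceed the high mark. */
  void write(int bytes) {
    pendingBytes += bytes;
    if (pendingBytes > high) {
      writable = false;
    }
  }

  /** Bytes left the queue; the model becomes writable again only below the low mark. */
  void flushed(int bytes) {
    pendingBytes = Math.max(0, pendingBytes - bytes);
    if (pendingBytes < low) {
      writable = true;
    }
  }

  boolean isWritable() {
    return writable;
  }

  public static void main(String[] args) {
    WatermarkModel model = new WatermarkModel(8 * 1024, 32 * 1024);
    model.write(40 * 1024);
    System.out.println("after 40K queued:  writable=" + model.isWritable());
    model.flushed(20 * 1024);
    System.out.println("after 20K flushed: writable=" + model.isWritable());
    model.flushed(16 * 1024);
    System.out.println("after 36K flushed: writable=" + model.isWritable());
  }
}
```

The gap between the two marks is what prevents writability from flapping on every flush.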

Issues with using non pooled buffers

Use unpooled buffers with caution!

Allocation / Deallocation is slow
Freeing direct buffers == PITA!

Use pooled buffers!

Bootstrap bootstrap = new Bootstrap();
bootstrap.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
ServerBootstrap serverBootstrap = new ServerBootstrap();
serverBootstrap.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);

Use Pooling of buffers to reduce allocation / deallocation time!

Pooling pays off for direct and heap buffers!

https://blog.twitter.com/2013/netty-4-at-twitter-reduced-gc-overhead

Always use direct ByteBuffer when writing to SocketChannel

Otherwise OpenJDK and Oracle JDK copy the data to a direct buffer themselves!

Only use heap buffers if you need to operate on byte[] in a ChannelOutboundHandler! By default a direct ByteBuf will be returned by ByteBufAllocator.buffer(...).

Take this as a rule of thumb.
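
The trade-off shows up on the JDK's own `ByteBuffer`, which is what the NIO socket layer works with. A minimal sketch (the class name `BufferKinds` is made up for illustration): a buffer from `allocateDirect(...)` exposes no backing array on OpenJDK, while a heap buffer does.

```java
import java.nio.ByteBuffer;

final class BufferKinds {

  /** Describes whether a buffer is direct and whether byte[] access is possible. */
  static String describe(ByteBuffer buffer) {
    String kind = buffer.isDirect() ? "direct" : "heap";
    String array = buffer.hasArray() ? "array-backed" : "no backing array";
    return kind + ", " + array;
  }

  public static void main(String[] args) {
    // Direct: suited for socket writes, but no byte[] to operate on.
    System.out.println(describe(ByteBuffer.allocateDirect(64)));
    // Heap: byte[] access via array(), but an extra copy when written to a socket.
    System.out.println(describe(ByteBuffer.allocate(64)));
  }
}
```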

Find pattern in ByteBuf

SlowSearch :(

ByteBuf buf = ...;
int index = -1;
for (int i = buf.readerIndex(); index == -1 && i <  buf.writerIndex(); i++) {
  if (buf.getByte(i) == '\n') {
    index = i;
  }
}

FastSearch :)

ByteBuf buf = ...;
int index = buf.forEachByte(new ByteBufProcessor() {
  @Override
  public boolean process(byte value) {
    return value != '\n';
  }
});

Messages with Payload? Yes please…

ByteBuf payload ⇒ extend DefaultByteBufHolder

reference-counting for free
release resources out-of-the-box

http://www.flickr.com/photos/za3tooor/65911648/

File transfer ?

Use zero-memory-copy for efficient transfer of raw file content

Channel channel = ...;
FileChannel fc = ...;
channel.writeAndFlush(new DefaultFileRegion(fc, 0, fileLength));

This only works if you don’t need to modify the data on the fly. If you do, use ChunkedWriteHandler and ChunkedNioFile.
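
The JDK primitive behind this kind of transfer is `FileChannel.transferTo(...)`, which lets the operating system move the bytes without passing them through a user-space buffer where the platform supports it. The sketch below uses a second file as the target so that it runs without a socket; `TransferToDemo` and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class TransferToDemo {

  /** Moves the whole source file to the target channel and returns the bytes transferred. */
  static long transfer(Path source, Path target) throws IOException {
    try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
         FileChannel out = FileChannel.open(target, StandardOpenOption.WRITE,
             StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
      long size = in.size();
      long position = 0;
      while (position < size) {
        // transferTo may move fewer bytes than requested, so loop until done
        long moved = in.transferTo(position, size - position, out);
        if (moved <= 0) {
          break;
        }
        position += moved;
      }
      return position;
    }
  }

  /** Demo helper: writes content to a temp file, transfers it, reads the copy back. */
  static String roundTrip(String content) {
    Path source = null;
    Path target = null;
    try {
      source = Files.createTempFile("transfer-src", ".bin");
      target = Files.createTempFile("transfer-dst", ".bin");
      Files.write(source, content.getBytes(StandardCharsets.UTF_8));
      transfer(source, target);
      return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (source != null) {
        source.toFile().delete();
      }
      if (target != null) {
        target.toFile().delete();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("raw file content"));
  }
}
```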

Never block the EventLoop!

Thread.sleep()

CountDownLatch.await() or any other blocking operation from java.util.concurrent

Long-lived computationally intensive operations

Blocking operations that might take a while (e.g. DB query)
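
One way to keep such work off the I/O thread is to hand it to a separate pool and continue on the loop thread when it completes. The sketch below models this with plain JDK executors: the single-threaded `eventLoop` executor and the `workers` pool are stand-ins chosen for illustration, not Netty classes. In Netty itself a handler can be registered with its own `EventExecutorGroup` via `pipeline.addLast(group, name, handler)`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class OffloadDemo {

  /** Stands in for a slow call such as a DB query. */
  private static void blockingCall() {
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns {thread that ran the blocking call, thread that ran the continuation}. */
  static String[] run() {
    ExecutorService eventLoop =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop"));
    ExecutorService workers =
        Executors.newFixedThreadPool(2, r -> new Thread(r, "blocking-worker"));
    try {
      return CompletableFuture
          .supplyAsync(() -> {
            blockingCall();                      // runs on the worker pool
            return Thread.currentThread().getName();
          }, workers)
          .thenApplyAsync(workerThread ->        // hops back to the loop thread
              new String[] {workerThread, Thread.currentThread().getName()}, eventLoop)
          .get(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException | TimeoutException e) {
      throw new IllegalStateException(e);
    } finally {
      workers.shutdown();
      eventLoop.shutdown();
    }
  }

  public static void main(String[] args) {
    String[] threads = run();
    System.out.println("blocking call on: " + threads[0]);
    System.out.println("continuation on:  " + threads[1]);
  }
}
```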

http://www.flickr.com/photos/za3tooor/65911648/

Re-use EventLoopGroup if you can!

Bootstrap bootstrap = new Bootstrap().group(new NioEventLoopGroup());
Bootstrap bootstrap2 = new Bootstrap().group(new NioEventLoopGroup());

Share EventLoopGroup between different Bootstraps

EventLoopGroup group = new NioEventLoopGroup();
Bootstrap bootstrap = new Bootstrap().group(group);
Bootstrap bootstrap2 = new Bootstrap().group(group);

Sharing the same EventLoopGroup keeps resource usage (such as the number of threads) to a minimum.

Proxy-like application with context-switching issue

public class ProxyHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) { (1)
    final Channel inboundChannel = ctx.channel();
    Bootstrap b = new Bootstrap();
    b.group(new NioEventLoopGroup()); (2)
    ...
    ChannelFuture f = b.connect(remoteHost, remotePort);
    ...
  }
}

1	Called once a new connection was accepted
2	Use a new `EventLoopGroup` instance to handle the connection to the remote peer

Don’t do this! This will tie up more resources than needed and introduce extra context-switching overhead.

Proxy-like application that reduces context switching to a minimum

public class ProxyHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) { (1)
    final Channel inboundChannel = ctx.channel();
    Bootstrap b = new Bootstrap();
    b.group(inboundChannel.eventLoop()); (2)
    ...
    ChannelFuture f = b.connect(remoteHost, remotePort);
    ...
  }
}

1	Called once a new connection was accepted
2	Share the same `EventLoop` between both Channels. This means all IO for both connected Channels is handled by the same Thread.

Always share the EventLoop in such applications.

Operations from inside ChannelHandler

public class YourHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) {
    // BAD (most of the time)
    ctx.channel().writeAndFlush(msg); (1)

    // GOOD
    ctx.writeAndFlush(msg); (2)
   }
}

1	`Channel.*` methods ⇒ the operation will start at the tail of the `ChannelPipeline`
2	`ChannelHandlerContext.*` methods ⇒ the operation will start from this `ChannelHandler` and flow through the `ChannelPipeline`.

Use the shortest path possible to get maximum performance.

Share ChannelHandlers if stateless

@ChannelHandler.Shareable (1)
public class StatelessHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) {
    logger.debug("New client from " + ctx.channel().remoteAddress());
   }
}

public class MyInitializer extends ChannelInitializer<Channel> {
  private static final ChannelHandler INSTANCE = new StatelessHandler();
  @Override
  public void initChannel(Channel ch) {
    ch.pipeline().addLast(INSTANCE);
  }
}

1	Annotate stateless `ChannelHandler`s with `@ChannelHandler.Shareable` and use the same instance across Channels to reduce GC.

Remove ChannelHandler once not needed anymore

public class OneTimeHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void channelActive(ChannelHandlerContext ctx) {
    doOneTimeAction();
    ctx.channel().pipeline().remove(this); (1)
   }
}

1	Remove `ChannelHandler` once not needed anymore.

This keeps the ChannelPipeline as short as possible and so eliminates as much traversal overhead as possible.

Pass custom events through `ChannelPipeline`

Your custom events

public enum CustomEvents {
  MyCustomEvent
}

public class CustomEventHandler extends ChannelInboundHandlerAdapter {
  @Override
  public void userEventTriggered(ChannelHandlerContext ctx, Object evt) {
    if (evt == CustomEvents.MyCustomEvent) { /* do something */ }
  }
}

ChannelPipeline pipeline = channel.pipeline();
pipeline.fireUserEventTriggered(CustomEvents.MyCustomEvent);

Good fit for handshake notifications and more

Use proper buffer type in MessageToByteEncoder

public class EncodeActsOnByteArray extends MessageToByteEncoder<YourMessage> {
  public EncodeActsOnByteArray() { super(false); } (1)
  @Override
  protected void encode(ChannelHandlerContext ctx, YourMessage msg, ByteBuf out) {
    byte[] array = out.array(); (2)
    int offset = out.arrayOffset() + out.writerIndex();
    out.writerIndex(out.writerIndex() + encode(msg, array, offset, out.writableBytes())); (3)
  }
  private int encode(YourMessage msg, byte[] array, int offset, int len) { ... }
}

1	Ensure heap buffers are passed into the `encode(...)` method. This way you can access the backing array directly.
2	Access the backing array and also calculate offset
3	Update writerIndex to reflect written bytes

This saves extra byte copies.
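
The same offset arithmetic can be tried on a JDK heap `ByteBuffer`, where `position()` plays the role of `writerIndex()`. This is a sketch under that assumption; `BackingArrayDemo` and its toy ASCII `encode(...)` are hypothetical and only illustrate why `arrayOffset()` must be added.

```java
import java.nio.ByteBuffer;

final class BackingArrayDemo {

  /** Toy encoder: writes ASCII chars straight into the array, returns bytes written. */
  static int encode(String msg, byte[] array, int offset, int len) {
    if (msg.length() > len) {
      throw new IllegalArgumentException("need " + msg.length() + " bytes, have " + len);
    }
    for (int i = 0; i < msg.length(); i++) {
      array[offset + i] = (byte) msg.charAt(i);
    }
    return msg.length();
  }

  /** Encodes directly into the buffer's backing array and advances its position. */
  static void encodeInto(String msg, ByteBuffer out) {
    if (!out.hasArray()) {
      throw new IllegalArgumentException("heap buffer with accessible array required");
    }
    // The buffer may be a view into a larger array, so add arrayOffset().
    int offset = out.arrayOffset() + out.position();
    int written = encode(msg, out.array(), offset, out.remaining());
    out.position(out.position() + written);
  }

  public static void main(String[] args) {
    ByteBuffer whole = ByteBuffer.allocate(16);
    whole.position(4);
    ByteBuffer slice = whole.slice();   // view starting at index 4 of the shared array
    encodeInto("abc", slice);
    System.out.println("arrayOffset=" + slice.arrayOffset() + " position=" + slice.position());
  }
}
```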

To auto-read or not to auto-read

By default Netty will keep on reading data from the Channel once something is ready.

Need more fine-grained control?

channel.config().setAutoRead(false); (1)
channel.read(); (2)
channel.config().setAutoRead(true); (3)

1	Disable auto read == no more data will be read automatically from this `Channel`.
2	Tell the `Channel` to do one read operation once new data is ready
3	Re-enable auto read == Netty will read automatically again

This can also be quite useful when writing proxy-like applications!

Native stuff in Netty 4

OpenSSL based SslEngine to reduce memory usage and latency.

Native transport for Linux using Epoll ET for more performance and less CPU usage.

Native transport also supports SO_REUSEPORT and TCP_CORK :)

Switching to native transport is easy

Using NIO transport

Bootstrap bootstrap = new Bootstrap().group(new NioEventLoopGroup());
bootstrap.channel(NioSocketChannel.class);

Using native transport

Bootstrap bootstrap = new Bootstrap().group(new EpollEventLoopGroup());
bootstrap.channel(EpollSocketChannel.class);

Want to know more?

Buy my book Netty in Action and make me RICH.

http://www.manning.com/maurer

$ KA-CHING $

References

Netty - http://netty.io

Slides generated with Asciidoctor and DZSlides backend

Original slide template - Dan Allen & Sarah White

All pictures licensed with Creative Commons Attribution or
Creative Commons Attribution-Share Alike

Norman Maurer