Jumbo Frames and Nagle’s Algorithm (RFC 896)

I really do not like increasing the MTU that much. Some reasons are below.

There are some areas to be careful about with jumbo frames. Remember the Nagle algorithm (RFC 896): basically (among other things), while earlier data on the connection is still unacknowledged, a small piece of data is not sent until either enough has accumulated to fill a full segment or the outstanding data has been acknowledged. When the peer uses delayed ACKs, that wait can be 200-500 ms.

Nagle's algorithm also applies when jumbo frames are in use. Note that it is implemented in the TCP stack of the sending host, not in the network hardware; the hardware (NICs, switches) only has to be OK with jumbo frames, which is better to check. Nagle is more or less standard behaviour for keeping small packets from congesting the network. The idea is not to send many partly filled units. If we are doing something like:

write()
write()
read()

where the two writes are related to each other and the other end has not replied to the first one yet. If the MTU is 9000 bytes, the TCP MSS increases to about 9000 - 40 = 8960 bytes, which (generally) becomes the new value against which the Nagle algorithm checks. A write larger than 1460 bytes but smaller than 8960 bytes would have filled at least one full segment with the standard 1500-byte MTU, but with jumbo frames it is smaller than one segment and can be delayed by Nagle. So the application should be aware of the jumbo frames and either use them carefully or disable the Nagle algorithm.
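To make the arithmetic above concrete, here is a minimal Python sketch of a simplified Nagle rule. The function names (`mss_for_mtu`, `split_write`) are my own, the 40 bytes of overhead assumes IPv4 and TCP headers without options, and real TCP stacks add refinements that this toy model ignores.

```python
IP_TCP_HEADERS = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options


def mss_for_mtu(mtu):
    """Approximate TCP MSS for a given MTU."""
    return mtu - IP_TCP_HEADERS


def split_write(write_len, mss, unacked_data):
    """Return (bytes_sent_now, bytes_held) under a simplified Nagle rule.

    Full-sized segments always go out. A trailing partial segment is held
    back while earlier data on the connection is still unacknowledged.
    """
    full = (write_len // mss) * mss
    tail = write_len - full
    if tail and unacked_data:
        return full, tail
    return write_len, 0


if __name__ == "__main__":
    for mtu in (1500, 9000):
        mss = mss_for_mtu(mtu)
        sent, held = split_write(4000, mss, unacked_data=True)
        print(f"MTU {mtu}: MSS {mss}, sent now {sent}, held {held}")
```

In this model a 4000-byte second write sends most of its data immediately with a 1500-byte MTU, while with a 9000-byte MTU the whole write waits for the ACK of the first one.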

This behaviour can be disabled with the TCP_NODELAY socket option, but that has to be done at the application level, on each socket.
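As a rough illustration of what "at the application level" means, this is how a Python program would set the option on its own socket. The helper name `make_nodelay_socket` is just for this example; the `setsockopt` call itself is the standard socket API.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero TCP_NODELAY turns Nagle off for this socket only.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # Read the option back to confirm it took effect (non-zero means enabled).
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```

The option is per socket, so every connection that does the write-write-read pattern has to set it. Another way out, which keeps Nagle enabled, is for the application to combine the two related writes into a single one.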

In case of unexplained timeouts or performance problems, to diagnose (on the Linux side) we can take tcpdumps on the interconnects and see what is really going on. One also needs to check whether the network equipment (switches, NICs) is suitable for jumbo frames. We can also run traceroute tests to see where along the path things go wrong.
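For the host side of that check, a small sketch like the following can list interfaces whose configured MTU is not the expected jumbo value. It assumes Linux, where each interface exposes its MTU in `/sys/class/net/<iface>/mtu`; the helper names and the interface names `eth1` and `eth2` are made up for the example. It says nothing about the switches, which still have to be checked separately.

```python
from pathlib import Path


def read_mtu(iface, sysfs_root="/sys/class/net"):
    """Return the MTU configured on a Linux interface, read from sysfs."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def jumbo_mismatches(ifaces, expected=9000, sysfs_root="/sys/class/net"):
    """Return {iface: mtu} for interfaces whose MTU differs from expected."""
    found = {name: read_mtu(name, sysfs_root) for name in ifaces}
    return {name: mtu for name, mtu in found.items() if mtu != expected}


if __name__ == "__main__":
    # Hypothetical interconnect interfaces; replace with the real ones.
    try:
        print(jumbo_mismatches(["eth1", "eth2"]))
    except FileNotFoundError as exc:
        print("interface not found:", exc)
```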
