Linked by Thom Holwerda on Thu 22nd Jan 2015 23:04 UTC
Linux

100Gb network adapters are coming, said Jesper Brouer in his talk at the LCA 2015 kernel miniconference. Driving such adapters at their full wire speed is going to be a significant challenge for the Linux kernel; meeting that challenge is the subject of his current and future work. The good news is that Linux networking has gotten quite a bit faster as a result - even if there are still problems to be solved.

Very Interesting
by Alfman on Fri 23rd Jan 2015 01:00 UTC
Alfman
Member since:
2011-01-28

I agree that work needs to be done to reduce Linux kernel overhead for high-interrupt workloads. However, I'm wondering if now wouldn't be a good time to officially push to standardize on jumbo packets throughout the internet. 1500-byte packets have been holding us back for a while now; the limit comes from an Ethernet standard that was set decades ago, and it creates lots of unnecessary load on modern CPUs, routers, and switching equipment.

Consider a 100MB video call or data stream: it takes about 69k 1500B packets to transmit after factoring in packet overhead. The same 100MB stream would require only 11k 9000B jumbo packets. That's 84% of the switching/interrupt load freed on each and every router in the path, just by switching to jumbo packets. Even simple webpages these days are significantly larger than 1500 bytes and would see tremendous benefit.
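To put rough numbers on that (a quick Python sketch; decimal megabytes and the ~60 bytes of per-packet header overhead are illustrative assumptions):

```python
# Rough packet-count comparison for a 100 MB transfer at two MTUs.
# Assumes decimal megabytes and ~60 bytes of Ethernet/IP/TCP headers
# per packet (illustrative figures; options and encapsulation vary).

STREAM_BYTES = 100 * 10**6   # 100 MB transfer
HEADERS = 60                 # assumed per-packet overhead, in bytes

for mtu in (1500, 9000):
    payload = mtu - HEADERS                # usable bytes per packet
    packets = -(-STREAM_BYTES // payload)  # ceiling division
    print(f"MTU {mtu}: {packets:,} packets")

# MTU 1500: ~69,445 packets; MTU 9000: ~11,186 packets. That's the
# ~84% drop in per-packet work, for every hop along the path.
```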

Edited 2015-01-23 01:02 UTC

Reply Score: 7

RE: Very Interesting
by tony on Fri 23rd Jan 2015 02:40 UTC in reply to "Very Interesting"
tony Member since:
2005-07-06

I agree that work needs to be done to reduce Linux kernel overhead for high-interrupt workloads. However, I'm wondering if now wouldn't be a good time to officially push to standardize on jumbo packets throughout the internet. 1500-byte packets have been holding us back for a while now; the limit comes from an Ethernet standard that was set decades ago, and it creates lots of unnecessary load on modern CPUs, routers, and switching equipment.

Consider a 100MB video call or data stream: it takes about 69k 1500B packets to transmit after factoring in packet overhead. The same 100MB stream would require only 11k 9000B jumbo packets. That's 84% of the switching/interrupt load freed on each and every router in the path, just by switching to jumbo packets. Even simple webpages these days are significantly larger than 1500 bytes and would see tremendous benefit.


Routers and switches aren't generally affected by small frame sizes, as they don't experience "interrupts" like end hosts do. Even at the very smallest packet sizes, most data center switches and Internet routers can handle line rate forwarding. CPU-based routers do have interrupts, but they're generally used for things like branch offices and don't see nearly the bandwidth that would benefit from larger frame sizes.

End-host interfaces that interact with the Internet are pretty much stuck at 1500 bytes, as there are too many paths between point A and point B that would squeeze the MTU back down to 1500 (and lots of networks break pMTU).

So that leaves back-end connections like iSCSI, NFS, backups, etc. These are networks that are controlled and known, so the MTU of the endpoints isn't a problem. On modern, multi-core CPUs and NICs with checksum offloading, vMDQs, etc., there hasn't been much of an advantage to moving to jumbo frames. A lot of places don't even bother any more.

That could change with the migration to 25 Gbit, 40 Gbit, 50 Gbit, and 100 Gbit networks. But for a while at least, we're still going to be stuck with 1500 MTU for Internet-facing end-host interfaces.

Edited 2015-01-23 02:41 UTC

Reply Score: 5

RE[2]: Very Interesting
by Alfman on Fri 23rd Jan 2015 06:52 UTC in reply to "RE: Very Interesting"
Alfman Member since:
2011-01-28

tony,

Routers and switches aren't generally affected by small frame sizes, as they don't experience "interrupts" like end hosts do. Even at the very smallest packet sizes, most data center switches and Internet routers can handle line rate forwarding.


That's not really accurate, though. The routing table has to be consulted for every packet, which means the router has a lot more work to do when packets are small.

At 1Gbps and 55-byte packets, a router would have to do 2.3M lookups per second. At 10Gbps, a router would have to do 22.7M lookups per second. At 100Gbps, a router would have to do 227.3M lookups per second.

A brand new Cisco 7600 router, which will run you about $5K, can handle 16 Gb/s and 6.5 Mp/s of throughput per module. So without dividing the traffic among additional units, it would not even be able to handle a 10Gb/s stream full of small packets today, much less the 16Gb/s stream it's rated for. Bandwidth isn't the bottleneck; it's packet routing speed. The bigger the packets, the more bandwidth a router can handle. By upping the packet size to 9KB, this router could theoretically handle up to 468Gbps without touching its route lookup speed.
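Those figures are straightforward packets-per-second arithmetic; here's a small sketch that reproduces them (the 55B packet size and the 6.5Mpps module rating are taken from the discussion above):

```python
# Route lookups per second needed at a given line rate and packet size,
# and the line rate a fixed forwarding rate can cover as packets grow.

def pps_needed(line_rate_bps, packet_bytes):
    """Packets (route lookups) per second required to fill a link."""
    return line_rate_bps / (packet_bytes * 8)

for gbps in (1, 10, 100):
    mpps = pps_needed(gbps * 1e9, 55) / 1e6
    print(f"{gbps:>3} Gbps @ 55B packets: {mpps:5.1f} Mpps")
# 1 Gbps: 2.3 Mpps, 10 Gbps: 22.7 Mpps, 100 Gbps: 227.3 Mpps

# A 6.5 Mpps module forwarding 9000-byte packets covers:
print(f"{6.5e6 * 9000 * 8 / 1e9:.0f} Gbps")  # ~468 Gbps
```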

So while we could make routers that could handle arbitrarily small packets by over-provisioning the packet rate, that comes at exponentially greater cost. Conversely if increasing packet size can save costs, then maybe that's what we should be doing.


So that leaves back-end connections like iSCSI, NFS, backups, etc. These are networks that are controlled and known, so the MTU of the endpoints isn't a problem. On modern, multi-core CPUs and NICs with checksum offloading, vMDQs, etc., there hasn't been much of an advantage to moving to jumbo frames. A lot of places don't even bother any more.


I have to disagree. Firewalls, NAS drives, tablets, desktops, etc.: these are all negatively affected by having to reassemble data in fragments of <1500B at a time. Yes, we can add more transistors to offload this overhead, but if jumbo packets can actually help eliminate the overhead without throwing more processing power at the problem, then it's a good thing IMHO.

Edited 2015-01-23 06:55 UTC

Reply Score: 5

RE[3]: Very Interesting
by tony on Fri 23rd Jan 2015 07:49 UTC in reply to "RE[2]: Very Interesting"
tony Member since:
2005-07-06

tony,

"Routers and switches aren't generally affected by small frame sizes, as they don't experience "interrupts" like end hosts do. Even at the very smallest packets sizes, most data center switches and Internet routers can handle line rate forwarding.


That's not really accurate though. The routing table has to be consulted for every packet, which means it has a lot more work to do when packets are small.



At 1Gbps and 55 byte packets, a router would have to do 2.3M lookups per second. At 10Gbps, a router would have to do 22.7M lookups per second. At 100Gbps, a router would have to do 227.3M lookups per second.
"

Actually it is accurate. Routers/switches tend to use CAMs/TCAMs or something similar. A (T)CAM is a special type of memory that can look up a destination MAC, IP, or network prefix in a single clock cycle, no matter how big the forwarding table (Forwarding Information Base, FIB) is. Most TCAM ASICs are built so they can sustain forwarding even at the smallest packet sizes.

TCAMs are great for that, but they're expensive and power-hungry, which is why they're only used in specialized networking equipment and typically only hold a couple hundred thousand entries (rather than, say, 2 GBytes of commodity memory, which could hold a lot more).

http://en.wikipedia.org/wiki/Content-addressable_memory
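What a TCAM does in one cycle is essentially a longest-prefix match across every entry at once. A toy software analogue (nothing like how real ASICs are programmed, purely to show the semantics):

```python
# Toy longest-prefix-match lookup: the operation a TCAM performs over
# all entries in parallel, in a single cycle. In software it's a scan.
import ipaddress

FIB = {
    ipaddress.ip_network("10.0.0.0/8"):  "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("0.0.0.0/0"):   "eth0",  # default route
}

def lookup(dst):
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FIB if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return FIB[best]

print(lookup("10.1.2.3"))  # eth2 -- the /16 beats the /8 and the default
```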


A brand new Cisco 7600 router, which will run you about $5K, can handle 16 Gb/s and 6.5 Mp/s of throughput per module. So without dividing the traffic among additional units, it would not even be able to handle a 10Gb/s stream full of small packets today, much less the 16Gb/s stream it's rated for. Bandwidth isn't the bottleneck; it's packet routing speed. The bigger the packets, the more bandwidth a router can handle. By upping the packet size to 9KB, this router could theoretically handle up to 468Gbps without touching its route lookup speed.

So while we could make routers that could handle arbitrarily small packets by over-provisioning the packet rate, that comes at exponentially greater cost. Conversely if increasing packet size can save costs, then maybe that's what we should be doing.


Hardware routers and switches use distributed forwarding in the various line cards. The supervisor module is responsible for learning routes and Layer 2 adjacencies via various protocols (OSPF, BGP, ARP) and then building a RIB (routing information base). The RIB is then compiled into a FIB (forwarding information base) that is downloaded into the line cards. For efficiency, a forwarding entry for a given destination IP, MAC, or network prefix is only installed on line cards that are connected to those particular networks.

The line cards then forward packets at line rate because every lookup checks the entire forwarding table in a single cycle, again thanks to the TCAM.

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-se...
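A rough sketch of that RIB-to-FIB compilation step (purely illustrative; the per-card filtering rule here is a big simplification of what real supervisors do, and all names and addresses are made up):

```python
# Illustrative RIB -> per-line-card FIB compilation. The supervisor
# learns routes via OSPF/BGP/ARP, then installs each forwarding entry
# only on the line cards that can reach the corresponding next hop.

RIB = [
    {"prefix": "10.0.0.0/8",    "next_hop": "192.0.2.1",   "egress_card": 1},
    {"prefix": "172.16.0.0/12", "next_hop": "192.0.2.9",   "egress_card": 2},
    {"prefix": "0.0.0.0/0",     "next_hop": "192.0.2.254", "egress_card": 1},
]

def compile_fibs(rib, num_cards):
    fibs = {card: [] for card in range(1, num_cards + 1)}
    for route in rib:
        # Simplification: install the entry only on the egress card.
        fibs[route["egress_card"]].append((route["prefix"], route["next_hop"]))
    return fibs

for card, entries in compile_fibs(RIB, 2).items():
    print(f"line card {card}: {entries}")
```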

So that leaves back-end connections like iSCSI, NFS, backups, etc. These are networks that are controlled and known, so the MTU of the endpoints isn't a problem. On modern, multi-core CPUs and NICs with checksum offloading, vMDQs, etc., there hasn't been much of an advantage to moving to jumbo frames. A lot of places don't even bother any more.

I have to disagree. Firewalls, NAS drives, tablets, desktops, etc.: these are all negatively affected by having to reassemble data in fragments of <1500B at a time.


Exactly, which is why MTU is kept at 1500. Anything higher is either dropped or fragmented.

Reply Score: 5

RE[4]: Very Interesting
by Alfman on Fri 23rd Jan 2015 09:00 UTC in reply to "RE[3]: Very Interesting"
Alfman Member since:
2011-01-28

tony,

Actually it is accurate.


You are still overlooking the actual specs of these routers.

Here's a 20Gbps model; it does 30Mpps.
http://andovercg.com/datasheets/cisco-20G-ethernet-services-cards-7...

If you think it can saturate its link with 55-byte packets, then you'd be wrong: 30Mpps * 55B * 8 = 13.2Gbps, and 13.2Gbps < 20Gbps. Q.E.D.
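The same arithmetic gives the break-even packet size where the 30Mpps forwarding rate stops being the bottleneck (a quick check; wire-level preamble and inter-frame gap are ignored here):

```python
# At what packet size does a 30 Mpps forwarder saturate a 20 Gbps link?
LINE_RATE = 20e9   # bits/s
FWD_RATE = 30e6    # packets/s

breakeven = LINE_RATE / (FWD_RATE * 8)   # bytes per packet
print(f"break-even: {breakeven:.0f} B")  # ~83 B; below this, pps limits

# At 55-byte packets the card tops out at:
print(f"{FWD_RATE * 55 * 8 / 1e9:.1f} Gbps")  # 13.2 Gbps < 20 Gbps
```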

So this statement was not really accurate: "Routers and switches aren't generally affected by small frame sizes".


Not that it really matters, because everything that's bandwidth-intensive should be using larger packets anyway. That's the point: larger packets can dramatically decrease the load on these routers. Or, put another way, larger packets let us multiply the bandwidth per core. So larger packets are an easy way to scale bandwidth without increasing a router's processing power.


Exactly, which is why MTU is kept at 1500. Anything higher is either dropped or fragmented.


That's precisely what needs to be fixed in order to support jumbo packets. It seems logical to deploy jumbo frames at the same time as IPv6. IPv6 takes up more overhead, and so do tunnels like VPNs or GRE. Without a bump in packet size, we're actually losing room for the actual payload over time.
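The shrinking-payload point is easy to make concrete with standard header sizes (a sketch; the 4-byte figure is the base GRE header, and real tunnels often add more):

```python
# Payload left in a 1500-byte packet under common header stacks.
# Header sizes: IPv4 20B, IPv6 40B, TCP 20B, base GRE 4B (no options,
# extension headers, or ESP padding -- those only make it worse).
MTU = 1500

stacks = {
    "IPv4 + TCP":              20 + 20,
    "IPv6 + TCP":              40 + 20,
    "IPv4 + GRE + IPv4 + TCP": 20 + 4 + 20 + 20,
}

for name, overhead in stacks.items():
    payload = MTU - overhead
    print(f"{name:<26} {payload} B payload ({payload / MTU:.1%} of the MTU)")
```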

Edited 2015-01-23 09:19 UTC

Reply Score: 3

RE[5]: Very Interesting
by tony on Fri 23rd Jan 2015 14:53 UTC in reply to "RE[4]: Very Interesting"
tony Member since:
2005-07-06

tony,

"Actually it is accurate.


You are still overlooking the actual specs of these routers.

Here's a 20Gbps model, it does 30mpps.
http://andovercg.com/datasheets/cisco-20G-ethernet-services-cards-7...

If you think it can saturate it's link with 55byte packets, then you'd be wrong. 30mpps * 55B = 13.2Gbps. 13.2Gbps < 20Gbps Q.E.D.

So this statement was not really accurate: "Routers and switches aren't generally affected by small frame sizes".

"

Fair point, but at 1500 bytes, that card can forward at full line rate. It only runs into trouble below roughly the 85-byte packet size (20Gbps divided by 30Mpps works out to about 83 bytes per packet). Jumbo frames wouldn't help on that hardware. There's no forwarding performance benefit to that router in moving to frames larger than 1500 bytes. And that router is fairly old technology as well.



Not that it really matters, because everything that's bandwidth-intensive should be using larger packets anyway. That's the point: larger packets can dramatically decrease the load on these routers. Or, put another way, larger packets let us multiply the bandwidth per core. So larger packets are an easy way to scale bandwidth without increasing a router's processing power.


I think some of this disagreement might come down to the terminology we're using here. Typically when we talk about "processing power", it's in reference to a CPU. Modern switches and routers have CPUs (the control plane) of course, but the CPUs don't actually forward packets. Some smaller-end routers do (like a home router), but for the most part routers and switches have long since moved to distributed forwarding.

Much older switches/routers did actually forward via a main CPU. Something like an older Catalyst switch with a "route processor" is an example.

Packet processing is something different: it's usually referred to as the forwarding rate, and it's completely independent of a router/switch's CPU. At 1500 bytes, most (modern) routers and switches are typically well within their line rate for forwarding rates. Jumbo frames wouldn't help there.

Instead, jumbo frames are meant to help the end hosts, but again, with the advent of checksum offloading (one of the larger CPU hits) and multi-core, jumbo frames haven't typically provided a huge benefit at 1 or 10 Gbit like they might have for single-core, non-offloading 100 Mbit and 1 Gbit systems ten or more years ago.

The performance implications for a CPU-based forwarder (such as a Linux host) are very, very different than performance implications for a router or switch that does distributed ASIC forwarding.

Exactly, which is why MTU is kept at 1500. Anything higher is either dropped or fragmented.

" That's precisely what needs to be fixed in order to support jumbo packets. It seems logical to deploy jumbo frames at the same time as IPv6. IPv6 will takes up more overhead and so do other tunnels like VPNs or GRE. Without a bump in packet size we're actually loosing room for the actual payload over time.

"
At this point, for end hosts talking to the Internet, the rate of traffic isn't really affected by the smaller MTU. Servers today are barely saturating 10 Gbit links, and often much of that is storage traffic (which can be jumbo, whereas Internet-facing traffic usually cannot). My guess is a modern home system with a 300 Mbit Internet connection would see about zero benefit in terms of throughput or CPU overhead if it could communicate with the entire Internet at 9000 bytes versus the current ~1500-byte limit.

I haven't run any tests, but tests like this one from Chris Wahl show zero difference (and sometimes even performance drawbacks) from jumbo frames on 1 Gbit in a server environment. http://wahlnetwork.com/2013/03/25/do-jumbo-frames-improve-vmotion-p...

So for endpoints that communicate at even slower speeds, jumbo probably isn't much help there.

That could change with speeds higher than 10 Gbit, however. But only for endpoints. Switches that are communicating

Reply Score: 3

RE[6]: Very Interesting
by Alfman on Fri 23rd Jan 2015 22:48 UTC in reply to "RE[5]: Very Interesting"
Alfman Member since:
2011-01-28

tony,

Fair point, but at 1500 bytes, that card can forward at full line rate. It only runs into trouble below roughly the 85-byte packet size (20Gbps divided by 30Mpps works out to about 83 bytes per packet). Jumbo frames wouldn't help on that hardware. There's no forwarding performance benefit to that router in moving to frames larger than 1500 bytes. And that router is fairly old technology as well.


I'm glad you are seeing my point. Going back to what I was saying before, increasing the packet size allows us to get more bandwidth for the given processing power of a router core. In other words, it should make bandwidth (at 100Gbps and beyond) much cheaper.

Packet processing is something different: it's usually referred to as the forwarding rate, and it's completely independent of a router/switch's CPU. At 1500 bytes, most (modern) routers and switches are typically well within their line rate for forwarding rates. Jumbo frames wouldn't help there.


On my own network, jumbo packets do make a noticeable difference with file transfers, etc. My desktop NIC does hardware offload, but I don't think my NAS drives or laptops do. It's not just physical hardware that would benefit; even VM infrastructure (e.g., Amazon EC2) could benefit greatly from sending fewer, larger packets instead of more, smaller ones.

IMHO everything points to 1500B being too small for most of today's payloads; it just seems like we ought to be moving in a direction that corrects the 1500B limitation rather than just compensating for it with hardware that can process more and more 1500B packets.


So for endpoints that communicate at even slower speeds, jumbo probably isn't much help there.

That could change with speeds higher than 10 Gbit, however. But only for endpoints. Switches that are communicating


You say this, but the packets-per-second bottlenecks show up again when you aggregate the small packets from many peers. In order to support higher bandwidths, there are two options: continue investing in ever-faster router cores to forward small packets, or just increase the packet size to something more appropriate for today's large payloads.

The only benefit of 1500B is legacy compatibility. That is significant, but since network operators need to upgrade to IPv6 routers anyway, it's a perfect time to finally overcome the 1500B limitation as well.

Reply Score: 3

RE[7]: Very Interesting
by tony on Fri 23rd Jan 2015 23:43 UTC in reply to "RE[6]: Very Interesting"
tony Member since:
2005-07-06

tony,

"Fair point, but at 1500 bytes, that card can forward full line rate. It only runs into trouble at about the 650 packet size, assuming my math is correct. Jumbo frames wouldn't help on that hardware. There's no forwarding performance benefit to that router by moving to frames larger than 1500 bytes. And that router is fairly old technology as well.


I'm glad you are seeing my point. Going back to what I was saying before, increasing the packet size allows us to get more bandwidth for the given processing power of a router core. In other words, it should make bandwidth (at 100Gbps and beyond) much cheaper.
"

It is not larger packet sizes that will make networking cheaper for routers and switches. The primary cost is the optics, signaling, cabling, etc., and pushing signaling to higher and higher rates. Actual processing of packets is extremely cheap thanks to advances in merchant silicon like Broadcom's Trident II (which can handle line rate at the smallest packet sizes).

Again, scaling and performance issues for routers/switches are vastly different than for servers/CPU-based routers.

So there is zero benefit to routers and switches in having larger frames (especially since a larger MTU doesn't mean they won't deal with much smaller frames anyway).



"Packet processing is something different, which is usually referred to as forwarding rate, and completely independent of a router/switch's CPU. At 1500 bytes, most (modern) routers and switches are typically well within their line rate for forwarding rates. Jumbo frames wouldn't help there.


On my own network, jumbo packets do make a noticeable difference with file transfers, etc. My desktop NIC does hardware offload, but I don't think my NAS drives or laptops do. It's not just physical hardware that would benefit; even VM infrastructure (e.g., Amazon EC2) could benefit greatly from sending fewer, larger packets instead of more, smaller ones.

IMHO everything points to 1500B being too small for most of today's payloads; it just seems like we ought to be moving in a direction that corrects the 1500B limitation rather than just compensating for it with hardware that can process more and more 1500B packets.


So for endpoints that communicate at even slower speeds, jumbo probably isn't much help there.

That could change with speeds higher than 10 Gbit, however. But only for endpoints. Switches that are communicating


You say this, but the packets-per-second bottlenecks show up again when you aggregate the small packets from many peers. In order to support higher bandwidths, there are two options: continue investing in ever-faster router cores to forward small packets, or just increase the packet size to something more appropriate for today's large payloads.

The only benefit of 1500B is legacy compatibility. That is significant, but since network operators need to upgrade to IPv6 routers anyway, it's a perfect time to finally overcome the 1500B limitation as well.
"

The 1500-byte limit isn't a hardware limitation, as most hardware deployed now can do >1500 bytes; it's a limitation of both convention and MTU discovery. For larger-than-1500-byte frames to work, every device in the path would need to be configured to handle larger frames, from core routers and switches, to DSL and cable modems, to the server and client endpoints (the latter of which are almost always set to a 1500-byte MTU). Plus, you've got far too many places that break pMTU, which would cause lots of traffic black holes.

For private backhaul networks, it's possible to do jumbo. I think Internet2 does it. For the regular public Internet? There's just little to no benefit for the end hosts, and no benefit for the network devices in between.

Edited 2015-01-23 23:44 UTC

Reply Score: 3

RE[5]: Very Interesting
by Lennie on Fri 23rd Jan 2015 20:19 UTC in reply to "RE[4]: Very Interesting"
Lennie Member since:
2007-09-22

Certain types of (D)DoS attacks use lots of small packets. That can be helpful for an attacker because it needs less bandwidth. Usually, though, they'll just use an amplification attack, whereby they send a lot of data.
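The attraction of small packets for an attacker is simple pps arithmetic (assuming minimum-size 64B Ethernet frames plus the standard 20B of preamble and inter-frame gap):

```python
# Packets per second an attacker generates per Gbps of attack traffic,
# using minimum-size Ethernet frames.
WIRE_BYTES = 64 + 20   # minimum frame + preamble/inter-frame gap

pps_per_gbps = 1e9 / (WIRE_BYTES * 8)
print(f"{pps_per_gbps / 1e6:.2f} Mpps per Gbps")  # ~1.49 Mpps per Gbps
```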

Reply Score: 2

RE[4]: Very Interesting
by Lennie on Fri 23rd Jan 2015 18:51 UTC in reply to "RE[3]: Very Interesting"
Lennie Member since:
2007-09-22

In IPv6, Path MTU Discovery is handled by the endpoints and is thus supposed to be mandatory.

So once enough IPv6 has been deployed, in theory we can increase the standard MTU pretty easily.

But then the fun starts: convincing manufacturers to change the default on their products.

Which won't work well if you change the NIC before the switch.

So you probably want to start increasing the default MTU on switches first.
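In that endpoint-driven model, PMTUD boils down to backing off to the smallest link MTU on the path. A toy simulation (not the actual kernel algorithm, which reacts to ICMPv6 Packet Too Big messages carrying the constraining link's MTU):

```python
# Toy Path MTU Discovery: shrink the probe until it fits every link.
PATH_LINK_MTUS = [9000, 9000, 1500, 9000]   # hypothetical path

def discover_pmtu(probe_size, path):
    while True:
        bottleneck = min(path)          # the link that will reject us first
        if probe_size <= bottleneck:
            return probe_size           # probe got through end to end
        probe_size = bottleneck         # "Packet Too Big" told us its MTU

print(discover_pmtu(9000, PATH_LINK_MTUS))  # 1500
```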

Reply Score: 3

RE: Very Interesting
by galvanash on Fri 23rd Jan 2015 02:40 UTC in reply to "Very Interesting"
galvanash Member since:
2006-01-25

I agree that work needs to be done to reduce Linux kernel overhead for high-interrupt workloads. However, I'm wondering if now wouldn't be a good time to officially push to standardize on jumbo packets throughout the internet. 1500-byte packets have been holding us back for a while now; the limit comes from an Ethernet standard that was set decades ago, and it creates lots of unnecessary load on modern CPUs, routers, and switching equipment.


My understanding is that the issue has always been about error rates going up significantly with jumbo frames. TCP/UDP checksums become less effective as the frame size increases, so the undetected error rate goes up fast. To counteract that, you now need some other form of error checking (a CRC or something), and that is generally more expensive in the CPU-only case than the packet overhead was... catch-22.
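For reference, the checksum in question is the 16-bit ones'-complement Internet checksum (RFC 1071). It's position-insensitive and only 16 bits, so the more bytes each frame covers, the better the odds that corruption slips through. A minimal sketch of the algorithm:

```python
# The 16-bit Internet checksum (RFC 1071) used by TCP and UDP.
def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Two different payloads, same checksum (16-bit words swapped):
print(internet_checksum(b"\x12\x34\x56\x78"))
print(internet_checksum(b"\x56\x78\x12\x34"))  # identical: undetected swap
```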

Consider a 100MB video call or data stream: it takes about 69k 1500B packets to transmit after factoring in packet overhead. The same 100MB stream would require only 11k 9000B jumbo packets. That's 84% of the switching/interrupt load freed on each and every router in the path, just by switching to jumbo packets. Even simple webpages these days are significantly larger than 1500 bytes and would see tremendous benefit.


TSO (TCP segmentation offload in the NIC) takes care of the vast majority of the interrupt overhead (at the endpoints, at least), and routers and whatnot have hardware ICs to throw at the problem for the most part.

Not to say it isn't still an issue (it is), but it isn't as big an issue as it once was, and it has been partially solved through other mechanisms.

Reply Score: 4

RE[2]: Very Interesting
by Alfman on Fri 23rd Jan 2015 07:48 UTC in reply to "RE: Very Interesting"
Alfman Member since:
2011-01-28

galvanash,

My understanding is that the issue has always been about error rates going up significantly with jumbo frames. TCP/UDP checksums become less effective as the frame size increases, so the undetected error rate goes up fast. To counteract that, you now need some other form of error checking (a CRC or something), and that is generally more expensive in the CPU-only case than the packet overhead was... catch-22.


That's worth considering; however, the computers I oversee are often left on for weeks at a time with zero Ethernet errors. I can't say errors are impossible, but in my experience, with proper cabling they're extremely rare. Granted, the story may be different across a WAN. Anyway, it's discussed here:
http://noahdavids.org/self_published/CRC_and_checksum.html

For what it's worth, my cable modem shows this:

    Total Unerrored Codewords      2113042807
    Total Correctable Codewords    3437
    Total Uncorrectable Codewords  16

So it looks like there were 3453 errored codewords in 19 days, 99.5% of which were correctable.


Not to say it isn't still an issue (it is), but it isn't as big an issue as it once was, and it has been partially solved through other mechanisms.


It's true that offload engines can help reduce the CPU overhead, but we have to use the same sort of tricks in every component on the network to compensate for unnecessarily high packet rates. It just doesn't seem ideal to me.

Reply Score: 3

jumbo frames are an idea born of bad math
by TechGeek on Fri 23rd Jan 2015 03:25 UTC
TechGeek
Member since:
2006-01-14

Considering that out of 1500 bytes only about 65 are overhead, jumbo frames are not all that useful. Jumbo frames can't speed up the traffic; they can only cut down on the amount of overhead, which is at most about 5%. Add in the problems of error checking and equipment support, and you are doing a lot of work to pass a few more bytes.

In reality, you can go from about 100 MB/s to maybe 105 MB/s. Not really much of a gain. I would be much happier seeing 10 Gb getting cheap.
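That ballpark checks out against the standard goodput arithmetic (a quick sketch; TCP options are ignored, and 38B is the textbook per-frame on-wire overhead):

```python
# Theoretical TCP goodput on gigabit Ethernet at two MTUs.
# Per frame: 38B on the wire (preamble 8 + header 14 + FCS 4 +
# inter-frame gap 12) plus 40B of IPv4/TCP headers, no options.
LINK = 1e9            # bits/s
WIRE, IP_TCP = 38, 40

for mtu in (1500, 9000):
    goodput = LINK * (mtu - IP_TCP) / ((mtu + WIRE) * 8)
    print(f"MTU {mtu}: {goodput / 1e6:.1f} MB/s")
# MTU 1500: ~118.7 MB/s; MTU 9000: ~123.9 MB/s -- a gain of about 4%.
```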

Reply Score: 3

bryanv Member since:
2005-08-26

The idea behind jumbo frames isn't to make the physical interconnect faster (you can't go faster than 100Mb/s on a 100Mb/s network).

It's to reduce the impact of the CPU, the network stack, the software processing layers, etc., as a bottleneck.

That said, you're right. Modern equipment (2-4 years old) won't see much of a boost. But if you're in a situation where a 1-5% gain will let you limp existing hardware along for another year or two without violating QoS constraints, then it may be worth going jumbo.

Reply Score: 3

Video recording of the talk
by Lennie on Fri 23rd Jan 2015 03:31 UTC
Lennie
Member since:
2007-09-22
what about 10G
by bnolsen on Sun 25th Jan 2015 21:51 UTC
bnolsen
Member since:
2006-01-06

I'd be happy to see 10G at the consumer level. One of the many axes I have to grind with Intel is that their chipsets still don't support 10G natively. And the cost of 10G adapters is still silly high; the cost of 10G switches, even more so.

Reply Score: 2
