A few days ago Garrett Wollman published his exasperating experience of running IPv6 on large L2 subnets with Juniper EX4200 switches, concluding that “… much in IPv6 design and implementation has been botched by protocol designers and vendors …” (some of us would forcefully agree), making IPv6 “… simply unsafe to run on a production network …”
The resulting debate on Hacker News is quite interesting (and Andrew Yourtchenko is trying hard to keep it close to facts) and definitely worth reading… but is ND/MLD really as broken as some people claim it is?
What? What multicast?
Just in case you’re not familiar with the intricate details of IPv6: MLD stands for Multicast Listener Discovery, and if you want to ask “why do I need multicast to get neighbor discovery?”, you’re not alone. After all, ARP in IPv4 works quite well with the broadcast MAC address… or so people think until they’re faced with the reality of a broadcast storm on an oversized IPv4 subnet.
I’ve heard of people running 10K+ hosts in the same subnet. Let’s be diplomatic and say that they’re somewhat stretching the design limits of Ethernet and IPv4.
IPv6 protocol designers tried to solve the problem of every host on the subnet getting involved in ARP processing by using a range of L2+L3 multicast groups. An IPv6 address is hashed into a solicited-node multicast IPv6 address (which is further mapped into a multicast MAC address), and the ND queries (ARP requests in IPv4 lingo) are sent to the IPv6/MAC multicast addresses associated with the target IPv6 address instead of the broadcast MAC address. You’ll find more details in RFC 4861 and in my IPv6 webinars.
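If you want to see the mapping in action, here’s a minimal Python sketch (the unicast address is purely illustrative) implementing the RFC 4291 solicited-node mapping and the RFC 2464 IPv6-multicast-to-MAC mapping:

```python
import ipaddress

def solicited_node_address(addr: str) -> ipaddress.IPv6Address:
    """RFC 4291: copy the low 24 bits of a unicast address into
    the ff02::1:ff00:0/104 solicited-node multicast prefix."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(
        int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24)

def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """RFC 2464: prepend 33:33 to the low 32 bits of the IPv6 group."""
    low32 = int(group) & 0xFFFFFFFF
    return "33:33:" + ":".join(
        f"{(low32 >> s) & 0xFF:02x}" for s in (24, 16, 8, 0))

group = solicited_node_address("2001:db8::dead:beef:cafe")
print(group)                 # ff02::1:ffef:cafe
print(multicast_mac(group))  # 33:33:ff:ef:ca:fe
```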
In a properly designed network with properly implemented host stacks using NICs capable of hardware-based multicast filtering, the idea actually reduces the unnecessary load on the end hosts.
You did notice the many conditionals in the previous paragraph, didn’t you?
Going a step further, the protocol designers tried to reduce the link load in switched L2 networks – MLD snooping (almost identical to IGMP snooping) allows L2 switches to optimize the distribution trees for multicast MAC addresses, delivering the ND queries only to the hosts that expect them (without MLD snooping, the ND queries are delivered to all hosts and filtered out by the NIC).
If you want MLD snooping to work, you obviously have to prompt the hosts to report what they expect to receive – an IPv6 router attached to an L2 subnet has to generate periodic MLD queries to help the L2 switches discover multicast-enabled end hosts.
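To illustrate what the switch gains from those reports, here’s a toy Python model of an MLD snooping table – the port number is made up and the logic is heavily simplified compared to any real implementation, but the aging timeout follows the MLDv1 defaults:

```python
from collections import defaultdict
from time import time

# Aging timeout per group/port entry: roughly 2 x query interval (125 s)
# plus the maximum response delay (10 s) - the MLDv1 defaults.
GROUP_TIMEOUT = 260

class SnoopingTable:
    """Toy MLD snooping table: group -> {port: last_report_time}."""
    def __init__(self):
        self.groups = defaultdict(dict)

    def report(self, group, port):
        # A host behind 'port' sent an MLD report for 'group' -
        # refresh (or create) the forwarding entry.
        self.groups[group][port] = time()

    def ports_for(self, group):
        # Deliver a frame sent to 'group' only to ports whose
        # listeners' reports haven't aged out.
        now = time()
        return [p for p, t in self.groups[group].items()
                if now - t < GROUP_TIMEOUT]

table = SnoopingTable()
table.report("ff02::1:ffef:cafe", port=7)    # solicited-node group join
print(table.ports_for("ff02::1:ffef:cafe"))  # [7] - ND query goes to one port
```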
What went wrong?
It’s hard to figure out what exactly went wrong in the CSAIL network, as we’re probably missing loads of details, but there are a few easy conclusions to draw:
- A design with numerous VLANs spread all over the place and connected to a single set of L3 switches is a single failure domain (I couldn’t resist writing this down);
- Don’t blame the protocol if your vendor lacks feature parity on pretty baseline features, or is late in implementing them. Several comments on the original article mentioned using DHCPv6 to get rid of multiple IPv6 privacy addresses per host, but of course you need a working DHCPv6 relay for that;
- Don’t blame the protocol when you hit implementation bugs;
- IPv6 deployment is like any other major new technology deployment: do a proper design, run a pilot project, and then gradually deploy the new technology (expecting some hiccups on the way). Everyone who attended my Enterprise IPv6 101 webinar is well aware of that ;)
- Don’t expect to just turn on IPv6 if you’re already stretching the limits of your existing technologies or your gear (see also this email if you need a hefty dose of sarcasm).
- Expect to do some tuning when implementing designs that are outside of the “normal” usage (for whatever value of “normal”). You wouldn’t expect to run 1000 routers in an OSPF area without some heavy Tuning-Fu, would you?
- Older platforms with $0.02 CPUs might get overloaded when processing MLD replies (assuming I understand the numbers from the blog post correctly, the EX4200 experienced problems at around 300 MLD replies per second – see the back-of-the-envelope sketch below this list);
- Control-plane protection (CoPP) – assuming it works for IPv6 on your platform – is there for a reason;
- CPU overload (or any other control-plane glitch) will result in STP breakdown and forwarding loops (hint: MLAG might help – at least you won’t get the forwarding loops);
- Check the table sizes of your switches during your IPv6 readiness audit (you did plan to do an audit before deploying IPv6, didn’t you?). The EX4200 is not one of the worst offenders – the EX4500 and EX4550 have just 1K IPv6 ND entries (all three switches also have dismal IPv6 forwarding table sizes).
You’ll find table sizes for most data center switches produced by major vendors (including EX-series from Juniper) in my Data Center Fabrics webinar.
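To put those 300 MLD replies per second in perspective, here’s a back-of-the-envelope sketch in Python. The host count and the number of groups per host are pure assumptions; only the 10-second maximum response delay is the MLDv1 default:

```python
# Back-of-the-envelope MLD report rate; input numbers are assumptions.
hosts = 5000                # hosts in the L2 domain (assumed)
groups_per_host = 3         # ~1 solicited-node group shared by link-local +
                            # EUI-64 global, plus one per privacy address
max_response_delay = 10.0   # MLDv1 default maximum response delay (seconds)

# Solicited-node groups are essentially unique per host, so MLDv1 report
# suppression doesn't kick in: every host answers a general query itself.
reports_per_query = hosts * groups_per_host
print(f"{reports_per_query} reports per general query, "
      f"~{reports_per_query / max_response_delay:.0f} replies/second")
# 15000 reports per general query, ~1500 replies/second
```

Even with modest assumptions, a single general query on an oversized subnet generates a reply burst well beyond what a small control-plane CPU handles gracefully.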
On the topic of table sizes:
- You do need an ND entry in an L3 switch for every active address (active = communicating with the L3 switch) on every IPv6 host – that would be at least 2 addresses per host, and potentially many more due to the way some mobile devices implement privacy extensions (see the quick arithmetic after this list);
- You SHOULD NOT need an IPv6 multicast entry for every ND multicast group – after all, these groups are link-local, so there’s no need for the L3 switch to keep their state;
- Likewise, your L2 switches SHOULD NOT use IPv6 entries (assuming they have them) for IPv6 multicast filters; as they’re L2 switches, they should use the same multicast MAC forwarding hardware they use for IPv4 (or any other protocol).
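Here’s the quick arithmetic – the host count and the number of addresses per host are assumptions; only the 1K ND table limit comes from the numbers above:

```python
# ND table consumption estimate; hosts and addresses per host are assumptions.
hosts = 1000
addresses_per_host = 2 + 3   # link-local + SLAAC global, plus ~3 temporary
                             # privacy addresses on a typical mobile device
nd_entries_needed = hosts * addresses_per_host
nd_table_size = 1024         # IPv6 ND entries on EX4500/EX4550

print(f"worst case: {nd_entries_needed} ND entries, "
      f"hardware table holds {nd_table_size}")
# worst case: 5000 ND entries, hardware table holds 1024
```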
On a tangential note, I'm not claiming that the privacy addresses and numerous addresses per interface are the best idea ever. Likewise, I'm not exactly a fan of SLAAC in tightly-controlled enterprise networks, but these opinions aren't exactly relevant in the context of this blog post.
Workarounds
If everything else fails, it’s time for a MacGyver stunt (note: don’t do this at home). If you’re not using MLD snooping on the L2 switches, there’s no need for frequent MLD queries – raise the MLD query interval to the maximum value your gear supports.
Also, if you don’t use IPv6 multicast, I don't see any need to run MLD on the LAN – turn it off if you can… or maybe I'm missing something obvious, in which case please write a comment.
For more information on IPv6 multicast, ND and MLD, go through this presentation. Also note that we wouldn't have this problem if we used L3-only IPv6 forwarding.