NAT is evil, but not bad

2011-09-20: Edited to add section about IPv6 options; minor cleanup; references added.

This is kind of a follow-on from my post about the subnet addressing design differences between IPv4 and IPv6. Recently, Tom Hollingsworth started a little Twitter conversation about NAT where i mentioned that i liked NAT for the purpose of decoupling my internal and external address spaces; 140-character limits got in my way there, and i realised i needed to clarify my logic more, so this is my attempt to do that.  I’m very interested in feedback – have i missed something important?

A bit of context

I’ve never worked for a service provider and i don’t work in large data centres at the moment.  So i don’t have in mind huge, publicly-addressed networks.  I have in mind “corporate” or “enterprise” networks, which might include campus networks on one site with a few thousand ports, or organisations spread across 40 or 50 sites. In such organisations, the “data centre” might comprise something like 4 or 5 racks, usually on one or two sites, with maybe 100-200 gigabit ports or so.

Exposing only what is necessary 

If i have a network of, say, 2000 devices, including desktops, servers, printers, tablets, mobiles, etc., there is a variety of different access requirements.  The servers which largely serve clients on the LAN or internal WAN have limited web access requirements.  Some clients might talk to local servers for most of their applications.  For other clients (especially mobile devices), accessing the web (and perhaps email) is the only thing they need to do.  Another whole range of devices (printers, security cameras, etc.) have no need for inbound or outbound Internet traffic at all – if they need updates or configuration changes, that usually happens through a local management server.

For performance, bandwidth control, security, and auditing purposes, web browsing on most of these devices is forced through a local proxy server. Doing this eliminates most reasons for client devices to directly contact any system in the outside world. This significantly changes the security posture of the devices in question (cf. Greg Ferro’s comments in Packet Pushers #47 about inline load balancers allowing the web servers they balance to have no default route).  Of course, that’s not perfect security, and we still have to be careful that we’re doing the right checks in the proxy server, but it cuts out a whole range of possible attack vectors, with the result that only a tiny portion of a corporate network actually needs to be addressable globally.  This is not in itself justification for NAT, but rather justification for exposure of only a small external address range.
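The exposure argument can be sketched as a simple egress policy.  This is a minimal illustration only: the role names and the `DIRECT_INTERNET_ROLES` set (a proxy plus a hypothetical mail relay) are assumptions, not taken from any real network.  Everything else is assumed to go through the proxy or to talk only to local servers, so only a handful of devices ever need a global presence.

```python
from dataclasses import dataclass

# Hypothetical roles allowed to open connections to the Internet directly.
# Everything else goes through the proxy or talks only to local servers.
DIRECT_INTERNET_ROLES = {"proxy", "mail-relay"}


@dataclass(frozen=True)
class Device:
    name: str
    role: str  # e.g. "desktop", "printer", "camera", "proxy"


def needs_global_address(device: Device) -> bool:
    """True if the device itself talks to hosts in the outside world."""
    return device.role in DIRECT_INTERNET_ROLES


def exposed(devices: list[Device]) -> list[Device]:
    """The (small) subset of the network that must be addressable from outside."""
    return [d for d in devices if needs_global_address(d)]


if __name__ == "__main__":
    fleet = [Device(f"pc{i}", "desktop") for i in range(1990)]
    fleet += [Device(f"prn{i}", "printer") for i in range(8)]
    fleet += [Device("proxy1", "proxy"), Device("mx1", "mail-relay")]
    print(len(fleet), len(exposed(fleet)))  # 2000 2
```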

Internal addressing plans

I haven’t yet seen a corporate IP addressing plan that didn’t use the organisational unit, or the geographical location, or both.  In many cases, they are the only real-world entities represented in the 2nd or 3rd IPv4 octet, and each gets a whole octet even if there are far fewer than 256 organisational units or locations.  This is a little inefficient, and I’m sure that if everyone thought in binary, we could pack things in there and save 3 or 4 bits in many cases, but for the most part it’s a good practice because it saves support costs by allowing everyone to use 8-bit boundaries.  (I suspect when we go to IPv6 people will work on 16-bit boundaries, and burn even more bits on internal subnet addressing.)
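To make the octet-per-entity idea concrete, here is a sketch of one such plan.  The `10.<site>.<unit>.0/24` layout and the helper names are hypothetical.  The point it shows is that anyone can read the site and unit straight off an address, at the cost of a couple of unused bits per octet.

```python
import ipaddress


def bits_needed(count: int) -> int:
    """Smallest number of bits that can number `count` distinct things."""
    return max(1, (count - 1).bit_length())


def subnet_for(site: int, unit: int) -> ipaddress.IPv4Network:
    """Hypothetical plan: 10.<site>.<unit>.0/24, one whole octet per real-world entity."""
    if not (0 <= site <= 255 and 0 <= unit <= 255):
        raise ValueError("site and unit must each fit in one octet")
    return ipaddress.IPv4Network(f"10.{site}.{unit}.0/24")


def decode(addr: str) -> tuple[int, int]:
    """Read (site, unit) straight back out of any host address in the plan."""
    octets = ipaddress.IPv4Address(addr).packed
    return octets[1], octets[2]


if __name__ == "__main__":
    print(subnet_for(12, 3))     # 10.12.3.0/24
    print(decode("10.12.3.57"))  # (12, 3)
    # 50 sites fit in 6 bits, so an 8-bit octet spends 2 spare bits on readability.
    print(8 - bits_needed(50))   # 2
```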

The relevance of this to the NAT question is that most corporate networks would prefer that the internal structure of the network is not disclosed when client PCs contact outside addresses during day-to-day tasks, and NAT achieves this rather nicely. Of course, any determined attacker can learn lots about clients by passively watching their traffic, but funneling client traffic through a NAT gateway is one component of the solution.
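A toy model of many-to-one translation (port address translation) shows why the internal structure disappears from view: hosts in unrelated internal subnets all appear as one outside address, distinguished only by an arbitrary port.  This is a sketch only; the public address is a documentation address chosen for illustration, and timeouts, port exhaustion and protocol details are ignored.

```python
import ipaddress
import itertools


class OutboundPat:
    """Minimal many-to-one (port address) translation of outbound flows.

    Ignores timeouts, port exhaustion and protocol details; it only shows
    what an outside observer gets to see.
    """

    def __init__(self, public_addr: str, first_port: int = 1024):
        self.public_addr = ipaddress.IPv4Address(public_addr)
        self._ports = itertools.count(first_port)
        self._map: dict[tuple[ipaddress.IPv4Address, int], int] = {}

    def translate(self, src_ip: str, src_port: int) -> tuple[ipaddress.IPv4Address, int]:
        """Return the (address, port) the outside world sees for this inside flow."""
        key = (ipaddress.IPv4Address(src_ip), src_port)
        if key not in self._map:
            self._map[key] = next(self._ports)
        return self.public_addr, self._map[key]


if __name__ == "__main__":
    nat = OutboundPat("203.0.113.10")  # documentation address, assumed
    for inside in [("10.12.3.57", 51000), ("10.40.1.9", 51000), ("10.7.200.4", 33333)]:
        print(inside, "->", nat.translate(*inside))
```

The 2nd and 3rd octets that encode location and organisational unit never leave the building; an observer only learns that "something behind 203.0.113.10" made a connection.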

NAT not a security mechanism?

It’s almost a truism in the networking industry that “NAT is not a security mechanism”.  This is at least somewhat true: a great deal can still be discovered about a host behind a NAT gateway using passive packet sniffing, and if a vulnerable service is exposed through a port forward, then all bets are off.  But in one sense, saying that NAT is not a security mechanism is a misrepresentation, because NAT provides a significant level of protection against active attacks.

For example, if a Windows PC’s file sharing service is open on the internal network but it’s behind a NAT gateway, it cannot be compromised by external hosts through a buffer overrun vulnerability in its SMB protocol handler.  Similarly, if a server has an ssh daemon which allows password-based access, it cannot be compromised by the (very common) ssh password brute-forcing worms that infest the Internet if it’s behind a NAT gateway which does not port-forward to that ssh daemon.  So whilst NAT is not a tool designed to provide security, the address space conservation that it’s designed for also provides some security against common types of attack as a useful by-product.

Most of the discussion about hating on NAT in Packet Pushers episode #61 (starting at about the 40 minute mark) was set in the context of a web hosting or large data centre environment (to which the issue of public vs. private address space does not apply), and assumed that those who deploy NAT do so along with thoughtless port forwarding and without suitable DMZ design. [1]  But NAT and poor network security design need not go hand-in-hand.

NAT fails closed, not open

One aspect of NAT makes it desirable from a security perspective, and this is why the majority of SOHO routers in the world are deployed with NAT enabled by default: NAT is closed to outside access by default.  That is, unless you take active steps to open up outside access to ports and/or hosts behind a NAT gateway, their normal TCP and UDP ports cannot be accessed.  I don’t dispute the possibility of attacks which could exploit weaknesses in the packet forwarding algorithms used by NAT gateways in order to attack the hosts behind them, nor am i suggesting that spear phishing or drive-by downloads are not a significant risk to those hosts, or that the security of the gateway itself is anything less than essential.  But these risks apply equally to hosts behind routed firewalls.
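The "closed by default" behaviour comes from the inbound lookup: a packet from outside is only delivered if it matches a flow that an inside host opened, or a port forward that someone deliberately configured.  The sketch below assumes address-and-port-dependent filtering (the strictest common behaviour); real gateways vary in how strictly they match, and the class and method names are hypothetical.

```python
from typing import Optional

Endpoint = tuple[str, int]


class NatGateway:
    """Sketch of a NAT gateway's inbound decision.

    Assumes address-and-port-dependent filtering: a packet from outside is
    let in only if it matches a flow an inside host opened, or an explicit
    port forward. Real gateways vary in how strict this matching is.
    """

    def __init__(self, first_port: int = 40000):
        self.flows: dict[int, tuple[Endpoint, Endpoint]] = {}  # public port -> (inside, remote)
        self.forwards: dict[int, Endpoint] = {}               # public port -> inside (opened deliberately)
        self._next_port = first_port

    def outbound(self, inside: Endpoint, remote: Endpoint) -> int:
        """An inside host opens a connection; allocate a public port for it."""
        port = self._next_port
        self._next_port += 1
        self.flows[port] = (inside, remote)
        return port

    def inbound(self, remote: Endpoint, public_port: int) -> Optional[Endpoint]:
        """Return the inside endpoint to deliver to, or None to drop."""
        flow = self.flows.get(public_port)
        if flow is not None and flow[1] == remote:
            return flow[0]
        return self.forwards.get(public_port)  # None unless someone opened it up


if __name__ == "__main__":
    gw = NatGateway()
    p = gw.outbound(("10.1.2.3", 51000), ("198.51.100.7", 443))
    print(gw.inbound(("198.51.100.7", 443), p))    # reply to our own flow: delivered
    print(gw.inbound(("192.0.2.66", 40123), 445))  # unsolicited SMB probe: dropped
    print(gw.inbound(("192.0.2.66", 40123), p))    # stranger hitting a live port: dropped
```

Note that the only way to make an unsolicited packet land on an inside host is to add an entry to `forwards`, which is exactly the "active step" described above.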

Designing for failure is part of good network design, and in many (most?) corporate networks, it’s preferable to fail closed rather than open.  On a NAT gateway, if there is a failure in the routing or firewalling engine, only one host remains open to external attack: the gateway itself.  On the other hand, if a routed firewall’s ACLs fail to be applied for any reason – say, during a system restart after a software update – the default behaviour of many operating systems is that routing keeps working even if the firewall does not.  So in a failure scenario, NAT’s security posture is more desirable than that of a similarly-configured non-NATed network.
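The failure comparison can be reduced to two small decision functions.  This is a deliberately simplified model, not a description of any particular product: it assumes the routed firewall's forwarding path comes up independently of its ACLs, and that the NAT gateway's inside addresses are private, so without a translation entry (dynamic or static) there is simply no path in.

```python
def routed_firewall_passes(route_exists: bool, acl_loaded: bool, acl_permits: bool) -> bool:
    """Routed firewall: forwarding and filtering are separate functions."""
    if not route_exists:
        return False
    if not acl_loaded:
        return True  # filter missing, but routing still works: fails open
    return acl_permits


def nat_gateway_passes(translation_exists: bool) -> bool:
    """NAT gateway with private inside addresses: no translation entry means no path in."""
    return translation_exists


if __name__ == "__main__":
    # Scenario from the text: the box has just restarted after a software update,
    # routes are up but the ACLs (or NAT state) have not been loaded yet.
    print("routed firewall:", routed_firewall_passes(True, acl_loaded=False, acl_permits=False))
    print("NAT gateway:    ", nat_gateway_passes(translation_exists=False))
```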

Similarly, if i make a mistake in specifying a netmask on an ACL in a routed network (as a colleague recently did on a client’s network), i might accidentally allow outside access to double the number of systems i intended to.  Using NAT means that i’m less likely to do this, because such ACLs usually only apply in an outbound direction.
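The arithmetic behind a one-bit netmask slip is easy to check with Python's standard `ipaddress` module.  The prefixes below are documentation addresses chosen for illustration; the example assumes the intended rule covered a /25 and the mistaken one a /24.

```python
import ipaddress

intended = ipaddress.ip_network("198.51.100.0/25")  # the 128 addresses we meant to permit
mistyped = ipaddress.ip_network("198.51.100.0/24")  # the same rule with the mask one bit too short

print(intended.num_addresses, mistyped.num_addresses)  # 128 256
print(list(mistyped.address_exclude(intended)))        # [IPv4Network('198.51.100.128/25')]
```

Every bit dropped from the prefix length doubles the number of addresses the rule matches, and `address_exclude` shows exactly which block was opened up by accident.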

NAT simplifies problems where scale overwhelms the administrator

This is the part where the networking high-flyers are going to start laughing at me.  But please, read and understand first.  There are factors in many organisations (usually at layer 8 or 9 of the OSI network model) that mean that we don’t always have access to the best people.  Someone with a deep understanding of how all the components of a network hang together is actually hard to come by in many places.

For those of us who are left, NAT is a helpful tool in cutting down the size of a network design or management problem from immense to manageable.  If we can provide Internet access to a large number of systems using a much smaller number of external addresses, we will have a much greater chance of understanding the configuration and producing a good result for our employers and/or clients.

But the naysayers are still right…

In many cases, NAT is only an obscurity mechanism which is fundamentally a waste of time in terms of security.  It adds complexity to the troubleshooting process, often for no additional value. But NAT can and in many cases should be part of a network administrator’s toolkit, when applied correctly.

Thinking IPv6

How this applies to IPv6 is where i start to get uneasy.  The internal-external decoupling that NAT provides seems not to be on the radar for IPv6.  The suggestions i’ve seen so far are either to use unique local addressing internally and do one-to-one translation between these and provider-independent addresses at the border router (which seems to me to provide no benefit at all over straight routed firewalling), or to use only unique local addresses and not bother with providing external addresses for corporate end-user PCs at all [2] (which will cease being practical as soon as the sales manager decides he or she needs Skype).
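The first option, one-to-one translation between a unique local prefix and a provider-independent prefix, amounts to swapping the top 48 bits of each address.  The sketch below assumes a hypothetical ULA /48 and uses the `2001:db8::/32` documentation range as a stand-in for the PI prefix.  It is a plain prefix swap, not the checksum-neutral mapping defined for NPTv6 in RFC 6296.

```python
import ipaddress

ULA_PREFIX = ipaddress.IPv6Network("fd12:3456:789a::/48")  # hypothetical ULA /48
PI_PREFIX = ipaddress.IPv6Network("2001:db8:1234::/48")    # documentation prefix standing in for a PI /48


def swap_prefix(addr: ipaddress.IPv6Address,
                src: ipaddress.IPv6Network,
                dst: ipaddress.IPv6Network) -> ipaddress.IPv6Address:
    """Replace src's prefix bits with dst's, keeping subnet ID and interface ID unchanged."""
    if src.prefixlen != dst.prefixlen:
        raise ValueError("prefixes must be the same length for a 1:1 mapping")
    if addr not in src:
        raise ValueError(f"{addr} is not inside {src}")
    host_bits = int(addr) & int(src.hostmask)  # everything below the /48
    return ipaddress.IPv6Address(int(dst.network_address) | host_bits)


if __name__ == "__main__":
    inside = ipaddress.IPv6Address("fd12:3456:789a:c03::57")
    outside = swap_prefix(inside, ULA_PREFIX, PI_PREFIX)
    print(outside)                                      # 2001:db8:1234:c03::57
    print(swap_prefix(outside, PI_PREFIX, ULA_PREFIX))  # fd12:3456:789a:c03::57
```

Because the subnet ID (`c03` here) passes through untouched, an outside observer sees exactly the same internal structure as with no translation at all, which is why this option looks no better than straight routed firewalling.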

[1] When listening to that episode, one could be forgiven for thinking that connection tracking of FTP had never been invented…

[2] At about the 9:00 mark in the video.


Pondering subnet allocations

Edit, 2011-05-03: To all those poor souls who have been directed here by Google in their search for best practices on IPv4 and/or IPv6 subnet allocations (or worse, the HP A5500’s NAT capabilities), please accept my sincere apologies.  This page is more about asking questions than providing answers.

Edit, 2011-08-07: Network World has an interesting blog post by Jeff Doyle talking about issues in IPv6 address space design. Good reading.

This is my third go at writing this post.  I started in the middle of the night, because i woke up with IPv4 allocations and VLAN assignments running around in my head and couldn’t get back to sleep.  After writing what seemed to me a reasonably coherent post, i accidentally hit the back button instead of the left arrow (surely they could have found somewhere better to put that on the ThinkPad keyboard).  Dismal failure 1 for the day.  After that i just threw a few notes in here as a draft and went back to bed.

I’m in the middle of a network redesign for a major client, a medium-sized K-12 private school.  We have about 70 switches, and a little over 2000 ports.  It’s nowhere near the scale of a university, enterprise data centre, or service provider network, but it requires significantly more design, planning, and implementation effort than your average small network.

The campus houses a few loosely-coupled related entities over 25 or so buildings, all connected by Gigabit fibre.  A few years ago when we upgraded the phone system and switched to VoIP, i made an allocation plan for subnets using the IPv4 space.  We have quite a few VLANs, using /16 and /24 subnet sizes.

The network upgrade i’m working on has a number of goals: getting all client systems off the server VLAN (which has been progressing slowly over the last 18-24 months), providing redundant routing using a new pair of HP A5500 switches using IRF (HP/3Com’s equivalent to Cisco stacking), and moving routing between VLANs from an old cluster of Linux servers to the new switches.

At the same time, i’m planning a move from switch-based VLANs to building-based VLANs, and i thought to myself: since we’re going to need IPv6 on the outside network soon, i’d better make sure my new plan allows for IPv6 on the inside.  I want to keep the IPv6 structure pretty much identical to IPv4, since our subnet plan mirrors the physical structure of the network.

Selecting the subnet size on IPv6 is easy, since it’s pretty much fixed to /64 (insert appropriate mind-boggling about why we would want to burn half of our addressing bits on the local subnet here), but there’s another complication: because there’s no NAT (yet), my IPv6 subnet plan must fit within our external address range.  This is a big difference for many (most?) organisations using IPv4 only: at the moment, we have complete logical decoupling of our internal and external address ranges; under IPv6 we must tie the two together.
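The new constraint can be expressed as a check on the plan: every internal subnet must be a /64 and must sit inside the externally assigned prefix.  This is a sketch assuming a hypothetical `/48` allocation (written with the `2001:db8::/32` documentation range); under IPv4 with NAT, no equivalent check against the outside range was needed.

```python
import ipaddress

ALLOCATION = ipaddress.IPv6Network("2001:db8:1234::/48")  # stand-in for the organisation's assigned prefix


def check_plan(subnets: list[str]) -> list[ipaddress.IPv6Network]:
    """Return the planned subnets that are not /64s or fall outside the allocation."""
    problems = []
    for s in map(ipaddress.IPv6Network, subnets):
        if s.prefixlen != 64 or not s.subnet_of(ALLOCATION):
            problems.append(s)
    return problems


if __name__ == "__main__":
    plan = [
        "2001:db8:1234:c03::/64",  # fine
        "2001:db8:9999:1::/64",    # outside our allocation
        "2001:db8:1234:100::/56",  # inside, but not a /64
    ]
    for bad in check_plan(plan):
        print("problem:", bad)
```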

This is my big concern with the lack of NAT in IPv6: it places constraints on internal network design that do not exist in the IPv4+NAT world.  I don’t dispute the wisdom of the designers in leaving out NAT – it is unquestionably a complicating hack.  But in my limited understanding of IPv6, i’m not aware of an equivalent to the useful part of IPv4 NAT (the internal/external address decoupling).  When we implement IPv6, i’m guessing that i’ll implement one-to-one address translation at our network edge to achieve equivalent functionality.

So what happens with our external address space?  I’m not fully clear on APNIC’s IPv6 allocation rules, but as far as i can tell, an existing holder of an IPv4 /23 can expect a maximum IPv6 allocation of a /48.  This means that we would have 16 bits of subnets, which is exactly the same as the number of /24s we have available in our internal IPv4 address space.  My first reaction to this was, “Sweet – i’ll use the exact same subnet numbers in hexadecimal, and i’ll have my IPv6 subnet plan.”  But i wonder whether that’s all there is to it.  And having exactly the same number of subnets at our disposal doesn’t seem like much of a leap forward in terms of protocol functionality…
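One reading of "the exact same subnet numbers in hexadecimal" is to take the 2nd and 3rd octets of an internal `10.X.Y.0/24` and use them as the 16-bit subnet ID under the /48.  The sketch below assumes that interpretation, a `10.0.0.0/8` internal range, and a documentation prefix standing in for the real /48.  The /16 VLANs in the existing plan would need their own decision; this sketch simply rejects them.

```python
import ipaddress

V6_SITE = ipaddress.IPv6Network("2001:db8:1234::/48")  # documentation prefix standing in for our /48
V4_INTERNAL = ipaddress.IPv4Network("10.0.0.0/8")       # assumed internal IPv4 range


def v6_subnet_for(v4_subnet: str) -> ipaddress.IPv6Network:
    """Map 10.X.Y.0/24 to <site /48>:XXYY::/64 (2nd and 3rd octets become the subnet ID in hex)."""
    net = ipaddress.IPv4Network(v4_subnet)
    if net.prefixlen != 24 or not net.subnet_of(V4_INTERNAL):
        raise ValueError("sketch only handles /24s inside 10.0.0.0/8")
    _, x, y, _ = net.network_address.packed
    subnet_id = (x << 8) | y
    # Place the 16-bit subnet ID in bits 64..79, i.e. the 4th group of the IPv6 address.
    base = int(V6_SITE.network_address) | (subnet_id << 64)
    return ipaddress.IPv6Network((base, 64))


if __name__ == "__main__":
    print(v6_subnet_for("10.12.3.0/24"))  # 2001:db8:1234:c03::/64
    # A /48 holds 2**16 /64s, and 10.0.0.0/8 holds 2**16 /24s: the same count.
    print(2 ** (64 - 48) == 2 ** (24 - 8))  # True
```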

What are other people’s thoughts?  Are the issues for IPv6 subnet allocation different from IPv4?  Is there a best practice for this sort of thing?