2011-09-20: Edited to add section about IPv6 options; minor cleanup; references added.
This is kind of a follow-on from my post about the subnet addressing design differences between IPv4 and IPv6. Recently, Tom Hollingsworth started a little Twitter conversation about NAT where i mentioned that i liked NAT for the purpose of decoupling my internal and external address spaces; 140-character limits got in my way there, and i realised i needed to clarify my logic more, so this is my attempt to do that. I'm very interested in feedback - have i missed something important?
A bit of context
I've never worked for a service provider and i don't work in large data centres at the moment. So i don't have in mind huge, publicly-addressed networks. I have in mind "corporate" or "enterprise" networks, which might include campus networks on one site with a few thousand ports, or organisations spread across 40 or 50 sites. In such organisations, the "data centre" might comprise something like 4 or 5 racks, usually on one or two sites, with maybe 100-200 gigabit ports or so.
Exposing only what is necessary
If i have a network of, say, 2000 devices (desktops, servers, printers, tablets, mobiles, etc.), there are a variety of different access requirements. The servers which largely serve clients on the LAN or internal WAN have limited web access requirements. Some clients might talk to local servers for most of their applications. For other clients (especially mobile devices), accessing the web (and perhaps email) is the only thing they need to do. Another whole range of devices (printers, security cameras, etc.) have no need for inbound or outbound Internet traffic at all - if they need updates or configuration changes, that usually happens through a local management server.
For performance, bandwidth control, security, and auditing purposes, web browsing on most of these devices is forced through a local proxy server. Doing this eliminates most reasons for client devices to directly contact any system in the outside world. This significantly changes the security posture of the devices in question (cf. Greg Ferro's comments in Packet Pushers #47 about inline load balancers allowing the web servers they balance to have no default route). Of course, that's not perfect security, and we still have to be careful that we're doing the right checks in the proxy server, but it cuts out a whole range of possible attack vectors, with the result that only a tiny portion of a corporate network actually needs to be addressable globally. This is not in itself justification for NAT, but rather justification for exposure of only a small external address range.
Internal addressing plans
I haven't yet seen a corporate IP addressing plan that didn't use the organisational unit, or the geographical location, or both. In many cases, they are the only real-world entities represented by the 2nd or 3rd IPv4 octet, even if there are not 256 organisational units or locations. This is a little inefficient, and i'm sure that if everyone thought in binary, we could pack things in there and save 3 or 4 bits in many cases, but for the most part it's a good practice because it saves support costs by allowing everyone to use 8-bit boundaries. (I suspect when we go to IPv6 people will work on 16-bit boundaries, and burn even more bits on internal subnet addressing.)
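To make the octet-boundary trade-off concrete, here's a rough sketch using Python's standard ipaddress module. The 10.<site>.<unit>.0/24 layout, the function names, and the site counts are all made up for illustration; they aren't taken from any real plan.

```python
import ipaddress


def octet_aligned_subnet(site: int, unit: int) -> ipaddress.IPv4Network:
    """Hypothetical plan: 2nd octet = site, 3rd octet = organisational unit."""
    if not (0 <= site <= 255 and 0 <= unit <= 255):
        raise ValueError("site and unit must each fit in one octet")
    return ipaddress.ip_network(f"10.{site}.{unit}.0/24")


def bits_needed(count: int) -> int:
    """Minimum number of bits that can number `count` distinct things."""
    return max(1, (count - 1).bit_length())


# An organisation with 12 sites still spends a full octet on the site number.
subnet = octet_aligned_subnet(site=12, unit=3)
wasted_bits = 8 - bits_needed(12)
print(subnet, "wastes", wasted_bits, "bits in the site octet")
```

With 12 sites, only 4 of the 8 bits in the site octet are strictly needed, which is the sort of saving that thinking in binary would buy, at the cost of subnet numbers that nobody can read at a glance.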
The relevance of this to the NAT question is that most corporate networks would prefer that the internal structure of the network is not disclosed when client PCs contact outside addresses during day-to-day tasks, and NAT achieves this rather nicely. Of course, any determined attacker can learn lots about clients by passively watching their traffic, but funnelling client traffic through a NAT gateway is one component of the solution.
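As a toy illustration of that decoupling, the sketch below models outbound source translation onto a single external address. The SourceNat class, its port numbering, and all the addresses are hypothetical; a real gateway also tracks protocol, destination, and timeouts, none of which is modelled here.

```python
import itertools
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class SourceNat:
    """Toy model: many internal sources share one external address."""

    def __init__(self, external_ip: str, first_port: int = 40000) -> None:
        self.external_ip = external_ip
        self._ports = itertools.count(first_port)
        self.table: Dict[Endpoint, int] = {}

    def translate(self, src_ip: str, src_port: int) -> Endpoint:
        """Return the source address and port the outside world sees."""
        key = (src_ip, src_port)
        if key not in self.table:
            # First packet of a new flow: allocate the next external port.
            self.table[key] = next(self._ports)
        return (self.external_ip, self.table[key])


nat = SourceNat("203.0.113.1")
seen_a = nat.translate("10.12.3.25", 51000)  # site 12, unit 3
seen_b = nat.translate("10.40.7.9", 51000)   # site 40, unit 7
print(seen_a, seen_b)
```

Both flows leave with the same source address, so the site and unit numbers encoded in the internal addresses never appear on the wire outside the gateway.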
NAT not a security mechanism?
It's almost a truism in the networking industry that "NAT is not a security mechanism". This is at least somewhat true: a great deal can still be discovered about a host behind a NAT gateway using passive packet sniffing, and if a vulnerable service is exposed through a port forward, then all bets are off. But in one sense, saying that NAT is not a security mechanism is a misrepresentation, because NAT provides a significant level of protection against active attacks.
For example, if a Windows PC's file sharing service is open on the internal network but it's behind a NAT gateway, it cannot be compromised by external hosts through a buffer overrun vulnerability in its SMB protocol handler. Similarly, if a server has an ssh daemon which allows password-based access, it cannot be compromised by the (very common) ssh password brute-forcing worms that infest the Internet if it's behind a NAT gateway which does not port-forward to that ssh daemon. So whilst NAT is not a tool designed to provide security, the address space conservation that it's designed for also provides some security against common types of attack as a useful by-product.
Most of the discussion about hating on NAT in Packet Pushers episode #61 (starting at about the 40 minute mark) was set in the context of a web hosting or large data centre environment (to which the issue of public vs. private address space does not apply), and assumed that those who deploy NAT do so along with thoughtless port forwarding and without suitable DMZ design. [1] But NAT and poor network security design need not go hand-in-hand.
NAT fails closed, not open
One aspect of NAT makes it desirable from a security perspective, and this is why the majority of SOHO routers in the world are deployed with NAT enabled by default: NAT is closed to outside access by default. That is, unless you take active steps to open up outside access to ports and/or hosts behind a NAT gateway, their normal TCP and UDP ports cannot be accessed. I don't dispute the possibility of attacks which could exploit weaknesses in the packet forwarding algorithms used by NAT gateways in order to attack the hosts behind them, nor suggest that spear phishing or drive-by downloads are not a significant risk to those hosts, nor suggest that the security of the gateway itself is not essential. But these risks apply equally to hosts behind routed firewalls.
Designing for things to fail is part of good network design, and in many (most?) corporate networks, it's preferable to fail closed rather than open. On a NAT gateway, if there is a failure in the routing or firewalling engine, only one host remains open to external attack: the gateway itself. On the other hand, if a routed firewall's ACLs fail to be applied for any reason - say, during a system restart after a software update - the default scenario for many operating systems is that their routing functions keep working even if their firewall does not. So in a failure scenario, NAT's security posture is more desirable than that of a similarly-configured non-NATed network.
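The difference between the two failure modes can be shown with a deliberately simplified model. Everything here is an assumption made for illustration: the class names are hypothetical, the NAT gateway only knows about explicit port forwards (connection state for replies is left out), and "ACL failed to load" is represented by acl being None.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class NatGateway:
    # External port -> internal endpoint. Empty by default: nothing is reachable.
    port_forwards: Dict[int, Endpoint] = field(default_factory=dict)

    def inbound(self, dst_port: int) -> Optional[Endpoint]:
        """Unsolicited inbound traffic is delivered only via an explicit forward."""
        return self.port_forwards.get(dst_port)


@dataclass
class RoutedFirewall:
    # None models an ACL that failed to load while routing keeps working.
    acl: Optional[Set[Endpoint]] = None

    def inbound(self, dst_ip: str, dst_port: int) -> Optional[Endpoint]:
        """Inside hosts are globally routable; only the ACL stands in the way."""
        target = (dst_ip, dst_port)
        if self.acl is None:
            return target  # fail open: the packet is simply routed
        return target if target in self.acl else None


nat = NatGateway()                       # no forwards configured
broken_fw = RoutedFirewall(acl=None)     # ACL never got applied
healthy_fw = RoutedFirewall(acl=set())   # ACL applied, permits nothing

print(nat.inbound(445))                        # None
print(healthy_fw.inbound("203.0.113.10", 445)) # None
print(broken_fw.inbound("203.0.113.10", 445))  # ('203.0.113.10', 445)
```

In this model the unconfigured NAT gateway and the healthy firewall both drop the SMB probe, but the firewall with the missing ACL passes it straight through to the inside host.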
Similarly, if i make a mistake in specifying a netmask on an ACL in a routed network (as a colleague recently did on a client's network), i might accidentally allow outside access to double the number of systems i intended to. Using NAT means that i'm less likely to do this, because such ACLs usually only apply in an outbound direction.
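The arithmetic behind that kind of slip is easy to check with Python's standard ipaddress module. The prefixes below come from a documentation range and are purely illustrative; they are not the masks from the actual incident.

```python
import ipaddress

# Intended: permit inbound access to the lower half of the range only.
intended = ipaddress.ip_network("203.0.113.0/255.255.255.128")

# Typed by mistake: one bit shorter, which doubles the range.
mistake = ipaddress.ip_network("203.0.113.0/255.255.255.0")

ratio = mistake.num_addresses // intended.num_addresses
extra = list(mistake.address_exclude(intended))

print("intended:", intended, intended.num_addresses, "addresses")
print("mistake: ", mistake, mistake.num_addresses, "addresses")
print("unintentionally exposed:", extra)
```

Each bit dropped from the mask doubles the number of addresses the ACL matches, and the extra half is exactly the part nobody meant to expose.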
NAT simplifies problems where scale overwhelms the administrator
This is the part where the networking high-flyers are going to start laughing at me. But please, read and understand first. There are factors in many organisations (usually at layer 8 or 9 of the OSI network model) that mean that we don't always have access to the best people. Someone with a deep understanding of how all the components of a network hang together is actually hard to come by in many places.
For those of us who are left, NAT is a helpful tool in cutting down the size of a network design or management problem from immense to manageable. If we can provide Internet access to a large number of systems using a much smaller number of external addresses, we will have a much greater chance of understanding the configuration and producing a good result for our employers and/or clients.
But the naysayers are still right...
In many cases, NAT is only an obscurity mechanism which is fundamentally a waste of time in terms of security. It adds complexity to the troubleshooting process, often for no additional value. But NAT can and in many cases should be part of a network administrator's toolkit, when applied rightly.
Thinking IPv6
How this applies to IPv6 is where i start to get uneasy. The internal-external decoupling that NAT provides seems not to be on the radar for IPv6. The suggestions i've seen so far are either to use unique local addressing internally and do one-to-one translation between these and provider-independent addresses at the border router (which seems to me to provide no benefit at all over straight routed firewalling), or to use only unique local addresses and not bother with providing external addresses for corporate end-user PCs at all [2] (which will cease being practical as soon as the sales manager decides he or she needs Skype).
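To show why one-to-one translation gives so little decoupling, here's a minimal sketch of a plain prefix swap using Python's ipaddress module. The translate_prefix function and both prefixes are hypothetical (a made-up unique local /48 and a documentation /48), and this is just a bit-for-bit prefix replacement, not an implementation of any standardised translation scheme.

```python
import ipaddress


def translate_prefix(addr: str, inside: str, outside: str) -> ipaddress.IPv6Address:
    """Swap the inside prefix for the outside one, keeping all other bits."""
    address = ipaddress.IPv6Address(addr)
    inside_net = ipaddress.IPv6Network(inside)
    outside_net = ipaddress.IPv6Network(outside)
    if inside_net.prefixlen != outside_net.prefixlen:
        raise ValueError("prefixes must be the same length for a one-to-one map")
    if address not in inside_net:
        raise ValueError(f"{address} is not inside {inside_net}")
    # Keep the subnet and interface bits, replace only the prefix bits.
    low_bits = int(address) & int(inside_net.hostmask)
    return ipaddress.IPv6Address(int(outside_net.network_address) | low_bits)


external = translate_prefix(
    "fd12:3456:789a:20::15",   # internal host on subnet 0x20
    "fd12:3456:789a::/48",     # hypothetical unique local prefix
    "2001:db8:1::/48",         # documentation prefix standing in for PI space
)
print(external)  # 2001:db8:1:20::15
```

The subnet number (0x20) and the interface identifier survive translation unchanged, so every internal host still has its own externally visible address and the internal structure is as exposed as it would be with straight routing.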
[1] When listening to that episode, one could be forgiven for thinking that connection tracking of FTP had never been invented...
[2] At about the 9:00 mark in the video.