The Perils of a Virtual Data Center

13 06 2012

There is little debate that virtualization technology has been one of the key drivers for innovation in the enterprise data center. By abstracting services from hardware into manageable containers, virtualization technology has forced IT engineers to think outside of the box, allowing them to design systems that actually increase service availability while decreasing the physical footprint and cost. And these innovations haven’t stopped with advanced hypervisors or hardware chipset integrations. Innovations in the virtualization space have forced the whole of the IT industry into a new paradigm, bringing along a cascade of innovations in networking, storage, configuration management and application deployment. In short, virtualization ushered in a whole new ecosystem into the data center.

The only problem is that the security folks seem to have been left out of this ecosystem. According to the Gartner Group, 60% of all virtualized servers will be less secure than the physical servers they’ve replaced.[i] The reason, Gartner claims, is not that virtualization technologies are “inherently insecure”, but instead, virtualization is being deployed insecurely. Gartner goes on to enumerate risks in virtual infrastructure that boil down to a lack of distinction between the management and data planes and a dearth of tools to correctly monitor the virtual infrastructure. This observation, when extrapolated to include this new ecosystem of data center technologies layered above or below the hypervisor, illustrates exactly why the security community needs to become more involved in the actual process of building and designing data center architectures.

Flexible Topologies vs. Stringent Security

Security and network engineers have long relied on strictly controlled, hierarchical data center topologies where systems are architected to behave in a static fashion. In these environments, data flows can be understood by “following the wires” to and from nodes. Each of these nodes fulfils discrete functions that have well-documented methods for securing data and ensuring high availability. But once virtualization has been introduced into this environment, the structure and the assurances can disappear. Virtual web servers can in one instant intermingle with their backend database servers on a single hypervisor, and data flows between these virtual machines no longer have to pass through firewalls. In another instant, the same database servers could travel across the data center, causing massive amounts of east-west traffic, thereby disrupting the heuristics on the intrusion prevention and performance monitoring platforms.

Adding further confusion to these new topologies, new “fabric” technologies have been released that bypass the rigidly hierarchical Spanning Tree Protocol (STP).  STP ensured that data from one node to another followed a specifically defined path through the network core, which made monitoring traffic for troubleshooting or intrusion prevention simple. These fabrics (such as Juniper QFabric, Brocade VCS, and Cisco FabricPath) now allow top of rack switches to be configured in a full mesh, Clos or hypercube topology[ii] with all paths available for data transmission and return. This means that if an engineer wants to monitor traffic to or from a particular host, they will have to determine the current path of the traffic and either engineer a way to make this path deterministic (thereby defeating the purpose of the fabric technology) or hope that the path doesn’t change while they are monitoring.
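To make the monitoring problem concrete, here is a minimal sketch that assumes a fabric spreads flows across equal-cost paths by hashing each flow's 5-tuple. Real fabrics use vendor-specific mechanisms, so the hash choice, the `pick_path` helper, and the spine names are all hypothetical.

```python
import zlib

def pick_path(flow, paths):
    """Pick one of several equal-cost paths by hashing the flow's 5-tuple.

    Assumption: CRC32 stands in for whatever hash a real switch uses.
    """
    key = "|".join(str(field) for field in flow).encode()
    return paths[zlib.crc32(key) % len(paths)]

# Hypothetical flow: (src IP, dst IP, IP protocol, src port, dst port)
flow = ("10.0.1.5", "10.0.2.9", 6, 49152, 443)
paths = ["spine1", "spine2", "spine3", "spine4"]

before = pick_path(flow, paths)
# If a spine leaves the fabric, the same flow is hashed over a different
# path set and may land somewhere else.
after = pick_path(flow, paths[:-1])
print(before, after)
```

The path for a given flow is stable only while the set of available paths stays the same, which is why a monitoring tap placed on one link offers no guarantee of seeing the traffic later.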

The flexibility afforded by virtualization can cost dearly in security best practices. For instance, a typical security best practice is to “prune” VLANs: removing these layer two networks from unnecessary switches in order to prevent unauthorized monitoring, man-in-the-middle attacks, or network disruptions. But in the virtualized data center, this practice has become impractical. Consider the act of moving a virtual server from one hypervisor to another. In order to accomplish this task, the VLAN that the virtual machine lives on must exist as a network on an 802.1Q trunk line attached to both servers. If each of these servers is configured to handle any type of virtual machine within the data center, all VLANs must exist on this trunk and on all of the intermediary switches, producing substantial opportunities for security and technical failures, particularly in multitenant environments.
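The trade-off can be illustrated with a small sketch. The inventory below (hypervisor names and VLAN IDs) is invented, and `trunk_vlans` and `exposure` are hypothetical helpers rather than part of any vendor tool.

```python
# Hypothetical inventory: VLANs used by the VMs currently on each hypervisor.
vm_vlans = {
    "hv1": {10, 20},
    "hv2": {30},
    "hv3": {10},
}

def trunk_vlans(vm_vlans, allow_any_migration):
    """Return the set of VLANs each hypervisor's trunk must carry."""
    if allow_any_migration:
        # Any VM may land on any hypervisor, so every trunk needs every VLAN.
        all_vlans = set().union(*vm_vlans.values())
        return {hv: set(all_vlans) for hv in vm_vlans}
    # Pruned: each trunk carries only the VLANs in use on that hypervisor.
    return {hv: set(vlans) for hv, vlans in vm_vlans.items()}

def exposure(trunks, vm_vlans):
    """VLANs present on a trunk that no local VM currently needs."""
    return {hv: trunks[hv] - vm_vlans[hv] for hv in trunks}

unpruned = exposure(trunk_vlans(vm_vlans, True), vm_vlans)
pruned = exposure(trunk_vlans(vm_vlans, False), vm_vlans)
print(unpruned)
print(pruned)
```

With unrestricted migration, every trunk carries VLANs that no local virtual machine uses, and each of those is a network an attacker on that hypervisor could reach. With pruning, the exposure is empty but virtual machines can no longer move freely.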

Prior to the introduction of data center fabrics, most network engineers segregated different types of traffic on different layer 2 domains, forcing all inter-VLAN communication to ingress and egress through distribution layer routers. However, even this most basic of controls can be purposefully defeated with new technologies like VXLAN[iii] and NVGRE[iv]. These protocols allow hypervisor administrators to extend a layer two domain between hypervisors that are separated by layer 3 boundaries, simply by encapsulating the layer two frames inside layer 3 packets (UDP for VXLAN, GRE for NVGRE) before they are handed off to the layer 3 device. This bypasses the security controls that the network provided, and could even allow a VLAN to be easily extended outside of a corporate network perimeter. This possibility illustrates yet another risk in virtualization technologies: separation of duties.
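As a rough illustration of how little the encapsulation involves, the sketch below builds the 8-byte VXLAN header in the layout later standardized in RFC 7348 and prepends it to an inner Ethernet frame. The outer Ethernet, IP, and UDP headers are omitted, the function names are hypothetical, and the inner frame is a placeholder byte string.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid

def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def encapsulate(inner_frame, vni):
    """Prefix an inner layer two frame with a VXLAN header (UDP payload)."""
    return vxlan_header(vni) + inner_frame

def decapsulate(payload):
    """Return (vni, inner_frame) from a VXLAN UDP payload."""
    word0, word1 = struct.unpack("!II", payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8, payload[8:]

payload = encapsulate(b"inner-ethernet-frame", 5000)
print(decapsulate(payload))
```

A router or firewall between the two hypervisors sees only a UDP datagram between tunnel endpoints. The original frame, including its source and destination MAC addresses, travels as opaque payload.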

Management and Monitoring

In the traditional data center, the separation of duties was easy to understand and segment. Network, systems, and storage engineers all worked together with minimal administrative overlap while security engineers formed a sort of protective membrane to ensure that this system stayed healthy. Yet in the virtual realm, all of these components intermingle and they can be logically managed from within the same few management interfaces. These interfaces allow systems engineers to make changes to networking or storage infrastructure, and network engineers to make changes to hypervisors or virtual machines, and so forth. As networks supporting virtual infrastructures have converged, storage management policies have blurred into the network management realm as switches and routers increasingly transport iSCSI or FCoE traffic in addition to traditional data or voice packets.

None of this would matter quite as much if these new technologies were easy to monitor. But without topological consistency or separate management domains, monitoring becomes another interesting challenge. The old-fashioned data center typically relied upon at least three levels of monitoring: NetFlow for records of node-to-node data communications; SNMP, WMI or other heuristics-based analysis of hardware and software performance; and some type of centralized system and application logging mechanism such as Syslog or Windows Event Logs.

But in the virtualized data center, the whole doesn’t add up to the sum of its parts. While system logging remains unchanged, it stands alone as the only reliable way of monitoring system health. Meanwhile, NetFlow and SNMP are crippled. A few years ago, NetFlow in virtualized environments was completely absent: VM-to-VM traffic that didn’t leave a hypervisor simply didn’t create NetFlow records at all. Responding to the issue, most vendors added some amount of NetFlow accounting[v] for a price premium. However, the version implemented (v5) still does not support IPv6, MPLS, or VPLS flows. Furthermore, since PCI buses and PC motherboards are not designed for wire speed switching, there are reports that enabling NetFlow on hypervisor virtual switches can result in serious performance problems.[vi]

However, even if NetFlow is successfully enabled in the virtual environment, there are still a few ways it can be broken. Once a VXLAN or NVGRE tunnel is established between two hypervisors, the data can be encrypted using SSL or IPsec. These flows will only be seen as layer three communications between two single hypervisors, even if in reality a dozen or more machines are speaking to an equivalent number on the remote hypervisor. The NVGRE/VXLAN problem, combined with the new fabric architectures and the dynamic properties of virtual machines, means that heuristic analysis of virtual data center performance is much less feasible than in static data centers. In a static environment, security engineers can set thresholds for “typical” amounts of data transfers between various nodes. Once system administrators have the capability to dynamically distribute virtual machines across a data center, these numbers become meaningless, at least until a new baseline for traffic analysis is established.
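The loss of visibility can be sketched in a few lines. The virtual machine names, hypervisor names, and byte counts below are invented, and `observed_flows` is a hypothetical helper that models what a collector outside the tunnel would record.

```python
from collections import Counter

# Hypothetical inner flows: (source VM, destination VM, bytes transferred)
inner_flows = [
    ("web1", "db1", 1200),
    ("web2", "db1", 800),
    ("web3", "db2", 500),
]

# Hypothetical placement of each VM on a hypervisor
placement = {
    "web1": "hvA", "web2": "hvA", "web3": "hvA",
    "db1": "hvB", "db2": "hvB",
}

def observed_flows(inner_flows, placement):
    """Collapse VM-to-VM flows into what is visible outside the tunnel:
    one record per (source hypervisor, destination hypervisor) pair."""
    seen = Counter()
    for src_vm, dst_vm, nbytes in inner_flows:
        seen[(placement[src_vm], placement[dst_vm])] += nbytes
    return dict(seen)

observed = observed_flows(inner_flows, placement)
print(observed)
```

Three distinct conversations collapse into a single record between two hypervisors. Changing one entry in `placement`, as a migration would, moves bytes to a different hypervisor pair, which is why thresholds tuned against the old placement stop being meaningful.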

A (Fabric)Path Forward

So where does this leave the security engineers who’d like to at least try to keep a handle on the security of their most critical systems in a virtualized data center? Well, first, they can take comfort in the fact that although these technologies are on the horizon, few companies have moved beyond the most basic of virtualization infrastructures. VXLAN and NVGRE are still IETF drafts, which means that even though Cisco, Microsoft and VMware already support them, the standards have not yet been widely adopted by other vendors. And even if equipment made by these vendors is in a data center, it’s likely that VXLAN or NVGRE is unnecessary for most organizations.[vii] Similarly, the new data center fabric architectures haven’t yet seen wide adoption because they cost enormous amounts of money[viii] and they require a massive data center overhaul, from equipment replacement to new fiber plants[ix].

Also, despite the numerous security gaps created by new virtualization technologies, data center equipment vendors are aware of the problems and are working on solutions. Cisco released a bolt-on virtual switch[x] that can be utilized to separate the management of networking equipment from the virtual systems environment while also terminating VXLAN connections to allow for security monitoring on VXLAN tunnels. A number of vendors have introduced virtual firewall appliances[xi] that can provide continuous protection even when virtual machines are moved out of the path of a physical firewall. Even SNMP/WMI monitoring gaps are being bridged by vendors who have developed virtualization-aware technologies[xii] that detect virtual machine locations and smooth baseline heuristics once a machine has migrated to another location.

So, all hope is not lost. Depending on where the architecture team is in their research or implementation of these technologies, security engineers are likely to have an opportunity to get a seat at the table and will have a burgeoning security toolkit at their disposal, which can help them to get hold of the process before it gets out of hand.


