1. Introduction
Traffic in the telecommunication space is growing exponentially, driven mainly
by the demands of multimedia traffic over wireless and wireline networks and by cloud-based
services. New 5G services in the IoT (Internet of Things) and mobility areas will
increase these demands further. They require network operators to launch services more
quickly, not only for new advanced applications but also for the network
functions behind those services. In any case, network operators have to prepare
and support those network services with reasonable investment in a timely and flexible
manner.
Traditionally, telecommunication networks are populated with a large and increasing
variety of proprietary hardware-based appliances. Launching a new network service
often requires yet another variety, and finding the space and power to accommodate
these boxes is becoming increasingly difficult; this is compounded by the increasing costs
of energy, capital investment challenges, and the rarity of the skills necessary to design,
integrate and operate increasingly complex hardware-based appliances. Moreover,
hardware-based appliances rapidly reach end of life, requiring much of the procure-design-integrate-deploy
cycle to be repeated with little or no revenue benefit. Worse, hardware lifecycles
are becoming shorter as technology and service innovation accelerates, inhibiting
the roll-out of new services and constraining innovation in an increasingly network-centric
connected world (1).
Network virtualization can give network operators many benefits: simpler logistics,
cost reduction, quicker service launch, flexibility through rapid service scale-up and
scale-down, efficient resource use by sharing resources among multiple applications,
and so on. However, it also brings challenges to resolve, such as performance, co-existence
with hardware-based equipment, management and orchestration of many virtual network
appliances, scaling of virtualized network functions, appropriate levels of resilience,
and integration of multiple virtual appliances from different vendors. The key objective
of network function virtualization is to achieve simplified, software-defined, and
virtualized service provider networks by leveraging standard IT virtualization technology
to consolidate multiple network functions onto industry-standard high-volume
servers, switches and storage subsystems, which can be located in data centers, network
nodes and on end-user premises.
To provide such network services, a single piece of network equipment or a single virtual
appliance can hardly deliver all the required features alone, as the services require
many complex functions such as switching/routing, firewall, network address translation,
deep packet inspection, anti-virus and video optimization. These functions are typically
connected for a specific service and may require very close cooperation to deliver traffic
from one to another directly. Network operators can use this service chaining
to set up suites or catalogs of connected services that enable the use of a single
network connection for many services with different characteristics. In virtualization
environments, the virtual appliances in a service chain can be located within a single
server or across multiple servers.
This article reviews the overall architecture of the NFV (Network Function Virtualization)
framework defined by ETSI (European Telecommunications Standards Institute), examines
the performance bottlenecks in each layer of the NFV infrastructure (NFVI) and possible
solutions, and then considers ways to reduce the performance bottlenecks of virtual
machine (VM) to VM communication for service chaining within a single physical server.
2. Performance challenges in NFVI
2.1 NFV architecture
ETSI defines a high-level NFV framework, shown in Fig. 1, enabling virtual network functions (VNF) to be deployed and executed on an NFVI.
There are three domains: the virtual network functions (VNF), the NFVI, and NFV management and
orchestration (2).
Fig. 1. High-level NFV framework
The VNF is a software implementation of a network function capable of running over
the NFVI. It is typically mapped to one virtual machine (VM) in an NFVI, but it may also
be split into multiple VNF components loaded on separate VMs with different scaling
requirements.
The NFVI includes the diversity of COTS (Commercial Off-The-Shelf) hardware resources,
such as computing, storage and network, and the software layer
that abstracts and logically partitions them to support the execution of the VNFs.
Although there is no formal definition of COTS servers, it is generally accepted that
they are servers commonly deployed in the IT industry, equipped with general-purpose
processors.
NFV management and orchestration covers the orchestration and lifecycle management
of the physical and/or software resources that support the infrastructure virtualization,
as well as the lifecycle management of the VNFs.
2.2 Performance challenges
In the NFV framework, there are performance challenges at each level, as shown in Fig. 2.
Fig. 2. Performance challenges in NFVI
The first is a driver-level bottleneck: interrupt-based operating system
drivers running on the server platform are not designed to receive and send packets
at very high rates. Interrupts are an efficient way to handle IO (Input/Output) requests
in general-purpose computing, with a reasonable cost in CPU (Central Processing Unit)
context switching. However, for very IO-intensive applications such as VNFs that must
handle tens of millions of packets per second or more, the CPU can be overwhelmed by the
cost of too many context switches. To give full CPU cycles to packet processing,
poll-mode device drivers, typically running in user space and bypassing the kernel and its
context switching, need to be considered, as suggested by DPDK (Data Plane Development
Kit) (3), which was initiated by Intel Corp. as a development toolkit for high-performance
network processing and has since become an open source project (dpdk.org).
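As an illustration of the poll-mode approach, the following is a minimal sketch based on the standard DPDK burst APIs (rte_eth_rx_burst / rte_eth_tx_burst). Setup of the EAL, mbuf pool and Ethernet ports is omitted, and the port numbers are purely illustrative; it is not a complete application.

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Minimal poll-mode forwarding loop: rx_port -> tx_port.
 * EAL initialization, mbuf pool creation and port/queue setup
 * (rte_eal_init, rte_pktmbuf_pool_create, rte_eth_dev_configure,
 * rte_eth_rx_queue_setup, rte_eth_tx_queue_setup, rte_eth_dev_start)
 * are assumed to have been done beforehand. */
static void forward_loop(uint16_t rx_port, uint16_t tx_port)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the NIC RX queue; no interrupts are involved. */
        uint16_t nb_rx = rte_eth_rx_burst(rx_port, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        /* Hand the packets straight to the TX queue of the other port. */
        uint16_t nb_tx = rte_eth_tx_burst(tx_port, 0, bufs, nb_rx);

        /* Free any packets the TX queue could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}
```

Because the loop never sleeps, a CPU core is dedicated entirely to packet processing; this is exactly the trade-off poll-mode drivers make to avoid per-packet interrupt and context-switch costs.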
The second bottleneck is in the virtual switch of the hypervisor. The virtual switch is
typically built on the operating system's kernel network stack. The well-verified observation
about the kernel network stack is that it is too slow for this purpose, mainly due to interrupts,
locking, preemption and so on. A virtual switch built on the kernel network stack is bound
to the kernel's network performance and scalability, which do a poor job of
exploiting the potential packet switching performance (4). Bypassing the virtual switch is one way to avoid this, for example with single-root IO
virtualization (SR-IOV), which allows direct access to the network interface card (NIC)
from VMs for high performance and reduced latency. However, bypassing the virtual switch
loses or complicates some virtualization benefits, such as VM instantiation and (live)
migration (5), and increases hardware dependency because the NIC needs to support the
bypassing feature. Another approach is to improve the performance of the virtual switch
itself by bypassing the kernel and giving full CPU cycles to packet processing,
as in the first case, i.e., with DPDK. Open vSwitch, an open source virtual switch
project, was originally developed with kernel features, generally forwarding packets
via the kernel network stack, and it has been ported to DPDK as "Open vSwitch
with DPDK" (5). It is a feasible solution to avoid the bottleneck while keeping the virtualization benefits.
However, it also has certain dependencies on the operating system and hypervisor vendors.
The third bottleneck is in the communication between the host operating system, which runs
the virtual switch, and the guest operating systems, which run the VNFs. If the virtual NIC
(vNIC) drivers used by the host and guest operating systems are implemented inside the
operating system kernel, they suffer the same performance bottlenecks as the first case.
In full virtualization mode, the VNF is unaware that it is being virtualized and does not
require any changes to work in this configuration, so the hypervisor needs to emulate
hardware to handle IO requests from the VNF. The VNF uses its virtualization-unaware
native driver, which causes heavy processing burdens on the hypervisor side for emulation
and communication overheads between them. To improve the performance of the full IO
virtualization model, IO paravirtualization, in which the VNF has a virtualization-aware
device driver that optimizes the communication, has been widely considered. Virtio (6) is
the de facto standard for the virtual IO driver in IO paravirtualization.
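To give a sense of what such a paravirtualized interface looks like, the structures below are a simplified C rendering of the virtqueue descriptor defined by the virtio specification (the layout also exported by Linux's linux/virtio_ring.h). Only the descriptor is shown; the available/used rings, notification and memory-barrier machinery are omitted.

```c
#include <stdint.h>

/* Simplified virtio descriptor, per the virtio specification.
 * The guest driver fills these entries with guest-physical buffer
 * addresses, and the host (or a hardware device, in vDPA-style
 * setups) consumes them without per-packet hardware emulation. */
struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer       */
    uint32_t len;    /* length of the buffer in bytes              */
    uint16_t flags;  /* e.g. NEXT (chained), WRITE (device writes) */
    uint16_t next;   /* index of the next descriptor in a chain    */
};

/* The driver publishes descriptor indices through an "available" ring
 * and the device returns completed ones through a "used" ring, so both
 * sides can batch work instead of trapping once per packet. */
```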
The last bottleneck is the VNF itself. A VNF may have performance and scalability issues,
for example due to poor design or a lack of performance considerations for virtualization
environments. This is VNF-vendor specific and is also related to the first and third
bottlenecks above. Beyond all the bottlenecks above, the VNF itself must have the
capability to handle the upcoming high-traffic requests.
Overall, the main performance bottlenecks come from the hypervisor side, and that
is why hypervisor bypassing technologies, such as SR-IOV, multi-root IO virtualization
(MR-IOV), or PCI pass-through, have been considered to overcome them.
3. Performance enhancement for VM to VM communications
3.1 VM to VM communication
As a network service is typically composed of multiple network functions provided by multiple
VNFs in a service chain, as in Fig. 3, the demand increases for efficient direct communication
between VNFs without performance leaks. Given that the network bandwidth required by applications
and services keeps growing and the traffic handled by a single VNF is also increasing significantly,
there must be a way to process a huge amount of traffic between VNFs.
10 Gbps (Gigabit per second) and 40 Gbps NICs are already available and commercially
used, which means the traffic coming from a NIC can be tens of Gbps or even more.
Fig. 3. Service chaining between VMs as a use case
For the explanation in the rest of this article, let us take an example use case, shown
in Fig. 3, with 50 Gbps full-duplex network traffic and 50 Gbps unidirectional traffic
between the VMs. The 50 Gbps traffic is processed via (network) → NIC → (hypervisor) → VM1 → (hypervisor)
→ VM2 → (hypervisor) → NIC → (network). In this use case, the hypervisor has to handle
150 Gbps of full-duplex traffic (300 Gbps half-duplex), as it has three
logical 50 Gbps full-duplex ports to serve: one for the NIC and one for each of the two VMs. This
is clearly a huge challenge on the hypervisor side.
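Put as a simple calculation, the hypervisor's switching load in this use case is

\[
3~\text{logical ports} \times 50~\text{Gbps per direction} = 150~\text{Gbps per direction},
\quad\text{i.e.}\quad 300~\text{Gbps of aggregate packet movement.}
\]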
As a single VNF is typically mapped to a single VM, this article assumes so for simplicity.
VM to VM communication can occur within a physical server or between different physical
servers. This article handles only the former case, leaving the latter for future
consideration.
3.2 Direct hardware access technologies
Section 2.2 implies that the main performance bottlenecks come from the hypervisor side.
This section covers ways to bypass the hypervisor and their limitations. The basic
concept is for a VM to access the NIC directly, bypassing the hypervisor, without a performance
penalty and with almost the same performance as bare metal. It requires specific
hardware drivers on the VM side, and the NIC also has to support the technology with a switching
capability to handle the traffic between VMs. Thanks to the hardware-based switching
capability in the NIC, it shows very high performance compared to the software-based switch
of the hypervisor, and it does not require CPU resources for switching, so CPU resources
can be allocated to the hypervisor or to other virtual machines, allowing more processing or
more virtual machines.
A well-known technology is SR-IOV (7), defined by the PCI-SIG (Peripheral Component Interconnect Special Interest Group).
It allows VMs, once created by the hypervisor, to share a piece of hardware without
involving the hypervisor directly in every activity. For example, a physical NIC
can expose a number of virtual functions that can be "attached" to VMs. The VMs
see an attached virtual function as if it were a physical card. In other words, SR-IOV
allows the creation of multiple "shadow" cards of a physical NIC. Each shadow card
has its own MAC (Medium Access Control) address (8).
SR-IOV introduces two function types, Physical Functions (PFs) and Virtual Functions (VFs).
PFs are full PCIe NIC functions with SR-IOV capability, and VFs are lightweight PCIe functions
that contain the resources necessary for data movement but have a carefully minimized
set of configuration resources. This architecture lets an IO device support multiple
VFs, minimizing the hardware cost of each additional function by sharing one IO device
and its capability among multiple virtual machines. The traffic flow with SR-IOV for
the use case is (network) → (NIC L2 switching) → (NIC VF1) → VM1 → (NIC VF1) → (NIC
L2 switching) → (NIC VF2) → VM2 → (NIC VF2) → (NIC L2 switching) → (network), as described
in Fig. 4. The difference between Fig. 3 and Fig. 4 is who performs the switching of the traffic flow. With SR-IOV, it is done by the SR-IOV
NIC with its hardware-based switching capability.
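As a concrete illustration, VFs are typically enabled from the host through the Linux kernel's standard PCI sysfs attribute for SR-IOV; the minimal sketch below writes the desired VF count from C. The interface name "eth0" and the count of two VFs (one per VM in the use case) are assumptions for illustration; in practice this step is usually done from the shell or by the hypervisor's management layer, after which each VF is assigned to a VM.

```c
#include <stdio.h>

/* Enable SR-IOV virtual functions on the host by writing the desired
 * VF count to the standard Linux sysfs attribute for the device.
 * "eth0" is an assumed interface name; the created VFs can then be
 * handed to VMs (e.g. via PCI pass-through of each VF). */
static int enable_vfs(const char *ifname, int num_vfs)
{
    char path[256];
    snprintf(path, sizeof(path),
             "/sys/class/net/%s/device/sriov_numvfs", ifname);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror("sriov_numvfs");
        return -1;
    }
    fprintf(f, "%d\n", num_vfs);
    fclose(f);
    return 0;
}

int main(void)
{
    return enable_vfs("eth0", 2);  /* two VFs, one per VM in the use case */
}
```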
As the VMs access the NIC directly, the VM to VM traffic can also be handled in the NIC,
but an issue may arise from the bandwidth limitation of the NIC's IO bus. PCIe
(Peripheral Component Interconnect Express) is currently the common interface to the NIC.
The maximum bandwidth of PCIe depends on its generation and link width, and there may be
vendor-specific processing overhead on the PCIe board.
As an example, the maximum bandwidth of a PCIe Gen3 x8 slot is
less than 62 Gbps (9). Considering the internal overheads, a NIC on such a PCIe interface has approximately
one 10 Gbps and one 40 Gbps network port, or five 10 Gbps ports in the best configuration,
for a total maximum NIC bandwidth of 50 Gbps. For simplicity, 50 Gbps is used
as the PCIe bandwidth limit in this article.
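For reference, the cited figure is consistent with the raw PCIe Gen3 arithmetic of 8 GT/s per lane with 128b/130b encoding; TLP/DLLP protocol overhead then reduces the usable throughput further, toward the effective limit quoted above:

\[
8~\text{GT/s} \times \frac{128}{130} \times 8~\text{lanes} \approx 63~\text{Gbps (raw)},
\qquad \text{usable throughput} < 62~\text{Gbps after protocol overhead.}
\]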
Fig. 4. VM to VM communication with SR-IOV
In this configuration of the use case defined in Fig. 3 of Section 3.1 (50 Gbps full-duplex traffic to/from the network), the NIC has to
handle 100 Gbps of full-duplex traffic via PCIe for the two VMs while handling 50 Gbps via
its network port. The issue is that the 100 Gbps exceeds the limit of one PCIe bus, which
means two PCIe slots are needed to handle the traffic, and the additional slot
is used only for internal switching to the VMs, not for network ports, as pictured in Fig. 5. This is due to the limitation of the PCIe bus used to communicate with the VMs. Given that
high-bandwidth NICs with direct access technologies are highly expensive, this may not be
a feasible way to handle high-bandwidth VM to VM traffic.
Whenever the total traffic the PCIe slots can handle is less than the total traffic the VMs need
to handle, another PCIe NIC has to be added. In the example above, if one more VM is
added with some network traffic, the total bandwidth of the VMs (100 Gbps for the existing
two VMs plus the additional traffic of the new VM) exceeds the total bandwidth of the two PCIe
slots (100 Gbps, at 50 Gbps per slot as above). This means one more NIC
(three NICs in total) is needed to handle the traffic for the added VM, no matter how many
network ports are still available on the existing NICs.
One more consideration is the number of PCIe slots in a server, which is
only a handful even in a high-end server (for example, Dell's PowerEdge R840 has 4 or 6 PCIe
slots), so consuming the PCIe slots affects service variety, as those slots can also be
used for other hardware components, not only for network traffic.
Fundamentally, the issue comes from the huge VM to VM (East-West) traffic, which shares PCIe
bandwidth with the North-South traffic that direct hardware access technologies
were originally intended to address.
Fig. 5. Two NICs for VM to VM communications
3.3 Shared memory virtual NIC
One idea for handling huge traffic between VMs is to use shared memory for this purpose,
a conventional idea borrowed from inter-process communication. It requires a specific
shared memory driver in each VM acting as a virtual NIC (vNIC), and two VMs can then communicate
via the shared memory at a high rate, without the hypervisor's intervention in data transfer.
Fig. 6 shows the traffic flows of the use case with direct hardware access technologies
and the shared memory virtual NIC driver. The memory bus is supposed to have approximately
100 to 200 Gbps of capacity with DDR4 (10), which can be used for high-traffic VM to VM communication. The bandwidth
limits of DRAM differ depending on the version, clock speed, memory bus width, etc.
Also, it operates in half-duplex mode, so the limits need to be adjusted accordingly.
In the approach pictured in Fig. 6, only the incoming and outgoing (North-South) traffic via the NIC consumes PCIe bandwidth;
the traffic between VMs does not, so there is more room for North-South
traffic over PCIe. It can be used with or without direct hardware access technologies.
If it is used together with the hypervisor, a corresponding shared memory vNIC driver in the
host operating system is required as well.
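To make the underlying idea concrete at the process level, the sketch below sets up a shared-memory region with the standard POSIX shm_open/mmap calls and a trivial single-producer/single-consumer ring. The segment name "/vm2vm-ring", the slot sizes and the function names are illustrative assumptions; a real shared-memory vNIC driver works on the same principle but lives in the guest/host drivers and adds signalling, batching and isolation.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define RING_SLOTS 1024
#define SLOT_SIZE  2048   /* room for one packet per slot */

/* A minimal single-producer/single-consumer ring placed in shared memory. */
struct shm_ring {
    volatile uint32_t head;                 /* written by producer */
    volatile uint32_t tail;                 /* written by consumer */
    uint8_t slots[RING_SLOTS][SLOT_SIZE];
};

/* Map (and optionally create) the shared segment. */
static struct shm_ring *ring_attach(int create)
{
    int flags = O_RDWR | (create ? O_CREAT : 0);
    int fd = shm_open("/vm2vm-ring", flags, 0600);
    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, sizeof(struct shm_ring)) < 0)
        return NULL;
    void *p = mmap(NULL, sizeof(struct shm_ring),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Producer side: copy one packet into the next free slot. */
static int ring_send(struct shm_ring *r, const void *pkt, uint32_t len)
{
    uint32_t next = (r->head + 1) % RING_SLOTS;
    if (next == r->tail || len > SLOT_SIZE)
        return -1;                          /* ring full or packet too big */
    memcpy(r->slots[r->head], pkt, len);
    __sync_synchronize();                   /* publish data before index   */
    r->head = next;
    return 0;
}
```

The data transfer involves only memory copies between the two endpoints, which is why this path can exploit DRAM bandwidth instead of competing for PCIe bandwidth.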
Many mechanisms have been suggested and considered: Fido, XenSocket, XenLoop,
XWAY, etc. Fido is a VM to VM shared-memory-based mechanism that reduces VM to VM
communication overheads (hypervisor intervention and data copies) and enables zero-copy
data transfer across multiple virtual machines on the same physical system (11). Operating underneath the kernel network stack (TCP/IP), the virtual machines are connected
in collaboration with XenStore (a centralized key-value store in dom0 of Xen) of the hypervisor
and exchange data via Fido's memory mapping module under the control of its signaling module.
It does not require any changes to upper layers or applications.
Fig. 6. VM to VM communication with shared memory
XenSocket is a shared memory construct that provides a POSIX socket-based mechanism
for high-throughput VM to VM communication on the same physical system, avoiding
the overhead of multiple hypercalls (the hypervisor equivalent of system calls, letting the
hypervisor handle the requests) and memory page table updates (12). From the application's perspective it exposes socket-like APIs,
and underneath these APIs there is an implementation, compiled into a kernel module, that
uses shared memory for VM to VM data transfer. However, applications that use the existing
socket interface calls need to be modified to use the socket-like APIs.
XWAY is a mechanism for VM to VM communication offering high performance and binary compatibility
with applications based on the standard TCP socket interface, achieved by bypassing the kernel
network stack, avoiding page-flipping overhead and providing a direct communication
path between VMs on the same machine (13). To do this, it sets up a direct data path using shared memory between VMs instead of
the Xen hypervisor's relaying and uses its own protocol, called the XWAY protocol, instead
of the kernel network stack for higher transfer rates; an XWAY switch redirects data requests
to the XWAY protocol and control requests to the kernel TCP stack. It
allows direct communication between XWAY-aware virtual machines on the same physical server
and falls back to the existing TCP/IP kernel stack for non-XWAY virtual machines by hooking
the kernel TCP stack. It shows binary compatibility with existing applications using the
standard socket interface.
XenLoop couples shared-memory-based VM to VM communication with transparent traffic
interception beneath the network layer and a soft-state domain discovery mechanism,
to achieve both higher performance and user-level transparency (14). The XenLoop layer, located underneath the TCP/IP stack, intercepts every outgoing packet from the
upper layer using a Netfilter hook in order to inspect its header and determine the packet's
destination. Depending on the destination, it forwards the packet to a XenLoop-aware
virtual machine using its own data channel, which is set up dynamically by handshaking,
or to a XenLoop-unaware VM using the existing Xen driver (i.e., the Netfront driver).
Service discovery for XenLoop-aware virtual machines is done with XenStore, as in Fido.
Fig. 7. Overall architecture of Fido, XenSocket, XWAY and XenLoop
The overall architecture of these mechanisms is described in Fig. 7. As Fido and XenLoop operate underneath the kernel network stack (i.e., at device driver
level), they focus on how to speed up shared-memory-based communication
efficiently at the driver level while keeping user-level transparency. XenSocket and XWAY
focus on their own transport mechanisms at the socket interface level, instead of the existing
kernel TCP/IP stack. The TCP/IP stack assumes unreliable lower layers, so it contains
many mechanisms to provide reliability, and these can be overhead in an already
reliable environment. This implies that bypassing TCP/IP for VM to VM communication on a single
server may make sense, since internal shared memory within a server can be regarded as
a reliable channel (security considerations aside). All of these schemes are based on Linux
and its kernel, as it is open source. Using these schemes with non-open-source
operating systems, such as Windows, would require new development, and reaching product-level
maturity is likely the real issue for commercial deployment. Although the mechanisms
(Fido, XenSocket, XWAY and XenLoop) show better performance than the traditional Linux
network stack, their implementations are kernel-based. The proven assumption is that the
kernel is not suitable for high-performance IO processing, not only because of its networking
stack but also because of behaviors such as locking, preemption and interrupts. So, to
handle tens of Gbps of traffic or more (hundreds of Gbps), bypassing the kernel
is actively considered, using packet processing in user space with poll-mode drivers.
Even though the shared memory scheme can offer high bandwidth for VM to VM communication,
the problem is that the specific shared memory vNIC driver is proprietary, so its
use is likely limited, and the VNF vendors may need to be the same or use the same
network middleware. It may also lose some virtualization benefits, like the direct hardware
access technologies, because it bypasses the hypervisor. It has a similar bus limitation
for shared memory, but there are more memory slots in a server (for example, Dell's PowerEdge
R840 has up to 48 memory slots), so a fair distribution of the shared memory used for
VM to VM communication across memory slots would be necessary.
3.4 Other technologies
There are other acceleration technologies to be considered. A TCP offload engine (TOE)
on the NIC enables the NIC to send the host processor only reassembled data rather than
each packet associated with a TCP connection, and it can reduce PCI bus overheads. It
mainly reduces CPU processing overhead, as the CPU cost of TCP/IP processing
has increased and degrades server performance. It mainly targets
TCP/IP offloading for application-level software (i.e., software on
top of the TCP/IP layer, at the socket interface). Since most VNF vendors have their own Layer
2-4 stacks including TCP/IP, where they add their own specialties and features, TOE's use
by VNF vendors is likely limited. It can improve PCI bandwidth efficiency if it
is combined with direct hardware access technology. However, the bandwidth
limitation of the PCI bus for huge VM to VM traffic remains.
MR-IOV is a standard mechanism defined by the PCI-SIG. It can be regarded as an extension
of SR-IOV to multiple root complex environments (i.e., multiple servers), so it
would best fit blade environments. It enables a single IO device to be used by
multiple servers and multiple virtual machines simultaneously. It requires the IO
device to be MR-IOV capable, and there must be an MR-IOV switching fabric between
the IO devices and the servers. The IO device is supposed to be placed in a separate chassis,
and the servers require a bus extender card to connect to the switch. It can be used for
VM to VM communication on both a single server and different servers. However, due to its
complexity and multi-vendor involvement (NIC, switching and management software),
its actual implementation may take time.
Besides the hardware-based approaches, there is a software-based approach, mentioned
several times in this article, to accelerate data plane processing itself for the NFVI
and the VNFs. This approach can accelerate each layer of the NFV architecture from the
software perspective, enabling better VM to VM communication as well. There are two major
open communities, DPDK and ODP (OpenDataPlane), and the basic concept is to bypass the
operating system network stack and retrieve raw data from the NIC through a poll mode driver (8).
DPDK is an open source development toolkit consisting of libraries to accelerate packet
processing workloads running on a variety of CPU architectures (mainly x86 processors,
with ARM64 support via cross-compilation at the moment), and it supports most NICs currently
available (3).
ODP provides a common set of APIs for application portability across a diverse
range of networking platforms (SoCs and servers) that offer various types of hardware
acceleration (15), letting hardware vendors decide how the APIs are actually implemented.
ODP-Linux and ODP-DPDK have been implemented as references. Both
DPDK and ODP work at the device driver level and do not include protocol stacks, so some
commercial software vendors offer their protocol stacks on top of the DPDK/ODP layer.
ODP is a member of the OFP (OpenFastPath) project, which is also an open source community. OFP
aims to create and develop an open source fast path TCP/IP stack designed to run in
Linux user space (16). OFP operates on top of ODP as a protocol stack, in collaboration with the Linux kernel
network stack (i.e., packets that OFP cannot handle, for example because a feature is not
yet implemented, go to the Linux kernel and are handled by its network stack).
FD.io, a Linux Foundation project, is a community with multiple projects in software-based
packet processing aiming at high-throughput, low-latency and resource-efficient
IO services suitable for many processor architectures (x86, ARM and PowerPC) and deployment
environments (bare metal, VM, container) (17). It uses DPDK at the device driver layer. The Vector Packet Processing (VPP) library, donated
by Cisco, is its key component, as the VPP code already runs in commercial products,
is modular, allows easy plug-ins without changes to the underlying code base, and
runs in Linux user space.
All these activities indicate that the Linux kernel stack is not suitable for high-performance
network processing, in both bare metal and virtualization environments, and that developing
network functions in user space by bypassing the kernel is the feasible solution at the moment.
This applies to East-West traffic (VM to VM) as well as North-South traffic (VM to/from
NIC).
There are ongoing projects called vDPA (vHost Data Path Acceleration) and XDP (eXpress
Data Path) to accelerate the data path. The basic idea of vDPA (18), mainly driven by DPDK, is to separate the data path (directly between the virtual
machine and the hardware device) from the control path (through the hypervisor), using the standard
IO driver, virtio. It requires an IO device that natively supports the virtio transport. The virtio
driver in the virtual machine can then exchange data with the hardware directly via
the virtio transport, without the hypervisor's involvement. Control path events (e.g.
device start/stop) in the virtual machine are still trapped and handled by the hypervisor.
XDP, driven by an open source project called IO Visor (19) under the Linux Foundation, accelerates packet processing inside the kernel rather than
bypassing it. The basic idea is not to replace the kernel stack but to provide a simpler
and faster alternative path in the kernel, via kernel hooks using eBPF (extended Berkeley
Packet Filter), a highly flexible and efficient mechanism in the Linux kernel that allows
bytecode to be executed safely at various hook points, including the lowest point
of the software stack, in collaboration with the existing kernel network stack.
As the data from the device driver can be inspected at this lowest point by eBPF, XDP can
be used to accelerate VM to VM communication by processing and forwarding packets to their
destination at that level.
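To illustrate how small an XDP hook can be, the sketch below is a minimal eBPF program attached at the driver's earliest receive point; it simply passes every packet up the stack, whereas a real accelerator would parse headers here and, for example, redirect frames destined for a co-located VM. It assumes the usual libbpf build flow and is not tied to any specific NIC or deployment.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Minimal XDP program: runs for every received frame at the earliest
 * point in the driver, before the kernel allocates an skb.
 * Returning XDP_PASS hands the packet to the normal kernel stack;
 * XDP_DROP, XDP_TX or XDP_REDIRECT could be returned instead to drop,
 * bounce or steer the frame (e.g. toward another VM's interface). */
SEC("xdp")
int xdp_pass_all(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Header parsing and forwarding decisions would go here;
     * the bounds check keeps the eBPF verifier satisfied. */
    if (data + 14 > data_end)   /* not even a full Ethernet header */
        return XDP_PASS;

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```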
4. Conclusion
There is no doubt that a high-performance hypervisor consuming reasonably few resources
is one of the best options for NFVI, but that is hard to achieve at the moment.
Direct hardware access technologies, such as SR-IOV, MR-IOV and PCI pass-through,
are widely accepted in virtualization environments because they avoid the performance
bottlenecks at the hypervisor level. This article has reviewed direct hardware access
technologies in VM to VM communication environments and their limitations based on the
commercially available standard interface, PCIe, then examined the shared memory driver
as an option to avoid the PCIe limitations, and reviewed other technologies.
It is becoming clearer that packet processing with poll-mode drivers in Linux user space
is a feasible way to handle high-performance traffic, as many open communities assume
it. Also, separating the hypervisor from the data path is actively considered,
as the hypervisor's intervention in the data path causes significant overheads and consumes
system resources.
The use case in this article, huge traffic for VM to VM communication within a single
server, may be specific and may not be required for enterprise-level virtualization.
However, as network-intensive equipment and applications become more popular
and move toward virtualization, it is becoming more realistic. Direct hardware
access technology such as SR-IOV can be used for VM to VM communication (East-West
traffic) as well as VM to/from NIC traffic (North-South traffic), with its hardware-based
switching capability, as long as its PCIe bus bandwidth can cover the load. The shared memory
scheme for direct VM to VM communication can be an efficient way to achieve the goal, even
though it is still proprietary. As long as the requirements of the use case
(huge traffic between VMs) exist, there should be a way to overcome the limitations, even
if it is a non-standard or interim solution.