The Transactions P of the Korean Institute of Electrical Engineers (ISO journal title: Trans. P of KIEE)
Indexed by: Korea Citation Index (KCI)

Keywords: Virtualization, NFV, NFVI, VM, Performance, PCI

1. Introduction

The traffic carried by telecommunication networks is growing exponentially, mainly driven by the demands of multimedia traffic over wireless and wireline networks and by cloud-based services. New 5G services in the IoT (Internet of Things) and mobility areas will increase these demands further. This requires network operators to launch services more quickly, not only new advanced applications but also the network functions behind those services. In any case, network operators have to prepare and support these network services with reasonable investment, in a timely and flexible manner.

Traditionally, telecommunication networks are populated with a large and increasing variety of proprietary hardware-based appliances. Launching a new network service often requires yet another variety, and finding the space and power to accommodate these boxes is becoming increasingly difficult, compounded by the increasing costs of energy, capital investment challenges and the rarity of skills necessary to design, integrate and operate increasingly complex hardware-based appliances. Moreover, hardware-based appliances rapidly reach end of life, requiring much of the procure-design-integrate-deploy cycle to be repeated with little or no revenue benefit. Worse, hardware lifecycles are becoming shorter as technology and services innovation accelerates, inhibiting the roll out of new services and constraining innovation in an increasingly network-centric connected world (1).

Network virtualization offers network operators many benefits: simpler logistics, cost reduction, quicker service launch, flexibility through rapid scale-up and scale-down, and efficient resource use by sharing resources among multiple applications. However, it also brings challenges to resolve: performance, co-existence with hardware-based equipment, management and orchestration of many virtual network appliances, network function virtualization scaling, an appropriate level of resilience, integration of multiple virtual appliances from different vendors, and so on. The key objective of network function virtualization is to achieve simplified, software-defined, and virtualized service provider networks by leveraging standard IT virtualization technology to consolidate multiple network functions onto industry-standard high volume servers, switches and storage subsystems, which can be located in data centers, network nodes and end user premises.

To provide such network services, a single piece of network equipment or a single virtual appliance can hardly deliver all the required features alone, as a service requires many complex functions such as switching/routing, firewall, network address translation, deep packet inspection, anti-virus, video optimization, etc. These functions are typically connected for a specific service and may require very close cooperation to deliver traffic from one to another directly. Such service chaining can be used by network operators to set up suites or catalogs of connected services that enable the use of a single network connection for many services with different characteristics. In virtualization environments, the virtual appliances in a service chain can be located within a single server or across multiple servers.

This article reviews the overall architecture of the NFV (Network Function Virtualization) framework defined by ETSI (European Telecommunications Standards Institute), examines the performance bottlenecks of each layer in the NFV infrastructure (NFVI) and possible solutions, and then considers ways to reduce the performance bottlenecks of virtual machine (VM) to VM communications for service chaining within a single physical server.

2. Performance challenges in NFVI

2.1 NFV architecture

ETSI defines a high-level NFV framework, shown in Fig. 1, enabling virtual network functions (VNF) to be deployed and executed on an NFVI. There are three domains: the virtual network function (VNF), the NFVI, and NFV management and orchestration (2).

Fig. 1. High-level NFV framework

The VNF is a software implementation of a network function capable of running over the NFVI. It is typically mapped to one virtual machine (VM) in an NFVI, but may also be split into multiple VNF components loaded on separate VMs with different scaling requirements.

NFVI includes the diversity of COTS (Commercial Off-The-Shelf) hardware resources, such as computing, storage and network, and describes how these can be wrapped with a software layer that abstracts and logically partitions them, supporting the execution of the VNFs. Although there is no formal definition of COTS servers, it is generally accepted that these are servers commonly deployed in the IT industry, equipped with general-purpose processors.

NFV management and orchestration covers the orchestration and lifecycle management of physical and/or software resources that support the infrastructure virtualization and the lifecycle management of VNFs.

2.2 Performance challenges

In the NFV framework, there are performance challenges at each level, as shown in Fig. 2.

Fig. 2. Performance challenges in NFVI

The first one is a driver-level bottleneck, because the interrupt-based operating system drivers running on the server platform are not designed to receive and send packets at a very high rate. Interrupts are an efficient way to handle IO (Input/Output) requests at a reasonable cost of CPU (Central Processing Unit) context switching in general computing. However, for very IO-intensive applications like VNFs that handle tens of millions of packets per second or even more, the CPU can be overwhelmed by the cost of too many context switches. To give full CPU cycles to packet processing, poll mode device drivers, typically running in user space and bypassing the kernel so that no context switches occur, need to be considered, as suggested by DPDK (Data Plane Development Kit) (3), which was initiated by Intel Corp. as its development toolkit for high performance network processing and has now become an open source project (dpdk.org).
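
As an illustration only, the sketch below shows the shape of such a poll mode receive loop using the DPDK API (rte_eth_rx_burst() and rte_pktmbuf_free() are existing DPDK calls; the port is assumed to have been configured and started elsewhere, and error handling is omitted):

    /* Minimal sketch of a DPDK poll mode receive loop. Assumes rte_eal_init()
     * has run and the port was configured and started with
     * rte_eth_dev_configure()/rte_eth_dev_start(). */
    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void poll_loop(uint16_t port_id)
    {
        struct rte_mbuf *bufs[BURST_SIZE];

        for (;;) {  /* busy-poll: this core is dedicated to packet IO */
            uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
            for (uint16_t i = 0; i < nb_rx; i++) {
                /* packet processing would go here */
                rte_pktmbuf_free(bufs[i]);  /* this sketch simply drops the packet */
            }
        }
    }

The dedicated core runs at full utilization regardless of the offered load; that is the price paid for removing interrupts and context switches from the receive path.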

The second bottleneck is in the virtual switch of the hypervisor. The virtual switch is typically built on the operating system kernel's network stack, and the well-verified observation is that the kernel network stack is too slow, mainly due to interrupts, locking, preemption, etc. A virtual switch using the kernel network stack is therefore bound to the kernel's network performance and scalability, which do a poor job of exploiting the potential packet switching performance (4). Bypassing the virtual switch is one way to avoid this, for example with single-root IO virtualization (SR-IOV), which allows direct access to the network interface card (NIC) from VMs for high performance and reduced latency. However, bypassing the virtual switch loses or complicates some virtualization benefits, VM instantiation and (live) migration for example (5), and increases hardware dependency, as the NIC needs to support the bypassing feature. Another approach is to improve the virtual switch performance itself by bypassing the kernel and giving full CPU cycles to packet processing, as with the driver-level bottleneck above, i.e., with DPDK. Open vSwitch, an open source virtual switch project, was originally developed with kernel features, generally forwarding packets via the kernel network stack, and it has been ported to DPDK as "Open vSwitch with DPDK" (5). It is a feasible solution to avoid the bottleneck while keeping the virtualization benefits; however, it also has certain dependencies on the operating system and hypervisor vendors.

The third one is in the communication between the host operating system (with the virtual switch) and the guest operating systems (with the VNFs). If the virtual NIC (vNIC) drivers used by the host and guest operating systems are implemented inside the operating system kernel, they suffer the same performance bottlenecks as the first one. In full virtualization mode (the VNF is unaware that it is being virtualized and does not require any changes to work in this configuration, so the hypervisor needs to emulate hardware to handle IO requests from the VNF), the VNF uses its virtualization-unaware native driver, which places a heavy emulation burden on the hypervisor side and creates communication overheads between them. To improve the performance of this full IO virtualization model, the IO paravirtualization scheme (the VNF has a virtualization-aware device driver that optimizes the communication) has been widely adopted. Virtio (6) is a de facto standard for the virtual IO driver in IO paravirtualization.
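
For reference, the split-virtqueue descriptor defined by the virtio specification looks roughly as follows (field layout as in the specification and linux/virtio_ring.h); it illustrates how guest and host exchange buffers through shared descriptor rings instead of emulated device registers:

    /* Split-virtqueue descriptor as defined by the virtio specification
     * (cf. linux/virtio_ring.h). The guest places buffer addresses here and
     * the host consumes them directly. */
    #include <stdint.h>

    struct vring_desc {
        uint64_t addr;   /* guest-physical address of the buffer */
        uint32_t len;    /* buffer length in bytes */
        uint16_t flags;  /* e.g. VRING_DESC_F_NEXT, VRING_DESC_F_WRITE */
        uint16_t next;   /* index of the next descriptor when chained */
    };

The same ring layout is what allows the vDPA approach discussed in Section 3.4 to let hardware consume the descriptors directly.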

The last one is the VNF itself. A VNF may have its own performance and scalability issues, probably due to poor design or a lack of performance considerations for virtualization environments. This is VNF vendor specific and is also related to the first and third bottlenecks above. Besides all the bottlenecks above, the VNF itself should have the processing capability to handle the expected high traffic.

In summary, the main performance bottlenecks come from the hypervisor side, and that is why hypervisor-bypassing technologies, like SR-IOV, multi-root IO virtualization (MR-IOV), or PCI pass-through, have been considered to overcome them.

3. Performance enhancement for VM to VM communications

3.1 VM to VM communication

As a network service is typically composed of multiple network functions implemented by multiple VNFs in a service chain, as in Fig. 3, the demand for effective direct communication between VNFs without performance loss increases. Given that the network bandwidth required by applications and services keeps growing and the traffic handled by a single VNF is also increasing substantially, ways to process huge traffic volumes between VNFs must be considered. 10 Gbps (Gigabit per second) and 40 Gbps NICs are already available and in commercial use, which means the traffic coming from a NIC can be tens of Gbps or even more.

Fig. 3. Service chaining between VMs as a use case

For the explanation in the rest of this article, let us take an example use case with 50 Gbps full-duplex network traffic and unidirectional 50 Gbps VM to VM traffic, as in Fig. 3. The 50 Gbps traffic is processed via (network) → NIC → (hypervisor) → VM1 → (hypervisor) → VM2 → (hypervisor) → NIC → (network). In this use case, the hypervisor has to handle 150 Gbps of full-duplex traffic (300 Gbps half-duplex), given that it has three logical 50 Gbps full-duplex ports to process: one for the NIC and one for each VM. This is clearly a huge challenge on the hypervisor side.

As a single VNF is typically mapped to a single VM, this article assumes that mapping for simplicity. VM to VM communication can take place within one physical server or between different physical servers. This article handles the former case only, leaving the latter for future consideration.

3.2 Direct hardware access technologies

Section 2.2 implies that the main performance bottlenecks come from the hypervisor side. This section covers ways to bypass the hypervisor and their limitations. The basic concept is for a VM to access the NICs directly, bypassing the hypervisor and its performance penalty and achieving almost the same performance as bare metal. It requires specific hardware drivers on the VM side, and the NICs also have to support the technology with a switching capability to handle the traffic between VMs. Thanks to the hardware-based switching capability in the NIC, it shows very high performance compared to the software-based switch of the hypervisor, and it does not consume CPU resources for switching, so those CPU resources can be allocated to the hypervisor or to additional virtual machines.

A well-known technology is SR-IOV (7), defined by the PCI-SIG (Peripheral Component Interconnect Special Interest Group). It allows VMs, once created by the hypervisor, to share a piece of hardware without involving the hypervisor directly in every activity. For example, a physical NIC can expose a number of virtual functions that can be "attached" to VMs. The VMs see the attached virtual functions as if they were physical cards. In other words, SR-IOV allows the creation of multiple "shadow" cards of a physical NIC, each with its own MAC (Medium Access Control) address (8).

It introduces two function types, Physical Functions (PFs) and Virtual Functions (VFs). PFs are full PCIe NIC functions with SR-IOV capability, while VFs are lightweight PCIe functions that contain the resources necessary for data movement but have a carefully minimized set of configuration resources. This architecture lets an IO device support multiple VFs, minimizing the hardware cost of each additional function by sharing one IO device and its capability among multiple virtual machines. The traffic flow with SR-IOV for the use case is (network) → (NIC L2 switching) → (NIC VF1) → VM1 → (NIC VF1) → (NIC L2 switching) → (NIC VF2) → VM2 → (NIC VF2) → (NIC L2 switching) → (network), as described in Fig. 4. The difference between Fig. 3 and Fig. 4 is who does the switching for the traffic flow; with SR-IOV, it is done by the SR-IOV NIC with its hardware-based switching capability.
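
On Linux, VFs are typically created through the PF's standard sriov_numvfs sysfs attribute; the sketch below shows the idea in C (the interface name "ens1f0" and the VF count are illustrative placeholders, and it is only one of several ways to do this):

    /* Sketch: request 2 SR-IOV VFs on a PF via the standard Linux sysfs
     * attribute. "ens1f0" and the VF count are placeholders. */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/class/net/ens1f0/device/sriov_numvfs";
        FILE *f = fopen(path, "w");
        if (!f) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "2\n");   /* ask the PF driver to create two VFs */
        fclose(f);
        return 0;            /* the VFs can then be attached to VMs, e.g. via vfio-pci */
    }

Each created VF then appears as its own PCIe function with its own MAC address, as described above.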

As VMs access the NIC directly, the VM to VM traffic can be handled in the NIC as well, where an issue may arise due to the limitation of the NIC's IO bus. PCIe (Peripheral Component Interconnect Express) is the interface widely used for NICs at the moment. The maximum bandwidth of PCIe depends on the PCIe generation and lane width, and there may be vendor-specific processing overhead on the PCIe board. As an example, the maximum bandwidth of a PCIe Gen3 x8 slot is less than 62 Gbps (9). Considering the internal overheads, a NIC with this PCIe interface can offer approximately one 10 Gbps and one 40 Gbps network port, or five 10 Gbps ports in the best configuration, for a total maximum NIC bandwidth of 50 Gbps. For simplicity, 50 Gbps is used as the PCIe bandwidth limit in this article.
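
A rough cross-check of that figure (my own arithmetic, counting only the 128b/130b line encoding and ignoring TLP/DLLP framing overhead) is:

    8\,\mathrm{GT/s} \times \tfrac{128}{130} \times 8\ \text{lanes} \approx 63\,\mathrm{Gbps}

so the usable bandwidth after protocol overhead ends up somewhat lower, in line with the sub-62 Gbps figure of (9).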

Fig. 4. VM to VM communication with SR-IOV

In this configuration of the use case defined in Fig. 3 of Section 3.1 (with 50 Gbps full-duplex traffic to/from the network), the NIC has to handle 100 Gbps of full-duplex traffic over PCIe for the two VMs while handling 50 Gbps on the network port. The issue is that the 100 Gbps exceeds the limit of one PCIe bus, which means two PCIe slots are needed to handle the traffic, and the additional one is used only for internal switching to VMs, not for network ports, as pictured in Fig. 5. This is due to the limitation of the PCIe bus used to communicate with the VMs. Given that high-bandwidth NICs with direct access technologies are expensive, this may not be a feasible way to handle high-bandwidth VM to VM traffic.

Whenever the total traffic the PCIe slots can handle is less than the total traffic the VMs need to handle, another PCIe NIC needs to be added. In the example above, if one more VM is added with some network traffic, the total bandwidth required by the VMs (100 Gbps for the existing two VMs plus the additional traffic for the new VM) exceeds the total bandwidth of the two PCIe slots (100 Gbps, at 50 Gbps per PCIe slot as above). This means one more NIC (three NICs in total) is needed to handle the traffic for the added VM, even if there are still network ports available on the existing NICs.

One more consideration is the number of PCIe slots in a server, which is only a handful even in a high-end server (for example, Dell's PowerEdge R840 has 4 or 6 PCIe slots). Consuming PCIe slots for VM to VM switching therefore affects service variety, as those slots could otherwise be used for other hardware components, not only for network traffic.

Basically, the issue comes from the huge VM to VM (East-West) traffic sharing PCIe bandwidth with the North-South traffic that direct hardware access technologies were originally intended to address.

Fig. 5. Two NICs for VM to VM communications

3.3 Shared memory virtual NIC

One idea for handling huge traffic between VMs is to use shared memory for this purpose, a conventional technique borrowed from inter-process communication. It requires a specific shared memory driver in each VM acting as a virtual NIC (vNIC), and two VMs can then communicate via the shared memory at a high rate, without the hypervisor's intervention in the data transfer. Fig. 6 shows the traffic flows in the use case with direct hardware access technologies and the shared memory virtual NIC driver. The memory bus is expected to provide approximately 100-200 Gbps of capacity with DDR4 (10), which can be used for high-volume VM to VM communication. The bandwidth limit of DRAM differs with the version, clock speed, memory bus width, etc., and it operates in half-duplex mode, so the limit needs to be adjusted accordingly.
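
For example, a single 64-bit DDR4-2400 channel provides roughly (my own arithmetic; actual figures depend on the DRAM configuration):

    2400\,\mathrm{MT/s} \times 8\,\mathrm{bytes} = 19.2\,\mathrm{GB/s} \approx 154\,\mathrm{Gbps}

per channel, shared by reads and writes, which is consistent with the 100-200 Gbps range quoted above.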

In this approach, pictured in Fig. 6, only the incoming and outgoing (North-South) traffic via the NIC consumes PCIe bandwidth; the traffic between VMs does not, so there is more room for North-South traffic over PCIe. It can be used with or without direct hardware access technologies. If it is used together with the hypervisor, the specific shared memory vNIC driver is required in the host operating system as well.
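
To illustrate only the underlying idea (not any of the specific mechanisms below), the following sketch assumes two endpoints that can map the same memory region, here stood in for by POSIX shared memory, and exchange fixed-size frames through a simple single-producer/single-consumer ring:

    /* Conceptual single-producer/single-consumer ring in a shared memory
     * region, illustrating how a shared memory vNIC can move frames between
     * two endpoints without hypervisor involvement. POSIX shm is used here
     * as a stand-in for a VM-to-VM shared region. */
    #include <fcntl.h>
    #include <stdatomic.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define RING_SLOTS 256
    #define FRAME_SIZE 2048

    struct shm_ring {
        _Atomic unsigned head;                      /* written by producer */
        _Atomic unsigned tail;                      /* written by consumer */
        unsigned char frames[RING_SLOTS][FRAME_SIZE];
    };

    static struct shm_ring *ring_map(const char *name)
    {
        int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, sizeof(struct shm_ring)) < 0)
            return NULL;
        void *p = mmap(NULL, sizeof(struct shm_ring),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        return p == MAP_FAILED ? NULL : (struct shm_ring *)p;
    }

    /* Producer side: copy one frame into the ring.
     * Returns 0 if the ring is full or the frame is too large. */
    static int ring_send(struct shm_ring *r, const void *frame, size_t len)
    {
        unsigned head = atomic_load(&r->head);
        unsigned tail = atomic_load(&r->tail);
        if (head - tail == RING_SLOTS || len > FRAME_SIZE)
            return 0;
        memcpy(r->frames[head % RING_SLOTS], frame, len);
        atomic_store(&r->head, head + 1);
        return 1;
    }

A real shared memory vNIC additionally needs event signaling, memory protection and peer discovery, which is exactly where the mechanisms described below differ.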

Several such mechanisms have been suggested and considered: Fido, XenSocket, XenLoop, XWAY, etc. Fido is a shared memory based VM to VM mechanism that reduces VM to VM communication overheads (hypervisor intervention and data copies) and enables zero-copy data transfer across multiple virtual machines on the same physical system (11). Operating underneath the kernel network stack (TCP/IP), the virtual machines are connected in collaboration with the hypervisor's XenStore (a centralized key-value store in dom0 of Xen) and exchange data via Fido's memory mapping module under the control of its signaling module. It does not require any changes to the upper layers or applications.

Fig. 6. VM to VM communication with shared memory

XenSocket is a shared memory construct that provides a POSIX socket-based mechanism for high-throughput VM to VM communications on the same physical system, avoiding the overhead of multiple hypercalls (the hypervisor equivalent of system calls, which ask the hypervisor to handle requests) and memory page table updates (12). From the application's perspective it exposes socket-like APIs, and underneath these APIs there is an implementation that uses shared memory for VM to VM data transfer, which compiles into a kernel module. However, applications written against the existing socket interface need to be changed to use the socket-like APIs.

XWAY is a mechanism for VM to VM communication with high performance and binary compatibility for applications based on the standard TCP socket interface, achieved by bypassing the kernel network stack, avoiding page flipping overhead and providing a direct communication path between VMs on the same machine (13). To do this, it sets up a direct data path using shared memory between VMs instead of relaying through the Xen hypervisor, and it uses its own protocol, called the XWAY protocol, instead of the kernel network stack for faster transfer, letting the XWAY switch redirect data requests to the XWAY protocol and control requests to the kernel TCP stack. It allows direct communication between XWAY-aware virtual machines on the same physical server and falls back to the existing TCP/IP kernel stack for non-XWAY virtual machines by hooking above the kernel TCP stack. It provides binary compatibility for existing applications using the standard socket interface.

XenLoop couples shared memory based VM to VM communication with transparent traffic interception beneath the network layer and a soft-state domain discovery mechanism, to achieve both high performance and user-level transparency (14). The XenLoop layer, underneath the TCP/IP stack, intercepts every outgoing packet from the upper layer and inspects its header to determine the packet's destination, using a Netfilter hook. Depending on the destination, it forwards the packet to a XenLoop-aware virtual machine using its own data channel, which is set up dynamically by handshaking, or to a XenLoop-unaware VM using the existing Xen driver (i.e., the Netfront driver). Service discovery for XenLoop-aware virtual machines is done with XenStore, like Fido's.

Fig. 7. Overall architecture of Fido, XenSocket, XWAY and XenLoop

The overall architecture of these mechanisms is described in Fig. 7. As Fido and XenLoop operate underneath the kernel network stack (i.e., at the device driver level), they focus on how to speed up shared memory based communication efficiently at the driver level while keeping user-level transparency. XenSocket and XWAY focus on their own transport mechanisms, replacing the existing kernel TCP/IP stack at the socket interface level. The TCP/IP stack assumes unreliable lower layers, so it includes many provisions to make communication reliable, and these provisions become overhead in an already reliable environment. This implies that bypassing TCP/IP for VM to VM communication on a single server can make sense, as internal shared memory within a server can be regarded as a reliable channel (security considerations aside). All of these schemes are based on Linux and its kernel, as it is open source. To use them on other, non-open-source operating systems like Windows, they would need to be developed separately, and reaching product-level maturity is likely the real issue for commercial deployment. Although these mechanisms (Fido, XenSocket, XWAY and XenLoop) showed better performance than the traditional Linux network stack, their implementations are kernel based. The proven assumption is that the kernel is not suitable for high performance IO processing, not only because of its networking stack but also because of behaviors like locking, preemption, interrupts, etc. So, to handle tens of Gbps of traffic or even more (hundreds of Gbps), bypassing the kernel is actively considered, using packet processing in user space with poll mode drivers.

Even though the shared memory scheme can provide high bandwidth for VM to VM communications, the problem is that the specific shared memory vNIC driver is proprietary, so its use is likely limited and the VNF vendors may need to be the same vendor or to use the same network middleware. It may also lose some virtualization benefits, as direct hardware access technologies do, because it bypasses the hypervisor. Shared memory has its own bus limitation, but there are many more memory slots in a server (for example, Dell's PowerEdge R840 has up to 48 memory slots), so a fair distribution of the shared memory used for VM to VM communications over the memory slots would be necessary.

3.4 Other technologies

There are other acceleration technologies to be considered. A TCP offload engine (TOE) on the NIC enables the NIC to send the host processor only reassembled data rather than each packet of a TCP connection, and can reduce PCI bus overheads. It mainly reduces the CPU's processing overhead, as the CPU cost of TCP/IP processing has grown and degrades server performance. It mainly targets TCP/IP offloading for application-level software (i.e., software on top of the TCP/IP layer, at the socket interface). Given that most VNF vendors have their own Layer 2-4 stacks with TCP/IP, to which they add their own specialties and features, TOE's use by VNF vendors is likely limited. It can improve PCI bandwidth efficiency if combined with a direct hardware access technology; however, the PCI bus bandwidth limitation for huge VM to VM traffic still remains.

MR-IOV is a standard mechanism defined by the PCI-SIG. It can be regarded as an extension of SR-IOV to multiple root complex environments (i.e., multiple servers), so it is best suited to blade environments. It enables a single IO device to be used by multiple servers and multiple virtual machines simultaneously. It requires the IO device to be MR-IOV capable, and there must be an MR-IOV switching fabric between the IO devices and the servers. The IO device is supposed to be placed in a separate chassis, and servers require a bus extender card to connect to the switch. It can be used for VM to VM communication both within a single server and between different servers. However, due to its complexity and the multi-vendor involvement required (NIC, switching and management software), its actual deployment may take time.

Besides the hardware-based approaches, there is a software-based approach, mentioned several times in this article, that accelerates data plane processing itself for the NFVI and VNFs. It can accelerate each layer in the NFV architecture from a software perspective, enabling better VM to VM communication as well. There are two major open communities, DPDK and ODP (OpenDataPlane), and the basic concept is to bypass the operating system network stack and retrieve raw data from the NIC through a poll mode driver (8).

DPDK is an open source development toolkit consisting of libraries to accelerate packet processing workloads running on a variety of CPU architectures (mainly x86 processors, with ARM64 supported via cross-compilation at the moment), and it supports most NICs currently available (3).

ODP provides a common set of APIs for application portability across a diverse range of networking platforms (SoCs and servers) that offer various types of hardware acceleration (15), letting hardware vendors decide what and how the APIs are actually realized in their implementations. ODP-Linux and ODP-DPDK have been implemented as references. Both DPDK and ODP work at the device driver level and do not include protocol stacks, so some commercial software vendors offer their protocol stacks on top of the DPDK/ODP layer.

ODP is a member of OFP (Open Fast Path), which is also an open source community. OFP creates and develops an open source fast path TCP/IP stack designed to run in Linux user space (16). OFP operates on top of ODP as a protocol stack, in collaboration with the Linux kernel network stack (i.e., packets that OFP cannot handle, for example because a feature is not yet implemented, are passed to the Linux kernel and handled by its network stack).

FD.io, a Linux Foundation project, is a community with multiple projects in software-based packet processing aimed at creating high-throughput, low-latency and resource-efficient IO services suitable for many processor architectures (x86, ARM and PowerPC) and deployment environments (bare metal, VM, container) (17). It uses DPDK for the device driver layer. The Vector Packet Processing (VPP) library, donated by Cisco, is its key component: the VPP code already runs in commercial products, is modular (allowing easy plug-ins without changes to the underlying code base) and runs in Linux user space.

All these activities indicate that the Linux kernel stack is not suitable for high performance network processing in either bare metal or virtualization environments, and that developing network functions in user space by bypassing the kernel is the feasible solution at the moment. This can be applied to East-West traffic (VM to VM) as well as North-South traffic (VM to/from NIC).

There are ongoing projects, called vDPA (vHost Data Path Acceleration) and XDP (eXpress Data Path), to accelerate the data path further. The basic idea of vDPA (18), mainly driven by DPDK, is to separate the data path (directly between the virtual machine and the hardware device) from the control path (through the hypervisor), using the standard IO driver, virtio. It needs an IO device capable of the virtio transport. The virtio driver in the virtual machine can exchange data with the hardware directly via the virtio transport, without the hypervisor's involvement, while control path events (e.g., device start/stop) in the virtual machine are still trapped and handled by the hypervisor.

XDP, driven by the IO Visor open source project (19) under the Linux Foundation, accelerates packet processing inside the kernel rather than bypassing it. The basic idea is not to replace the kernel stack but to provide a simpler and faster alternative path in the kernel, through kernel hooks using eBPF (extended Berkeley Packet Filter: a highly flexible and efficient facility in the Linux kernel that allows bytecode to be executed safely at various hook points, including the lowest point of the software stack), in collaboration with the existing kernel network stack. As the data from the device driver can be inspected at this lowest point by eBPF, XDP can be used to accelerate VM to VM communication by processing and forwarding packets to their destination at that level.
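
As a minimal illustration, an XDP program is just a C function compiled to eBPF and attached at the driver-level hook; the sketch below passes every packet up the normal stack unchanged (a real program would parse headers here and drop, retransmit or redirect packets instead):

    /* Minimal XDP program: runs at the driver hook and lets every packet
     * continue up the normal stack (XDP_PASS). Built with clang -target bpf
     * and attached to a device, e.g. "ip link set dev <if> xdp obj prog.o". */
    #include <linux/bpf.h>

    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    SEC("xdp")
    int xdp_pass_all(struct xdp_md *ctx)
    {
        (void)ctx;          /* ctx->data / ctx->data_end bound the packet */
        return XDP_PASS;    /* other verdicts: XDP_DROP, XDP_TX, XDP_REDIRECT */
    }

    char _license[] SEC("license") = "GPL";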

4. Conclusion

There is no doubt that a high performance hypervisor consuming reasonably few resources is one of the best options for NFVI, but this is hard to achieve at the moment. Direct hardware access technologies, such as SR-IOV, MR-IOV and PCI pass-through, are commonly accepted in virtualization environments as they avoid the performance bottlenecks at the hypervisor level. This article has reviewed direct hardware access technologies for VM to VM communication and their limitations with the commercially available standard interface, PCIe, then examined the shared memory driver as an option to avoid the PCIe limitations, and finally reviewed other related technologies.

It is becoming clearer that packet processing with poll mode drivers in Linux user space is a feasible way to handle high performance traffic, as many open source communities assume it. Separating the hypervisor from the data path is also actively considered, since the hypervisor's intervention in the data path causes significant overheads and consumes system resources.

The use case in this article, huge traffic for VM to VM communication within a single server, is specific and may not be required for enterprise-level virtualization. However, as network-intensive equipment and applications become more popular and move toward virtualization, it is becoming more realistic. Direct hardware access technologies like SR-IOV can be used for VM to VM communications (East-West traffic) as well as VM to/from NIC traffic (North-South traffic) with their hardware-based switching capability, as long as the PCIe bus bandwidth can cover it. A shared memory scheme for direct VM to VM communication can be an efficient way to achieve the goal, even though it is still proprietary. As long as the requirements of the use case (huge traffic between VMs) exist, there should be a way to meet them, even if it is a non-standard or interim solution.

References

(1) "Network Functions Virtualisation: An Introduction, Benefits, Enablers, Challenges & Call for Action," NFV White Paper, October 2012; http://portal.etsi.org/NFV/NFV_White_Paper.pdf
(2) ETSI GS NFV 002, "Network Functions Virtualisation (NFV); Architectural Framework," https://www.etsi.org/deliver/etsi_gs/nfv/001_099/002/01.02.01_60/gs_nfv002v010201p.pdf
(3) DPDK (Data Plane Development Kit), https://www.dpdk.org
(4) Brian Carr, Charlie Ashton, "Application Portability for Multicore Packet Processing Becomes a Reality," white paper, July 2011.
(5) Open vSwitch open source project, https://www.openvswitch.org/
(6) Rusty Russell, "virtio: Towards a De-Facto Standard for Virtual I/O Devices," ACM SIGOPS Operating Systems Review, Vol. 42, No. 5, pp. 95-103, July 2008.
(7) Y. Dong et al., "High Performance Network Virtualization with SR-IOV," in Proc. 2010 IEEE 16th International Symposium on High Performance Computer Architecture, pp. 1-10, 2010.
(8) Bruno Chatras, Francois-Frederic Ozog, "Network Functions Virtualization: The Portability Challenge," IEEE Network, July/August 2016.
(9) "Understanding PCIe Configuration for Maximum Performance," https://community.mellanox.com/s/article/understanding-pcie-configuration-for-maximum-performance
(10) DDR4 SDRAM, https://en.wikipedia.org/wiki/DDR4_SDRAM
(11) Anton Burtsev et al., "Fido: Fast Inter-Virtual-Machine Communication for Enterprise Appliances," in USENIX '09 Proceedings, June 2009.
(12) X. Zhang, S. McIntosh, P. Rohatgi, J. L. Griffin, "XenSocket: A High-Throughput Interdomain Transport for Virtual Machines," in Middleware 2007: ACM/IFIP/USENIX 8th International Middleware Conference, November 2007.
(13) K. Kim, C. Kim, S. Jung, H. Shin, "Inter-domain Socket Communications Supporting High Performance and Full Binary Compatibility on Xen," in VEE '08: Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2008.
(14) J. Wang, K.-L. Wright, K. Gopalan, "XenLoop: A Transparent High Performance Inter-VM Network Loopback," in Proc. of the International Symposium on High Performance Distributed Computing (HPDC), 2008.
(15) ODP (OpenDataPlane), https://opendataplane.org
(16) OFP (OpenFastPath), https://openfastpath.org
(17) FD.io (Fast Data Input/Output), https://fd.io
(18) Cunming Liang, "VDPA: VHOST-MDEV as New vhost Protocol Transport," KVM Forum 2018, October 2018; https://events.linuxfoundation.org/wp-content/uploads/2017/12/Cunming-Liang-Intel-KVM-Forum-2018-VDPA-VHOST-MDEV.pdf
(19) IO Visor open source project, https://www.iovisor.org/technology/xdp

About the Author

김 용 근 (金容槿)

1988: B.S. in Computer Science, Ajou University
1990: M.S. in Computer Engineering, Ajou University
1990-1998: Senior Researcher, Ssangyong Information & Communications Corp.
1995: Professional Engineer (Computer System Application)
1998-2001: Director, Lucent Technologies
2001-2002: Director, Jetstream Communications
2003-2018: Vice President, 6WIND S.A.
Present: CEO, KoreaQuest Co., Ltd.
E-mail: mkim@koreaquest.net