In the fast-paced world of cloud networking, performance and reliability are non-negotiable. Recently, we conducted a Proof of Concept (PoC) to evaluate the performance of dynamic routing with OVN-BGP-Agent and Fast Data Path (OVS-DPDK) in Red Hat OpenStack Services on OpenShift (RHOSO) version 18. Using the Trex traffic generator with BIRD as its routing plugin, we put this setup through its paces, assessing throughput, packet loss, stability, and system resource utilisation under various traffic conditions.
This article unpacks our findings, offering insights into both the strengths and current limitations of this powerful combination. Whether you’re a developer or a network engineer, you’ll find valuable takeaways here as we explore the performance landscape of cloud networking.
Overview of ovn-bgp-agent with OVS-DPDK
The OVN BGP Agent exposes VMs and containers through BGP on ML2/OVN. It uses the NB OVN BGP Driver, which adds a new exposing_method named ovn that relies on OVN routing instead of kernel routing. The agent then leverages the Free Range Routing (FRR) bgpd daemon to advertise the IP addresses or floating IPs (FIPs) associated with VMs, using a VRF. It also introduces a separate OVN cluster on each node that manages the virtual infrastructure needed between the OpenStack networking overlay and the physical network. Because routing occurs at the OVN/OVS level, this approach makes it possible to support Fast Data Path with OVS-DPDK.
Figure 1 depicts how ovn-bgp-agent exposes the VM's IP to the external network.
Figure 1: The OVN-BGP-AGENT flow diagram.
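Once a VM is exposed this way, the advertisement can be verified directly against FRR on the compute node. The following is a minimal sketch of such a check; the VM address is hypothetical, FRR's JSON field names can vary between releases, and if the agent advertises from a dedicated VRF the command needs a vrf argument.

```python
import json
import subprocess

# Hypothetical VM address on the provider network used later in this PoC
VM_PREFIX = "172.16.101.10/32"

def frr_bgp_routes():
    # Query the local FRR instance; "show ip bgp json" is standard vtysh,
    # but the exact JSON layout can differ between FRR releases.
    out = subprocess.run(["vtysh", "-c", "show ip bgp json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out).get("routes", {})

routes = frr_bgp_routes()
print(f"{VM_PREFIX} advertised: {VM_PREFIX in routes}")
```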
Test environment and topology
Before diving into the results, let’s set the stage with our test environment and network topology.
Hardware configuration:
- Compute Node: Dell PowerEdge R650
- CPU: Intel Xeon Platinum (74 non-HT cores with 2 NUMA nodes)
- Memory: 256 GB, configured with 200x 1GB hugepages
- NIC: Dual 100Gbps Mellanox ConnectX-6 ports
- Traffic Generator: Dell PowerEdge R650 (similar configuration)
- Network Switch: Dell 100GbE switch for high-speed connectivity
Software stack:
- Operating System: Red Hat Enterprise Linux 9.4
- Cloud Platform: RHOSO 18.0.4
- Networking:
- OVS 3.3.4-62.efidp with DPDK 23.11.2
- OVN-BGP-Agent 1.0.1
- FRR 8.5.3 for routing
- Trex STL v3.06 with BIRD 2.0.8 for traffic generation and routing
Network topology
The Compute Node hosted an OVS-DPDK bridge (br-ex) connected to a high-speed DPDK port (dpdk2, 100 Gbps) for physical connectivity and a VLAN tap (vlan177, tag 177) for logical segmentation. The Trex node was connected over a Layer 2 network and peered via BGP, generating bi-directional traffic to and from a virtual machine (VM) on the Compute Node.
IP configurations:
- Compute Node: 12.12.12.1/30 on vlan177
- Trex Node: 12.12.12.2/30
- Provider Network: 172.16.101.0/24 (flat network type)
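For reference, the bridge layout above could be recreated by hand with standard OVS commands along these lines. In RHOSO 18 this plumbing is created by the deployment itself, so treat this purely as an illustrative sketch; the PCI address is a placeholder.

```python
import subprocess

def ovs(*args):
    # Thin wrapper around ovs-vsctl
    subprocess.run(["ovs-vsctl", *args], check=True)

# External bridge on the userspace (netdev) datapath for OVS-DPDK
ovs("--may-exist", "add-br", "br-ex",
    "--", "set", "bridge", "br-ex", "datapath_type=netdev")

# 100 Gbps DPDK port; the PCI address below is a placeholder
ovs("--may-exist", "add-port", "br-ex", "dpdk2",
    "--", "set", "Interface", "dpdk2", "type=dpdk",
    "options:dpdk-devargs=0000:3b:00.1")

# VLAN 177 internal port carrying the BGP peering towards the Trex node
ovs("--may-exist", "add-port", "br-ex", "vlan177", "tag=177",
    "--", "set", "Interface", "vlan177", "type=internal")

# Peering address from the IP plan above
subprocess.run(["ip", "addr", "add", "12.12.12.1/30", "dev", "vlan177"],
               check=True)
```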
This setup isolated the BGP traffic, leveraged DPDK's fast packet processing, and used BGP dynamic routing to advertise VM routes externally, mimicking real-world cloud networking scenarios (Figure 2).
Figure 2: A diagram of the BGP network schema.
Baseline performance: Throughput and packet handling
Our baseline tests evaluated performance across five frame sizes—64B, 128B, 512B, 1024B, and 1500B—using bi-directional UDP traffic at line rate over 300-second intervals. The results showed distinct behaviours for small and large frames.
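To give a concrete picture of the methodology, bi-directional UDP streams can be defined with the Trex STL Python API roughly as follows. This is a simplified sketch rather than the exact PoC scripts: the port numbers, addresses, and padding convention are assumptions.

```python
from trex_stl_lib.api import *  # Trex stateless (STL) Python API


def udp_stream(frame_size, src, dst):
    # Build a UDP packet and pad it to the requested frame size
    # (4 bytes are left for the Ethernet FCS, as in the stock Trex examples).
    base = Ether() / IP(src=src, dst=dst) / UDP(sport=1025, dport=12)
    pad = max(0, frame_size - 4 - len(base)) * 'x'
    return STLStream(packet=STLPktBuilder(pkt=base / pad), mode=STLTXCont())


c = STLClient(server="127.0.0.1")
c.connect()
c.reset(ports=[0, 1])

# Bi-directional traffic: the addresses below are placeholders for the VM on
# the provider network and an external endpoint behind the Trex node.
c.add_streams(udp_stream(64, "48.0.0.1", "172.16.101.10"), ports=[0])
c.add_streams(udp_stream(64, "172.16.101.10", "48.0.0.1"), ports=[1])

# Baseline run: line rate for 300 seconds
c.start(ports=[0, 1], mult="100%", duration=300)
c.wait_on_traffic(ports=[0, 1])
print(c.get_stats())
c.disconnect()
```

Repeating the same definition for each of the five frame sizes reproduces the sweep described above.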
Small frames (64B-512B):
- Throughput: Saturated at ~5 million packets per second (Mpps) per direction:
- 4 Gbps for 64B frames
- 7 Gbps for 128B frames
- 16 Gbps for 512B frames
- Packet Loss: 0% across all runs, showcasing reliable delivery under saturation.
Large frames (1024B-1500B):
- Throughput: Achieved line-rate performance:
- 27-31 Gbps for 1024B and 1500B frames
- Packet rates: 2.97 Mpps for 1024B, 2.03 Mpps for 1500B
- Packet Loss: 0%, demonstrating robust high-bandwidth handling.
The results in Figures 3 and 4 underscore OVS-DPDK’s strength in maximising NIC bandwidth for bulk data transfers.
Figure 3: Throughput performance in Gbps.
Figure 4: Throughput performance in Mpps.
12-hour stability test: Endurance under pressure
To test long-term reliability, we ran a 12-hour test with sustained bi-directional UDP traffic at 4 Mpps per direction (8 Mpps total) using 64B frames, simulating a high packet-rate workload.
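Reusing the client and stream definitions from the baseline sketch earlier, the soak run only changes the rate and the duration; again, this is illustrative only.

```python
# Fixed 4 Mpps per direction for 12 hours (duration is in seconds)
c.start(ports=[0, 1], mult="4mpps", duration=12 * 3600)
c.wait_on_traffic(ports=[0, 1])
```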
Key metrics:
- Packet rate: Steady 8 Mpps (4 Mpps Rx, 4 Mpps Tx) with no degradation.
- Throughput: Stabilised at ~4 Gbps, consistent with small-frame limits.
- Packet loss: 0%, reinforcing reliability under prolonged stress.
- BGP stability: The BGP session between FRR (ASN 64999) and Trex-BIRD remained “Established” with no flaps.
The results in Figures 5 and 6 confirm the setup’s endurance and stability, vital for production environments.
Figure 5: This diagram shows the 12-hour throughput performance.
Figure 6: This diagram shows BGP connection statistics.
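A one-shot query like the sketch below is enough to confirm the session state shown in Figure 6; the field names follow FRR's JSON output and may differ slightly between releases.

```python
import json
import subprocess

# Check the FRR BGP summary and flag any peer that is not Established
out = subprocess.run(["vtysh", "-c", "show bgp summary json"],
                     capture_output=True, text=True, check=True).stdout
peers = json.loads(out).get("ipv4Unicast", {}).get("peers", {})
for peer, info in peers.items():
    state = info.get("state", "unknown")
    print(f"{peer}: {state}")
    if state != "Established":
        print(f"WARNING: session to {peer} is not Established")
```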
System metrics of the compute node
We monitored system-level metrics on the compute node to gauge resource utilisation and efficiency.
Datapath hits:
- PMD threads achieved a steady 4 Mpps with 64B frames (Figure 7).
- Rx queues on dpdk2 and vhu2fcb049b-8e averaged 3.78M hits/sec, aligning with the 4 Mpps target.
Tuning was critical: single Rx/Tx queues with 2048 descriptors on the DPDK ports, hugepages (200x 1GB on the compute node, 10x 1GB on the Trex VM), and CPU isolation together ensured optimal resource use (Figure 8).
Figure 7: This chart shows the OVS-DPDK PMD core datapath cycles.
Figure 8: This chart shows the OVS-DPDK PMD core cycles.
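The queue and descriptor settings, and the resulting PMD load, can be inspected with standard OVS tooling. The sketch below (wrapped in Python for convenience) shows the kind of checks involved, using the dpdk2 interface from the topology above.

```python
import subprocess


def run(cmd):
    # Run a shell command and return its stdout
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True).stdout


# Single Rx queue with 2048 Rx/Tx descriptors on the DPDK port
run("ovs-vsctl set Interface dpdk2 options:n_rxq=1 "
    "options:n_rxq_desc=2048 options:n_txq_desc=2048")

# PMD utilisation and per-queue Rx distribution while traffic is running
print(run("ovs-appctl dpif-netdev/pmd-stats-show"))
print(run("ovs-appctl dpif-netdev/pmd-rxq-show"))
```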
ovs-vswitchd flow performance (Figures 9 and 10):
- Consistent packet rate of ~8 Mpps (Tx+Rx) on DPDK ports over 12 hours.
- No unexpected spikes on FRR VLAN ports, indicating reliable sustained operation.
Figure 9: This chart shows flow throughput as reported by ovs-ofctl.
Figure 10: This chart shows the OVS packet flow.
These metrics reflect a well-tuned system, efficiently handling packet processing and resource allocation.
Key takeaways: Strengths and limitations
Our evaluation revealed notable strengths and areas for improvement.
Strengths:
- High throughput and efficiency: Excellent performance for large frames with zero packet loss.
- Long-term stability: Consistent 8 Mpps over 12 hours with stable BGP sessions.
- DPDK advantage: 3.5x improvement over kernel networking, with efficient resource use.
- Resource optimisation: Minimal wastage, thanks to precise tuning.
Limitations:
- Small packet bottlenecks: Capped at 5 Mpps for small frames (64B-512B) due to single Rx queue limitations, impacting latency-sensitive workloads.
- Architectural constraints: The lack of multi-provider network support and VLAN provider network capabilities restricts scalability.
- Manual overhead: Manual MAC bindings and OpenFlow tweaks add complexity.
- FDP maturity: BGP/BFD sync issues and inconsistent FRR tap performance suggest Fast Datapath isn’t fully production-ready.
Summary
The ML2/OVN with OVS-DPDK and OVN-BGP-Agent integration excels in large data transfers and long-term stability, but small packet handling and architectural limitations highlight areas for growth. With targeted enhancements, this solution could become a cornerstone of high-performance cloud networking in future RHOSO releases. Stay tuned for more updates as we refine this technology.
Learn more:
- Deploying a dynamic routing environment
- Deploying a network functions virtualization environment
- Welcome to the documentation of OVN BGP Agent
- BGP Floating IPs over L2 Segmented Networks
- OpenStack Docs: BGP dynamic routing
- OVN BGP Agent upstream documentation
- OVN BGP Agent: In-depth traffic flow inspection blogpost