PC conflicts

Cong Wang

Submitted

Abstract

1. Socket Locality based Flow Selection in MPTCP

A multi-socket machine commonly has one locally attached NIC per socket. Receiving network data on the NIC closest to the application leads to better end-to-end performance. Although static configuration can bind an application to the nearest NIC at launch, there is no system-level solution for dynamically migrating network traffic when the application is migrated to a different socket.
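The static, launch-time configuration mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' tooling: it assumes a Linux host where each socket corresponds to one NUMA node, and it reads the standard sysfs files `class/net/<iface>/device/numa_node` and `devices/system/node/node<N>/cpulist`. The function names and the `sysfs_root` parameter are hypothetical, introduced only so the lookup can be exercised against a fake tree.

```python
from pathlib import Path


def nic_numa_node(iface, sysfs_root="/sys"):
    """Return the NUMA node a NIC is attached to, or None if unknown."""
    path = Path(sysfs_root) / "class" / "net" / iface / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # virtual device, missing file, or unreadable value
    return node if node >= 0 else None  # the kernel reports -1 for "no node"


def parse_cpulist(text):
    """Parse a sysfs cpulist such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def local_cpus_for_nic(iface, sysfs_root="/sys"):
    """Return the CPUs on the NIC's NUMA node, or None if it cannot be found."""
    node = nic_numa_node(iface, sysfs_root)
    if node is None:
        return None
    cpulist = Path(sysfs_root) / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    try:
        return parse_cpulist(cpulist.read_text())
    except OSError:
        return None
```

At launch, the returned set could be passed to `os.sched_setaffinity(0, cpus)` to pin the process near the NIC. The pinning is fixed at that point, which is exactly the limitation described above: nothing re-steers the traffic if the application later moves to another socket.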

We present a new MPTCP subflow selection strategy that allows a sender to send packets on the NIC closest to the receiving application. The current MPTCP path selection algorithm only considers metrics between TCP layers; the extra latency introduced during application receive and send system calls is not reflected. Our goal is to study end-to-end metrics in the flow selection logic to achieve better application throughput and latency. An added benefit of using the local NIC is that it frees up overall system bandwidth and improves access latencies to memory and I/O.
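The idea of locality-aware selection can be illustrated with a small sketch. It is not the proposed kernel change: the `Subflow` fields, the `receiver_node` hint, and the way the sender would learn the receiver's current socket are all assumptions made for illustration, and smoothed RTT stands in for a transport-level metric used as the tie-breaker and fallback.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Subflow:
    name: str
    srtt_us: int          # smoothed RTT in microseconds (transport-level metric)
    peer_numa_node: int   # hypothetical: socket of the receiver NIC ending this subflow
    usable: bool = True   # hypothetical: subflow is established and can send now


def select_subflow(subflows: Sequence[Subflow], receiver_node: int) -> Optional[Subflow]:
    """Prefer subflows that land on the receiver's local NIC, else lowest RTT."""
    candidates = [s for s in subflows if s.usable]
    if not candidates:
        return None
    # Locality first: keep only subflows terminating on the application's socket.
    local = [s for s in candidates if s.peer_numa_node == receiver_node]
    # If no local subflow is usable, fall back to all usable subflows.
    pool = local or candidates
    return min(pool, key=lambda s: s.srtt_us)
```

With two subflows, one per receiver NIC, a receiver hint of node 1 picks the node-1 subflow even if the node-0 subflow has a slightly lower RTT. When the application migrates and the hint changes to node 0, the next call picks the other subflow without reconfiguration.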

We will also present performance numbers for the Redis memtier benchmark using our solution. Preliminary data suggests a 10% improvement in throughput and a 6% improvement in tail latencies. Cross-socket communication is reduced and traffic is distributed uniformly across multiple NICs. We will also highlight the behaviour of CFS as the load on the system increases. Ideally, to benefit from this strategy, the application should rarely be scheduled across sockets.

We are in the process of creating an RFC of code changes to receive community feedback.

2. A DPDK Implementation of MPTCP

We also implemented a user-space MPTCP protocol stack based on DPDK and applied it to storage and high-performance computing scenarios in data centers. The aim is to improve the reliability of network transmission and the utilization of network resources, and in particular to solve practical problems, such as switch black holes, that are difficult to handle with standard TCP. The protocol stack follows the RFC 8684 specification, interoperates with the kernel MPTCP implementation, and supports automatic fallback to standard TCP when MPTCP negotiation fails. These features ease the replacement and evolution of existing TCP applications.

The protocol stack consists mainly of two modules: sub-flow management and sub-flow selection. The sub-flow management module is responsible for creating and destroying sub-flows and for address notification. With the help of address notification and the NIC's flow bifurcation capability, it ensures that the traffic of multiple sub-flows can be processed in one DPDK PMD, which makes full use of multi-core processing while preserving the shared-nothing design of the DPDK PMD and avoiding locks. The sub-flow selection module chooses the sub-flow on which each data packet is sent. It supports multiple selection strategies to meet the performance requirements of different scenarios.

We also implemented a zero-copy interface that further improves network throughput and latency by avoiding the copy overhead between applications and the protocol stack.
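The handshake-time part of the fallback to standard TCP can be sketched as below. This is an illustration under stated assumptions, not the stack's actual code: it only checks whether the TCP options of a received SYN/ACK carry an MP_CAPABLE option (TCP option kind 30, subtype 0 in RFC 8684) and treats the connection as plain TCP otherwise. The function names are hypothetical, and the other fallback cases defined by RFC 8684 are not covered.

```python
TCPOPT_EOL = 0             # end of option list
TCPOPT_NOP = 1             # one-byte padding option
TCPOPT_MPTCP = 30          # TCP option kind assigned to MPTCP
MPTCP_SUB_MP_CAPABLE = 0x0  # MPTCP option subtype for MP_CAPABLE


def has_mp_capable(options: bytes) -> bool:
    """Walk raw TCP options and report whether MP_CAPABLE is present."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length, stop parsing
        if kind == TCPOPT_MPTCP and length >= 3:
            subtype = options[i + 2] >> 4  # upper 4 bits of the third byte
            if subtype == MPTCP_SUB_MP_CAPABLE:
                return True
        i += length
    return False


def negotiated_mode(synack_options: bytes) -> str:
    """Return 'mptcp' if the peer answered with MP_CAPABLE, else 'tcp'."""
    return "mptcp" if has_mp_capable(synack_options) else "tcp"
```

Because the decision is made per connection from the peer's reply, an application linked against such a stack can keep using the same connect path whether or not the peer, or a middlebox on the path, supports MPTCP.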

Authors (blind)

Zhiqiang You (ByteDance) <youzhiqiang@bytedance.com>

Satish Kumar (ByteDance) <satish.kumar@bytedance.com>

Wanchen Wang (ByteDance) <wangwanchen.0316@bytedance.com>

Punit Agrawal (ByteDance) <punit.agrawal@bytedance.com>

Submission Type
Talk
Submission Label
Nuts and Bolts
Estimated Length Of Time For Presentation (in minutes)
30
Attendance
Physically
