DPDK Bridges

Bridges must be specially configured to utilize DPDK-backed physical and virtual ports.

Quick Example

This example demonstrates how to add a bridge that will take advantage of DPDK:

$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev

This assumes Open vSwitch has been built with DPDK support. Refer to Open vSwitch with DPDK for more information.

Extended & Custom Statistics

The DPDK Extended Statistics API allows PMDs to expose a unique set of statistics. The Extended Statistics are implemented and supported only for DPDK physical and vHost ports. Custom statistics are a dynamic set of counters which can vary depending on the driver. Those statistics are implemented for DPDK physical ports and contain all “dropped”, “error” and “management” counters from XSTATS. A list of all XSTATS counters can be found here.

Note

vHost ports only support RX packet size-based counters. TX packet size counters are not available.

To enable statistics, you have to enable OpenFlow 1.4 support for OVS. To configure a bridge, br0, to support OpenFlow version 1.4, run:

$ ovs-vsctl set bridge br0 datapath_type=netdev \
  protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14

Once configured, check the OVSDB protocols column in the bridge table to ensure OpenFlow 1.4 support is enabled:

$ ovsdb-client dump Bridge protocols

You can also query the port statistics by explicitly specifying the -O OpenFlow14 option:

$ ovs-ofctl -O OpenFlow14 dump-ports br0

There are also custom statistics that OVS accumulates itself; these stats have an ovs_ prefix. These custom stats are shown along with other stats using the following command:

$ ovs-vsctl get Interface <iface> statistics
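
For example, an individual counter can be read directly from the statistics map by key (the interface name dpdk0 and the rx_packets counter below are illustrative):

$ ovs-vsctl get Interface dpdk0 statistics:rx_packets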

EMC Insertion Probability

By default, 1 in every 100 flows is inserted into the Exact Match Cache (EMC). It is possible to change this insertion probability by setting the emc-insert-inv-prob option:

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N

where:

N
A positive integer representing the inverse probability of insertion, i.e. on average 1 in every N packets with a unique flow will generate an EMC insertion.

If N is set to 1, an insertion will be performed for every flow. If set to 0, no insertions will be performed and the EMC will effectively be disabled.
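
For example, to set N to 20, which corresponds to a 5% insertion probability, run:

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=20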

With the default N of 100, a higher number of megaflow hits will occur initially, as can be observed with the PMD stats:

$ ovs-appctl dpif-netdev/pmd-stats-show

For certain traffic profiles with many parallel flows, it's recommended to set N to 0 to achieve higher forwarding performance.

It is also possible to enable/disable EMC on a per-port basis using:

$ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}

Note

This could be useful for cases where a different number of flows is expected on different ports. For example, if one of the VMs encapsulates traffic using additional headers, it will receive a large number of flows but only a few flows will come out of this VM. In this scenario it's much faster to use EMC instead of the classifier for traffic from the VM, but it's better to disable EMC for the traffic which flows to the VM, as shown in the example below.
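
As a sketch of that scenario (the port names vhost-user0 and dpdk0 are illustrative), EMC can be left enabled on the port receiving traffic from the VM and disabled on the physical port receiving traffic destined to the VM:

$ ovs-vsctl set interface vhost-user0 other_config:emc-enable=true
$ ovs-vsctl set interface dpdk0 other_config:emc-enable=false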

For more information on the EMC refer to Open vSwitch with DPDK.

SMC cache

The SMC cache, or signature match cache, is a cache level after the EMC. The difference between SMC and EMC is that SMC only stores a signature of a flow, which makes it much more memory efficient. With the same memory space, the EMC can store 8k flows while the SMC can store 1M flows. When the traffic flow count is much larger than the EMC size, it is generally beneficial to turn off the EMC and turn on the SMC. It is currently turned off by default.

To turn on SMC:

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
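
Because it is often beneficial to turn off the EMC when the SMC is handling a large flow count, the two settings can be combined in a single command (a sketch built from the options described above):

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true \
    other_config:emc-insert-inv-prob=0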

Datapath Classifier Performance

The datapath classifier (dpcls) performs wildcard rule matching, a compute intensive process of matching a packet miniflow to a rule miniflow. The code that does this compute work impacts datapath performance, and optimizing it can provide higher switching performance.

Modern CPUs provide extensive SIMD instructions which can be used to get higher performance. The CPU OVS is being deployed on must be capable of running these SIMD instructions in order to take advantage of the performance benefits. In OVS v2.14, runtime CPU detection was introduced to identify whether these CPU ISA additions are available and to allow the user to enable them.

OVS provides multiple implementations of dpcls. The following command lists the implementations available in a running instance:

$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
Available lookup functions (priority : name)
        0 : autovalidator
        1 : generic
        0 : avx512_gather

To set the priority of a lookup function, run the prio-set command

$ ovs-appctl dpif-netdev/subtable-lookup-prio-set avx512_gather 5
Lookup priority change affected 1 dpcls ports and 1 subtables.

The highest priority lookup function is used for classification, and the output above indicates that one subtable of one DPCLS port has changed its lookup function as a result of the command. To verify the prioritization, re-run the get command and note the updated priority of the avx512_gather function:

$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
Available lookup functions (priority : name)
        0 : autovalidator
        1 : generic
        5 : avx512_gather

If two lookup functions have the same priority, the first one in the list is chosen and the second occurrence of that priority is not used. Put in logical terms, a lookup function is chosen for a subtable only if its priority is greater than that of the previous best candidate.
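
The same command can be used to restore the preference for another implementation, for example to make the generic implementation the highest priority again (the priority value below is arbitrary, only its relative ordering matters):

$ ovs-appctl dpif-netdev/subtable-lookup-prio-set generic 8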

CPU ISA Testing and Validation

As multiple versions of DPCLS can co-exist, each with different CPU ISA optimizations, it is important to validate that they all give the exact same results. To easily test all DPCLS implementations, an autovalidator implementation of the DPCLS exists. This implementation runs all other available DPCLS implementations, and verifies that the results are identical.

Running the OVS unit tests with the autovalidator enabled ensures all implementations provide the same results. Note that the performance of the autovalidator is lower than all other implementations, as it tests the scalar implementation against itself, and against all other enabled DPCLS implementations.

To adjust the DPCLS autovalidator priority, use this command

$ ovs-appctl dpif-netdev/subtable-lookup-prio-set autovalidator 7

Running Unit Tests with Autovalidator

To run the OVS unit test suite with the DPCLS autovalidator as the default implementation, it is required to recompile OVS. During the recompilation, the default priority of the autovalidator implementation is set to the maximum priority, ensuring every test will be run with every lookup implementation

$ ./configure --enable-autovalidator

Compile OVS in debug mode to have ovs_assert statements error out if there is a mismatch in the DPCLS lookup implementation.
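
A minimal sketch of such a build, assuming a standard autotools debug build in which NDEBUG is left undefined (the default) so that ovs_assert remains active; the CFLAGS shown simply add debug symbols and disable optimization:

$ ./configure --enable-autovalidator CFLAGS="-g -O0"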

Datapath Interface Performance

The datapath interface (DPIF), or dp_netdev_input(), is responsible for taking packets through the major components of the userspace datapath, such as the miniflow extract, EMC, SMC and DPCLS lookups, as well as many of the performance statistics associated with the datapath.

Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to improve performance.

OVS provides multiple implementations of the DPIF. The available implementations can be listed with the following command

$ ovs-appctl dpif-netdev/dpif-impl-get
Available DPIF implementations:
  dpif_scalar (pmds: none)
  dpif_avx512 (pmds: 1,2,6,7)

By default, dpif_scalar is used. The DPIF implementation can be selected by name

$ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
DPIF implementation set to dpif_avx512.

$ ovs-appctl dpif-netdev/dpif-impl-set dpif_scalar
DPIF implementation set to dpif_scalar.

Running Unit Tests with AVX512 DPIF

Since the AVX512 DPIF is disabled by default, a compile time option is available in order to test it with the OVS unit test suite. When building with a CPU that supports AVX512, use the following configure option

$ ./configure --enable-dpif-default-avx512

The following line should be seen in the configure output when the above option is used

checking whether DPIF AVX512 is default implementation... yes

Miniflow Extract

Miniflow extract (MFEX) parses the raw packet and extracts the important header information into a compressed miniflow. The miniflow is composed of bits and blocks: the bits signify which blocks are set or have values, whereas the blocks hold the metadata, IP, UDP, VLAN, etc. These values are used by the datapath for switching decisions later. The optimized miniflow extract implementations are traffic-specific in order to speed up the lookup, whereas the scalar implementation works for all traffic patterns.

Most modern CPUs have SIMD capabilities. These SIMD instructions are able to process a vector rather than act on one variable. OVS provides multiple implementations of miniflow extract. This allows the user to take advantage of SIMD instructions like AVX512 to gain additional performance.

A list of implementations can be obtained by the following command. The command also shows whether the CPU supports each implementation

$ ovs-appctl dpif-netdev/miniflow-parser-get
    Available Optimized Miniflow Extracts:
        autovalidator (available: True, pmds: none)
        scalar (available: True, pmds: 1,15)
        study (available: True, pmds: none)

An implementation can be selected manually by the following command

$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
                                             [study_cnt]

The above command has two optional parameters: core_id (set with -pmd) and study_cnt. The core_id parameter applies the chosen miniflow extract function only to the PMD thread running on that core. The study_cnt parameter, which is specific to study and ignored by other implementations, sets how many packets are studied before the best implementation is chosen.

The user can also select the study implementation, which studies the traffic for a given number of packets by applying all available implementations of miniflow extract and then chooses the one with the most optimal result for that traffic pattern. The user can optionally provide a packet count [study_cnt] parameter, which is the minimum number of packets that OVS must study before choosing an optimal implementation. If no packet count is provided, the default value of 128 is used. Also, as there is no synchronization point between threads, one PMD thread might still be running a previous study round and can decide based on that earlier data.

The study packet count is a global value, and parallel study executions with differing packet counts will use the most recent count value provided by the user.

Study can be selected with packet count by the following command

$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024

Study can be selected with packet count and explicit PMD selection by the following command

$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024

In the above command the first parameter is the core ID of the PMD thread; it can be used to explicitly set the miniflow extraction function on different PMD threads.

Scalar can be selected on core 3 by the following command. Note that a study count should not be provided for any implementation other than study:

$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar

Miniflow Extract Validation

As multiple versions of miniflow extract can co-exist, each with different CPU ISA optimizations, it is important to validate that they all give the exact same results. To easily test all miniflow implementations, an autovalidator implementation of the miniflow exists. This implementation runs all other available miniflow extract implementations, and verifies that the results are identical.

Running the OVS unit tests with the autovalidator enabled ensures all implementations provide the same results.

To set the Miniflow autovalidator, use this command

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

A compile time option is available in order to test it with the OVS unit test suite. Use the following configure option

$ ./configure --enable-mfex-default-autovalidator

Unit Test Miniflow Extract

A unit test can also be used to exercise the workflow mentioned above by running the "OVS-DPDK - MFEX Autovalidator" test case from tests/system-dpdk.at:

$ make check-dpdk TESTSUITEFLAGS='-k MFEX'

The unit test uses multiple traffic types to test the correctness of the implementations.

The MFEX commands can also be tested for negative and positive cases to verify that the MFEX set command does not accept incorrect parameters. A user can directly run the "OVS-DPDK - MFEX Configuration" test case in tests/system-dpdk.at:

$ make check-dpdk TESTSUITEFLAGS='-k MFEX'

Running Fuzzy Tests with Autovalidator

Fuzzy tests can also be performed on miniflow extract with the help of the autovalidator and Scapy. The steps below describe how to reproduce the setup, with the IP header being fuzzed to generate packets.

Scapy is used to create fuzzed IP packets and save them into a PCAP file:

from scapy.all import fuzz, wrpcap, Ether, IP, TCP
pkt = fuzz(Ether()/IP()/TCP())
wrpcap("fuzzy.pcap", pkt)  # write the fuzzed packet to the PCAP file used below

Set the miniflow extract to autovalidator using

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator

OVS is configured to receive the generated packets

$ ovs-vsctl add-port br0 pcap0 -- \
    set Interface pcap0 type=dpdk \
    options:dpdk-devargs="net_pcap0,rx_pcap=fuzzy.pcap"

With this workflow, the autovalidator will ensure that all MFEX implementations are classifying each packet in exactly the same way. If an optimized MFEX implementation causes a different miniflow to be generated, the autovalidator has ovs_assert and logging statements that will inform about the issue.

Unit Fuzzy Test with Autovalidator

A unit test can also be used to exercise the workflow mentioned above by running the "OVS-DPDK - MFEX Autovalidator Fuzzy" test case in tests/system-dpdk.at:

$ make check-dpdk TESTSUITEFLAGS='-k MFEX'