User Statically-Defined Tracing (USDT) probes

Sometimes it’s desired to troubleshoot one of OVS’s components in the field. One of the techniques used is to add dynamic tracepoints, for example using perf. However, the desired dynamic tracepoint and/or the desired variable, might not be available due to compiler optimizations.

In this case, a well-thought-off, static tracepoint could be permanently added, so it’s always available. For OVS we use the DTrace probe macro’s, which have little to no overhead when disabled. Various tools exist to enable them. See some examples below.

Compiling with USDT probes enabled

Since USDT probes are compiled out by default, a compile-time option is available to include them. To add the probes to the generated code, use the following configure option

$ ./configure --enable-usdt-probes

The following line should be seen in the configure output when the above option is used

checking whether USDT probes are enabled... yes

As USDT probes internally use the DTRACE_PROBExx macros, which are part of the SystemTap framework, you need to install the appropriate package for your Linux distribution. For example, on Fedora, you need to install the systemtap-sdt-devel package.

Listing available probes

There are various ways to display USDT probes available in a userspace application. Here we show three examples. All assuming ovs-vswitchd is in the search path with USDT probes enabled:

You can use the perf tool as follows

$ perf buildid-cache --add $(which ovs-vswitchd)
$ perf list | grep sdt_
  sdt_main:poll_block                                [SDT event]
  sdt_main:run_start                                 [SDT event]

You can use the bpftrace tool

# bpftrace -l "usdt:$(which ovs-vswitchd):*"
usdt:/usr/sbin/ovs-vswitchd:main:poll_block
usdt:/usr/sbin/ovs-vswitchd:main:run_start

Note

If you execute this on a running process, bpftrace -lp $(pidof ovs-vswitchd) "usdt:*" , it will list all USDT events, i.e., also the ones available in the used shared libraries.

Finally, you can use the tplist tool which is part of the bcc framework

$ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd)
b'main':b'poll_block' [sema 0x0]
  location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc
b'main':b'run_start' [sema 0x0]
  location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6

Using probes

We will use the OVS sandbox environment in combination with the probes shown above to try out some of the available trace tools. To start up the virtual environment use the make sandbox command. In addition we have to create a bridge to kick of the main bridge loop

$ ovs-vsctl add-br test_bridge
$ ovs-vsctl show
055acdca-2f0c-4f6e-b542-f4b6d2c44e08
    Bridge test_bridge
        Port test_bridge
            Interface test_bridge
                type: internal

perf

Perf is using Linux uprobe based event tracing to for capturing the events. To enable the main:* probes as displayed above and take an actual trace, you need to execute the following sequence of perf commands

# perf buildid-cache --add $(which ovs-vswitchd)

# perf list | grep sdt_
  sdt_main:poll_block                                [SDT event]
  sdt_main:run_start                                 [SDT event]

# perf probe --add=sdt_main:poll_block --add=sdt_main:run_start
Added new events:
  sdt_main:poll_block  (on %poll_block in /usr/sbin/ovs-vswitchd)
  sdt_main:run_start   (on %run_start in /usr/sbin/ovs-vswitchd)

You can now use it in all perf tools, such as:

  perf record -e sdt_main:run_start -aR sleep 1

# perf record -e sdt_main:run_start -e sdt_main:poll_block \
    -p $(pidof ovs-vswitchd) sleep 30
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ]

# perf script
    ovs-vswitchd  8576 [011] 21031.340433:  sdt_main:run_start: (407ff6)
    ovs-vswitchd  8576 [011] 21031.340516: sdt_main:poll_block: (407fdc)
    ovs-vswitchd  8576 [011] 21031.841726:  sdt_main:run_start: (407ff6)
    ovs-vswitchd  8576 [011] 21031.842088: sdt_main:poll_block: (407fdc)
...

Note that the above examples works with the sandbox environment, so make sure you execute the above command while in the sandbox shell!

There are a lot more options available with perf, for example, the --call-graph dwarf option, which would give you a call graph in the perf script output. See the perf documentation for more information.

One other interesting feature is that the perf data can be converted for use by the trace visualizer Trace Compass. This can be done using the --all --to-ctf option to the perf data convert tool.

bpftrace

bpftrace is a high-level tracing language based on eBPF, which can be used to script USDT probes. Here we will show a simple one-liner to display the USDT probes being hit. However, the script section below reference some more advanced bpftrace scripts.

This is a simple bpftrace one-liner to show all main:* USDT probes

# bpftrace -p $(pidof ovs-vswitchd) -e \
    'usdt::main:* { printf("%s %u [%u] %u %s\n",
      comm, pid, cpu, elapsed, probe); }'
Attaching 2 probes...
ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start
ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block
ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start
ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block

bcc

The BPF Compiler Collection (BCC) is a set of tools and scripts that also use eBPF for tracing. The example below uses the trace tool to show the events while they are being generated

# /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \
    'u::main:run_start' 'u::main:poll_block'
TIME     PID     TID     COMM            FUNC
15:49:06 8576    8576    ovs-vswitchd    main:run_start
15:49:06 8576    8576    ovs-vswitchd    main:poll_block
15:49:06 8576    8576    ovs-vswitchd    main:run_start
15:49:06 8576    8576    ovs-vswitchd    main:poll_block
^C

Scripts

To not have to re-invent the wheel when trying to debug complex OVS issues, a set of scripts are provided in the source repository. They are located in the utilities/usdt-scripts/ directory, and each script contains detailed information on how they should be used, and what information they provide.

Available probes

The next sections describes all the available probes, their use case, and if used in any script, which one. Any new probes being added to OVS should get their own section. See the below “Adding your own probes” section for the used naming convention.

Available probes in ovs_vswitchd:

  • dpif_netlink_operate__:op_flow_del

  • dpif_netlink_operate__:op_flow_execute

  • dpif_netlink_operate__:op_flow_get

  • dpif_netlink_operate__:op_flow_put

  • dpif_recv:recv_upcall

  • main:poll_block

  • main:run_start

  • revalidate_ukey__:entry

  • revalidate_ukey__:exit

  • udpif_revalidator:start_dump

  • udpif_revalidator:sweep_done

probe dpif_recv:recv_upcall

Description:

This probe gets triggered when the datapath independent layer gets notified that a packet needs to be processed by userspace. This allows the probe to intercept all packets sent by the kernel to ovs-vswitchd. The upcall_monitor.py script uses this probe to display and capture all packets sent to ovs-vswitchd.

Arguments:

  • arg0: (struct dpif *)->full_name

  • arg1: (struct dpif_upcall *)->type

  • arg2: dp_packet_data((struct dpif_upcall *)->packet)

  • arg3: dp_packet_size((struct dpif_upcall *)->packet)

  • arg4: (struct dpif_upcall *)->key

  • arg5: (struct dpif_upcall *)->key_len

Script references:

  • utilities/usdt-scripts/upcall_cost.py

  • utilities/usdt-scripts/upcall_monitor.py

probe main:run_start

Description:

The ovs-vswitchd’s main process contains a loop that runs every time some work needs to be done. This probe gets triggered every time the loop starts from the beginning. See also the main:poll_block probe below.

Arguments:

None

Script references:

  • utilities/usdt-scripts/bridge_loop.bt

probe main:poll_block

Description:

The ovs-vswitchd’s main process contains a loop that runs every time some work needs to be done. This probe gets triggered every time the loop is done, and it’s about to wait for being re-started by a poll_block() call returning. See also the main:run_start probe above.

Arguments:

None

Script references:

  • utilities/usdt-scripts/bridge_loop.bt

revalidate_ukey__:entry

Description:

This probe gets triggered on entry of the revalidate_ukey__() function.

Arguments:

  • arg0: (struct udpif *) udpif

  • arg1: (struct udpif_key *) ukey

  • arg2: (uint16_t) tcp_flags

  • arg3: (struct ofpbuf *) odp_actions

  • arg4: (struct recirc_refs *) recircs

  • arg5: (struct xlate_cache *) xcache

Script references:

  • utilities/usdt-scripts/reval_monitor.py

revalidate_ukey__:exit

Description:

This probe gets triggered right before the revalidate_ukey__() function exits.

Arguments:

  • arg0: (struct udpif *) udpif

  • arg1: (struct udpif_key *) ukey

  • arg2: (enum reval_result) result

Script references:

None

udpif_revalidator:start_dump

Description:

The ovs-vswitchd’s revalidator process contains a loop that runs every time revalidation work is needed. This probe gets triggered every time the dump phase has started.

Arguments:

  • arg0: (struct udpif *) udpif

  • arg1: (size_t) n_flows

Script references:

  • utilities/usdt-scripts/reval_monitor.py

udpif_revalidator:sweep_done

Description:

The ovs-vswitchd’s revalidator process contains a loop that runs every time revalidation work is needed. This probe gets triggered every time the sweep phase was completed.

Arguments:

  • arg0: (struct udpif *) udpif

  • arg1: (size_t) n_flows

  • arg2: (unsigned) MIN(ofproto_max_idle, ofproto_max_revalidator)

Script references:

  • utilities/usdt-scripts/reval_monitor.py

Adding your own probes

Adding your own probes is as simple as adding the OVS_USDT_PROBE() macro to the code. It’s similar to the DTRACE_PROBExx macro’s with the difference that it does automatically determine the number of optional arguments.

The macro requires at least two arguments. The first one being the provider, and the second one being the name. To keep some consistency with the probe naming, please use the following convention. The provider should be the function name, and the name should be the name of the tracepoint. If you do function entry and exit like probes, please use entry and exit.

If, for some reason, you do not like to use the function name as a provider, please prefix it with __, so we know it’s not a function name.

The remaining parameters, up to 10, can be variables, pointers, etc., that might be of interest to capture at this point in the code. Note that the provided variables can cause the compiler to be less effective in optimizing.