Scaling OVSDB Access With Relay

Open vSwitch 2.16 introduced support for OVSDB Relay mode with the goal to increase database scalability for a big deployments. Mainly, OVN (Open Virtual Network) Southbound Database deployments. This document describes the main concept and provides the configuration examples.

What is OVSDB Relay?

Relay is a database service model in which one ovsdb-server (relay) connects to another standalone or clustered database server (relay source) and maintains in-memory copy of its data, receiving all the updates via this OVSDB connection. Relay server handles all the read-only requests (monitors and transactions) on its own and forwards all the transactions that requires database modifications to the relay source.

Why is this needed?

Some OVN deployment could have hundreds or even thousands of nodes. On each of these nodes there is an ovn-controller, which is connected to the OVN_Southbound database that is served by a standalone or clustered OVSDB. Standalone database is handled by a single ovsdb-server process and clustered could consist of 3 to 5 ovsdb-server processes. For the clustered database, higher number of servers may significantly increase transaction latency due to necessity for these servers to reach consensus. So, in the end limited number of ovsdb-server processes serves ever growing number of clients and this leads to performance issues.

Read-only access could be scaled up with OVSDB replication on top of active-backup service model, but ovn-controller is a read-mostly client, not a read-only, i.e. it needs to execute write transactions from time to time. Here relay service model comes into play.

2-Tier Deployment

Solution for the scaling issue could look like a 2-tier deployment, where a set of relay servers is connected to the main database cluster (OVN_Southbound) and clients (ovn-conrtoller) connected to these relay servers:

+--------------------+   +----+ ovsdb-relay-1 +--+---+ client-1
|                    |   |                       |
|    Clustered       |   |                       +---+ client-2
|     Database       |   |                        ...
|                    |   |                       +---+ client-N
|        |   |
|  ovsdb-server-2    |   |
|   +        +       |   +----+ ovsdb-relay-2 +--+---+ client-N+1
|   |        |       |   |                       |
|   |        +       +---+                       +---+ client-N+2
|   |     |   |                        ...
|   | ovsdb-server-1 |   |                       +---+ client-2N
|   |        +       |   |
|   |        |       |   |
|   +        +       |   +      ... ... ... ... ...
|  ovsdb-server-3    |   |
|        |   |                       +---+ client-KN-1
|                    |   |       172.16.0.K      |
+--------------------+   +----+ ovsdb-relay-K +--+---+ client-KN

In practice, the picture might look a bit more complex, because all relay servers might connect to any member of a main cluster and clients might connect to any relay server of their choice.

Assuming that servers of a main cluster started like this:

$ ovsdb-server --remote=ptcp:6642: ovn-sb-1.db

The same for other two servers. In this case relay servers could be started like this:

$ REMOTES=tcp:,tcp:,tcp:
$ ovsdb-server --remote=ptcp:6642: relay:OVN_Southbound:$REMOTES
$ ...
$ ovsdb-server --remote=ptcp:6642:172.16.0.K relay:OVN_Southbound:$REMOTES

Every relay server could connect to any of the cluster members of their choice, fairness of load distribution is achieved by shuffling remotes.

For the actual clients, they could be configured to connect to any of the relay servers. For ovn-controllers the configuration could look like this:

$ REMOTES=tcp:,...,tcp:172.16.0.K:6642
$ ovs-vsctl set Open_vSwitch . external-ids:ovn-remote=$REMOTES

Setup like this allows the system to serve K * N clients while having only K actual connections on the main clustered database keeping it in a stable state.

It’s also possible to create multi-tier deployments by connecting one set of relay servers to another (smaller) set of relay servers, or even create tree-like structures with the cost of increased latency for write transactions, because they will be forwarded multiple times.