Getting started with C++ futures





We are currently writing an async framework for managing remote network devices over SSH.
This is part of the Magma project led by Facebook. By the end of 2019, we at FRINX had contributed the CLI stack, which provides functions on top of the basic SSH client such as keepalive, reconnect, and caching. This post describes how our use of the async functionality provided by the Folly library evolved to meet our scalability requirements.

Architecture

The following text describes the high-level architecture, accompanied by pseudo code.
We started with a simple abstract class describing the CLI client:

class CLI {
public:
    virtual ~CLI() = default;
    virtual string executeRead(ReadCommand c) = 0;
    virtual string executeWrite(WriteCommand c) = 0;
};

For the purpose of this text it is not important how those two methods differ. The basic idea is to create one CLI object per device, run one or more commands, and collect the output. This design allows us to nest layers. We define several classes, each with a single responsibility:

      KeepAliveCli - sends an empty command periodically; its execute* methods just call the inner CLI
      ReconnectingCli - both execute* methods call the inner CLI, but if an exception indicating a connection error is returned, a reconnect is scheduled
      Etc.
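
The nesting can be sketched with the blocking interface and the standard library alone. All names below (`Cli`, `FakeSshCli`, `ConnectionError`) are hypothetical, and the reconnect happens inline followed by a single retry, which is a simplification: the layer described above schedules the reconnect asynchronously.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception type signalling a lost connection.
struct ConnectionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class Cli {
public:
    virtual ~Cli() = default;
    virtual std::string executeRead(const std::string& cmd) = 0;
};

// Stand-in for the innermost SSH-backed layer: fails until reconnect() is called.
class FakeSshCli : public Cli {
public:
    std::string executeRead(const std::string& cmd) override {
        if (!connected_) throw ConnectionError("connection lost");
        return "output of " + cmd;
    }
    void reconnect() { connected_ = true; }

private:
    bool connected_ = false;
};

// Single responsibility: delegate to the inner layer and react to
// connection errors. Assumption: reconnect inline, then retry once.
class ReconnectingCli : public Cli {
public:
    explicit ReconnectingCli(std::shared_ptr<FakeSshCli> inner)
        : inner_(std::move(inner)) {}

    std::string executeRead(const std::string& cmd) override {
        try {
            return inner_->executeRead(cmd);
        } catch (const ConnectionError&) {
            inner_->reconnect();
            ++reconnects_;
            return inner_->executeRead(cmd);
        }
    }
    int reconnects() const { return reconnects_; }

private:
    std::shared_ptr<FakeSshCli> inner_;
    int reconnects_ = 0;
};
```

A caller only ever sees the outermost `Cli`; further layers such as a keepalive wrapper would be stacked the same way.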

We can see that even though the layers are separated, they may require some work to be done asynchronously. We need an Executor where those tasks can run. We might need separate executors for each layer so that resource starvation on one layer can be isolated in the testing environment. In production we expect most CLI layers to share a single executor with a thread count equal to the number of CPU cores.

Unblocking callers

The first version clearly blocks callers, so we came up with a second version:

class CLI {
public:
    virtual ~CLI() = default;
    virtual Future<string> executeRead(ReadCommand c) = 0;
    virtual Future<string> executeWrite(WriteCommand c) = 0;
};

This gives us a thread-safe interface that many callers can use simultaneously. Returning a Future in folly is easy; just pass a lambda to an executor:

auto future = via(getCPUExecutor())
    .thenValue([params](auto) {
        // do some computation
    });

Sometimes we need to create futures without running them immediately. This is possible using the Promise class. A Promise holds a value that will be computed later, whereas a Future is a read-only view into its Promise.

shared_ptr<Promise<string>> promise = make_shared<Promise<string>>();
return promise->getFuture();
// later, fulfill the promise asynchronously:
promise->setValue(result);
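
The same split between the writing and the reading side exists in the standard library, which makes for a self-contained sketch. Here `std::promise` and `std::future` stand in for the folly types, and `DeferredRead` and `isReady` are hypothetical names used only for illustration.

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical holder: hands out a future now and is completed later.
class DeferredRead {
public:
    DeferredRead() : promise_(std::make_shared<std::promise<std::string>>()) {}

    // Read-only view for the caller; may be requested only once.
    std::future<std::string> getFuture() { return promise_->get_future(); }

    // Called later, possibly from another thread, once the output is known.
    void complete(std::string result) { promise_->set_value(std::move(result)); }

private:
    std::shared_ptr<std::promise<std::string>> promise_;
};

// True once the value has been set; a zero timeout means this never blocks.
bool isReady(const std::future<std::string>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}
```

The caller can hold the future for as long as it likes; nothing is computed until some other part of the program calls `complete()`.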

Future chaining

To get the actual result of the computation one can call the `get()` method on the Future, but this call blocks the caller until the value is available. As reported in other environments, it is important not to mix async and blocking calls. Luckily, folly Futures allow chaining more work that will be executed when the target Future finishes:

auto future = via(getCPUExecutor())
    .thenValue([](auto) {
        /* 1. */
        MLOG(MDEBUG) << "setReconnecting: sleeping";
    })
    .delayed(1s, getTimekeeper()) /* 2. */
    .thenValue(
        [](auto) -> Future<string> {
            return doSomething();
        }) /* 3. */
    .thenValue([](string value) {
        MLOG(MDEBUG) << "doSomething finished with: " << value; /* 4. */
    });

The first continuation only writes a log message. After that, a non-blocking delay is applied, and then the third continuation runs. Note that continuations may return futures as well as plain values: the return type of `doSomething()` is a Future, so folly runs the last continuation only after the string value is available.
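
To make the ordering of chained steps concrete without depending on folly, here is a rough standard-library sketch. The `then` helper is hypothetical and much weaker than folly's `thenValue`: it uses `std::launch::deferred`, so the whole chain runs lazily on the thread that finally calls `get()`, and it does not flatten nested futures.

```cpp
#include <future>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper, not a folly API: runs `func` on the result of `f`.
// The work is deferred until someone calls get() on the returned future.
template <typename T, typename F>
std::future<std::invoke_result_t<F, T>> then(std::future<T> f, F func) {
    return std::async(std::launch::deferred,
                      [f = std::move(f), func = std::move(func)]() mutable {
                          return func(f.get());
                      });
}

// Builds a two-step chain and returns the order in which the steps ran.
std::vector<std::string> runChain() {
    std::vector<std::string> log;
    std::promise<int> input;
    auto chained = then(
        then(input.get_future(),
             [&log](int v) { log.push_back("first"); return v + 1; }),
        [&log](int v) { log.push_back("second"); return std::to_string(v); });
    input.set_value(41);                   // no step has run yet
    std::string result = chained.get();    // runs "first", then "second"
    log.push_back("result " + result);
    return log;
}
```

Each step receives the value produced by the previous one, which is the property the folly chain above relies on.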

Queueing CLI commands

Moving from one blocking caller to many async callers has brought one major challenge: we need a queue of requests so that while a command is executing or we are waiting for its output, no other command is sent to the device.
Executors have their own queues, and there is even a SerialExecutor that guarantees non-concurrent execution of tasks. However, our task only calls the inner CLI’s execute methods, obtains the inner Futures, and possibly chains lambdas for post-processing. SerialExecutor would therefore only wait until the inner Future is obtained, not until it completes. Calling `get()` on the inner Future would solve the issue, at the expense of blocking the current thread.
This is why we created QueuedCli, yet another layer, which combines the artificial Promise/Future model described above with an MPSC queue that is both safe and efficient. When an `execute*` method is called, a new Promise/Future pair is created and put into the queue. QueuedCli contains its own SerialExecutor where the queue is processed: we wait until the inner Future finishes asynchronously, then set the value of the artificial Promise, and then process the head of the queue again. Check out the QueuedCli source code for details.
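
The following is a minimal sketch of the queueing idea, not the actual QueuedCli. It assumes a callback-based inner layer (`AsyncExecute`, a hypothetical type) and swaps the MPSC queue and SerialExecutor for a mutex-protected `std::deque`, with `std::promise` standing in for the folly Promise.

```cpp
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical inner layer: starts a command and calls `done` with the
// output once the device has answered.
using Callback = std::function<void(std::string)>;
using AsyncExecute = std::function<void(const std::string&, Callback)>;

class QueuedCli {
    struct Entry {
        std::string cmd;
        std::shared_ptr<std::promise<std::string>> promise;
    };

public:
    explicit QueuedCli(AsyncExecute inner) : inner_(std::move(inner)) {}

    // Enqueues the command and returns the future half of an artificial pair.
    std::future<std::string> execute(std::string cmd) {
        Entry entry{std::move(cmd), std::make_shared<std::promise<std::string>>()};
        std::future<std::string> future = entry.promise->get_future();
        bool startNow = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(entry));
            startNow = !busy_;
            busy_ = true;
        }
        if (startNow) processNext();
        return future;
    }

private:
    // Sends the head of the queue. The next entry is sent only from the
    // completion callback, so at most one command is in flight.
    void processNext() {
        Entry entry;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (queue_.empty()) { busy_ = false; return; }
            entry = std::move(queue_.front());
            queue_.pop_front();
        }
        // Capturing `this` assumes the QueuedCli outlives pending callbacks.
        inner_(entry.cmd, [this, promise = entry.promise](std::string output) {
            promise->set_value(std::move(output));
            processNext();
        });
    }

    AsyncExecute inner_;
    std::mutex mutex_;
    std::deque<Entry> queue_;
    bool busy_ = false;
};
```

Callers never block: they get a future immediately, and the second command reaches the device only after the first one has produced its output.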

Common pitfalls

Scheduling chained futures on separate executors

If an API returns a Future, the caller can chain subsequent continuations that run on the same executor as the API's internal work. This can lead to hard-to-trace bugs, especially if you want separate executors per API layer.
The solution is to always return a SemiFuture instead of a Future from public methods. Continuations (thenValue, thenError) remain available, but the caller must first call `via()` with the executor that will run them.

Chaining errors

Contrary to futures in e.g. Java, folly future continuations form a flat hierarchy.
In Java, it is possible to write tree-like handlers: success and error listeners can each have their own list of listeners.
In folly, the structure forms a list:

auto f1 = getFuture() // f1
    .thenValue(...) // f2
    .within(...)
    .thenValue(...) // f3
    .thenError(...) // f4

Attaching a continuation consumes (moves) the future, which prevents us from attaching more handlers to the same future. The example above shows the problem: if f2 does not finish within the specified timeout, the chain fails with FutureTimeout. We might wish to stop execution there and use the f4 handler only if f3 fails, but it is not possible to define the relationships this way. Instead, the f4 handler needs to take into account every earlier failure: the timeout as well as errors from f1, f2, and f3.
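
Because one handler sees every failure, it has to tell the cases apart by exception type. Below is a small standard-library sketch of that dispatch; `TimeoutError` is a hypothetical stand-in for folly's FutureTimeout, and `handleFailure` plays the role of the f4 handler.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for folly::FutureTimeout.
struct TimeoutError : std::runtime_error {
    TimeoutError() : std::runtime_error("timed out") {}
};

// Every failure from the earlier stages arrives here, so the handler
// rethrows the stored exception and branches on its type.
std::string handleFailure(std::exception_ptr error) {
    try {
        std::rethrow_exception(error);
    } catch (const TimeoutError&) {
        return "timeout before f3 started";
    } catch (const std::exception& e) {
        return std::string("stage failed: ") + e.what();
    }
}
```

The more specific `catch` clause has to come first, otherwise the generic one would swallow the timeout as well.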

Destructors competing with lambdas that capture `this`

One problem that has bitten us is the interaction between destructors and async functions. In our first version, we did not anticipate this and wrote our lambdas in a fashion similar to this pseudo code:

void executeRead(cmd) {
    innerCli.executeRead(cmd).thenValue([=](...) {
        LOG(INFO) << this->connectionId << "finished";
    });
}

We were capturing `this`, and since there was no way to synchronize the lambdas with the destructors, our program would segfault whenever a destructor ran before the lambdas finished.
Our second version used the enable_shared_from_this pattern. The segfaults stopped, but our reconnect functionality started acting strangely: since dropping a Cli would not necessarily call its destructor, we often ended up trying to connect without dropping the old connection first.
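
The delayed destruction is easy to reproduce in isolation. In this sketch (hypothetical `Cli` class, with a plain `std::function` standing in for a pending continuation), dropping the last external handle does not run the destructor, because the lambda still owns a strong reference.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

class Cli : public std::enable_shared_from_this<Cli> {
public:
    explicit Cli(bool& destroyed) : destroyed_(destroyed) {}
    ~Cli() { destroyed_ = true; }  // a real destructor would drop the connection

    // Returns a continuation that keeps this object alive until it is released.
    std::function<std::string()> makeContinuation() {
        auto self = shared_from_this();
        return [self] { return self->connectionId_ + " finished"; };
    }

private:
    std::string connectionId_ = "conn-1";
    bool& destroyed_;
};

// Returns {destructor ran when the handle was dropped,
//          destructor ran after the continuation was released}.
std::pair<bool, bool> destructorStates() {
    bool destroyed = false;
    auto cli = std::make_shared<Cli>(destroyed);
    std::function<std::string()> pending = cli->makeContinuation();
    cli.reset();                  // "dropping" the Cli
    bool onDrop = destroyed;      // still false: the lambda holds a reference
    pending();
    pending = nullptr;            // releasing the lambda destroys the object
    return {onDrop, destroyed};
}
```

Any cleanup placed in the destructor is therefore postponed until the last pending continuation goes away.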
To make sure our destructors work as expected, we redesigned our classes to capture a subclass instance instead of `this`. However, we would still need to wait until the lambdas finished before the destructor could finish. This manifested as various timeouts on every Cli layer; worse, it meant blocking in a possibly async context.
Our latest design, which is currently in place, looks like this:
We introduced a new method, `SemiFuture destroy()`, which is required not to block. It cancels all timeouts using our custom Timekeeper and propagates the call to the innermost Cli. This drops the connection and effectively cancels all in-flight futures. Our reconnecting Cli is notified when `destroy()` finishes and then drops the Cli stack. Destructors no longer need to block, since the connection is already dropped and the lambdas do not hold any resources we care about.
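
A heavily simplified sketch of how such a `destroy()` call might propagate through the layers is shown below. The class names are hypothetical, `std::future<void>` stands in for the SemiFuture, and a boolean flag replaces the custom Timekeeper. In the real stack the innermost layer would complete the future asynchronously; here it completes immediately.

```cpp
#include <future>
#include <memory>
#include <utility>

// Hypothetical layer interface.
class Cli {
public:
    virtual ~Cli() = default;
    // Must not block: the future completes once the connection is gone.
    virtual std::future<void> destroy() = 0;
};

// Innermost layer: owns the connection.
class SshCli : public Cli {
public:
    std::future<void> destroy() override {
        connected_ = false;  // drop the connection
        std::promise<void> done;
        done.set_value();    // assumption: teardown finishes immediately
        return done.get_future();
    }
    bool connected() const { return connected_; }

private:
    bool connected_ = true;
};

// Outer layer: cancels its own timers, then delegates inward.
class KeepAliveCli : public Cli {
public:
    explicit KeepAliveCli(std::shared_ptr<Cli> inner) : inner_(std::move(inner)) {}

    std::future<void> destroy() override {
        timerActive_ = false;      // cancel this layer's timeouts first
        return inner_->destroy();  // then propagate to the inner layer
    }
    bool timerActive() const { return timerActive_; }

private:
    std::shared_ptr<Cli> inner_;
    bool timerActive_ = true;
};
```

Whoever owns the stack waits on the returned future and only then releases the objects, so no destructor has to wait for anything.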

Summary

In this post we explained the basic usage of folly’s Future API and showed how it is used in real-world code built on an async architecture.

