Getting started with C++ futures
This is part of the Magma project led by Facebook. By the end of 2019, we at FRINX had contributed the CLI stack, which provides various functions on top of the basic SSH client, like keepalive, reconnect, caching, etc. This post describes the evolution of how we use the async functionality provided by the Folly library to meet our scalability requirements.
Architecture
The following text describes the high-level architecture, accompanied by pseudo code.
We started with a simple abstract class describing the CLI client:
class CLI {
 public:
  virtual string executeRead(ReadCommand c) = 0;
  virtual string executeWrite(WriteCommand c) = 0;
};
For the purpose of this text it is not important how those two methods differ. The basic idea is to create one CLI object per device, run one or more commands and collect the output. This design allows us to introduce nesting: we define several classes, each with a single responsibility (a minimal sketch of one such layer follows the list):
● KeepAliveCli - sends an empty command periodically; its execute* methods just call the inner CLI.
● ReconnectingCli - both execute* methods call the inner CLI; however, if an exception indicating a connection error is returned, a reconnect is scheduled.
● Etc.
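Here is what such a delegating layer might look like, assuming the abstract CLI class above (a simplified sketch; the periodic keepalive scheduling itself is omitted):
class KeepAliveCli : public CLI {
 public:
  explicit KeepAliveCli(shared_ptr<CLI> inner) : inner_(move(inner)) {}

  // Both methods just delegate; the keepalive concern lives in this
  // class alone, and the inner CLI knows nothing about it.
  string executeRead(ReadCommand c) override {
    return inner_->executeRead(c);
  }

  string executeWrite(WriteCommand c) override {
    return inner_->executeWrite(c);
  }

 private:
  shared_ptr<CLI> inner_;
};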
We can see that even though the layers are separated, they may require some work to be done asynchronously. We need an Executor where those tasks can run. We might need separate executors for each layer so that resource starvation on one layer can be isolated in the testing environment. In production we expect that most CLI layers will share the same executor, with a thread count equal to the number of CPU cores.
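With folly this could look as follows (a sketch, not the production wiring):
#include <folly/executors/CPUThreadPoolExecutor.h>
#include <thread>

// One pool shared by all CLI layers, one thread per CPU core.
auto sharedExecutor = std::make_shared<folly::CPUThreadPoolExecutor>(
    std::thread::hardware_concurrency());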
Unblocking callers
The first version clearly blocks callers, so
we came up with a second version:
class CLI {
 public:
  virtual Future<string> executeRead(ReadCommand c) = 0;
  virtual Future<string> executeWrite(WriteCommand c) = 0;
};
This allows us to have a thread-safe interface which can be used by many callers simultaneously. Returning Futures in folly is very easy: just pass a lambda to an executor:
auto future = via(getCPUExecutor())
    .thenValue([params](auto) {
      // do some computation
    });
Sometimes we need to create futures without running them immediately. This is possible using the Promise class. A Promise is the holder of a value that will be computed later, whereas a Future is a read-only view into its Promise.
shared_ptr<Promise<string>> promise = make_shared<Promise<string>>();
return promise->getFuture();

// later, fulfill the promise asynchronously:
promise->setValue(result);
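Putting the pieces together, a minimal self-contained sketch (the function and names are illustrative, assuming folly/futures/Promise.h and the usual usings):
Future<string> startComputation(Executor* executor) {
  auto promise = make_shared<Promise<string>>();
  auto future = promise->getFuture();
  // Fulfill the promise later, from the executor's thread; the caller
  // holds only the read-only Future view.
  executor->add([promise] { promise->setValue("result"); });
  return future;
}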
Future Chaining
To get the actual result of the computation
one can call the `get()` method on the Future, but this call blocks the caller
until the value is available. As reported in other environments, it is important not to mix async
and blocking calls. Luckily, folly Futures allow chaining more work that will
be executed when the target Future finishes:
auto future = via(getCPUExecutor())
    .thenValue([](auto) { /* 1. */
      MLOG(MDEBUG) << "setReconnecting: sleeping";
    })
    .delayed(1s, getTimekeeper()) /* 2. */
    .thenValue([](auto) -> Future<string> {
      return doSomething(); /* 3. */
    })
    .thenValue([](string value) {
      MLOG(MDEBUG) << "doSomething finished with: " << value; /* 4. */
    });
The first future will only write a log message; after that a delay is executed (non-blocking), after which the third future runs. Note that continuations may return not only values but also futures: the return type of `doSomething()` is a Future, so folly will execute the last continuation only after the string value is obtained.
Queueing CLI commands
Moving from one blocking caller to many async callers brought one major challenge: we need to create a queue of requests so that while a command is executing, or we are waiting for its output, no other command is sent to the device.
Executors have their own queues; there is even a SerialExecutor that guarantees non-concurrent execution of tasks. However, our task is just calling the inner CLI's execute methods, obtaining inner Futures and possibly chaining lambdas for post-processing. SerialExecutor would only wait until the inner Future is obtained, not until it completes. Calling `get()` on the inner Future would solve the issue, at the expense of blocking the current thread.
This was the reason we created QueuedCli, which is yet another layer, with the artificial Promise/Future model described above and an MPSC (multi-producer, single-consumer) queue that is both safe and efficient. When an `execute*` method is called, a new Promise/Future pair is created and put into the queue. QueuedCli contains its own SerialExecutor where the queue is processed: we wait until the inner Future finishes asynchronously, then set the value of the artificial Promise, and only then process the head of the queue again. Check out the QueuedCli source code for details.
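A heavily simplified sketch of the idea (the member names, the queue type and the isProcessing_ flag are illustrative, not the real QueuedCli; error propagation via setException is omitted):
struct QueueEntry {
  WriteCommand cmd;
  shared_ptr<Promise<string>> promise;
};

Future<string> QueuedCli::executeWrite(WriteCommand cmd) {
  auto promise = make_shared<Promise<string>>();
  queue_.enqueue(QueueEntry{cmd, promise}); // MPSC: many producers, one consumer
  triggerDequeue();
  return promise->getFuture(); // the caller's read-only view
}

void QueuedCli::triggerDequeue() {
  // All queue processing happens on our private SerialExecutor,
  // so isProcessing_ needs no locking.
  via(serialExecutor_).thenValue([this](auto) {
    QueueEntry entry;
    if (isProcessing_ || !queue_.try_dequeue(entry)) {
      return;
    }
    isProcessing_ = true;
    innerCli_->executeWrite(entry.cmd)
        .thenValue([this, entry](string output) {
          entry.promise->setValue(move(output));
          via(serialExecutor_).thenValue([this](auto) {
            isProcessing_ = false;
            triggerDequeue(); // only now touch the next queued command
          });
        });
  });
}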
Common pitfalls
Scheduling chained futures on separate executors
If an API returns a Future, the caller is able to chain subsequent futures on the same executor. This can lead to hard-to-trace bugs, especially if you want to have separate executors per API layer. The solution is to always return SemiFuture instead of Future in public methods. The continuations (thenValue, thenError) will still be available, but the caller must explicitly specify the executor that will run them:
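A sketch of what this looks like at the call site (the executor name is illustrative):
SemiFuture<string> executeRead(ReadCommand c); // public API: no executor attached

executeRead(cmd)
    .via(callerExecutor) // the caller must attach its own executor...
    .thenValue([](string output) {
      // ...and this continuation runs there, not on the CLI layer's executor
    });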
Chaining errors
Contrary to futures in e.g. Java, folly future continuations form a flat hierarchy. In Java, it is possible to write tree-like handlers: success and error listeners can each have their own separate list of listeners. In folly, the structure forms a list:
auto f1 = getFuture()  // f1
    .thenValue(...)    // f2
    .within(...)
    .thenValue(...)    // f3
    .thenError(...);   // f4
If we want to define a handler, we have to move the value, and this prevents us from defining more handlers for the same future. The example above shows the problem: if f2 does not finish within the specified timeout, it will throw FutureTimeout. We might wish to stop execution there, and use the f4 handler only if f3 fails, but it is not possible to define the relationships this way. Instead, the f4 handler needs to take into account all previous failures: the timeout as well as failures from f1.
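Concretely, the tail of the chain has to dispatch on the exception type itself. A sketch, assuming someFuture is a Future<string> (folly's thenError overload taking folly::tag_t selects handlers by exception type):
someFuture
    .within(1s)
    .thenError(tag_t<FutureTimeout>{}, [](auto&&) {
      return string("timed out"); // handles only the timeout
    })
    .thenError([](exception_wrapper&& ew) {
      return string("failed"); // everything else from earlier steps lands here
    });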
Destructors competing with lambdas that capture `this`
One problem that has bitten us is the interaction between destructors and async functions. In our first version, we did not anticipate this and wrote our lambdas in a fashion similar to this pseudo code:
void executeRead(ReadCommand cmd) {
  innerCli.executeRead(cmd).thenValue([=](string output) {
    LOG(INFO) << this->connectionId << " finished"; // [=] captures `this`
  });
}
We were capturing `this`, and since there was no way to synchronize the lambdas with the destructors, our program would segfault when a destructor was called before the lambdas finished.
Our second version used the enable_shared_from_this pattern. We stopped getting segfaults, but our reconnect functionality started acting strangely: since dropping the Cli would not necessarily call the destructor, we often ended up trying to connect without dropping the old connection first.
To make sure our destructors work as expected, we redesigned our classes to capture a subclass instance instead of `this`. However, we would still need to wait until the lambdas finished before a destructor could finish. This manifested as various timeouts on every Cli layer, but worse, it meant blocking in a possibly async context.
Our last design, which is currently in place, looks like this: we introduced a new method, `SemiFuture destroy()`, which is required not to block. It cancels all timeouts, using our custom Timekeeper, and propagates the call to the innermost Cli. This drops the connection and effectively cancels all in-flight futures. Our ReconnectingCli is notified when `destroy()` finishes, and then drops the Cli stack. Destructors do not need to block anymore, since the connection is dropped and the lambdas do not hold any resource we might care about.
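In sketch form (SemiFuture<Unit>, the helper name and the member wiring are our assumptions here, not the exact FRINX signatures):
class ReconnectingCli : public CLI {
 public:
  // Required not to block: cancel our timeouts, propagate inward.
  SemiFuture<Unit> destroy() override {
    cancelScheduledReconnects(); // assumed helper using our custom Timekeeper
    return innerCli_->destroy(); // innermost layer drops the connection,
                                 // failing all in-flight futures
  }

 private:
  shared_ptr<CLI> innerCli_;
};

// Once destroy() finishes, the stack can be dropped without a blocking destructor:
cli->destroy().via(executor).thenValue([cli](auto) mutable {
  cli.reset(); // destructors run here; the connection is already gone
});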
Summary
In this post we tried to explain basic usage
of folly’s Future API together with real world code usage that leverages an
async architecture.
Folly futures are no joke!