1. What is Cadence?
Pedaling with a high Cadence (pedal revolutions) is called Spinning, Mashing or Grinding is slow (and bad).
The rapid rise in Big Data use cases over the last decade has been accelerated by popular massively scalable open source technologies such as Apache Cassandra® for storage, Apache Kafka® for streaming, and OpenSearch® for search. Now there’s a new member of the peloton, Cadence, for workflow orchestration.
My flatmate used to carry his Cello on a bike to music lessons, a bit like this, but music and cycling also have something more in common—Cadence!
In music, a Cadence is a musical phrase that sounds complete. And in cycling, Cadence is how fast the cyclist is pedalling, which influences efficiency and speed.
Workflow orchestration is the process of defining and executing a series of tasks and the order in which they are to be carried out. You may want some tasks to be performed in sequence, but others may be performed concurrently. You also see this idea in orchestral music scores, such as in this example, which provides the conductor, musicians, and choir instructions on what notes to play and when.
In typical workflow terminology (e.g. BPMN), a workflow has start and end events, activities (atomic or compound tasks), gateways (decisions), and sequences or flows—the order of activities in a workflow.
In bygone eras of computing I’ve experienced some of the previous attempts at implementing and modeling workflows, including Enterprise Java Beans (Stateful Session Beans were used for workflows), ebXML, BPEL, and modeling and evaluating workflows using discrete event simulation. Traditional workflow modeling uses the semantics of semi/formal notations including Finite State Machines (FSM), Markov Models, discrete event simulation, and Petri networks to specify, visualize, and analyze workflows.
Open source Cadence was developed by Uber, and based on the AWS SWF (Simple Workflow Service). It uses Apache Cassandra (and other databases) for persistence, and is designed to be highly fault tolerant and scalable. Perhaps the most surprising feature of Cadence, compared to other workflow engine approaches I’ve encountered previously, is that it is focused on helping developers write workflows primarily either in Java or Go (other languages are available), and therefore doesn’t come with any visualization notation or tool to specify workflows, and the semantics are just plain old code. However, it does have the ability (with the Cadence Web client) to visualize workflows as they execute.
Now, riding a bike needs a lot of gear. For example, you need a bike (logically), helmet, shoes, water bottle, glasses, gloves, a light, pump, lycra (optional!), etc. Likewise, successful use of Cadence needs a whole bunch of things including workflows, activities, domains, workflow clients, and workers. Let’s take a look at each in turn.
2. Cadence Workflows
Let’s start with the most basic Cadence concept, workflows. I’ll be using the cadence java client which requires you to download it and configure it to compile in your IDE of choice (I use Eclipse and Maven, and for simplicity I’ve omitted the imports). There’s a bunch of java client examples which inspired mine. First we need a workflow interface and implementation.
Workflow interface:
Workflow implementation:
In Cadence, there’s always exactly one method with the @WorkflowMethod annotation, which acts as the start event for the workflow, and contains the sequence flow and activities. Calling it starts a stateful workflow instance, which eventually ends.
In the implementation, the method startWorkflow() calls two other methods in sequence (task1, task2), giving us the workflow logic (simple sequential) and 2 activities.
Each running workflow instance has a unique ID, which we’ve used in the above example. You need a workflow ID as workflows can have multiple instances
However, there is some more complexity here. In reality, Cadence isn’t just a POJI/POJO, it is a platform to execute the stateful workflows scalably and reliably.
You’ll notice a timeout and activities and taskList above. Timeouts are needed as workflows, like music and cycling, must eventually end.
3. Cadence Activities
Activities are a core feature of Cadence. Workflows are stateful and fault-tolerant, but in a distributed microservices architecture, you will eventually want to call remote APIs to perform tasks. However, remote API calls can fail, and can’t be made fault-tolerant as they are external to the Cadence workflow. The solution Cadence uses is to allow any code, including remote calls, to be wrapped in a Cadence Activity, with the caveat that Cadence doesn’t recover activity state on failure, and activities are guaranteed to be executed at most once (idempotent activities will be automatically retried on failure, non-idempotent activity failures will need to be handled by business-specific logic including compensation activities, more information on RetryOptions can be found here).
Activities are just a POJO/POJI pair as well, containing the tasks/methods that are to be executed. Each method has a @ActivityMethod annotation with options.
We’ve already registered the Activities in the workflow constructor above. Note that there’s a slightly different way required to get the WorkflowId in an activity. So is a Workflow and Activities all we need to get it running?
4. Cadence Domains
Cadence has the concept of domains, which is basically just a namespace that workflows live in. You have to create or reuse an existing domain before you can start workflow or workers. Here’s an example of a method to register a new domain (or return if it exists):
And just in case you are wondering what IWorkflowService and WorkflowServiceTChannel are (I did, and we’ll also use these below)! These are in the ServiceClient package, which doesn’t seem well documented, but this appears to be the main way you connect to the Cadence server.
“The ServiceClient is an RPC service client that connects to the cadence service. It also serves as the building block of the other clients.”
In the java client code I found the following client types (in the client package, and serviceclient package): client/ActivityCompletionClient, client/WorkflowClient, serviceclient/IWorkflowService (There may be more).
5. Cadence Workflow Client
We’re now ready to try the example. In the main method here is some sample code to create a domain, and a new WorkflowClient instance:
6. Cadence Workers
If you try and run the above example, you’ll find that none of the tasks are progressed and the workflow just times out eventually. What’s gone wrong? The way that activities are executed is by Workers. Workers for the activities need to be created before any work is performed (similar to musicians in an orchestra—you can have a conductor and the score, but the score isn’t actually played until the musicians are on the stage with their instruments, ready to perform, and the conductor starts them off).
So now that we have a WorkflowClient we’re ready to create a worker. You have to specify the String activityName, and register both the Workflow and ActivitiesActivites implementations with the workflow before starting it (if you forget to register the activities there’s no error, but nothing happens as Cadence will just wait for a worker process to appear, at least until the workflow times out!):
Note that the workers would typically be run in a separate process to the workflow, and for scalability reasons on well resourced (and multiple) servers, as this is where the actual workflow task “work” is really being done.
But how many workers do we actually have? There are default concurrency and rate limits, and the default settings from WorkerOptions are
- Maximum number of concurrent activities = 100
- Maximum number of concurrent workflows = 50.
- Maximum number of concurrent local activities = 100
- Maximum number of activities started per second = 0 (unlimited)
I suspect that if you were clever and had sufficient monitoring of the workflows and activities (number and times in each) you could use Little’s Law (which relates concurrency, throughput, and time) to estimate the magic numbers more accurately. Here’s an example of changing the WorkerOptions defaults including an explanation of what makes sense.
7. Cadence Workflow Execution: Synchronous or Asynchronous
It’s now time to start executing a workflow instance as follows:
Workflows can be started synchronously or asynchronously. Two start two instances synchronously:
And asynchronously:
Just remember to wait for the asynchronous workflows to complete (as in this example using get()), otherwise, nothing much will happen (as the workers are in the same thread in this example).
8. Cadence Asynchronous Activities and Blocking
Workflows aren’t the only things that can be called asynchronously, activities can be executed concurrently as well. This is helpful when you want to run a large number of concurrent tasks at once, and then wait for them all to complete. For example, let’s replace the original startWorkflow(String name) method with a new method startConcurrentWorkflow(int max), and we’ll just reuse the original task1() activity but execute it concurrently:
Note that we have to use the Async.function(activities::activityName, arg) method, and the Promise class with methods add(), allOf() and get() to block correctly until all of the activities complete.
9. Cadence Restrictions
The way that Cadence manages workflow state is “clever”’! Rather than maintaining the state of each stateful workflow instance in a database (which is a typical approach for persistence and reliable workflow engines, but has scalability limitations), Cadence emits state changes to the database as history events, and then if something goes wrong (or even if you sleep in a workflow) it just restarts the workflow and replays the history to determine what the next task is to run. This uses the event sourcing pattern (similar to the way that Kafka Streams can compute state from event streams, or event streams from state).
This approach supports large numbers of workflow instances running at once (which is likely for long running and high throughput workflows), but does have negative implications for long workflows, which require increasingly large histories and replaying of many events to regain the correct state.
However, this approach does introduce some important restrictions on workflow implementation that you need to be aware of—basically they have to be deterministic. The Cadence Workflow class has lots of useful methods that can be used in place of unsafe/non-deterministic/side-effect riddled standard Java methods, in particular for time, sleeping, and random numbers.
So that’s just about all I know about Cadence so far. However, there’s a lot more to discover, including how it handles exceptions, retries, long running transactions, different sorts of timeouts, compensation actions, signals, queries, child workflows, and how you can call remote APIs, and integrate it with Kafka (e.g. to send and block/receive a message as an activity).