13 Apr Synchronous Express Workflows in real-life!
Inawisdom have embedded Machine Learning models at one of our clients' businesses by using a number of inference styles:
The two most popular styles we see are real-time and batch (inc. micro). For batch, AWS Step Functions calling a SageMaker Batch Transform is our weapon of choice. For real-time or near real-time, it is Lambda with SageMaker endpoints. However, AWS has recently expanded its offering with Synchronous Express Workflows. In this blog, we will look into Synchronous Express Workflows, the benefits and the considerations for using them.
Synchronous Express Workflows for AWS Step Functions were launched during the pre-re:Invent period in November 2020. Synchronous Express Workflows build on top of Express Workflows, which in turn build on top of Standard Workflows. This means all the features of the ASL (Amazon States Language) and service integrations can be used by all three. However, there are a couple of design considerations:
- Maximum execution time is 5 minutes for Synchronous Express Workflows
- Synchronous Express Workflows (like Express Workflows) can run thousands of executions at once, whereas Standard Workflows are meant for a few hundred
- Synchronous means that the calling service or client will block until the execution completes
The primary use case for Synchronous Express Workflows is to implement the logic behind an API, something like this:
When would you use a Synchronous Express Workflow instead of just writing a Lambda Function? There are a couple of reasons:
- Long poll – if your Lambda only needs to long-poll a running process, such as a Glue (ETL) job or a model training job, then you can remove the boilerplate code needed and use a service integration.
- Parallelism to reduce latency – You require low latency, and your Lambda might be taking longer than expected. With Synchronous Express Workflows you could break up your Lambda and use parallel branches to execute multiple Lambdas, for example, calling two or more models.
- Dynamic Parallelism – Building on top of parallelism, you may wish to scale depending on the results of one Lambda and execute N other Lambdas to process a data set. This can be done using the Map state of the ASL.
- Legacy – You may have legacy or custom image lambdas from other tech stacks you might want to bring together. This allows you to integrate them without touching them.
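As a rough illustration of the dynamic parallelism case, the sketch below builds an ASL definition in Python in which one Lambda returns a list and a Map state runs a second Lambda once per item. The function ARNs, the state names and the `$.items` path are hypothetical placeholders, and the `Iterator` field is the Map syntax I am assuming here.

```python
import json

# Hypothetical Lambda ARNs, used purely for illustration.
SPLIT_FN = "arn:aws:lambda:eu-west-1:123456789012:function:split-dataset"
SCORE_FN = "arn:aws:lambda:eu-west-1:123456789012:function:score-item"


def build_map_definition(max_concurrency: int = 10) -> dict:
    """Return an ASL definition that fans out over the items of a list."""
    return {
        "StartAt": "SplitDataset",
        "States": {
            # First Lambda is assumed to return a JSON array of work items.
            "SplitDataset": {
                "Type": "Task",
                "Resource": SPLIT_FN,
                "ResultPath": "$.items",
                "Next": "ScoreEachItem",
            },
            # Map runs the inner state machine once per element of $.items.
            "ScoreEachItem": {
                "Type": "Map",
                "ItemsPath": "$.items",
                "MaxConcurrency": max_concurrency,
                "Iterator": {
                    "StartAt": "ScoreItem",
                    "States": {
                        "ScoreItem": {
                            "Type": "Task",
                            "Resource": SCORE_FN,
                            "End": True,
                        }
                    },
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_map_definition(), indent=2))
```

The output of the Map state is an array with one element per item, in the same order as the input list.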
Firstly, you will need to define your Express Workflow in ASL and then integrate it with an API Gateway. I recommend AWS SAM to help you with this and to do your deployments. Now here is an important consideration: at the time of writing, Synchronous Express Workflows only work with the V2 style APIs called HTTP APIs, not the original V1 REST API style. However, HTTP APIs do not have all the features of REST APIs (https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-vs-rest.html). One of the most important omissions is that HTTP APIs only have regional style endpoints (making your API publicly available), and this will require your clients to have internet access to reach it. This may be an important thing to consider when looking at any security or latency requirements.
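To give a feel for what the HTTP API integration contains, here is a minimal Python sketch of the arguments that could be passed to the `create_integration` call of boto3's `apigatewayv2` client. It is an alternative view of what a SAM template would declare, not a replacement for it. The API id, state machine ARN and role ARN are hypothetical, and the role is assumed to allow `states:StartSyncExecution`.

```python
def build_sync_integration(api_id: str, state_machine_arn: str, role_arn: str) -> dict:
    """Keyword arguments for apigatewayv2.create_integration (sketch only)."""
    return {
        "ApiId": api_id,
        # First-class AWS service integration offered by HTTP APIs.
        "IntegrationType": "AWS_PROXY",
        "IntegrationSubtype": "StepFunctions-StartSyncExecution",
        "PayloadFormatVersion": "1.0",
        # Role that API Gateway assumes to start the execution.
        "CredentialsArn": role_arn,
        "RequestParameters": {
            "StateMachineArn": state_machine_arn,
            # Pass the HTTP request body through as the workflow input.
            "Input": "$request.body",
        },
    }


if __name__ == "__main__":
    # Hypothetical identifiers for illustration.
    kwargs = build_sync_integration(
        "abc123",
        "arn:aws:states:eu-west-1:123456789012:stateMachine:my-sync-workflow",
        "arn:aws:iam::123456789012:role/api-invoke-workflow",
    )
    print(kwargs)
```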
State is execution data that is passed as JSON into a step function, between tasks, and then returned as the output (this will be the response from your API). In order to be able to combine the results from multiple Lambdas you need to master input, output and result paths with your state. However, for parallel branches you also need to master the ResultSelector and how state works (https://docs.aws.amazon.com/step-functions/latest/dg/input-output-inputpath-params.html). It is important to realise that each branch becomes its own state machine with its own execution data, and the result (last state) of each branch is returned as an element in an array. You then use ResultSelector at the end of the parallel step to select the data you want from this array to be passed to the next step. Here is an example:
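A minimal sketch of this pattern, written as a Python dictionary that serialises to ASL: two branches each call a Lambda, and ResultSelector picks one field out of each element of the resulting array. The function ARNs, the state names and the `prediction` field are hypothetical, and `imitate_result_selector` is only a local imitation of the selection so that the shape of the data is easy to see.

```python
import json

# Hypothetical Lambda ARNs; each is assumed to return {"prediction": <number>}.
MODEL_A_FN = "arn:aws:lambda:eu-west-1:123456789012:function:invoke-model-a"
MODEL_B_FN = "arn:aws:lambda:eu-west-1:123456789012:function:invoke-model-b"


def _branch(state_name: str, function_arn: str) -> dict:
    """One branch of the Parallel state: a self-contained mini state machine."""
    return {
        "StartAt": state_name,
        "States": {
            state_name: {"Type": "Task", "Resource": function_arn, "End": True}
        },
    }


def build_parallel_definition() -> dict:
    """ASL definition that calls two models in parallel and merges the results."""
    return {
        "StartAt": "CallModels",
        "States": {
            "CallModels": {
                "Type": "Parallel",
                "Branches": [
                    _branch("CallModelA", MODEL_A_FN),
                    _branch("CallModelB", MODEL_B_FN),
                ],
                # The raw result is an array with one element per branch,
                # in branch order; pick the fields we care about.
                "ResultSelector": {
                    "modelA.$": "$[0].prediction",
                    "modelB.$": "$[1].prediction",
                },
                # Keep the original input and add the selection alongside it.
                "ResultPath": "$.predictions",
                "End": True,
            }
        },
    }


def imitate_result_selector(branch_outputs: list) -> dict:
    """Local imitation of the ResultSelector above, for illustration only."""
    return {
        "modelA": branch_outputs[0]["prediction"],
        "modelB": branch_outputs[1]["prediction"],
    }


if __name__ == "__main__":
    print(json.dumps(build_parallel_definition(), indent=2))
    print(imitate_result_selector([{"prediction": 0.9}, {"prediction": 0.2}]))
```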
The HTTP API response of your request will be an HTTP 200 status code and the body from the Synchronous Express Workflow execution API.
Figure 1 shows an example of the response body returned from the Synchronous Express Workflow; currently you cannot transform this response. This means there are two issues:
- The Synchronous Express Workflow execution API response will not readily be in the format your consumer might need.
- The HTTP 200 status code is unlikely to match errors returned from your Lambda functions. The status and cause in the output do tell you of a failure, but this leaves it to the consumer to interpret the response.
To address this and to make the responses more consumer-friendly, a workaround is to use a Lambda between the API and the Synchronous Express Workflow. This also resolves the lack of API Gateway REST API support I mentioned previously, as you can do this for any type of API. However, it may negate what I set out to do: remove all boilerplate code and focus on business logic.
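A minimal sketch of such a wrapper Lambda in Python follows. It calls `start_sync_execution` through boto3 and maps the execution status onto an HTTP status code. The `STATE_MACHINE_ARN` environment variable and the choice of 502 and 504 for failures are my assumptions for illustration; pick whatever mapping suits your consumers.

```python
import json
import os


def to_http_response(result: dict) -> dict:
    """Map a StartSyncExecution result onto an API Gateway proxy response."""
    status = result.get("status")
    if status == "SUCCEEDED":
        # 'output' is already a JSON string produced by the workflow.
        return {"statusCode": 200, "body": result.get("output", "{}")}
    if status == "TIMED_OUT":
        return {
            "statusCode": 504,
            "body": json.dumps({"message": "Workflow timed out"}),
        }
    # FAILED (or anything unexpected): surface the error and cause.
    return {
        "statusCode": 502,
        "body": json.dumps(
            {"error": result.get("error"), "cause": result.get("cause")}
        ),
    }


def handler(event, context):
    """Lambda entry point sitting between the API and the workflow."""
    import boto3  # imported lazily so to_http_response works without the SDK

    client = boto3.client("stepfunctions")
    result = client.start_sync_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # hypothetical variable
        input=event.get("body") or "{}",
    )
    return to_http_response(result)
```

Keeping the mapping in its own function means it can be unit tested without deploying anything.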
Console vs CloudWatch
In order to see the execution of your Synchronous Express Workflow, you need to look at the AWS Console like you do for Express Workflows. This, however, requires a little more work than Standard Workflows, as you need to set up a logging configuration that points to CloudWatch Logs (a top tip for development is to set IncludeExecutionData to True so that you can see the changes to your state).
Here is a SAM example:
You will also need to add some more permissions to your IAM policy for the logging to CloudWatch to work:
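Both pieces can also be sketched in Python: the logging configuration in the shape accepted by the `loggingConfiguration` argument of boto3's `create_state_machine`, and an IAM policy statement for the workflow's execution role. The log group ARN is a hypothetical placeholder, and the action list is the set I understand Step Functions to need for CloudWatch Logs delivery, so check it against the current AWS documentation.

```python
import json


def build_logging_configuration(
    log_group_arn: str, level: str = "ALL", include_execution_data: bool = True
) -> dict:
    """Logging configuration for a state machine (level: ALL, ERROR, FATAL or OFF)."""
    return {
        "level": level,
        # Handy in development: shows the state passed between steps.
        "includeExecutionData": include_execution_data,
        "destinations": [
            {"cloudWatchLogsLogGroup": {"logGroupArn": log_group_arn}}
        ],
    }


# Policy statement for the state machine's execution role (assumed action list).
LOGGING_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogDelivery",
                "logs:CreateLogStream",
                "logs:GetLogDelivery",
                "logs:UpdateLogDelivery",
                "logs:DeleteLogDelivery",
                "logs:ListLogDeliveries",
                "logs:PutLogEvents",
                "logs:PutResourcePolicy",
                "logs:DescribeResourcePolicies",
                "logs:DescribeLogGroups",
            ],
            "Resource": "*",
        }
    ],
}


if __name__ == "__main__":
    # Hypothetical log group ARN for illustration.
    arn = "arn:aws:logs:eu-west-1:123456789012:log-group:/workflows/my-sync-workflow:*"
    print(json.dumps(build_logging_configuration(arn), indent=2))
    print(json.dumps(LOGGING_POLICY, indent=2))
```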
Here is an example of how this appears in the console:
The console is functionally complete and gives you all the detail to see how your Synchronous Express Workflow executed. To do this, it pulls log entries from a CloudWatch log stream and collates the entries together per execution. However, I personally still prefer the Standard Workflow GUI, where you can visually see the execution and better navigate the state (execution data). Another top tip is that you may want to map the Step Functions logging levels of ALL, ERROR and FATAL (https://docs.aws.amazon.com/step-functions/latest/dg/cloudwatch-log-level.html) to match your Lambda logging level and turn them down in production (only record errors, for example).
Synchronous Express Workflows use the same tracing options as Express Workflows and, when enabled, tracing will cover all downstream services that your workflow uses. If this includes Lambda functions, and those Lambda functions are also instrumented with the X-Ray SDK, then everything those Lambdas call will be included. Anything you annotate within the Lambda will also be included in the trace.
A worthy consideration here is that if you are invoking your Synchronous Express Workflows from an upstream Lambda (the workaround for REST API or response handling) then you will need to set the tracing header correctly on the StartSyncExecution call to get a full end-to-end trace. This has to be in this format:
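The X-Ray tracing header generally takes the form `Root=<trace id>;Parent=<segment id>;Sampled=1`. A minimal Python sketch of passing it on is below, assuming the upstream Lambda can read the current header from the `_X_AMZN_TRACE_ID` environment variable and that the client is a boto3 `stepfunctions` client. The helper names are hypothetical.

```python
import os
from typing import Optional


def build_trace_header(root: str, parent: str, sampled: bool = True) -> str:
    """Assemble an X-Ray tracing header from its parts."""
    return f"Root={root};Parent={parent};Sampled={1 if sampled else 0}"


def start_with_trace(
    sfn_client,
    state_machine_arn: str,
    payload: str,
    trace_header: Optional[str] = None,
):
    """Call StartSyncExecution, forwarding the tracing header when one is known."""
    # Fall back to the header that Lambda exposes for the current invocation.
    header = trace_header or os.environ.get("_X_AMZN_TRACE_ID")
    kwargs = {"stateMachineArn": state_machine_arn, "input": payload}
    if header:
        kwargs["traceHeader"] = header
    return sfn_client.start_sync_execution(**kwargs)
```

Inside the wrapper Lambda this would be called as `start_with_trace(boto3.client("stepfunctions"), arn, event["body"])`, letting the environment variable supply the header.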
Future Improvements & Further Considerations
The following are some additional areas where I hope that Synchronous Express Workflows will improve:
The only real way to test your Synchronous Express Workflow (including the Lambdas it invokes) is to deploy it into your AWS account. This is time-consuming and very repetitive work. I would like AWS SAM to get the capability to invoke a Synchronous Express Workflow (including the Lambdas) locally end-to-end, including from the API. There is a way to accomplish this with Step Functions Local (https://docs.aws.amazon.com/step-functions/latest/dg/sfn-local-lambda.html); however, this is not currently seamless and is fiddly.
There is currently no blue/green or canary style deployment supported, as there is only one version of the Synchronous Express Workflow. Also, it is very complex to get a Synchronous Express Workflow to call different aliases of a Lambda depending on the API stage invoked. I tried to do this in a number of ways, including multiple copies of the Synchronous Express Workflow, but none truly worked well for me.
Developing a Synchronous Express Workflow, like all Step Functions, means you need to wrangle JSON and YAML. If this is not to your liking, you can optionally use the CDK (Cloud Development Kit) to write your workflow if your language is supported: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-stepfunctions-readme.html.
Synchronous Express Workflows are a good new addition to the serverless toolkit. Their most powerful feature is as a way to scale a Lambda by spreading its work across a number of runtimes. This can be to reduce latency or to do many things at once. For me, it does not make me love AWS Lambda less; in fact, I kind of love Lambda more, as each of my Lambdas can just do one thing very well. However, if I had one criticism of Synchronous Express Workflows, it would be that the developer experience needs smoothing out. Everything you need is there, but it needs better integration and to work out of the box.
So, when one Lambda runtime is not enough, gets too complex or you need to scale your execution, then Synchronous Express Workflows are a contender!