Redshift Serverless: First Impressions

There were a LOT of announcements at re:Invent 2021 – nothing new in that. Standard practice in the New Year at Inawisdom is to go through a process of digesting them in a bit more depth than is possible during the hubbub of the event.

As part of this I had a quick play with Redshift Serverless. This is really relevant to Inawisdom as we employ the whole gamut of cluster configurations to support customer analytics needs – from single-node DC2, all the way up to the new RA3 node families.

It’s not news to say that Snowflake have made a huge impact in this area, and AWS have been steadily investing in new Redshift features in order to keep relevant. It’s a really interesting market dynamic, so obviously we were keen to see how Redshift Serverless could enhance our architectures. The launch blog has great details on the public preview so I won’t repeat them here… but I thought I’d post a few initial thoughts.

Findings

On first use, my immediate reaction was… it’s a bit unexciting, i.e. it’s just Redshift 🙂 but without specifying the cluster configuration. This is a good thing – PaaS is much easier to adopt, with fewer operational responsibilities and fewer infrastructure choices to make. So basically, it does what it says on the tin!

Response Times

After setting up the preview service, the initial query performance/experience via the revamped v2 query editor seemed fine. I didn’t get any sense of a “cold start time” – always a potential concern with serverless offerings, and something that continues to be a consideration with Lambda, Glue, SageMaker, etc. The response was still quick after a deliberate 24-hour rest. However, after leaving it for a week, there was a noticeable start-up delay (in the console and via the CLI – maybe 2-3 minutes). I’d be keen to assess this rather more scientifically when we have the chance.
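As a starting point for measuring that start-up delay more scientifically, here is a minimal Python sketch that times a trivial query through the Redshift Data API. It assumes a boto3 "redshift-data" client is passed in; the function name, the "SELECT 1" probe and the polling interval are my own illustrative choices, not anything from the preview documentation.

```python
import time


def time_first_query(client, database="dev", sql="SELECT 1", poll_seconds=1.0,
                     timeout_seconds=600.0, clock=time.monotonic,
                     sleep=time.sleep, **target):
    """Submit one statement and return (final_status, seconds_elapsed).

    `client` is assumed to behave like boto3.client("redshift-data").
    `target` carries any extra request parameters your setup needs
    (for example WorkgroupName); those values are assumptions.
    """
    started = clock()
    statement_id = client.execute_statement(
        Database=database, Sql=sql, **target
    )["Id"]
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        elapsed = clock() - started
        # FINISHED, FAILED and ABORTED are the terminal Data API states.
        if status in ("FINISHED", "FAILED", "ABORTED"):
            return status, elapsed
        if elapsed > timeout_seconds:
            return "TIMED_OUT", elapsed
        sleep(poll_seconds)


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("redshift-data")
#   print(time_first_query(client))
```

Running this after idle periods of different lengths (an hour, a day, a week) would give a rough picture of when the delay kicks in. The measured time includes polling granularity, so it is only accurate to about one polling interval.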

Architecture

The architecture changes made to Redshift back in 2019 to introduce RA3 nodes (decoupling the scaling of compute from storage) are fundamental to delivering this serverless flexibility, as well as the new data sharing capability. Redshift Serverless adds an interesting new option to the data architect’s toolbox, as it can now be used to provide an entirely on-demand reporting datastore for specific user communities. So if you’ve already got an RA3 Redshift cluster, you can use data shares to create datamarts for other departments (e.g. marketing) on Redshift Serverless, rather than serving them from the original cluster. This would allow peaky workloads (e.g. month-end reporting) to be offloaded to an on-demand model, so you can run a smaller RA3 cluster and save costs.
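To make the data share idea a bit more concrete, here is a small Python sketch that just builds the SQL statements involved on each side. The share, schema and database names and the namespace IDs are all hypothetical placeholders, and I haven’t run this exact flow against the preview, so treat it as an outline of the standard Redshift data sharing statements rather than a tested recipe.

```python
def producer_statements(share, schema, consumer_namespace):
    """SQL to run on the existing RA3 cluster (the producer).

    Names are interpolated as-is, so only pass trusted identifiers.
    """
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]


def consumer_statements(share, local_database, producer_namespace):
    """SQL to run on the consumer (here, the serverless endpoint)."""
    return [
        f"CREATE DATABASE {local_database} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';",
    ]


if __name__ == "__main__":
    # Placeholder namespace IDs; real ones are GUIDs for each environment.
    for stmt in producer_statements("marketing_share", "marketing", "<consumer-namespace-id>"):
        print(stmt)
    for stmt in consumer_statements("marketing_share", "marketing_mart", "<producer-namespace-id>"):
        print(stmt)
```

The consumer only gets a read-only view of the shared objects, which is what you want for a reporting datamart: the month-end queries burn serverless compute rather than the main cluster’s.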

Tidying up

I played with the sample datasets – all behaved as you’d expect…er…like a database…so nothing to report there. One little issue was I couldn’t easily find a way to fully tidy up afterwards via the console, so had to use the AWS CLI to drop the “sample_data_dev” database it had created for me. The console gave me an error when trying to “drop database”, presumably because it was connected to it…

ERROR: database "sample_data_dev" is being accessed by other users

With “normal” Redshift we would generally just blow the cluster away (and therefore any databases, etc.) to tidy up after an evaluation session, but of course you can’t do this with Redshift Serverless. So, I was left with a “dev” database that has zero run cost and is empty (zero tables, etc.) but can’t be dropped – maybe a bit of a “preview” feature, I think.
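For anyone hitting the same console error, the workaround amounts to issuing the drop from a session connected to a different database. A minimal Python sketch of that, assuming a boto3 "redshift-data" client is passed in (the function name and the guard are my own illustration):

```python
def drop_database(client, target, connect_to="dev", **extra):
    """Drop `target` while connected to a different database.

    `client` is assumed to behave like boto3.client("redshift-data").
    `extra` carries any further request parameters your setup needs.
    Returns the statement Id; the drop itself runs asynchronously.
    """
    if target == connect_to:
        # Dropping the database you are connected to is what triggers
        # the "is being accessed by other users" error.
        raise ValueError("connect to a different database than the one being dropped")
    response = client.execute_statement(
        Database=connect_to,
        Sql=f'DROP DATABASE "{target}";',
        **extra,
    )
    return response["Id"]


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   drop_database(boto3.client("redshift-data"), "sample_data_dev")
```

The call returns as soon as the statement is submitted, so check its status afterwards before assuming the database has gone.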

Using the API

As reported above, I got to use the AWS CLI against Redshift Serverless a bit quicker than I intended 🙂, but I can report that the standard AWS CLI commands I tried all seem to work (e.g. the “redshift-data” API works against Redshift Serverless), so there’s no new API to learn for running queries. For example:

aws redshift-data list-databases --database dev --profile XXXXXX

Returns this…

{
    "Databases": [
        "dev",
        "sample_data_dev"
    ]
}
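The same call is available from Python via boto3. A minimal sketch, assuming a "redshift-data" client is passed in; the helper name is mine, and the pagination loop is only there in case the list ever runs to more than one page:

```python
def list_all_databases(client, database="dev", **extra):
    """Return every database name visible from `database`.

    `client` is assumed to behave like boto3.client("redshift-data").
    `extra` carries any further request parameters your setup needs.
    """
    names = []
    token = None
    while True:
        kwargs = dict(Database=database, **extra)
        if token:
            kwargs["NextToken"] = token
        page = client.list_databases(**kwargs)
        names.extend(page.get("Databases", []))
        token = page.get("NextToken")
        if not token:
            return names


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   print(list_all_databases(boto3.client("redshift-data")))
```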

Interestingly, even with my tiny datasets (with sample datasets dropped and no tables), Redshift Serverless created automatic recovery points that were circa 200MB in size.

[Screenshot: Redshift Serverless Recovery Points panel showing a data size of 195MB]

Costs

Finally, we always want to look at costs. With the public preview, you get $500 in credits to play with, and I spent $0.52 in my playing around. This sounds like peanuts, and it is, but I didn’t do anything at scale, so further evaluation is needed here also.

Conclusions

Redshift Serverless does feel like a natural and welcome step forward, building on the RA3 capabilities and adding Snowflake-a-like auto stop/start. It feels genuinely on-demand: no use, no cost. Previously, one of the barriers to using Redshift in some scenarios was the more or less always-on costing model (unless you got into backups/recoveries, etc.). A sensible next step for us would be to evaluate it for dev environment workloads, where demand is sporadic and somewhat unpredictable.

Robin Meehan
robin@inawisdom.com