Auto Scale EC2 GPU Spot Instances When Uploading to S3 Bucket

Problem description: we want to upload unlimited amounts of content to an S3 bucket, then have our GPU image processing code consume those uploads on demand. We don’t want any long-lived idle servers, so instances should be provisioned when files are uploaded to the S3 bucket, live until all the new files are processed, then terminate themselves.

AWS can do the above using completely built-in tools and triggers and monitors, so let’s configure it all.

It took referencing over 15 different documents to get all of this sorted out, even though this has all been standard AWS behavior for years. Hopefully this can save you some effort if you’re trying to get any of it working for the first time.

In only eleventy billion easy steps, you too can create a self-contained, zero-code, demand-proportional transient data processing platform.

Also note: this guide is for EC2 Auto Scaling and not AWS Auto Scaling which is a completely different product.

Also also note: this guide is for the AWS Web Console GUI because everything has reasonable defaults and is mostly accessible without typing 300 commands. Modern complex AWS octopus infrastructure configs like this should actually be maintained using the AWS CDK, which lets you create AWS resources from polyglot OO code (and higher-level pre-defined templates), then translates your logical intentions into the underlying 10,000 lines of CloudFormation JSON to get everything running in an automated fashion (and which is also much easier to deploy across multiple regions than repeating the same 77 Web Console steps over and over again).

Step 0: Choose a Region

The AWS services we’re using here all operate within a single region, and the region you pick will inform your cost structure. The current cheapest AWS region is Ohio (us-east-2), with Oregon (us-west-2) second by most measures.

For the pricing details and service offerings described on this page, I’m using Ohio (us-east-2) in December 2020. Configuration referenced on this page uses the AWS web console GUI as it exists in mid-December 2020.

Step 1: Create A Bucket to Save Your Uploads

imagine the walrus i has bucket meme here it’s hilarious

Create your S3 bucket with appropriate permissions. The region for your bucket must be the same region as your SQS queue and the same region where your auto scale EC2 instances will launch.

For this environment, we’re only going to use the S3 bucket for holding uploads temporarily until our instances can process them, then we delete the S3 content. S3 storage costs of $0.023 per GB-month will be minimal since most stored objects will be deleted less than an hour after upload (under this “upload, store, process, delete in an hour” model it would only cost about $31.50 of S3 storage to pass 1 PB through S3 at default rates: $0.023 per GB-month * 1,000,000 GB * (3,600 seconds / ~2,629,746 seconds per month) ≈ $31.49; the arithmetic is checked below).
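A quick sanity check of that back-of-the-envelope math (a sketch only; it assumes 1 PB = 1,000,000 GB and an average month of 365.2425 / 12 days):

```python
# Back-of-the-envelope S3 storage cost for the "store each object ~1 hour" model.
# Assumptions: 1 PB = 1,000,000 GB, average month = 365.2425 / 12 days.
PRICE_PER_GB_MONTH = 0.023                      # us-east-2 S3 Standard, December 2020
GB_STORED = 1_000_000                           # 1 PB expressed in GB
SECONDS_PER_MONTH = (365.2425 / 12) * 86_400    # ~2,629,746 seconds
OBJECT_LIFETIME_SECONDS = 3_600                 # each object lives about an hour

cost = PRICE_PER_GB_MONTH * GB_STORED * (OBJECT_LIFETIME_SECONDS / SECONDS_PER_MONTH)
print(f"~${cost:.2f} to pass 1 PB through S3 at an hour per object")  # ~$31.49
```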

CONSOLE GUIDE: Services -> S3 -> Create Bucket

  • Pick a globally unique bucket name
  • Select your region (we’re using us-east-2 for everything in this tutorial)
  • Select your access restrictions and versioning/lock requirements
  • -> Create Bucket

Now we need to take a detour to create a queue to receive the upload notifications, then we’ll return to the S3 bucket config for specifying the queue notification endpoint.

Step 2: Create SQS Queue For Upload Notifications

S3 supports posting an event to SQS, pronounced “squiss,” each time a bucket action happens (upload, delete, degraded, replicated, etc), so let’s create a message queue for S3 to post updates into.

Quick SQS refresher:

  • You get 1 million SQS requests free per month
    • after 1M, billed at
      • standard queue actions: $0.40 per million ($0.0000004 per API call)
      • FIFO queue actions: $0.50 per million
  • Publisher posts message to SQS
  • Consumers listen to SQS for updates
    • the SQS API limits consumers to a maximum long-poll timeout of 20 seconds (it’s a short-timeout pub/sub system where each receive call can only wait 20 seconds max before reconnecting)
    • each consumer call can receive 1 to 10 messages per update
    • each long poll request is a chargeable API event even if you receive no messages
    • Though, if you spin in a 20-second request loop all month and receive no messages (i.e. you make roughly 129,000 20-second calls), your entire SQS usage would still be within the 1M free tier, and even if you were already past the 1M free limit, it would only cost about $0.052 for one consumer being constantly connected but unproductive all month.
  • Consumer receives 1 to 10 messages per update, but each update is only billed as one API event regardless of messages sent
  • Consumer processes content of queue message (runs RPC, APIs, misc data processing)
  • Consumer deletes messages from queue to signal completion
    • upon deletion, SQS will not re-deliver the message to other consumers
    • if messages are not deleted before Visibility Timeout, SQS will re-deliver the message to another consumer
    • SQS Standard queues only guarantee at-least-once delivery (duplicate deliveries are possible), so your processing system must be idempotent (see the consumer sketch just after this list).
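To make the refresher concrete, here’s a minimal consumer loop sketch using boto3; the queue URL and process() function are placeholders for whatever your worker actually does:

```python
import boto3

# Hypothetical queue URL for illustration; use your real queue URL.
QUEUE_URL = "https://sqs.us-east-2.amazonaws.com/123456789012/upload-events"

sqs = boto3.client("sqs", region_name="us-east-2")

def process(body: str) -> None:
    # Placeholder for your real work (RPC, image processing, etc.)
    print("processing:", body)

while True:
    # Long poll: wait up to the 20-second maximum, take up to 10 messages per call.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])  # must be idempotent: standard queues can re-deliver
        # Delete before the visibility timeout expires so SQS doesn't re-deliver it.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```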

Now let’s create our new queue for S3 to populate.

CONSOLE GUIDE: Services -> Simple Queue Service -> Create Queue

Now you can click on your queue name in the SQS console GUI to manage the queue, including updating the access policy if you need to add more readers/writers over time (remember your SQS queue has a strict access policy for readers and writers, so if something breaks, you probably need to grant more access, including adding one or more access policy statements for your queue consumers themselves).
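If you’d rather script this step (or the console-generated policy isn’t enough), here’s a hedged boto3 sketch of creating the queue and attaching an access policy that lets your S3 bucket publish into it; the account ID, bucket name, and queue name are placeholders:

```python
import json
import boto3

REGION = "us-east-2"
ACCOUNT_ID = "123456789012"          # placeholder account ID
BUCKET = "my-upload-bucket"          # placeholder bucket name
QUEUE_NAME = "upload-events"         # placeholder queue name

sqs = boto3.client("sqs", region_name=REGION)
queue_url = sqs.create_queue(QueueName=QUEUE_NAME)["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Allow S3 (and only our bucket, from our account) to SendMessage into this queue.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {
            "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{BUCKET}"},
            "StringEquals": {"aws:SourceAccount": ACCOUNT_ID},
        },
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})
```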

Also note the queue management page has a Send and receive messages button at the top which is a quick web interface for sending and receiving messages without standing up any external code clients.

It’s a good idea to click on Send and receive messages and just keep it open while we finish configuring the S3 bucket. When the S3 bucket is first configured to send messages to your SQS queue, S3 will push an s3:TestEvent message into the queue which you’ll want to manually delete.

Step 3: Configure S3 Bucket to Post SQS Message On Upload

CONSOLE GUIDE: Services -> S3 -> Click on the bucket name -> Properties -> Event notifications -> Create event notification

  • Name your event
  • Specify any filename or prefix restrictions for this trigger if needed (subdirectories, only certain file types, etc)
  • Select your event trigger criteria (likely “All object create events” for when anything is added to the bucket)
  • Select Destination :: SQS Queue
  • Pick your previously created SQS queue name under ‘SQS queue’
  • -> Save Changes

Now your S3 bucket is connected to your SQS queue!

If you re-visit your SQS Send and receive messages page and click Poll for messages, you should notice a new message from S3 testing the endpoint. You should delete the test message now so it doesn’t mistakenly trigger new instances when we configure auto-scaling based on SQS unprocessed queue length.
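For reference, the console clicks in this step map to a single API call if you ever want to script it. A sketch with placeholder names (the bucket, queue ARN, and prefix filter are assumptions):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-2")

# Placeholder names; swap in your bucket and the ARN of the queue from Step 2.
BUCKET = "my-upload-bucket"
QUEUE_ARN = "arn:aws:sqs:us-east-2:123456789012:upload-events"

s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [{
            "QueueArn": QUEUE_ARN,
            # "All object create events" in the console == s3:ObjectCreated:*
            "Events": ["s3:ObjectCreated:*"],
            # Optional prefix/suffix filtering, mirroring the console options.
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "uploads/"},
            ]}},
        }]
    },
)
# S3 drops an s3:TestEvent message into the queue shortly after this call.
```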

Step 4: Create EC2 Launch Template

CONSOLE GUIDE: Services -> EC2 -> [sidebar] Instances -> Launch Templates -> Create launch template

  • Pick a name (again, pick a good name or you’ll be disappointed in yourself!)
  • You can select “Auto Scaling guidance” but all it seems to do is make AMI required?
  • Select your AMI
    • a good base is usually the Deep Learning AMI (Ubuntu) which as of right now is version 38.0 on Ubuntu 18.04
      • and by “good” we mean you have to disable the containerd and docker services and kill background updates on startup, otherwise the image will sit and try to upgrade all of itself for 10 minutes after each launch.
  • DO NOT select an Instance type — this will be specified in the auto scaling configuration!
  • Fill out the rest of the form as usual with your favorite key pair and VPC and storage volume configuration
    • note: your root EBS volume (which is not the included “instance attached storage” visible using lsblk) must be both big enough to hold your AMI and be a compatible boot volume!
      • so, do not pick a cheap EBS type of st1 or sc1 as your AMI root because it won’t work; you’ll get an unhelpful “your configuration is not available” auto scaler error (the web console shouldn’t even allow us to configure this known-bad combination, but apparently it isn’t checking for compatibility).
  • DO NOT select Request Spot Instances under Advanced details — this will be managed by the EC2 Auto Scaler
  • You can configure User data to be a shell script to run on each instance when it starts (useful for automatically starting your SQS queue consumer so it can begin processing the S3 content; see the launch template sketch after this step)
  • -> Create Launch Template

Note: you may also see options for Launch Configuration in the AWS console, but they are deprecated in favor of more modern Launch Templates
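And here’s roughly what this step looks like as a boto3 call, hedged appropriately: the AMI ID, key pair, security group, volume sizing, and the worker path baked into User data are all placeholders, and InstanceType is deliberately omitted so the auto scaling group can choose it later.

```python
import base64
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")

# Hypothetical startup script: disable the AMI's background services/updaters and
# launch the queue worker (whatever your processing entry point actually is).
user_data = """#!/bin/bash
systemctl disable --now docker containerd
python3 /opt/worker/consume_uploads.py &
"""

ec2.create_launch_template(
    LaunchTemplateName="gpu-upload-worker",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",             # placeholder Deep Learning AMI ID
        "KeyName": "my-keypair",                        # placeholder key pair
        "SecurityGroupIds": ["sg-0123456789abcdef0"],   # placeholder security group
        # No InstanceType here: the auto scaling group supplies it.
        "UserData": base64.b64encode(user_data.encode()).decode(),
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/sda1",                  # placeholder root device name
            # Bootable root volume type; st1/sc1 won't work (see the note above).
            "Ebs": {"VolumeSize": 110, "VolumeType": "gp2"},
        }],
    },
)
```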

Step 5: Create EC2 Auto Scaling Group

You’re 3/4 done with only 333 more steps to go!

Next up: attaching an auto-scaler to the available queue length so new servers will be launched when S3 posts new upload notifications into the queue.

CONSOLE GUIDE: Services -> EC2 -> [sidebar] Auto Scaling -> Auto Scaling Groups -> Create an Auto Scaling Group

  • Create a group name (and it better be good this time!)
  • Select your launch template from Step 4
  • -> Next
  • Here is where you can request 100% Spot instances
  • Since we only want Spot instances, specify an On-Demand base capacity of 0, 0% On-Demand above the base, and 100% Spot
  • Select Capacity optimized because it’s simple and recommended
    • but if you want the auto scaler to launch the cheapest Spot instance, select Lowest price then tell it to use all combinations of instance pools with 20 pools (instance pools are combinations of instance types times availability zones, so there can be many combinations of “pools” to pick from)
  • For our workload, we don’t care about rebalancing. We can terminate and reload normally.
  • Select your favorite instance type (you may need to request additional Spot launch quotas for instance types)
  • Use 1 for Weight
    • Auto Scale servers are launched by “requested capacity” defined as an integer weight, so if you request “capacity 5” and your server has weight 1, you launch 5 servers.
    • You could also use weight to store total memory of the instance type, then you could scale by “total memory in the group”
    • (or you can select no weight and it will work the same as a 1 weight)
  • Select your VPC and ideally select all subnets for all availability zones in your region for maximum launch flexibility
  • -> Next
  • You can configure auto-membership into a load balancer if you need it
  • Select Enable group metrics collection within CloudWatch because why not?
  • -> Next
  • Specify your capacity requirements likely based on your Spot instance quota
    • Since this is an on-demand auto scaling group, we want it to be 0 when there’s no content and only limited to our Spot quota for maximum scale up.
    • So, Desired is 0 and Minimum is 0 and Maximum is our Spot quota for the instance type we selected previously.
  • Select Scaling Policies to be None because our queue-based scaling is configured later.
  • You can likely ignore Enable instance scale-in protection because we want our auto-scaling group to destroy instances instead of protecting them from down-scaling
    • reminder: AWS uses “scale-in” to mean the opposite of “scale-out,” so “scale-in” means instance destruction while “scale-out” means instance creation.
  • -> Next
  • You can add email notifications when servers are scaled up/down (which you probably want to do so you’re aware how “auto” the “auto scaler” is acting)
    • email notifications are billed as SNS notifications and you get 1,000 free emails per month before it starts charging $0.00002 per email ($2 per 100k emails)
  • -> Next
  • tags? who needs em?
  • -> Next
  • review your settings. gucci?
  • -> Create Auto Scaling group

Now click on your new Auto Scaling group name to view it in detail.
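For the record, that pile of clicks roughly collapses into one boto3 call. This is a sketch under the same hypothetical names used earlier; the subnet IDs and the g4dn.xlarge choice are assumptions, and MaxSize should be whatever your Spot quota actually allows:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-2")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="gpu-upload-workers",
    MinSize=0,            # scale all the way down to nothing when idle
    MaxSize=8,            # cap at your Spot quota for the chosen instance type
    DesiredCapacity=0,    # start empty; the queue alarm will scale us out
    VPCZoneIdentifier="subnet-aaa,subnet-bbb,subnet-ccc",  # placeholder subnets
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "gpu-upload-worker",
                "Version": "$Latest",
            },
            # Instance type lives here, not in the launch template.
            "Overrides": [{"InstanceType": "g4dn.xlarge"}],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            "OnDemandPercentageAboveBaseCapacity": 0,   # 100% Spot
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)
```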

Whew! Only 740 more steps to go!

Step 6: Create Auto Scale Metric

Auto Scaling in AWS happens by having the Auto Scaler subscribe to one or more CloudWatch metrics.

CloudWatch receives metrics then triggers ALARMS when metrics exceed limits (and limits can be learned over time, pseudo-Holt-Winters style, or statically defined). CloudWatch has default metrics and you can also create custom metrics (and as of December 2019, SQS supports CloudWatch 1-minute intervals).

Also worth noting CloudWatch is a push based system where other services send monitoring updates into CloudWatch. CloudWatch is not polling for metrics itself. This is important because SQS queues stop sending updates to CloudWatch if they are idle (no producers or consumers) for six hours.

Auto Scale happens by one of three methods:

  • Target Tracking
    • the auto scaler converts a group-wide average CloudWatch metric into an instance count
    • e.g. maintain 50% CPU usage across all servers. if average usage goes above 50%, add more servers; if too many servers are below 50% CPU usage, decrease number of servers.
  • Simple Scaling
  • Step Scaling
    • recommended for manual specification (when not using Target Tracking)
    • responds to monitoring alarms based on multi-tier capacity you specify
    • you have to define scale-out (add instances) and scale-in (destroy instances) guidelines for your monitoring triggers

The downside of Target Tracking is it only supports four metrics by default:

  • Average CPU across the auto scale cluster
  • Average Throughput In (assume the unit is bytes per second? it’s not specified)
  • Average Throughput Out
  • Load Balancer request count per target

Since we want to grow and shrink our instance count based on SQS queue length, which isn’t one of the built-in Target Tracking metrics, we’ll use the built-in wizard jumper inside the Auto Scaling configuration to create our queue metric and drive step scaling from it.

Step 7: Activate Auto Scaling Based On Unprocessed SQS Queue Length

Now let’s finally connect the SQS queue for instance scaling.

We’ll need to do this in two places: inside the Auto Scaling Group Automatic Scaling Policy and also inside the CloudWatch metric itself.

Let’s start in the Auto Scaling Group!

CONSOLE GUIDE: [auto scaling instance] -> Automatic Scaling -> Add policy

  • Policy type: Step scaling
  • Scaling policy name: pick a good name!
  • CloudWatch alarm: now we will select the queue to watch for scaling.
    • Click the new-tab Create a CloudWatch alarm to be redirected to the CloudWatch creation page
    • Select Metric -> SQS -> Queue Metrics -> [your queue name] :: ApproximateNumberOfMessagesVisible
    • -> Select Metric
    • Change Period to 1 minute
    • Since we want this to be a “create the cluster when there’s new data, remove the cluster when there’s no new data” setup, set the threshold to:
      • Static -> Greater -> than… 0
    • -> Next
    • You can remove the notification and leave everything else empty here
    • -> Next
    • Give it a name and other details you like
    • -> Next
    • Review and -> Create alarm
  • Now return to the Auto Scaling groups policy page
  • Click the refresh icon to load your new CloudWatch alarm
  • Select your new CloudWatch alarm
  • Add -> 1 capacity units when 1 <= ApproximateNumberOfMessagesVisible < +infinity
    • which just means “while the queue has items, continue adding new servers,” which may be a bit excessive depending on your usage, but you can adjust the numbers as you see fit.
    • You can obviously make scale-out steps in unit quantities such as “if the queue has 3,000 items, add 3 servers; over 10,000 items, add 6 servers,” etc.
    • You can also rein in over-ambitious launching by setting the Instances need warm up delay too, so your new instances have time to start consuming queue messages before the next instance launch is attempted.
  • -> Create
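As a reference for what the console just wired together, here’s a hedged boto3 sketch of the same scale-out policy; the group and queue names are the hypothetical ones from earlier, and the 120-second warm-up is just an example value:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-2")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-2")

# Step scaling policy: add 1 instance whenever the alarm below is breached.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="gpu-upload-workers",
    PolicyName="scale-out-on-queue-backlog",
    PolicyType="StepScaling",
    AdjustmentType="ChangeInCapacity",
    EstimatedInstanceWarmup=120,  # the "Instances need warm up" delay
    StepAdjustments=[{"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}],
)

# Alarm: ApproximateNumberOfMessagesVisible > 0 over one 60-second period.
cloudwatch.put_metric_alarm(
    AlarmName="upload-queue-has-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "upload-events"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],   # breaching the alarm fires the policy
)
```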

You created a scale-out policy but now we need to also create policies to shrink the instance counts (and you can also fine tune how AWS EC2 decides to cull instances).

Create one more policy so instances will also scale down:

  • Basic (and broken) policy for removal:
    • Example: If queue is empty, remove servers.
    • Configured with the same alarm as previous, but to remove one server if
      • Action: Remove
      • 1 capacity units when 0 <= ApproximateNumberOfMessagesVisible < 1
    • This is a simple policy where your server instances only begin to shut down when your queue is empty, but it will also start killing servers while the last server is still processing data (because messages transition from ‘Visible’ to ‘NotVisible’ for the visibility timeout period after they are read), and avoiding last-server-kill requires even more work to integrate EC2 Auto Scaling lifecycle hooks where a server can delay its own destruction.

You may notice the policies can overlap too! What happens if we scale out because queue length >= 1 and also scale in because queue length <= 100? When there’s a scale-in/scale-out conflict, EC2 Auto Scaling always uses the scale-out result to grow (or maintain) live instances instead of prematurely decommissioning active instances.

You can think of your scaling policies as one big OR statement for scale-out, but one big AND statement for scale-in.

But back to the scale-in policy we haven’t perfected yet…

One publicly recommended SQS auto scale removal policy is to remove instances when client queue receive loops stop returning new messages (but isn’t “zero new messages” the same as ApproximateNumberOfMessagesVisible being 0? If so, we’re back to the initial condition of not being able to tell the difference between all messages being consumed vs. one consumer still processing a message and just being between reads).

Another policy is to scale down when ApproximateNumberOfMessagesNotVisible is zero, which, when combined with the scale-out policy of “add instances when messages ARE visible,” only removes instances once Visible == 0 and NotVisible == 0, meaning the queue is actually empty and no messages are currently being processed.

Let’s use the second ApproximateNumberOfMessagesNotVisible check.

  • Create a new scaling policy again
  • Step scaling
  • give it a gooood name
  • Create a new CloudWatch alarm to trigger on SQS->ApproximateNumberOfMessagesNotVisible
  • Static threshold, value < 1 (or, <= 0)
  • Save new metric
    • Note: because your new alarm matches the default idle state of the queue, it will be in ALARM mode when your queue is empty. Just be aware ALARM doesn’t mean “fault,” it means “this metric has met its defined condition.”
  • Return to the scaling policy page
  • Reload the CloudWatch alarm list
  • Select your new alarm
  • Now say: Remove 1 capacity units (or heck, even remove 100%) when 0 >= ApproximateNumberOfMessagesNotVisible > -infinity
  • -> Create
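And the matching scale-in side as a boto3 sketch, under the same hypothetical names; this version removes 100% of the group, matching the lopsided “remove everything once nothing is in flight” behavior summarized just below:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-2")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-2")

# Step scaling policy: remove 100% of the group when the alarm below fires.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="gpu-upload-workers",
    PolicyName="scale-in-when-nothing-in-flight",
    PolicyType="StepScaling",
    AdjustmentType="PercentChangeInCapacity",
    StepAdjustments=[{"MetricIntervalUpperBound": 0, "ScalingAdjustment": -100}],
)

# Alarm: no in-flight messages (ApproximateNumberOfMessagesNotVisible <= 0).
cloudwatch.put_metric_alarm(
    AlarmName="upload-queue-nothing-in-flight",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesNotVisible",
    Dimensions=[{"Name": "QueueName", "Value": "upload-events"}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="LessThanOrEqualToThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```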

Now you should have two policies in your Auto Scaling group:

  • policy for scale-out:
    • Add 1 capacity units when 1 <= ApproximateNumberOfMessagesVisible < +infinity
    • (and repeat the check after a warm up delay of N seconds for each new server)
  • policy for scale-in:
    • Remove 100 percent of group when 0 >= ApproximateNumberOfMessagesNotVisible > -infinity
    • (destroy the entire auto scaling group all at once, no partial scale-in)

This is lopsided scaling behavior where we scale-out one instance at a time, but then we let all the instances remain deployed until all queue messages are consumed then we destroy all instances at once.

For processing batch image uploads this is a reasonable pattern because maybe we have 5,000 images to process, so we deploy all our servers, process all the images, then destroy all the servers. Since each server is acting as an independent queue consumer, the image processing load should be distributed evenly across all active servers until the scale-in policy finally triggers their destruction.

Step 8: Does it work?

If everything was configured correctly (including having enough Spot quota for your selected scaling instance type(s) in the region, and those instances actually being available), you should be able to upload a file to S3 and have an instance launch.

You can check your instance status via SNS emails and also the Activity tab of the auto scaling group detail page.

After your instance launches automatically, you can go back and configure your instance User data start script to pull in your queue processing and data processing code so it automatically runs on each instance as the auto scaling group provisions them (just make sure you delete each queue message when you’re done so the auto scaling group can eventually scale back to zero instances); a sketch of such a worker is below.
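As an illustration of what that User data-launched worker might look like, here’s a hedged sketch that reads S3 event notifications from the queue, downloads each new object, runs a placeholder process_image(), then deletes the object and the message so the group can eventually scale back in. The queue URL and the processing function are placeholders:

```python
import json
from urllib.parse import unquote_plus

import boto3

QUEUE_URL = "https://sqs.us-east-2.amazonaws.com/123456789012/upload-events"  # placeholder

sqs = boto3.client("sqs", region_name="us-east-2")
s3 = boto3.client("s3", region_name="us-east-2")

def process_image(path: str) -> None:
    # Placeholder for the actual GPU image processing.
    print("processing", path)

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        # Skips the s3:TestEvent and anything else without Records.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])  # event keys are URL-encoded
            local = "/tmp/" + key.replace("/", "_")
            s3.download_file(bucket, key, local)
            process_image(local)
            s3.delete_object(Bucket=bucket, Key=key)  # we only store uploads temporarily
        # Deleting the message lets the queue drain so scale-in can eventually fire.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```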

As for me, I got this far, then my group scale-out policy triggered with…

Repeated errors have occurred processing the launch specification “g4dn.xlarge, ami-08e16447bd5caf26a, Linux/UNIX, us-east-2b while launching spot instance”. It will not be retried for at least 13 minutes. Error message: There is no Spot capacity available that matches your request. (Service: AmazonEC2; Status Code: 500; Error Code: InsufficientInstanceCapacity; Proxy: null)

…so us-east-2 has none of the Spot servers I want in any availability zone so the Spot-only auto scale group will never launch in a reasonable time (until instances become available again).

oh well, back to the drawing board.

update: so apparently the keywords above are “that matches your request”: my launch template specified an EBS volume type not supported by the instance, but there was no notice of the incompatible config combination anywhere. Updating the launch template to use a supported storage type (or just “do not include in template”) fixed the auto scaler Spot launch problem, since EC2 wasn’t actually out of the single Spot instance type the auto scaler needed to launch.

Other Resources

Official AWS document describing message format S3 posts into SQS for each upload: Amazon Simple Storage Service::Event message structure

Official AWS guide for this process but with LB to trigger scaling: Amazon EC2 Auto Scaling with EC2 Spot Instances (with Amazon EC2 Auto Scaling, Amazon EC2 Spot Instances, and AWS Application Load Balancer)

Official EC2 Autoscale-via-SQS documentation (only command line/JSON): Amazon EC2 Auto Scaling::Scaling based on Amazon SQS

Older (2018) guide (references previous web console GUI, uses unlimited queue ACL instead of proper scoping): Scaling GPU processing on AWS using Docker