Hello everyone! My name is Cyril, and I'm the CTO at Adapty. Most of our infrastructure runs on AWS, and today I'll talk about how we cut our server costs threefold by using spot instances in production, and how to set up auto-scaling for them. First there will be an overview of how it all works, and then detailed step-by-step instructions for launching it.

What are spot instances?

Spot instances are servers from AWS's currently idle capacity, sold at a big discount (Amazon says up to 90%; in our experience ~3x, varying by region, AZ, and instance type). Their main difference from regular instances is that they can be shut down at any moment. For a long time we therefore believed they were fine for dev environments, or for computation tasks that save intermediate results to S3 or a database, but not for production. There are third-party solutions for running spots in production, but they involve many workarounds for our case, so we didn't adopt them. The approach described in this article works entirely within standard AWS functionality, without extra scripts, cron jobs, and so on.

Below are a few screenshots showing the price history of spot instances.

m5.large in the eu-west-1 (Ireland) region. The price is mostly stable for 3 months, currently saving 2.9x.


m5.large in the region us-east-1 (N. Virginia). The price is constantly changing for 3 months, currently saving from 2.3x to 2.8x depending on the availability zone.


t3.small in the region us-east-1 (N. Virginia). The price is stable for 3 months, currently saving 3.4x.


Service architecture

The basic architecture of the service that we'll discuss in this article is shown in the diagram below.


Application Load Balancer → EC2 Target Group → Elastic Container Service

The Application Load Balancer (ALB), which sends requests to an EC2 Target Group (TG), acts as the balancer. The TG is responsible for opening ports on instances for the ALB and linking them to the ports of Elastic Container Service (ECS) containers. ECS is AWS's analogue of Kubernetes, and it manages Docker containers.

Several running containers with the same ports can live on the same instance, so we can't assign ports statically. When ECS launches a new task (the Kubernetes equivalent is called a pod), it tells the TG, which checks for a free port on the instance and assigns one to the task being launched. The TG also regularly checks, via health checks, that the instance and the API on it are working, and if it sees any problems it stops sending requests there.

EC2 Auto Scaling Groups + ECS Capacity Providers

The diagram above does not show the EC2 Auto Scaling Groups (ASG) service. As the name suggests, it is responsible for scaling instances. Until recently, however, AWS had no built-in way to control the number of running machines from ECS. ECS could scale the number of tasks, for example by CPU load, RAM usage, or request count, but if the tasks filled all the free instances, new machines were not launched automatically.

This changed with the arrival of ECS Capacity Providers (ECS CP). Now each service in ECS can be associated with an ASG, and if the tasks don't fit on the running instances, new ones will be launched (within the configured ASG limits). This also works in the opposite direction: if ECS CP sees instances idling without tasks, it will tell the ASG to shut them down. ECS CP can also be given a target instance-utilization percentage, so that a certain number of machines are always free for quickly scaling up tasks; I'll talk about this a little later.

EC2 Launch Templates

The last service to cover before moving on to a detailed walkthrough of building this infrastructure is EC2 Launch Templates. It lets you create a template from which all machines will be launched, so you don't have to repeat the configuration from scratch every time. Here you can choose the machine type to launch, the security group, the disk image, and many other parameters. You can also specify user data that will be passed to every launched instance. User data can run scripts; for example, it can edit the contents of the ECS agent configuration file.

One of the most important configuration parameters for this article is ECS_ENABLE_SPOT_INSTANCE_DRAINING=true. With this option enabled, as soon as ECS receives a signal that a spot instance is being reclaimed, it puts all tasks running on it into Draining status. No new tasks will be assigned to this instance; if any tasks are about to be rolled out onto it, they are canceled. The balancer also stops sending requests to it. The notification that an instance will be reclaimed arrives 2 minutes before the actual event. So if your service doesn't run tasks longer than 2 minutes and doesn't save anything to disk, you can use spot instances without losing data.
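
Normally ECS handles the reclaim signal for you once ECS_ENABLE_SPOT_INSTANCE_DRAINING is on, but if you want your application to react to the two-minute notice itself, it can poll the instance metadata endpoint documented by AWS (it returns 404 until a notice exists, then JSON with an action and a time). A minimal sketch of parsing that notice; the HTTP fetching is left abstract so the parsing logic works off-instance too:

```python
import json
from datetime import datetime, timezone

# Documented instance-metadata path for the spot interruption notice;
# it returns 404 until AWS schedules a reclaim, then JSON like
# {"action": "terminate", "time": "2020-05-18T08:22:00Z"}.
METADATA_PATH = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def parse_instance_action(body):
    """Parse the interruption notice body; body is None when the
    endpoint returned 404 (no notice yet)."""
    if body is None:
        return None
    notice = json.loads(body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    return notice["action"], when.replace(tzinfo=timezone.utc)

def seconds_left(when, now):
    """How much of the ~2-minute warning window remains."""
    return (when - now).total_seconds()
```

In a real poller you would fetch METADATA_PATH every few seconds (with urllib, for instance) and start winding down your own work as soon as a notice appears.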

Regarding disk: AWS recently made it possible to use the Elastic File System (EFS) together with ECS, so with that scheme even disk state is not an obstacle, but we haven't tried it, since in principle we don't need disk to store state. By default, after receiving SIGTERM (sent when a task is moved to Draining status), all running tasks are stopped after 30 seconds, even if they haven't finished; you can change this interval with the ECS_CONTAINER_STOP_TIMEOUT parameter. The main thing is not to set it to more than 2 minutes for spot machines.

Creating a service

Now we move on to creating the service itself. Along the way I'll cover several useful points that weren't mentioned above. Overall this is a step-by-step guide, but I won't cover very basic or very niche cases. All actions are performed in the AWS visual console, but they can be reproduced programmatically with CloudFormation or Terraform. At Adapty, we use Terraform.

EC2 Launch Template

This service creates the configuration for the machines that will be used. Templates are managed in the EC2 → Instances → Launch Templates section.

Amazon machine image (AMI) - specify the disk image from which all instances will be launched. For ECS, in most cases you should use the optimized image from Amazon. It is updated regularly and contains everything needed for ECS to work. To find the current image ID, go to the Amazon ECS-optimized AMIs page, select your region, and copy the AMI ID. For example, for the us-east-1 region the current ID at the time of writing is ami-00c7c1cf5bdc913ed. This ID must be entered in the Specify a custom value field.

Instance type - specify the instance type. Choose the one that best suits your needs.

Key pair (login) - specify the certificate you can use to connect to the instance via SSH if necessary.

Network settings - specify the network settings. Networking platform in most cases should be Virtual Private Cloud (VPC). Security groups - security groups for your instances. Since we will put a balancer in front of the instances, I recommend specifying here a group that allows incoming connections only from the balancer. That is, you will have 2 security groups: one for the balancer, which allows inbound connections from anywhere on ports 80 (http) and 443 (https), and a second for the machines, which allows incoming connections on any port from the balancer's group. Outbound connections in both groups must be open over TCP to all ports and all addresses. You can limit the ports and addresses for outgoing connections, but then you have to constantly monitor that you're not trying to reach something over a closed port.

Storage (volumes) - specify the disk settings for the machines. The disk size cannot be less than the one set in the AMI; for ECS Optimized it is 30 GiB.

Advanced details - specify additional parameters.

Purchasing option - whether we want to buy spot instances. We do, but we won't check this box here; we'll configure it in the Auto Scaling Group, where there are more options.

IAM instance profile - specify the role the instances will start with. For instances to work with ECS, they need permissions, which are usually granted via the ecsInstanceRole role. In some accounts it already exists; if not, here are the instructions on how to create it. After creating it, specify it in the template.

Then come many more parameters; basically you can leave the default values everywhere, and each of them has a clear description. I always enable the EBS-optimized instance option, and T2/T3 Unlimited when using burstable instances.

User data - specify user data. We will edit the /etc/ecs/ecs.config file, which contains the ECS agent configuration.
An example of what user data might look like:

#!/bin/bash
echo ECS_CLUSTER=DemoApiClusterProd >> /etc/ecs/ecs.config
echo ECS_ENABLE_SPOT_INSTANCE_DRAINING=true >> /etc/ecs/ecs.config
echo ECS_CONTAINER_STOP_TIMEOUT=1m >> /etc/ecs/ecs.config
echo ECS_ENGINE_AUTH_TYPE=docker >> /etc/ecs/ecs.config
echo "ECS_ENGINE_AUTH_DATA={\"registry.gitlab.com\":{\"username\":\"username\",\"password\":\"password\"}}" >> /etc/ecs/ecs.config

ECS_CLUSTER=DemoApiClusterProd - indicates that the instance belongs to the cluster with the given name, i.e. that cluster will be able to place its tasks on this server. We haven't created the cluster yet, but we'll use this name when creating it.

ECS_ENABLE_SPOT_INSTANCE_DRAINING=true - indicates that when a spot instance shutdown signal is received, all tasks on it should be moved to Draining status.

ECS_CONTAINER_STOP_TIMEOUT=1m - indicates that after receiving a SIGTERM signal, tasks have 1 minute before they are killed.

ECS_ENGINE_AUTH_TYPE=docker - indicates that the docker scheme is used as the authorization mechanism.

ECS_ENGINE_AUTH_DATA - parameters for connecting to a private container registry where your Docker images are stored. If the registry is public, you don't need to specify anything.

In this article I'll use a public image from Docker Hub, so I don't need the ECS_ENGINE_AUTH_TYPE and ECS_ENGINE_AUTH_DATA parameters.

Good to know: it's recommended to update the AMI regularly, since new versions bring updated versions of Docker, Linux, the ECS agent, etc. To avoid forgetting about this, you can set up notifications about new versions. You can receive notifications by email and update manually, or you can write a Lambda function that automatically creates a new Launch Template version with the updated AMI.

EC2 Auto Scaling Group

The Auto Scaling Group is responsible for launching and scaling instances. Groups are managed in the EC2 → Auto Scaling → Auto Scaling Groups section.

Launch template - select the template created in the previous step. We leave the default version.

Purchase options and instance types - specify the instance types for the cluster. Adhere to launch template uses the instance type from the Launch Template. Combine purchase options and instance types lets you configure instance types flexibly. We will use the latter.

Optional On-Demand base - the number of regular, non-spot instances that will always be running.

On-Demand percentage above base - the percentage split between regular and spot instances: 50-50 distributes them equally, while with 20-80, four spot instances are launched for every on-demand instance. For this example I'll specify 50-50, but in reality we most often use 20-80, and in some cases 0-100.
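
To make the split concrete, here's a simplified model of how the group divides capacity between on-demand and spot instances (a sketch only; AWS's exact rounding rules may differ slightly):

```python
import math

def instance_split(total, on_demand_base, on_demand_pct):
    """Divide `total` instances into (on_demand, spot) given the base
    count and the on-demand percentage above base. Simplified model:
    we round the on-demand share up, which may not match AWS exactly."""
    above_base = max(total - on_demand_base, 0)
    on_demand_above = math.ceil(above_base * on_demand_pct / 100)
    on_demand = min(on_demand_base, total) + on_demand_above
    return on_demand, total - on_demand
```

For example, with no base and a 20% on-demand share, 10 instances split into 2 on-demand and 8 spot, matching the "4 spots per on-demand instance" ratio above.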

Instance types - here you can specify additional instance types to use in the cluster. We've never used it, because I don't really see the point. Maybe it's a matter of limits on specific instance types, but those are easily raised through support. If you know a use for it, I'd be glad to read about it in the comments :)


Network - network settings: select the VPC and subnets for the machines; in most cases it's worth choosing all available subnets.

Load balancing - balancer settings; we'll set that up separately, so we don't touch anything here. Health checks will also be configured later.

Group size - specify the limits on the number of machines in the cluster and the desired number of machines at launch. The number of machines in the cluster will never be less than the minimum or more than the maximum specified, even when metrics call for scaling.

Scaling policies - scaling options, but we will scale based on the number of running ECS tasks, so we'll configure scaling later.

Instance scale-in protection - protection for instances from deletion when scaling down. We enable it so that the ASG doesn't delete a machine that has running tasks. The ECS Capacity Provider will disable protection for instances that have no tasks.

Add tags - you can specify tags for instances (to do this, check the Tag new instances box). I recommend specifying the Name tag; then all instances launched within the group will have the same name, which makes them easy to find in the console.


After creating the group, open it and go to the Advanced configurations section, since not all options are visible in the console at the creation stage.

Termination policies - the rules that determine which instances are deleted. They are applied in order. We usually use the ones in the picture below. First, instances with the oldest Launch Template version are deleted (for example, if we updated the AMI and created a new version, but not all instances have switched to it yet). Then instances closest to the next billing hour are chosen. And then the oldest by launch date.


Good to know: to update all machines in the cluster, it's convenient to use Instance Refresh. If you combine it with the Lambda function from the previous step, you'll have a fully automated instance update system. Before updating all machines, you must disable instance scale-in protection for all instances in the group. Not the setting in the group itself, but the protection on the machines; this is done on the Instance management tab.

Application Load Balancer and EC2 Target Group

The balancer is created in the EC2 → Load Balancing → Load Balancers section. We will use the Application Load Balancer; a comparison of the different balancer types can be found on the service page.

Listeners - it makes sense to create listeners on ports 80 and 443 and redirect from 80 to 443 using balancer rules later.

Availability Zones - in most cases, select all availability zones.

Configure Security Settings - specify the SSL certificate for the balancer here; the most convenient option is to issue a certificate in ACM. You can read about the Security Policy differences in the documentation; you can leave the default ELBSecurityPolicy-2016-08 selected. After creating the balancer you will see its DNS name, for which you need to configure a CNAME record for your domain. For example, this is what it looks like in Cloudflare.


Security Group - we create or select a security group for the balancer, I wrote more about this a little higher in the section EC2 Launch Template → Network settings.

Target group - we create the group that is responsible for routing requests from the balancer to the machines and checking their availability so they can be replaced in case of problems. Target type must be Instance; Protocol and Port can be anything. If you use HTTPS between the balancer and the instances, you will need to upload a certificate to them. In this example we won't do that and will simply leave port 80.

Health checks - service health check parameters. In a real service this should be a separate endpoint that exercises important parts of the business logic; in this example I'll leave the default settings. Further on, you can choose the request interval, timeout, successful response codes, etc. In our example we'll specify Success codes 200-399, because the Docker image we'll use returns a 304 code.


Register Targets - here machines are selected for the group, but in our case ECS will do this, so we just skip this step.

Good to know: at the balancer level you can enable logs, which are stored in S3 in a specific format. From there they can be exported to third-party analytics services, or you can run SQL queries directly on the data in S3 using Athena. This is convenient and works without any extra code. I also recommend setting up deletion of logs from the S3 bucket after a specified period of time.

ECS Task Definition

In the previous steps we created everything needed for the service infrastructure; now we move on to describing the containers we will launch. This is done in the ECS → Task Definitions section.

Launch type compatibility - select EC2.

Task execution IAM role - select ecsTaskExecutionRole. It is used for writing logs, accessing secret variables, and so on.

In the Container Definitions section, click Add Container.

Image - a link to the image with your project's code; for this example I'll use a public image from Docker Hub, bitnami/node-example:0.0.1.

Memory Limits - memory limits for the container. Hard Limit - a hard limit: if the container goes beyond the specified value, the docker kill command is executed and the container dies immediately. Soft Limit - a soft limit: the container may exceed the specified value, but the parameter is taken into account when placing tasks on machines. For example, if a machine has 4 GiB of RAM and a container's soft limit is 2048 MiB, then at most 2 running tasks with this container fit on that machine. In reality, a machine with 4 GiB of RAM has slightly less than 4096 MiB available; you can see the exact value on the ECS Instances tab in the cluster. The soft limit cannot be greater than the hard limit. It is important to understand that if a task contains several containers, their limits are summed.

Port mappings - in Host port we specify 0, meaning the port will be assigned dynamically and tracked by the Target Group. Container port - the port your application listens on; it is often specified in the launch command, or assigned in your application code, Dockerfile, etc. For our example we use 3000 because it is specified in the Dockerfile of the image we use.

Health check - container health check parameters, not to be confused with the one configured in the Target Group.

Environment - environment settings. CPU units - analogous to Memory limits, but for the processor. Each processor core is 1024 units, so if a server has a dual-core processor and a container is set to 512 units, then 4 tasks with this container can be launched on one server. CPU units always correspond exactly to the number of cores; there can't be slightly fewer of them, as there can with memory.
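
Putting the memory and CPU placement rules together, the number of task copies that fit on one instance can be estimated like this (a simplified sketch that ignores the small memory overhead mentioned above):

```python
def tasks_per_instance(instance_mem_mib, instance_cpu_units,
                       task_mem_soft_mib, task_cpu_units):
    """How many copies of a task fit on one instance, given the task's
    soft memory limit (MiB) and CPU units (1 core = 1024 units).
    Whichever resource runs out first is the binding constraint."""
    by_mem = instance_mem_mib // task_mem_soft_mib
    by_cpu = instance_cpu_units // task_cpu_units
    return min(by_mem, by_cpu)
```

The two examples from the text check out: a 4 GiB machine and a 2048 MiB soft limit give 2 tasks (memory-bound), while a dual-core machine (2048 units) and 512 units per task give 4 tasks (CPU-bound).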

Command - the command to start the service inside the container, with parameters separated by commas. It could be gunicorn, npm, etc. If not specified, the CMD directive from the Dockerfile is used. We specify npm,start.

Environment variables - container environment variables. This can be just plain text data or secret variables from Secrets Manager or Parameter Store .

Storage and Logging - here we configure logging to CloudWatch Logs (the AWS log service). To do this, just check the Auto-configure CloudWatch Logs box. After the Task Definition is created, a log group will automatically be created in CloudWatch. By default, logs in it are stored indefinitely; I recommend changing the Retention period from Never Expire to the period you need. This is done in CloudWatch Log groups: click on the current period and pick a new one.


ECS Cluster and ECS Capacity Provider

Go to the ECS → Clusters section to create a cluster. As a template, select EC2 Linux + Networking.

Cluster name - very important: here we use the same name specified in the Launch Template's ECS_CLUSTER parameter, in our case DemoApiClusterProd. Check the Create an empty cluster box. Optionally, you can enable Container Insights to view service metrics in CloudWatch. If you did everything right, then in the ECS Instances section you will see the machines created in the Auto Scaling group.


Go to the Capacity Providers tab and create a new one. As a reminder, it is needed to control the launching and shutdown of machines depending on the number of running ECS tasks. It is important to note that a provider can only be associated with one group.

Auto Scaling group - select the previously created group.

Managed scaling - turn it on so that the provider can scale the service.

Target capacity % - the percentage of machine utilization by tasks that we want. If you specify 100%, all machines will always be occupied by running tasks. If you specify 50%, half of the machines will always be free. In that case, on a sharp jump in load, new tasks immediately land on free machines without having to wait for instances to be deployed.
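
The arithmetic behind Target capacity % can be sketched as follows (a simplified model of the logic behind the provider's CapacityProviderReservation metric):

```python
import math

def desired_instances(instances_with_tasks, target_capacity_pct):
    """How many instances the Capacity Provider aims to keep running so
    that only target_capacity_pct % of them carry tasks. Simplified:
    the real metric works on capacity units, not whole machines."""
    return math.ceil(instances_with_tasks * 100 / target_capacity_pct)
```

So with 5 busy machines and a 50% target, the provider keeps 10 machines running (5 free); with a 100% target, no spare machines are kept at all.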

Managed termination protection - enable it; this parameter lets the provider remove instances' deletion protection. It does so when a machine has no active tasks, which is what allows the Target capacity % setting to work.

ECS Service and scaling settings

Last step :) To create the service, go to the Services tab of the previously created cluster.

Launch type - click Switch to capacity provider strategy and select the provider created earlier.


Task Definition - select the previously created Task Definition and its revision.

Service name - to avoid confusion, we always make it the same as the Task Definition name.

Service type - always Replica.

Number of tasks - the desired number of running tasks in the service. This value is managed by scaling, but you still have to specify it.

Minimum healthy percent and Maximum percent - determine the behavior of tasks during deployment. The default values of 100 and 200 mean that at the moment of deployment the number of tasks doubles and then returns to the desired count. If you have 1 task running, min=0 and max=100, then during a deploy it is killed first and only then is a new one started, meaning there will be downtime. With 1 task running, min=50 and max=150, the deployment won't happen at all, since 1 task can neither be halved nor increased by one and a half times.
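
These deployment rules are easy to check numerically. Per the ECS documentation, the minimum is rounded up and the maximum rounded down; a small sketch:

```python
import math

def deployment_bounds(desired, min_healthy_pct, max_pct):
    """Task-count bounds ECS must respect during a rolling update:
    minimum healthy rounds up, maximum rounds down."""
    lower = math.ceil(desired * min_healthy_pct / 100)
    upper = math.floor(desired * max_pct / 100)
    return lower, upper

def deployment_possible(desired, min_healthy_pct, max_pct):
    lower, upper = deployment_bounds(desired, min_healthy_pct, max_pct)
    # To replace tasks, ECS must either start an extra task
    # (upper > desired) or stop an old one first (lower < desired).
    return upper > desired or lower < desired
```

For desired=1, min=50, max=150 both bounds collapse to exactly 1 task, so ECS can neither add nor remove anything and the rollout is stuck, just as described above.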

Deployment type - we leave the Rolling update.

Placement Templates - rules for placing tasks on machines. The default is AZ Balanced Spread, which means each new task is placed on a new instance until machines are running in all availability zones. We usually use BinPack - CPU and Spread - AZ: with this policy, tasks are packed as densely as possible onto one machine by CPU, and if a new machine is needed, it is created in a new availability zone.


Load balancer type - select Application Load Balancer.

Service IAM role - select ecsServiceRole.

Load balancer name - select the previously created balancer.

Health check grace period - a pause before running health checks after a new task is rolled out; we usually set 60 seconds.

Container to load balance - in the Target group name section, select the previously created group, and everything will be filled in automatically.


Service Auto Scaling - service scaling parameters. Select Configure Service Auto Scaling to adjust your service's desired count. We set the minimum and maximum number of tasks for scaling.

IAM role for Service Auto Scaling - select the AWSServiceRoleForApplicationAutoScaling_ECSService service-linked role.

Automatic task scaling policies - rules for scaling. There are 2 types:

  1. Target tracking - tracking a target metric (CPU/RAM usage or the number of requests per task). For example, we want the average processor load to be 85%; when it goes higher, new tasks are added until it returns to the target value. If the load is lower, tasks are removed instead, unless protection against scaling down is enabled (Disable scale-in).
  2. Step scaling - a reaction to an arbitrary event. Here you can configure a reaction to any event (CloudWatch Alarm): when it fires, you can add or remove a specified number of tasks, or set an exact number of tasks.

A service can have several scaling rules at once, which can be useful; the main thing is to make sure they don't conflict with each other.


If you followed the instructions and used the same Docker image, your service should return a page like this one.


  1. We created a template from which all machines in the service are launched. We also learned how to update the machines when the template changes.
  2. We set up handling of the spot instance interruption signal, so within a minute of receiving it all running tasks are removed from the machine and nothing is lost or interrupted.
  3. We set up a balancer to distribute the load evenly across the machines.
  4. We created a service that runs on spot instances, cutting machine costs by about 3 times.
  5. We configured auto-scaling in both directions to handle growing workloads without paying for idle capacity.
  6. We use a Capacity Provider so that the application manages the infrastructure (machines), and not the other way around.
  7. We are great.

If you have predictable bursts of load (for example, you advertise in a large email newsletter), you can configure scaling on a schedule.

You can also scale based on data from different parts of your system. For example, we have functionality for sending individual promotional offers to users of the mobile app. Sometimes a campaign goes out to 1M+ people. After such a send-out there is always a large spike in API requests, since many users open the app at the same time. So if we see that the promo push queue is significantly above its normal level, we can immediately launch several additional machines and tasks to be ready for the load.
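
As a rough illustration of that kind of pre-scaling estimate, here's a sketch; every parameter and number in it is hypothetical, not Adapty's real values:

```python
import math

def extra_tasks(pushes_sent, open_rate, window_sec, rps_per_task):
    """Illustrative estimate: if a share of push recipients opens the
    app within a short window, how many extra tasks are needed to
    absorb the resulting request spike? All inputs are hypothetical."""
    expected_rps = pushes_sent * open_rate / window_sec
    return math.ceil(expected_rps / rps_per_task)
```

For instance, 1M pushes with a 10% open rate spread over 10 minutes, at 50 requests per second per task, would suggest pre-launching about 4 extra tasks; a Step scaling rule on a queue-depth CloudWatch Alarm could add them automatically.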

I'd be glad if you share interesting cases of using spot instances and ECS, or anything about scaling, in the comments.

Coming soon: articles about how we process thousands of analytics events per second on a predominantly serverless stack (with cost figures), and how services are deployed using GitLab CI and Terraform Cloud.

Subscribe to us, it will be interesting!