Module - Scale runners

This module is treated as an internal module; breaking changes will not trigger a major release bump.

This module creates resources required to run the GitHub action runner on AWS EC2 spot instances. The life cycle of the runners on AWS is managed by two lambda functions. One function will handle scaling up, the other scaling down.

Overview

Action runners on EC2

The action runners are created via a launch template; in the launch template only the subnet needs to be provided. During launch the installation is handled via a user data script. The configuration is fetched from SSM parameter store.

Lambda scale up

The scale up lambda is triggered by events on an SQS queue. Events on this queue are delayed, which gives the workflow some time to start running on available runners. For each event the lambda checks whether the workflow is still queued and that no other limits are reached. If so, the lambda creates a new EC2 instance. The lambda only needs to know which launch template to use and which subnets are available. A random subnet is chosen from the available ones. Once the instance is created the event is considered handled, and we assume the workflow will start at some point after the created instance is ready.
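The decision described above can be sketched as a small pure function. This is a minimal illustration, not the module's actual implementation: the `ScaleUpInput` shape and the names `planScaleUp` and `pickRandomSubnet` are hypothetical, and the calls to GitHub and EC2 that would supply these values are left out. The `-1` convention follows the `runners_maximum_count` input.

```typescript
// Hypothetical inputs for one queue event; the real lambda gathers these
// from GitHub and EC2, which is left out of this sketch.
interface ScaleUpInput {
  jobIsQueued: boolean;        // is the workflow job still waiting for a runner?
  currentRunners: number;      // runners already managed by this deployment
  runnersMaximumCount: number; // -1 disables the maximum check
  subnetIds: string[];         // subnets the launch template may use
}

// Pick one subnet uniformly at random. `random` is injectable for testing.
function pickRandomSubnet(subnetIds: string[], random: () => number = Math.random): string {
  if (subnetIds.length === 0) {
    throw new Error("at least one subnet is required");
  }
  return subnetIds[Math.floor(random() * subnetIds.length)];
}

// Returns the subnet to launch in, or undefined when no instance should be created.
function planScaleUp(input: ScaleUpInput, random: () => number = Math.random): string | undefined {
  if (!input.jobIsQueued) {
    return undefined; // an existing runner already picked up the job
  }
  const limitReached =
    input.runnersMaximumCount !== -1 && input.currentRunners >= input.runnersMaximumCount;
  if (limitReached) {
    return undefined; // the configured maximum number of runners is reached
  }
  return pickRandomSubnet(input.subnetIds, random);
}
```

Keeping the decision free of side effects makes it easy to test without mocking AWS or GitHub; the caller would pass the returned subnet together with the launch template when creating the instance.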

Lambda scale down

The scale down lambda is triggered via a CloudWatch event. The event is triggered by a cron expression defined in the variable scale_down_schedule_expression (https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html). GitHub does not yet provide a good API for scaling down, so the scale-down check runs on this schedule every x minutes. Each time the lambda is triggered, it tries to remove all runners managed by this deployment that are older than x minutes (configurable). If a runner can be removed from GitHub, which means it is not executing a workflow, the lambda terminates the EC2 instance.
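The selection step of a scale-down pass might look like the sketch below. It is illustrative only: `RunnerInfo` and `selectRunnersToTerminate` are hypothetical names, and the `busy` flag stands in for the check that the real lambda performs by attempting to remove the runner from GitHub.

```typescript
// Hypothetical view of a runner; field names are assumptions for illustration.
interface RunnerInfo {
  instanceId: string;
  launchTime: Date;
  busy: boolean; // true while the runner is executing a workflow job
}

// Select the instances a scale-down pass may terminate: runners that are
// older than the configured minimum running time and are not busy.
function selectRunnersToTerminate(
  runners: RunnerInfo[],
  minimumRunningTimeInMinutes: number,
  now: Date = new Date(),
): string[] {
  const minimumAgeMs = minimumRunningTimeInMinutes * 60 * 1000;
  return runners
    .filter((runner) => now.getTime() - runner.launchTime.getTime() > minimumAgeMs)
    .filter((runner) => !runner.busy)
    .map((runner) => runner.instanceId);
}
```

Runners younger than the threshold are skipped so that freshly booted instances get a chance to register and pick up a job before they are considered for removal.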

Lambda Function

The Lambda function is written in TypeScript and requires Node 12.x and yarn. Sources are located in ./lambdas/runners. The two lambda functions share the same sources; there is one entry point for scaleDown and another for scaleUp.

Install

cd lambdas/runners
yarn install

Test

Tests are implemented with Jest; calls to AWS and GitHub are mocked.

yarn run test

Package

ncc is used to compile all TypeScript/JavaScript sources into a single file.

yarn run dist

Requirements

Name	Version
terraform	>= 1.3.0
aws	~> 5.27

Providers

Name	Version
aws	~> 5.27

Modules

Name	Source	Version
job_retry	./job-retry	n/a
pool	./pool	n/a

Resources

Name	Type
aws_cloudwatch_event_rule.scale_down	resource
aws_cloudwatch_event_rule.ssm_housekeeper	resource
aws_cloudwatch_event_target.scale_down	resource
aws_cloudwatch_event_target.ssm_housekeeper	resource
aws_cloudwatch_log_group.gh_runners	resource
aws_cloudwatch_log_group.scale_down	resource
aws_cloudwatch_log_group.scale_up	resource
aws_cloudwatch_log_group.ssm_housekeeper	resource
aws_iam_instance_profile.runner	resource
aws_iam_policy.ami_id_ssm_parameter_read	resource
aws_iam_role.runner	resource
aws_iam_role.scale_down	resource
aws_iam_role.scale_up	resource
aws_iam_role.ssm_housekeeper	resource
aws_iam_role_policy.cloudwatch	resource
aws_iam_role_policy.describe_tags	resource
aws_iam_role_policy.dist_bucket	resource
aws_iam_role_policy.ec2	resource
aws_iam_role_policy.job_retry_sqs_publish	resource
aws_iam_role_policy.runner_session_manager_aws_managed	resource
aws_iam_role_policy.scale_down	resource
aws_iam_role_policy.scale_down_logging	resource
aws_iam_role_policy.scale_down_xray	resource
aws_iam_role_policy.scale_up	resource
aws_iam_role_policy.scale_up_logging	resource
aws_iam_role_policy.scale_up_xray	resource
aws_iam_role_policy.service_linked_role	resource
aws_iam_role_policy.ssm_housekeeper	resource
aws_iam_role_policy.ssm_housekeeper_logging	resource
aws_iam_role_policy.ssm_housekeeper_xray	resource
aws_iam_role_policy.ssm_parameters	resource
aws_iam_role_policy_attachment.ami_id_ssm_parameter_read	resource
aws_iam_role_policy_attachment.managed_policies	resource
aws_iam_role_policy_attachment.scale_down_vpc_execution_role	resource
aws_iam_role_policy_attachment.scale_up_vpc_execution_role	resource
aws_iam_role_policy_attachment.ssm_housekeeper_vpc_execution_role	resource
aws_iam_role_policy_attachment.xray_tracing	resource
aws_lambda_event_source_mapping.scale_up	resource
aws_lambda_function.scale_down	resource
aws_lambda_function.scale_up	resource
aws_lambda_function.ssm_housekeeper	resource
aws_lambda_permission.scale_down	resource
aws_lambda_permission.scale_runners_lambda	resource
aws_lambda_permission.ssm_housekeeper	resource
aws_launch_template.runner	resource
aws_security_group.runner_sg	resource
aws_ssm_parameter.cloudwatch_agent_config_runner	resource
aws_ssm_parameter.disable_default_labels	resource
aws_ssm_parameter.jit_config_enabled	resource
aws_ssm_parameter.runner_agent_mode	resource
aws_ssm_parameter.runner_config_run_as	resource
aws_ssm_parameter.runner_enable_cloudwatch	resource
aws_ssm_parameter.token_path	resource
aws_ami.runner	data source
aws_caller_identity.current	data source
aws_iam_policy_document.lambda_assume_role_policy	data source
aws_iam_policy_document.lambda_xray	data source

Inputs

Name	Description	Type	Default	Required
ami_filter	Map of lists used to create the AMI filter for the action runner AMI.	`map(list(string))`	{ "state": [ "available" ] }	no
ami_id_ssm_parameter_name	Externally managed SSM parameter (of data type aws:ec2:image) that contains the AMI ID to launch runner instances from. Overrides ami_filter	`string`	`null`	no
ami_kms_key_arn	Optional CMK Key ARN to be used to launch an instance from a shared encrypted AMI	`string`	`null`	no
ami_owners	The list of owners used to select the AMI of action runner instances.	`list(string)`	[ "amazon" ]	no
associate_public_ipv4_address	Associate public IPv4 with the runner. Only tested with IPv4	`bool`	`false`	no
aws_partition	(optional) partition for the base arn if not 'aws'	`string`	`"aws"`	no
aws_region	AWS region.	`string`	n/a	yes
block_device_mappings	The EC2 instance block device configuration. Takes the following keys: `device_name`, `delete_on_termination`, `volume_type`, `volume_size`, `encrypted`, `iops`, `throughput`, `kms_key_id`, `snapshot_id`.	list(object({ delete_on_termination = optional(bool, true) device_name = optional(string, "/dev/xvda") encrypted = optional(bool, true) iops = optional(number) kms_key_id = optional(string) snapshot_id = optional(string) throughput = optional(number) volume_size = number volume_type = optional(string, "gp3") }))	[ { "volume_size": 30 } ]	no
cloudwatch_config	(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details.	`string`	`null`	no
create_service_linked_role_spot	(optional) create the service linked role for spot instances that is required by the scale-up lambda.	`bool`	`false`	no
credit_specification	The credit option for CPU usage of a T instance. Can be unset, "standard" or "unlimited".	`string`	`null`	no
disable_runner_autoupdate	Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the GitHub article	`bool`	`false`	no
ebs_optimized	The EC2 EBS optimized configuration.	`bool`	`false`	no
egress_rules	List of egress rules for the GitHub runner instances.	list(object({ cidr_blocks = list(string) ipv6_cidr_blocks = list(string) prefix_list_ids = list(string) from_port = number protocol = string security_groups = list(string) self = bool to_port = number description = string }))	[ { "cidr_blocks": [ "0.0.0.0/0" ], "description": null, "from_port": 0, "ipv6_cidr_blocks": [ "::/0" ], "prefix_list_ids": null, "protocol": "-1", "security_groups": null, "self": null, "to_port": 0 } ]	no
enable_cloudwatch_agent	Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. Configuration can be overridden via `cloudwatch_config`.	`bool`	`true`	no
enable_ephemeral_runners	Enable ephemeral runners, runners will only be used once.	`bool`	`false`	no
enable_jit_config	Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES, check first whether the JIT config API is available. If you are upgrading from 3.x to 4.x, you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI.	`bool`	`null`	no
enable_job_queued_check	Only scale if the job event received by the scale up lambda is in the state queued. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior.	`bool`	`null`	no
enable_managed_runner_security_group	Enabling the default managed security group creation. Unmanaged security groups can be specified via `runner_additional_security_group_ids`.	`bool`	`true`	no
enable_on_demand_failover_for_errors	Enable on-demand failover. For example to fall back to on demand when no spot capacity is available the variable can be set to `InsufficientInstanceCapacity`. When not defined the default behavior is to retry later.	`list(string)`	`[]`	no
enable_organization_runners	Register runners to organization, instead of repo level	`bool`	n/a	yes
enable_runner_binaries_syncer	Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-built AMI.	`bool`	`true`	no
enable_runner_detailed_monitoring	Enable detailed monitoring for runners	`bool`	`false`	no
enable_ssm_on_runners	Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances.	`bool`	n/a	yes
enable_user_data_debug_logging	Option to enable debug logging for user-data, this logs all secrets as well.	`bool`	`false`	no
enable_userdata	Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI	`bool`	`true`	no
ghes_ssl_verify	GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure).	`bool`	`true`	no
ghes_url	GitHub Enterprise Server URL. DO NOT SET IF USING PUBLIC GITHUB	`string`	`null`	no
github_app_parameters	Parameter Store for GitHub App Parameters.	object({ key_base64 = map(string) id = map(string) })	n/a	yes
idle_config	List of time periods, defined as cron expressions, during which a minimum number of runners is kept active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle.	list(object({ cron = string timeZone = string idleCount = number evictionStrategy = optional(string, "oldest_first") }))	`[]`	no
instance_allocation_strategy	The allocation strategy for spot instances. AWS recommends to use `capacity-optimized` however the AWS default is `lowest-price`.	`string`	`"lowest-price"`	no
instance_max_spot_price	Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet.	`string`	`null`	no
instance_profile_path	The path that will be added to the instance_profile, if not set the prefix will be used.	`string`	`null`	no
instance_target_capacity_type	Default lifecycle used for runner instances, can be either `spot` or `on-demand`.	`string`	`"spot"`	no
instance_types	List of instance types for the action runner. Defaults are based on runner_os (al2023 for linux and Windows Server Core for win).	`list(string)`	`null`	no
job_retry	Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances a message will be published to a job retry queue. The job retry check lambda checks after a delay whether the job is queued. If not, the message will be published again on the scale-up (build queue). Using this feature can impact the rate limit of the GitHub app. `enable`: Enable or disable the job retry feature. `delay_in_seconds`: The delay in seconds before the job retry check lambda will check the job status. `delay_backoff`: The backoff factor for the delay. `lambda_memory_size`: Memory size limit in MB for the job retry check lambda. `lambda_reserved_concurrent_executions`: Amount of reserved concurrent executions for the job retry check lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. `lambda_timeout`: Time out of the job retry check lambda in seconds. `max_attempts`: The maximum number of attempts to retry the job.	object({ enable = optional(bool, false) delay_in_seconds = optional(number, 300) delay_backoff = optional(number, 2) lambda_memory_size = optional(number, 256) lambda_reserved_concurrent_executions = optional(number, 1) lambda_timeout = optional(number, 30) max_attempts = optional(number, 1) })	`{}`	no
key_name	Key pair name	`string`	`null`	no
kms_key_arn	Optional CMK Key ARN to be used for Parameter Store.	`string`	`null`	no
lambda_architecture	AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86_64' functions.	`string`	`"arm64"`	no
lambda_runtime	AWS Lambda runtime.	`string`	`"nodejs22.x"`	no
lambda_s3_bucket	S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.	`string`	`null`	no
lambda_scale_down_memory_size	Memory size limit in MB for scale down lambda.	`number`	`512`	no
lambda_scale_up_memory_size	Memory size limit in MB for scale-up lambda.	`number`	`512`	no
lambda_security_group_ids	List of security group IDs associated with the Lambda function.	`list(string)`	`[]`	no
lambda_subnet_ids	List of subnets in which the lambda will be launched, the subnets need to be subnets in the `vpc_id`.	`list(string)`	`[]`	no
lambda_tags	Map of tags that will be added to all the lambda function resources. Note these are additional tags to the default tags.	`map(string)`	`{}`	no
lambda_timeout_scale_down	Time out for the scale down lambda in seconds.	`number`	`60`	no
lambda_timeout_scale_up	Time out for the scale up lambda in seconds.	`number`	`60`	no
lambda_zip	File location of the lambda zip file.	`string`	`null`	no
log_level	Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.	`string`	`"info"`	no
logging_kms_key_id	Specifies the kms key id to encrypt the logs with	`string`	`null`	no
logging_retention_in_days	Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.	`number`	`180`	no
metadata_options	Metadata options for the ec2 runner instances. By default, the module uses metadata tags for bootstrapping the runner, only disable `instance_metadata_tags` when using custom scripts for starting the runner.	`map(any)`	{ "http_endpoint": "enabled", "http_put_response_hop_limit": 1, "http_tokens": "required", "instance_metadata_tags": "enabled" }	no
metrics	Configuration for metrics created by the module, by default metrics are disabled to avoid additional costs. When metrics are enabled, all metrics are created unless explicitly configured otherwise.	object({ enable = optional(bool, false) namespace = optional(string, "GitHub Runners") metric = optional(object({ enable_github_app_rate_limit = optional(bool, true) enable_job_retry = optional(bool, true) enable_spot_termination_warning = optional(bool, true) }), {}) })	`{}`	no
minimum_running_time_in_minutes	The minimum time an ec2 action runner should be running before being terminated if not busy. If not set the default is calculated based on the OS.	`number`	`null`	no
overrides	This map provides the possibility to override some defaults. The following attributes are supported: `name_sg` overrides the `Name` tag for all security groups created by this module. `name_runner_agent_instance` overrides the `Name` tag for the ec2 instance defined in the auto launch configuration. `name_docker_machine_runners` overrides the `Name` tag spot instances created by the runner agent.	`map(string)`	{ "name_runner": "", "name_sg": "" }	no
pool_config	The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone` to override the schedule time zone (defaults to UTC).	list(object({ schedule_expression = string schedule_expression_timezone = optional(string) size = number }))	`[]`	no
pool_lambda_memory_size	Lambda Memory size limit in MB for pool lambda	`number`	`512`	no
pool_lambda_reserved_concurrent_executions	Amount of reserved concurrent executions for the pool lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations.	`number`	`1`	no
pool_lambda_timeout	Time out for the pool lambda in seconds.	`number`	`60`	no
pool_runner_owner	The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported.	`string`	`null`	no
prefix	The prefix used for naming resources	`string`	`"github-actions"`	no
role_path	The path that will be added to the role; if not set, the prefix will be used.	`string`	`null`	no
role_permissions_boundary	Permissions boundary that will be added to the created role for the lambda.	`string`	`null`	no
runner_additional_security_group_ids	(optional) List of additional security groups IDs to apply to the runner	`list(string)`	`[]`	no
runner_architecture	The platform architecture of the runner instance_type.	`string`	`"x64"`	no
runner_as_root	Run the action runner under the root user. Variable `runner_run_as` will be ignored.	`bool`	`false`	no
runner_boot_time_in_minutes	The minimum time for an EC2 runner to boot and register as a runner.	`number`	`5`	no
runner_disable_default_labels	Disable default labels for the runners (os, architecture and `self-hosted`). If enabled, the runner will only have the extra labels provided in `runner_extra_labels`.	`bool`	`false`	no
runner_ec2_tags	Map of tags that will be added to the launch template instance tag specifications.	`map(string)`	`{}`	no
runner_group_name	Name of the runner group.	`string`	`"Default"`	no
runner_hook_job_completed	Script to be run in the runner environment at the end of every job	`string`	`""`	no
runner_hook_job_started	Script to be run in the runner environment at the beginning of every job	`string`	`""`	no
runner_iam_role_managed_policy_arns	Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role	`list(string)`	`[]`	no
runner_labels	All the labels for the runners (GitHub) including the default ones (e.g. self-hosted, linux, x64, label1, label2). Separate each label by a comma	`list(string)`	n/a	yes
runner_log_files	(optional) List of logfiles to send to CloudWatch, will only be used if `enable_cloudwatch_agent` is set to true. Object description: `log_group_name`: Name of the log group, `prefix_log_group`: If true, the log group name will be prefixed with `/github-self-hosted-runners/<var.prefix>`, `file_path`: path to the log file, `log_stream_name`: name of the log stream.	list(object({ log_group_name = string prefix_log_group = bool file_path = string log_stream_name = string }))	`null`	no
runner_name_prefix	The prefix used for the GitHub runner name. The prefix will be used in the default start script to prefix the instance name when registering the runner in GitHub. The value is available via an EC2 tag 'ghr:runner_name_prefix'.	`string`	`""`	no
runner_os	The EC2 Operating System type to use for action runner instances (linux,windows).	`string`	`"linux"`	no
runner_run_as	Run the GitHub actions agent as user.	`string`	`"ec2-user"`	no
runners_lambda_s3_key	S3 key for runners lambda function. Required if using S3 bucket to specify lambdas.	`string`	`null`	no
runners_lambda_s3_object_version	S3 object version for runners lambda function. Useful if S3 versioning is enabled on source bucket.	`string`	`null`	no
runners_maximum_count	The maximum number of runners that will be created. Setting the variable to `-1` disables the maximum check.	`number`	`3`	no
s3_runner_binaries	Bucket details for cached GitHub binary.	object({ arn = string id = string key = string })	n/a	yes
scale_down_schedule_expression	Scheduler expression to check every x for scale down.	`string`	`"cron(*/5 * * * ? *)"`	no
scale_up_reserved_concurrent_executions	Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations.	`number`	`1`	no
sqs_build_queue	SQS queue to consume accepted build events.	object({ arn = string url = string })	n/a	yes
ssm_housekeeper	Configuration for the SSM housekeeper lambda. This lambda deletes token / JIT config from SSM. `schedule_expression`: is used to configure the schedule for the lambda. `state`: state of the cloudwatch event rule. Valid values are `DISABLED`, `ENABLED`, and `ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS`. `lambda_memory_size`: lambda memory size limit. `lambda_timeout`: timeout for the lambda in seconds. `config`: configuration for the lambda function. Token path will be read by default from the module.	object({ schedule_expression = optional(string, "rate(1 day)") state = optional(string, "ENABLED") lambda_memory_size = optional(number, 512) lambda_timeout = optional(number, 60) config = object({ tokenPath = optional(string) minimumDaysOld = optional(number, 1) dryRun = optional(bool, false) }) })	{ "config": {} }	no
ssm_paths	The root path used in SSM to store configuration and secrets.	object({ root = string tokens = string config = string })	n/a	yes
subnet_ids	List of subnets in which the action runners will be launched, the subnets need to be subnets in the `vpc_id`.	`list(string)`	n/a	yes
tags	Map of tags that will be added to created resources. By default resources will be tagged with name.	`map(string)`	`{}`	no
tracing_config	Configuration for lambda tracing.	object({ mode = optional(string, null) capture_http_requests = optional(bool, false) capture_error = optional(bool, false) })	`{}`	no
userdata_content	Alternative user-data content, replacing the templated one. By providing your own user_data you have to take care of installing all required software, including the action runner and registering the runner. Be aware that configuration parameters in SSM as well as tags are treated as internals. Changes will not trigger a breaking release.	`string`	`null`	no
userdata_post_install	User-data script snippet to insert after GitHub action runner install	`string`	`""`	no
userdata_pre_install	User-data script snippet to insert before GitHub action runner install	`string`	`""`	no
userdata_template	Alternative user-data template file path, replacing the default template. By providing your own user_data you have to take care of installing all required software, including the action runner. Variables userdata_pre/post_install are ignored.	`string`	`null`	no
vpc_id	The VPC for the security groups.	`string`	n/a	yes
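
The `job_retry` input combines `delay_in_seconds`, `delay_backoff`, and `max_attempts`, but the table does not spell out how they interact. The sketch below shows one plausible reading, assuming exponential backoff (delay multiplied by the backoff factor for each attempt) and a cap at the SQS per-message delay limit of 900 seconds. The function name, the 0-based attempt counter, and the formula itself are assumptions for illustration, not the module's confirmed behavior.

```typescript
// Hypothetical subset of the `job_retry` input.
interface JobRetryConfig {
  delayInSeconds: number; // delay_in_seconds
  delayBackoff: number;   // delay_backoff
  maxAttempts: number;    // max_attempts
}

// SQS accepts a per-message delay of at most 900 seconds (15 minutes).
const SQS_MAX_DELAY_SECONDS = 900;

// Delay before the check for a given attempt (0-based), ASSUMING exponential
// backoff: delay * backoff^attempt, capped at the SQS maximum.
// Returns undefined once the configured attempts are exhausted.
function retryDelaySeconds(config: JobRetryConfig, attempt: number): number | undefined {
  if (attempt >= config.maxAttempts) {
    return undefined;
  }
  const delay = config.delayInSeconds * Math.pow(config.delayBackoff, attempt);
  return Math.min(delay, SQS_MAX_DELAY_SECONDS);
}
```

Under this reading, a configuration with `delay_in_seconds = 300`, `delay_backoff = 2`, and `max_attempts = 3` would check after 300, 600, and 900 seconds, the last value being capped.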

Outputs

Name	Description
lambda_pool	n/a
lambda_pool_log_group	n/a
lambda_scale_down	n/a
lambda_scale_down_log_group	n/a
lambda_scale_up	n/a
lambda_scale_up_log_group	n/a
launch_template	n/a
logfiles	List of logfiles to send to CloudWatch. Object description: `log_group_name`: Name of the log group, `file_path`: path to the log file, `log_stream_name`: name of the log stream.
role_pool	n/a
role_runner	n/a
role_scale_down	n/a
role_scale_up	n/a
runners_log_groups	List of log groups from different log files of runner machine.