Module - Multi runner
This module replaces the top-level module and makes it easy to create multiple types of runners with a single deployment.
This module creates many runners with a single GitHub app. The module uses the internal modules and deploys parts of the stack for each defined runner.
The module takes a configuration as input that contains a matcher for the labels. The webhook lambda uses this configuration to route events based on the labels in the workflow job and sends them to a dedicated queue. Events on each queue are processed by a dedicated lambda per configuration to scale runners.
For each configuration:
- When enabled, the distribution syncer is deployed for each unique combination of OS and architecture.
- A queue is created and the runner module is deployed.
Matching
Matching of the configuration is based on the labels specified in the labelMatchers configuration. The webhook processes the workflow_job event and matches its labels against the labels specified in the labelMatchers configuration, evaluating configurations with exactMatch set to true first, followed by all configurations with exactMatch set to false.
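A minimal sketch of the matching order described above, written as a multi_runner_config fragment. The configuration names and the gpu label are hypothetical; only labelMatchers, exactMatch and the runner_config keys come from the Inputs table. Following that order, a job labelled [self-hosted, linux, x64, gpu] would be checked against the exact-match entry before the non-exact one.

```hcl
# Fragment of the module "multi-runner" block; names and labels are hypothetical.
multi_runner_config = {
  # exactMatch = true: evaluated before the non-exact matchers.
  "linux-x64-gpu" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64", "gpu"]]
      exactMatch    = true
    }
    runner_config = {
      runner_os           = "linux"
      runner_architecture = "x64"
      runner_extra_labels = "gpu"
      # other runner_config options omitted
    }
  }
  # exactMatch = false: a job matches if any of its labels matches.
  "linux-x64" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64"]]
      exactMatch    = false
    }
    runner_config = {
      runner_os           = "linux"
      runner_architecture = "x64"
      # other runner_config options omitted
    }
  }
}
```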
The catch
Controlling which event is taken up by which runner is not up to this module; it is done entirely by GitHub. This means that when different runners can run the same job, nothing can guarantee that a certain runner will take up the job.
For example, suppose you have two runners, one with the labels [self-hosted, linux, x64, large] and one with the labels [self-hosted, linux, x64, small]. If a workflow specifies only a subset of these labels, for example [self-hosted, linux, x64], both runners can potentially take the job. You can define which of the runners is scaled for the event, but there is still no guarantee that the scaled runner takes the job. A workflow with the subset of labels [self-hosted, linux, x64] can take up the runner with the specific labels [self-hosted, linux, x64, large] and leave a workflow with the labels [self-hosted, linux, x64, large] without a runner.
The only mitigation currently available is to use a small pool of runners. Pool instances can also be short-lived and only be created at times defined by a cron expression.
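A sketch of such a scheduled pool, assuming organization-level runners (the pool does not support repo level). The org name, time zone and sizes are placeholders, and the size field name is an assumption, since the pool_config description in the Inputs table only names schedule_expression and schedule_expression_timezone.

```hcl
# Fragment of the module "multi-runner" block; names and values are placeholders.
multi_runner_config = {
  "linux-x64" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64"]]
      exactMatch    = true
    }
    runner_config = {
      runner_os                   = "linux"
      runner_architecture         = "x64"
      enable_organization_runners = true
      # Pools are deployed at org level only (hypothetical org name).
      pool_runner_owner = "my-org"
      pool_config = [
        # Weekdays at 08:00: adjust the pool to 2 runners (size is an assumed field name).
        { schedule_expression = "cron(0 8 ? * MON-FRI *)", schedule_expression_timezone = "Europe/Amsterdam", size = 2 },
        # Saturdays at 08:00: adjust the pool to 1 runner.
        { schedule_expression = "cron(0 8 ? * SAT *)", schedule_expression_timezone = "Europe/Amsterdam", size = 1 },
      ]
    }
  }
}
```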
Jobs that do not define all labels, but for example only [self-hosted, linux], could be matched to different runners. The matcher scales the first runner that matches. The order of the matchers can be defined with the attribute priority.
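A sketch of priority in practice, with hypothetical configuration names. Both matchers use exactMatch = false, so a job labelled only [self-hosted, linux] could match either; per the priority description in the Inputs table, the matcher with the lowest value is evaluated first, so linux-x64-small would be the one scaled.

```hcl
# Fragment of the module "multi-runner" block; names are hypothetical.
multi_runner_config = {
  "linux-x64-small" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64", "small"]]
      exactMatch    = false
      priority      = 10 # lowest value: evaluated first
    }
    runner_config = {
      runner_os           = "linux"
      runner_architecture = "x64"
      runner_extra_labels = "small"
    }
  }
  "linux-x64-large" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64", "large"]]
      exactMatch    = false
      priority      = 20 # evaluated after linux-x64-small
    }
    runner_config = {
      runner_os           = "linux"
      runner_architecture = "x64"
      runner_extra_labels = "large"
    }
  }
}
```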
Usage
A complete example is available in the examples; see the multi-runner example for the actual implementation.
```hcl
module "multi-runner" {
  prefix = "multi-runner"
  github_app = {
    # app details
  }
  multi_runner_config = {
    "linux-arm" = {
      matcherConfig = {
        labelMatchers = [["self-hosted", "linux", "arm64", "arm"]]
        exactMatch    = true
      }
      runner_config = {
        runner_os             = "linux"
        runner_architecture   = "arm64"
        runner_extra_labels   = "arm"
        enable_ssm_on_runners = true
        instance_types        = ["t4g.large", "c6g.large"]
        # ...
      }
      # ...
    },
    "linux-x64" = {
      matcherConfig = {
        labelMatchers = [["self-hosted", "linux", "x64"]]
        exactMatch    = false
      }
      runner_config = {
        runner_os                = "linux"
        runner_architecture      = "x64"
        instance_types           = ["m5ad.large", "m5a.large"]
        enable_ephemeral_runners = true
        delay_webhook_event      = 0
        # ...
      }
      # ...
    }
  }
}
```
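The usage example above omits the other required inputs (aws_region, vpc_id and subnet_ids). A sketch of how they, together with a few optional top-level inputs, might sit next to multi_runner_config. All var.* references and literal values are placeholders, and the field names inside github_app and queue_encryption are assumptions, as the Inputs table does not list their fields.

```hcl
# Sketch only: var.* references are hypothetical variables of the calling configuration.
module "multi-runner" {
  # source omitted, as in the usage example
  prefix     = "multi-runner"
  aws_region = "eu-west-1"
  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  github_app = {
    # Assumed field names; key_base64 must be the base64-encoded .pem file.
    key_base64     = var.github_app_key_base64
    id             = var.github_app_id
    webhook_secret = var.github_app_webhook_secret
  }

  # Encrypt the queues with a KMS key instead of SQS-managed SSE (assumed field names).
  queue_encryption = {
    kms_master_key_id                 = var.queue_kms_key_id
    kms_data_key_reuse_period_seconds = 300
    sqs_managed_sse_enabled           = false
  }

  # Beta feature: enable the spot termination watcher lambda.
  instance_termination_watcher = {
    enable = true
  }

  enable_ami_housekeeper = true
  log_level              = "debug"

  multi_runner_config = {
    # runner configurations as in the usage example
  }
}
```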
Requirements
Name | Version |
---|---|
terraform | >= 1.3 |
aws | ~> 5.27 |
random | ~> 3.0 |
Providers
Name | Version |
---|---|
aws | ~> 5.27 |
random | ~> 3.0 |
Modules
Name | Source | Version |
---|---|---|
ami_housekeeper | ../ami-housekeeper | n/a |
instance_termination_watcher | ../termination-watcher | n/a |
runner_binaries | ../runner-binaries-syncer | n/a |
runners | ../runners | n/a |
ssm | ../ssm | n/a |
webhook | ../webhook | n/a |
Resources
Name | Type |
---|---|
aws_sqs_queue.queued_builds | resource |
aws_sqs_queue.queued_builds_dlq | resource |
aws_sqs_queue_policy.build_queue_dlq_policy | resource |
aws_sqs_queue_policy.build_queue_policy | resource |
random_string.random | resource |
aws_iam_policy_document.deny_unsecure_transport | data source |
Inputs
Name | Description | Type | Default | Required |
---|---|---|---|---|
ami_housekeeper_cleanup_config | Configuration for AMI cleanup. | object({...}) | {} | no |
ami_housekeeper_lambda_memory_size | Memory size limit in MB of the lambda. | number | 256 | no |
ami_housekeeper_lambda_s3_key | S3 key for the AMI housekeeper lambda function. Required if using S3 bucket to specify lambdas. | string | null | no |
ami_housekeeper_lambda_s3_object_version | S3 object version for the AMI housekeeper lambda function. Useful if S3 versioning is enabled on source bucket. | string | null | no |
ami_housekeeper_lambda_schedule_expression | Schedule expression for the AMI housekeeper lambda. | string | "cron(11 7 * * ? *)" | no |
ami_housekeeper_lambda_timeout | Timeout of the lambda in seconds. | number | 300 | no |
ami_housekeeper_lambda_zip | File location of the lambda zip file. | string | null | no |
associate_public_ipv4_address | Associate public IPv4 with the runner. Only tested with IPv4. | bool | false | no |
aws_partition | (optional) Partition in the ARN namespace to use if not 'aws'. | string | "aws" | no |
aws_region | AWS region. | string | n/a | yes |
cloudwatch_config | (optional) Replaces the module default CloudWatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | string | null | no |
enable_ami_housekeeper | Option to enable the lambda that cleans up old AMIs. | bool | false | no |
enable_managed_runner_security_group | Enables creation of the default managed security group. Unmanaged security groups can be specified via runner_additional_security_group_ids. | bool | true | no |
enable_metrics_control_plane | (Experimental) Enable or disable the metrics for the module. The feature can be changed or renamed without a major release. | bool | false | no |
eventbridge | Enable the use of EventBridge by the module. When enabled, the webhook puts events on EventBridge instead of dispatching them directly to the queues for scaling. | object({...}) | {} | no |
ghes_ssl_verify | GitHub Enterprise SSL verification. Set to 'false' when a custom certificate (chain) is used for GitHub Enterprise Server (insecure). | bool | true | no |
ghes_url | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | string | null | no |
github_app | GitHub app parameters, see your GitHub app. Ensure the key is the base64-encoded .pem file (the output of base64 app.private-key.pem, not the content of private-key.pem). | object({...}) | n/a | yes |
instance_profile_path | The path that will be added to the instance_profile; if not set, the environment name will be used. | string | null | no |
instance_termination_watcher | Configuration for the spot termination watcher lambda function. This feature is in beta; changes will not trigger a major release as long as it is in beta. enable: Enable or disable the spot termination watcher. memory_size: Memory size limit in MB of the lambda. s3_key: S3 key for the watcher lambda function. Required if using S3 bucket to specify lambdas. s3_object_version: S3 object version for the watcher lambda function. Useful if S3 versioning is enabled on source bucket. timeout: Timeout of the lambda in seconds. zip: File location of the lambda zip file. | object({...}) | {} | no |
key_name | Key pair name. | string | null | no |
kms_key_arn | Optional CMK Key ARN to be used for Parameter Store. | string | null | no |
lambda_architecture | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86_64' functions. | string | "arm64" | no |
lambda_principals | (Optional) Add extra principals to the role created for execution of the lambda, e.g. for local testing. | list(object({...})) | [] | no |
lambda_runtime | AWS Lambda runtime. | string | "nodejs20.x" | no |
lambda_s3_bucket | S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. | string | null | no |
lambda_security_group_ids | List of security group IDs associated with the Lambda function. | list(string) | [] | no |
lambda_subnet_ids | List of subnets in which the lambda functions will be launched; the subnets must be subnets in the vpc_id. | list(string) | [] | no |
lambda_tags | Map of tags that will be added to all the lambda function resources. Note these are additional tags to the default tags. | map(string) | {} | no |
log_level | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | string | "info" | no |
logging_kms_key_id | Specifies the KMS key ID to encrypt the logs with. | string | null | no |
logging_retention_in_days | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | number | 180 | no |
matcher_config_parameter_store_tier | The tier of the parameter store for the matcher configuration. Valid values are Standard and Advanced. | string | "Standard" | no |
metrics | Configuration for metrics created by the module. By default metrics are disabled to avoid additional costs. When metrics are enabled, all metrics are created unless explicitly configured otherwise. | object({...}) | {} | no |
metrics_namespace | The namespace for the metrics created by the module. Metrics will only be created if explicitly enabled. | string | null | no |
multi_runner_config | multi_runner_config = { runner_config: { runner_os: "The EC2 Operating System type to use for action runner instances (linux, windows)." runner_architecture: "The platform architecture of the runner instance_type." runner_metadata_options: "(Optional) Metadata options for the EC2 runner instances." ami_filter: "(Optional) List of maps used to create the AMI filter for the action runner AMI. By default Amazon Linux 2 is used." ami_owners: "(Optional) The list of owners used to select the AMI of action runner instances." create_service_linked_role_spot: "(Optional) Create the service-linked role for spot instances that is required by the scale-up lambda." credit_specification: "(Optional) The credit specification of the runner instance_type. Can be unset, standard or unlimited." delay_webhook_event: "The number of seconds the event accepted by the webhook is invisible on the queue before the scale-up lambda will receive the event." disable_runner_autoupdate: "Disable the auto update of the GitHub runner agent. Be aware there is a grace period of 30 days, see also the GitHub article." ebs_optimized: "The EC2 EBS optimized configuration." enable_ephemeral_runners: "Enable ephemeral runners; runners will only be used once." enable_job_queued_check: "Enables JIT configuration for creating runners instead of registration token based registration. JIT configuration will only be applied for ephemeral runners. By default JIT configuration is enabled for ephemeral runners and can be disabled via this override. When running on GHES without support for JIT configuration this variable should be set to true for ephemeral runners." enable_on_demand_failover_for_errors: "Enable on-demand failover. For example, to fall back to on-demand when no spot capacity is available, the variable can be set to InsufficientInstanceCapacity. When not defined, the default behavior is to retry later." enable_organization_runners: "Register runners to the organization instead of at repo level." enable_runner_binaries_syncer: "Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-built AMI." enable_ssm_on_runners: "Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances." enable_userdata: "Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI." instance_allocation_strategy: "The allocation strategy for spot instances. AWS recommends using capacity-optimized, however the AWS default is lowest-price." instance_max_spot_price: "Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet." instance_target_capacity_type: "Default lifecycle used for runner instances, can be either spot or on-demand." instance_types: "List of instance types for the action runner. Defaults are based on runner_os (al2023 for linux and Windows Server Core for win)." job_queue_retention_in_seconds: "The number of seconds the job is held in the queue before it is purged." minimum_running_time_in_minutes: "The minimum time an EC2 action runner should be running before it is terminated if not busy." pool_runner_owner: "The pool will deploy runners to the GitHub org ID; set this value to the org to which you want the runners deployed. Repo level is not supported." runner_additional_security_group_ids: "List of additional security group IDs to apply to the runner. If added outside the multi_runner_config block, the additional security group(s) will be applied to all runner configs. If added inside the multi_runner_config, the additional security group(s) will be applied to the individual runner." runner_as_root: "Run the action runner under the root user. Variable runner_run_as will be ignored." runner_boot_time_in_minutes: "The minimum time for an EC2 runner to boot and register as a runner." runner_disable_default_labels: "Disable default labels for the runners (os, architecture and self-hosted). If enabled, the runner will only have the extra labels provided in runner_extra_labels. If your own start script is used, this configuration parameter needs to be parsed via SSM." runner_extra_labels: "Extra (custom) labels for the runners (GitHub). Separate each label by a comma. Label checks on the webhook can be enforced by setting multi_runner_config.matcherConfig.exactMatch. GitHub read-only labels should not be provided." runner_group_name: "Name of the runner group." runner_name_prefix: "Prefix for the GitHub runner name." runner_run_as: "Run the GitHub Actions agent as user." runners_maximum_count: "The maximum number of runners that will be created. Setting the variable to -1 disables the maximum check." scale_down_schedule_expression: "Scheduler expression to check every x for scale down." scale_up_reserved_concurrent_executions: "Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations." userdata_template: "Alternative user-data template, replacing the default template. By providing your own user_data you have to take care of installing all required software, including the action runner. Variables userdata_pre/post_install are ignored." enable_jit_config: "Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES, check first if the JIT config API is available. If you are upgrading from 3.x to 4.x, you can set enable_jit_config to false to avoid a breaking change when having your own AMI." enable_runner_detailed_monitoring: "Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details." enable_cloudwatch_agent: "Enables the CloudWatch agent on the EC2 runner instances; the runner contains a default config. Configuration can be overridden via cloudwatch_config." cloudwatch_config: "(optional) Replaces the module default CloudWatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." userdata_pre_install: "Script to be run before the GitHub Actions runner is installed on the EC2 instances." userdata_post_install: "Script to be run after the GitHub Actions runner is installed on the EC2 instances." runner_ec2_tags: "Map of tags that will be added to the launch template instance tag specifications." runner_iam_role_managed_policy_arns: "Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role." vpc_id: "The VPC for security groups of the action runners. If not set, uses the value of var.vpc_id." subnet_ids: "List of subnets in which the action runners will be launched; the subnets must be subnets in the vpc_id. If not set, uses the value of var.subnet_ids." idle_config: "List of time periods, defined as cron expressions, in which a minimum number of runners is kept active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle." runner_log_files: "(optional) Replaces the module default CloudWatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details." block_device_mappings: "The EC2 instance block device configuration. Takes the following keys: device_name, delete_on_termination, volume_type, volume_size, encrypted, iops, throughput, kms_key_id, snapshot_id." job_retry: "Experimental! Can be removed or changed without triggering a major release. Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances, a message will be published to a job retry queue. The job retry check lambda checks after a delay whether the job is queued. If not, the message will be published again on the scale-up (build) queue. Using this feature can impact the rate limit of the GitHub app." pool_config: "The configuration for updating the pool. The pool_size to adjust to by the events triggered by the schedule_expression. For example, you can configure a cron expression for weekdays to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use schedule_expression_timezone to override the schedule time zone (defaults to UTC)." } matcherConfig: { labelMatchers: "The list of lists of labels supported by the runner configuration, e.g. [[self-hosted, linux, x64, example]]." exactMatch: "If set to true, all labels in the workflow job must match the GitHub labels (os, architecture and self-hosted). When false, if any workflow label matches it will trigger the webhook." priority: "If set, it defines the priority of the matcher; the matcher with the lowest priority will be evaluated first. Default is 999, allowed values 0-999." } fifo: "Enable a FIFO queue to preserve the order of events received by the webhook. Suggested to set to true for repo-level runners." redrive_build_queue: "Set options to attach an (optional) dead letter queue to the build queue, the queue between the webhook and the scale-up lambda. You have the following options: 1. Disable by setting enabled to false. 2. Enable by setting enabled to true and maxReceiveCount to the maximum number of retries." } | map(object({...})) | n/a | yes |
pool_lambda_reserved_concurrent_executions | Amount of reserved concurrent executions for the pool lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | number | 1 | no |
pool_lambda_timeout | Timeout for the pool lambda in seconds. | number | 60 | no |
prefix | The prefix used for naming resources. | string | "github-actions" | no |
queue_encryption | Configure how data on queues managed by the modules is encrypted at rest. Options are encrypted via SSE, not encrypted, and encrypted via KMS. By default, encryption via SSE is enabled. For more details, see the Terraform aws_sqs_queue resource https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue. | object({...}) | {...} | no |
repository_white_list | List of GitHub repository full names (owner/repo_name) that will be allowed to use the GitHub app. Leave empty for no filtering. | list(string) | [] | no |
role_path | The path that will be added to the role; if not set, the environment name will be used. | string | null | no |
role_permissions_boundary | Permissions boundary that will be added to the created role for the lambda. | string | null | no |
runner_additional_security_group_ids | (optional) List of additional security group IDs to apply to the runner. | list(string) | [] | no |
runner_binaries_s3_sse_configuration | Map containing server-side encryption configuration for runner-binaries S3 bucket. | any | {...} | no |
runner_binaries_s3_versioning | Status of S3 versioning for runner-binaries S3 bucket. Once set to Enabled the change cannot be reverted via Terraform! | string | "Disabled" | no |
runner_binaries_syncer_lambda_timeout | Timeout of the binaries sync lambda in seconds. | number | 300 | no |
runner_binaries_syncer_lambda_zip | File location of the binaries sync lambda zip file. | string | null | no |
runner_binaries_syncer_memory_size | Memory size limit in MB for binary syncer lambda. | number | 256 | no |
runner_egress_rules | List of egress rules for the GitHub runner instances. | list(object({...})) | [...] | no |
runners_lambda_s3_key | S3 key for runners lambda function. Required if using S3 bucket to specify lambdas. | string | null | no |
runners_lambda_s3_object_version | S3 object version for runners lambda function. Useful if S3 versioning is enabled on source bucket. | string | null | no |
runners_lambda_zip | File location of the lambda zip file for scaling runners. | string | null | no |
runners_scale_down_lambda_timeout | Timeout for the scale down lambda in seconds. | number | 60 | no |
runners_scale_up_lambda_timeout | Timeout for the scale up lambda in seconds. | number | 30 | no |
runners_ssm_housekeeper | Configuration for the SSM housekeeper lambda. This lambda deletes token / JIT config from SSM. schedule_expression: is used to configure the schedule for the lambda. enabled: enable or disable the lambda trigger via EventBridge. lambda_memory_size: lambda memory size limit. lambda_timeout: timeout for the lambda in seconds. config: configuration for the lambda function. Token path will be read by default from the module. | object({...}) | {...} | no |
scale_down_lambda_memory_size | Memory size limit in MB for scale down lambda. | number | 512 | no |
scale_up_lambda_memory_size | Memory size limit in MB for scale up lambda. | number | 512 | no |
ssm_paths | The root path used in SSM to store configuration and secrets. | object({...}) | {} | no |
state_event_rule_binaries_syncer | Option to disable EventBridge Lambda trigger for the binary syncer, useful to stop automatic updates of binary distribution. | string | "ENABLED" | no |
subnet_ids | List of subnets in which the action runners will be launched; the subnets must be subnets in the vpc_id. | list(string) | n/a | yes |
syncer_lambda_s3_key | S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas. | string | null | no |
syncer_lambda_s3_object_version | S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket. | string | null | no |
tags | Map of tags that will be added to created resources. By default resources will be tagged with name and environment. | map(string) | {} | no |
tracing_config | Configuration for lambda tracing. | object({...}) | {} | no |
vpc_id | The VPC for security groups of the action runners. | string | n/a | yes |
webhook_lambda_apigateway_access_log_settings | Access log settings for webhook API gateway. | object({...}) | null | no |
webhook_lambda_memory_size | Memory size limit in MB for webhook lambda. | number | 256 | no |
webhook_lambda_s3_key | S3 key for webhook lambda function. Required if using S3 bucket to specify lambdas. | string | null | no |
webhook_lambda_s3_object_version | S3 object version for webhook lambda function. Useful if S3 versioning is enabled on source bucket. | string | null | no |
webhook_lambda_timeout | Timeout of the webhook lambda in seconds. | number | 10 | no |
webhook_lambda_zip | File location of the webhook lambda zip file. | string | null | no |
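A sketch combining several of the per-runner options described in the multi_runner_config row in one configuration entry. Names and values are placeholders. The idle_config field names (cron, timeZone, idleCount), the list type of enable_on_demand_failover_for_errors, and the six-field cron format (seconds first) are assumptions not stated in the table; block_device_mappings uses the keys listed in its description.

```hcl
# Fragment of the module "multi-runner" block; names and values are placeholders.
multi_runner_config = {
  "linux-x64" = {
    matcherConfig = {
      labelMatchers = [["self-hosted", "linux", "x64"]]
      exactMatch    = true
    }
    # Preserve the order of webhook events (suggested for repo-level runners).
    fifo = true
    # Attach a dead letter queue to the build queue after 3 failed receives.
    redrive_build_queue = {
      enabled         = true
      maxReceiveCount = 3
    }
    runner_config = {
      runner_os                     = "linux"
      runner_architecture           = "x64"
      instance_target_capacity_type = "spot"
      # Assumed list type: fall back to on-demand when spot capacity is unavailable.
      enable_on_demand_failover_for_errors = ["InsufficientInstanceCapacity"]
      # Assumed field names: keep 2 runners idle on weekdays between 09:00 and 17:59.
      idle_config = [{
        cron      = "* * 9-17 * * 1-5"
        timeZone  = "Europe/Amsterdam"
        idleCount = 2
      }]
      # Keys as listed in the block_device_mappings description.
      block_device_mappings = [{
        device_name           = "/dev/xvda"
        delete_on_termination = true
        volume_type           = "gp3"
        volume_size           = 50
        encrypted             = true
        iops                  = null
        throughput            = null
        kms_key_id            = null
        snapshot_id           = null
      }]
    }
  }
}
```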
Outputs
Name | Description |
---|---|
binaries_syncer_map | n/a |
instance_termination_handler | n/a |
instance_termination_watcher | n/a |
runners_map | n/a |
ssm_parameters | n/a |
webhook | n/a |