Setup Guide - GitHub Actions Self Hosted Runners using AWS EKS


Instead of manually managing GitHub Actions runners, we can use a Kubernetes operator called Actions Runner Controller (ARC).
Instead of using the GitHub-hosted runners, which are virtual machines with limited performance and limited concurrency, we can run jobs on AWS EC2 instances via a Kubernetes cluster managed by AWS EKS (Elastic Kubernetes Service).
ARC automatically scales the number of runners up or down based on your workflow load, so it reacts quickly to demand.
For example, if a bunch of teams all want to test their PRs at the same time, rather than making them queue we can have multiple runners test each of their branches as soon as possible, so devs get faster feedback. Once the testing is done and there's no more work, the extra runners shut off and stop costing money.
For this example, we'll set up ARC on AWS EKS using Helm (a package manager for Kubernetes) in a way that's easy to understand step by step.
 
 

Configure Command-Line Tools

(For Linux/macOS, using Homebrew)
AWS CLI
  1. Install: brew install awscli
  2. Configure: aws configure - enter your access key, secret key, region, and output format. Docs here: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html
Terraform
  Install: brew install terraform
Kubectl
  Install: brew install kubectl
Helm
  Install: brew install helm
 
 

Setup AWS EKS with Terraform

Create a repo (or a folder inside your project) for the Terraform files, e.g. /terraform, and add starter files like the ones below.
Here we will write Infrastructure as Code (IaC) using Terraform, which creates all our infrastructure resources in an easily reproducible way. This is ideal for EKS because you can spin everything up with terraform apply from the command line, and when you're done just run terraform destroy to make sure you aren't being charged long term.
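For reference, the files covered in the rest of this section give a layout roughly like this (the file names are just a convention; Terraform reads every .tf file in the directory):
terraform/
├── Provider.tf   # AWS provider and version constraints
├── VPC.tf        # the VPC everything lives in
├── IGW.tf        # Internet Gateway
├── Subnets.tf    # 2 public + 2 private subnets
├── NAT.tf        # NAT Gateway + Elastic IP
├── Routes.tf     # route tables and associations
├── EKS.tf        # the EKS cluster and its IAM role
└── Nodes.tf      # the EC2 node group and its IAM roles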
 
To save time you may download this .zip file which contains starter Terraform code:
Then change any availability zones and names to match your project
Alternatively, follow these steps below to configure from scratch or just to learn more about each file
 
To Deploy:
Initialise the working directory by running terraform init at the root of the Terraform folder, then run terraform apply and type yes when prompted.
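Roughly, the full lifecycle from the /terraform directory looks like this (terraform plan is optional, but it's a good habit to preview changes before applying):
cd terraform
terraform init      # download the AWS provider and set up state
terraform plan      # preview what will be created
terraform apply     # create the VPC, EKS cluster, and node group (type "yes" to confirm)
# ...and when you're finished with the cluster:
terraform destroy   # tear everything down so it stops costing money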
 

Provider.tf

Configures Terraform to use the AWS provider for managing resources in the provided region
provider "aws" {
  region = "ap-southeast-2"
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
 

VPC.tf

Adding a Virtual Private Cloud gives us the essential foundation inside which we will configure all of our infrastructure. You don't need to understand the cidr_block here, but do note the label "main": other files reference this VPC as aws_vpc.main without any complex jargon.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main"
  }
}
 
 

IGW.tf

An Internet Gateway allows resources within the VPC defined in VPC.tf to reach the outside internet. Note: the gateway itself is free; it's the NAT Gateway in a later file that bills by the hour.
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "igw"
  }
}
 

Subnets.tf

This intimidating-looking file defines four subnets within the VPC, creating isolated network segments for public and private resources across two availability zones for redundancy. Just be aware that this is necessary for the cluster to operate.
Note: change the availability zones if you're not setting up in ap-southeast-2, and keep the "kubernetes.io/cluster/ekstreme" tags in sync with the cluster name defined in EKS.tf.
resource "aws_subnet" "private-ap-southeast-2a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.0.0/19"
  availability_zone = "ap-southeast-2a"

  tags = {
    "Name"                            = "private-ap-southeast-2a"
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/ekstreme"  = "owned"
  }
}

resource "aws_subnet" "private-ap-southeast-2b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.32.0/19"
  availability_zone = "ap-southeast-2b"

  tags = {
    "Name"                            = "private-ap-southeast-2b"
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/ekstreme"  = "owned"
  }
}

resource "aws_subnet" "public-ap-southeast-2a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.64.0/19"
  availability_zone = "ap-southeast-2a"

  tags = {
    "Name"                           = "public-ap-southeast-2a"
    "kubernetes.io/role/elb"         = "1"
    "kubernetes.io/cluster/ekstreme" = "owned"
  }
}

resource "aws_subnet" "public-ap-southeast-2b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.96.0/19"
  availability_zone = "ap-southeast-2b"

  tags = {
    "Name"                           = "public-ap-southeast-2b"
    "kubernetes.io/role/elb"         = "1"
    "kubernetes.io/cluster/ekstreme" = "owned"
  }
}
 

NAT.tf

This file creates a Network Address Translation (NAT) Gateway, allowing private resources within the VPC to reach the internet through a single public IP address (egress only). Note the depends_on argument: it ensures the NAT Gateway is created only after the Internet Gateway from IGW.tf exists. This is also the resource that bills per hour, so remember terraform destroy when you're finished.
resource "aws_eip" "nat" {
  vpc = true

  tags = {
    Name = "nat"
  }
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  # Place the NAT Gateway in one of the public subnets from Subnets.tf
  subnet_id     = aws_subnet.public-ap-southeast-2a.id

  tags = {
    Name = "nat"
  }

  depends_on = [aws_internet_gateway.igw]
}
 

Routes.tf

Another intimidating file, but it's mostly boilerplate: it creates two route tables that send traffic from the private subnets out through the NAT Gateway and from the public subnets out through the Internet Gateway, then associates each subnet with the right table.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route = [
    {
      cidr_block                 = "0.0.0.0/0"
      nat_gateway_id             = aws_nat_gateway.nat.id
      carrier_gateway_id         = ""
      destination_prefix_list_id = ""
      egress_only_gateway_id     = ""
      gateway_id                 = ""
      instance_id                = ""
      ipv6_cidr_block            = ""
      local_gateway_id           = ""
      network_interface_id       = ""
      transit_gateway_id         = ""
      vpc_endpoint_id            = ""
      vpc_peering_connection_id  = ""
    },
  ]

  tags = {
    Name = "private"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route = [
    {
      cidr_block                 = "0.0.0.0/0"
      gateway_id                 = aws_internet_gateway.igw.id
      nat_gateway_id             = ""
      carrier_gateway_id         = ""
      destination_prefix_list_id = ""
      egress_only_gateway_id     = ""
      instance_id                = ""
      ipv6_cidr_block            = ""
      local_gateway_id           = ""
      network_interface_id       = ""
      transit_gateway_id         = ""
      vpc_endpoint_id            = ""
      vpc_peering_connection_id  = ""
    },
  ]

  tags = {
    Name = "public"
  }
}

resource "aws_route_table_association" "private-ap-southeast-2a" {
  subnet_id      = aws_subnet.private-ap-southeast-2a.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "private-ap-southeast-2b" {
  subnet_id      = aws_subnet.private-ap-southeast-2b.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "public-ap-southeast-2a" {
  subnet_id      = aws_subnet.public-ap-southeast-2a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public-ap-southeast-2b" {
  subnet_id      = aws_subnet.public-ap-southeast-2b.id
  route_table_id = aws_route_table.public.id
}
 

EKS.tf

The first interesting file. Here we (somewhat cheekily) define the cluster's IAM role and its trust policy inline, which would ideally live in a dedicated IAM.tf.
Then we have EKS spin up the cluster using the subnets we created earlier.
resource "aws_iam_role" "demo" {
  name = "eks-cluster-demo"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.demo.name
}

resource "aws_eks_cluster" "ekstreme" {
  # The cluster name must match the "kubernetes.io/cluster/ekstreme" tags in Subnets.tf
  name     = "ekstreme"
  role_arn = aws_iam_role.demo.arn

  vpc_config {
    subnet_ids = [
      aws_subnet.private-ap-southeast-2a.id,
      aws_subnet.private-ap-southeast-2b.id,
      aws_subnet.public-ap-southeast-2a.id,
      aws_subnet.public-ap-southeast-2b.id
    ]
  }

  depends_on = [aws_iam_role_policy_attachment.demo-AmazonEKSClusterPolicy]
}
 

Nodes.tf

If you’ve never used Kubernetes before, a node is essentially an abstraction over a machine that runs your workloads. In the case of AWS, each node is an EC2 instance; here we use type t3.small.
resource "aws_iam_role" "nodes" {
  name = "eks-node-group-nodes"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.ekstreme.name
  node_group_name = "general"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    aws_subnet.private-ap-southeast-2a.id,
    aws_subnet.private-ap-southeast-2b.id
  ]

  capacity_type  = "ON_DEMAND"
  instance_types = ["t3.small"]

  scaling_config {
    desired_size = 2
    max_size     = 5
    min_size     = 1
  }

  update_config {
    max_unavailable = 1
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]
}
This last block is the interesting one, as it’s how you can later easily adjust the ‘power’ of your cluster, letting Terraform scale it up or down without the service ever going down. Simply update the values for:
  • desired_size — run 2 EC2 instances by default
  • max_size — when workloads are intense, scale out to a total of 5 EC2 instances to handle the increased demand
  • min_size — never have fewer than one EC2 instance running and able to handle the workload, i.e. an upgrade or reset should only take down a single instance at a time
A sketch of a modified block is shown below.
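As a sketch, a beefier configuration might look like the following (the numbers are purely illustrative, not a recommendation). After editing, run terraform apply again and Terraform resizes the node group in place:
# Inside the aws_eks_node_group "general" resource in Nodes.tf
scaling_config {
  desired_size = 3   # run 3 nodes by default
  max_size     = 10  # allow scaling out to 10 nodes under heavy load
  min_size     = 2   # never drop below 2 running nodes
}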
 
 
 
 

Configure Kubernetes Cluster

1- Connect your local kubectl to your deployed EKS cluster (swap in your own region and cluster name if you changed them from the Terraform defaults)
aws eks --region ap-southeast-2 update-kubeconfig \
  --name ekstreme
 
2- Check the connection
kubectl get svc
If you see a table with some resources, you’re good to go
 
Otherwise, keep following below
 
3- Set up GitHub
  1. Create a test repo which you’ll use to test your runner. If you have existing GitHub Actions workflows in that repo, simply set runs-on: <your chosen installation name> on any jobs you want on the self-hosted cluster (the installation name used in step 5 below is arc-runner-set). Alternatively, you can download this demo-workflow.yaml file and place it under .github/workflows in the repo root directory; a minimal workflow is also sketched after this list.
  2. In GitHub, create a PAT (Personal Access Token) with the repo and workflow scopes enabled: https://github.com/settings/tokens
  3. Keep the repo link and PAT safe for the next step
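If you don't have the demo file handy, something like this works as a placeholder (the workflow name, triggers, and echo step are illustrative; the important line is runs-on, which must match your runner scale set installation name):
name: demo-self-hosted
on:
  workflow_dispatch:
  pull_request:

jobs:
  hello-from-eks:
    # Must match the INSTALLATION_NAME used when installing the runner scale set in step 5
    runs-on: arc-runner-set
    steps:
      - uses: actions/checkout@v4
      - name: Prove where we are running
        run: |
          echo "Hello from a self-hosted ARC runner"
          uname -a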
 
4- Use Helm to install the ARC controller (the operator that manages the runners)
NAMESPACE="arc-systems"
helm install arc \
  --namespace "${NAMESPACE}" \
  --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
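As a quick sanity check before moving on, you can list the pods in the controller namespace (the exact pod name will vary by chart version); you should see a single controller pod in the Running state:
kubectl get pods -n arc-systems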
 
5- Connect Kubernetes to your GitHub repo using the PAT and repo URL from step 3
INSTALLATION_NAME="arc-runner-set"
NAMESPACE="arc-runners"
GITHUB_CONFIG_URL="https://github.com/<your_enterprise/org/repo>"
GITHUB_PAT="<PAT token>"
helm install "${INSTALLATION_NAME}" \
  --namespace "${NAMESPACE}" \
  --create-namespace \
  --set githubConfigUrl="${GITHUB_CONFIG_URL}" \
  --set githubConfigSecret.github_token="${GITHUB_PAT}" \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
 
6- Open the Actions tab in your GitHub repo and run the workflow you just added
 
7- Quickly run kubectl get pods -n arc-runners in your terminal to watch a runner pod spin up and execute your workflow
 
8- Once it’s done, open the workflow run back in the GitHub UI and check the logs; the first step of the job shows which machine it ran on
 
And that’s it! Congrats, you now have EKS running your workflows. The next steps will help you configure specific changes based on various needs.
 
 
 
