Setup Guide - GitHub Actions Self Hosted Runners using AWS EKS


Instead of manually managing GitHub Actions runners, we can use a Kubernetes operator called Actions Runner Controller (ARC).
Instead of using the GitHub-hosted runners, which are virtual machines with limited performance and limited concurrency, we can run jobs on AWS EC2 instances via a Kubernetes cluster managed by AWS EKS (Elastic Kubernetes Service).
ARC automatically scales the number of runners up or down based on your workflow load, so it reacts quickly to demand.
For example, if a bunch of teams all want to test their PRs at the same time, rather than making them queue we can have multiple runners test each of their branches as soon as possible, so devs get faster feedback. Once the testing is done and there's no more work, the extra runners shut off and stop costing money.
For this example, we'll set up ARC on AWS EKS using Helm (a package manager for Kubernetes) in a way that's easy to understand step by step.
 
 

Configure Command-Line Tools

(For Linux/macOS, using Homebrew)
AWS CLI
  1. Install: brew install awscli
  2. Configure: aws configure - enter your access key, secret key, region, and output format. Docs here: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-quickstart.html
Terraform
  Install: brew install terraform
Kubectl
  Install: brew install kubectl
Helm
  Install: brew install helm
 
 

Setup AWS EKS with Terraform

Create a repo (or a folder inside your project) for the Terraform files, e.g. /terraform, and add starter files like the ones below.
Here we will write Infrastructure as Code (IaC) using Terraform, which creates all our infrastructure resources in an easily reproducible way. This is ideal for EKS because you can spin everything up with terraform apply from the command line, and when you're done just run terraform destroy to make sure you aren't being charged long term.
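For reference, the files covered in the rest of this section give a layout roughly like this (the file names are just a convention; Terraform reads every .tf file in the directory):
terraform/
├── Provider.tf   # AWS provider and version constraints
├── VPC.tf        # the VPC everything lives in
├── IGW.tf        # Internet Gateway
├── Subnets.tf    # 2 public + 2 private subnets
├── NAT.tf        # NAT Gateway + Elastic IP
├── Routes.tf     # route tables and associations
├── EKS.tf        # the EKS cluster and its IAM role
└── Nodes.tf      # the EC2 node group and its IAM roles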
 
To save time you may download this .zip file which contains starter Terraform code:
Then change any availability zones and names to match your project
Alternatively, follow these steps below to configure from scratch or just to learn more about each file
 
To Deploy:
Initialise the working directory by running terraform init at the root of the Terraform folder, then run terraform apply and type yes when prompted.
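Roughly, the full lifecycle from the /terraform directory looks like this (terraform plan is optional, but it's a good habit to preview changes before applying):
cd terraform
terraform init      # download the AWS provider and set up state
terraform plan      # preview what will be created
terraform apply     # create the VPC, EKS cluster, and node group (type "yes" to confirm)
# ...and when you're finished with the cluster:
terraform destroy   # tear everything down so it stops costing money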
 

Provider.tf

Configures Terraform to use the AWS provider for managing resources in the provided region
provider "aws" {
  region = "ap-southeast-2"
}

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
 

VPC.tf

Adding a Virtual Private Cloud gives us the essential foundation inside which we will configure all of our infrastructure. You don't need to understand the cidr_block here, but do note the label "main": other files reference this VPC as aws_vpc.main without any complex jargon.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main"
  }
}
 
 

IGW.tf

An Internet Gateway allows resources within the VPC defined in VPC.tf to reach the outside internet. Note: the gateway itself is free; it's the NAT Gateway in a later file that bills by the hour.
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "igw"
  }
}
 

Subnets.tf

This intimidating-looking file defines four subnets within the VPC, creating isolated network segments for public and private resources across two availability zones for redundancy. Just be aware that this is necessary for the cluster to operate.
Note: change the availability zones if you're not setting up in ap-southeast-2, and keep the "kubernetes.io/cluster/ekstreme" tags in sync with the cluster name defined in EKS.tf.
resource "aws_subnet" "private-ap-southeast-2a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.0.0/19"
  availability_zone = "ap-southeast-2a"

  tags = {
    "Name"                            = "private-ap-southeast-2a"
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/ekstreme"  = "owned"
  }
}

resource "aws_subnet" "private-ap-southeast-2b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.32.0/19"
  availability_zone = "ap-southeast-2b"

  tags = {
    "Name"                            = "private-ap-southeast-2b"
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/ekstreme"  = "owned"
  }
}

resource "aws_subnet" "public-ap-southeast-2a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.64.0/19"
  availability_zone = "ap-southeast-2a"

  tags = {
    "Name"                           = "public-ap-southeast-2a"
    "kubernetes.io/role/elb"         = "1"
    "kubernetes.io/cluster/ekstreme" = "owned"
  }
}

resource "aws_subnet" "public-ap-southeast-2b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.96.0/19"
  availability_zone = "ap-southeast-2b"

  tags = {
    "Name"                           = "public-ap-southeast-2b"
    "kubernetes.io/role/elb"         = "1"
    "kubernetes.io/cluster/ekstreme" = "owned"
  }
}
 

NAT.tf

This file creates a Network Address Translation (NAT) Gateway, allowing private resources within the VPC to reach the internet through a single public IP address (egress only). Note the depends_on argument: it ensures the NAT Gateway is created only after the Internet Gateway from IGW.tf exists. This is also the resource that bills per hour, so remember terraform destroy when you're finished.
resource "aws_eip" "nat" {
  vpc = true

  tags = {
    Name = "nat"
  }
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  # Place the NAT Gateway in one of the public subnets from Subnets.tf
  subnet_id     = aws_subnet.public-ap-southeast-2a.id

  tags = {
    Name = "nat"
  }

  depends_on = [aws_internet_gateway.igw]
}
 

Routes.tf

Another intimidating file, but it's mostly boilerplate: it creates two route tables that send traffic from the private subnets out through the NAT Gateway and from the public subnets out through the Internet Gateway, then associates each subnet with the right table.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id

  route = [
    {
      cidr_block                 = "0.0.0.0/0"
      nat_gateway_id             = aws_nat_gateway.nat.id
      carrier_gateway_id         = ""
      destination_prefix_list_id = ""
      egress_only_gateway_id     = ""
      gateway_id                 = ""
      instance_id                = ""
      ipv6_cidr_block            = ""
      local_gateway_id           = ""
      network_interface_id       = ""
      transit_gateway_id         = ""
      vpc_endpoint_id            = ""
      vpc_peering_connection_id  = ""
    },
  ]

  tags = {
    Name = "private"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route = [
    {
      cidr_block                 = "0.0.0.0/0"
      gateway_id                 = aws_internet_gateway.igw.id
      nat_gateway_id             = ""
      carrier_gateway_id         = ""
      destination_prefix_list_id = ""
      egress_only_gateway_id     = ""
      instance_id                = ""
      ipv6_cidr_block            = ""
      local_gateway_id           = ""
      network_interface_id       = ""
      transit_gateway_id         = ""
      vpc_endpoint_id            = ""
      vpc_peering_connection_id  = ""
    },
  ]

  tags = {
    Name = "public"
  }
}

resource "aws_route_table_association" "private-ap-southeast-2a" {
  subnet_id      = aws_subnet.private-ap-southeast-2a.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "private-ap-southeast-2b" {
  subnet_id      = aws_subnet.private-ap-southeast-2b.id
  route_table_id = aws_route_table.private.id
}

resource "aws_route_table_association" "public-ap-southeast-2a" {
  subnet_id      = aws_subnet.public-ap-southeast-2a.id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "public-ap-southeast-2b" {
  subnet_id      = aws_subnet.public-ap-southeast-2b.id
  route_table_id = aws_route_table.public.id
}
 

EKS.tf

The first interesting file. Here we (somewhat cheekily) define the cluster's IAM role and its trust policy inline, which would ideally live in a dedicated IAM.tf.
Then we have EKS spin up the cluster using the subnets we created earlier.
resource "aws_iam_role" "demo" {
  name = "eks-cluster-demo"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "demo-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.demo.name
}

resource "aws_eks_cluster" "ekstreme" {
  # The cluster name must match the "kubernetes.io/cluster/ekstreme" tags in Subnets.tf
  name     = "ekstreme"
  role_arn = aws_iam_role.demo.arn

  vpc_config {
    subnet_ids = [
      aws_subnet.private-ap-southeast-2a.id,
      aws_subnet.private-ap-southeast-2b.id,
      aws_subnet.public-ap-southeast-2a.id,
      aws_subnet.public-ap-southeast-2b.id
    ]
  }

  depends_on = [aws_iam_role_policy_attachment.demo-AmazonEKSClusterPolicy]
}
 

Nodes.tf

If you’ve never used Kubernetes before, a node is essentially an abstraction over a machine that runs your workloads. In the case of AWS, each node is an EC2 instance; here we use type t3.small.
resource "aws_iam_role" "nodes" {
  name = "eks-node-group-nodes"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.nodes.name
}

resource "aws_iam_role_policy_attachment" "nodes-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.nodes.name
}

resource "aws_eks_node_group" "general" {
  cluster_name    = aws_eks_cluster.ekstreme.name
  node_group_name = "general"
  node_role_arn   = aws_iam_role.nodes.arn

  subnet_ids = [
    aws_subnet.private-ap-southeast-2a.id,
    aws_subnet.private-ap-southeast-2b.id
  ]

  capacity_type  = "ON_DEMAND"
  instance_types = ["t3.small"]

  scaling_config {
    desired_size = 2
    max_size     = 5
    min_size     = 1
  }

  update_config {
    max_unavailable = 1
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    aws_iam_role_policy_attachment.nodes-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.nodes-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.nodes-AmazonEC2ContainerRegistryReadOnly,
  ]
}
This last block is the interesting one, as it’s how you can later easily adjust the ‘power’ of your cluster, letting Terraform scale it up or down without the service ever going down. Simply update the values for:
  • desired_size — run 2 EC2 instances by default
  • max_size — when workloads are intense, scale out to a total of 5 EC2 instances to handle the increased demand
  • min_size — never have fewer than one EC2 instance running and able to handle the workload, i.e. an upgrade or reset should only take down a single instance at a time
A sketch of a modified block is shown below.
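As a sketch, a beefier configuration might look like the following (the numbers are purely illustrative, not a recommendation). After editing, run terraform apply again and Terraform resizes the node group in place:
# Inside the aws_eks_node_group "general" resource in Nodes.tf
scaling_config {
  desired_size = 3   # run 3 nodes by default
  max_size     = 10  # allow scaling out to 10 nodes under heavy load
  min_size     = 2   # never drop below 2 running nodes
}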
 
 
 
 

Configure Kubernetes Cluster

1- Connect your local kubectl to your deployed EKS cluster (swap in your own region and cluster name if you changed them from the Terraform defaults)
aws eks --region ap-southeast-2 update-kubeconfig \
  --name ekstreme
 
2- Check the connection
kubectl get svc
If you see a table with some resources, you’re good to go
 
Otherwise, keep following below
 
3- Set up GitHub
  1. Create a test repo which you’ll use to test your runner. If you have existing GitHub Actions workflows in that repo, simply set runs-on: <your chosen installation name> on any jobs you want on the self-hosted cluster (the installation name used in step 5 below is arc-runner-set). Alternatively, you can download this demo-workflow.yaml file and place it under .github/workflows in the repo root directory; a minimal workflow is also sketched after this list.
  2. In GitHub, create a PAT (Personal Access Token) with the repo and workflow scopes enabled: https://github.com/settings/tokens
  3. Keep the repo link and PAT safe for the next step
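If you don't have the demo file handy, something like this works as a placeholder (the workflow name, triggers, and echo step are illustrative; the important line is runs-on, which must match your runner scale set installation name):
name: demo-self-hosted
on:
  workflow_dispatch:
  pull_request:

jobs:
  hello-from-eks:
    # Must match the INSTALLATION_NAME used when installing the runner scale set in step 5
    runs-on: arc-runner-set
    steps:
      - uses: actions/checkout@v4
      - name: Prove where we are running
        run: |
          echo "Hello from a self-hosted ARC runner"
          uname -a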
 
4- Use Helm to install the ARC controller (the operator that manages the runners)
NAMESPACE="arc-systems"
helm install arc \
  --namespace "${NAMESPACE}" \
  --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
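As a quick sanity check before moving on, you can list the pods in the controller namespace (the exact pod name will vary by chart version); you should see a single controller pod in the Running state:
kubectl get pods -n arc-systems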
 
5- Connect Kubernetes to your GitHub repo using the PAT and repo URL from step 3
INSTALLATION_NAME="arc-runner-set"
NAMESPACE="arc-runners"
GITHUB_CONFIG_URL="https://github.com/<your_enterprise/org/repo>"
GITHUB_PAT="<PAT token>"
helm install "${INSTALLATION_NAME}" \
  --namespace "${NAMESPACE}" \
  --create-namespace \
  --set githubConfigUrl="${GITHUB_CONFIG_URL}" \
  --set githubConfigSecret.github_token="${GITHUB_PAT}" \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
 
6- Open the Actions tab in your GitHub repo and run the workflow you just added
 
7- Quickly run kubectl get pods -n arc-runners in your terminal to watch a runner pod spin up and execute your workflow
 
8- Once it’s done, open the workflow run back in the GitHub UI and check the logs; the first step of the job shows which machine it ran on
 
And that’s it! Congrats, you now have EKS running your workflows. The next steps will help you configure specific changes based on various needs.
 
 
 
