Setting up a Kubernetes Cluster in Amazon EKS using Terraform Nov 5 2022

If you check the AWS documentation, it uses eksctl to create the EKS cluster. eksctl uses CloudFormation under the hood, and even though I could fetch the generated template in the end, eksctl still feels like an imperative way of creating an EKS cluster. I prefer to keep track of all of my infrastructure as code, and using eksctl leaves an essential part of the infrastructure out of the codebase: the cluster itself.

I'll describe how to create a Kubernetes cluster in Amazon EKS using Terraform in this article.

Note: If you are interested in learning more about how to set up the directory structure for your Terraform project, you might find my guide, Meditations on Directory Structure for Terraform Projects, useful.

Create the EKS cluster

The aws_eks_cluster resource is the one that creates the EKS cluster. It is a simple resource. Its required fields are name, role_arn, and vpc_config.

With that information, we should be able to create the cluster. Setting up the aws_eks_cluster resource gives us a Kubernetes cluster, but we still need the infrastructure for the worker nodes. EKS manages the control plane, so once the cluster is created, AWS handles that part for us; we are still responsible for the worker nodes, which we'll set up later. Let's start by exploring what each of the aws_eks_cluster fields requires (except name, because naming is the hardest problem in programming and is out of this article's scope).
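To give a sense of what we're building toward, here is a stripped-down skeleton of the resource. The role and subnets it references are exactly what the rest of this article creates, and the complete version appears at the end:

resource "aws_eks_cluster" "main" {
  name     = "eks-cluster"
  role_arn = aws_iam_role.cluster.arn # IAM role, created in the next section

  vpc_config {
    # VPC, subnets, and security groups, created in the following sections
    subnet_ids = concat(values(aws_subnet.public)[*].id, values(aws_subnet.private)[*].id)
  }
}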

IAM role

A Kubernetes cluster needs to run on nodes. Those nodes are EC2 instances, and they need permission to run the Kubernetes components; they'll get their own IAM role later. The cluster itself also needs an IAM role: one that the EKS service can assume to manage AWS resources on our behalf.

The IAM role should allow the cluster to create those nodes (EC2 instances), create load balancers (Ingresses in k8s), manage auto scaling groups, etcetera. So we need to grant the EKS cluster permission to create, modify, and update all the resources it needs. The good thing is that Amazon already provides a managed policy that handles the basics, arn:aws:iam::aws:policy/AmazonEKSClusterPolicy. You can see how the policy is defined by logging into your AWS console and searching for it in the IAM section, or by using the following AWS CLI command:

aws iam get-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy

From there, you'll be able to see the DefaultVersionId. To view that version of the policy, use the following command (replace v5 with the DefaultVersionId you got):

aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy --version-id v5

Ok, so let's create the IAM role and attach the policy. We can do that using the aws_iam_role and aws_iam_role_policy_attachment resources.

# EKS Cluster IAM Role
resource "aws_iam_role" "cluster" {
  name = "eks-cluster-role"

  assume_role_policy = data.aws_iam_policy_document.eks_assume_role_policy.json
}

data "aws_iam_policy_document" "eks_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}

With the IAM role out of the way, let's now look at the vpc_config.

VPC configuration and subnets

The VPC configuration has more moving parts. There are specific requirements that the VPC needs to meet for the cluster to work properly. The requirements are specified in the AWS Documentation - Amazon EKS VPC and subnet requirements and considerations. I won't rewrite what is already explained in the documentation, but there are a few items worth noting:

- The subnets must span at least two Availability Zones.
- The VPC must have DNS resolution and DNS hostname support enabled; otherwise, the nodes can't register with the cluster.
- The recommended setup is a combination of public and private subnets: public subnets for internet-facing load balancers, and private subnets for the worker nodes.

Those are the main ones. Other considerations include having enough IP addresses, etcetera, so check the documentation.

Let's define the VPC and the subnets. We'll use the aws_vpc and aws_subnet resources.

I won't go into detail about how to plan the CIDR blocks and how to do your subnets, but I'll show you the schema I'll use:

VPC CIDR:
10.0.0.0/18
  public_subnets = {
    a = "10.0.0.0/22"
    b = "10.0.4.0/22"
  }

  private_subnets = {
    a = "10.0.8.0/22"
    b = "10.0.12.0/22"
  }

If you are looking for a subnet calculator, you can use mine: https://rdicidr.rderik.com/. The observant reader will notice that I like palindromes, and RDICIDR is one! It stands for RDerik Interactive CIDR.
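If you'd rather derive the subnet blocks from the VPC CIDR instead of hard-coding them, Terraform's built-in cidrsubnet() function can do the math. A small sketch (the local value names are my own):

locals {
  vpc_cidr = "10.0.0.0/18"

  # /18 plus 4 extra bits gives /22 subnets; consecutive netnums pick consecutive blocks
  public_subnets = {
    a = cidrsubnet(local.vpc_cidr, 4, 0) # 10.0.0.0/22
    b = cidrsubnet(local.vpc_cidr, 4, 1) # 10.0.4.0/22
  }

  private_subnets = {
    a = cidrsubnet(local.vpc_cidr, 4, 2) # 10.0.8.0/22
    b = cidrsubnet(local.vpc_cidr, 4, 3) # 10.0.12.0/22
  }
}

In the definitions below, I'll keep the CIDR blocks hard-coded so they are easy to read.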

data "aws_region" "current" {}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/18"

  # EKS requirements
  # The VPC must have DNS hostname and DNS resolution support. Otherwise, nodes can't register to your cluster.
  # https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html
  enable_dns_hostnames = true
  enable_dns_support   = true
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_default_route_table" "public" {
  default_route_table_id = aws_vpc.main.default_route_table_id


  route = [
    {
      cidr_block = "0.0.0.0/0"
      gateway_id = aws_internet_gateway.main.id

      # these seem to be required due to an AWS provider bug
      carrier_gateway_id         = ""
      destination_prefix_list_id = ""
      egress_only_gateway_id     = ""
      instance_id                = ""
      ipv6_cidr_block            = ""
      local_gateway_id           = ""
      nat_gateway_id             = ""
      network_interface_id       = ""
      transit_gateway_id         = ""
      vpc_endpoint_id            = ""
      vpc_peering_connection_id  = ""
    }
  ]
}

resource "aws_subnet" "public" {
  for_each = {
    a = "10.0.0.0/22"
    b = "10.0.4.0/22"
  }

  vpc_id                  = aws_vpc.main.id
  availability_zone       = "${data.aws_region.current.name}${each.key}"
  cidr_block              = each.value
  map_public_ip_on_launch = true
}

resource "aws_route_table_association" "public" {
  for_each       = aws_subnet.public
  subnet_id      = each.value.id
  route_table_id = aws_vpc.main.default_route_table_id
}

resource "aws_eip" "main" {
  for_each = aws_subnet.public
  vpc      = true
}

resource "aws_nat_gateway" "main" {
  for_each      = aws_subnet.public
  allocation_id = aws_eip.main[each.key].id
  subnet_id     = each.value.id
}


resource "aws_subnet" "private" {
  for_each = {
    a = "10.0.8.0/22"
    b = "10.0.12.0/22"
  }

  vpc_id            = aws_vpc.main.id
  availability_zone = "${data.aws_region.current.name}${each.key}"
  cidr_block        = each.value
}

resource "aws_route_table" "private" {
  for_each = aws_nat_gateway.main
  vpc_id   = aws_vpc.main.id

  route = [
    {
      cidr_block     = "0.0.0.0/0"
      nat_gateway_id = each.value.id

      # these seem to be required due to an AWS provider bug
      carrier_gateway_id         = ""
      destination_prefix_list_id = ""
      egress_only_gateway_id     = ""
      gateway_id                 = ""
      instance_id                = ""
      ipv6_cidr_block            = ""
      local_gateway_id           = ""
      network_interface_id       = ""
      transit_gateway_id         = ""
      vpc_endpoint_id            = ""
      vpc_peering_connection_id  = ""
    }
  ]
}

resource "aws_route_table_association" "private" {
  for_each       = aws_subnet.private
  subnet_id      = each.value.id
  route_table_id = aws_route_table.private[each.key].id
}

We now have a VPC with two public and two private subnets. The public subnets have a route to the internet through an internet gateway, and the private subnets have a route to the internet through a NAT gateway.
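One thing the subnets above don't have yet is tags. If you want load balancers to discover the subnets automatically (for example, through the AWS Load Balancer controller we'll grant permissions for later), the usual convention is to tag them. A sketch of the tags, assuming the cluster name eks-cluster that we'll use later in the article:

# Added to aws_subnet.public
tags = {
  "kubernetes.io/cluster/eks-cluster" = "shared" # cluster name defined later in the article
  "kubernetes.io/role/elb"            = "1"      # internet-facing load balancers
}

# Added to aws_subnet.private
tags = {
  "kubernetes.io/cluster/eks-cluster" = "shared"
  "kubernetes.io/role/internal-elb"   = "1"      # internal load balancers
}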

EKS Cluster security group

For the VPC configuration, we will need to provide the security groups for the EKS cluster. We'll use the aws_security_group resource. You will notice that the following security group definition references aws_security_group.eks_nodes.id, which hasn't been created yet. We'll create it later when we define the nodes; for now, assume it exists.

# EKS Cluster Security Group
resource "aws_security_group" "eks_cluster" {
  name        = "eks-cluster-sg"
  description = "Cluster communication with worker nodes"
  vpc_id      = aws_vpc.main.id
}

resource "aws_security_group_rule" "cluster_inbound" {
  description              = "Allow worker nodes to communicate with the cluster API Server"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 443
  type                     = "ingress"
}

resource "aws_security_group_rule" "cluster_outbound" {
  description              = "Allow cluster API Server to communicate with the worker nodes"
  from_port                = 1024
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 65535
  type                     = "egress"
}

Now let's look at creating the nodes where the Kubernetes cluster will run.

Creating the worker nodes

We have a few different ways to set up the worker nodes. We could use two approaches:

- Self-managed nodes, where we create and manage the EC2 instances (and their Auto Scaling groups) ourselves.
- Managed node groups, where EKS provisions and manages the nodes for us.

We'll use managed node groups, because they are easier to manage and they are the approach AWS recommends.

Let's start by creating the IAM role that the nodes will use. We'll call it eks-node-role and attach the AmazonEKSWorkerNodePolicy and AmazonEKS_CNI_Policy policies to it.

# EKS Node IAM Role
resource "aws_iam_role" "node" {
  name = "eks-node-role"

  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role_policy.json
}

data "aws_iam_policy_document" "ec2_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

Now we'll attach the policies to the role.

resource "aws_iam_role_policy_attachment" "node_AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.node.name
}

resource "aws_iam_role_policy_attachment" "node_AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.node.name
}
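If your pods will pull container images from Amazon ECR, it's also worth attaching the AmazonEC2ContainerRegistryReadOnly managed policy to the node role; AWS lists it alongside the two policies above for node IAM roles:

resource "aws_iam_role_policy_attachment" "node_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.node.name
}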

If you plan to use the AWS Load Balancer controller and want to assign its policy directly to the nodes, we can do that now:

resource "aws_iam_policy" "alb_controller_policy" {
  name = "AlbControllerPolicy"
  # We are going to use the ALB controller implementation from the Kubernetes SIGs
  # the following policy is needed
  # Source: `curl -o alb_controller_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.4/docs/install/iam_policy.json`
  policy = file("${path.module}/alb_controller_policy.json")
}

resource "aws_iam_role_policy_attachment" "node_alb_controller_policy" {
  policy_arn = aws_iam_policy.alb_controller_policy.arn
  role       = aws_iam_role.node.name
}

Now we'll create the security group for the nodes. We'll call it eks-node-sg, and we'll allow traffic from the security group of the control plane.

# EKS Node Security Group
resource "aws_security_group" "eks_nodes" {
  name        = "eks-node-sg"
  description = "Security group for all nodes in the cluster"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group_rule" "nodes_internal" {
  description              = "Allow nodes to communicate with each other"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "nodes_cluster_inbound" {
  description              = "Allow worker Kubelets and pods to receive communication from the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_cluster.id
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "nodes_cluster_outbound" {
  description              = "Allow worker Kubelets and pods to communicate with the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_cluster.id
  to_port                  = 65535
  type                     = "egress"
}

And we finally have everything to create our EKS cluster.

Creating the EKS cluster

We'll create the cluster using the aws_eks_cluster resource. We'll call it eks-cluster and use everything we created before. But we also want the cluster to log control plane events to CloudWatch, so let's create the log group first:

resource "aws_cloudwatch_log_group" "cluster" {
  # Reference: https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
  # EKS sends control plane logs to the /aws/eks/<cluster-name>/cluster log group
  name              = "/aws/eks/eks-cluster/cluster"
  retention_in_days = 0
}

Ok, now we are ready to put everything together and set up the EKS cluster:

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name = "eks-cluster"

  # reference for log_types:
  # https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
  enabled_cluster_log_types = ["api", "audit"]

  role_arn = aws_iam_role.cluster.arn

  # reference for EKS versions:
  # https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
  version = "1.23"

  vpc_config {
    # Security Groups considerations reference:
    # https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
    security_group_ids      = [aws_security_group.eks_cluster.id, aws_security_group.eks_nodes.id]
    # aws_subnet.public/private use for_each, so they are maps; take values() before splatting
    subnet_ids              = concat(values(aws_subnet.public)[*].id, values(aws_subnet.private)[*].id)
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = ["0.0.0.0/0"]
  }

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_cloudwatch_log_group.cluster
  ]
}

Now we can happily terraform apply, and we'll have our EKS cluster up and running.
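We haven't created the node group itself yet; a minimal aws_eks_node_group sketch using the node role and private subnets we defined above could look like this (the group name, instance type, and scaling sizes are only illustrative):

resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "eks-node-group" # illustrative name
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = values(aws_subnet.private)[*].id

  instance_types = ["t3.medium"] # illustrative instance type

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  # Make sure the node role permissions exist before the nodes come up
  depends_on = [
    aws_iam_role_policy_attachment.node_AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node_AmazonEKS_CNI_Policy,
  ]
}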

Configuring kubectl to connect to the cluster

Now that we have our cluster up and running, we need to configure kubectl to connect to it. We can do that by running the following command:

aws eks --region us-east-1 update-kubeconfig --name eks-cluster

Modify the region and cluster name if you chose different ones.
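To verify that kubectl can reach the new cluster, you can run, for example:

kubectl cluster-info
kubectl get svc

You should see the control plane endpoint and the default kubernetes service. Keep in mind that kubectl get nodes will be empty until you add worker nodes to the cluster.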

Final thoughts

Using Terraform to create the EKS cluster seems more convoluted than just typing a couple of lines with eksctl, but it has the benefit of describing what was deployed and having it as part of our Infrastructure as Code.

I hope this article was helpful and shed some light on how to set up your initial EKS cluster.

** If you want to check what else I'm currently doing, be sure to follow me on twitter @rderik or subscribe to the newsletter. If you want to send me a direct message, you can send it to derik@rderik.com.