Setting up a Kubernetes Cluster in Amazon EKS using Terraform Nov 5 2022
If you check the AWS documentation, you'll see that it uses `eksctl` to create the EKS cluster. `eksctl` uses CloudFormation under the hood, and even though I could fetch the generated template in the end, `eksctl` feels like an imperative way of creating an EKS cluster. I prefer to keep track of all of my infrastructure as code, and using `eksctl` leaves an essential part of the infrastructure out of the codebase: the cluster itself.
In this article, I'll describe how to create a Kubernetes cluster in Amazon EKS using Terraform.
Note: If you are interested in learning more about how to set up the directory structure for your Terraform project, you might find my guide, Meditations on Directory Structure for Terraform Projects, useful.
Create the EKS cluster
The `aws_eks_cluster` resource is the one that creates the EKS cluster. It is a simple resource; its required arguments are `name`, `role_arn`, and `vpc_config`:
- `name` is the name of the cluster
- `role_arn` is the ARN of the IAM role that the cluster will use
- `vpc_config` includes the VPC ID and the subnets that the cluster will use
With that information, we should be able to create the cluster. In reality, if we set up the `aws_eks_cluster` resource, we'll have a Kubernetes cluster, but we still need to set up the infrastructure for the worker nodes. EKS manages the control plane, so once the cluster is created, AWS handles that side for us. We are still responsible for setting up the worker nodes, and we'll do that later. Let's start by exploring what each of the arguments of `aws_eks_cluster` requires (except `name`, because naming is the hardest part of programming and is out of this article's scope).
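To have a target in mind, here is a minimal skeleton of the resource we are working towards. Take it as a sketch only: the references it uses (`aws_iam_role.cluster`, `aws_subnet.private`) are defined as we go, and the complete version, with logging and security groups, appears at the end of the article.

# Minimal sketch of the aws_eks_cluster resource (placeholder references;
# the full definition is built up over the rest of the article)
resource "aws_eks_cluster" "main" {
  name     = "eks-cluster"
  role_arn = aws_iam_role.cluster.arn

  vpc_config {
    subnet_ids = values(aws_subnet.private)[*].id
  }
}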
IAM role
A Kubernetes cluster needs nodes to run on. In EKS, those nodes are EC2 instances, and the cluster needs permission to manage them and the other AWS resources it uses on our behalf. Those permissions are defined in an IAM role that the cluster assumes.
The IAM role should allow the cluster to create those nodes (EC2 instances), create load balancers (Ingresses in k8s), manage auto scaling groups, etcetera. So we need to grant the EKS cluster permission to create, modify, and update all the resources it needs. The good thing is that Amazon already provides a policy that handles the basics, `arn:aws:iam::aws:policy/AmazonEKSClusterPolicy`. You can see how the policy is defined by logging into your AWS console and searching for it in the IAM section, or by using the `aws` CLI:
aws iam get-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
From there, you'll be able to see the `DefaultVersionId`. To view that version of the policy (v5 at the time of writing), you can use the following command:
aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy --version-id v5
Ok, so let's create the IAM role and attach the policy. We can do that using the `aws_iam_role` and `aws_iam_role_policy_attachment` resources.
# EKS Cluster IAM Role
resource "aws_iam_role" "cluster" {
  name               = "eks-cluster-role"
  assume_role_policy = data.aws_iam_policy_document.eks_assume_role_policy.json
}

data "aws_iam_policy_document" "eks_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["eks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.cluster.name
}
With the IAM role out of the way, let's now look at the `vpc_config`.
VPC configuration and subnets
The VPC configuration has more moving parts. There are specific requirements that the VPC needs to meet for the cluster to work properly. The requirements are specified in the AWS Documentation - Amazon EKS VPC and subnet requirements and considerations. I won't rewrite what is already explained in the documentation, but there are a few items worth noting:
- The VPC must have at least two subnets that are in different availability zones.
- The VPC must have DNS hostname and DNS resolution support.
Those are the main ones. Other considerations include having enough IP addresses, etcetera, so check the documentation.
Let's define the VPC and the subnets. We'll use the `aws_vpc` and `aws_subnet` resources.
I won't go into detail about how to plan the CIDR blocks and how to do your subnets, but I'll show you the schema I'll use:
VPC CIDR: 10.0.0.0/18

public_subnets = {
  a = "10.0.0.0/22"
  b = "10.0.4.0/22"
}

private_subnets = {
  a = "10.0.8.0/22"
  b = "10.0.12.0/22"
}
If you are looking for a subnet calculator, you can use mine: https://rdicidr.rderik.com/. The observant reader will notice that I like palindromes, and RDICIDR is one! It stands for RDerik Interactive CIDR.
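If you'd rather let Terraform derive the blocks instead of hard-coding them, the built-in `cidrsubnet()` function can carve the /22s out of the /18. This is just an optional sketch; the code below keeps the hard-coded maps:

# Optional: derive the same /22 blocks from the VPC CIDR with cidrsubnet()
locals {
  vpc_cidr = "10.0.0.0/18"

  public_subnets = {
    a = cidrsubnet(local.vpc_cidr, 4, 0) # 10.0.0.0/22
    b = cidrsubnet(local.vpc_cidr, 4, 1) # 10.0.4.0/22
  }

  private_subnets = {
    a = cidrsubnet(local.vpc_cidr, 4, 2) # 10.0.8.0/22
    b = cidrsubnet(local.vpc_cidr, 4, 3) # 10.0.12.0/22
  }
}

With the addressing settled, here are the VPC and subnet definitions: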
data "aws_region" "current" {}
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/18"
# EKS requirements
# The VPC must have DNS hostname and DNS resolution support. Otherwise, nodes can't register to your cluster.
# https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html
enable_dns_hostnames = true
enable_dns_support = true
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
}
resource "aws_default_route_table" "public" {
default_route_table_id = aws_vpc.main.default_route_table_id
route = [
{
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
# these seem to be required due to an AWS provider bug
carrier_gateway_id = ""
destination_prefix_list_id = ""
egress_only_gateway_id = ""
instance_id = ""
ipv6_cidr_block = ""
local_gateway_id = ""
nat_gateway_id = ""
network_interface_id = ""
transit_gateway_id = ""
vpc_endpoint_id = ""
vpc_endpoint_id = ""
vpc_peering_connection_id = ""
}
]
}
resource "aws_subnet" "public" {
for_each = {
a = "10.0.0.0/22"
b = "10.0.4.0/22"
}
vpc_id = aws_vpc.main.id
availability_zone = "${data.aws_region.current.name}${each.key}"
cidr_block = each.value
map_public_ip_on_launch = true
}
resource "aws_route_table_association" "public" {
for_each = aws_subnet.public
subnet_id = each.value.id
route_table_id = aws_vpc.main.default_route_table_id
}
resource "aws_eip" "main" {
for_each = aws_subnet.public
vpc = true
}
resource "aws_nat_gateway" "main" {
for_each = aws_subnet.public
allocation_id = aws_eip.main[each.key].id
subnet_id = each.value.id
}
resource "aws_subnet" "private" {
for_each = {
a = "10.0.8.0/22"
b = "10.0.12.0/22"
}
vpc_id = aws_vpc.main.id
availability_zone = "${data.aws_region.current.name}${each.key}"
cidr_block = each.value
}
resource "aws_route_table" "private" {
for_each = aws_nat_gateway.main
vpc_id = aws_vpc.main.id
route = [
{
cidr_block = "0.0.0.0/0"
nat_gateway_id = each.value.id
# these seem to be required due to an AWS provider bug
carrier_gateway_id = ""
destination_prefix_list_id = ""
egress_only_gateway_id = ""
gateway_id = ""
instance_id = ""
ipv6_cidr_block = ""
local_gateway_id = ""
network_interface_id = ""
transit_gateway_id = ""
vpc_endpoint_id = ""
vpc_peering_connection_id = ""
}
]
}
resource "aws_route_table_association" "private" {
for_each = aws_subnet.private
subnet_id = each.value.id
route_table_id = aws_route_table.private[each.key].id
}
We now have a VPC with two public and two private subnets. The public subnets have a route to the internet through an internet gateway, and the private subnets have a route to the internet through a NAT gateway.
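One aside before moving on: if you end up creating load balancers from inside the cluster (for example, with the AWS Load Balancer controller we'll touch on later), AWS discovers candidate subnets through tags. Nothing else in this article depends on it, but as a sketch, you would merge tags like these into the `tags` of the public and private subnet resources above:

locals {
  # Tag keys used for load balancer subnet discovery (per the AWS docs);
  # merge these into the tags of aws_subnet.public and aws_subnet.private respectively.
  public_subnet_lb_tags  = { "kubernetes.io/role/elb" = "1" }
  private_subnet_lb_tags = { "kubernetes.io/role/internal-elb" = "1" }
}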
EKS Cluster security group
For the VPC configuration, we will need to provide the security groups for the EKS cluster. We'll use the `aws_security_group` resource. You will notice that the following security group definition references `aws_security_group.eks_nodes.id`, which hasn't been created yet; we'll create it later when we define the nodes. For now, assume that it'll be defined further down.
# EKS Cluster Security Group
resource "aws_security_group" "eks_cluster" {
  name        = "eks-cluster-sg"
  description = "Cluster communication with worker nodes"
  vpc_id      = aws_vpc.main.id
}

resource "aws_security_group_rule" "cluster_inbound" {
  description              = "Allow worker nodes to communicate with the cluster API Server"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 443
  type                     = "ingress"
}

resource "aws_security_group_rule" "cluster_outbound" {
  description              = "Allow cluster API Server to communicate with the worker nodes"
  from_port                = 1024
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 65535
  type                     = "egress"
}
Now let's look at creating the nodes where the Kubernetes cluster will run.
Creating the worker nodes
There are two approaches we could use to set up the worker nodes:
- Self-managed nodes - We would need to manage the nodes ourselves using Auto Scaling Groups and all that it entails.
- Managed node groups - "Amazon EKS managed node groups automate the provisioning and lifecycle management of nodes (Amazon EC2 instances) for Amazon EKS Kubernetes clusters." Amazon EKS nodes
We'll use managed node groups, because they are easier to manage and the recommended way to go.
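For orientation, here is a rough sketch of what the managed node group will eventually look like. Don't apply it yet: it references the node IAM role, the private subnets, and the cluster that we define below, and values like the instance type and the scaling numbers are just placeholders.

# Sketch only: a managed node group wired to resources defined later in the article
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "eks-nodes"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = values(aws_subnet.private)[*].id

  instance_types = ["t3.medium"] # placeholder

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }

  # The role needs its policies attached before EKS can use it
  depends_on = [
    aws_iam_role_policy_attachment.node_AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node_AmazonEKS_CNI_Policy,
  ]
}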
Let's start by creating the IAM role that the nodes will use. We'll call it `eks-node-role` and attach the `AmazonEKSWorkerNodePolicy` and `AmazonEKS_CNI_Policy` policies to it.
# EKS Node IAM Role
resource "aws_iam_role" "node" {
  name               = "eks-node-role"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role_policy.json
}

data "aws_iam_policy_document" "ec2_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}
Now we'll attach the policies to the role.
resource "aws_iam_role_policy_attachment" "node_AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.node.name
}
resource "aws_iam_role_policy_attachment" "node_AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.node.name
}
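The AWS documentation for the node IAM role also lists the `AmazonEC2ContainerRegistryReadOnly` policy, which lets the nodes pull container images from Amazon ECR. If that applies to your setup (it usually does), attaching it looks the same as the other two:

# Lets worker nodes pull container images from Amazon ECR
resource "aws_iam_role_policy_attachment" "node_AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.node.name
}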
If you plan to use the AWS Load Balancer controller and want to assign its policy directly to the nodes, we can do that now:
resource "aws_iam_policy" "alb_controller_policy" {
name = "AlbControllerPolicy"
# We are going to use the ALB controller implementation from the Kubernetes SIGs
# the following policy is needed
# Source: `curl -o alb_controller_policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.4.4/docs/install/iam_policy.json`
policy = file("${path.module}/alb_controller_policy.json")
}
resource "aws_iam_role_policy_attachment" "node_alb_controller_policy" {
policy_arn = aws_iam_policy.alb_controller_policy.arn
role = aws_iam_role.node.name
}
Now we'll create the security group for the nodes. We'll call it `eks-node-sg`, and we'll allow traffic from the security group of the control plane.
# EKS Node Security Group
resource "aws_security_group" "eks_nodes" {
  name        = "eks-node-sg"
  description = "Security group for all nodes in the cluster"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group_rule" "nodes_internal" {
  description              = "Allow nodes to communicate with each other"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "nodes_cluster_inbound" {
  description              = "Allow worker Kubelets and pods to receive communication from the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_cluster.id
  to_port                  = 65535
  type                     = "ingress"
}

resource "aws_security_group_rule" "nodes_cluster_outbound" {
  description              = "Allow worker Kubelets and pods to communicate with the cluster control plane"
  from_port                = 1025
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_cluster.id
  to_port                  = 65535
  type                     = "egress"
}
And we finally have everything to create our EKS cluster.
Creating the EKS cluster
We'll create the cluster using the `aws_eks_cluster` resource. We'll call it `eks-cluster` and use everything we created before. We also want the cluster to log events to CloudWatch, so let's create the log group first:
resource "aws_cloudwatch_log_group" "cluster" {
# Reference: https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
name = "/eks/eks-cluster/cluster"
retention_in_days = 0
}
Ok, now we are ready to put everything together and set up the EKS cluster:
# EKS Cluster
resource "aws_eks_cluster" "main" {
  name = "eks-cluster"

  # reference for log_types:
  # https://docs.aws.amazon.com/eks/latest/userguide/control-plane-logs.html
  enabled_cluster_log_types = ["api", "audit"]

  role_arn = aws_iam_role.cluster.arn

  # reference for EKS versions:
  # https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html
  version = "1.23"

  vpc_config {
    # Security Groups considerations reference:
    # https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html
    security_group_ids = [aws_security_group.eks_cluster.id, aws_security_group.eks_nodes.id]

    # The subnet resources use for_each, so take their values before splatting
    subnet_ids = concat(values(aws_subnet.public)[*].id, values(aws_subnet.private)[*].id)

    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = ["0.0.0.0/0"]
  }

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_cloudwatch_log_group.cluster
  ]
}
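Optionally, you can export a few cluster attributes as outputs; they come in handy when configuring `kubectl` or other tooling later. A small sketch (the output names are just suggestions):

output "cluster_name" {
  value = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

output "cluster_certificate_authority" {
  value = aws_eks_cluster.main.certificate_authority[0].data
}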
Now we can happily run `terraform apply`, and we'll have our EKS cluster up and running.
Configuring kubectl to connect to the cluster
Now that we have our cluster up and running, we need to configure `kubectl` to connect to it. We can do that by running the following command:
aws eks --region us-east-1 update-kubeconfig --name eks-cluster
Modify the region and cluster name if you chose different ones.
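If you also want Terraform itself to talk to the cluster (for example, to manage Kubernetes objects from the same codebase), you can configure the Terraform `kubernetes` provider from the cluster's attributes instead of relying on the kubeconfig file. This is only a sketch, and whether to manage Kubernetes objects from the same Terraform configuration is a design decision with trade-offs:

# Short-lived authentication token for the cluster
data "aws_eks_cluster_auth" "main" {
  name = aws_eks_cluster.main.name
}

provider "kubernetes" {
  host                   = aws_eks_cluster.main.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.main.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.main.token
}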
Final thoughts
Using Terraform to create the EKS cluster seems more convoluted than just typing a couple of lines with `eksctl`, but it has the benefit of describing what was deployed and having it as part of our Infrastructure as Code.
I hope this article was helpful and shed some light on how to set up your initial EKS cluster.
References
- AWS Documentation - Amazon EKS VPC and subnet requirements and considerations
- AWS Load Balancer controller
- AWS Documentation - EKS create cluster
- Amazon - EKS nodes
- CIDR Calculator - https://rdicidr.rderik.com/
- Terraform resource - `aws_vpc`
- Terraform resource - `aws_iam_role` and `aws_iam_role_policy_attachment`
- Terraform resource - `aws_security_group`
- Terraform resource - `aws_eks_cluster`
- Terraform resource - `aws_subnet`