Refactoring Terraform into Modules Safely
I worked on an older Terraform project that managed a full EKS stack: VPC, cluster, node groups, storage, and around ten Kubernetes add-ons. The whole stack lived in one main.tf file with more than a thousand lines.
It worked. It was also hard to change. Adding one add-on meant scrolling through networking, IAM, EKS, and Helm resources just to find the right place.
.
├── main.tf # 1000+ lines: VPC, EKS, add-ons, storage
├── variables.tf
├── outputs.tf
├── providers.tf
├── backend.tf
└── vars.tfvars
I wanted modules, but I did not want Terraform to recreate a live VPC with an EKS cluster attached to it. The refactor had to move code and state together.
The Risk
Terraform tracks resources by state address.
# before
aws_vpc.main
# after
module.vpc.aws_vpc.main
Those two addresses point to the same real VPC, but Terraform does not know that unless you tell it. If you move the resource block into a module and run terraform apply, Terraform plans to destroy the resource at the old address and create a new one at the new address, because it sees one address disappear and an unrelated one appear.
For an EKS foundation, that plan tears down the base layer.
Use terraform state mv to rename the state address without touching the real infrastructure:
- Copy resources into a module.
- Add the module call in the root.
- Move each old state address to its new module address.
- Run terraform plan.
- Continue only when the plan says no changes.
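Because terraform state mv rewrites state, a copy taken before the first move gives you a way back if a move goes wrong. The steps above do not include a backup; treat this as an optional precaution. The sketch below assumes a hypothetical helper named backup_state and a timestamped file name, both chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow): save the current
# state to a timestamped local file before running any state moves.
# `terraform state pull` prints the current state to stdout.
backup_state() {
  local dir="${1:-.}" file
  file="${dir}/state-backup-$(date +%Y%m%d-%H%M%S).tfstate"
  terraform state pull > "$file" || return 1
  # Refuse to continue with an empty backup file.
  [ -s "$file" ] || { echo "backup is empty: $file" >&2; return 1; }
  echo "$file"
}

# Usage (run from the root module directory):
#   backup_state /path/to/backups
```

The backup contains everything in state, including any sensitive values, so keep it out of version control.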
Pick Module Boundaries
Start with lifecycle and dependencies, not folder names.
Group one concern. VPC resources belong together: aws_vpc, subnets, route tables, internet gateway, NAT gateway, and EIPs.
Keep shared lifecycle together. NAT gateways and their EIPs usually move as one unit.
Cut dependencies in one direction. Sketch the module flow:
module "vpc" -> module "eks" -> module "addons"
# vpc_id subnet_ids cluster_name
# subnet_ids oidc_arn oidc_arn
Each arrow should map to an output on the left and an input on the right. If two groups need values from each other, they probably belong in the same module.
For this stack, I split the code into:
- vpc: networking resources
- eks: cluster, node groups, OIDC, cluster IAM
- addons: controllers, monitoring add-ons, Kubernetes manifests
- storage: storage classes, EBS/EFS-related resources
Audit State First
Before writing modules, list the current state:
terraform state list
Every managed resource needs a destination address. Build a mapping table before you run any state mv.
For the VPC module, the mapping looked like this:
| Old address | New address |
|---|---|
| aws_vpc.main | module.vpc.aws_vpc.main |
| aws_internet_gateway.igw[0] | module.vpc.aws_internet_gateway.igw[0] |
| aws_eip.eip[0] | module.vpc.aws_eip.eip[0] |
| aws_nat_gateway.nat[0] | module.vpc.aws_nat_gateway.nat[0] |
| aws_subnet.private[0] | module.vpc.aws_subnet.private[0] |
| aws_subnet.private[1] | module.vpc.aws_subnet.private[1] |
| aws_subnet.private[2] | module.vpc.aws_subnet.private[2] |
| aws_subnet.public[0] | module.vpc.aws_subnet.public[0] |
| aws_subnet.public[1] | module.vpc.aws_subnet.public[1] |
| aws_subnet.public[2] | module.vpc.aws_subnet.public[2] |
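A mapping like the one above can be drafted from the state list instead of typed by hand. The sketch below assumes a hypothetical helper named map_to_module and a tab-separated output format; neither comes from the original project. It reads addresses on stdin and prints an old/new pair for each root-level address whose resource type is in the list you pass. Data sources and addresses already inside a module are skipped, because their first segment is data or module rather than a resource type.

```shell
#!/bin/bash
# Hypothetical helper: print "old<TAB>new" pairs for root-level resources.
# Usage: terraform state list | map_to_module <module-name> <type> [<type>...]
map_to_module() {
  local module="$1"; shift
  local addr type wanted
  while IFS= read -r addr; do
    type="${addr%%.*}"   # resource type is the text before the first dot
    for wanted in "$@"; do
      if [ "$type" = "$wanted" ]; then
        printf '%s\t%s\n' "$addr" "module.${module}.${addr}"
      fi
    done
  done
}

# Example (requires an initialized working directory):
#   terraform state list | map_to_module vpc aws_vpc aws_subnet aws_eip > vpc-moves.tsv
```

Review the generated file by hand before using it. It is only a draft of the table, and every managed resource still needs an explicit destination.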
Data sources (data.*) do not need state moves. Terraform reads them during planning. If several modules need the same data source, keep it in the root and pass values down as variables. If only one module uses it, put it inside that module.
Write One Module
Move one module at a time. Do not move VPC, EKS, add-ons, and storage in one commit.
Each module gets the same three files:
modules/vpc/
├── main.tf
├── variables.tf
└── outputs.tf
Copy the resource blocks exactly. Do not rename resources, change tags, clean up variables, or improve naming during the move. Save those changes for a later commit after the state move produces a clean plan.
Example VPC module:
# modules/vpc/variables.tf
variable "region" { type = string }
variable "env" { type = string }
variable "eks_name" { type = string }
variable "vpc_cidr_block" { type = string }
variable "private_subnets" { type = list(string) }
variable "public_subnets" { type = list(string) }
variable "enable_nat_gateway" {
  type    = bool
  default = true
}
variable "enable_dns_support" {
  type    = bool
  default = true
}
variable "enable_dns_hostnames" {
  type    = bool
  default = true
}
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr_block
  enable_dns_support   = var.enable_dns_support
  enable_dns_hostnames = var.enable_dns_hostnames
  tags = {
    Name = "${var.region}-${var.env}-${var.eks_name}-vpc"
  }
}

resource "aws_internet_gateway" "igw" {
  count  = 1
  vpc_id = aws_vpc.main.id
  tags   = { Name = "${var.env}-${var.eks_name}-igw" }
}

resource "aws_eip" "eip" {
  count  = var.enable_nat_gateway ? 1 : 0
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  count         = var.enable_nat_gateway ? 1 : 0
  allocation_id = aws_eip.eip[0].id
  subnet_id     = aws_subnet.public[0].id
  depends_on    = [aws_internet_gateway.igw[0]]
}

resource "aws_subnet" "private" {
  count             = length(var.private_subnets)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnets[count.index]
  availability_zone = "${var.region}${["a", "b", "c"][count.index]}"
  tags              = { Name = "${var.env}-${var.eks_name}-private-${count.index}" }
}

resource "aws_subnet" "public" {
  count             = length(var.public_subnets)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.public_subnets[count.index]
  availability_zone = "${var.region}${["a", "b", "c"][count.index]}"
  tags              = { Name = "${var.env}-${var.eks_name}-public-${count.index}" }
}
# modules/vpc/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

output "all_subnet_ids" {
  value = concat(
    aws_subnet.private[*].id,
    aws_subnet.public[*].id
  )
}
Declare Non-HashiCorp Providers
Terraform assumes providers come from the hashicorp namespace unless you say otherwise. If a module uses a provider like gavinbunney/kubectl, declare it inside the module.
# modules/addons/providers.tf
terraform {
  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = "~> 1.19.0"
    }
  }
}
Do not configure the provider inside the module. Keep region, credentials, cluster endpoint, and token settings in the root. The module declares what it needs, and the root passes the configured provider in.
module "addons" {
  source = "./modules/addons"
  providers = {
    kubectl = kubectl
  }
}
Update the Root Module
Replace the moved resource blocks with a module call. Then update references to use module outputs.
# main.tf
module "vpc" {
  source             = "./modules/vpc"
  region             = var.region
  env                = var.env
  eks_name           = var.eks_name
  vpc_cidr_block     = var.vpc_cidr_block
  private_subnets    = var.private_subnets
  public_subnets     = var.public_subnets
  enable_nat_gateway = var.enable_nat_gateway
}

module "eks" {
  source     = "./modules/eks"
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.all_subnet_ids
}

module "addons" {
  source = "./modules/addons"
  providers = {
    kubectl = kubectl
  }
  cluster_name      = module.eks.cluster_name
  oidc_provider_arn = module.eks.oidc_provider_arn
}
Run terraform init after adding module sources.
terraform init
Move the State
Move every old address to its new module address.
terraform state mv \
aws_vpc.main \
module.vpc.aws_vpc.main
terraform state mv \
'aws_subnet.private[0]' \
'module.vpc.aws_subnet.private[0]'
terraform state mv \
'aws_nat_gateway.nat[0]' \
'module.vpc.aws_nat_gateway.nat[0]'
Keep indexes and keys exactly as Terraform lists them.
aws_subnet.private[0]
module.vpc.aws_subnet.private[0]
For for_each resources with string keys, quote the full address:
terraform state mv \
'aws_iam_role_policy_attachment.this["AmazonEKSWorkerNodePolicy"]' \
'module.eks.aws_iam_role_policy_attachment.this["AmazonEKSWorkerNodePolicy"]'
For repeated resources, script the move. Add set -e so the script stops on the first failed move.
#!/bin/bash
set -e
# Anchor the pattern to root-level addresses so that entries already under
# module.vpc are skipped if the script is run again.
for key in $(terraform state list | grep '^aws_subnet\.private\['); do
  terraform state mv "$key" "module.vpc.${key}"
done
Run the script for one module, then plan before touching the next module.
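For a longer list of moves, it can help to preview every move before changing state. terraform state mv accepts a -dry-run flag that prints what would be moved without moving it. The sketch below assumes a hypothetical helper named move_pairs and a tab-separated input of old and new addresses, one pair per line; both are illustrative choices, not part of the original project.

```shell
#!/bin/bash
# Hypothetical helper: read "old<TAB>new" address pairs on stdin and move each.
# Pass -dry-run as the first argument to preview without changing state.
move_pairs() {
  local flag="${1:-}" old new
  while IFS=$'\t' read -r old new; do
    [ -n "$old" ] || continue   # skip blank lines
    # </dev/null keeps terraform from consuming the rest of the pair list.
    terraform state mv ${flag:+"$flag"} "$old" "$new" </dev/null || return 1
  done
}

# Usage, with a hypothetical pair file named vpc-moves.tsv:
#   move_pairs -dry-run < vpc-moves.tsv   # preview
#   move_pairs < vpc-moves.tsv            # move for real
```

Splitting only on tabs keeps for_each addresses with quoted keys intact, and the helper stops at the first failed move.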
Expect a No-Change Plan
After the state moves, run:
terraform plan
You want this output:
No changes. Your infrastructure matches the configuration.
If Terraform shows changes, stop. Do not apply. Find the resource with a diff and trace the changed input back to the module call or output. The common causes are:
- A tag changed during the copy.
- A default value differs between root and module variables.
- A list order changed.
- A reference now points to a different output.
- A provider alias did not get passed into the module.
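The no-change requirement can also be checked by exit code rather than by reading the output. With -detailed-exitcode, terraform plan exits 0 when there are no changes, 1 on error, and 2 when the plan contains changes. A minimal sketch, assuming a hypothetical gate function named require_clean_plan:

```shell
#!/bin/bash
# Hypothetical gate: return 0 only when terraform plan reports no changes.
# With -detailed-exitcode, plan exits 0 (no changes), 1 (error), or 2 (changes).
require_clean_plan() {
  local code=0
  terraform plan -detailed-exitcode -input=false || code=$?
  case "$code" in
    0) echo "plan is clean" ;;
    2) echo "plan shows changes: stop and do not apply" >&2; return 2 ;;
    *) echo "terraform plan failed with exit code ${code}" >&2; return 1 ;;
  esac
}

# Usage: commit the module move only when the gate passes.
#   require_clean_plan && git commit -am "Move VPC resources into module"
```

The commit message in the usage line is only an example. The plan output is left visible so that any diff can still be traced back to the module call or output that caused it.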
Once the plan is clean, commit the module move. Then repeat the same process for the next module.
The final structure looked like this:
.
├── main.tf # module calls only
├── variables.tf
├── outputs.tf
├── providers.tf
├── backend.tf
└── modules/
    ├── vpc/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    ├── eks/
    │   ├── main.tf
    │   ├── variables.tf
    │   └── outputs.tf
    └── addons/
        ├── main.tf
        ├── variables.tf
        ├── outputs.tf
        └── providers.tf
The folder layout helped, but the safety came from the process: move code and state together, one module at a time, and require a no-change plan before moving on.
