Effective Terraform and Terragrunt Practices for Cloud Infrastructure Management on GCP
To effectively manage cloud infrastructure on GCP using Terraform and Terragrunt, it’s crucial to adopt a structured and well-defined approach. Employing modularity, version control, and variable usage streamlines infrastructure provisioning and configuration. Additionally, leveraging remote state storage facilitates collaboration and consistency among team members. By adhering to these practices, organizations can achieve efficient and scalable cloud infrastructure management on GCP.
This blog is a collaboration between Rahul Kumar Singh and Rohan Singh, a Senior Cloud Infrastructure Engineer at SADA with extensive experience in Google Cloud, DevOps, infrastructure, and automation. He is also a Google Cloud Champion Innovator for Modern Architecture and actively contributes to the Google Cloud community. We plan to write more blogs in the future; right now we are working together on a company project as well as collaborating on technical blogs.
Terraform
The best way, I repeat, the best way to codify your infrastructure is via Terraform. Terraform allows us to manage and provision infrastructure via declarative syntax and maintains its state. This means that you can describe your desired infrastructure state, and Terraform will take care of provisioning and managing the resources to match that state.
That said, there are pain points that we as infrastructure engineers face when provisioning and managing multiple infra environments, their corresponding states, and many copies of similar resource types. Running multiple Terraform commands against different environments simultaneously is not possible because of state locking, and there is no way to parameterize the remote backend configuration (so keeping it DRY is hard).
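To make the backend pain point concrete, here is an illustrative (intentionally invalid) snippet: Terraform backend blocks cannot reference variables or locals, which is what forces near-identical backend configuration to be copied into every environment:
#illustrative example: NOT valid Terraform, backend blocks cannot use variables
terraform {
  backend "gcs" {
    bucket = var.state_bucket   #error: variables are not allowed here
    prefix = var.environment    #so each environment ends up with its own hard-coded copy
  }
}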
Terragrunt
To save our day, Gruntwork.io engineers designed Terragrunt, a thin wrapper around Terraform that helps keep our configurations DRY, work with multiple Terraform modules, and manage remote state.
What does Thin Wrapper mean???
Let’s understand it this way: Terragrunt by itself can never deploy anything; it needs Terraform code to manage and work with resources. In simple English, Terragrunt complements Terraform. Terragrunt acts as an external module (similar to a Terraform parent module) that refers to external repositories.
Think of them this way:
Terraform: Think of a recipe book that provides detailed instructions on preparing a specific dish.
Terragrunt: Imagine a collection of recipe cards, each representing a step in the cooking process. You can combine these cards to create different recipes or modify them for specific dietary needs.
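To make the analogy concrete, below is a minimal, illustrative terragrunt.hcl; the repository URL, ref, and inputs are placeholders rather than a real module:
#example minimal terragrunt.hcl (source URL, ref, and inputs are placeholders)
terraform {
  source = "git::ssh://git@github.com/your-org/terraform-gce-module.git?ref=v1.0.0"
}

#Terragrunt itself deploys nothing; it only feeds inputs to the Terraform module it points at
inputs = {
  instance_name = "demo-vm"
}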
Note: We assume that the reader has an understanding of Terraform and Terragrunt.
This article will walk you through a standardized approach to implementing Terraform and Terragrunt together.
Let’s get started
- First and foremost, before you start any Terragrunt activities, ensure you are referring to a standardized Terraform module, either the public Google Cloud one or your own.
- If your organization has compliance requirements and code standards to follow, the recommendation is to create and manage your own Terraform modules like this one. You can take inspiration from the Google Cloud modules and tweak them as needed when you want something they don't offer out of the box.
- Design your own logic in locals.tf, for example passing one variable value that sets the name of the VM instance, the custom service account, the public IP of the VM, additional disks, etc., with suffixes or prefixes.
#Example Terraform locals.tf
locals {
  external_ip   = var.create_external_ip == true ? google_compute_address.gce_static_ip[0].address : null
  instance_name = var.instance_name
  sa_id         = format("%s-sa", var.instance_name)
  device_name   = format("%s-%s", var.instance_name, "disk")
  network_tags  = tolist(toset(var.network_tags))
  region        = data.google_client_config.google_client.region
  zone          = format("%s-%s", local.region, var.zone)
}
- Keep your code as dynamic as you can without overcomplicating it (it should cater to different desires/requirements); a minimal sketch of what this can look like follows this list.
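Here is that sketch: create_external_ip and instance_name mirror the locals.tf snippet above, while the descriptions and defaults are illustrative:
#example variables.tf snippet (illustrative; variable names mirror the locals.tf above)
variable "create_external_ip" {
  description = "Whether to reserve a static external IP for the instance"
  type        = bool
  default     = false
}

variable "network_tags" {
  description = "Optional network tags to attach to the instance"
  type        = list(string)
  default     = []
}

#the matching conditional resource, only created when the toggle is set
#var.instance_name and local.region are assumed to be declared elsewhere in the module
resource "google_compute_address" "gce_static_ip" {
  count  = var.create_external_ip ? 1 : 0
  name   = format("%s-ip", var.instance_name)
  region = local.region
}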
How should your terragrunt directory look?
- Start with environmentally segregated folders like dev, qa, preprod, and prod. This allows better isolation of resources and state per environment.
- Create a dedicated directory outside the environment folders that consolidates all of the providers (including third parties), remote state, and other generic configuration files. This keeps the important configuration files in a central location that all of the aforementioned environments refer to without duplicating files, hence following the DRY (Don't Repeat Yourself) principle.
The below picture will be referenced for the next couple of points:
- Zooming in on one of the environments, let's take the prod env as an example. Within an env, you can segregate further, meaning multiple cloud providers like AWS and GCP can have their infra code under the same env folder. This is for those who have a hybrid setup; if you don't, let's assume you only have one cloud provider, say GCP. [Refer to number 2 in the above picture]
- Within the GCP folder for any environment, you can organize various subfolders based on the associated projects. If you don't require project-specific state storage and prefer centralized storage instead, maintain a single folder and work within it. [Refer to number 3 in the above picture]
- We would recommend segregating state per project for ease of maintaining resources per project. The reason for this recommendation is to avoid mixing your network components code with infra code (GCE, GKE, GCS, etc.). [Refer to number 4 in the above picture]
- Further, you can have region-wise segregation of your resources for better maintenance and understanding. [Refer to number 5 in the above picture]
- Finally, you can have the main resources (GCE, GKE, GCS, etc.) folders here with their respective terragrunt.hcl files. If you need the same resource multiple times, you can have various subfolders within the main resource folder. [Refer to number 6 in the above picture] A sample layout tying all of this together is sketched below.
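Here is that illustrative layout; the folder and file names are examples that line up with the snippets later in this article, so adapt them to your own setup:
.
├── terragrunt.hcl                  #root configuration: remote state, merged inputs
├── snippets/                       #shared provider and other generic configuration snippets
│   ├── gcp.hcl
│   └── hashicorp-vault.hcl
├── common-to-all-env.hcl           #labels/tags common to every environment
├── dev/
├── qa/
├── preprod/
└── prod/
    ├── environment-specific.hcl    #env = "prod"
    └── gcp/                        #one folder per cloud provider (hybrid setups add aws/ etc.)
        ├── cloud-provider.hcl
        ├── prj-odin-shared-vpc/
        │   ├── project.hcl
        │   └── us-central1/
        │       ├── region.hcl
        │       └── vpc/
        │           └── network/
        │               └── terragrunt.hcl
        └── prj-loki/
            ├── project.hcl
            └── us-central1/
                ├── region.hcl
                └── gce/
                    └── my-vm/
                        └── terragrunt.hcl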
Apart from these directories, we also need some supporting .hcl files at various levels. Let's see some examples.
- Let's say you want standard common tags and labels (for example, provisioned-by = terragrunt) on all resources irrespective of environment; you can have one file at the environments level (above the individual env folders) where you pass all such labels and tags. The cloudsdk values are also passed from here and can be utilized by the different clouds.
#example common-to-all-env.hcl file
locals {
  cloudsdk = {
    global_config_dir = run_cmd("--terragrunt-quiet", "gcloud", "info", "--format", "value(config.paths.global_config_dir)")
  }
  labels = {
    provisioned-by = "tf-tg"
    owned-by       = "infra-team"
  }
}
- Now that we have common tags and labels, an env-specific file is also required. For that, you can have one file within each env folder containing env-specific values, for example env = "prod". This file's content can also be utilized if you need the env value in your resource nomenclature.
#example environment-specific.hcl file
locals {
  env = "prod"
  labels = {
    env = "prod"
  }
}
- For setting the value of the cloud provider (for example, provider = "GCP"), which will be referenced eventually, you can have this code in a standalone file in the specific cloud directory.
#example cloud-provider.hcl file
locals {
  provider = "GCP"
}
- To centralize the value of your GCP project, you can have this set in a standalone project.hcl file located in each GCP project folder.
#example gcp-project.hcl
locals {
  gcp_project_id        = "prj-loki"
  network_name          = "odin-shared-vpc"
  shared_vpc_project_id = "prj-odin-shared-vpc"
}
- To centralize the value of the region, you can have this set in a standalone region.hcl file located in each region folder.
#example gcp-region.hcl file
locals {
  gcp_region = "us-central1"
}
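Before moving on, here is an illustrative sketch of how a root terragrunt.hcl can stitch these supporting files together: it reads them with read_terragrunt_config, merges their locals into inputs (which the demo below consumes as include.root.inputs), and configures a single GCS bucket for remote state with a separate prefix per folder. The bucket name is a placeholder and the merge logic is only one possible convention:
#illustrative root terragrunt.hcl (file names match the examples above; bucket name is a placeholder)
locals {
  common  = read_terragrunt_config(find_in_parent_folders("common-to-all-env.hcl"))
  env     = read_terragrunt_config(find_in_parent_folders("environment-specific.hcl"))
  cloud   = read_terragrunt_config(find_in_parent_folders("cloud-provider.hcl"))
  project = read_terragrunt_config(find_in_parent_folders("project.hcl"))
  region  = read_terragrunt_config(find_in_parent_folders("region.hcl"))
}

#everything merged here becomes available to child configs as include.root.inputs
inputs = merge(
  local.common.locals,
  local.env.locals,
  local.cloud.locals,
  local.project.locals,
  local.region.locals,
  {
    #combine common and env-specific labels instead of letting one overwrite the other
    labels = merge(local.common.locals.labels, local.env.locals.labels)
  },
)

#one GCS bucket for all state, with a unique prefix per terragrunt folder
remote_state {
  backend = "gcs"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket = "your-terraform-state-bucket"   #placeholder
    prefix = path_relative_to_include()
  }
}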
Below is demo code for creating a GCE instance via Terragrunt and Terraform. One-line explanations are written alongside the code in the comments.
#source repo link of the terraform module, could be a public repo or private repo
terraform {
  source = "git::ssh://git@github.com/enter/terraform/gce/module.git//folder/if/any?ref=v1.1.1"
}

#dependency of VPC network, required for GCE instance. Notice the way we are passing the value.
dependency "vpc" {
  config_path = "../../../../../${include.root.inputs.shared_vpc_project_id}/${include.root.inputs.gcp_region}/vpc/network"
  mock_outputs = {
    network = {
      name = "dummy-network-name"
    }
  }
  mock_outputs_allowed_terraform_commands = ["validate", "init", "plan"]
}

#dependency of hashicorp vault, coming from the configuration folder
include "hashicorp-vault" {
  path   = "${dirname(find_in_parent_folders())}/snippets/hashicorp-vault.hcl"
  expose = true
}

#GCP (and gcp beta) provider dependency, remote state and other values
include "gcp" {
  path = "${dirname(find_in_parent_folders())}/snippets/gcp.hcl"
}
#If you have the following folder structure, and the following contents for ./child/terragrunt.hcl, this will include
# and merge the items in the terragrunt.hcl file at the root.
#
# .
# ├── terragrunt.hcl
# └── child
# └── terragrunt.hcl
include "root" {
path = find_in_parent_folders()
expose = true
}
#values; notice most of the values we refer to come from the locals of the supporting files, so the base terragrunt.hcl of a given resource stays the same across environment folders
inputs = {
  gcp_project_id        = include.root.inputs.gcp_project_id
  network_name          = dependency.vpc.outputs.network.name
  vpc_subnetwork_name   = "projects/${include.root.inputs.shared_vpc_project_id}/regions/${include.root.inputs.gcp_region}/subnetworks/odin-subnetwork"
  environment           = include.root.inputs.env
  region                = include.root.inputs.gcp_region
  instance_name         = local.name
  boot_disk_device_name = local.name
  labels = merge(
    {
      component = "${local.name}-gce"
    },
    include.root.inputs.labels,
  )
}

#the folder name is reused as the instance name
locals {
  name = basename(get_terragrunt_dir())
}
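Once the structure and supporting files are in place, you run the usual workflow from the folder of a specific resource's terragrunt.hcl (terragrunt init, terragrunt plan, terragrunt apply); depending on your Terragrunt version, you can also run terragrunt run-all plan from an environment folder to plan every module underneath it in one pass.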
Outro
This is purely subjective and based on our best understanding. We have curated this list from our experiences with different categories of clients over the years and from discussions with numerous teams of engineers. The goal of this article is to help people understand the importance of following standard practices for infrastructure from the preliminary stage. It is important to note that this article is not exhaustive and your specific needs may vary.
Connect with us on LinkedIn: Rahul Kumar Singh and Rohan Singh