Testing Terraform code with Go and Terratest
by Pedro Santos
April 18, 2022
As a cloud engineer, I love Terraform. With Terraform, I don’t have to worry about keeping track of infrastructure changes or computing the dependencies between components. Terraform is also cloud-agnostic, so all the Terraform knowledge I’ve accrued over the years can quickly transfer between cloud providers and even into Kubernetes clusters.
While Terraform protects the user against many common mistakes, errors still creep in. One error I’ve encountered many times is a network security group misconfiguration that prevented VMs from communicating inside a VNet. The Terraform code was syntactically correct but did not work as intended.
Given a spec, it’s valuable to check whether your infrastructure-as-code (IaC) works as expected. For example, you’d like to make sure you can reach your public-facing web servers from the internet, but not your databases. Or perhaps you want to ensure your network security rules do not expose SSH/RDP ports to the internet.
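A rule like “no SSH/RDP from the internet” is easy to express as code once you have the deployed rules in hand. The sketch below is illustrative only: it assumes a simplified, hypothetical SecurityRule struct (not an Azure SDK type) with a single destination port range per rule, treats only "*", "Internet" and "0.0.0.0/0" as “the internet”, and ignores rule priority, so a higher-priority Deny would not clear an Allow here.

```go
package main

import (
	"strconv"
	"strings"
)

// SecurityRule is a simplified, hypothetical stand-in for a network security
// rule. It is not an Azure SDK type.
type SecurityRule struct {
	Name                 string
	Direction            string // "Inbound" or "Outbound"
	Access               string // "Allow" or "Deny"
	SourceAddressPrefix  string // e.g. "Internet", "*", "10.0.0.0/8"
	DestinationPortRange string // e.g. "22", "3000-4000", "*"
}

// fromInternet treats a few common source values as "anywhere on the internet".
func fromInternet(prefix string) bool {
	switch prefix {
	case "*", "Internet", "0.0.0.0/0":
		return true
	}
	return false
}

// portInRange reports whether port falls inside a range such as "22",
// "3000-4000" or "*". Unparseable ranges count as a match so they get flagged.
func portInRange(port int, portRange string) bool {
	if portRange == "*" {
		return true
	}
	if lo, hi, found := strings.Cut(portRange, "-"); found {
		l, errLo := strconv.Atoi(lo)
		h, errHi := strconv.Atoi(hi)
		if errLo != nil || errHi != nil {
			return true
		}
		return l <= port && port <= h
	}
	p, err := strconv.Atoi(portRange)
	if err != nil {
		return true
	}
	return p == port
}

// exposedAdminPorts returns the names of inbound Allow rules that open
// SSH (22) or RDP (3389) to the internet.
func exposedAdminPorts(rules []SecurityRule) []string {
	var offending []string
	for _, r := range rules {
		if r.Direction != "Inbound" || r.Access != "Allow" || !fromInternet(r.SourceAddressPrefix) {
			continue
		}
		if portInRange(22, r.DestinationPortRange) || portInRange(3389, r.DestinationPortRange) {
			offending = append(offending, r.Name)
		}
	}
	return offending
}
```

In a real test, the rules would come from the cloud’s SDK after the deployment, and a non-empty result would fail the test.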
This blog post will walk you through a methodology for testable IaC and provide an example implementation to get you started.
Terratest
I’ve chosen Terratest to define my IaC tests. In essence, it wraps Terraform’s CLI in a Go API. Terratest allows you to programmatically pass inputs to Terraform as variables and retrieve outputs from the Terraform state.
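Conceptually, that wrapping has two halves: turning a map of inputs into -var flags for the Terraform CLI, and decoding the JSON printed by terraform output -json. The following stdlib-only sketch illustrates both. It is not Terratest’s source, the function names are hypothetical, and it only handles simple scalar variables.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// varArgs turns a map of Terraform variables into "-var key=value"
// command-line arguments. Keys are sorted so the result is deterministic.
// Only scalar values are handled correctly.
func varArgs(vars map[string]interface{}) []string {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	args := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		args = append(args, "-var", fmt.Sprintf("%s=%v", k, vars[k]))
	}
	return args
}

// tfOutput mirrors one entry of `terraform output -json`.
type tfOutput struct {
	Sensitive bool            `json:"sensitive"`
	Type      json.RawMessage `json:"type"` // "string" or a nested type description
	Value     interface{}     `json:"value"`
}

// stringOutputs decodes `terraform output -json` and keeps the string-valued outputs.
func stringOutputs(raw []byte) (map[string]string, error) {
	var outputs map[string]tfOutput
	if err := json.Unmarshal(raw, &outputs); err != nil {
		return nil, err
	}
	result := make(map[string]string, len(outputs))
	for name, out := range outputs {
		if s, ok := out.Value.(string); ok {
			result[name] = s
		}
	}
	return result, nil
}
```

In practice you would not write this yourself: terraform.Options.Vars and terraform.Output cover both halves, which is what the example test relies on.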
It’s important to note that this approach is not exclusive to Go. For example, pytest-terraform implements similar functionality in Python.
Overall, it’s up to the team to choose which tool to use for testing infrastructure. The advice in this blog post still applies regardless of the library used or the cloud provider targeted.
High-level description of testable IaC
The testing framework will follow this script:
- Define a test fixture in Terraform that references the code under test.
- Use Terratest to pass inputs to the test fixture.
- Use Terratest to init and apply the test fixture to a lab subscription.
- Use Terratest to retrieve the outputs from the test fixture.
- Use the Terraform outputs to retrieve the actual behavior of the deployed resources. You can retrieve the cloud resources either with the cloud’s SDK or by connecting directly to compute resources, for example, by SSH’ing into a VM.
- Verify that the actual behavior matches the expected behavior.
This methodology has some similarities to unit testing, but these tests behave more like integration or end-to-end tests. They tend to run for upwards of 30 minutes, create real cloud resources, and are susceptible to cloud errors and network issues. Due to the long deployment times, you might not be able to test every combination of inputs. You must carefully choose which tests to run and maximize the number of tests running in parallel.
Example
The following is an example deployment with some tests. By no means does it represent a real workload, nor does it follow Go/Terraform best practices. It is, however, a helpful template to get you started. We’ll use Azure for this example, but the approach works with any cloud provider.
Consider the task of implementing a network solution for a generic workload with the following constraints:
- The network must be in the West Europe region.
- The network must be in the CIDR block 10.20.0.0/16.
- The network must be inside a new resource group.
- The resource group’s name must be in the format ${WORKLOAD}-rg, where WORKLOAD is a user input.
Deciding what and how to test is an art in itself. For this example, we will focus on testing whether we meet the requirements.
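One way to keep tests tied to the spec is to derive expected values from the spec’s inputs instead of scattering literals through the test. The helpers below are a hypothetical sketch. normalizeAzureLocation assumes Azure reports locations as the lowercase display name with spaces removed (for example "westeurope"), which holds for common public regions but is a simplification.

```go
package main

import "strings"

// expectedResourceGroupName renders the spec's ${WORKLOAD}-rg format.
func expectedResourceGroupName(workload string) string {
	return workload + "-rg"
}

// normalizeAzureLocation converts a display name such as "West Europe" into
// the short form Azure APIs return: lowercase, spaces removed.
func normalizeAzureLocation(displayName string) string {
	return strings.ToLower(strings.ReplaceAll(displayName, " ", ""))
}
```

With these, an assertion can read assert.Equal(t, normalizeAzureLocation("West Europe"), *resourceGroup.Location), which states the requirement in the spec’s own terms.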
We’ll organize the code in the following structure:
example_network_module
├── src
│   └── main.tf
└── test
    └── spec_test.go
According to the spec, we defined src/main.tf as follows:
# Pin the provider in required_providers; the version argument inside
# provider blocks is deprecated since Terraform 0.13.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.20.0"
    }
  }
}

provider "azurerm" {
  features {}
}

data "azurerm_subscription" "current" {}

variable "workload_name" {
  type = string
}

resource "azurerm_resource_group" "rg" {
  name     = "${var.workload_name}-rg"
  location = "West Europe"
}

resource "azurerm_virtual_network" "vnet" {
  name                = "${var.workload_name}-vnet"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  address_space       = ["10.20.0.0/16"]
}

output "resource_group_name" {
  value = azurerm_resource_group.rg.name
}

output "virtual_network_name" {
  value = azurerm_virtual_network.vnet.name
}

output "subscription_id" {
  value = data.azurerm_subscription.current.subscription_id
}
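So far, so good. The code creates a resource group with the appropriate naming scheme in the right region. Inside that resource group, it also creates a virtual network with the required CIDR block.
Next, we write the test in test/spec_test.go. It applies the module, reads its outputs, and queries Azure through the Azure SDK for Go to compare the deployed resources against the spec: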
package test

import (
	"context"
	"testing"

	"github.com/Azure/azure-sdk-for-go/services/network/mgmt/2020-03-01/network"
	"github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2015-11-01/resources"
	"github.com/gruntwork-io/terratest/modules/azure"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestSpecs(t *testing.T) {
	t.Parallel()

	// Point Terratest at the module and pass the workload name as a variable.
	tfOpts := &terraform.Options{
		TerraformDir: "../src",
		Vars: map[string]interface{}{
			"workload_name": "terratest",
		},
	}

	// Registered before apply so resources are destroyed even if a check fails.
	defer terraform.Destroy(t, tfOpts)
	terraform.InitAndApply(t, tfOpts)

	// Read the module outputs.
	resourceGroupName := terraform.Output(t, tfOpts, "resource_group_name")
	virtualNetworkName := terraform.Output(t, tfOpts, "virtual_network_name")
	subscriptionID := terraform.Output(t, tfOpts, "subscription_id")

	// Test naming scheme: ${WORKLOAD}-rg
	assert.Equal(t, "terratest-rg", resourceGroupName)

	authorizer, err := azure.NewAuthorizer()
	require.NoError(t, err, "Cannot create authorizer")

	// Test location
	resourceGroupClient := resources.NewGroupsClient(subscriptionID)
	resourceGroupClient.Authorizer = *authorizer
	resourceGroup, err := resourceGroupClient.Get(context.Background(), resourceGroupName)
	require.NoError(t, err, "Cannot get resource group")
	assert.Equal(t, "westeurope", *resourceGroup.Location, "Location must be West Europe")

	// Test network CIDR block
	virtualNetworkClient := network.NewVirtualNetworksClient(subscriptionID)
	virtualNetworkClient.Authorizer = *authorizer
	virtualNetwork, err := virtualNetworkClient.Get(
		context.Background(), resourceGroupName, virtualNetworkName, "",
	)
	require.NoError(t, err, "Cannot get network")
	prefixes := *virtualNetwork.AddressSpace.AddressPrefixes
	require.NotEmpty(t, prefixes, "Network must have an address space")
	assert.Equal(t, "10.20.0.0/16", prefixes[0], "Network must be in the 10.20.0.0/16 block")
}
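The test compares only the first address prefix as a string, so an extra prefix on the network would go unnoticed. A slightly stricter check, sketched here with the standard library’s net.ParseCIDR (checkAddressSpace is a hypothetical helper, not part of Terratest or the Azure SDK), requires exactly one prefix and compares the parsed network and mask:

```go
package main

import (
	"fmt"
	"net"
)

// checkAddressSpace verifies that a virtual network declares exactly one
// address prefix and that it equals the expected CIDR block.
func checkAddressSpace(prefixes []string, want string) error {
	if len(prefixes) != 1 {
		return fmt.Errorf("expected exactly one address prefix, got %d", len(prefixes))
	}
	_, wantNet, err := net.ParseCIDR(want)
	if err != nil {
		return fmt.Errorf("invalid expected CIDR %q: %w", want, err)
	}
	ip, gotNet, err := net.ParseCIDR(prefixes[0])
	if err != nil {
		return fmt.Errorf("invalid address prefix %q: %w", prefixes[0], err)
	}
	// Reject prefixes such as 10.20.0.5/16 whose host bits are set.
	if !ip.Equal(gotNet.IP) {
		return fmt.Errorf("address prefix %q has host bits set", prefixes[0])
	}
	if gotNet.String() != wantNet.String() {
		return fmt.Errorf("address prefix %s, want %s", gotNet, wantNet)
	}
	return nil
}
```

Inside TestSpecs, it could replace the CIDR assertion with require.NoError(t, checkAddressSpace(*virtualNetwork.AddressSpace.AddressPrefixes, "10.20.0.0/16")).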
Depending on the cloud provider, you may need to authenticate your user account first. On Azure, the easiest way is to install the az CLI and log in with the command:
$ az login
Make sure the subscription you’ll use is a non-production one. On Azure, you can set the default subscription with the following command:
$ az account set --subscription "My-test-subscription"
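Because a mistyped subscription name can send a test deployment somewhere it shouldn’t go, it can be worth failing fast before terraform apply. A minimal sketch, assuming the active subscription ID is obtained beforehand (for example from az account show --query id --output tsv) and that the allowlist comes from a hypothetical TEST_SUBSCRIPTION_IDS environment variable; guardSubscription is likewise a hypothetical helper:

```go
package main

import (
	"fmt"
	"strings"
)

// guardSubscription returns an error unless subscriptionID appears in the
// comma-separated allowlist. Subscription IDs are GUIDs, so the comparison
// ignores case.
func guardSubscription(subscriptionID, allowlist string) error {
	for _, allowed := range strings.Split(allowlist, ",") {
		if a := strings.TrimSpace(allowed); a != "" && strings.EqualFold(a, subscriptionID) {
			return nil
		}
	}
	return fmt.Errorf("subscription %q is not in the test allowlist", subscriptionID)
}
```

Called with require.NoError before terraform.InitAndApply, it would stop the test before any resource is created.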
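In the test folder, install the Go dependencies (this assumes the folder already has a go.mod file; if it doesn’t, create one first with go mod init followed by a module path):
# -t also installs test dependencies; -v enables verbose output
$ go get -t -v ./...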
Finally, run the tests with:
$ go test -timeout 30m -parallel 10
and confirm that our implementation satisfies the requirements.
I’ve hosted the complete example here.
Conclusion
As with application code, writing testable infrastructure leads to simpler, more modular code. Good tests catch a broader class of infrastructure errors and give the infrastructure team (and their stakeholders) more confidence when deploying IaC.
Unfortunately, testable infrastructure has its disadvantages. Adding a testing methodology means the infrastructure team now has to know an additional programming language on top of Terraform. Testable infrastructure is also harder to change and takes longer to develop. In the example above, adding tests doubled the number of lines of code.
I believe that, in most situations, the tradeoff favors having at least a few tests. If your team wants to embrace testable IaC, discuss the choice with your stakeholders and stress the need to balance development speed against quality of service.
Happy coding!