Terraform is one of the most popular Infrastructure as Code (IaC) tools on the market. It gives DevOps engineers an excellent way to automate a wide range of infrastructure tasks.
Provisioning infrastructure via code brings a high level of stability, consistency, and speed to building or changing infrastructure. It also gives better visibility of the changes being applied.
There are plenty of benefits of using Terraform for your IaC configuration, for example:
- Provision infrastructure across different cloud providers with minimal adjustments. That simplifies the management of large-scale and multi-cloud infrastructure.
- Efficiently deploy, release, scale, and monitor infrastructure for multi-tier applications. N-tier application architecture lets you scale application components independently and provides a separation of concerns.
- In a large organization, Terraform can be a strong solution for repetitive infrastructure requests. You can use Terraform to build a “self-serve” infrastructure model that lets product teams manage their own infrastructure independently. You can create and use Terraform modules that codify the standards for deploying and managing services in your organization, allowing teams to efficiently deploy services in compliance with your organization’s practices. Terraform Cloud can also integrate with ticketing systems like ServiceNow to automatically generate new infrastructure requests.
- Enforce policies on the types of resources teams can provision and use. You can use Sentinel, a policy-as-code framework, to automatically enforce compliance and governance policies before Terraform makes infrastructure changes.
- Interact with Software Defined Networks (SDNs) to automatically configure the network according to the needs of the applications running in it. This reduces deployment times by removing the delays of a ticket-based workflow.
For example, when a service registers with HashiCorp Consul, Consul-Terraform-Sync can automatically generate Terraform configuration to expose appropriate ports and adjust network settings for any SDN that has an associated Terraform provider. Network Infrastructure Automation (NIA) allows you to safely approve the changes that your applications require without having to manually translate tickets from developers into the changes you think their applications need.
- Kubernetes is an open-source workload scheduler for containerized applications. Terraform lets you both deploy a Kubernetes cluster and manage its resources (e.g., pods, deployments, and services). You can also use the Kubernetes Operator for Terraform to manage cloud and on-prem infrastructure through a Kubernetes Custom Resource Definition (CRD) and Terraform Cloud.
- Rapidly spin up and decommission infrastructure for development, test, QA, and production. Using Terraform to create disposable environments as needed is more cost-efficient than maintaining each one indefinitely.
- Create, provision, and bootstrap a demo on various cloud providers. This lets end users easily try the software on their own infrastructure and even enables them to adjust parameters like cluster size to more rigorously test tools at any scale.
As an example of a multi-tier application, consider a pool of web servers that use a database tier, with additional tiers for API servers, caching servers, and routing meshes. Terraform allows you to manage the resources in each tier together and automatically handles dependencies between tiers. For instance, Terraform will deploy the database tier before provisioning the web servers that depend on it.
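The dependency handling described here can be sketched with two resources. This is a minimal illustration, assuming the AWS provider is already configured; the resource names, the `web_ami_id` and `db_password` variables, and all sizes are hypothetical placeholders rather than anything taken from a real project.

```hcl
# Hypothetical two-tier sketch: every name and value below is a placeholder.
variable "web_ami_id" {
  type        = string
  description = "AMI for the web servers (supply your own)."
}

variable "db_password" {
  type      = string
  sensitive = true
}

# Database tier
resource "aws_db_instance" "app_db" {
  identifier          = "app-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = var.db_password
  skip_final_snapshot = true
}

# Web tier
resource "aws_instance" "web" {
  count         = 2
  ami           = var.web_ami_id
  instance_type = "t3.micro"

  # Referencing the database address creates an implicit dependency,
  # so Terraform creates the database before these instances.
  user_data = <<-EOT
    #!/bin/bash
    echo "DB_HOST=${aws_db_instance.app_db.address}" >> /etc/app.env
  EOT
}
```

No explicit ordering is declared anywhere. The reference to `aws_db_instance.app_db.address` is what tells Terraform which tier has to exist first.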
To get the best out of Terraform, it’s essential to follow best practices for managing your Terraform code.
Some time ago I came across the Terraform code for a big migration project to AWS. The code was structured as copies of the same code spread across too many subprojects. The hierarchy was split into separate modules for each runtime environment (e.g., prod, uat, dev) and, under each environment, split again per subproject, each holding very similar code. That code worked and provisioned the required infrastructure, but it was not easily maintainable. Unclean, hard-to-maintain Terraform code carries a very high risk for your infrastructure. Code in different subprojects can easily drift out of sync because each one is maintained independently. That can lead to accidental misconfiguration, or even destruction, of some or all of the provisioned services.
What is the solution then? Simply follow best practices. Keep the code tidy, clean, repeatable, source controlled, and safe. That allows engineers to read the code easily and spot issues quickly. It also allows global changes (e.g., tags, security groups) to be applied faster.
Let’s take a step back
The structure initially used for the project’s Terraform code was suitable for one module (i.e., one subproject). As more and more subprojects were added, many copies of the code were made with slight differences. Those differences were mainly in the values of variables.
For example, if you need to add a tag to all EC2 instances, you need to go through the code of dozens of subprojects and update each one.
In this example, each subproject includes a full set of code to provision services and configure them. The differences between subprojects are mainly a few tags that identify the project and different security groups.
That project ended up in that state because:
- The first module was created without thinking about the future state. With infrastructure code, it’s good practice to hold design sessions with the engineers involved and with technical stakeholders who have some idea of the roadmap of the project as a whole. That gives you a solid background on which to design the structure of the code.
- When the need for a restructure was recognized, it was left in the backlog as a future improvement so as not to delay delivery of the project. The result was a huge technical debt that was difficult to clear. Most engineering teams fall into this trap when they don’t follow best practices from the beginning.
- Engineers started filling gaps here and there as they noticed them rather than fixing the root cause. Continuing to fill gaps instead of considering a restructure is one of the worst practices. The earlier the restructure starts, the sooner your project reaches a better state.
The rule of thumb for such a project should always be “measure twice, cut once”. If time had been given to planning and designing the structure early in the process, it would have been much simpler to correct course before the team felt lost and exhausted.
How to resolve this?
You can resolve this with some changes to your process for creating and applying your code. Here are the most common best practices that successful engineering teams follow:
- Use Terraform Modules
Move the whole code framework into a source module and keep each subproject module limited to the custom service configuration it needs (e.g., instance type, EBS volume size). Your services are then built in a consistent way. When something has to change, you don’t need to go through every subproject to change the code. You make the change in the source module and then apply your code.
That also lets you ensure your infrastructure complies with your defined standards, e.g., standard tags, standard security groups, a specific EBS type, certain encryption keys, and key pairs for instances.
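As a rough sketch of this layout, the fragment below shows a shared source module that owns the resource definitions and the standard tags, and one subproject that only supplies its own values. The directory paths, the module name `app-stack`, the variable names, and the tag keys are all hypothetical, chosen only for illustration.

```hcl
# --- modules/app-stack/main.tf (hypothetical shared source module) ---
variable "project_name"    { type = string }
variable "environment"     { type = string }
variable "ami_id"          { type = string }
variable "instance_type"   { type = string }
variable "ebs_volume_size" { type = number }

locals {
  # Standards are codified once, here, for every subproject.
  standard_tags = {
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type
  tags          = local.standard_tags

  root_block_device {
    volume_size = var.ebs_volume_size
    volume_type = "gp3"
    encrypted   = true
  }
}

# --- envs/prod/billing/main.tf (hypothetical subproject) ---
module "billing_service" {
  source = "../../../modules/app-stack"

  project_name    = "billing"
  environment     = "prod"
  ami_id          = "ami-0123456789abcdef0" # placeholder
  instance_type   = "t3.large"
  ebs_volume_size = 100
}
```

With this shape, adding a tag to every instance is a one-line change to `standard_tags` in the source module instead of an edit in every subproject.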
- Separate Variables from Modules
When using Terraform modules, keep them as generic as possible for the desired structure. Variable types are defined within the module; keep the values in a separate var-file so you don’t need to touch the code every time a value changes.
As an example, take an application that consists of a pool of web servers that use a database tier, with additional tiers for API servers, caching servers, and routing meshes. The module then holds all the code needed to build that infrastructure. Any project that requires that infrastructure just uses the module as its source and provides custom values for the variables (e.g., instance type, disk size, security group to use).
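One way to picture this split is a variables file that declares names and types, next to a per-environment var-file that carries only values. The file names, variable names, and values below are hypothetical; the declarations are assumed to sit in the subproject’s root configuration, which passes them on to the shared module.

```hcl
# --- variables.tf (hypothetical): declarations and types stay with the code ---
variable "instance_type" {
  type        = string
  description = "EC2 instance type for the web tier."
}

variable "disk_size_gb" {
  type        = number
  description = "Root volume size in GiB."
  default     = 50
}

variable "security_group_ids" {
  type        = list(string)
  description = "Security groups attached to the instances."
}

# --- prod.tfvars (hypothetical): values only, no logic ---
instance_type      = "t3.large"
disk_size_gb       = 100
security_group_ids = ["sg-0123456789abcdef0"] # placeholder ID
```

The values are then supplied at run time with `terraform plan -var-file=prod.tfvars` (and likewise for `apply`), so switching environments means switching files rather than editing code.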
- Keep Terraform State Safe
Terraform state is the golden source for your provisioned services. It is critical to keep it safe and secure.
Keeping Terraform state in a safe backend with locking enabled is key to ensuring your code and services stay in sync (see Backends Configuration). With locking enabled for your state, any parallel application of Terraform code against your infrastructure is blocked. Code should be applied sequentially to avoid race conditions in which one run creates something and another destroys it.
A common bad practice is keeping Terraform state on a local disk. If that disk is lost for any reason, you will struggle to make further changes. You can still import your services back into Terraform, but that is a manual and error-prone process.
For example, you can configure your code to write the state to one of the supported backends (e.g., HashiCorp Consul).
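A minimal sketch of such a configuration for the Consul backend might look like the following. The address and the key path are placeholders, and the sketch assumes a reachable Consul cluster with credentials supplied outside the code (for example through the `CONSUL_HTTP_TOKEN` environment variable).

```hcl
terraform {
  backend "consul" {
    address = "consul.example.com:8500"       # placeholder address
    scheme  = "https"
    path    = "terraform/state/my-subproject" # placeholder KV path
    lock    = true                            # state locking (on by default)
  }
}
```

After adding or changing a backend block, `terraform init` has to be run again so Terraform can configure the backend and, if needed, migrate existing state into it.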
- Use Standard Format and Naming Convention
When code is badly formatted, it takes more focus to read and more time to understand and change. Well-formatted code gives you good readability, easier maintenance, and fewer errors.
Code format includes indentation, spacing, splitting code and variables into related sections, using the same delimiters, and so on.
Terraform comes with the “terraform fmt” command, which rewrites your code into a canonical format and style per the Terraform Style Conventions.
Terraform also comes with the “terraform validate” command, which checks for syntax errors and typos as well as internal consistency. It does not check the state or the existing infrastructure. It is meant for general verification of the code.
You may also consider using a linter (e.g., TFLint, available on GitHub as terraform-linters/tflint) to ensure your code is well formatted, follows best practices, and doesn’t rely on deprecated syntax. TFLint is a pluggable framework in which each feature is provided by a plugin. It finds possible errors (like illegal instance types) for the major cloud providers (AWS, Azure, GCP), warns about deprecated syntax and unused declarations, and enforces best practices and naming conventions.
That makes it easier to spot differences or code issues at an early stage.
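As an illustration of the linter setup, a TFLint configuration file could look roughly like this. It is a sketch only: the selection of rules is arbitrary, and the AWS ruleset version shown is an assumption that should be replaced with whichever release you actually pin.

```hcl
# .tflint.hcl (sketch; rule selection and plugin version are assumptions)

# Built-in Terraform language rules
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# Provider-specific checks, e.g. invalid instance types
plugin "aws" {
  enabled = true
  version = "0.30.0" # assumed version, pin your own
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Enforce a consistent naming convention for blocks
rule "terraform_naming_convention" {
  enabled = true
}

# Flag variables, locals, and data sources that are never used
rule "terraform_unused_declarations" {
  enabled = true
}
```

Running `tflint --init` downloads the configured plugins, after which `tflint` can run alongside `terraform fmt -check` and `terraform validate` in a pre-commit hook or CI job.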
- Code Reviews
Many engineers see code reviews as a bottleneck to getting things done, yet code reviews keep proving themselves an excellent part of the process.
- It is good practice to have another pair of eyes acknowledge the change or suggest a better solution.
- It is a great way of sharing knowledge among team members, so they can provide better support in the absence of the original contributor.
- It pushes developers to follow agreed coding practices throughout the sprint development phase. This standardizes the source code, making it easier for all developers (even new ones) to study and understand it.
- It encourages collaboration, prompting developers to discuss their code with each other and exchange ideas.
- It ensures that any misinterpretation of the scope or requirements is resolved as early as possible. This also helps ensure that teams do not miss critical features.
Try to have two reviewers for every change as standard. If you cannot, due to staff shortage or any other reason, aim for one but never zero.
Keep Security in Mind
Terraform is a tool that does what it is told to do. Above, we covered a few good practices for getting your code into better shape, which in turn adds stability to your provisioned infrastructure. That is not all it takes to have good infrastructure, though. Security is key to calling an infrastructure a good one. Imagine you provision a stack like the examples mentioned above but leave it open for the world to access. Your application can easily be compromised and your data stolen. You need to put security measures in place to avoid such a scenario.
Those measures need to be maintained and monitored with every new change to ensure no new risks are introduced.
As your infrastructure grows, it becomes more important to continuously scan it for vulnerabilities and loopholes to keep it safe and secure.
oak9 offers a solution that intelligently analyses and remediates security and compliance design gaps, automatically, as business and application context change. Visit How oak9 Works for more details of what oak9 offers. If you need to know where any potential security gaps are, take the Terraform security test or schedule a demo of oak9.