Building a GitOps Process for Your Cloud Infrastructure

Provisioning and managing cloud resources presents security challenges. Manually configuring these services is error-prone and leads to security risks. Managing your infrastructure becomes even harder as your team grows. Without a proper infrastructure development process, your team’s pace in meeting infrastructure requirements will slow to a crawl.


Limits of AWS Web Console

Imagine you are creating and managing resources in AWS. Using the web console, you can quickly create and configure these resources. You document your infrastructure in a diagram, updating it as your infrastructure needs change. Initially, using the web console and documenting the results works well enough. But as your needs grow and change over time, and the documentation gets neglected, your infrastructure becomes harder and harder to manage. This process is slow, error-prone, and unsustainable. It becomes clear that a better process is needed to meet the growing needs of your organization.

Manual configuration with web console



Provision Cloud Resources with Terraform

Infrastructure as Code (IaC) enables provisioning of resources through code instead of manual configurations. You decide to use Terraform, an IaC tool. Now your provisioned resources are clearly defined in a set of configuration files. These files become the source of truth. Terraform is declarative, so you simply declare the services you need and it handles how to get those services up and running. The configuration files act as documentation of the provisioned services. 

But in a growing team with varying Terraform skills, developers still occasionally configure resources through the web console. These manual changes aren’t captured in the Terraform configuration files, resulting in infrastructure drift. Because the configuration files no longer match what is actually deployed, the manual changes cause deployment stoppages until they are remediated in the Terraform configuration files.

Meanwhile, coordinating the Terraform state file among developers who are making infrastructure changes at the same time becomes unmanageable. Neither passing the state file around nor storing it in the repository is an option, since the state file contains secrets. A better process is needed.

Manual Terraform management mixed with web console changes



GitOps: IaC with Software Development Best Practices

Your team needs a process that upholds Terraform as the source of truth and securely manages the state file. So your team decides to implement GitOps. GitOps is a set of practices that applies version control, code review, and automated continuous integration and deployment to your IaC configuration files. Following the GitHub Flow process, developers create pull requests for Terraform changes, and those changes are validated and reviewed before they are integrated into the main codebase. This collaborative review process adds a layer of security to infrastructure changes. It has the added benefit of upskilling developers on Terraform. With Git, the team also keeps a record of changes made to the infrastructure, which makes it easy to see and revert changes.

Using GitHub Actions, a CI/CD automation tool, your team defines a workflow that triggers on pull requests. The workflow includes steps that validate the syntax of the Terraform files and generate a speculative plan for each pull request. Changes are ready for review only if these validation checks pass.
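As a rough sketch of what such a pull request workflow could look like, the file below runs a format check, syntax validation, and a speculative plan. The file name, the `main` branch, the region, and the `AWS_PLAN_ROLE_ARN` secret are assumptions made for illustration, as is the choice to authenticate to AWS with OIDC. Your own workflow may be organized differently.

```yaml
# .github/workflows/terraform-pr.yml (hypothetical file name)
name: Terraform pull request checks

on:
  pull_request:
    branches: [main]   # assumes the main branch is called "main"

permissions:
  id-token: write      # lets the job request an OIDC token for AWS
  contents: read

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Assumption: a read-only IAM role exists for planning and its ARN
      # is stored in a repository secret named AWS_PLAN_ROLE_ARN.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_PLAN_ROLE_ARN }}
          aws-region: us-west-2   # hypothetical region

      - uses: hashicorp/setup-terraform@v3

      # Fails if any file is not in canonical format
      - name: Check formatting
        run: terraform fmt -check -recursive

      - name: Initialize
        run: terraform init -input=false

      - name: Validate syntax
        run: terraform validate

      # Shows what would change without applying anything
      - name: Speculative plan
        run: terraform plan -input=false
```

Because each step must succeed before the next one runs, a formatting or syntax problem stops the job before a plan is attempted.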

To give GitHub Actions access to the Terraform state file, your team migrates the state file to AWS. To serialize access to it, you implement a locking mechanism in line with Terraform best practices.
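One way this could look in a workflow is sketched below. The text above only says that the state file moves to AWS, so the S3 backend, the DynamoDB lock table, and every bucket, key, table, and region value here are illustrative assumptions. The sketch also assumes the Terraform configuration declares an empty `backend "s3" {}` block so that the settings can be supplied at init time.

```yaml
# Fragment of a job's steps (all values are hypothetical)
steps:
  - name: Initialize with remote state and locking
    run: |
      terraform init -input=false \
        -backend-config="bucket=example-terraform-state" \
        -backend-config="key=infra/terraform.tfstate" \
        -backend-config="region=us-west-2" \
        -backend-config="dynamodb_table=example-terraform-locks" \
        -backend-config="encrypt=true"
```

With a lock in place, a second plan or apply that starts while another holds the lock waits or fails instead of corrupting the state. Recent Terraform releases also offer a lock file stored in S3 itself, so check the S3 backend documentation for the version you run.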

Terraform with GitHub Flow.  State file in AWS.



Checkov Ensures Cloud Security

But the complexity of AWS services still leaves room for security risks from misconfiguration. With the many services that AWS offers, it’s hard to keep track of the best practices for each one. So you start investigating tools that help identify these vulnerabilities. Checkov, an open-source static analysis tool, can scan your Terraform configuration files. It has over a thousand community-sourced policies covering common misconfigurations across various cloud service providers. Your team decides to incorporate it into your GitOps process as a validation check on pull requests, catching potential security issues before they are merged into the main codebase and deployed to the infrastructure.
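A minimal sketch of such a check, written as a job in a pull request workflow, might look like the following. The job name and the choice to install Checkov with pip are assumptions. The scan reads the Terraform files statically, so this job does not need AWS credentials.

```yaml
# Fragment of a pull request workflow (job name is hypothetical)
jobs:
  checkov:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"

      - name: Install Checkov
        run: pip install checkov

      # Scans the repository's Terraform files; a failed policy check
      # gives a non-zero exit code, which fails the job.
      - name: Scan Terraform configuration
        run: checkov -d . --framework terraform --compact
```

Running the scan as its own job keeps its results separate from the Terraform validation steps in the pull request checks.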

To secure the infrastructure and prevent future drift, your team locks down IAM roles. Developers have limited write and delete access to AWS resources. Special privileges for manual configuration are used only in emergencies, and those manual changes are quickly remediated in the Terraform configuration to prevent deployment stoppages.

Checkov scan with GitHub Action Workflow


Your team decides to allow only manual deployments, since infrastructure changes can be highly disruptive. To deploy changes, your team defines a separate GitHub Actions workflow that can only be triggered manually. Your team uses environments to define deployment targets. Environments allow you to configure secrets and protection rules. With protection rules, you can require reviewers for deployment jobs, adding another layer of security before changes are deployed to your infrastructure.
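A manually triggered deployment workflow along these lines could be sketched as follows. The file name, region, and `AWS_DEPLOY_ROLE_ARN` secret are hypothetical. The required reviewers are not declared in the file itself: they are configured on the environment in the repository settings, and the job waits for approval because it references that environment.

```yaml
# .github/workflows/terraform-deploy.yml (hypothetical file name)
name: Terraform deploy

on:
  workflow_dispatch:   # runs only when someone starts it manually

permissions:
  id-token: write
  contents: read

jobs:
  apply:
    runs-on: ubuntu-latest
    # The job pauses here until the environment's protection rules
    # (for example, required reviewers) are satisfied.
    environment: production
    steps:
      - uses: actions/checkout@v4

      # Assumption: the deploy role ARN is stored as a secret.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: us-west-2   # hypothetical region

      - uses: hashicorp/setup-terraform@v3

      - name: Initialize
        run: terraform init -input=false

      # -auto-approve skips Terraform's interactive prompt; the human
      # approval happens at the environment gate instead.
      - name: Apply
        run: terraform apply -input=false -auto-approve
```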

Manual deployment with required reviewers.

A GitOps Workflow Example 

As a demo of a GitOps workflow using Terraform and GitHub Actions, here is an example repo that provisions a simple S3 bucket in AWS. On a pull request, it scans changes with Checkov, then checks the format, validates the syntax, and generates a speculative plan from the Terraform configuration files.

In the example, Checkov recommends restricting public access to the bucket and adding versioning, logging, encryption, and lifecycle settings. The rest of the Terraform configuration addresses some of these issues, while a Checkov config file skips some of the checks as a demonstration.
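For illustration only, a Checkov config file that skips checks might look like the sketch below. This is not the example repo's actual file. The two check IDs are given as best recalled for the logging and lifecycle policies, so confirm them against the IDs printed in your own scan output before relying on them.

```yaml
# .checkov.yaml (hypothetical contents)
directory:
  - .
framework:
  - terraform
skip-check:
  - CKV_AWS_18    # S3 access logging (verify this ID in your scan output)
  - CKV2_AWS_61   # S3 lifecycle configuration (verify this ID as well)
```

Checkov picks up a `.checkov.yaml` file automatically, or you can point to one with `--config-file`. Keeping skipped checks in a reviewed file makes each exception visible in pull requests.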

A separate GitHub Actions workflow for manual deployment is located here. This workflow targets a defined `production` environment that includes a list of approvers as part of its deployment protection rule.


GitOps Organizes Cloud Infrastructure 

Adopting a solid GitOps framework with Infrastructure as Code improves your team’s ability to securely manage your cloud infrastructure. The provisioned resources are defined in code. Source control provides auditability of changes. Changes go through reviews and validations. Tooling catches common configuration pitfalls. Deployments occur through manual triggers and approvals. And IAM policies restrict access to resources and prevent infrastructure drift. Embracing this framework gives your team comprehensive control over its cloud infrastructure.


Lab Zero is a San Francisco-based product team helping startups and Fortune 100 companies build flexible, modern, and secure solutions.