How we use Infrastructure as Code to manage Cloudflare at enterprise scale: shifting left

The Cloudflare platform is one of Cloudflare’s own most important systems. We are our own Customer Zero: we use our products to secure and enhance our own services. Within our security division, a dedicated Customer Zero team uses its unique position to provide a constant, high-fidelity feedback loop to product and engineering that drives continuous improvement of our products. And we do this at a global scale, where a single misconfiguration can propagate across our edge in seconds and lead to unintended consequences. If you have ever hesitated before pushing a change to production, sweating because you know that one small error could lock every employee out of a critical application or take down a production service, you know the feeling. The risk of unintended consequences is real, and we worry about it. This presents an interesting challenge: how do we ensure that hundreds of internal production Cloudflare accounts are secured consistently while minimizing human error?

Even though the Cloudflare dashboard is great for analytics and observability, manually clicking through hundreds of accounts to make sure their security settings are the same is a surefire way to make mistakes. We switched from treating our configurations as manual point-and-click tasks to treating them as code in order to preserve both our sanity and our security. To move security checks to the earliest stages of development, we adopted “shift left” principles. For us, this was not a lofty corporate objective. It was a survival mechanism to catch errors before they caused an incident, and it required a fundamental change in our governance architecture.

Contents

1 What it means to us to Shift Left
2 A production IaC operating model
3 Our company’s IaC stack
4 Policies and guidelines as code
5 Policy definition as code
6 Establishing the baseline

What it means to us to Shift Left

Moving validation steps earlier in the software development lifecycle (SDLC) is referred to as “shifting left.” In practice, this entails directly incorporating testing, security audits, and policy compliance checks into the CI/CD pipeline. Instead of discovering issues or misconfigurations after deployment, we identify them at the merge request stage, when the cost of remediation is lowest.

When we think about applying shift left at Cloudflare, four key principles stand out:

Consistency: It must be simple to copy and reuse configurations across accounts.
Scalability: Large changes must be quick to apply across multiple accounts.
Observability: Configurations must be auditable by anyone for current state, accuracy, and security.
Governance: Guardrails must be proactive, enforced before deployment to avoid incidents.

A production IaC operating model

All production accounts are now managed using Infrastructure as Code (IaC) to support this model. Every change is recorded and associated with a user, a commit, and an internal ticket. The dashboard is still used by teams for analytics and insights, but code is used to make important production changes. This model ensures that every change is peer-reviewed, and even though the security team sets policies, the owning engineering teams actually implement them. This setup is grounded in two major technologies: Terraform and a custom CI/CD pipeline.

Our company’s IaC stack

We chose Terraform for its mature open-source ecosystem, strong community support, and deep integration with Policy as Code tooling. Furthermore, using the Cloudflare Terraform Provider internally allows us to actively dogfood the experience and improve it for our customers.

To handle the scale of hundreds of accounts and approximately 30 merge requests per day, our CI/CD pipeline is based on Atlantis and integrated with GitLab. We also use a custom Go program, tfstate-butler, that acts as a broker to securely store state files.

Terraform uses tfstate-butler as an HTTP backend. Security was the primary motivation for the design: it ensures that each state file has its own unique encryption key to limit the blast radius of any potential compromise. A centralized monorepo defines all internal account configurations. Individual teams own and deploy their specific configurations and are the designated code owners for their sections of this centralized repository, ensuring accountability. Check out How Cloudflare uses Terraform to manage Cloudflare to learn more about this configuration.
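To make the blast-radius point concrete, here is a minimal Python sketch of the "one key per state file" idea. It is not tfstate-butler's code (which is written in Go and whose key handling is not described here); the `StateKeyStore` class, its in-memory dictionary, and the example paths are all hypothetical stand-ins for whatever secret storage a real broker would use.

```python
import secrets


class StateKeyStore:
    """Hypothetical in-memory store that hands out one key per state file."""

    def __init__(self) -> None:
        # Maps a state file path to its own randomly generated key.
        self._keys: dict[str, bytes] = {}

    def key_for(self, state_path: str) -> bytes:
        # Create a fresh 256-bit key the first time a path is seen,
        # then keep returning that same key for that path.
        if state_path not in self._keys:
            self._keys[state_path] = secrets.token_bytes(32)
        return self._keys[state_path]


store = StateKeyStore()
key_a = store.key_for("accounts/team-a/terraform.tfstate")  # hypothetical path
key_b = store.key_for("accounts/team-b/terraform.tfstate")  # hypothetical path
```

Because the two keys are generated independently, leaking `key_a` reveals nothing about the state encrypted under `key_b`, which is the property the design is after.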

Policies and guidelines as code

Establishing a solid security baseline for all internal production Cloudflare accounts is essential to the entire shift left strategy. The baseline is a set of security policies defined in code (Policy as Code). It is more than a set of guidelines: it is the required security configuration that we enforce across the platform, such as a maximum session length, mandatory logging, and specific WAF configurations. This is where policy enforcement shifts from manual audits to automated gates. We use the Open Policy Agent (OPA) framework and its policy language, Rego, through the Atlantis Conftest Policy Checking feature.

Policy definition as code

Rego policies specify the fundamental security requirements for all Cloudflare provider resources. We currently maintain about 50 policies. For example, one Rego policy validates that only @cloudflare.com emails are used in an access policy.
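The logic of such an email check can be sketched in Python against Terraform's JSON plan output. This is an illustration of the rule, not the Rego policy itself. It assumes the plan follows the `terraform show -json` layout (`resource_changes[].change.after`), that the resource type is `cloudflare_access_policy`, and that each `include` block carries an `email` list; the function name and the sample plan are hypothetical.

```python
def non_cloudflare_emails(plan: dict) -> list[str]:
    """Return one message per access-policy email outside @cloudflare.com."""
    violations = []
    for rc in plan.get("resource_changes", []):
        # Assumed resource type name; other resources are ignored.
        if rc.get("type") != "cloudflare_access_policy":
            continue
        # "after" is null for deletions, so fall back to an empty dict.
        after = (rc.get("change") or {}).get("after") or {}
        for include in after.get("include") or []:
            for email in include.get("email") or []:
                if not email.lower().endswith("@cloudflare.com"):
                    address = rc.get("address", "<unknown>")
                    violations.append(
                        f"{address}: only @cloudflare.com emails are allowed, got {email}"
                    )
    return violations


# Hypothetical plan fragment with one compliant and one non-compliant policy.
sample_plan = {
    "resource_changes": [
        {
            "address": "cloudflare_access_policy.ok",
            "type": "cloudflare_access_policy",
            "change": {"after": {"include": [{"email": ["alice@cloudflare.com"]}]}},
        },
        {
            "address": "cloudflare_access_policy.bad",
            "type": "cloudflare_access_policy",
            "change": {"after": {"include": [{"email": ["bob@example.com"]}]}},
        },
    ]
}

violations = non_cloudflare_emails(sample_plan)
```

With the sample plan above, `violations` holds a single entry that names `cloudflare_access_policy.bad`.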

Establishing the baseline

Every merge request (MR) is subject to the policy check, which verifies compliance before deployment. The output of a policy check can be seen right in the GitLab MR comment thread.

Policies can be enforced in two ways:

Warning: Leaves a comment on the MR but still allows the merge.

Deny: Blocks the deployment outright.

If the policy check determines that the configuration being applied in the MR deviates from the baseline, the output lists the resources that are out of compliance.
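The difference between the two modes can be sketched as a small gate function in Python. This is an illustration only, not how Atlantis or Conftest implement it; the `PolicyResult` type, the `evaluate_gate` function, and the sample policy names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PolicyResult:
    policy: str    # name of the policy that fired
    level: str     # "warn" or "deny"
    resource: str  # address of the non-compliant resource
    message: str   # human-readable reason


def evaluate_gate(results: list[PolicyResult]) -> tuple[bool, list[str]]:
    """Return (allowed, comments): any deny blocks, warnings only comment."""
    comments = [
        f"[{r.level.upper()}] {r.resource}: {r.message} ({r.policy})"
        for r in results
    ]
    allowed = not any(r.level == "deny" for r in results)
    return allowed, comments


# Hypothetical findings for two merge requests.
warn_only = [
    PolicyResult("session_length", "warn", "cloudflare_access_application.wiki",
                 "session duration exceeds the baseline"),
]
with_deny = warn_only + [
    PolicyResult("email_domain", "deny", "cloudflare_access_policy.bad",
                 "only @cloudflare.com emails are allowed"),
]

warn_allowed, warn_comments = evaluate_gate(warn_only)
deny_allowed, deny_comments = evaluate_gate(with_deny)
```

In this sketch the first merge request is allowed with one comment, while the second is blocked and its comments list both out-of-compliance resources.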
