Terraform drift happens when the real-world infrastructure deviates from what’s defined in the Terraform state file. This causes inconsistencies and surprises when you run terraform plan or terraform apply. Here are the main reasons drift can occur:


🔧 1. Manual Changes in Cloud Console

Someone updates, deletes, or creates resources directly from the cloud provider’s UI (like AWS Console, Azure Portal, etc.), bypassing Terraform.

💻 2. Changes via Other Tools

Resources are modified by other IaC tools (e.g., CloudFormation, ARM templates) or by scripts (e.g., bash, PowerShell) outside of Terraform.

🔄 3. Auto-Scaling or Auto-Healing

Services like Auto Scaling Groups, Kubernetes, or Managed Services may create or destroy infrastructure automatically.

🧠 4. Terraform Bugs or Misbehavior

A bug in a Terraform provider or incorrect schema versioning can cause the state to become inaccurate.

🧱 5. Third-party Actions

Third-party services or CI/CD pipelines recreate or modify infrastructure without involving Terraform.

🕵️ 6. State File Modifications

Manual editing or corruption of the .tfstate file can desync the state from the actual infra.

🧮 7. External Resource Dependencies

Resources that are not managed by Terraform but are referenced within Terraform-managed resources might change, causing inconsistencies.

🕓 8. Out-of-Band API Updates

Direct API calls (e.g., via AWS CLI, SDKs) that modify infrastructure outside Terraform’s knowledge.

📦 9. Resource Defaults or Computed Values

Cloud providers may change default values or “computed” attributes (like generated names or IDs) which Terraform can’t always detect until refresh.

🔄 10. Data Source Updates

Data sources (e.g., aws_ami, aws_ssm_parameter) can change over time, and if used in resources, they may cause unexpected diffs during apply.

Let's look at one scenario in which drift occurs.

1. Terraform Architecture Basics

When you run Terraform, it uses:

.tf files ➔ desired configuration
.tfstate file ➔ last known real-world infrastructure snapshot
Providers ➔ plugins that talk to cloud APIs (e.g., AWS, Azure)

The Terraform provider (e.g., terraform-provider-aws) contains the logic to communicate with the actual cloud, typically through the cloud's REST API. Terraform Core itself talks to the provider plugin over gRPC.

2. How Terraform State Works

.tfstate is a JSON file containing detailed metadata for every resource Terraform manages:
- IDs
- Properties (instance type, security groups, tags, etc.)
- Relationships between resources
It is not the source of truth — it’s a cached snapshot of the infrastructure at the time of last apply.

After a manual change in the cloud, the .tfstate is now outdated, because:

Terraform still thinks it manages the “old” version.
Cloud reality is different.
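To make the "cached snapshot" idea concrete, here is a minimal sketch that reads the attributes recorded in a state file. The sample is hand-written and heavily trimmed to the general shape of a version 4 state file; the resource name `web` and all values are made up for illustration.

```python
import json

# Hand-written, trimmed sample shaped like a version 4 state file.
# The resource name "web" and every value below are illustrative only.
SAMPLE_STATE = json.loads("""
{
  "version": 4,
  "serial": 7,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {"attributes": {"id": "i-0123456789abcdef0", "instance_type": "t2.micro"}}
      ]
    }
  ]
}
""")


def recorded_attributes(state: dict) -> dict:
    """Map each managed resource address to the attributes stored in state."""
    recorded = {}
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed":
            continue  # skip data sources
        address = f'{resource["type"]}.{resource["name"]}'
        # Simplification: with count/for_each there are several instances;
        # this sketch keeps only the last one per address.
        for instance in resource.get("instances", []):
            recorded[address] = instance.get("attributes", {})
    return recorded


print(recorded_attributes(SAMPLE_STATE))
```

Whatever this prints is only what Terraform last recorded. If someone resizes the instance in the console, the file keeps saying `t2.micro` until the next refresh.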

3. What Happens During `terraform plan` (Deeper View)

When you run terraform plan, the process is:

➔ Step 1: Provider Schema Initialization

Terraform loads the provider (e.g., AWS Provider).
It reads the schema of each resource (aws_instance, aws_s3_bucket, etc.)
It knows which attributes are “readable”, “writable”, “computed”, “required”, etc.

Example
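As a rough illustration, the sketch below classifies attributes by their schema flags. The dictionary is a hand-written fragment shaped like the `attributes` section that `terraform providers schema -json` prints for a resource type; the flags are illustrative and not copied from the real AWS provider.

```python
# Hand-written fragment in the shape of a resource schema's "attributes".
# The flag values are assumptions for illustration, not the real aws_instance schema.
SAMPLE_ATTRIBUTES = {
    "ami": {"type": "string", "required": True},
    "instance_type": {"type": "string", "optional": True},
    "arn": {"type": "string", "computed": True},
    "private_ip": {"type": "string", "optional": True, "computed": True},
}


def classify(attribute: dict) -> str:
    """Describe how an attribute may be set, based on its schema flags."""
    if attribute.get("required"):
        return "required"
    if attribute.get("optional") and attribute.get("computed"):
        return "optional, provider fills it in when unset"
    if attribute.get("optional"):
        return "optional"
    if attribute.get("computed"):
        return "computed (read-only)"
    return "unknown"


for name, attribute in SAMPLE_ATTRIBUTES.items():
    print(f"{name}: {classify(attribute)}")
```

These flags matter for drift: a purely computed attribute can never be set from `.tf` code, so a change to it shows up on refresh but is not something Terraform tries to "fix".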

➔ Step 2: State Reconciliation

For each resource:

Terraform reads the resource data from the cloud provider using API calls.

Example AWS API calls:

DescribeInstances
DescribeSecurityGroups
GetBucketAcl

Calls like these fetch the current real values from the cloud provider (AWS, Azure, GCP, etc.).

👉 This is called the refresh phase, and it runs before the plan is calculated.
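The refresh phase can be sketched as follows. This is a simplified model, not Terraform's implementation: state is assumed to be a plain dictionary from resource address to attributes, and `read_live` is a hypothetical stand-in for the provider's API read (here backed by a fake in-memory "cloud").

```python
from typing import Callable, Optional


def refresh(state: dict, read_live: Callable[[str], Optional[dict]]) -> dict:
    """Return a refreshed copy of state using live values from read_live."""
    refreshed = {}
    for address, recorded in state.items():
        live = read_live(address)
        if live is None:
            continue  # object no longer exists remotely: drop it from state
        # Live values win over whatever was recorded at the last apply.
        refreshed[address] = {**recorded, **live}
    return refreshed


# Fake "cloud": the instance was resized by hand and the bucket was deleted.
FAKE_CLOUD = {
    "aws_instance.web": {"id": "i-0123456789abcdef0", "instance_type": "t3.micro"},
}

STATE = {
    "aws_instance.web": {"id": "i-0123456789abcdef0", "instance_type": "t2.micro"},
    "aws_s3_bucket.logs": {"id": "example-logs"},
}

print(refresh(STATE, FAKE_CLOUD.get))
```

The refreshed copy now reports `t3.micro` for the instance and no longer contains the deleted bucket, which is the input the diff step works from.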

➔ Step 3: Diff Computation

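Terraform:

Compares the desired configuration in the .tf files against the refreshed state (the .tfstate values updated with the live API results).
It calculates a “diff”:
- If a property differs from the configuration, the resource is marked as ~ (update in place), or -/+ if the provider can only change that property by replacing the resource.
- If a resource is in the configuration but no longer exists in the cloud, it is dropped from the refreshed state and marked as + (create).
- Resources that exist in the cloud but were never added to state are invisible to Terraform until you bring them in with terraform import.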

Example logic internally:
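The snippet below is a simplified sketch of that comparison, not Terraform's actual code. It assumes the configuration and the refreshed state are both plain dictionaries keyed by resource address, and it does not model the provider's decision between an in-place update and a replacement.

```python
def diff(config: dict, refreshed_state: dict) -> dict:
    """Map each address to (plan symbol, changed attribute names)."""
    actions = {}
    for address in sorted(set(config) | set(refreshed_state)):
        desired = config.get(address)
        current = refreshed_state.get(address)
        if current is None:
            # In config but not in the refreshed state: create it.
            actions[address] = ("+", sorted(desired))
        elif desired is None:
            # In state but removed from config: destroy it.
            actions[address] = ("-", [])
        else:
            changed = sorted(k for k, v in desired.items() if current.get(k) != v)
            if changed:
                actions[address] = ("~", changed)
    return actions


CONFIG = {
    "aws_instance.web": {"instance_type": "t2.micro"},
    "aws_s3_bucket.logs": {"bucket": "example-logs"},
}
REFRESHED_STATE = {
    "aws_instance.web": {"id": "i-0123456789abcdef0", "instance_type": "t3.micro"},
    "aws_sqs_queue.old": {"id": "example-queue"},
}

for address, (symbol, changed) in diff(CONFIG, REFRESHED_STATE).items():
    print(symbol, address, changed)
```

Note that only attributes named in the configuration are compared here, which is why the extra `id` in state does not produce a change.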

➔ Step 4: Planning the Changes

Terraform generates an execution plan based on the diff.
It decides:
- Whether an in-place update is enough
- Or if a resource destroy + recreate is needed
It respects lifecycle meta-arguments like:
- ignore_changes
- create_before_destroy
- prevent_destroy

Execution Plan Output: This is what you see in the CLI.
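The same plan can also be read by a script. A plan saved with `terraform plan -out=<file>` and rendered with `terraform show -json <file>` lists each resource under `resource_changes` with an `actions` array. The sketch below maps those actions to the symbols the CLI prints; the sample document is hand-written and trimmed, and the addresses in it are illustrative.

```python
# Mapping from the JSON plan's "actions" array to the CLI's plan symbols.
SYMBOLS = {
    ("create",): "+",
    ("update",): "~",
    ("delete",): "-",
    ("delete", "create"): "-/+",
    ("create", "delete"): "+/-",
    ("read",): "<=",
}

# Hand-written, trimmed sample in the shape of `terraform show -json` output.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_instance.web", "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
        {"address": "aws_db_instance.db", "change": {"actions": ["delete", "create"]}},
    ]
}


def summarize(plan: dict) -> list:
    """Return one 'symbol address' line per resource that would change."""
    lines = []
    for resource_change in plan.get("resource_changes", []):
        actions = tuple(resource_change["change"]["actions"])
        if actions == ("no-op",):
            continue  # nothing to do for this resource
        lines.append(f'{SYMBOLS.get(actions, "?")} {resource_change["address"]}')
    return lines


print("\n".join(summarize(SAMPLE_PLAN)))
```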

4. Special Scenarios

| Case | Terraform Behavior |
| --- | --- |
| Resource property changed (e.g., `instance_type`) | Terraform marks the resource for update `~` |
| Resource deleted manually | Terraform drops it from state on refresh and plans to create it again `+` |
| Resource modified with `ignore_changes` | Terraform ignores drift on the listed attributes |
| Resource recreated manually (with new ID) | Terraform does not know about the new object; it plans to create its own `+`, which can fail at apply time if names collide |

5. Internal Concepts

Refresh-only plan (newer Terraform versions): terraform plan -refresh-only lets you review drift and update state without proposing changes to the infrastructure.
Pluggable Providers: Each cloud has its own provider plugin that implements the refresh/diff logic.
Graph Building: Terraform first builds a dependency graph of all resources so it can plan operations in the correct order.
State Locking: (e.g., via S3+DynamoDB) ensures that concurrent runs cannot corrupt the state.
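When a saved plan is rendered as JSON, changes detected outside Terraform are reported under a `resource_drift` key, with `before` holding the previously recorded values and `after` the refreshed ones. The sketch below pulls out the drifted attributes. It assumes that layout, and the sample document is hand-written for illustration, so check it against the JSON your Terraform version actually emits.

```python
# Hand-written, trimmed sample; assumes the "resource_drift" layout described above.
SAMPLE_PLAN = {
    "resource_drift": [
        {
            "address": "aws_instance.web",
            "change": {
                "actions": ["update"],
                "before": {"id": "i-0123456789abcdef0", "instance_type": "t2.micro"},
                "after": {"id": "i-0123456789abcdef0", "instance_type": "t3.micro"},
            },
        }
    ]
}


def drifted_attributes(plan: dict) -> dict:
    """Map each drifted address to {attribute: (recorded, live)}."""
    drift = {}
    for entry in plan.get("resource_drift", []):
        before = entry["change"].get("before") or {}
        after = entry["change"].get("after") or {}
        changed = {
            key: (before.get(key), after.get(key))
            for key in sorted(set(before) | set(after))
            if before.get(key) != after.get(key)
        }
        if changed:
            drift[entry["address"]] = changed
    return drift


print(drifted_attributes(SAMPLE_PLAN))
```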

🧠 Example Timeline

Let’s say you manually resized an EC2 from t2.micro ➔ t3.micro.

When you run terraform plan:

Terraform refreshes the EC2 instance by calling DescribeInstances.
Sees that the instance type is now t3.micro.
Compares with .tfstate (which says t2.micro) and records the drift (instance_type changed).
Compares the refreshed value with the .tf configuration (which still says t2.micro).
Shows a ~ modification in the plan.
apply will set the instance back to t2.micro (if that’s what the .tf says). For instance_type, the AWS provider typically does this by stopping and restarting the instance rather than replacing it.

🧩 In Short:

| Step | Behind-the-scenes Activity |
| --- | --- |
| Manual Change | Cloud infra drifts from Terraform state |
| Terraform Refresh Phase | Providers fetch live resource data |
| Terraform Diff Phase | Compare live vs. expected state |
| Terraform Plan Output | Show required changes to fix drift |
| Terraform Apply | Correct the drift by enforcing `.tf` code |

🌟 Bonus Tip:

You can only detect drift when you run a Terraform operation (plan, apply, or refresh).

| Situation | Result |
| --- | --- |
| You run `plan` but don’t apply | Drift remains. |
| You use `ignore_changes` in resource block | Drift will be ignored. |
| You manually delete the instance | Terraform will try to recreate it. |
| You manually recreate with same tags but different ID | Terraform will recreate the missing one, not reuse it. |

By default, Terraform has no background drift monitoring unless you use Terraform Cloud or an external tool. Such tools can detect drift outside of Terraform runs and validate it as part of your infrastructure orchestration.
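If you do not have such a tool, one simple approach is a scheduled job that runs `terraform plan -detailed-exitcode` and looks at the exit code: 0 means the plan is empty, 1 means the command failed, and 2 means the plan contains changes. A minimal sketch, assuming `terraform` is on the PATH and the working directory is already initialized (the function names are hypothetical):

```python
import subprocess


def interpret_exit_code(code: int) -> str:
    """Translate the exit code of `terraform plan -detailed-exitcode`."""
    if code == 0:
        return "in sync"
    if code == 2:
        return "changes pending"
    return "error"


def check_drift(workdir: str) -> str:
    """Run a plan in workdir and report whether anything would change."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_exit_code(result.returncode)
```

A result of "changes pending" does not prove drift on its own: unapplied edits to the `.tf` files also produce a non-empty plan, so this check works best on a branch that has already been applied.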

If you liked the article, I will cover more drift detection scenarios and drift detection tools that you can integrate into your IaC automation and orchestration.

🚀 Terraform drift can silently break your infrastructure if left unchecked! I hope this deep-dive helped you understand how and why drift happens, and what really goes on behind the scenes during drift detection.

If you found this valuable, drop a like, share it with your cloud engineering community, and repost to help others who might be struggling with unexpected Terraform surprises. 🌟
💬 Got your own drift horror story? Comment below — I’d love to hear your experiences and maybe even feature them in my next post where I’ll explore real-world drift scenarios and essential drift detection tools you can integrate into your CI/CD pipelines! Stay tuned! 🔥

