🚀 UNDER THE HOOD: Terraform Drift Detection (Manual Changes)


Terraform drift happens when the real-world infrastructure deviates from what’s recorded in the Terraform state file (and, by extension, from your configuration). This causes inconsistencies and surprises when you run terraform plan or terraform apply. Here are the main reasons drift can occur:


🔧 1. Manual Changes in Cloud Console

  • Someone updates, deletes, or creates resources directly from the cloud provider’s UI (like AWS Console, Azure Portal, etc.), bypassing Terraform.

💻 2. Changes via Other Tools

  • Resources are modified by other IaC tools (e.g., CloudFormation, ARM templates) or by scripts (e.g., bash, PowerShell) outside of Terraform.

🔄 3. Auto-Scaling or Auto-Healing

  • Services like Auto Scaling Groups, Kubernetes, or Managed Services may create or destroy infrastructure automatically.

🧠 4. Terraform Bugs or Misbehavior

  • A bug in a Terraform provider or incorrect schema versioning can cause the state to become inaccurate.

🧱 5. Third-party Actions

  • Third-party services or CI/CD pipelines recreate or modify infrastructure without involving Terraform.

🕵️ 6. State File Modifications

  • Manual editing or corruption of the .tfstate file can desync the state from the actual infrastructure.

🧮 7. External Resource Dependencies

  • Resources that are not managed by Terraform but are referenced within Terraform-managed resources might change, causing inconsistencies.

🕓 8. Out-of-Band API Updates

  • Direct API calls (e.g., via AWS CLI, SDKs) modify infrastructure outside Terraform’s knowledge.

📦 9. Resource Defaults or Computed Values

  • Cloud providers may change default values or “computed” attributes (like generated names or IDs), which Terraform can’t always detect until refresh.

🔄 10. Data Source Updates

  • Data sources (e.g., aws_ami, aws_ssm_parameter) can change over time, and if used in resources, they may cause unexpected diffs during apply.

Let’s look at one scenario in which drift occurs: a manual change.

1. Terraform Architecture Basics

When you run Terraform, it uses:

  • .tf files ➔ desired configuration
  • .tfstate file ➔ last known real-world infrastructure snapshot
  • Providers ➔ plugins that talk to cloud APIs (e.g., AWS, Azure)

The Terraform Provider (e.g., terraform-provider-aws) contains the logic to communicate with the actual cloud through its APIs (typically REST over HTTPS). Terraform core, in turn, talks to the provider plugin over gRPC.

2. How Terraform State Works

  • .tfstate is a JSON file containing detailed metadata of every resource Terraform manages:
    • IDs
    • Properties (instance type, security groups, tags, etc.)
    • Relationships between resources
  • It is not the source of truth; it’s a cached snapshot of the infrastructure as Terraform last saw it (normally at the last apply).

After a manual change in the cloud, the .tfstate is now outdated, because:

  • Terraform still thinks it manages the “old” version.
  • Cloud reality is different.
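To make the "cached snapshot" idea concrete, here is a minimal Python sketch that reads what a state file has recorded for each resource. The JSON below is a trimmed, hypothetical excerpt in the version 4 state layout (a real .tfstate has many more fields), and recorded_attributes is a helper name made up for this illustration.

```python
import json

# Trimmed, hypothetical state document in the version 4 layout; a real
# .tfstate carries many more fields (serial, lineage, provider, ...).
STATE_JSON = """
{
  "version": 4,
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {"attributes": {"id": "i-0abc123", "instance_type": "t2.micro"}}
      ]
    }
  ]
}
"""


def recorded_attributes(state_text):
    """Map each resource address to the attributes recorded in state.

    Simplification: assumes one instance per resource (no count/for_each).
    """
    state = json.loads(state_text)
    recorded = {}
    for resource in state.get("resources", []):
        address = f'{resource["type"]}.{resource["name"]}'
        for instance in resource.get("instances", []):
            recorded[address] = instance["attributes"]
    return recorded


snapshot = recorded_attributes(STATE_JSON)
print(snapshot["aws_instance.web"]["instance_type"])  # prints t2.micro
```

If someone resizes the instance in the console, this file still says t2.micro until Terraform refreshes it.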

3. What Happens During terraform plan (Deeper View)

When you run terraform plan, the process is:

➔ Step 1: Provider Schema Initialization

  • Terraform loads the provider (e.g., AWS Provider).
  • It reads the schema of each resource (aws_instance, aws_s3_bucket, etc.)
  • It knows which attributes are “readable”, “writable”, “computed”, “required”, etc.
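A rough way to picture what a schema gives Terraform is a table of attribute flags that a configuration can be checked against. The sketch below is only an illustration in Python: the Attribute class, INSTANCE_SCHEMA, and validate are hypothetical names, and the schema is heavily trimmed rather than the real aws_instance schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    required: bool = False
    optional: bool = False
    computed: bool = False  # value is set by the provider/cloud, not by you


# Hypothetical, heavily trimmed schema for an instance-like resource.
INSTANCE_SCHEMA = {
    "ami": Attribute(required=True),
    "instance_type": Attribute(required=True),
    "tags": Attribute(optional=True),
    "id": Attribute(computed=True),
}


def validate(config, schema):
    """Return a list of problems found in a resource's configuration."""
    problems = []
    for name, attr in schema.items():
        if attr.required and name not in config:
            problems.append(f"missing required attribute: {name}")
    for name in config:
        if name not in schema:
            problems.append(f"unsupported attribute: {name}")
        elif schema[name].computed and not schema[name].optional:
            problems.append(f"attribute is read-only: {name}")
    return problems


print(validate({"ami": "ami-123", "id": "i-1"}, INSTANCE_SCHEMA))
# prints ['missing required attribute: instance_type', 'attribute is read-only: id']
```

The same flags matter for drift: computed attributes are expected to come back from the cloud, so Terraform reads them rather than comparing them to something you wrote.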


➔ Step 2: State Reconciliation

For each resource:

  • Terraform reads the resource data from the cloud provider using API calls.

Example AWS API calls:

  • DescribeInstances
  • DescribeSecurityGroups
  • GetBucketAcl

These calls fetch the current real values from AWS, Azure, GCP, etc.

👉 This is called the refresh phase. It runs before the plan itself is calculated.
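The refresh phase can be sketched in a few lines of Python. This is a simplified model, not Terraform’s actual code: refresh is a hypothetical function, and read_live stands in for a provider read such as DescribeInstances (here it is just a dictionary lookup, so no cloud account is needed).

```python
def refresh(prior_state, read_live):
    """Return (refreshed_state, drift) for the resources tracked in state.

    read_live(address) stands in for a provider read such as
    DescribeInstances; it returns the live attributes, or None when the
    resource no longer exists.
    """
    refreshed, drift = {}, {}
    for address, recorded in prior_state.items():
        live = read_live(address)
        if live is None:
            drift[address] = "deleted outside Terraform"
            continue  # the resource is dropped from the refreshed state
        refreshed[address] = live
        changed = sorted(k for k in recorded if recorded[k] != live.get(k))
        if changed:
            drift[address] = f"changed outside Terraform: {', '.join(changed)}"
    return refreshed, drift


# Hypothetical data: the instance was resized by hand, the bucket was deleted.
prior = {
    "aws_instance.web": {"instance_type": "t2.micro"},
    "aws_s3_bucket.logs": {"acl": "private"},
}
cloud = {"aws_instance.web": {"instance_type": "t3.micro"}}

state, drift = refresh(prior, cloud.get)
print(drift)
```

Note that only addresses already in state are read. Anything created by hand that Terraform never tracked does not show up here at all.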

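➔ Step 3: Diff Computation

Terraform:

  • Compares the fetched live data (API result) against the stored .tfstate values (this is the drift) and against the desired configuration in your .tf files (this is what the plan will fix).
  • It calculates a “diff”:
    • If a property is different, the resource is marked as ~ (update in place).
    • If a managed resource no longer exists, it is marked as + (create again); - means destroy, and -/+ means destroy and recreate.
    • Resources that exist in the cloud but were never in state are invisible to Terraform until you bring them in with terraform import.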
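As a rough illustration of the diff step, the Python sketch below maps each resource to the symbol you would expect in plan output by comparing the desired configuration with the refreshed state. plan_actions and the sample data are hypothetical, and real Terraform diffs are computed per attribute by the provider, so treat this as a toy model only.

```python
def plan_actions(config, refreshed_state):
    """Map each resource address to a plan symbol.

    "+" create, "~" update in place, "-" destroy; unchanged resources
    are left out, as they are in real plan output.
    """
    actions = {}
    for address, desired in config.items():
        current = refreshed_state.get(address)
        if current is None:
            actions[address] = "+"  # in config, missing from the cloud
        elif any(current.get(k) != v for k, v in desired.items()):
            actions[address] = "~"  # exists, but an attribute differs
    for address in refreshed_state:
        if address not in config:
            actions[address] = "-"  # tracked in state, removed from config
    return actions


# Hypothetical inputs: one resized instance, one deleted bucket,
# and one user that was removed from the .tf files.
config = {
    "aws_instance.web": {"instance_type": "t2.micro"},
    "aws_s3_bucket.logs": {"acl": "private"},
}
refreshed = {
    "aws_instance.web": {"instance_type": "t3.micro"},
    "aws_iam_user.old": {"name": "old"},
}
print(plan_actions(config, refreshed))
```

A manually deleted resource therefore shows up as a create (+), because after refresh it is simply absent from state while still present in the configuration.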


➔ Step 4: Planning the Changes

  • Terraform generates an execution plan based on the diff.
  • It decides:
    • Whether an in-place update is enough
    • Or if a resource destroy + recreate is needed
  • It respects lifecycle meta-arguments like:
    • ignore_changes
    • create_before_destroy
    • prevent_destroy

Execution Plan Output: This is what you see in the CLI.
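The decision made in Step 4 can be sketched as a small Python function. Everything here is an assumption for illustration: choose_action is a hypothetical name, and FORCES_REPLACEMENT is a made-up set standing in for the attributes a provider marks as requiring replacement.

```python
# Hypothetical: attributes the provider cannot change on an existing resource.
FORCES_REPLACEMENT = {"ami", "availability_zone"}


def choose_action(desired, current, ignore_changes=(),
                  create_before_destroy=False, prevent_destroy=False):
    """Pick a plan action for one resource that already exists."""
    changed = {
        key for key, value in desired.items()
        if current.get(key) != value and key not in ignore_changes
    }
    if not changed:
        return "no-op"
    if changed & FORCES_REPLACEMENT:
        if prevent_destroy:
            raise RuntimeError("plan would destroy a resource with prevent_destroy set")
        # "+/-" creates the replacement first, "-/+" destroys first.
        return "+/-" if create_before_destroy else "-/+"
    return "~"


current = {"ami": "ami-1", "instance_type": "t3.micro"}
desired = {"ami": "ami-1", "instance_type": "t2.micro"}
print(choose_action(desired, current))                                     # prints ~
print(choose_action(desired, current, ignore_changes=("instance_type",)))  # prints no-op
print(choose_action({"ami": "ami-2"}, current))                            # prints -/+
```

The second call shows why ignore_changes hides drift: the changed attribute is filtered out before any action is chosen.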

4. Special Scenarios

| Case | Terraform Behavior |
| --- | --- |
| Resource property changed (e.g., instance_type) | Terraform marks the resource for update (~) |
| Resource deleted manually | Terraform drops it from state on refresh and plans to create it again (+) |
| Resource modified with ignore_changes | Terraform ignores drift on the listed attributes |
| Resource recreated manually (with new ID) | Terraform does not know the new resource; it plans to create its own copy unless you import the new one |

5. Internal Concepts

  • Refresh-only plan (terraform plan -refresh-only, in newer Terraform versions): lets you review drift and update the state without planning any infrastructure changes.
  • Pluggable Providers: Each cloud has its own provider plugin that implements the refresh/diff logic for its resources.
  • Graph Building: Terraform builds a dependency graph of all resources first so it can plan in the correct order.
  • State Locking (e.g., via S3 + DynamoDB): ensures that concurrent plans or applies cannot corrupt the state.
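Graph building is essentially a topological sort: a resource is only handled after everything it depends on. Here is a minimal Python sketch using the standard library’s graphlib module (Python 3.9+). The resource names and dependency map are hypothetical, and Terraform’s real graph also contains providers, variables, and outputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources mapped to the resources they depend on.
dependencies = {
    "aws_vpc.main": set(),
    "aws_subnet.app": {"aws_vpc.main"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.app", "aws_security_group.web"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The VPC always comes first and the instance last; the subnet and security group have no dependency on each other, so their relative order is not fixed.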

🧠 Example Timeline

Let’s say you manually resized an EC2 instance from t2.micro ➔ t3.micro.

When you run terraform plan:

  1. Terraform refreshes the EC2 instance by calling DescribeInstances.
  2. Sees that the instance type is now t3.micro.
  3. Compares with .tfstate (which says t2.micro).
  4. Flags a drift (instance_type changed).
  5. Shows a ~ modification in the plan.
  6. apply will set the instance back to t2.micro (if that’s what the .tf says). For an EBS-backed aws_instance this is normally an in-place update that stops and restarts the instance, not a replacement.
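If you want to inspect such a plan programmatically, terraform show -json on a saved plan file emits a machine-readable document with a resource_changes list. The Python sketch below walks a trimmed, hypothetical excerpt of that output for the timeline above; summarize is a helper name invented for this example.

```python
import json

# Trimmed, hypothetical excerpt of `terraform show -json <planfile>` output.
PLAN_JSON = """
{
  "resource_changes": [
    {
      "address": "aws_instance.web",
      "change": {
        "actions": ["update"],
        "before": {"instance_type": "t3.micro", "ami": "ami-123"},
        "after": {"instance_type": "t2.micro", "ami": "ami-123"}
      }
    }
  ]
}
"""


def summarize(plan_text):
    """Yield (address, attribute, before, after) for each planned change."""
    plan = json.loads(plan_text)
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if change["actions"] == ["no-op"]:
            continue
        before = change.get("before") or {}  # null when creating
        after = change.get("after") or {}    # null when destroying
        for key in sorted(set(before) | set(after)):
            if before.get(key) != after.get(key):
                yield rc["address"], key, before.get(key), after.get(key)


changes = list(summarize(PLAN_JSON))
for address, key, old, new in changes:
    print(f"{address}: {key} {old!r} -> {new!r}")
# prints aws_instance.web: instance_type 't3.micro' -> 't2.micro'
```

Here "before" is the refreshed (drifted) value and "after" is what the configuration asks for, which is why the plan reads as t3.micro going back to t2.micro.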

🧩 In Short:

| Step | Behind-the-scenes Activity |
| --- | --- |
| Manual Change | Cloud infra drifts from Terraform state |
| Terraform Refresh Phase | Providers fetch live resource data |
| Terraform Diff Phase | Compare live vs. expected state |
| Terraform Plan Output | Show required changes to fix drift |
| Terraform Apply | Correct the drift by enforcing .tf code |

🌟 Bonus Tip:

You can only detect drift when you run a Terraform operation (plan, apply, or refresh).

| Situation | Result |
| --- | --- |
| You run plan but don’t apply | Drift remains. |
| You use ignore_changes in resource block | Drift will be ignored. |
| You manually delete the instance | Terraform will try to recreate it. |
| You manually recreate with same tags but different ID | Terraform will recreate the missing one, not reuse it. |


By default, Terraform has no background drift monitoring; you only get that from Terraform Cloud or an external tool. Such tools can detect drift outside of Terraform runs and validate it as part of your infrastructure orchestration.
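One simple do-it-yourself option is to run a plan on a schedule and look at its exit code. With the -detailed-exitcode flag, terraform plan exits with 0 when nothing would change, 1 on error, and 2 when changes are pending. The Python sketch below wraps that; check_drift and classify are hypothetical helper names, and it assumes the terraform binary is on PATH and the directory has already been initialised.

```python
import subprocess

# Meaning of `terraform plan -detailed-exitcode` exit codes.
EXIT_MEANING = {0: "in sync", 1: "error", 2: "changes pending"}


def classify(exit_code):
    """Translate a plan exit code into a short status string."""
    return EXIT_MEANING.get(exit_code, "unknown")


def check_drift(workdir):
    """Run a plan in workdir and classify the result.

    Assumes the terraform binary is on PATH and the directory is
    already initialised with `terraform init`.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir, capture_output=True, text=True,
    )
    return classify(result.returncode), result.stdout


print(classify(2))  # prints changes pending
```

A cron job or CI stage could call check_drift("path/to/stack") and raise an alert whenever the status is "changes pending". Keep in mind that a pending change can also mean someone edited the .tf files without applying, not only drift.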

If you liked this article, I will cover more drift detection scenarios and the drift detection tools you can integrate into your IaC automation and orchestration.

🚀 Terraform drift can silently break your infrastructure if left unchecked! I hope this deep-dive helped you understand how and why drift happens, and what really goes on behind the scenes during drift detection.


If you found this valuable, drop a like, share it with your cloud engineering community, and repost to help others who might be struggling with unexpected Terraform surprises. 🌟
💬 Got your own drift horror story? Comment below; I’d love to hear your experiences and maybe even feature them in my next post, where I’ll explore real-world drift scenarios and essential drift detection tools you can integrate into your CI/CD pipelines! Stay tuned! 🔥

