State Data in Terraform

Terraform’s state is the persistent data structure that allows declarative configuration to work in practice. Without state, Terraform would have no reliable way to understand what already exists, what it manages, and what needs to change.

This chapter explains:

  • What Terraform state data consists of
  • Where it is stored
  • How Terraform uses it internally
  • How resources, data sources, outputs, and metadata are represented

Why State Exists

Terraform is declarative. You describe the desired end state of your infrastructure. To determine what actions are necessary, Terraform must compare:

  • Configuration (what you declare now)
  • Real infrastructure (what exists in the provider)
  • Previous known state (what Terraform believes exists)

The state file bridges configuration and real-world infrastructure. It allows Terraform to compute a safe and minimal execution plan.


What Comprises Terraform State?

The state file (typically terraform.tfstate) is a JSON document. You should never edit it manually, but understanding its structure is important.

At a high level, state contains:

  1. Resource mappings
  2. Data source mappings
  3. Outputs
  4. Metadata
  5. Dependency graph information

Let’s examine each.


1. Resource Mapping

The most critical part of state is the mapping between:

  • A resource in your configuration
  • A real object in a provider (e.g., cloud resource)

Example configuration:

resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}

Terraform stores in state:

  • The resource address (aws_s3_bucket.assets)
  • The provider-specific ID (e.g., bucket name or ARN)
  • All known attributes returned by the provider
  • Dependency information

This enables Terraform to:

  • Update the correct real-world object
  • Detect drift
  • Destroy the correct infrastructure when requested

Without state, Terraform would not know which S3 bucket belongs to which configuration block.
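
On Terraform 1.5 or later, the same address-to-object mapping can be written down explicitly with an import block. The following is a minimal sketch, assuming a bucket named my-app-assets already exists outside Terraform; for aws_s3_bucket the provider-specific ID is the bucket name.

```hcl
# Bind an existing bucket to the address aws_s3_bucket.assets.
# Assumption: a bucket called "my-app-assets" already exists in the account.
import {
  to = aws_s3_bucket.assets
  id = "my-app-assets" # provider-specific ID; for S3 buckets this is the bucket name
}

resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}
```

After the next apply, state holds an entry that ties aws_s3_bucket.assets to that bucket, which is the same kind of entry Terraform writes when it creates the bucket itself.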


2. Data Source Mapping

Data sources are read-only queries to providers.

Example:

data "aws_vpc" "default" {
  default = true
}

Unlike resources, data sources do not create infrastructure. However, Terraform still records their evaluated results in state.

Why?

  • To record the values that were last read (Terraform normally re-reads data sources while planning)
  • To allow dependency resolution
  • To enable consistent planning within a run

Important distinction:

  • Resources → managed objects (Terraform owns lifecycle)
  • Data sources → fetched objects (Terraform reads but does not manage)

In the state file, data sources are stored similarly to resources but marked as data instances and without lifecycle ownership.
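
As a rough illustration of the distinction, the sketch below pairs the data source from above with a managed resource that consumes it. The security group, its name "web", and its description are hypothetical and only serve as an example.

```hcl
# Read-only lookup: recorded in state, but never created or destroyed by Terraform.
data "aws_vpc" "default" {
  default = true
}

# Managed resource that consumes the lookup. The reference below creates a
# dependency, so the data source is read before the group is planned.
# The name "web" and the description are hypothetical.
resource "aws_security_group" "web" {
  name        = "web"
  description = "Example group placed in the default VPC"
  vpc_id      = data.aws_vpc.default.id
}
```

Running terraform destroy on this configuration would remove the security group but leave the VPC untouched, because only the resource is under Terraform's lifecycle management.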


3. Outputs

Outputs are values exported from a module:

output "bucket_name" {
  value = aws_s3_bucket.assets.id
}

State stores, for each output of the root module:

  • Output name
  • Output value
  • Whether it is sensitive

This enables:

  • Sharing values between separate configurations
  • terraform output
  • Remote state data usage
  • Integration with automation systems

If another project uses:

data "terraform_remote_state" "infra" { ... }

It reads outputs directly from stored state.

Outputs therefore act as a public interface of your infrastructure module.
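
A possible end-to-end sketch of that interface is shown below. It assumes the producing project keeps its state in an S3 backend; the bucket name, key, and region in the consumer are placeholders, not values from a real setup.

```hcl
# Producer configuration: exposes the bucket name as a root module output.
output "bucket_name" {
  value = aws_s3_bucket.assets.id
}

# Consumer configuration (a separate project).
# Assumption: the producer stores its state in S3; bucket, key, and region
# below are placeholders.
data "terraform_remote_state" "infra" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "infra/terraform.tfstate"
    region = "eu-central-1"
  }
}

locals {
  # Only root module outputs of the producer are available here.
  assets_bucket = data.terraform_remote_state.infra.outputs.bucket_name
}
```

Note that the consumer needs read access to the producer's entire state snapshot, not only to its outputs, which matters for the access-control points discussed later.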


4. Metadata

State also contains metadata such as:

  • Terraform version used
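  • Serial number (incremented each write)
  • Lineage (unique ID assigned when the state is first created)
  • Provider configuration references (recorded for each resource)

The backend configuration itself is not part of the state snapshot. Terraform keeps it in the .terraform directory of the working directory after terraform init.

The serial number lets Terraform and backends detect stale or conflicting snapshots. Preventing concurrent writes is the job of state locking.

The lineage ensures that Terraform does not accidentally overwrite a state with a snapshot from an unrelated one.

Together with locking, this metadata keeps state safe and consistent.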


5. Dependency Graph Information

Terraform builds a dependency graph during planning. Each resource instance in state also records the addresses of the resources it depended on when it was last applied.

This allows Terraform to:

  • Apply changes in correct order
  • Destroy in reverse dependency order
  • Order operations on objects that no longer appear in configuration

The full graph is rebuilt each run from references in configuration (implicit dependencies) and depends_on (explicit dependencies). The dependencies recorded in state cover the objects that configuration no longer describes.
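
The two ways a dependency can arise in configuration can be sketched as follows. This is an illustrative example only: the versioning resource and the terraform_data placeholder (built in since Terraform 1.4) are not part of the original configuration, and the name "after_bucket" is hypothetical.

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}

# Implicit dependency: the reference to aws_s3_bucket.assets.id tells
# Terraform to create the bucket first.
resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Explicit dependency: no attribute is referenced, so the ordering is declared
# with depends_on. The name "after_bucket" is hypothetical.
resource "terraform_data" "after_bucket" {
  depends_on = [aws_s3_bucket.assets]
}
```

On destroy, Terraform walks these edges in reverse, removing the versioning configuration and the placeholder before the bucket.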


Where Is State Stored?

Local State (Default)

By default:

terraform.tfstate
terraform.tfstate.backup

Stored locally in your working directory.

This is suitable only for:

  • Personal projects
  • Experiments
  • Non-collaborative workflows

It is not safe for teams: the file lives on one machine, so there is no shared lock and no single source of truth.
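
The default behaviour corresponds to the local backend, which can also be declared explicitly. A minimal sketch, using the default file name as the path:

```hcl
terraform {
  # Equivalent to the default behaviour, written out explicitly.
  backend "local" {
    path = "terraform.tfstate"
  }
}
```

Writing the block out mainly helps when the state file should live somewhere other than the working directory; the collaboration limits stay the same.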


Remote Backends

For production use, state should be stored remotely.

Common backends:

  • S3-compatible storage (e.g. Amazon S3)
  • Google Cloud Storage
  • Azure Blob Storage
  • Terraform Cloud / Terraform Enterprise
  • HTTP backends

Depending on the backend and how it is configured, remote storage can provide:

  • State locking
  • Versioning
  • Encryption
  • Access control
  • Team collaboration

State locking prevents two engineers from running apply simultaneously and corrupting the state.
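
As one possible setup, the sketch below configures the S3 backend with encryption and locking. The bucket name and key are placeholders, and the example assumes a Terraform release recent enough to support S3-native lock files (use_lockfile); older setups typically lock through a DynamoDB table via dynamodb_table instead.

```hcl
terraform {
  backend "s3" {
    bucket       = "example-terraform-state" # placeholder bucket name
    key          = "prod/network/terraform.tfstate"
    region       = "eu-central-1"
    encrypt      = true # server-side encryption of the state object
    use_lockfile = true # S3-native locking; requires a recent Terraform release
  }
}
```

Versioning and access control are properties of the bucket itself rather than of this block, so they have to be enabled on the storage side.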


How Terraform Uses State

Terraform operates in a sequence:

1. Refresh Phase

Terraform queries providers and compares real infrastructure to state.

If drift is detected:

  • The refreshed values replace the stale ones (persisted when the plan is applied)
  • Differences appear in the plan

2. Plan Phase

Terraform compares:

  • Desired configuration
  • Current state

It computes actions:

  • Create
  • Update
  • Replace
  • Destroy

3. Apply Phase

After successful execution:

  • State is updated to reflect new infrastructure reality
  • Serial number increments

State is therefore both:

  • A record of managed infrastructure
  • A mechanism for computing future changes

Resource vs Data Source Mapping in State

Understanding this distinction is important architecturally.

Aspect                         | Resource | Data Source
-------------------------------|----------|------------
Creates infrastructure         | Yes      | No
Lifecycle managed              | Yes      | No
Stored in state                | Yes      | Yes
Can be destroyed               | Yes      | No
Used for dependency resolution | Yes      | Yes

Data sources behave like recorded lookups that Terraform re-reads while planning, whereas resources represent owned infrastructure objects.


State and Drift Detection

Drift occurs when infrastructure changes outside Terraform (e.g., manual cloud console modification).

Because state contains previously known attributes, Terraform can:

  • Detect attribute mismatches
  • Propose corrective updates
  • Reconcile infrastructure

This is one of the most important reasons state must be accurate and protected.
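
Not every out-of-band change should be reverted. Where an attribute is deliberately maintained outside Terraform, the lifecycle block can tell Terraform to leave it alone. The sketch below assumes, purely for illustration, that tags on the bucket are managed by an external tool.

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"

  lifecycle {
    # Assumption: tags are maintained by an external tool, so differences in
    # them should not show up as proposed changes.
    ignore_changes = [tags]
  }
}
```

Drift in any other attribute of the bucket is still detected and proposed for correction in the plan.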


Sensitive Data in State

State may contain:

  • Passwords
  • API keys
  • Private IP addresses
  • Connection strings

Even if marked sensitive in outputs, the raw values still exist in state.

Therefore:

  • Remote backend encryption is critical
  • Access to state must be tightly controlled
  • State files must never be committed to version control
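
The gap between what the CLI shows and what state stores can be illustrated with a small sketch. The variable and output name db_password is hypothetical.

```hcl
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan and apply output
}

output "db_password" {
  value     = var.db_password
  sensitive = true # hidden in CLI output, still stored in plain text in state
}
```

Anyone who can read the state file can read this value, regardless of the sensitive flags.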

Internal Structure (Conceptual Overview)

A simplified state structure looks like:

{
  "version": 4,
  "terraform_version": "1.x.x",
  "serial": 12,
  "lineage": "uuid",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "assets",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "id": "my-app-assets",
            "arn": "...",
            "region": "eu-central-1"
          }
        }
      ]
    }
  ],
  "outputs": {
    "bucket_name": {
      "value": "my-app-assets",
      "sensitive": false
    }
  }
}

Actual state files are more complex, but this illustrates the conceptual components.


Architectural Implications

In my opinion, state management is the most operationally critical aspect of Terraform.

Good practices include:

  • Always use a remote backend in team environments
  • Enable versioning on state storage
  • Enable locking
  • Restrict write access
  • Treat state as sensitive data
  • Avoid splitting infrastructure into too many tiny states without reason
  • Separate unrelated domains into different states

State boundaries are architectural boundaries.


Summary

Terraform state:

  • Maps configuration to real infrastructure
  • Stores resource and data source information
  • Exposes outputs
  • Maintains metadata for safety and conflict detection
  • Enables drift detection
  • Powers plan and apply operations

Without state, Terraform would be a static templating engine.
With state, it becomes a reliable infrastructure management system.


About Author

Mathias Bothe

I am Mathias from Heidelberg, Germany. I am a passionate IT freelancer with 15+ years of experience in programming, especially in developing web-based applications for companies that range from small startups to the big players out there. I created Bosycom and initiated several software projects.