Terraform’s state is the persistent data structure that allows declarative configuration to work in practice. Without state, Terraform would have no reliable way to understand what already exists, what it manages, and what needs to change.
This chapter explains:
- What Terraform state data consists of
- Where it is stored
- How Terraform uses it internally
- How resources, data sources, outputs, and metadata are represented
Why State Exists
Terraform is declarative. You describe the desired end state of your infrastructure. To determine what actions are necessary, Terraform must compare:
- Configuration (what you declare now)
- Real infrastructure (what exists in the provider)
- Previous known state (what Terraform believes exists)
The state file bridges configuration and real-world infrastructure. It allows Terraform to compute a safe and minimal execution plan.
What Comprises Terraform State?
The state file (typically terraform.tfstate) is a JSON document. You should never edit it manually, but understanding its structure is important.
At a high level, state contains:
- Resource mappings
- Data source mappings
- Outputs
- Metadata
- Dependency graph information
Let’s examine each.
1. Resource Mapping
The most critical part of state is the mapping between:
- A resource in your configuration
- A real object in a provider (e.g., cloud resource)
Example configuration:
resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}
Terraform stores in state:
- The resource address (aws_s3_bucket.assets)
- The provider-specific ID (e.g., bucket name or ARN)
- All known attributes returned by the provider
- Dependency information
This enables Terraform to:
- Update the correct real-world object
- Detect drift
- Destroy the correct infrastructure when requested
Without state, Terraform would not know which S3 bucket belongs to which configuration block.
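The address-to-ID mapping can also be made explicit in configuration. The following is a minimal sketch, assuming Terraform 1.5 or later and assuming that a bucket named my-app-assets already exists but is not yet recorded in state. The import block asks Terraform to bind that existing bucket to the address aws_s3_bucket.assets on the next apply, which is the same pairing state stores for every managed resource.

```hcl
# Assumption: the bucket "my-app-assets" already exists and is not yet in state.
import {
  to = aws_s3_bucket.assets # resource address in the configuration
  id = "my-app-assets"      # provider-specific ID (for S3 buckets, the bucket name)
}

resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}
```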
2. Data Source Mapping
Data sources are read-only queries to providers.
Example:
data "aws_vpc" "default" {
  default = true
}
Unlike resources, data sources do not create infrastructure. However, Terraform still records their evaluated results in state.
Why?
- To cache resolved values
- To allow dependency resolution
- To enable consistent planning within a run
Important distinction:
- Resources → managed objects (Terraform owns lifecycle)
- Data sources → fetched objects (Terraform reads but does not manage)
In the state file, data sources are stored in the same list as resources, but each entry is recorded with the mode "data" instead of "managed", so Terraform never tries to create, update, or destroy the underlying object.
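As a sketch of how a data source takes part in dependency resolution, the configuration below feeds the looked-up VPC ID into a managed resource. The subnet name and CIDR range are illustrative assumptions and are not taken from a real environment.

```hcl
data "aws_vpc" "default" {
  default = true
}

# Hypothetical subnet; the name and CIDR range are placeholders.
resource "aws_subnet" "example" {
  # Referencing the data source makes Terraform read it before planning the subnet.
  vpc_id     = data.aws_vpc.default.id
  cidr_block = "172.31.96.0/20"
}
```

Here the subnet is a managed object that Terraform creates and can destroy, while the VPC is only read.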
3. Outputs
Outputs are values exported from a module:
output "bucket_name" {
  value = aws_s3_bucket.assets.id
}
For each output of the root module, state stores:
- Output name
- Output value
- Whether it is sensitive
This enables:
- Cross-module communication
- terraform output
- Remote state data usage
- Integration with automation systems
If another project uses:
data "terraform_remote_state" "infra" { ... }
Terraform reads the root module outputs directly from that project's stored state.
Outputs therefore act as a public interface of your infrastructure module.
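A consuming configuration might look like the sketch below. The backend type, bucket name, key, and region are assumptions made for illustration; the real values depend on where the producing configuration keeps its state.

```hcl
# Assumptions: the producing configuration stores its state in an S3 bucket
# named "example-terraform-state" under the key "infra/terraform.tfstate".
data "terraform_remote_state" "infra" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "infra/terraform.tfstate"
    region = "eu-central-1"
  }
}

# Root module outputs of the other configuration are exposed under "outputs".
locals {
  assets_bucket = data.terraform_remote_state.infra.outputs.bucket_name
}
```

Only values that the producing configuration declares as root module outputs are reachable this way, which is why outputs work as its public interface.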
4. Metadata
State also contains metadata such as:
- Terraform version used
- Serial number (incremented each write)
- Lineage (unique ID for the state)
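- Provider configuration references
The backend configuration is not part of the state snapshot itself; Terraform caches it separately under the .terraform directory of the working directory.
The serial number helps Terraform and backends detect stale or conflicting writes: a snapshot with a lower serial than the one already stored is rejected. Preventing concurrent writes in the first place is the job of state locking.
The lineage ensures that Terraform does not accidentally overwrite one state with a snapshot from an unrelated state.
Together, this metadata supports safety, consistency, and conflict detection.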
5. Dependency Graph Information
Terraform builds a dependency graph during planning, mostly from the references between resources in your configuration. State contributes to it as well: each resource instance is stored together with a list of the resources it depends on.
This allows Terraform to:
- Apply changes in the correct order
- Destroy in reverse dependency order
- Identify implicit dependencies from expression references
The graph is rebuilt on every run, but the dependencies recorded in state let Terraform order the destruction of resources that have already been removed from the configuration.
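The two kinds of dependency edges can be sketched as follows. The versioning resource and the object are hypothetical additions to the bucket example used earlier, chosen only to show one implicit and one explicit dependency.

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}

# Implicit dependency: referencing aws_s3_bucket.assets.id orders this
# resource after the bucket.
resource "aws_s3_bucket_versioning" "assets" {
  bucket = aws_s3_bucket.assets.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Explicit dependency: the bucket name is hard-coded, so no reference exists
# and depends_on has to state the ordering. (Hypothetical object.)
resource "aws_s3_object" "readme" {
  bucket  = "my-app-assets"
  key     = "README.txt"
  content = "placeholder"

  depends_on = [aws_s3_bucket.assets]
}
```

On destroy, Terraform walks these edges in reverse, removing the object and the versioning configuration before the bucket.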
Where Is State Stored?
Local State (Default)
By default, state is stored locally in your working directory:
- terraform.tfstate
- terraform.tfstate.backup (the previous version of the state)
This is suitable only for:
- Personal projects
- Experiments
- Non-collaborative workflows
It is not suitable for teams: the file lives on a single machine, so there is no shared source of truth and no locking between machines.
Remote Backends
For production use, state should be stored remotely.
Common backends:
- S3-compatible storage (e.g. Amazon S3)
- Google Cloud Storage
- Azure Blob Storage
- Terraform Cloud / Terraform Enterprise
- HTTP backends
Depending on the backend, remote storage provides:
- State locking
- Versioning
- Encryption
- Access control
- Team collaboration
State locking prevents two engineers from running apply against the same state at the same time and corrupting it.
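A remote backend is declared in the terraform block. The sketch below assumes an S3 bucket that already exists; the bucket name, key, and region are placeholders. It also assumes a recent Terraform release that supports the use_lockfile argument; older releases configure S3 locking through a DynamoDB table instead.

```hcl
terraform {
  backend "s3" {
    bucket       = "example-terraform-state" # hypothetical bucket name
    key          = "prod/network/terraform.tfstate"
    region       = "eu-central-1"
    encrypt      = true # server-side encryption of the state object
    use_lockfile = true # S3-native state locking (assumes a recent Terraform release)
  }
}
```

Versioning and access control are properties of the bucket itself, so they are configured on the storage side and not in the backend block.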
How Terraform Uses State
Terraform operates in a sequence:
1. Refresh Phase
Terraform queries providers and compares real infrastructure to state.
If drift is detected:
- State is updated (in memory while planning, persisted when changes are applied)
- Differences appear in the plan
2. Plan Phase
Terraform compares:
- Desired configuration
- Current state
It computes actions:
- Create
- Update
- Replace
- Destroy
3. Apply Phase
After successful execution:
- State is updated to reflect new infrastructure reality
- Serial number increments
State is therefore both:
- A record of managed infrastructure
- A mechanism for computing future changes
Resource vs Data Source Mapping in State
Understanding this distinction is important architecturally.
| Aspect | Resource | Data Source |
|---|---|---|
| Creates infrastructure | Yes | No |
| Lifecycle managed | Yes | No |
| Stored in state | Yes | Yes |
| Can be destroyed | Yes | No |
| Used for dependency resolution | Yes | Yes |
Data sources behave like cached lookups, whereas resources represent owned infrastructure objects.
State and Drift Detection
Drift occurs when infrastructure changes outside Terraform (e.g., manual cloud console modification).
Because state contains previously known attributes, Terraform can:
- Detect attribute mismatches
- Propose corrective updates
- Reconcile infrastructure
This is one of the most important reasons state must be accurate and protected.
Sensitive Data in State
State may contain:
- Passwords
- API keys
- Private IP addresses
- Connection strings
Marking a value as sensitive only hides it from CLI output; the raw value is still stored in state in plain text.
Therefore:
- Remote backend encryption is critical
- Access to state must be tightly controlled
- State files must never be committed to version control
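To make the limitation concrete, here is a minimal sketch with a hypothetical db_password variable. The sensitive flags change what Terraform prints, not what it stores.

```hcl
# Hypothetical variable, used only to illustrate the sensitive flag.
variable "db_password" {
  type      = string
  sensitive = true # redacted in plan and apply output
}

output "db_password" {
  value     = var.db_password
  sensitive = true # redacted in the CLI output summary
}

# Despite both flags, the output value is written to state unredacted,
# so anyone who can read the state can read the password.
```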
Internal Structure (Conceptual Overview)
A simplified state structure looks like:
{
  "version": 4,
  "terraform_version": "1.x.x",
  "serial": 12,
  "lineage": "uuid",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "assets",
      "instances": [
        {
          "attributes": {
            "id": "my-app-assets",
            "arn": "...",
            "region": "eu-central-1"
          }
        }
      ]
    }
  ],
  "outputs": {
    "bucket_name": {
      "value": "my-app-assets",
      "sensitive": false
    }
  }
}
Actual state files are more complex, but this illustrates the conceptual components.
Architectural Implications
In my opinion, state management is the most operationally critical aspect of Terraform.
Good practices include:
- Always use a remote backend in team environments
- Enable versioning on state storage
- Enable locking
- Restrict write access
- Treat state as sensitive data
- Avoid splitting infrastructure into too many tiny states without reason
- Separate unrelated domains into different states
State boundaries are architectural boundaries.
Summary
Terraform state:
- Maps configuration to real infrastructure
- Stores resource and data source information
- Exposes outputs
- Maintains metadata for safety and conflict detection
- Enables drift detection
- Powers plan and apply operations
Without state, Terraform would be a static templating engine.
With state, it becomes a reliable infrastructure management system.
If you want, the next chapter could cover advanced topics such as:
- State migration
- Importing existing infrastructure
- State refactoring (terraform state mv)
- Backend architecture patterns
- Monorepo vs multi-state strategies