Configuring Query Plan Caching in the Apollo Router

Query planning is the Apollo Router’s dominant CPU cost, and caching the resulting plans is the single highest-leverage tuning step for a busy federated graph. This page shows how to configure the in-memory plan cache, add a distributed Redis plan cache shared across the fleet, warm the cache so new pods start hot, and verify the cache is actually working through metrics.

When to Use This Pattern

  • Your router shows high CPU and p99 latency spikes that correlate with newly started or scaled-up pods (cold plan caches).
  • You run more than one router instance and want them to share warm plans rather than each re-planning the same operations.
  • You serve a large but bounded set of operations (ideally via persisted queries) where plan reuse is high.

This builds directly on the router setup in Apollo Router Configuration and Deployment; have a working router.yaml and a composed supergraph before tuning the cache.

Prerequisites

  • Apollo Router v1.40+ (distributed plan cache is generally available in recent 1.x)
  • A composed Federation v2 supergraph served by the router
  • For the distributed cache: a reachable Redis instance (single node or Redis cluster)
  • Metrics wired up so you can observe cache hit ratio — see Observability and Distributed Tracing in Federation

How Plan Caching Works

For every incoming operation the router must produce a query plan: the ordered, parallelised set of subgraph fetches that resolves the operation against the supergraph. Planning is deterministic for a given operation and supergraph, so the plan can be cached and reused for every subsequent identical operation. The router keys the plan cache on a hash of the operation and the supergraph schema, which means a new supergraph (after a schema publish) correctly invalidates stale plans.
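As a rough illustration of that keying, the sketch below derives a key from the operation text and the supergraph text, then shows that the key is stable for identical inputs and changes when the schema changes. The plan_key helper and the use of sha256sum are assumptions made for this example; the router's real key derivation is internal and may normalise the operation or fold in other inputs.

```shell
# Illustrative only: plan_key is a hypothetical helper, not the router's algorithm.
# Assumes sha256sum (GNU coreutils) is on PATH.
plan_key() {
  # $1 = operation text, $2 = supergraph schema text
  op_hash=$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)
  schema_hash=$(printf '%s' "$2" | sha256sum | cut -d' ' -f1)
  printf '%s:%s\n' "$schema_hash" "$op_hash"
}

OP='{ topProducts { name } }'
k_old=$(plan_key "$OP" 'schema v1')
k_same=$(plan_key "$OP" 'schema v1')
k_new=$(plan_key "$OP" 'schema v2')   # same operation, republished supergraph

[ "$k_old" = "$k_same" ] && echo "same operation and schema: key reused"
[ "$k_old" != "$k_new" ] && echo "new schema: old key no longer matches"
```

Because the schema is part of the key, nothing has to delete old entries after a publish: lookups against the new supergraph simply never match them.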

There are two cache tiers. The in-memory tier is a per-process LRU cache, always on, holding a bounded number of plans. The optional distributed tier sits in Redis and is shared across every router pod, so a plan computed by one pod is reusable by all the others. With both tiers enabled, the router checks memory first, then Redis; only if both miss does it plan the operation and write the result through to both tiers.

It helps to be precise about why this matters. Planning a federated operation means walking the supergraph, deciding which subgraph owns each requested field, ordering the fetches to respect entity dependencies, and parallelising everything that is independent. For a deep query spanning several subgraphs this is real CPU work — often the most expensive single thing the router does per request. The result, however, is purely a function of the operation text and the supergraph schema, so it never needs to be recomputed for an identical operation against an unchanged schema. The plan cache exploits exactly that: pay the planning cost once, amortise it across every subsequent identical request. On a graph serving a bounded operation set, this drives steady-state planning CPU toward zero, leaving the router’s work dominated by the cheap parts — parsing, validation, fetch dispatch, and response merging.

The two tiers solve two different problems. The in-memory tier eliminates re-planning within a single long-lived pod. The distributed tier eliminates re-planning across the fleet and, critically, across pod lifecycles: when a pod restarts or a new one scales up, its in-memory cache is empty, but the Redis tier lets it serve a previously computed plan on the very first request instead of paying the planning cost cold. That is the difference between an autoscale event that smoothly absorbs load and one that briefly spikes CPU and latency on every new pod.
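The lookup order and the restart behaviour can be sketched with a toy model in which two temporary directories stand in for the tiers. Everything here is hypothetical scaffolding (the lookup function, the directory names, the fake plan contents), and copying a Redis hit back into the memory tier is an assumption of the sketch rather than documented router behaviour.

```shell
# Hypothetical stand-ins: two directories play the in-memory and Redis tiers.
MEM_TIER=$(mktemp -d)
REDIS_TIER=$(mktemp -d)

# lookup KEY: prints which tier answered and fills the tiers on a miss.
lookup() {
  key=$1
  if [ -f "$MEM_TIER/$key" ]; then
    echo "memory hit"
  elif [ -f "$REDIS_TIER/$key" ]; then
    cp "$REDIS_TIER/$key" "$MEM_TIER/$key"    # assumed: promote into this pod's memory
    echo "redis hit"
  else
    echo "plan for $key" > "$MEM_TIER/$key"   # stand-in for the expensive planning step
    cp "$MEM_TIER/$key" "$REDIS_TIER/$key"    # write through to the shared tier
    echo "planned"
  fi
}

lookup op1            # prints "planned"
lookup op1            # prints "memory hit"
rm -f "$MEM_TIER"/*   # simulate a pod restart: memory is empty, Redis is not
lookup op1            # prints "redis hit"
```

The last call is the case the distributed tier exists for: a fresh pod answers its first request for a known operation without planning.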

Implementation Walkthrough

The complete plan-caching configuration lives under supergraph.query_planning in router.yaml: the cache block configures the two tiers, and warmed_up_queries sits alongside it. The annotated block below enables both tiers and is production-representative.

# router.yaml — query plan caching, annotated
supergraph:
  query_planning:
    # Bound how many "warm-up" plans the router precomputes on a schema reload.
    warmed_up_queries: 100        # re-plan the 100 hottest ops before serving the new schema
    cache:
      in_memory:
        limit: 1024               # max query plans held in this pod's memory (LRU eviction)
      redis:                      # OPTIONAL distributed tier, shared across all pods
        urls:
          - redis://plan-cache.svc.cluster.local:6379
        timeout: 5ms              # fail fast and plan locally if Redis is slow
        ttl: 24h                  # expire cached plans after a day as a safety valve
        namespace: "router:plans" # key prefix so plans don't collide with other Redis users

Three settings carry most of the weight. in_memory.limit is the per-pod plan budget: size it to your distinct-operation cardinality, not your request volume, because plan reuse is what matters. The redis block turns on the shared tier; its short timeout is deliberate, so that a slow Redis costs at most a few milliseconds before the router falls back to planning locally, rather than adding latency to every request. warmed_up_queries controls cache warm-up across schema reloads: when a new supergraph is published, the router re-plans its hottest operations against the new schema before it starts serving that schema, avoiding a planning-CPU cliff at the moment of reload.

For graphs that enforce persisted queries, the persisted query manifest doubles as a warm-up source: because the set of allowed operations is known and finite, the router can compute and cache a plan for every one of them, driving the steady-state hit ratio toward 100%. Persisted query enforcement is the most reliable way to keep the plan cache effective, since it bounds the operation cardinality the cache has to cover.

Sizing the Cache

Set in_memory.limit from the number of distinct operations your clients send, with headroom. If telemetry shows 600 distinct operations across a day, a limit of 1024 keeps essentially all of them resident; a limit of 200 would thrash, evicting hot plans and forcing re-planning. When operation cardinality is unbounded (ad-hoc client-generated queries), no in-memory size fully covers it — the durable fix is persisted query enforcement to bound the set, with the distributed Redis tier absorbing the residual misses across the fleet.
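One way to turn that guidance into a number is sketched below. It assumes you can export a log with one operation name or hash per request (ops.log is a hypothetical file name), and both the 1.5x headroom and the power-of-two rounding are arbitrary choices for the sketch, not router requirements.

```shell
# distinct_ops FILE: number of distinct operations in a log with one
# operation name or hash per line (hypothetical input format).
distinct_ops() {
  sort -u "$1" | wc -l | tr -d ' '
}

# suggest_limit N: smallest power of two at or above 1.5x N.
# The headroom factor and the rounding are assumptions of this sketch.
suggest_limit() {
  target=$(( ($1 * 3 + 1) / 2 ))
  limit=1
  while [ "$limit" -lt "$target" ]; do
    limit=$(( limit * 2 ))
  done
  echo "$limit"
}

suggest_limit 600   # prints 1024
```

With a real log the two helpers compose as suggest_limit "$(distinct_ops ops.log)". If the distinct count keeps growing from day to day, treat that as a sign of unbounded cardinality rather than a reason to keep raising the limit.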

A cached plan is typically small, on the order of kilobytes, so the memory cost of a generous limit is modest, and over-provisioning the in-memory tier is usually the right call. The failure mode to avoid is the opposite: a limit set well below your operation cardinality, which produces constant eviction and re-planning that shows up as unexplained CPU and p99 latency. If you are unsure of your cardinality, start with a limit comfortably above your best estimate, watch the hit ratio, and tighten only if memory pressure becomes a real constraint. Cardinality, not request volume, drives the size: a billion requests across two hundred operations need a cache for two hundred plans.

Verification Steps

Confirm the cache is configured and effective in three steps.

1. Validate the config loads. A malformed cache block fails router startup, so a clean start confirms that the cache configuration parses and validates.

./router --config router.yaml --supergraph supergraph.graphql --dev

2. Send the same operation twice and watch planning happen once. With debug logging, the first request logs a planning event and the second logs a cache hit.

APOLLO_ROUTER_LOG=debug ./router --config router.yaml --supergraph supergraph.graphql &
sleep 2  # give the router a moment to start listening
# The router serves GraphQL at / by default; adjust the URL if supergraph.path is set.
Q='{"query":"{ topProducts { name reviews { body } } }"}'
curl -s localhost:4000/ -H 'content-type: application/json' -d "$Q" >/dev/null  # plans
curl -s localhost:4000/ -H 'content-type: application/json' -d "$Q" >/dev/null  # cache hit

3. Confirm the hit ratio in metrics. The router exposes plan-cache counters via its telemetry exporter. With the Prometheus exporter enabled in the telemetry configuration, scrape its endpoint and check the cache hit/miss series:

curl -s localhost:9090/metrics | grep -E 'apollo_router_(cache|query_planning)'
# expect cache hit and miss series for the query planner, with hits climbing
# (exact metric names vary by router version)

A healthy steady state shows the hit counter far exceeding the miss counter; a hit ratio that stays low points to an undersized in_memory.limit or unbounded operation cardinality. To verify the distributed tier, restart a pod and confirm that it serves cached plans immediately (a Redis hit) rather than re-planning from cold; its planning CPU should not spike on startup.
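To turn the raw counters into a ratio, something like the following can help. It is a sketch that assumes Prometheus text format with the sample value as the last field on each line; the metric names in the sample are hypothetical placeholders, so substitute whichever hit and miss series your router version exposes.

```shell
# sum_series PATTERN: sums the values of metric lines on stdin that match PATTERN.
# Assumes Prometheus text format with the sample value as the last field.
sum_series() {
  awk -v pat="$1" '$0 !~ /^#/ && $0 ~ pat { total += $NF } END { print total + 0 }'
}

# hit_ratio HITS MISSES: prints hits / (hits + misses) to two decimals.
hit_ratio() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    if (h + m == 0) { print "n/a"; exit }
    printf "%.2f\n", h / (h + m)
  }'
}

# Sample input with hypothetical metric names, standing in for a real scrape.
sample='# TYPE plan_cache_hits counter
plan_cache_hits{storage="memory"} 900
plan_cache_misses{storage="memory"} 100'

hits=$(printf '%s\n' "$sample" | sum_series '^plan_cache_hits')
misses=$(printf '%s\n' "$sample" | sum_series '^plan_cache_misses')
hit_ratio "$hits" "$misses"   # prints 0.90
```

Against a live router, pipe the output of curl -s localhost:9090/metrics into sum_series instead of the sample text. Counters are cumulative since process start, so compare two scrapes taken some minutes apart if you want the recent ratio rather than the lifetime one.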

Common Mistakes & Gotchas

  • Sizing the cache to request volume instead of operation cardinality. Ten million requests across 300 distinct operations need a cache that holds ~300 plans, not ten million. Sizing by traffic wastes memory; sizing by cardinality is correct.
  • Leaving the Redis timeout too high. If redis.timeout is omitted (the router then applies its built-in default) or set high, a degraded Redis adds latency to every request that misses the in-memory tier, because the router waits on the distributed lookup before falling back to planning. Keep it in the low-millisecond range so the cache stays a pure optimisation.
  • Expecting the cache to survive schema changes for free. A new supergraph invalidates plans keyed on the old schema, so the first requests after a publish re-plan. Configure warmed_up_queries so the router precomputes hot plans against the new schema and avoids a CPU cliff at swap time.

Frequently Asked Questions

Does the query plan cache need Redis?

No. The in-memory tier is always on and sufficient for single-pod or low-cardinality deployments. Add the Redis tier when you run multiple router pods and want them to share warm plans so scaled-up or restarted pods start hot.

What invalidates a cached query plan?

A change to the operation or to the supergraph schema. Plans are keyed on a hash of both, so publishing a new subgraph schema (which recomposes the supergraph) invalidates every plan keyed on the old supergraph; the ttl on the Redis tier is an additional safety expiry, not the primary invalidation mechanism.

How do persisted queries interact with the plan cache?

They make it far more effective. A persisted query manifest is a finite, known set of operations, so the router can warm a plan for every one of them and keep the steady-state hit ratio near 100%, eliminating planning CPU as a bottleneck.