Apollo Router Configuration and Deployment

The Apollo Router is the single ingress for every operation in a production federated graph, which makes its configuration file the most operationally significant artifact you maintain. A router that is mis-tuned on timeouts, header propagation, or health checks will turn an isolated subgraph hiccup into a graph-wide outage. This guide walks through router.yaml structure, supergraph SDL loading, header propagation, traffic shaping, health checks, deployment topologies, and the pipeline from rover supergraph compose to router launch: everything needed to run the router described in Federated GraphQL Operations in Production reliably under load.

How the Router Loads Its Schema

The router needs two inputs: a configuration file (router.yaml) describing how to behave, and a supergraph SDL describing the graph it serves. The SDL can be supplied two ways, and the choice shapes your whole deployment model.

In local mode you pass a composed SDL file on the command line. The router reads it once at startup and, with --hot-reload, watches the file for changes. This mode suits GitOps pipelines and air-gapped environments where the supergraph is a versioned build artifact.

rover supergraph compose --config supergraph.yaml > supergraph.graphql
./router --config router.yaml --supergraph supergraph.graphql --hot-reload

In managed mode the router omits --supergraph and instead authenticates to Apollo’s Uplink, polling for the latest composed supergraph. When any subgraph is published, the registry recomposes and the router picks up the new SDL with no restart.

APOLLO_KEY=service:my-graph:xxxx \
APOLLO_GRAPH_REF=my-graph@prod \
./router --config router.yaml
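
The two launch modes differ only in whether --supergraph is passed or Uplink credentials are set. A minimal sketch of that decision, using only the flags and environment variables shown above, might look like the following; the helper name router_launch and its parameters are hypothetical and not part of any Apollo tooling.

```python
from typing import Optional


def router_launch(config: str,
                  supergraph: Optional[str] = None,
                  graph_ref: Optional[str] = None,
                  api_key: Optional[str] = None,
                  hot_reload: bool = False) -> tuple[list[str], dict[str, str]]:
    """Return (argv, extra_env) for local mode or managed mode.

    Hypothetical helper: it only assembles the command line and
    environment shown in this guide, it does not start anything.
    """
    argv = ["./router", "--config", config]
    env: dict[str, str] = {}
    if supergraph is not None:
        # Local mode: the SDL is a file on disk, no Uplink credentials needed.
        argv += ["--supergraph", supergraph]
    else:
        # Managed mode: no --supergraph, the router authenticates to Uplink.
        if not (graph_ref and api_key):
            raise ValueError("managed mode needs APOLLO_KEY and APOLLO_GRAPH_REF")
        env = {"APOLLO_KEY": api_key, "APOLLO_GRAPH_REF": graph_ref}
    if hot_reload:
        argv.append("--hot-reload")
    return argv, env


argv, env = router_launch("router.yaml", supergraph="supergraph.graphql", hot_reload=True)
print(" ".join(argv))
```

Failing early when managed mode lacks credentials mirrors the point of the section: the choice of mode should be explicit in the deployment, not an accident of which variables happen to be set.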

The diagram below traces a single request through the router’s internal stages so the configuration sections that follow map onto concrete lifecycle steps.

[Figure: Apollo Router request lifecycle. A request enters the router, is parsed and validated, hits the query plan cache, has headers propagated and traffic shaping applied, executes fetches across subgraphs, and is merged into a single response. Stages: Request (client doc), Parse + validate, Plan cache (hit / plan), Headers + shaping, subgraph fetches, Merge + respond.]

router.yaml Structure

The configuration file is a set of top-level sections, each governing one concern. The most load-bearing sections are supergraph, headers, traffic_shaping, health_check, and telemetry. The router validates the file on startup and on hot reload, refusing to start (or refusing to swap config) on a schema-invalid file, so a bad edit fails fast rather than silently. Treat router.yaml as code: version it, review changes, and validate it in CI by starting the router in --dev mode against a test supergraph, because a config typo discovered at deploy time is a deploy-time outage.

Configuration values can also be supplied through environment variables using the ${env.VAR} syntax, which keeps secrets and per-environment values (Redis URLs, OTLP endpoints, allowed origins) out of the checked-in file. This is the idiomatic way to ship one router.yaml across dev, staging, and production while varying only the injected environment. Avoid the temptation to maintain divergent config files per environment; a single file with environment substitution is far easier to reason about and review.
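
To make the substitution concrete, here is a small illustration of what expanding ${env.VAR} references amounts to. It is a sketch of the idea only, not the router's implementation: the function name expand_env is hypothetical, and the choice to fail on an unset variable is an assumption made so that a missing value is caught early.

```python
import re

# Matches ${env.NAME} where NAME looks like a shell variable name.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env: dict[str, str]) -> str:
    """Replace each ${env.NAME} with env[NAME]; raise if NAME is unset."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _ENV_REF.sub(substitute, text)


# One checked-in template, two environments (hypothetical variable name).
template = "cors:\n  origins:\n    - ${env.APP_ORIGIN}\n"
print(expand_env(template, {"APP_ORIGIN": "https://staging.example.com"}))
print(expand_env(template, {"APP_ORIGIN": "https://app.example.com"}))
```

The same template renders differently per environment, which is the property that lets one reviewed file serve dev, staging, and production.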

# router.yaml — annotated production baseline
supergraph:
  listen: 0.0.0.0:4000          # public GraphQL endpoint
  introspection: false          # disable schema introspection in production
  query_planning:
    cache:
      in_memory:
        limit: 512              # cached query plans retained in memory

health_check:
  enabled: true
  listen: 0.0.0.0:8088          # separate port so probes bypass app traffic
  path: /health

sandbox:
  enabled: false                # no Apollo Sandbox UI on a production endpoint

cors:
  origins:
    - https://app.example.com   # explicit allow-list, never "*" with credentials

limits:
  max_depth: 15                 # reject pathologically deep client queries
  max_aliases: 30
  http_max_request_bytes: 2000000

Header Propagation

By default the router does not forward client headers to subgraphs. You opt in per header, which is the correct security posture: forward exactly what subgraphs need (auth, locale, tenant) and nothing else. Wildcard propagation leaks headers into subgraph logs and can poison shared cache keys.

headers:
  all:                          # applies to every subgraph
    request:
      - propagate:
          named: authorization  # forward the client's bearer token
      - propagate:
          named: accept-language
      - insert:
          name: x-router-version
          value: "1.x"          # static header added to every subgraph call
  subgraphs:
    products:                   # per-subgraph overrides
      request:
        - propagate:
            named: x-currency

Header propagation is what feeds the claims that subgraph authorization directives rely on; if you enforce access control with Directive Patterns for Cross-Service Authorization, the authorization header propagate rule is mandatory.
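
The allow-list behaviour can be modelled in a few lines. The sketch below mirrors the propagate and insert rules from the configuration above as plain dictionaries; the function subgraph_headers is hypothetical and simplifies the router's real rule set (for example, it ignores renaming and pattern matching).

```python
def subgraph_headers(client_headers: dict[str, str], rules: list[dict]) -> dict[str, str]:
    """Build the headers for one subgraph fetch from an allow-list of rules."""
    # HTTP header names are case-insensitive, so normalise before matching.
    incoming = {name.lower(): value for name, value in client_headers.items()}
    outgoing: dict[str, str] = {}
    for rule in rules:
        if "propagate" in rule:
            name = rule["propagate"]["named"].lower()
            if name in incoming:  # forward only what the client actually sent
                outgoing[name] = incoming[name]
        elif "insert" in rule:
            outgoing[rule["insert"]["name"].lower()] = rule["insert"]["value"]
    return outgoing


ALL = [
    {"propagate": {"named": "authorization"}},
    {"propagate": {"named": "accept-language"}},
    {"insert": {"name": "x-router-version", "value": "1.x"}},
]
PRODUCTS = ALL + [{"propagate": {"named": "x-currency"}}]

client = {"Authorization": "Bearer abc", "Cookie": "session=1", "X-Currency": "EUR"}
print(subgraph_headers(client, ALL))
print(subgraph_headers(client, PRODUCTS))
```

Note that the Cookie header never reaches any subgraph because no rule names it, and x-currency reaches only the subgraph whose rule list includes it.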

Traffic Shaping

traffic_shaping is the section that limits the damage a misbehaving subgraph can do. It governs timeouts, retries, request deduplication, and rate limits, applied either to all subgraphs or per subgraph.

traffic_shaping:
  router:
    global_rate_limit:
      capacity: 1000            # tokens
      interval: 1s
  all:
    timeout: 5s                 # per-subgraph fetch deadline
    deduplicate_query: true     # collapse identical in-flight subgraph fetches
    experimental_retry:
      min_per_sec: 10
      ttl: 10s
      retry_percent: 0.2        # retry budget as a fraction of requests
  subgraphs:
    reviews:
      timeout: 2s               # tighter deadline for a non-critical subgraph
      global_rate_limit:
        capacity: 200
        interval: 1s

Set per-subgraph timeouts below your client-facing SLA so a slow subgraph fails its fetch and returns partial data rather than holding the whole operation open. Retries should carry a budget (retry_percent) so a struggling subgraph is not hammered into a deeper outage.
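
The effect of a per-subgraph deadline can be sketched with asyncio. This is a simplified model, not the router's execution engine: the function names are hypothetical, the response shape is reduced to a data map plus an error list, and the durations are scaled down from the 5s and 2s values in the configuration so the example finishes quickly.

```python
import asyncio


async def fetch_with_deadline(name, fetch, timeout_s):
    """Run one subgraph fetch; on deadline, return an error instead of blocking."""
    try:
        return name, await asyncio.wait_for(fetch(), timeout_s), None
    except asyncio.TimeoutError:
        return name, None, f"subgraph '{name}' timed out after {timeout_s}s"


async def execute(fetches, timeouts, default_timeout_s):
    """Fan out to subgraphs and merge whatever came back into data + errors."""
    results = await asyncio.gather(*(
        fetch_with_deadline(name, fetch, timeouts.get(name, default_timeout_s))
        for name, fetch in fetches.items()
    ))
    data = {name: value for name, value, err in results if err is None}
    errors = [err for _, _, err in results if err is not None]
    return {"data": data, "errors": errors}


async def fast_products():
    return {"id": "1"}


async def slow_reviews():
    await asyncio.sleep(1.0)  # simulates a struggling subgraph
    return {"stars": 5}


response = asyncio.run(execute(
    {"products": fast_products, "reviews": slow_reviews},
    timeouts={"reviews": 0.05},   # tighter deadline for the non-critical subgraph
    default_timeout_s=0.5,
))
print(response)
```

The products data is returned even though reviews missed its deadline, and the whole operation completes in roughly the reviews timeout rather than the full second the slow fetch would have taken.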

The mental model for traffic shaping is bulkheading: each subgraph gets its own timeout, rate limit, and retry budget so a failure in one cannot consume the resources the others need. Without it, a federated graph fails as a unit — one slow subgraph backs up the router’s connection pool and degrades every operation, even those that never touch the slow service. With per-subgraph limits, the blast radius of a subgraph incident is contained to the operations that genuinely depend on it, and everything else keeps serving. deduplicate_query complements this by collapsing identical in-flight fetches: when a hot entity is requested concurrently by many client operations, the router issues one subgraph fetch and shares the result, which both reduces subgraph load and smooths latency under bursts. These settings cost nothing in the common case and save you in the incident case, so enable them from day one rather than waiting for the first outage to motivate them.
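
The deduplication idea described above is sometimes called single-flight. A minimal sketch with asyncio follows, assuming that identical fetches can be identified by a string key; the class InFlightDeduplicator is hypothetical and does not reflect how the router keys or stores in-flight requests internally.

```python
import asyncio


class InFlightDeduplicator:
    """Share one in-flight fetch among all callers asking for the same key."""

    def __init__(self):
        self._in_flight = {}

    async def fetch(self, key, do_fetch):
        task = self._in_flight.get(key)
        if task is None:
            # The first caller for this key starts the real fetch.
            task = asyncio.ensure_future(do_fetch())
            self._in_flight[key] = task
            # Forget the entry once it completes so later requests fetch fresh data.
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        return await task


stats = {"subgraph_calls": 0}


async def load_product():
    stats["subgraph_calls"] += 1
    await asyncio.sleep(0.01)  # simulated network round trip
    return {"id": "42", "name": "widget"}


async def main():
    dedup = InFlightDeduplicator()
    return await asyncio.gather(
        *(dedup.fetch("product:42", load_product) for _ in range(100))
    )


results = asyncio.run(main())
print(stats["subgraph_calls"], len(results))
```

One hundred concurrent callers receive a result while the simulated subgraph is called once. Because the entry is dropped on completion, this is deduplication of concurrent work, not caching: a request arriving after the fetch finishes triggers a new fetch.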

Configuration Spec Table

Config key | Valid values | Composition-time vs runtime | Effect
--- | --- | --- | ---
supergraph.listen | host:port | Runtime | Bind address for the public endpoint
supergraph.introspection | true/false | Runtime | Allow/deny schema introspection
supergraph.query_planning.cache.in_memory.limit | integer | Runtime | Number of plans held in memory
headers.all.request[].propagate.named | header name | Runtime | Forward a named client header to subgraphs
traffic_shaping.all.timeout | duration (e.g. 5s) | Runtime | Per-subgraph fetch deadline
traffic_shaping.all.deduplicate_query | true/false | Runtime | Collapse identical concurrent fetches
health_check.enabled | true/false | Runtime | Expose the health endpoint
limits.max_depth | integer | Runtime | Reject queries deeper than the limit
cors.origins | list of origins | Runtime | Browser CORS allow-list

All router.yaml keys are runtime configuration; the supergraph SDL is the composition-time artifact, produced separately by rover supergraph compose and validated against the federation spec before the router ever sees it.

Step-by-Step Deployment

1. Author supergraph.yaml. List each subgraph with its routing URL and schema source.

# supergraph.yaml
federation_version: =2.9.0
subgraphs:
  accounts:
    routing_url: http://accounts:4001/graphql
    schema: { subgraph_url: http://accounts:4001/graphql }
  products:
    routing_url: http://products:4002/graphql
    schema: { subgraph_url: http://products:4002/graphql }

2. Compose the supergraph. Produce the SDL artifact and treat it as a build output.

rover supergraph compose --config supergraph.yaml > supergraph.graphql

3. Write router.yaml. Start from the annotated baseline above and adjust timeouts, CORS, and limits to your environment.

4. Run the router locally to validate. A failed config or composition surfaces immediately.

./router --config router.yaml --supergraph supergraph.graphql --dev

5. Containerise. Bake config and (in local mode) SDL into an image, or mount them.

FROM ghcr.io/apollographql/router:v1.52.0
COPY router.yaml /dist/config/router.yaml
COPY supergraph.graphql /dist/config/supergraph.graphql
ENTRYPOINT ["/dist/router", "--config", "/dist/config/router.yaml", \
            "--supergraph", "/dist/config/supergraph.graphql"]

6. Deploy to Kubernetes with probes. Wire the health endpoint to liveness and readiness probes.

# deployment.yaml (excerpt)
spec:
  containers:
    - name: router
      image: registry.example.com/router:1.52.0
      ports:
        - { containerPort: 4000 }   # GraphQL
        - { containerPort: 8088 }   # health
      livenessProbe:
        httpGet: { path: /health, port: 8088 }
        initialDelaySeconds: 5
      readinessProbe:
        httpGet: { path: /health, port: 8088 }
      resources:
        requests: { cpu: "1", memory: "512Mi" }
        limits:   { cpu: "2", memory: "1Gi" }

Deployment Topologies

For a container/VM deployment, run the router behind a load balancer with at least two instances for availability; the router is effectively stateless (its state is loaded SDL plus rebuildable caches), so instances are interchangeable. Health checks drive the load balancer’s view of which instances are ready, so wire the /health endpoint into the balancer’s target health probe and the router will be drained automatically when it fails. For Kubernetes, run a Deployment with a HorizontalPodAutoscaler keyed on CPU — query planning is the CPU-bound work — and a shared Redis plan cache so scaled-up pods start warm rather than re-planning from cold. In managed federation, every pod independently polls Uplink, so adding pods needs no coordination.

A few topology choices repay attention. Set resource requests and limits explicitly: the router’s memory is predictable, so a tight limit is safe, while CPU should have headroom for planning bursts when the cache is cold. Use a PodDisruptionBudget so a node drain never takes the whole router fleet down at once. Prefer a rolling update strategy with maxUnavailable: 0 and a maxSurge of one or more, so new pods come up and pass readiness before old ones are removed — this is what makes a supergraph rollout zero-downtime in local mode. Run the GraphQL and health ports as separate containerPort entries (and separate Kubernetes Services if you expose them differently) so internal probes never traverse your edge. Finally, place the router close to its subgraphs — same region, ideally same cluster network — because every entity boundary is a subgraph round trip and cross-region hops multiply on deep query plans.
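
The zero-downtime claim for maxUnavailable: 0 can be checked with a little arithmetic. The sketch below is a deliberately simplified model of a rolling update (it assumes every new pod passes readiness immediately and ignores timing); the function name is hypothetical and this is not the Kubernetes controller's algorithm.

```python
def rolling_update_min_ready(replicas: int, max_surge: int, max_unavailable: int) -> int:
    """Return the lowest count of ready pods seen during a simplified rolling update."""
    old_ready, new_ready = replicas, 0
    min_ready = replicas
    floor = replicas - max_unavailable   # fewest ready pods allowed
    ceiling = replicas + max_surge       # most pods allowed to exist
    while old_ready > 0 or new_ready < replicas:
        # Remove old pods only as far as the availability floor allows.
        removable = min(old_ready, max(0, old_ready + new_ready - floor))
        old_ready -= removable
        min_ready = min(min_ready, old_ready + new_ready)
        # Start new pods up to the surge ceiling; assume each becomes ready.
        startable = min(replicas - new_ready, ceiling - (old_ready + new_ready))
        new_ready += startable
        if removable == 0 and startable == 0:
            raise ValueError("no progress: maxSurge and maxUnavailable cannot both be 0")
    return min_ready


print(rolling_update_min_ready(replicas=3, max_surge=1, max_unavailable=0))
print(rolling_update_min_ready(replicas=3, max_surge=0, max_unavailable=1))
```

With a surge of one and no unavailability allowed, ready capacity never drops below the configured replica count; trading the surge for one unavailable pod lets capacity dip by one during the rollout.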

For a multi-region deployment, run an independent router fleet per region, each fanning out to region-local subgraphs, with a global load balancer routing clients to the nearest region. Share nothing across regions except the schema registry; a region-spanning Redis plan cache adds latency that defeats its purpose, so give each region its own plan cache. This keeps the critical path — client to router to subgraph and back — inside one region’s network for the common case.

Composition Pipeline Integration

The router runtime sits downstream of the check-then-publish loop. In CI, validate every subgraph change before it can break the running graph:

# .github/workflows/router-pipeline.yml (excerpt)
jobs:
  check:
    steps:
      - run: rover subgraph check my-graph@prod --name products --schema ./schema.graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}
  publish:
    needs: check
    if: github.ref == 'refs/heads/main'
    steps:
      - run: |
          rover subgraph publish my-graph@prod \
            --name products --schema ./schema.graphql \
            --routing-url http://products:4002/graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}

In managed mode this publish triggers recomposition and the router fleet hot-swaps the new SDL automatically. In local mode, add a step that runs rover supergraph compose and rolls the router deployment with the new artifact. The check stage draws on the same validation discipline as Schema Validation in CI/CD Pipelines.

Performance & Scale Considerations

Query planning is the router’s dominant CPU cost and is cacheable; an unwarmed plan cache or a high distinct-operation cardinality shows up as elevated planner CPU and p99 spikes on cold pods. Size the plan cache to your operation cardinality and share it via Redis across the fleet — see Configuring Query Plan Caching in the Apollo Router. Beyond planning, deduplicate_query collapses identical concurrent subgraph fetches into one, which is a cheap win for graphs with hot entities. The router itself adds single-digit-millisecond overhead per request when plans are cached; nearly all latency lives in subgraph fetches, which is why per-subgraph timeouts and observability matter more than micro-tuning the router.
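
Why cache size must track operation cardinality is easy to see with a toy cache. The sketch below assumes least-recently-used eviction and uses a string as a stand-in for a real query plan; the class PlanCache and the helper hit_ratio are hypothetical, and the cyclic access pattern is a worst case chosen to make the cliff visible, not a model of real traffic.

```python
from collections import OrderedDict


class PlanCache:
    """A tiny LRU cache keyed by operation text, counting hits and misses."""

    def __init__(self, limit: int):
        self.limit = limit
        self._plans = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_plan(self, operation: str) -> str:
        if operation in self._plans:
            self.hits += 1
            self._plans.move_to_end(operation)   # mark as most recently used
            return self._plans[operation]
        self.misses += 1                          # a miss is where planner CPU is spent
        plan = f"plan-for:{operation}"            # stand-in for real query planning
        self._plans[operation] = plan
        if len(self._plans) > self.limit:
            self._plans.popitem(last=False)       # evict the least recently used plan
        return plan


def hit_ratio(limit: int, distinct_operations: int, passes: int = 10) -> float:
    cache = PlanCache(limit)
    for _ in range(passes):
        for i in range(distinct_operations):
            cache.get_or_plan(f"query Op{i} {{ me {{ id }} }}")
    return cache.hits / (cache.hits + cache.misses)


print(hit_ratio(limit=512, distinct_operations=400))
print(hit_ratio(limit=512, distinct_operations=600))
```

When the distinct operations fit in the cache, only the first pass misses. Once cardinality exceeds the limit under this access pattern, every lookup evicts a plan that will be needed again, so the hit ratio collapses and each request pays for planning.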

Scale the router by CPU, not memory. Planning is CPU-bound and the router uses every core available to its process, so a HorizontalPodAutoscaler keyed on CPU utilisation tracks real load well; memory is low and stable, so it rarely triggers scaling. Set the HPA’s target utilisation with headroom for the planning bursts that accompany cache misses, and pair scale-out with a shared plan cache so the new pods are immediately useful rather than CPU-bound on cold planning. Watch the connection pool to subgraphs as a separate scaling signal: if traffic_shaping timeouts are firing, adding router pods will not help, because the subgraph is the bottleneck, and the right response is to fix the subgraph or tighten its timeout so its failures stay contained. The discipline is to attribute latency correctly before reaching for a tuning knob: the router is rarely the cause, but it is always the place you see the symptom first.
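
The headroom trade-off follows from the scaling rule the HorizontalPodAutoscaler documents: desired replicas are the current replicas multiplied by the ratio of the observed metric to the target, rounded up. A worked sketch of that core rule follows; it omits the autoscaler's tolerance band, stabilisation windows, and min/max replica bounds, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_pct: float, target_cpu_pct: float) -> int:
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)


# Four router pods averaging 90% CPU during a cold-cache planning burst.
print(desired_replicas(4, 90, 60))   # lower target: more headroom, scales out further
print(desired_replicas(4, 90, 80))   # higher target: less headroom, scales out less
```

A lower target utilisation buys capacity for planning bursts at the cost of running more pods in steady state, which is the headroom decision described above.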

Failure Modes & Debugging

INTROSPECTION_DISABLED on a client tool. Tools that introspect the schema fail when supergraph.introspection: false. This is intentional in production; point developer tooling at a non-production variant or Apollo Studio instead of re-enabling introspection on the public endpoint.

request timed out subgraph errors under load. When traffic_shaping.all.timeout fires, the router returns a partial response with a timeout error for the affected path. If these spike, the subgraph — not the router — is the bottleneck; confirm with per-subgraph fetch latency in your traces before touching the timeout value.

401 Unauthorized from a subgraph that works in isolation. The authorization header is not being propagated. Add an explicit propagate: { named: authorization } rule under headers.all.request; the router drops all headers by default.

Config swap rejected on hot reload. A --hot-reload config edit that fails validation is logged and ignored; the router keeps running the last good config. Check router logs for the validation error rather than assuming the new config took effect.

Frequently Asked Questions

Can I run the Apollo Router without Apollo Studio or managed federation?

Yes. Pass a locally composed supergraph.graphql with --supergraph. The router needs no Apollo account in this mode; you give up zero-touch updates and metrics-aware checks, which you can replace with your own GitOps recompose-and-roll pipeline.

How do I forward authentication to subgraphs?

Add a propagate: { named: authorization } rule under headers.all.request. The router forwards no client headers by default, so this opt-in is required for any subgraph that reads the bearer token, including subgraphs that enforce authorization directives.

What is the difference between the GraphQL port and the health-check port?

They are intentionally separate so liveness and readiness probes do not consume application request capacity and so you can expose /health internally while the GraphQL endpoint sits behind your edge. Configure the health port under health_check.listen.

Should I enable retries in traffic shaping?

Enable them only with a retry budget (retry_percent) and for idempotent read traffic. Unbudgeted retries amplify load on a struggling subgraph and can turn a brief slowdown into a sustained outage.

How do I roll out a new supergraph without dropping requests?

In managed federation the router hot-swaps the SDL with no restart. In local mode, recompose, build a new image or update the mounted file, and let Kubernetes do a rolling update; readiness probes hold traffic off pods until they have loaded the new SDL.