Gateway Routing Strategies for Federated APIs

The gateway is the one component that sees every federated operation, decides which subgraph resolves which field, and pays the latency cost of every boundary crossing. Choosing the right routing strategy — how it plans queries, how it discovers subgraphs, how it shapes traffic and forwards context — is what separates a federated graph that feels like a single fast API from one that buckles under fan-out. This guide, part of GraphQL Federation Architecture & Design, covers query planning mechanics, static versus dynamic dispatch, the router.yaml configuration that controls it, the composition checks that keep routing tables valid, and the performance trade-offs you tune at scale.

Prerequisites

Concept Deep-Dive: Query Planning and Execution Flow

The router decomposes one incoming GraphQL operation into a set of subgraph requests and an order to run them in. It parses the operation AST, looks up each field’s owning subgraph in the supergraph metadata, resolves @key directives for any boundary crossings, and emits a query plan: a tree of fetch nodes. Independent branches become parallel fetches; entity dependencies become sequenced _entities fetches. The fewer and shallower those fetches, the lower the latency.

How a plan is built

  1. Root resolution. The planner maps each root field to its owning subgraph and groups root fields by subgraph so a single service answers as much as possible in one call.
  2. Entity fetching. When a selection crosses a boundary, the planner inserts an _entities fetch carrying __typename plus the @key fields — the same stub mechanism described in Designing Cross-Service Type References.
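  3. Dependency ordering. Sequential dependencies (B needs a field from A) become Sequence nodes; independent branches become Parallel nodes that dispatch concurrently. Entity fetches are wrapped in Flatten nodes, which merge each fetch's results back into the response at the right path.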
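The root-resolution step can be sketched in Rust, the language this guide already uses for client code. This is a minimal sketch under stated assumptions, not the planner's real algorithm: `owners` is a hypothetical stand-in for the supergraph's field-to-subgraph metadata, and only root fields are handled (no entity fetches or dependencies).

```rust
use std::collections::BTreeMap;

/// One fetch in a simplified plan: a subgraph and the root fields it will answer.
#[derive(Debug, PartialEq)]
pub struct RootFetch {
    pub service: String,
    pub fields: Vec<String>,
}

/// Groups root fields by owning subgraph so each owner is called once.
/// `owners` is a hypothetical stand-in for the supergraph's field ownership metadata.
pub fn plan_root_fetches(
    root_fields: &[&str],
    owners: &BTreeMap<&str, &str>,
) -> Result<Vec<RootFetch>, String> {
    let mut grouped: BTreeMap<&str, Vec<String>> = BTreeMap::new();
    for field in root_fields {
        let service: &str = owners
            .get(*field)
            .copied()
            .ok_or_else(|| format!("no owning subgraph for root field `{field}`"))?;
        grouped.entry(service).or_default().push(field.to_string());
    }
    // BTreeMap keeps fetches in a stable (alphabetical) order, which keeps plans readable.
    Ok(grouped
        .into_iter()
        .map(|(service, fields)| RootFetch { service: service.to_string(), fields })
        .collect())
}
```

Grouping by owner is what lets `topProducts` and `product` share one request to products while `me` goes to accounts alongside it.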

The shape of that tree is everything. Deep dives on collapsing redundant fetches and reordering branches live in Query Plan Optimization Strategies for Federated APIs.

Reading a query plan

Enable verbose planning to audit routing decisions before they reach production:

# Apollo Router — log the generated query plan
APOLLO_ROUTER_LOG=debug,apollo_router::query_planner=trace ./router \
  --supergraph supergraph.graphql \
  --config router.yaml

In the trace, look for Sequence nodes (blocking, sequential fetches; check whether each dependency is real or a candidate to parallelise), Parallel nodes (concurrent dispatch, which is what you want), and Fetch nodes (verify that each serviceName points at the subgraph you expect).
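To see why Sequence nodes dominate latency, the three node kinds can be modelled as a tiny tree: a Parallel node costs as much as its slowest child, and a Sequence node costs the sum of its children. The sketch below is a toy model with hypothetical per-fetch latencies; it ignores router CPU time, retries, and response merging, and its types are not the router's own.

```rust
/// Toy model of the plan node kinds named above (not the router's internal types).
#[derive(Debug)]
pub enum PlanNode {
    /// One subgraph request with an assumed latency in milliseconds.
    Fetch { service: &'static str, latency_ms: u64 },
    /// Children run one after another, so their latencies add up.
    Sequence(Vec<PlanNode>),
    /// Children run concurrently, so the slowest child sets the latency.
    Parallel(Vec<PlanNode>),
}

impl PlanNode {
    /// Critical-path latency of the plan, ignoring router CPU time.
    pub fn latency_ms(&self) -> u64 {
        match self {
            PlanNode::Fetch { latency_ms, .. } => *latency_ms,
            PlanNode::Sequence(children) => children.iter().map(PlanNode::latency_ms).sum(),
            PlanNode::Parallel(children) => {
                children.iter().map(PlanNode::latency_ms).max().unwrap_or(0)
            }
        }
    }

    /// Total number of subgraph requests the plan sends.
    pub fn fetch_count(&self) -> usize {
        match self {
            PlanNode::Fetch { .. } => 1,
            PlanNode::Sequence(children) | PlanNode::Parallel(children) => {
                children.iter().map(PlanNode::fetch_count).sum()
            }
        }
    }

    /// Subgraphs hit by the plan, in tree order: the check you do on Fetch nodes.
    pub fn services(&self) -> Vec<&'static str> {
        match self {
            PlanNode::Fetch { service, .. } => vec![*service],
            PlanNode::Sequence(children) | PlanNode::Parallel(children) => {
                children.iter().flat_map(PlanNode::services).collect()
            }
        }
    }
}
```

With assumed latencies of 40 ms (products), 60 ms (reviews) and 30 ms (accounts), the plan in the diagram costs max(40, 60) + 30 = 90 ms on its critical path; running the same three fetches as one Sequence would cost 130 ms.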

The diagram traces one operation through the router and three subgraphs, showing where the plan parallelises and where it must sequence.

Figure: Gateway query plan execution flow. The client sends an operation to the router, which plans parallel root fetches to the products and reviews subgraphs and a sequenced _entities fetch to the accounts subgraph before merging the response.

Routing Patterns: Static vs. Dynamic Subgraph Dispatch

Static routing bakes subgraph endpoints into the supergraph or router.yaml at build time. It is the right default for stable, low-churn environments: endpoints rarely move, and the routing table is reviewable in version control.

Dynamic dispatch resolves subgraph URLs at runtime via service discovery (Consul, Kubernetes DNS SRV records, or a service mesh) with health-aware load balancing. Use it for multi-region deployments, blue/green rollouts, and autoscaling pools where addresses are not known until startup. Clean Type Ownership and Shared Schema Contracts keep routing tables stable regardless of which mode you pick — the router can only route safely if each field has one unambiguous owner.
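Health-aware dispatch reduces to choosing a live endpoint from whatever the discovery layer last returned. The following Rust sketch shows a simple round-robin picker that skips unhealthy endpoints. The `Endpoint` shape, the URLs and the `refresh` hook are hypothetical, and a production balancer would also weigh latency and damp health-check flapping.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A discovered subgraph endpoint and its last known health (hypothetical shape,
/// fed in practice by Consul, DNS SRV lookups or a service mesh).
#[derive(Debug, Clone)]
pub struct Endpoint {
    pub url: String,
    pub healthy: bool,
}

/// Round-robin picker that skips unhealthy endpoints.
pub struct HealthAwareBalancer {
    endpoints: Vec<Endpoint>,
    next: AtomicUsize,
}

impl HealthAwareBalancer {
    pub fn new(endpoints: Vec<Endpoint>) -> Self {
        Self { endpoints, next: AtomicUsize::new(0) }
    }

    /// Replaces the endpoint set after a discovery refresh (hypothetical hook).
    pub fn refresh(&mut self, endpoints: Vec<Endpoint>) {
        self.endpoints = endpoints;
    }

    /// Returns the next healthy URL, or None if the whole pool is unhealthy.
    pub fn pick(&self) -> Option<&str> {
        let n = self.endpoints.len();
        if n == 0 {
            return None;
        }
        let start = self.next.fetch_add(1, Ordering::Relaxed);
        // Scan at most one full lap, starting from this caller's turn.
        (0..n)
            .map(|offset| &self.endpoints[start.wrapping_add(offset) % n])
            .find(|endpoint| endpoint.healthy)
            .map(|endpoint| endpoint.url.as_str())
    }
}
```

Returning `None` when the whole pool is down is the signal to fall back, for example to a global pool.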

Region-aware routing. When a request carries x-region: eu-west-1, intercept it in a router Rhai script or coprocessor and route to the regional subgraph pool, falling back to global endpoints when the regional pool is unhealthy:

// router.rhai — set a routing hint a coprocessor consumes at fetch time
fn supergraph_service(service) {
  service.map_request(|request| {
    let region = request.headers["x-region"];
    if region != () {
      request.context["routing_region"] = region;
    }
    request
  });
}

The coprocessor then resolves region-specific subgraph URLs at the fetch layer based on routing_region.
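The coprocessor side of that hand-off can be sketched as a lookup: prefer a healthy regional pool for the `routing_region` value, otherwise use the subgraph's global endpoint. This is a hedged Rust sketch; the `RegionalRoutes` table, its URLs and the health flags are hypothetical, and the coprocessor wire protocol itself is not shown.

```rust
use std::collections::HashMap;

/// Hypothetical routing table a coprocessor might hold:
/// (subgraph, region) -> (URL, healthy), plus one global URL per subgraph.
pub struct RegionalRoutes {
    pub regional: HashMap<(String, String), (String, bool)>,
    pub global: HashMap<String, String>,
}

impl RegionalRoutes {
    /// Resolves the URL for `subgraph` given the `routing_region` context value.
    /// Prefers a healthy regional pool; otherwise falls back to the global endpoint.
    pub fn resolve(&self, subgraph: &str, routing_region: Option<&str>) -> Option<&str> {
        if let Some(region) = routing_region {
            let key = (subgraph.to_string(), region.to_string());
            // Only a regional entry marked healthy wins.
            if let Some((url, true)) = self.regional.get(&key) {
                return Some(url.as_str());
            }
        }
        self.global.get(subgraph).map(String::as_str)
    }
}
```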

Directive & Config Spec Table

| Key / directive | Where | Syntax | Effect | Phase |
|---|---|---|---|---|
| traffic_shaping.all.deduplicate_query | router.yaml | deduplicate_query: true | Collapses identical in-flight subgraph requests | Runtime |
| traffic_shaping.subgraphs.<svc>.timeout | router.yaml | timeout: 2s | Per-subgraph request timeout | Runtime |
| traffic_shaping.subgraphs.<svc>.experimental_retry | router.yaml | min_per_sec, retry_percent, retry_mutations | Bounded retries on retryable subgraph errors | Runtime |
| health_check | router.yaml | listen, enabled, path | Exposes router liveness for orchestration | Runtime |
| @provides(fields:) | subgraph SDL | @provides(fields: "name") | Lets a subgraph return external fields inline, removing a fetch from the plan | Composition + runtime |
| @requires(fields:) | subgraph SDL | @requires(fields: "weight") | Forces a pre-fetch of external fields before this field resolves | Composition + runtime |

Step-by-Step Implementation

Step 1 — Configure declarative routing and traffic shaping

Start with per-subgraph timeouts, dedup, compression, and bounded retries in router.yaml. Per-subgraph timeouts stop one slow service from blocking a wide query.

# router.yaml
supergraph:
  listen: 0.0.0.0:4000

traffic_shaping:
  all:
    deduplicate_query: true
  subgraphs:
    accounts:
      timeout: 2s
      compression: gzip
    inventory:
      timeout: 3s
      experimental_retry:
        min_per_sec: 10
        retry_mutations: false   # do not retry mutations unless they are idempotent

health_check:
  listen: 0.0.0.0:8088
  enabled: true
  path: /health
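The effect of per-subgraph timeouts can be illustrated with a small simulation: all fetches are dispatched at once and each is awaited only until its own deadline, so a slow inventory subgraph yields a timed-out branch instead of stalling the whole response. This Rust sketch uses threads and sleeps as stand-ins for network calls; the `SimulatedSubgraph` type and its latencies are hypothetical and do not reflect how the router implements timeouts internally.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum FetchOutcome {
    Data(String),
    TimedOut,
}

/// A simulated subgraph: the sleep stands in for a real network call (hypothetical).
pub struct SimulatedSubgraph {
    pub name: &'static str,
    pub latency: Duration,
    pub timeout: Duration,
}

/// Dispatches all fetches concurrently and waits for each only up to its own timeout.
pub fn fan_out(subgraphs: &[SimulatedSubgraph]) -> BTreeMap<&'static str, FetchOutcome> {
    let dispatched_at = Instant::now();
    let mut pending = Vec::new();
    for sg in subgraphs {
        let (tx, rx) = mpsc::channel();
        let (name, latency) = (sg.name, sg.latency);
        thread::spawn(move || {
            thread::sleep(latency);
            // If the caller already gave up, the receiver is gone; ignore the error.
            let _ = tx.send(format!("{name}: ok"));
        });
        pending.push((sg.name, sg.timeout, rx));
    }
    let mut results = BTreeMap::new();
    for (name, timeout, rx) in pending {
        // Deadlines run from dispatch, so waiting on one subgraph never extends another's.
        let remaining = timeout.saturating_sub(dispatched_at.elapsed());
        let outcome = match rx.recv_timeout(remaining) {
            Ok(body) => FetchOutcome::Data(body),
            Err(_) => FetchOutcome::TimedOut,
        };
        results.insert(name, outcome);
    }
    results
}
```

A gateway would turn the `TimedOut` branch into `null` data plus a structured error for that path, rather than failing the whole operation.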

Step 2 — Tune the subgraph HTTP client for fan-out

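Under concurrent fan-out, the subgraph HTTP client's pool and timeout settings decide whether each burst reuses warm connections or pays for fresh handshakes, and whether a hung subgraph holds requests open indefinitely. If you run a custom router plugin or a JS gateway data source, set pool sizing, keep-alives, and timeouts explicitly rather than relying on library defaults.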

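use reqwest::Client;
use std::time::Duration;

/// Builds the HTTP client a custom Rust plugin uses for subgraph fetches.
fn build_subgraph_client() -> reqwest::Result<Client> {
    Client::builder()
        .pool_idle_timeout(Duration::from_secs(90))   // close connections idle longer than this
        .pool_max_idle_per_host(32)   // warm connections kept per host; >= expected concurrent fetches per subgraph
        .tcp_keepalive(Duration::from_secs(30))
        .connect_timeout(Duration::from_secs(2))
        .timeout(Duration::from_secs(5))   // deadline for the whole request
        .build()
}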
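Why pool sizing matters under bursty fan-out can be seen in a toy model. It assumes hyper/reqwest-style semantics, where `pool_max_idle_per_host` caps the connections kept warm between bursts rather than the number that may be open at once; the `IdlePool` type is hypothetical and ignores idle timeouts and HTTP/2 multiplexing.

```rust
/// Toy model of an idle-connection pool with a per-host idle cap (hypothetical).
pub struct IdlePool {
    max_idle: usize,
    idle: usize,
    /// Connections opened from scratch; each pays a TCP (and possibly TLS) handshake.
    pub handshakes: usize,
}

impl IdlePool {
    pub fn new(max_idle: usize) -> Self {
        Self { max_idle, idle: 0, handshakes: 0 }
    }

    /// Runs `concurrent` fetches at once, then returns their connections to the pool.
    pub fn burst(&mut self, concurrent: usize) {
        let reused = concurrent.min(self.idle);
        self.idle -= reused;
        self.handshakes += concurrent - reused;
        // After the burst, keep at most `max_idle` connections warm and close the rest.
        self.idle = (self.idle + concurrent).min(self.max_idle);
    }
}
```

With 32 concurrent fetches per burst, an idle cap of 8 pays 24 fresh handshakes on every burst after the first, while a cap of 32 pays them only once.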

Step 3 — Forward auth and trace context across hops

The router must propagate Authorization, x-request-id, and W3C trace headers to every subgraph or you lose distributed traces and break authorization. Configure header propagation declaratively:

# router.yaml
headers:
  all:
    request:
      - propagate:
          named: "authorization"
      - propagate:
          named: "x-request-id"
      - propagate:
          named: "traceparent"   # W3C Trace Context
      - propagate:
          named: "tracestate"
      - propagate:
          matching: "^x-b3-.*$"   # Zipkin B3 trace context

Authorization decisions that ride on these headers are implemented subgraph-side via directive patterns for cross-service authorization.
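What the propagation rules do can be sketched as a filter over the client's headers. This Rust sketch is an illustration, not the router's implementation: `PropagateRule` is hypothetical, and a lowercase prefix match stands in for the `^x-b3-.*$` regex so the example needs only the standard library.

```rust
/// Hypothetical propagation rule, loosely mirroring `named` and `matching` in router.yaml.
/// Rule strings are expected in lowercase.
pub enum PropagateRule {
    Named(&'static str),
    Prefix(&'static str),
}

/// Returns the client headers to forward to a subgraph, with names lowercased.
/// Header names are compared case-insensitively, as HTTP requires.
pub fn propagate<'a>(
    incoming: &[(&'a str, &'a str)],
    rules: &[PropagateRule],
) -> Vec<(String, &'a str)> {
    incoming
        .iter()
        .filter_map(|&(name, value)| {
            let lower = name.to_ascii_lowercase();
            let keep = rules.iter().any(|rule| match rule {
                PropagateRule::Named(n) => lower == *n,
                PropagateRule::Prefix(p) => lower.starts_with(*p),
            });
            keep.then(|| (lower, value))
        })
        .collect()
}
```

In this sketch anything unmatched, such as a client `Cookie` header, is dropped.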

Step 4 — Deploy with a reviewed, versioned config

Treat router.yaml and the composed supergraph as deploy artifacts. The full production deployment workflow — image, config, supergraph delivery, and rollout — is covered in Apollo Router Configuration and Deployment.

Composition Pipeline Integration

Routing only works if the supergraph the router loads is valid, so composition is part of the routing story. Compose with Rover and gate subgraph changes with rover subgraph check so no team can publish a schema that breaks field-to-subgraph routing.

# Build the supergraph the router will route against
rover supergraph compose --config supergraph.yaml --output supergraph.graphql

# Gate a subgraph change against the registry before it ships
rover subgraph check "$APOLLO_GRAPH_REF" \
  --name inventory \
  --schema ./inventory/schema.graphql

Wire both into CI as required checks; the end-to-end pipeline is in Federated Schema Validation in CI/CD Pipelines.
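One invariant that composition protects can be illustrated directly: the router can only route a field if its ownership is unambiguous. The following Rust sketch reports fields defined by more than one subgraph. It is a deliberate simplification with hypothetical inputs; real composition (and `rover subgraph check`) also accounts for @shareable, @external, @key and @override, which this ignores.

```rust
use std::collections::BTreeMap;

/// Maps each `Type.field` to the subgraphs that define it and keeps only fields
/// with more than one owner (hypothetical, simplified ownership rule).
pub fn ambiguous_owners<'a>(
    subgraphs: &[(&'a str, &[&'a str])],
) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut owners: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(subgraph, fields) in subgraphs {
        for &field in fields {
            owners.entry(field).or_default().push(subgraph);
        }
    }
    // A field with exactly one owner routes unambiguously; drop it from the report.
    owners.retain(|_, subs| subs.len() > 1);
    owners
}
```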

Performance & Scale Considerations

Federated routing adds measurable overhead versus a monolith, and the cost is dominated by boundary crossings.

| Strategy | Latency impact | Complexity | Best fit |
|---|---|---|---|
| Synchronous fan-out | Higher tail latency on slow subgraphs | Low | Internal tools, low traffic |
| Async / @defer | Better TTFB, graceful degradation | High | Consumer dashboards, large payloads |
| Persisted operations | Removes parse/validate overhead per request | Medium | Mobile clients, locked-down APIs |
| Edge / response caching | Near-zero on cache hits | High (invalidation) | Public read-heavy endpoints |

Plan shape beats hardware. Every sequential boundary crossing adds at least one network round trip to the critical path before any resolver logic runs, so a query that sequences through four subgraphs pays four round trips of pure overhead. Cutting one sequential _entities fetch via @provides or a flatter plan usually beats scaling the subgraphs. Reordering and de-duplicating fetches is the subject of Query Plan Optimization Strategies for Federated APIs.

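Connection pools shape tail latency. pool_max_idle_per_host caps how many warm connections the client keeps per subgraph host, not how many it may open, so an undersized pool shows up as churn: idle connections drop to zero during bursts and each extra fetch pays a fresh TCP (and TLS) handshake. Watch your client's connection-open rate alongside active and idle counts; if new connections keep opening under steady load, raise the pool or lower subgraph latency.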

Cache the plan and the response. The router caches query plans, so an unbounded operation set defeats the cache — favour persisted operations. Response and entity caching at the router shave the boundary hops entirely; see Caching Strategies for Federated GraphQL.
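Why an unbounded operation set defeats plan caching can be shown with a minimal LRU cache keyed by operation text. This is a hypothetical stand-in, not the router's cache: the real cache's keying, sizing and eviction policy differ, and "planning" here just formats a placeholder string.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal LRU cache keyed by operation text (hypothetical stand-in for a plan cache).
pub struct PlanCache {
    capacity: usize,
    plans: HashMap<String, String>,
    recency: VecDeque<String>, // front = least recently used
    pub hits: u64,
    pub misses: u64,
}

impl PlanCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, plans: HashMap::new(), recency: VecDeque::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached plan, or plans the operation and caches it,
    /// evicting the least recently used entry when full.
    pub fn get_or_plan(&mut self, operation: &str) -> String {
        if let Some(plan) = self.plans.get(operation).cloned() {
            self.hits += 1;
            // Move the operation to the most-recently-used end.
            if let Some(pos) = self.recency.iter().position(|op| op == operation) {
                if let Some(op) = self.recency.remove(pos) {
                    self.recency.push_back(op);
                }
            }
            return plan;
        }
        self.misses += 1;
        if self.plans.len() == self.capacity {
            if let Some(lru) = self.recency.pop_front() {
                self.plans.remove(&lru);
            }
        }
        // Placeholder for the real (expensive) planning work.
        let plan = format!("plan for {operation}");
        self.plans.insert(operation.to_string(), plan.clone());
        self.recency.push_back(operation.to_string());
        plan
    }
}
```

A handful of persisted operations settle into hits after their first request, while ad hoc operations that differ on every request never hit at all.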

Failure Modes & Debugging

N+1 subgraph requests / cascading latency. Symptom: a query plan with far more Fetch nodes than boundary crossings, sequential where it could be parallel. Root cause: over-applied @requires, or list fields that fan out per element. Fix: audit the plan in staging, minimise @requires, and confirm _entities batches are not split. Walk plans with APOLLO_ROUTER_LOG=...query_planner=trace.
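The batched shape the planner should produce for a list field is one `_entities` request carrying every distinct representation, rather than one request per list element. The following Rust sketch builds that request for a hypothetical `Product @key(fields: "upc")` entity, deduplicating repeated keys while keeping first-seen order. It assumes key values need no JSON escaping and is not how the router serialises requests.

```rust
use std::collections::HashSet;

/// One batched `_entities` request for a hypothetical `Product @key(fields: "upc")` entity.
#[derive(Debug, PartialEq)]
pub struct EntitiesFetch {
    pub query: String,
    /// Serialized representations: `__typename` plus the @key field, once per distinct key.
    pub representations: Vec<String>,
}

/// Builds a single `_entities` fetch for every product referenced by a list field,
/// instead of one fetch per list element (the N+1 shape).
/// Assumes upc values need no JSON escaping.
pub fn batch_product_fetch(referenced_upcs: &[&str], selection: &str) -> Option<EntitiesFetch> {
    let mut seen = HashSet::new();
    let representations: Vec<String> = referenced_upcs
        .iter()
        // Keep the first occurrence of each key, preserving order; repeats need no extra fetch.
        .filter(|upc| seen.insert(**upc))
        .map(|upc| format!(r#"{{"__typename":"Product","upc":"{upc}"}}"#))
        .collect();
    if representations.is_empty() {
        return None;
    }
    let query = format!(
        "query($representations: [_Any!]!) {{ _entities(representations: $representations) {{ ... on Product {{ {selection} }} }} }}"
    );
    Some(EntitiesFetch { query, representations })
}
```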

Hardcoded subgraph URLs break on scale events. Symptom: fetch errors during autoscaling or regional failover. Root cause: static endpoints that no longer resolve. Fix: move to DNS-based discovery or a service mesh and verify the router resolves the new address at startup.

Dropped auth/trace headers across hops. Symptom: broken authorization at subgraphs and orphaned traces with no child spans. Root cause: the router did not propagate Authorization or trace context. Fix: add the headers propagation block from Step 3 and verify spans connect; tracing setup is in Observability and Distributed Tracing in Federation.

Connection churn under load. Symptom: latency cliff at a concurrency threshold. Root cause: pool_max_idle_per_host below the concurrent fetch count, so every fetch beyond it opens a new connection instead of reusing a warm one. Fix: raise the pool to at least the expected concurrent subgraph requests, then re-measure.

Frequently Asked Questions

How does the gateway decide which subgraph to route a field to?

It reads the supergraph schema metadata, which records each field’s owning subgraph. During planning it groups the selection set by owner, emits one fetch per subgraph at each stage of the plan, and inserts _entities fetches where the selection crosses a boundary.

What is the latency overhead of federated routing versus a monolith?

The overhead is the inter-service network cost — typically 15–40 ms of added tail latency for a moderate cross-subgraph join, scaling with the number of boundary crossings and RTT. Good plan shape, connection pooling, and caching keep it bounded.

Can I run custom routing logic for canary or A/B traffic?

Yes. Apollo Router supports Rhai scripts and coprocessors. Intercept the request, inspect a header or token, and set context that a coprocessor uses to override the subgraph endpoint — no schema change required.
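Canary or A/B routing usually needs sticky bucketing, so the same user consistently lands on the same subgraph version. Below is a minimal Rust sketch of that decision, as the logic a coprocessor might run before overriding an endpoint. `sticky_id` and the percentage parameter are hypothetical, and FNV-1a is used only because it is small and stable across builds (std's `DefaultHasher` does not promise that).

```rust
/// 64-bit FNV-1a: small and stable across builds, unlike std's DefaultHasher.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Returns true if this request should go to the canary endpoint.
/// `sticky_id` is whatever the script extracts from a header or token (hypothetical);
/// the same id always maps to the same bucket in 0..100.
pub fn route_to_canary(sticky_id: &str, canary_percent: u8) -> bool {
    let bucket = fnv1a64(sticky_id.as_bytes()) % 100;
    bucket < u64::from(canary_percent.min(100))
}
```

Because buckets are fixed per id, raising the percentage only adds users to the canary and never moves existing canary users back.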

How do I keep one slow subgraph from failing the whole query?

Set per-subgraph timeout and bounded experimental_retry in traffic_shaping, and return partial data with structured errors for the timed-out branch instead of failing the whole operation.

Should I use Apollo Router or @apollo/gateway?

Apollo Router (Rust) is the production default for throughput and lower latency; the JS gateway is simpler to extend in Node. The trade-offs are compared in Apollo Router vs Apollo Gateway Production Trade-offs.