Query Plan Optimization Strategies for Federated APIs

When a federated query feels slow, the cause is often not the subgraphs themselves but the shape of the query plan the router built to satisfy the operation, and the number of cross-service round trips that plan encodes. This guide shows how the Apollo Router constructs a query plan, how to read one, and how to reshape your schema so the planner emits fewer, more parallel fetches.

This page extends the broader treatment of Gateway Routing Strategies for Federated APIs, narrowing the focus to the execution tree the router produces and the levers you control to make it cheaper.

When to use this pattern

  • A federated operation has acceptable per-subgraph latency, but end-to-end latency is high because fetches run sequentially rather than in parallel.
  • You see N+1 _entities calls in traces — the same subgraph being hit once per parent object instead of once per batch.
  • A single field forces a long dependency chain (subgraph A → B → C) that you suspect could be flattened or parallelised.

Prerequisites

How the router builds a query plan

The query planner runs once per unique operation (then caches the result). It takes the operation AST plus the supergraph schema — which records, for every field, the set of subgraphs that can resolve it — and produces a tree of fetch instructions. That tree is built from a small, fixed vocabulary of node types:

Plan node | Meaning | Cost implication
Fetch | A single GraphQL request to one subgraph (root fields or an _entities call) | One network round trip
Sequence | Child nodes that must run in order; later fetches depend on earlier results | Latency = sum of children (the expensive case)
Parallel | Child nodes with no data dependency; dispatched concurrently | Latency = max of children (the cheap case)
Flatten | Reshapes a parent result into the representations array that the next _entities fetch consumes | No network cost, but marks an entity boundary crossing

The planner’s core job is to assign as many Fetch nodes as possible to Parallel branches and to keep Sequence chains short. A Sequence only appears when a downstream subgraph genuinely needs a value produced upstream — most commonly the @key fields it will receive inside an _entities request. Every Flatten node sitting between two Fetch nodes represents a subgraph hop: the parent’s data is collapsed into a list of entity representations, and the child fetch resolves the missing fields for those representations.

The diagram below contrasts a plan that fans out in parallel with one that has been forced into a sequential chain by avoidable cross-service dependencies.

[Figure: Parallel versus sequential federated query plans. Left, parallel plan (fast): a Parallel node dispatches Fetch: accounts (root: me) and Fetch: products (root: topSellers) at once; latency = max(A, B). Right, sequential plan (slow): a Sequence node runs Fetch: products, Flatten, Fetch: reviews, Flatten, Fetch: users one after another; latency = A + B + C.]
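The cost rules for the four node types can be written down as a small recursive function. The following is a minimal sketch, not the router's own data model: the PlanNode type, the criticalPathMs helper, the Flatten paths, and the per-fetch latencies are all hypothetical values chosen to mirror the two plans in the figure.

```typescript
// Hypothetical plan model: node names mirror the router's vocabulary,
// but this type and the latencies below are illustrative assumptions.
type PlanNode =
  | { kind: "Fetch"; service: string; latencyMs: number }
  | { kind: "Flatten"; path: string; node: PlanNode }
  | { kind: "Sequence"; nodes: PlanNode[] }
  | { kind: "Parallel"; nodes: PlanNode[] };

// Estimated critical-path latency of a plan.
function criticalPathMs(node: PlanNode): number {
  switch (node.kind) {
    case "Fetch":
      return node.latencyMs; // one network round trip
    case "Flatten":
      return criticalPathMs(node.node); // reshaping adds no network cost
    case "Sequence":
      // children run one after another: latencies add up
      return node.nodes.reduce((sum, n) => sum + criticalPathMs(n), 0);
    case "Parallel":
      // children run concurrently: the slowest child dominates
      return node.nodes.reduce((max, n) => Math.max(max, criticalPathMs(n)), 0);
  }
}

const parallelPlan: PlanNode = {
  kind: "Parallel",
  nodes: [
    { kind: "Fetch", service: "accounts", latencyMs: 40 },
    { kind: "Fetch", service: "products", latencyMs: 60 },
  ],
};

const sequentialPlan: PlanNode = {
  kind: "Sequence",
  nodes: [
    { kind: "Fetch", service: "products", latencyMs: 60 },
    {
      kind: "Flatten",
      path: "topProducts.@",
      node: { kind: "Fetch", service: "reviews", latencyMs: 50 },
    },
    {
      kind: "Flatten",
      path: "topProducts.@.reviews.@.author",
      node: { kind: "Fetch", service: "users", latencyMs: 40 },
    },
  ],
};

console.log(criticalPathMs(parallelPlan)); // 60 (max of 40 and 60)
console.log(criticalPathMs(sequentialPlan)); // 150 (60 + 50 + 40)
```

With these made-up numbers, no single fetch in the sequential plan is slower than the slowest fetch in the parallel plan, yet its end-to-end latency is two and a half times higher.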

Reading a query plan

The fastest way to see what the planner decided is to compose the supergraph with rover and then run the router with planner tracing enabled. With a composed supergraph in hand, start the router with trace logging and send it the operation you want to inspect:

# Compose locally, then inspect the plan the router would build
rover supergraph compose --config supergraph.yaml > supergraph.graphql

# Start the router with planner tracing on, then send it the operation
APOLLO_ROUTER_LOG="info,apollo_router::query_planner=trace" \
  ./router --supergraph supergraph.graphql --config router.yaml

A plan is printed as nested nodes. The structure you want to learn to scan is the alternation of Fetch, Flatten, and the two container nodes. Here is a representative plan for a query that asks for a list of products and, for each, its reviews and each review’s author:

QueryPlan {
  Sequence {
    Fetch(service: "products") {          # 1 round trip: get product ids
      { topProducts { __typename id name } }
    },
    Flatten(path: "topProducts.@") {       # reshape products into representations
      Fetch(service: "reviews") {          # 1 round trip: batch _entities for ALL products
        ... on Product { reviews { body author { __typename id } } }
      }
    },
    Flatten(path: "topProducts.@.reviews.@.author") {
      Fetch(service: "accounts") {         # 1 round trip: batch _entities for ALL authors
        ... on User { name }
      }
    }
  }
}

Three things to read off this plan immediately. First, every Fetch is a network round trip, so this operation costs three. Second, the Flatten paths use @ to mean “every element of this list” — that is the signal that the _entities call is batched: the reviews subgraph receives one request carrying every product representation, not one request per product. Third, the whole thing is a Sequence, because the reviews fetch needs product ids and the accounts fetch needs author ids; this is genuine data dependency, not a planner failure.
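The same three checks can be scripted when plans get long. The sketch below assumes the printed plan format shown above; summarizePlan and captureAll are hypothetical helpers written for this page, not part of rover or the router, and they only pattern-match the text.

```typescript
// Hypothetical helpers: they scan plan text in the printed format shown above.
interface PlanSummary {
  fetchCount: number; // number of network round trips
  services: string[]; // subgraphs in the order they appear
  flattenPaths: string[]; // every entity boundary crossing
  listPaths: string[]; // Flatten paths that traverse a list ("@")
}

// Collect the first capture group of every match. The pattern needs the g flag.
function captureAll(text: string, pattern: RegExp): string[] {
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found.push(match[1]);
  }
  return found;
}

function summarizePlan(planText: string): PlanSummary {
  const services = captureAll(planText, /Fetch\(service:\s*"([^"]+)"\)/g);
  const flattenPaths = captureAll(planText, /Flatten\(path:\s*"([^"]+)"\)/g);
  return {
    fetchCount: services.length,
    services,
    flattenPaths,
    listPaths: flattenPaths.filter((path) => path.split(".").indexOf("@") !== -1),
  };
}

const samplePlan = `
QueryPlan {
  Sequence {
    Fetch(service: "products") {
      { topProducts { __typename id name } }
    },
    Flatten(path: "topProducts.@") {
      Fetch(service: "reviews") {
        ... on Product { reviews { body author { __typename id } } }
      }
    },
    Flatten(path: "topProducts.@.reviews.@.author") {
      Fetch(service: "accounts") {
        ... on User { name }
      }
    }
  }
}`;

const summary = summarizePlan(samplePlan);
console.log(summary.fetchCount); // 3
console.log(summary.services.join(" -> ")); // products -> reviews -> accounts
console.log(summary.listPaths.length === summary.flattenPaths.length); // true
```

A text scan like this is only a convenience for comparing plans before and after a schema change. It does not understand nesting, so it cannot tell a Sequence from a Parallel.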

Reducing cross-service hops

Each Fetch after the first exists because a field lives in a different subgraph than its parent and the planner had to cross an entity boundary. You reduce hops by reducing those boundary crossings.

The most common avoidable hop is a field that could be co-located with its parent but is defined elsewhere. If Product.name is owned by products but a client always also requests Product.brandName, and brandName lives in a catalog subgraph keyed on the same id, the planner must emit a second _entities fetch. Resolving brandName in products as well, the subgraph that already holds the parent, and marking it @shareable in every subgraph that defines it collapses the two fetches into one. The general rule: fields that are read together should be resolvable together.

@provides is the targeted tool for this. When subgraph A holds an entity reference and can opportunistically return a field normally owned by subgraph B, annotate the relationship so the planner skips the hop on that path:

extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.9",
        import: ["@key", "@external", "@provides"])

type Query {
  # On this path the planner can read author.name from this subgraph's own
  # data, without a separate _entities fetch into the accounts subgraph.
  latestReviews: [Review!]! @provides(fields: "author { name }")
}

type Review @key(fields: "id") {
  id: ID!
  author: User!
}

type User @key(fields: "id") {
  id: ID! @external
  name: String! @external   # normally owned by accounts; provided here
}

Use @provides only on the specific query paths where the parent subgraph genuinely already has the data. Blanket @provides forces denormalised storage and tends to drift out of sync. The complementary pattern, pulling fields into a resolver via @requires for computed fields, does the opposite — it adds a dependency, so apply it deliberately and watch the plan afterward.

@key selection to minimise entity fetches

The fields you choose for @key shape how many _entities fetches the planner needs and how big each representations payload is. Two guidelines matter most.

Use a single, stable, already-present key. If the parent subgraph already returns id for an entity, key on id. If you instead key on a composite like @key(fields: "sku region"), the parent subgraph must select and carry both fields into every representation, and any subgraph that wants to resolve the entity must be able to produce that exact tuple. Compound keys are legitimate when the entity truly has no single identifier, but they inflate every representation and give each subgraph one more field that has to stay aligned with the others.

Keep keys consistent across subgraphs. If products keys Product on id but reviews declares @key(fields: "upc"), the planner cannot reuse one representation set to satisfy both — it may need an extra fetch just to translate identifiers. Aligning keys lets a single Flatten/Fetch pair serve every downstream subgraph.

# products subgraph — owns the entity, defines the canonical key
type Product @key(fields: "id") {
  id: ID!
  name: String!
}

# reviews subgraph — SAME key, so the planner reuses one representation set
extend schema @link(url: "https://specs.apollo.dev/federation/v2.9",
                    import: ["@key", "@external"])

type Product @key(fields: "id") {
  id: ID! @external
  reviews: [Review!]!
}
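To make the payload cost concrete, the sketch below builds the representations array that an _entities fetch would carry for a single key versus a composite key. The toRepresentations helper and the sample product data are hypothetical; the only part taken from federation itself is the shape of a representation, which is __typename plus the key fields.

```typescript
type Entity = Record<string, unknown>;

interface Representation {
  __typename: string;
  [keyField: string]: unknown;
}

// Hypothetical helper: build the representations list for a set of parent
// objects, copying only the fields named in the @key.
function toRepresentations(
  typename: string,
  keyFields: string[],
  parents: Entity[],
): Representation[] {
  return parents.map((parent) => {
    const representation: Representation = { __typename: typename };
    for (const field of keyFields) {
      if (!(field in parent)) {
        // The parent subgraph must be able to supply every key field.
        throw new Error(`parent is missing key field "${field}"`);
      }
      representation[field] = parent[field];
    }
    return representation;
  });
}

// Made-up parent results from the products subgraph.
const products: Entity[] = [
  { id: "1", sku: "CHAIR-01", region: "eu", name: "Chair" },
  { id: "2", sku: "TABLE-07", region: "us", name: "Table" },
];

const byId = toRepresentations("Product", ["id"], products);
const bySkuRegion = toRepresentations("Product", ["sku", "region"], products);

console.log(JSON.stringify(byId));
// [{"__typename":"Product","id":"1"},{"__typename":"Product","id":"2"}]
console.log(JSON.stringify(bySkuRegion).length > JSON.stringify(byId).length); // true
```

When two subgraphs share the same key, one list like byId can serve both. With mismatched keys, each needs its own list, and the parent must be able to supply every field involved.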

Batching _entities calls and parallelisable fan-out

The planner batches by default: when the same subgraph must resolve a field for many parent objects, it emits one _entities request whose representations array holds all of them. Your job is to not break that batching, and to make sure your __resolveReference handles the batch efficiently. A reference resolver that issues one database query per representation reintroduces the N+1 problem inside the subgraph even though the router did its part. Resolve references with a per-request batch loader — see Batching Entity Resolution with DataLoader.

import DataLoader from 'dataloader';
import { buildSubgraphSchema } from '@apollo/subgraph';
import { gql } from 'graphql-tag';

// One loader per request; the router sends ALL ids in a single _entities call,
// so a single batched DB query resolves the whole Flatten node.
const resolvers = {
  Product: {
    __resolveReference(ref: { id: string }, ctx: { loader: DataLoader<string, any> }) {
      return ctx.loader.load(ref.id);
    },
  },
};
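The resolver above only pays off if the loader's batch function turns many ids into one query. The sketch below shows that step in isolation, under stated assumptions: productTable and fetchProductsByIds are hypothetical stand-ins for a database and a single WHERE id IN (...) query, and the function is synchronous for readability, whereas a batch function passed to the dataloader library returns a promise. The contract it illustrates is the one DataLoader expects: one result per requested key, in the same order as the keys.

```typescript
interface ProductRow {
  id: string;
  name: string;
}

// Hypothetical in-memory table standing in for a real database.
const productTable = new Map<string, ProductRow>([
  ["1", { id: "1", name: "Chair" }],
  ["2", { id: "2", name: "Table" }],
]);

let queryCount = 0;

// Stand-in for ONE batched query; rows come back in arbitrary order and
// unknown ids are simply absent.
function fetchProductsByIds(ids: readonly string[]): ProductRow[] {
  queryCount += 1;
  const rows: ProductRow[] = [];
  for (const id of ids) {
    const row = productTable.get(id);
    if (row !== undefined) rows.push(row);
  }
  return rows;
}

// Batch function sketch: dedupe the ids, query once, then map the rows back
// onto the original key order. Missing entities become null.
function batchLoadProducts(ids: readonly string[]): (ProductRow | null)[] {
  const uniqueIds = Array.from(new Set(ids));
  const byId = new Map<string, ProductRow>();
  for (const row of fetchProductsByIds(uniqueIds)) {
    byId.set(row.id, row);
  }
  return ids.map((id) => byId.get(id) ?? null);
}

// Four representations arrive in one _entities call, two of them duplicates.
const loaded = batchLoadProducts(["2", "1", "2", "9"]);
console.log(loaded.map((row) => (row ? row.name : null))); // [ 'Table', 'Chair', 'Table', null ]
console.log(queryCount); // 1
```

Mapping rows back by id rather than trusting the order of the query result is the detail that keeps each representation paired with the right entity.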

For fan-out, the goal is the opposite of batching: you want independent branches of the operation to land in a Parallel node. The planner does this automatically when two root fields (or two entity branches) have no data dependency on each other. You preserve that parallelism by not introducing artificial dependencies — for example, do not route a value through a shared subgraph if both branches could fetch it directly. If you see a Sequence where you expected a Parallel, trace the dependency: there is a field on the slow branch whose key or @requires input is produced by the other branch.

Avoiding deep sequential dependencies

A plan with a Sequence four levels deep (A → B → C → D) has latency equal to the sum of four round trips, and no amount of subgraph tuning fixes that — only flattening the dependency does. Deep chains usually come from entity references that themselves reference entities in yet another subgraph. Two mitigations:

Co-locate the hot path. If a client repeatedly walks Product → Review → User → Org, consider whether the leaf field it actually needs (say the org name) can be @provides-ed one or two hops up, removing the deepest fetch from the common path.

Restructure the @key graph so references point at one shared subgraph rather than chaining. If B, C, and D all key on id and can each be reached directly from A’s result, the planner can issue B, C, and D as a Parallel group after A, turning a four-deep Sequence into a two-level Sequence(A, Parallel(B,C,D)).
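The effect of that restructuring can be seen with a toy model of the dependency graph. This is a sketch under simplifying assumptions, not how the planner works internally: the Dependencies map and the planStages helper are hypothetical, and each fetch is reduced to a name plus the list of fetches whose results it needs. Each stage is a group that could run as a Parallel node, and the stages together form the Sequence.

```typescript
// Fetch name -> names of the fetches whose results it needs first.
type Dependencies = Record<string, string[]>;

// Group fetches into stages: a fetch is ready once all of its dependencies
// have completed in an earlier stage.
function planStages(deps: Dependencies): string[][] {
  const stages: string[][] = [];
  const done = new Set<string>();
  let remaining = Object.keys(deps);

  while (remaining.length > 0) {
    const ready = remaining.filter((name) =>
      deps[name].every((dependency) => done.has(dependency)),
    );
    if (ready.length === 0) {
      throw new Error("cyclic or unknown dependency");
    }
    stages.push(ready);
    ready.forEach((name) => done.add(name));
    remaining = remaining.filter((name) => !done.has(name));
  }
  return stages;
}

// Chained references: each subgraph's key comes from the previous one.
const chained: Dependencies = { A: [], B: ["A"], C: ["B"], D: ["C"] };

// Restructured: B, C and D can all be reached directly from A's result.
const fannedOut: Dependencies = { A: [], B: ["A"], C: ["A"], D: ["A"] };

console.log(planStages(chained)); // [ [ 'A' ], [ 'B' ], [ 'C' ], [ 'D' ] ]
console.log(planStages(fannedOut)); // [ [ 'A' ], [ 'B', 'C', 'D' ] ]
```

The number of stages is the number of round trips on the critical path: four for the chain, two once B, C, and D depend only on A.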

Verification steps

After any schema change aimed at the plan, recompose and re-inspect before deploying:

# 1. Recompose and confirm the supergraph still composes cleanly
rover supergraph compose --config supergraph.yaml > supergraph.graphql

# 2. Re-run the target operation with planner tracing and diff the node count
APOLLO_ROUTER_LOG="info,apollo_router::query_planner=trace" \
  ./router --supergraph supergraph.graphql --config router.yaml

Confirm in the new plan that: the total Fetch count dropped (or the deepest Sequence got shallower); Flatten paths that cross a list still contain @ (batching intact); and branches you expected to parallelise now sit under a Parallel node. Pair this with a load test: a lower fetch count should show up as reduced p95 latency, not just a tidier tree.

Common mistakes and gotchas

Optimising the subgraph instead of the plan. Shaving 5ms off a subgraph that sits at the end of a five-deep Sequence barely moves end-to-end latency. Read the plan first; fix the chain, then tune the slow fetch.

Breaking batching with argument-dependent lookups. The router still sends one _entities request per fetch, but if a field takes an argument and the subgraph's batch loader does not include that argument in its key, the resolver tends to fall back to one lookup per object. Prefer resolving such fields from the owning subgraph, or restructure the argument so the loader can batch on it.

Adding @requires without re-reading the plan. @requires is convenient but every use adds a dependency edge that can turn a Parallel into a Sequence. Always recompose and inspect the plan after introducing one.

Frequently Asked Questions

Why does my query plan show a Sequence when the two subgraphs seem independent?

There is a hidden data dependency — usually a @key field or a @requires input that one branch produces and the other consumes. Dump the plan and follow the Flatten path: the field feeding the second fetch tells you which upstream result it waits on. Remove or relocate that dependency and the planner will promote the branches to a Parallel node.

Does the router cache query plans, and does that change optimisation?

Yes. The planner runs once per unique operation signature and caches the resulting tree, so planning cost is amortised across requests. That means your optimisation target is the executed tree (round trips and their parallelism), not planning time. Plan caching is configured separately in the router and is covered under production operations.

How many cross-service hops is too many?

There is no fixed number, but each additional Fetch in a Sequence adds one full subgraph round trip to the critical path. A good rule is to keep the deepest Sequence to two or three fetches on hot operations, and to ensure everything that can be parallel actually sits in a Parallel node.