GraphQL Federation Architecture & Design
GraphQL Federation lets large engineering organisations decompose a monolithic schema into independently owned subgraphs while presenting a single, unified API to clients. This section covers the foundational principles, cross-cutting workflows, and strategic trade-offs required to design a resilient federated graph — from drawing domain boundaries to validating composition in CI/CD and operating a managed federation registry.
Each subgraph is a valid GraphQL service that declares its own domain types and resolvers. A router composes those subgraphs into a single supergraph and plans every incoming query against that composed schema, fetching only the fields each subgraph owns and stitching the results back together. Get the architecture right and teams ship independently against one graph; get it wrong and you build a distributed monolith with all the coupling of the original and none of the simplicity. The guides below — covering defining subgraph boundaries for microservices, designing cross-service type references, gateway routing strategies, resolving schema conflicts, type ownership and shared schema contracts, schema validation in CI/CD pipelines, and schema registry and managed federation — walk through each decision in production-grade detail.
Core Concepts Overview
Federation rests on a handful of ideas that, once internalised, make every downstream decision tractable. This section names the six that matter most and points to the deeper guide for each.
Subgraph ownership and boundaries. A subgraph is the unit of ownership: one team, one deployment cadence, one slice of the domain. Federation succeeds or fails on the clarity of those boundaries. Partition types and resolvers according to data ownership and bounded context, not technical convenience — splitting by “reads vs. writes” produces a distributed monolith. The full treatment lives in defining subgraph boundaries for microservices, which also covers the tactical work of splitting a monolith GraphQL schema into subgraphs.
Entities and cross-service references. An entity is a type whose representation is shared across subgraphs, keyed by one or more fields declared with the @key directive. The owning subgraph defines the canonical type; other subgraphs reference it through a stub that declares the same key. This is how an Order in one service can expose a user: User field resolved by an entirely different service. The mechanics, including circular-reference handling, are covered in designing cross-service type references.
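The key-and-stub mechanism can be made concrete with a small, library-free sketch of what an _entities lookup does inside an owning subgraph. The router sends a list of representations (__typename plus key fields). The subgraph dispatches each one to that type's reference resolver and returns the results in the same order. The Representation type, the referenceResolvers map and the in-memory users store are all hypothetical stand-ins. Subgraph libraries such as @apollo/subgraph perform this dispatch for you.

```typescript
// What the router sends for each entity: __typename plus the key fields.
type Representation = { __typename: string; [field: string]: unknown };

interface User {
  id: string;
  name: string;
}

// Hypothetical in-memory store owned by an "accounts" subgraph.
const users = new Map<string, User>([
  ["u1", { id: "u1", name: "Ada" }],
  ["u2", { id: "u2", name: "Grace" }],
]);

// One reference resolver per entity type this subgraph defines.
const referenceResolvers: Record<string, (rep: Representation) => unknown> = {
  User: (rep) => users.get(String(rep.id)) ?? null,
};

// Mirrors Query._entities(representations: [_Any!]!): results keep the input order.
function resolveEntities(representations: Representation[]): unknown[] {
  return representations.map((rep) => {
    const resolve = referenceResolvers[rep.__typename];
    if (!resolve) {
      // A type this subgraph does not define at all cannot be hydrated here.
      throw new Error(`Subgraph does not define entity type ${rep.__typename}`);
    }
    return resolve(rep);
  });
}

// An orders subgraph that stores only user ids causes the router to send:
const result = resolveEntities([
  { __typename: "User", id: "u2" },
  { __typename: "User", id: "u1" },
]);
console.log(result);
```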
Type ownership and shared contracts. Every field needs a single authoritative source. When two subgraphs legitimately contribute the same field, they negotiate a shared contract using @shareable, @override, and @inaccessible. The composition engine validates these contracts before merging, so ownership violations surface at build time. See type ownership and shared schema contracts.
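Here is a stripped-down sketch of the single-owner rule that composition enforces. A field resolved by more than one subgraph must be marked @shareable in every one of them; Apollo's composition reports violations as INVALID_FIELD_SHARING. The FieldDef shape and the checkOwnership function are hypothetical. The sketch deliberately ignores the special handling of @key, @external, @override and @inaccessible that real composition performs.

```typescript
// One subgraph's definition of a field, reduced to what the sharing rule needs.
interface FieldDef {
  subgraph: string;
  coordinate: string; // e.g. "Product.price"
  shareable: boolean; // true if the field is marked @shareable in this subgraph
}

// Flags every field that two or more subgraphs resolve where at least one of
// those definitions is not marked @shareable.
function checkOwnership(defs: FieldDef[]): string[] {
  const byField = new Map<string, FieldDef[]>();
  for (const def of defs) {
    const list = byField.get(def.coordinate) ?? [];
    list.push(def);
    byField.set(def.coordinate, list);
  }
  const errors: string[] = [];
  byField.forEach((list, coordinate) => {
    if (list.length < 2) return; // a single owner never conflicts
    const offenders = list.filter((d) => !d.shareable).map((d) => d.subgraph);
    if (offenders.length > 0) {
      const owners = list.map((d) => d.subgraph).join(", ");
      errors.push(`${coordinate} is resolved by ${owners} but not @shareable in ${offenders.join(", ")}`);
    }
  });
  return errors;
}

// price opts in on both sides; name is claimed by two subgraphs without opting in.
const errors = checkOwnership([
  { subgraph: "products", coordinate: "Product.price", shareable: true },
  { subgraph: "catalog", coordinate: "Product.price", shareable: true },
  { subgraph: "products", coordinate: "Product.name", shareable: false },
  { subgraph: "catalog", coordinate: "Product.name", shareable: false },
]);
console.log(errors);
```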
Schema composition and conflict resolution. Composition is the build-time step that merges every subgraph’s SDL into one supergraph schema. When definitions disagree — a field typed Int in one place and Float in another — composition fails with a named error rather than shipping an inconsistent graph. Resolving schema conflicts in Apollo Federation catalogues the conflict classes and their fixes.
Gateway routing and query planning. At runtime the router parses each operation, builds a query plan that decomposes it into per-subgraph fetches, executes those fetches in dependency order, and assembles the response. Routing configuration governs timeouts, batching, and how the plan is executed. The detail lives in gateway routing strategies for federated APIs.
Governance, validation, and the registry. A graph with many owners needs guardrails: composition checks on every pull request, a registry that tracks which subgraph version is live, and an approval workflow for breaking changes. Schema validation in CI/CD pipelines covers the pipeline mechanics, and schema registry and managed federation covers operating the registry that ties it all together.
How the concepts compose
These six ideas are not independent; they form a dependency chain that runs through every section below. Ownership is the root: until you have decided which subgraph owns a type, you cannot declare its @key, and without a @key there is no entity to reference across services. Once ownership is settled, cross-service references become the load-bearing structure of the graph — a single Order query can pull a User from accounts, a Product from the catalog, and a Review from a fourth service, and the only reason that works is that each owning subgraph published a key and a reference resolver.
Shared contracts are the controlled exception to single ownership. Most fields have exactly one owner, but a handful (a price computed identically in two places, an identifier two domains both expose) are legitimately shared, and the contract directives let you say so explicitly rather than letting composition guess. Composition is the gate that keeps the whole arrangement honest: it refuses to build a supergraph in which two subgraphs disagree about a type, so an ownership mistake fails the build instead of corrupting a response. Routing is where the design meets traffic, turning the static supergraph into a per-query execution plan. Governance wraps all of it, ensuring the people changing the graph cannot break it for everyone else. Hold this chain in mind as you read; every guide in this section is a deeper look at one link.
Architecture Diagram
The diagram below shows the three planes of a federated graph: the design-time composition pipeline that produces the supergraph SDL, the runtime router that plans and executes queries, and the subgraphs that own the data.
Read the diagram in two passes. The solid purple edges are the runtime path: a client sends one operation to the router, the router consults the query plan it derived from the supergraph SDL, and it fans out fetches (a root query to the first subgraph, then _entities lookups for the entities that query returns) to whichever subgraphs own the requested fields. The dashed pink edges are the design-time path: each subgraph publishes its SDL, the CI/CD pipeline runs rover supergraph compose to merge and validate them, and the resulting supergraph is published to the registry that the router loads. The two planes are deliberately decoupled: composition can fail in CI without taking the running router down, which is exactly the property that lets teams deploy independently.
The single most important thing the diagram encodes is that the client never knows the topology. It sends one query to one endpoint and receives one response; the fan-out to accounts, products, and reviews is entirely the router’s concern. This is what distinguishes federation from a hand-rolled API gateway that proxies to backend services: the router does not just forward requests, it plans them. Given a query that asks for an order, its products, and each product’s reviews, the planner works out that it must first fetch the order, collect the product keys it returns, batch-fetch those products, then collect the keys the reviews subgraph needs and batch-fetch the reviews. That is a three-stage sequential plan, one batched fetch per stage, derived purely from the supergraph SDL and the @key directives, with no per-service routing code anywhere. When you change a boundary, you change the plan; when you add a subgraph, the planner incorporates it on the next composition. That indirection is the whole value proposition, and every architectural decision in the sections below is ultimately about shaping the plans the router will produce.
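The staged plan described above can be modelled in a few lines. This is a toy, not the router’s query planner. Each stage names a subgraph, builds one batch of representations from the previous stage’s output, and makes exactly one call. Because each stage needs keys from the one before, the stages must run in sequence. The Stage shape and the orders, products and reviews data are hypothetical.

```typescript
// A representation: __typename plus a flat string key (enough for this toy).
type Rep = { __typename: string; id: string };

interface Stage {
  subgraph: string;
  // Builds this stage's batched representations from the previous stage's output.
  collect: (previous: unknown[]) => Rep[];
  // Stands in for one batched fetch (root query or _entities) against the subgraph.
  fetch: (reps: Rep[]) => unknown[];
}

// Runs stages strictly in order, because each stage needs keys the previous one returned.
function executePlan(stages: Stage[]): { results: unknown[]; fetchLog: string[] } {
  let previous: unknown[] = [];
  const fetchLog: string[] = [];
  for (const stage of stages) {
    const reps = stage.collect(previous);
    fetchLog.push(`${stage.subgraph}: ${reps.length} key(s), 1 fetch`);
    previous = stage.fetch(reps);
  }
  return { results: previous, fetchLog };
}

// Hypothetical data: one order containing two products.
interface Order { id: string; productIds: string[] }
const orders: Record<string, Order> = { o1: { id: "o1", productIds: ["p1", "p2"] } };
const reviewsByProduct: Record<string, string[]> = { p1: ["great"], p2: ["meh", "ok"] };

const plan: Stage[] = [
  {
    subgraph: "orders",
    collect: () => [{ __typename: "Order", id: "o1" }],
    fetch: (reps) => reps.map((r) => orders[r.id]),
  },
  {
    subgraph: "products",
    // Every product key from every order goes into a single batch.
    collect: (prev) => {
      const reps: Rep[] = [];
      for (const order of prev as Order[]) {
        for (const id of order.productIds) reps.push({ __typename: "Product", id });
      }
      return reps;
    },
    fetch: (reps) => reps.map((r) => ({ id: r.id })),
  },
  {
    subgraph: "reviews",
    collect: (prev) => (prev as { id: string }[]).map((p) => ({ __typename: "Product", id: p.id })),
    fetch: (reps) => reps.map((r) => ({ id: r.id, reviews: reviewsByProduct[r.id] ?? [] })),
  },
];

const { results, fetchLog } = executePlan(plan);
console.log(fetchLog);
console.log(results);
```

Adding a fourth dependent hop would add a fourth sequential stage. This is why deep cross-service traversals show up directly as latency.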
Key Directives & Config Reference
These are the directives and configuration touchpoints you will reach for most often. Every subgraph imports the directives it uses through the federation spec link; the router and composition pipeline are configured through router.yaml and supergraph.yaml respectively.
| Directive / key | Where | Effect |
|---|---|---|
| @key(fields: "id") | subgraph SDL | Declares the type as an entity and names the fields that identify it across subgraphs. |
| @shareable | subgraph SDL | Marks a field that more than one subgraph may resolve with identical logic. |
| @external | subgraph SDL | Declares a field as owned elsewhere; this subgraph references but does not resolve it. |
| @requires(fields: "…") | subgraph SDL | Asks the router to fetch the named external fields before resolving this field. |
| @provides(fields: "…") | subgraph SDL | Hints that this resolver can return the named fields, letting the planner skip a hop. |
| @override(from: "svc") | subgraph SDL | Migrates ownership of a field from another subgraph without breaking clients. |
| @inaccessible | subgraph SDL | Hides a field from the public API surface during transitional states. |
| @tag(name: "…") | subgraph SDL | Attaches metadata used by contracts and the registry. |
| federation_version | supergraph.yaml | Pins the composition algorithm version, e.g. =2.9.0. |
| routing_url | supergraph.yaml | The endpoint the router calls for each subgraph. |
| traffic_shaping | router.yaml | Per-subgraph timeouts, retries, and deduplication. |
Canonical Implementation Pattern
A production-representative subgraph declares the federation link, marks its entities with @key, and implements __resolveReference so the router can hydrate a partial entity from a key. The pattern below is the smallest thing that is genuinely correct under Federation v2.
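```typescript
import { ApolloServer } from '@apollo/server';
import { startStandaloneServer } from '@apollo/server/standalone';
import { buildSubgraphSchema } from '@apollo/subgraph';
import { gql } from 'graphql-tag';

// Illustrative in-memory data store; replace with the subgraph's real data source.
type ProductRecord = { id: string; name: string; price: number };
const products = new Map<string, ProductRecord>([
  ['p1', { id: 'p1', name: 'Widget', price: 9.99 }],
]);

// Returns null for unknown ids so the field resolves to null instead of throwing.
async function fetchProductById(id: string): Promise<ProductRecord | null> {
  return products.get(id) ?? null;
}

const typeDefs = gql`
  extend schema
    @link(url: "https://specs.apollo.dev/federation/v2.9",
          import: ["@key", "@shareable"])

  type Product @key(fields: "id") {
    id: ID!
    name: String!
    price: Float! @shareable # also resolved by the catalog subgraph, identical logic
  }

  type Query {
    product(id: ID!): Product
  }
`;

const resolvers = {
  Query: {
    product: (_: unknown, { id }: { id: string }) => fetchProductById(id),
  },
  Product: {
    // The router calls this with a representation ({ __typename, id })
    // when another subgraph references a Product.
    __resolveReference(ref: { id: string }) {
      return fetchProductById(ref.id);
    },
  },
};

const server = new ApolloServer({
  schema: buildSubgraphSchema({ typeDefs, resolvers }),
});

// Top-level await requires an ES module build (e.g. "type": "module" in package.json).
const { url } = await startStandaloneServer(server, { listen: { port: 4001 } });
console.log(`products subgraph ready at ${url}`);
```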
The owning subgraph is the only place Product is fully defined. Any other subgraph that needs to attach fields to Product references it with a @key stub and its own contributions — the deeper mechanics of that reference, including @external, @requires, and @provides, are covered in designing cross-service type references. For production routing you compose the supergraph and launch the router against the result:
```shell
rover supergraph compose --config supergraph.yaml > supergraph.graphql
./router --config router.yaml --supergraph supergraph.graphql
```
Keep @key field sets small. Every entity representation the router sends carries __typename plus the key fields (and any fields a @requires directive depends on). A wide or nested key therefore adds payload and serialization latency to every cross-subgraph hop.
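Here is a small sketch of what a representation carries, assuming a flat, space-separated key field set such as "id" or "id sku". The toRepresentation helper is hypothetical and is not a router API. It keeps only __typename and the named key fields, and it does not handle nested selections like "owner { id }". Comparing a narrow key with a wide one shows how the per-entity payload grows. That cost is multiplied by every entity and every hop in a plan.

```typescript
type Entity = { __typename: string; [field: string]: unknown };

// Builds a representation from __typename plus the fields of a flat key field set.
function toRepresentation(entity: Entity, keyFields: string): Entity {
  const rep: Entity = { __typename: entity.__typename };
  for (const field of keyFields.trim().split(/\s+/)) {
    if (!(field in entity)) {
      throw new Error(`Entity ${entity.__typename} is missing key field "${field}"`);
    }
    rep[field] = entity[field];
  }
  return rep;
}

const product: Entity = {
  __typename: "Product",
  id: "p1",
  sku: "SKU-001",
  name: "Widget",
  description: "A long marketing description that never needs to cross a subgraph boundary.",
};

const narrow = toRepresentation(product, "id");
const wide = toRepresentation(product, "id sku name description");
// Serialized bytes per representation for each key choice.
console.log(JSON.stringify(narrow).length, JSON.stringify(wide).length);
```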
The pattern generalises to multi-key entities and to types that several subgraphs extend. A Product might be keyed by both id and sku so that subgraphs holding only one identifier can still reference it; you declare both with repeated @key directives and implement a __resolveReference that accepts either shape. When a second subgraph wants to attach a field to Product, say a reviews service adding Product.reviews, it does not redefine the whole type. It references Product with a matching @key stub and contributes only its own fields. Under Federation v2 the repeated key field needs no @external; marking key fields @external was a v1 requirement:

```graphql
# reviews subgraph: contributes to Product without owning it
extend schema
  @link(url: "https://specs.apollo.dev/federation/v2.9",
        import: ["@key"])

type Product @key(fields: "id") {
  id: ID!                # key field; no @external needed in Federation v2
  reviews: [Review!]!    # contributed by this subgraph
}

type Review @key(fields: "id") {
  id: ID!
  rating: Int!
  body: String!
}
```
This is the everyday shape of a federated graph: each subgraph is small, owns its slice, and extends shared entities through keys rather than copying them. The discipline that keeps it coherent is that exactly one subgraph resolves any given field, enforced at composition time.
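In the multi-key case mentioned above, the reference resolver has to accept whichever key the referencing subgraph held. The following library-free sketch assumes a Product declared with both @key(fields: "id") and @key(fields: "sku"). The in-memory catalog and the resolveProductReference name are hypothetical. With @apollo/subgraph, the same function body would sit under Product.__resolveReference.

```typescript
interface Product {
  id: string;
  sku: string;
  name: string;
}

// The router sends whichever key the referencing subgraph could provide.
type ProductRef =
  | { __typename: "Product"; id: string }
  | { __typename: "Product"; sku: string };

// Hypothetical in-memory indexes standing in for the products data store.
const catalog: Product[] = [
  { id: "p1", sku: "SKU-001", name: "Widget" },
  { id: "p2", sku: "SKU-002", name: "Gadget" },
];
const byId = new Map(catalog.map((p) => [p.id, p] as const));
const bySku = new Map(catalog.map((p) => [p.sku, p] as const));

// Branch on which key is present in the representation.
function resolveProductReference(ref: ProductRef): Product | null {
  if ("id" in ref) return byId.get(ref.id) ?? null;
  return bySku.get(ref.sku) ?? null;
}

// An orders subgraph holds ids; an inventory subgraph holds only SKUs.
const fromOrders = resolveProductReference({ __typename: "Product", id: "p2" });
const fromInventory = resolveProductReference({ __typename: "Product", sku: "SKU-001" });
console.log(fromOrders?.name, fromInventory?.name);
```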
Cross-Section Integration Points
Architecture decisions made here ripple straight into how subgraphs are implemented and how the graph is operated. The boundary you draw in defining subgraph boundaries for microservices dictates which types become entities, which in turn dictates the entity resolvers each team writes — covered across subgraph implementation and entity resolution, including implementing entity resolvers with @key directives. A type-ownership decision to share a field with @shareable shows up later as a reference-resolver performance question, because every shared boundary is a potential extra network hop.
The same continuity extends to operating the graph. The routing strategy you pick determines query-plan shape, and that shape is what you observe, cache, and tune in production — the subject of federated GraphQL operations in production. Composition correctness designed here is enforced there at deploy time, and the registry workflow in schema registry and managed federation is the seam where design-time governance meets runtime rollout.
Common Failure Modes & Composition Errors
Most federation incidents trace back to a small set of recurring mistakes. Each one surfaces with a recognisable signature.
Overusing @shareable on mutable fields. Two subgraphs resolving the same field from different data sources will return inconsistent results depending on which one the planner picks. Only share fields whose resolver logic is identical and idempotent; anything stateful belongs to a single owner. The conflict-resolution rules are detailed in resolving schema conflicts in Apollo Federation.
Field type mismatches across subgraphs. When the same field is declared Int! in one subgraph and Float in another, composition fails with a FIELD_TYPE_MISMATCH error that names the field and the type each subgraph declares. The fix is to agree on a single type in a shared contract before publishing.
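The following toy version of the type-agreement check groups each subgraph's declared type for a field and flags fields whose base types differ. The TypeDecl shape and findTypeMismatches are hypothetical. The sketch handles only flat named types: no lists, no interface subtyping. It treats a nullability-only difference (Float! versus Float) as compatible, loosely following how Federation 2 tolerates nullability differences on shared output fields. Int! versus Float can never be reconciled.

```typescript
interface TypeDecl {
  subgraph: string;
  coordinate: string; // e.g. "Product.price"
  type: string; // e.g. "Float!" or "Int"
}

// Strips a trailing non-null marker so "Float!" and "Float" compare equal.
const baseType = (t: string): string => t.replace(/!$/, "");

function findTypeMismatches(decls: TypeDecl[]): string[] {
  const firstSeen = new Map<string, TypeDecl>();
  const errors: string[] = [];
  for (const decl of decls) {
    const first = firstSeen.get(decl.coordinate);
    if (!first) {
      firstSeen.set(decl.coordinate, decl);
    } else if (baseType(first.type) !== baseType(decl.type)) {
      errors.push(
        `${decl.coordinate}: "${first.type}" in ${first.subgraph} vs "${decl.type}" in ${decl.subgraph}`,
      );
    }
  }
  return errors;
}

const mismatches = findTypeMismatches([
  { subgraph: "products", coordinate: "Product.price", type: "Int!" },
  { subgraph: "catalog", coordinate: "Product.price", type: "Float" },
  { subgraph: "products", coordinate: "Product.name", type: "String!" },
  { subgraph: "catalog", coordinate: "Product.name", type: "String" },
]);
console.log(mismatches);
```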
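Missing __resolveReference. A subgraph that declares an entity with @key but never implements the reference resolver still composes cleanly, because the gap is at runtime, not build time. @apollo/subgraph, for example, falls back to returning the incoming representation itself as the entity. Cross-subgraph fetches of that type then populate only the key fields: every other field resolves to null, or raises a non-null error if it is declared non-nullable.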
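The failure surfaces only at runtime, as this library-free sketch of the fallback shows. When no reference resolver exists, the representation is returned unchanged. Key fields resolve because they arrived in the representation; every other field is simply absent. The resolveEntity and resolveField helpers are hypothetical models and not @apollo/subgraph's code.

```typescript
type Representation = { __typename: string; [field: string]: unknown };
type ReferenceResolver = (rep: Representation) => Record<string, unknown> | null;

// With no resolver, hand back the representation unchanged (the fallback described above).
function resolveEntity(
  rep: Representation,
  resolver?: ReferenceResolver,
): Record<string, unknown> | null {
  return resolver ? resolver(rep) : rep;
}

// Like GraphQL's default field resolver: read the property; undefined becomes null.
function resolveField(entity: Record<string, unknown> | null, field: string): unknown {
  return entity?.[field] ?? null;
}

const rep: Representation = { __typename: "Product", id: "p1" };
const withResolver = resolveEntity(rep, (r) => ({ id: r.id, name: "Widget" }));
const withoutResolver = resolveEntity(rep);

console.log(resolveField(withResolver, "name"));    // Widget
console.log(resolveField(withoutResolver, "id"));   // p1
console.log(resolveField(withoutResolver, "name")); // null
```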
Deeply nested cross-service traversals. Each @key boundary the planner crosses adds a network hop. A query that walks four entities across four subgraphs produces a four-stage plan; flatten the schema or use @provides where the data is co-located.
Treating the router as a business-logic layer. Authorization, transformation, and aggregation belong in subgraphs, not in router coprocessors. Pushing logic into the router recreates the monolith you federated to escape and makes the router a single point of contention.
Uncoordinated changes to shared contracts. A breaking change to a @shareable field or a shared enum that ships from one subgraph without the others updating produces cascading composition failures at deploy time — or worse, an inconsistent graph if the registry is not enforcing checks. Version shared types deliberately and require composition checks on every publish; this is precisely what the governance section guards against.
Each of these has a clean signature once you have seen it. Entity fields that come back null on cross-subgraph fetches while the key fields resolve point at a missing reference resolver. A composition error naming a type points at an ownership or contract conflict. A latency cliff on one query path points at a deep cross-service traversal. Building that pattern-recognition is most of what separates a team that operates federation comfortably from one that fights it.
CI/CD & Tooling Integration
The Rover CLI is the connective tissue between subgraph repositories and the supergraph. Three commands do most of the work. rover subgraph check validates a proposed subgraph SDL against the registered graph and reports both composition errors and operation-breaking changes; run it on every pull request. rover subgraph publish registers a new subgraph version and triggers recomposition. rover supergraph compose builds the supergraph SDL locally for tests or for routers that load from a file rather than the registry.
```shell
# In a pull-request check: fail the build on breaking changes or composition errors.
rover subgraph check my-graph@prod \
  --schema ./products.graphql \
  --name products

# After merge: publish the new subgraph version and recompose.
rover subgraph publish my-graph@prod \
  --schema ./products.graphql \
  --name products \
  --routing-url https://products.svc.internal/graphql
```
Wiring these into a pipeline is the subject of schema validation in CI/CD pipelines, and operating the registry that check and publish talk to — including approval gates for breaking changes — is covered in schema registry and managed federation.
Two pipeline disciplines matter regardless of which CI system you use. First, rover subgraph check must run against the live graph variant, not a stale local copy, so it sees the real operation traffic and the real schemas of the other subgraphs — a check that does not know what clients actually request cannot tell whether a field removal is breaking. Second, check and publish are deliberately separate: check is read-only and belongs on the pull request, while publish mutates the registry and belongs after merge, behind whatever approval the graph requires. Collapsing them lets an unreviewed change reach the registry. For graphs with many subgraphs, parallelise the per-subgraph checks but serialise publishes, because each publish recomposes and you want a stable base for the next check.
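Here is a minimal orchestration sketch of those disciplines. It assumes hypothetical Step callbacks that, in a real pipeline, would wrap rover subgraph check and rover subgraph publish. Checks run concurrently and the stage fails if any of them fails. Publishes run one at a time and stop at the first failure, so each recomposition starts from a known-good base. The two stages are separate functions because they belong in separate CI jobs.

```typescript
// Resolves to true on success; real implementations would shell out to rover.
type Step = (subgraph: string) => Promise<boolean>;

// Pull-request stage: checks are read-only, so run them concurrently.
async function runChecks(subgraphs: string[], check: Step): Promise<boolean> {
  const results = await Promise.all(subgraphs.map((name) => check(name)));
  return results.every((ok) => ok);
}

// Post-merge stage: each publish triggers a recomposition, so run them one at a
// time and stop at the first failure rather than composing on a broken base.
async function publishSerially(subgraphs: string[], publish: Step): Promise<string[]> {
  const published: string[] = [];
  for (const name of subgraphs) {
    if (!(await publish(name))) break;
    published.push(name);
  }
  return published;
}

// Fake step that records the peak number of calls in flight at once.
function trackedStep(stats: { inFlight: number; peak: number }): Step {
  return async () => {
    stats.inFlight++;
    stats.peak = Math.max(stats.peak, stats.inFlight);
    await new Promise((resolve) => setTimeout(resolve, 5));
    stats.inFlight--;
    return true;
  };
}

// Simulated back to back here; in CI the two stages run in different jobs.
const names = ["products", "reviews", "accounts"];
const checkStats = { inFlight: 0, peak: 0 };
const publishStats = { inFlight: 0, peak: 0 };
runChecks(names, trackedStep(checkStats))
  .then((ok) => (ok ? publishSerially(names, trackedStep(publishStats)) : []))
  .then((published) => console.log(published, checkStats.peak, publishStats.peak));
```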
Decision Guide
Use the table below to choose between the major architectural options. None of these is universally correct; each row names the condition that should tip the decision.
| Decision | Choose A when… | Choose B when… |
|---|---|---|
| Federation v1 vs v2 | (v1) maintaining a legacy graph you cannot recompose yet | (v2) any greenfield or actively maintained graph — v2 is the standard |
| Apollo Router vs @apollo/gateway | (gateway) local development or a tiny graph | (router) any production traffic; the Rust router is faster and supports advanced routing |
| @shareable vs single owner | (shareable) identical, idempotent resolver logic in both subgraphs | (single owner) any field backed by mutable state or a distinct data source |
| Compose from file vs registry | (file) a single team and a simple deploy | (registry) multiple owners needing managed federation and approval workflows |
| Denormalise at boundary vs strict joins | (denormalise) read-heavy paths tolerating eventual consistency | (joins) real-time freshness with DataLoader batching to control N+1 |
The recurring theme across these rows is that federation’s defaults are right for a multi-team production graph and wrong for a small one. v2, the router, single ownership, the registry, and strict joins all assume independent teams, real traffic, and a need for governance; if you do not have those, you are paying overhead for properties you will not use. Conversely, once you do have them, fighting the defaults — composing from a file with three teams publishing, or pushing logic into the router to avoid a hop — costs more than it saves. The art is matching the architecture to where the organisation actually is, and revisiting the choice as it grows. A graph that started life composed from a file with one team will, if it succeeds, eventually need managed federation; planning the SDL and the boundaries so that transition is a configuration change rather than a rewrite is the mark of a design that anticipated its own scale.
Frequently Asked Questions
When should an organisation choose GraphQL Federation over a monolithic schema?
Federation pays off when multiple autonomous teams own distinct domains with independent deployment cycles, or when a single schema codebase has become a bottleneck for development velocity. A small team with one deploy pipeline rarely needs it — the composition and routing overhead is real, and a well-modularised monolithic schema is simpler to operate.
How does the federation router handle N+1 query problems?
The router’s query planner batches all entity references of the same type into a single _entities fetch per subgraph per execution stage. N+1 patterns inside a subgraph are the subgraph’s own responsibility, typically solved with DataLoader. The deeper treatment is in optimizing reference resolvers for performance.
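The subgraph-side half of that answer is DataLoader-style batching: every load() issued in the same synchronous turn is coalesced into one batched call. Below is a dependency-free sketch of the idea. It is not the dataloader package's implementation, which adds caching, per-key errors and batch-size limits. TinyLoader and fetchUsersByIds are hypothetical names.

```typescript
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (error: unknown) => void;
}

// Coalesces load() calls made in the same synchronous turn into one batchFn call.
class TinyLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in a turn schedules the flush; later loads join the same queue.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private flush(): void {
    const batch = this.queue;
    this.queue = [];
    this.batchFn(batch.map((item) => item.key))
      .then((values) => batch.forEach((item, i) => item.resolve(values[i])))
      .catch((error) => batch.forEach((item) => item.reject(error)));
  }
}

// Hypothetical batch fetch: one round trip for many ids, results in key order.
let batchCalls = 0;
async function fetchUsersByIds(ids: readonly string[]): Promise<string[]> {
  batchCalls++;
  return ids.map((id) => `user:${id}`);
}

// Three loads in one turn, as sibling field resolvers would issue them.
const userLoader = new TinyLoader(fetchUsersByIds);
Promise.all([userLoader.load("1"), userLoader.load("2"), userLoader.load("3")]).then((loaded) =>
  console.log(loaded, `batch calls: ${batchCalls}`),
);
```

The batch function must return values in the same order as the keys it receives; the loader relies on that positional match to hand each caller its own result.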
Can different teams use different GraphQL server implementations?
Yes. The Federation specification defines the subgraph contract — _service, _entities, and the directive set. Any server that correctly implements that contract, regardless of language or framework, composes into the supergraph alongside the others.
What is the impact of schema composition on deployment velocity?
Correct composition pipelines enable independent subgraph deployments. Breaking changes are caught at the CI stage by rover subgraph check, so teams ship features without coordinating global releases. The trade-off is that you must keep the check fast and the registry authoritative.
How do I migrate an existing graph from Federation v1 to v2?
Opt each subgraph into v2 by adding the @link schema extension that imports the directives you use, then recompose. The v1 habits of writing entity stubs with extend type and marking their key fields @external are no longer required. v2 also changes the semantics of value-type sharing: fields resolved by more than one subgraph now require @shareable, so expect composition to surface previously implicit conflicts. Migrate one subgraph at a time and gate each step on rover subgraph check.
Where does authorization belong in a federated graph?
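In the subgraphs that own the data, not in the router as a catch-all gate. Each owning subgraph declares field-level requirements in its own schema with directives like @authenticated, @requiresScopes, and @policy, which the router enforces field by field, or checks them in its own resolvers. Field-level authorization patterns are covered under subgraph implementation and entity resolution. The router should propagate identity and headers and apply the rules subgraph owners declare, not make domain authorization decisions of its own.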
What is “managed federation” and do I need it?
Managed federation moves the supergraph out of a static file and into a registry the router polls, so subgraphs can be published and recomposed without redeploying the router. You need it once more than one team publishes to the graph; until then, composing from a file is simpler. See schema registry and managed federation.
Related
- Defining Subgraph Boundaries for Microservices
- Designing Cross-Service Type References
- Gateway Routing Strategies for Federated APIs
- Resolving Schema Conflicts in Apollo Federation
- Type Ownership and Shared Schema Contracts
- Schema Validation in CI/CD Pipelines
- Schema Registry and Managed Federation
- Subgraph Implementation & Entity Resolution — cross-section guide on implementing the entities you design here
- Federated GraphQL Operations in Production — cross-section guide on operating the graph at runtime