Apollo Studio Schema Checks for Managed Federation

Schema checks use rover subgraph check to validate a proposed subgraph change against a registered graph variant, for both composition safety and real-traffic impact, before the change is published. They are the gate that sits in front of every subgraph publish to the schema registry in managed federation, turning “did this break composition?” and “did this break a live client?” into a single CI step.

When to use this pattern

  • You publish subgraphs to a managed registry and need to block breaking changes on pull requests before merge.
  • You want breaking-change verdicts scored against actual production traffic, not just static SDL diffing.
  • Multiple teams change overlapping types and you need composition validated across all subgraphs, not one in isolation.

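Prerequisites

  • The Rover CLI installed (the CI example below installs it from rover.apollo.dev).
  • A graph API key for the graph, exposed as APOLLO_KEY.
  • A graph variant registered in Apollo Studio, with its other subgraphs already published, so there is a baseline to compose against.
  • Routers for that variant reporting usage, so operation checks have real traffic to score against.
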
What a check actually validates

A single rover subgraph check runs two distinct analyses, and conflating them is a common source of confusion.

Composition checks answer: if I publish this subgraph SDL, does the whole supergraph still compose? The registry merges your proposed SDL with the current SDL of every other subgraph in that variant and runs composition. This catches cross-subgraph problems that reviewing one schema in isolation never could: a field resolved by two subgraphs without @shareable, a @key mismatch on a shared entity, or a field whose type differs between subgraphs. Composition checks are about the graph, not just your service.
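To make the cross-subgraph nature concrete, here is a minimal Python sketch of one such rule. It assumes a toy data model (each subgraph as a map from "Type.field" coordinates to an @shareable flag) and models only Federation 2's field-sharing rule: a field defined in more than one subgraph must be @shareable in every one of them. The function unshared_overlaps is hypothetical, not part of Rover or the registry, and real composition validates far more (entity @key fields, @external, type compatibility).

```python
from collections import defaultdict

# Toy model: subgraph name -> {"Type.field": marked @shareable?}
SubgraphFields = dict[str, dict[str, bool]]


def unshared_overlaps(subgraphs: SubgraphFields) -> dict[str, list[str]]:
    """Return fields defined in 2+ subgraphs where any definition lacks @shareable.

    Simplified: ignores @key, @external, @override and type compatibility.
    """
    definitions: dict[str, list[tuple[str, bool]]] = defaultdict(list)
    for subgraph, fields in subgraphs.items():
        for coordinate, shareable in fields.items():
            definitions[coordinate].append((subgraph, shareable))

    conflicts: dict[str, list[str]] = {}
    for coordinate, defs in definitions.items():
        if len(defs) > 1 and not all(shareable for _, shareable in defs):
            conflicts[coordinate] = sorted(name for name, _ in defs)
    return conflicts


# Sibling subgraphs at their CURRENT (published) SDL.
current_siblings = {"reviews": {"Product.rating": True}}
# The PROPOSED products SDL now also defines Product.rating, without @shareable.
proposed_products = {"Product.name": False, "Product.rating": False}

merged = {**current_siblings, "products": proposed_products}
print(unshared_overlaps(merged))  # {'Product.rating': ['products', 'reviews']}
```

Reviewing products alone would show nothing wrong; the conflict only appears once the proposed SDL is merged with the reviews subgraph, which is exactly what the registry does.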

Operation (usage) checks answer: does this change break any operation clients are actually sending? The registry takes the operations recorded from real router traffic over a configured time window and validates them against the proposed schema. If a removed field, a changed return type, or an argument made non-nullable would break a recorded operation, that operation is flagged. This is the difference between a theoretical breaking change and a real one: removing a field nobody queries is harmless; removing a field on the hot path is an incident.
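For field removals, the flagging step can be pictured as a set difference between what an operation selects and what the proposed schema still offers. The sketch below assumes each recorded operation is reduced to the schema coordinates it selects (a hypothetical RecordedOperation shape) and handles only removals; the real registry also evaluates type and argument changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedOperation:
    # Hypothetical: an operation reduced to the schema coordinates it selects.
    name: str
    coordinates: frozenset[str]


def broken_operations(
    operations: list[RecordedOperation], proposed_schema: set[str]
) -> dict[str, list[str]]:
    """Map each operation that would fail to the coordinates missing from the proposed schema."""
    broken: dict[str, list[str]] = {}
    for op in operations:
        missing = sorted(op.coordinates - proposed_schema)
        if missing:
            broken[op.name] = missing
    return broken


recorded = [
    RecordedOperation("ProductCard", frozenset({"Query.product", "Product.name", "Product.legacyRating"})),
    RecordedOperation("ProductList", frozenset({"Query.products", "Product.name"})),
]
# The proposed schema drops Product.legacyRating.
proposed = {"Query.product", "Query.products", "Product.name"}
print(broken_operations(recorded, proposed))  # {'ProductCard': ['Product.legacyRating']}
```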

A change can pass composition but fail operation checks (you removed a still-queried field), or pass operation checks but fail composition (your SDL is fine alone but conflicts with a sibling subgraph). Both must pass to be safe.

Breaking change detection from real traffic

The traffic-based model is what makes managed checks far more useful than a naive SDL diff. Every router with usage reporting enabled sends operation signatures and field usage back to the registry. When you run a check, the registry knows which fields are queried, by which clients, and how often, within your configured window. A field removal is reported as breaking only if recorded operations reference it.

This lets you make changes that a static differ would always reject. If Product.legacyRating has zero requests within the check window (say, a 90-day window), removing it passes the operation check even though it is technically a deletion: the registry confirms no live client depends on it. Conversely, a field you think is unused but that one mobile client still queries will be caught with the exact operation name attached.
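A sketch of windowed scoring, under stated assumptions: UsageRecord and score_removal are hypothetical stand-ins for the usage data routers report, and each record represents one request. It shows why the same removal can pass or fail depending on whether any recorded request inside the window references the field.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record of one request, as a router might report it.
    timestamp: datetime
    client: str
    operation: str
    coordinate: str  # schema coordinate, e.g. "Product.legacyRating"


def score_removal(
    records: list[UsageRecord], coordinate: str, now: datetime, window: timedelta
) -> Counter:
    """Count requests per (operation, client) that use `coordinate` inside the window.

    An empty Counter means no recorded traffic references the field, so its
    removal would pass the operation check.
    """
    cutoff = now - window
    return Counter(
        (r.operation, r.client)
        for r in records
        if r.coordinate == coordinate and r.timestamp >= cutoff
    )


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    UsageRecord(now - timedelta(days=2), "mobile-ios", "ProductCard", "Product.legacyRating")
] * 3 + [
    UsageRecord(now - timedelta(days=120), "web", "LegacyBadge", "Product.oldBadge"),
]

ninety_days = timedelta(days=90)
print(score_removal(records, "Product.oldBadge", now, ninety_days))      # Counter()
print(score_removal(records, "Product.legacyRating", now, ninety_days))  # Counter({('ProductCard', 'mobile-ios'): 3})
```

With a 90-day window, the illustrative Product.oldBadge (last queried 120 days ago) produces no hits and its removal would pass, while Product.legacyRating is attributed to ProductCard on mobile-ios.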

Annotated CLI + check configuration

The block below shows a rover subgraph check invocation, followed by a conceptual view of the variant’s check settings. In managed federation, check configuration (time range, operation count thresholds, excluded clients and operations) is set per variant in Apollo Studio under the variant’s Checks settings; by default, the CLI applies whatever the variant defines.

# --- rover subgraph check: the CI gate before publish ---------------------
# APOLLO_KEY: a graph API key allowed to run checks on this graph.
# APOLLO_GRAPH_REF: the variant whose baseline you check AND later publish to.
export APOLLO_KEY="service:my-graph:xxxxxxxxxxxx"
export APOLLO_GRAPH_REF="my-graph@production"

# Validates ./products/schema.graphql against the live baseline for $APOLLO_GRAPH_REF.
# Runs BOTH composition (vs sibling subgraphs) and operation (vs real traffic) checks.
#   --name    which subgraph in the variant to check
#   --schema  the PROPOSED SDL under review
#   --format  json gives machine-readable output for CI gating
rover subgraph check "$APOLLO_GRAPH_REF" \
  --name products \
  --schema ./products/schema.graphql \
  --format json
# Exit code is non-zero if any check finds a FAILURE-severity change,
# so CI can simply rely on `set -e` / the step failing the build.

# --- Variant Checks configuration (set in Apollo Studio, applied to checks) ---
# Conceptual representation of the per-variant check settings the CLI honors.
operation_checks:
  # Look at operations seen in the last 14 days when scoring breaking changes.
  time_range: 1209600          # seconds (14 days)
  # Ignore operations sent fewer than this many times; filters out noise/probes.
  query_count_threshold: 5
  # Or ignore operations below this % of total volume.
  query_count_threshold_percentage: 0.5
  # Exclude internal/synthetic clients from breaking-change scoring.
  excluded_clients:
    - "synthetic-canary"
    - "schema-test-harness"
  # Never let these specific operations block a check (e.g. deprecated dashboards).
  excluded_operations:
    - "DeprecatedAdminReport"

The two levers that matter most are time_range (how far back traffic is sampled: too short and you miss low-frequency but critical clients; too long and you can never retire anything) and excluded_operations (operations you have explicitly decided may no longer block a check, typically after a deprecation window for a known consumer). Tuning these is how you balance velocity against safety.
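To show how these settings interact, here is a sketch that filters windowed traffic down to the operations that still count toward a verdict. CheckSettings mirrors the conceptual YAML above and is hypothetical; so are two choices in the logic: an operation must clear both count thresholds, and the percentage is computed against traffic left after client exclusions. Apollo’s exact semantics may differ.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CheckSettings:
    # Hypothetical mirror of the conceptual per-variant settings above.
    query_count_threshold: int = 5
    query_count_threshold_percentage: float = 0.5  # percent of total volume
    excluded_clients: set[str] = field(default_factory=set)
    excluded_operations: set[str] = field(default_factory=set)


def operations_to_score(counts: dict[tuple[str, str], int], settings: CheckSettings) -> set[str]:
    """Return operations that still count toward breaking-change verdicts.

    `counts` maps (operation, client) to requests seen inside time_range.
    """
    per_operation: Counter = Counter()
    for (operation, client), n in counts.items():
        if client not in settings.excluded_clients:
            per_operation[operation] += n

    total = sum(per_operation.values())
    kept = set()
    for operation, n in per_operation.items():
        if operation in settings.excluded_operations:
            continue
        if n < settings.query_count_threshold:
            continue  # too rare to trust: noise or probes
        if total and 100 * n / total < settings.query_count_threshold_percentage:
            continue  # below the share-of-volume floor
        kept.add(operation)
    return kept


settings = CheckSettings(
    excluded_clients={"synthetic-canary", "schema-test-harness"},
    excluded_operations={"DeprecatedAdminReport"},
)
counts = {
    ("ProductCard", "mobile-ios"): 412,
    ("ProductCard", "synthetic-canary"): 50,
    ("HealthProbe", "synthetic-canary"): 900,
    ("DeprecatedAdminReport", "admin-web"): 40,
    ("RareExport", "web"): 3,
    ("NicheQuery", "web"): 6,
    ("SearchPage", "web"): 1000,
}
print(sorted(operations_to_score(counts, settings)))  # ['ProductCard', 'SearchPage']
```

HealthProbe disappears with its synthetic client, DeprecatedAdminReport is explicitly excluded, RareExport misses the count floor, and NicheQuery clears the count floor but not the 0.5% share. Only the surviving operations could turn a schema change into a FAILURE.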

Wiring checks into CI

Run the check on every pull request that touches a subgraph schema, always against the variant you will publish to: checking against @staging and then publishing to @production validates the wrong baseline.

name: Subgraph Schema Check
on:
  pull_request:
    paths: ['products/schema.graphql']

jobs:
  schema-check:
    runs-on: ubuntu-latest
    env:
      # Pin the check to the same variant this subgraph is published to.
      APOLLO_GRAPH_REF: my-graph@production
    steps:
      - uses: actions/checkout@v4
      - name: Install Rover
        run: |
          curl -sSL https://rover.apollo.dev/nix/latest | sh
          echo "$HOME/.rover/bin" >> "$GITHUB_PATH"
      - name: Run composition + operation checks
        run: |
          rover subgraph check "$APOLLO_GRAPH_REF" \
            --name products \
            --schema ./products/schema.graphql
        env:
          APOLLO_KEY: ${{ secrets.APOLLO_KEY }}

Rover posts the check result back to Apollo Studio, which can surface it as a status on the PR. The check details page lists each flagged change with severity, the operations affected, and the clients sending them — the same data your reviewers need to decide whether to proceed. This complements the broader CI patterns in Schema Validation in CI/CD Pipelines and the deep dive in Federated Schema Validation in CI/CD Pipelines.

Verification steps

  1. Make a deliberately breaking change (remove a queried field) and run the check — confirm it reports a FAILURE with the affected operation names.
  2. Add a new optional field instead and re-run — confirm the check passes with the change classified as additive.
  3. Inspect the check details in Apollo Studio and verify the operation list reflects your real traffic window.
  4. Confirm the CI step’s exit code is non-zero on failure so the PR is actually blocked, not just annotated.
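Steps 1, 2, and 4 can be rehearsed offline with a toy model before touching a real variant. The sketch below is hypothetical: classify compares sets of field coordinates rather than real SDL, exit_code mimics the non-zero-on-FAILURE behavior described above, and the NOTICE severity and FIELD_ADDED code are assumed labels chosen to sit alongside the FAILURE and FIELD_REMOVED seen in the sample output.

```python
def classify(baseline: set[str], proposed: set[str], used: set[str]) -> list[tuple[str, str, str]]:
    """Classify field-level changes as (severity, code, coordinate)."""
    changes = [("NOTICE", "FIELD_ADDED", c) for c in sorted(proposed - baseline)]
    for c in sorted(baseline - proposed):
        # A removal only fails when recorded traffic still uses the field.
        changes.append(("FAILURE" if c in used else "NOTICE", "FIELD_REMOVED", c))
    return changes


def exit_code(changes: list[tuple[str, str, str]]) -> int:
    """Non-zero when any change is FAILURE, so the CI step blocks the PR."""
    return 1 if any(severity == "FAILURE" for severity, _, _ in changes) else 0


baseline = {"Product.name", "Product.legacyRating"}
used = {"Product.name", "Product.legacyRating"}

# Step 1: remove a queried field.
step1 = classify(baseline, {"Product.name"}, used)
print(step1, exit_code(step1))  # [('FAILURE', 'FIELD_REMOVED', 'Product.legacyRating')] 1
# Step 2: add a new field instead.
step2 = classify(baseline, baseline | {"Product.slug"}, used)
print(step2, exit_code(step2))  # [('NOTICE', 'FIELD_ADDED', 'Product.slug')] 0
```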

Expected output on a breaking change resembles:

Compared 3 schema changes against 1240 operations over the last 14 days.
✘ FAILURE  FIELD_REMOVED  Field `Product.legacyRating` was removed
   Affected operations: ProductCard (mobile-ios, 412 requests)
error: This check failed; see https://studio.apollographql.com/...

Common mistakes & gotchas

Checking against the wrong variant. Running the check against @staging while you publish to @production validates a baseline you are not shipping against. Pin APOLLO_GRAPH_REF to the publish target.

Empty traffic window passing everything. A brand-new variant or one with no router usage has no operations to score against, so operation checks pass trivially — even for genuine breaking changes. Composition still runs, but do not trust operation results until routers are reporting real traffic.
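One way to act on this in CI is to treat a passing operation check with no scored traffic as inconclusive rather than green. The helpers below are a hypothetical policy, not Rover behavior. They assume you can read the number of operations compared from the check summary; the regex copies the wording of the sample output shown earlier, and real Rover output may differ, so a failed parse is treated as unknown.

```python
import re
from typing import Optional

# Assumed summary wording, taken from this guide's sample output.
SUMMARY = re.compile(r"against (\d+) operations")


def operations_compared(summary_line: str) -> Optional[int]:
    match = SUMMARY.search(summary_line)
    return int(match.group(1)) if match else None


def operation_check_verdict(compared: Optional[int], failures: int, min_operations: int = 1) -> str:
    """Hypothetical CI policy: a pass with no scored traffic is not a green light."""
    if failures > 0:
        return "FAIL"
    if compared is None or compared < min_operations:
        # Nothing (or nothing known) to score against: a pass says little about real clients.
        return "INCONCLUSIVE"
    return "PASS"


line = "Compared 3 schema changes against 1240 operations over the last 14 days."
print(operation_check_verdict(operations_compared(line), failures=0))  # PASS
print(operation_check_verdict(0, failures=0))                          # INCONCLUSIVE
```

Raising min_operations above 1 is a further judgment call: it lets a team demand a meaningful traffic sample before trusting a pass.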

Treating excluded_operations as permanent. Excluding an operation to ship a change suppresses the signal forever. Pair every exclusion with a deprecation plan and remove it once the consumer migrates, or the gate quietly rots.
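A lightweight way to keep exclusions honest is to record each one with an owner and a removal deadline from its deprecation plan, then fail a scheduled job once a deadline passes. The Exclusion record and stale_exclusions function below are hypothetical bookkeeping kept alongside your Studio settings; they are not part of the variant’s Checks configuration, and the sample entries are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Exclusion:
    # Hypothetical bookkeeping for one entry in excluded_operations.
    operation: str
    owner: str
    remove_by: date  # deadline taken from the deprecation plan


def stale_exclusions(exclusions: list[Exclusion], today: date) -> list[Exclusion]:
    """Return exclusions past their removal deadline, oldest deadline first."""
    return sorted((e for e in exclusions if e.remove_by < today), key=lambda e: e.remove_by)


exclusions = [
    Exclusion("DeprecatedAdminReport", "admin-team", date(2024, 3, 1)),
    Exclusion("LegacyExport", "data-team", date(2024, 9, 1)),
]
for e in stale_exclusions(exclusions, today=date(2024, 6, 1)):
    print(f"{e.operation} (owner {e.owner}) was due for removal on {e.remove_by}")
```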

Frequently Asked Questions

What is the difference between a composition check and an operation check?

A composition check verifies the supergraph still composes when your proposed subgraph SDL is merged with all sibling subgraphs; it catches cross-service conflicts. An operation check validates real recorded client operations against the proposed schema to detect breaking changes that actually affect traffic. rover subgraph check runs both, and both must pass.

How does Apollo know which changes are breaking versus safe?

It scores proposed changes against operations recorded from live router traffic within the variant’s configured time_range. A removed or narrowed field is only flagged as breaking if recorded operations reference it, so unused fields can be removed without failing the check while still-queried fields are caught with the exact affected operations and clients.

Can I exclude certain operations or clients from breaking-change detection?

Yes. In the variant’s Checks configuration you can set excluded_operations, excluded_clients, and count/percentage thresholds so low-volume noise or synthetic traffic does not block a check. Treat exclusions as temporary and tie them to a deprecation plan.