Routing
Routing shifts a workload's traffic onto a model Understudy serves — gradually, observably, and reversibly. Your code keeps calling the same API with the same model name; the gateway decides per request which arm actually serves it.
What a route is
A route attaches two values to a workload: a public model_id from the catalog, and a route_traffic_pct from 0 to 100. At request time the gateway sends that percentage of the workload's traffic to the routed model and the remainder to your primary provider, untouched.
- New routes default to a full cutover (100%): omit route_traffic_pct and all matching traffic moves to the routed model. Pass a smaller value (e.g. 5%) to canary on live traffic first.
- 0% is a clean pause: the route pointer is kept but all traffic goes primary, which is useful for rollback without losing the configuration.
- Clearing the route (model_id: null) returns the workload to pure passthrough.
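The three route states above can be sketched as a small data model. This is an illustration only: the `Route` class, `routed_share`, and `describe` are hypothetical names, not part of any Understudy SDK, and `example-model` is a placeholder id. The sketch only encodes the defaults described in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    """Hypothetical in-memory view of a workload's route settings."""
    model_id: Optional[str] = None  # None models a cleared route (model_id: null)
    route_traffic_pct: int = 100    # an omitted value means a full cutover

    def __post_init__(self) -> None:
        if not 0 <= self.route_traffic_pct <= 100:
            raise ValueError("route_traffic_pct must be between 0 and 100")


def routed_share(route: Route) -> int:
    """Percentage of the workload's traffic sent to the routed model."""
    if route.model_id is None:
        return 0  # pure passthrough: everything goes to the primary provider
    return route.route_traffic_pct


def describe(route: Route) -> str:
    """Name the state a route is in, following the list above."""
    if route.model_id is None:
        return "passthrough"
    if route.route_traffic_pct == 0:
        return "paused"
    if route.route_traffic_pct == 100:
        return "full cutover"
    return "split"


if __name__ == "__main__":
    for r in (Route("example-model"), Route("example-model", 5),
              Route("example-model", 0), Route()):
        print(describe(r), routed_share(r))
```

Note that a paused route and a cleared route both send all traffic to the primary provider; the difference is that the paused route still remembers its model_id.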
Deterministic splitting
The percentage split is decided by hashing the request id, not by a coin flip. A retried request lands on the same arm it landed on the first time, so retries never flip-flop between models mid-incident, and an arm's captures are a consistent sample.
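A minimal sketch of hash-based splitting, to show why a retry stays on the same arm. The gateway's actual hash function and bucket count are not documented here; SHA-256 and 100 buckets are assumptions chosen for illustration, and `bucket` and `choose_arm` are hypothetical names.

```python
import hashlib


def bucket(request_id: str) -> int:
    """Map a request id to a stable bucket in the range 0..99.

    Assumption: SHA-256 over the UTF-8 request id. Any stable hash
    would give the same determinism property.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_arm(request_id: str, route_traffic_pct: int) -> str:
    """Return 'understudy' for the routed share, 'primary' otherwise."""
    if bucket(request_id) < route_traffic_pct:
        return "understudy"
    return "primary"


if __name__ == "__main__":
    # The same id always resolves to the same arm, so a retry cannot switch arms.
    first = choose_arm("req-123", 5)
    retry = choose_arm("req-123", 5)
    print(first == retry)
```

Because the arm is a pure function of the request id and the percentage, raising the percentage only moves requests from primary to the routed model, never the other way around.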
The catalog arm
Separate from configured routes: when a managed-mode request names a catalog model directly in body.model, the gateway resolves and serves it on the spot — no operator setup, no route required. This arm reports routed: false in observability because you chose the model, and it deliberately has no fallback.
Fallback
When a routed call fails upstream (5xx, 429, or timeout), the gateway retries the request against your primary provider, so an experiment's instability never becomes your outage. The response then reports x-understudy-route: fallback. Catalog-arm requests are the exception: a model you explicitly named either serves or errors, and is never silently substituted.
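The fallback rule can be sketched as follows. This is not the gateway's implementation: `UpstreamError`, `is_fallback_eligible`, and `serve` are hypothetical names, and re-raising other errors (for example a 400) is an assumption, since the text above only covers 5xx, 429, and timeouts.

```python
from typing import Callable, Optional, Tuple


class UpstreamError(Exception):
    """Hypothetical error raised when an upstream call fails."""

    def __init__(self, status: Optional[int] = None, timed_out: bool = False):
        super().__init__(f"upstream failure (status={status}, timed_out={timed_out})")
        self.status = status
        self.timed_out = timed_out


def is_fallback_eligible(err: UpstreamError) -> bool:
    """True for the failures named above: 5xx, 429, or timeout."""
    if err.timed_out:
        return True
    if err.status is None:
        return False
    return err.status == 429 or 500 <= err.status <= 599


def serve(
    call_understudy: Callable[[], str],
    call_primary: Callable[[], str],
    *,
    explicit_catalog_model: bool,
) -> Tuple[str, str]:
    """Return (x-understudy-route value, response body)."""
    try:
        return "understudy", call_understudy()
    except UpstreamError as err:
        # Catalog-arm requests never fall back: they serve or error.
        # Assumption: ineligible errors on a routed call are also re-raised.
        if explicit_catalog_model or not is_fallback_eligible(err):
            raise
        return "fallback", call_primary()
```

Passing `explicit_catalog_model=True` models a request that named a catalog model directly in body.model; in that case the error propagates to the caller instead of being re-served by the primary provider.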
The three outcomes
| x-understudy-route | meaning |
|---|---|
| primary | Served by the provider your code called: passthrough, or the unrouted share of a split. |
| understudy | Served by the routed or catalog model from managed supply. |
| fallback | A routed attempt failed and the request was re-served by your primary provider. |
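One way to watch a rollout from the client side is to tally these outcomes. The sketch below assumes x-understudy-route is exposed as an HTTP response header and that you already have each response's headers as a mapping; `tally_routes` is a hypothetical helper, not part of any SDK.

```python
from collections import Counter
from typing import Iterable, Mapping

ROUTE_HEADER = "x-understudy-route"
KNOWN_OUTCOMES = ("primary", "understudy", "fallback")


def tally_routes(all_headers: Iterable[Mapping[str, str]]) -> Counter:
    """Count responses per outcome; anything unexpected goes under 'unknown'."""
    counts: Counter = Counter()
    for headers in all_headers:
        # HTTP header names are case-insensitive, so normalise before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        outcome = lowered.get(ROUTE_HEADER, "").strip().lower()
        counts[outcome if outcome in KNOWN_OUTCOMES else "unknown"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        {"x-understudy-route": "primary"},
        {"X-Understudy-Route": "understudy"},
        {"x-understudy-route": "fallback"},
        {},
    ]
    print(tally_routes(sample))
```

A rising fallback count relative to understudy suggests the routed model is failing upstream, which is a reasonable signal for dropping the route to 0%.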