Replace a model on live traffic

The full replacement loop on one call site: isolate it as a workload, collect a baseline, route a 5% slice to a managed model, compare arms on real traffic, and ratchet to 100%, or roll back by changing one field. Plan on a few days of elapsed time but only minutes of hands-on work.

1 — Isolate the call site

Pick one stable call site: one prompt shape, one job, ideally high-volume and tolerant of small quality differences. Classification, extraction, and scoring make better first candidates than open-ended generation. Create a workload for it and declare it in your client:

// Declare the call site in your client's options.
defaultHeaders: {
  "x-understudy-project": "concierge",
  "x-understudy-workload": "ad-relevance",
}

Deploy that header change. From this point the gateway can tell this call site apart from everything else you run.
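If several call sites share one client factory, a small helper can keep the two header names in one place. This is a minimal sketch: `workloadHeaders` is a hypothetical helper name, not part of any SDK, and it assumes your client accepts a plain header object like the one above.

```typescript
// The two headers shown above, typed so a typo in a header name fails to compile.
type WorkloadHeaders = {
  "x-understudy-project": string;
  "x-understudy-workload": string;
};

// Hypothetical helper: builds the header pair for one call site.
function workloadHeaders(project: string, workload: string): WorkloadHeaders {
  if (!project || !workload) {
    throw new Error("project and workload must be non-empty");
  }
  return {
    "x-understudy-project": project,
    "x-understudy-workload": workload,
  };
}

// Pass the result as `defaultHeaders` when constructing the client for this call site.
const adRelevanceHeaders = workloadHeaders("concierge", "ad-relevance");
```

Giving each call site its own client, or its own header object, keeps one workload's traffic from being tagged as another's.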

2 — Baseline

Enable capture on the workload and let real traffic flow for a day or two. You're collecting the incumbent's behavior: inputs, outputs, latency, error rate. If volume is high, set a capture sample rate — a deterministic 10% of a busy call site beats 100% of a quiet afternoon.
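To make "deterministic" concrete: a deterministic sampler gives the same keep-or-skip answer for the same request every time, typically by hashing a stable key instead of drawing a random number. The sketch below illustrates the idea only; it is not the gateway's implementation, and `shouldCapture` and the request-ID key are assumptions.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units.
// Stable across processes and restarts, which is what makes the sample deterministic.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Hypothetical sampler: true for roughly `rate` (0..1) of keys,
// and always the same answer for the same key.
function shouldCapture(requestId: string, rate: number): boolean {
  if (rate <= 0) return false;
  if (rate >= 1) return true;
  // Map the hash into [0, 1) and compare against the sample rate.
  return fnv1a32(requestId) / 0x100000000 < rate;
}
```

Because the decision is a threshold on a fixed hash, any request kept at a 10% rate is also kept at every higher rate, so raising the rate only adds captures.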

3 — Route a 5% slice

On Routing, point the workload at a catalog model and set the traffic slice to 5%. A 5% slice and capture-on-routing are exactly what you want here. Nothing about your application changes; one request in twenty is now served by the candidate.
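How long a 5% slice needs to run depends on your volume. The arithmetic can be sketched as follows, assuming you have a target number of candidate responses in mind; the function name and the example figures are illustrative and do not come from the product.

```typescript
// Hypothetical planning helper: days needed for the candidate arm to
// accumulate `targetSamples` responses at a given traffic slice (in percent).
function daysToCollect(
  dailyRequests: number,
  slicePercent: number,
  targetSamples: number,
): number {
  if (dailyRequests <= 0 || slicePercent <= 0) return Infinity;
  const candidatePerDay = (dailyRequests * slicePercent) / 100;
  return Math.ceil(targetSamples / candidatePerDay);
}

// Illustrative figures: 40,000 requests/day at a 5% slice gives 2,000 candidate
// responses/day, so collecting 5,000 of them takes 3 days.
const days = daysToCollect(40_000, 5, 5_000);
```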

4 — Compare the arms

Two streams of evidence accumulate, and the arm is identifiable in both:

  • In captures: each entry's served model says which arm produced it; comparing requested vs served model shows the rewrite.
  • In your own logs: x-understudy-route is primary or understudy on each response. Join it to whatever quality signal you already track (user ratings, downstream task success, manual review).

Also watch for fallback responses, where the candidate fails upstream and your primary silently absorbs the request. Occasional fallbacks are fine; a sustained rate means you should pause.
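Joining the route header to a quality signal can be as simple as a per-arm tally. In this minimal sketch, the `ResponseLog` shape and the boolean `success` field are assumptions standing in for whatever you already record; only the `primary` and `understudy` values come from the x-understudy-route header described above. Fallback detection is left out because how a fallback is marked is not specified here.

```typescript
type Route = "primary" | "understudy";

// Hypothetical log record: one row per response in your own logs.
interface ResponseLog {
  route: Route;     // value of the x-understudy-route response header
  success: boolean; // your own quality signal, reduced to pass/fail
}

interface ArmSummary {
  total: number;
  successRate: number | null; // null when the arm has no traffic yet
}

// Tally responses per arm and compute each arm's success rate.
function summarizeArms(logs: ResponseLog[]): Record<Route, ArmSummary> {
  const counts: Record<Route, { total: number; success: number }> = {
    primary: { total: 0, success: 0 },
    understudy: { total: 0, success: 0 },
  };
  for (const log of logs) {
    counts[log.route].total += 1;
    if (log.success) counts[log.route].success += 1;
  }
  const summarize = (c: { total: number; success: number }): ArmSummary => ({
    total: c.total,
    successRate: c.total === 0 ? null : c.success / c.total,
  });
  return {
    primary: summarize(counts.primary),
    understudy: summarize(counts.understudy),
  };
}
```

At a 5% slice the understudy arm has far fewer rows than the primary, so treat small differences in rate with caution until its total grows.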

5 — Ratchet or retreat

Each step is one field on the Routing page, effective immediately on new requests:

  • Holding up → raise the slice 5 → 25 → 50 → 100, with steps sized to your traffic volume and risk tolerance. Re-check the comparison at each step; quality issues that hide at 5% can surface at 50%.
  • Not holding up → set traffic to 0%. New requests return to the incumbent immediately; the route pointer and all the candidate's captures remain for diagnosis.
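The decision rule above can be written down as a small function, which helps if you want to record each step in a runbook. This is a sketch of the rule only: `nextTrafficPercent` is a hypothetical name, the step ladder is the example one from this page, and the change itself is still made in the traffic field on the Routing page.

```typescript
// Example ladder from this tutorial; adjust to your volume and risk tolerance.
const RATCHET_STEPS = [5, 25, 50, 100] as const;

// Hypothetical helper: given the current slice and whether the comparison
// is holding up, return the slice to set next.
function nextTrafficPercent(current: number, holdingUp: boolean): number {
  // Retreat: everything goes back to the incumbent.
  if (!holdingUp) return 0;
  // Ratchet: move to the first step above the current slice, capped at 100.
  const next = RATCHET_STEPS.find((step) => step > current);
  return next ?? 100;
}
```

For example, `nextTrafficPercent(5, true)` returns 25, and `nextTrafficPercent(50, false)` returns 0.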