Tech debt: PR stage pollutes the dev Foundry project with candidate prompt-agent versions
Context
For Foundry Prompt Agent flows, the PR gate workflow
(agentops-pr-prompt-agent.yml) stages an ephemeral candidate inside the dev Foundry project so that Cloud Evals can score
exactly the prompt the PR is proposing.
The staging step (agentops.pipeline.prompt_deploy stage) calls client.agents.create_version(agent_name="travel-agent", body={...})
against the dev project's endpoint, creating a new numbered version
of the agent (e.g. travel-agent:3, :4, :5, …).
This is intentional and the template comments call it out:
# Each PR run creates or reuses a candidate version in the dev
# Foundry project. AgentOps deduplicates only when the prompt is
# byte-identical to the current seed version's instructions; PR
# candidates can therefore accumulate over time and may need to be
# cleaned up out-of-band.
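The dedup rule in that comment can be sketched as below. This is a minimal illustration, not the code in prompt_deploy.py: FakeAgentsClient is a hypothetical in-memory stand-in for client.agents, the body key "instructions" is an assumed shape (the real body is elided above), and stage_candidate is an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAgentsClient:
    """In-memory stand-in for client.agents (hypothetical, not the Foundry SDK)."""

    # Instructions of each stored version; index 0 holds version 1.
    instructions_by_version: list[str] = field(default_factory=list)

    def create_version(self, agent_name: str, body: dict) -> int:
        # The real return value is not modelled; this stub returns the new number.
        self.instructions_by_version.append(body["instructions"])
        return len(self.instructions_by_version)


def stage_candidate(agents: FakeAgentsClient, agent_name: str,
                    seed_version: int, prompt: str) -> int:
    """Return the version number that the PR's evaluation should target."""
    seed_prompt = agents.instructions_by_version[seed_version - 1]
    # Reuse the seed only on a byte-for-byte match; anything else is a new version.
    if prompt.encode("utf-8") == seed_prompt.encode("utf-8"):
        return seed_version
    return agents.create_version(agent_name=agent_name, body={"instructions": prompt})
```

Because the comparison is only against the seed, re-running the same PR with the same changed prompt would create another version each time in this sketch, which is the accumulation the comment warns about.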
Why this is ugly
Pollution. Every PR with a prompt change creates a new numbered
version in dev that persists until someone deletes it by
hand (Foundry portal or SDK).
Auditability. Opening the dev project's Agents → Versions view
shows a mix of "deployed versions of record" and "abandoned PR
candidates". You cannot tell them apart without cross-referencing foundry-agent.json artifacts.
Risk for naive consumers. Any downstream app that resolves travel-agent by "latest published version" (instead of pinning
via foundry-agent.json) can accidentally pick up an un-merged
candidate.
Conceptual smell. "Creating something in the shared dev
project before the PR is approved" goes against the mental model
most teams have for environment isolation.
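The consumer risk above comes down to how a version is resolved. A minimal sketch of the two strategies follows, assuming foundry-agent.json carries "name" and "version" keys; that schema and both function names are assumptions for illustration, not taken from the repository.

```python
import json


def resolve_latest(version_numbers: list[int]) -> int:
    """Naive resolution: take the highest version number in the project."""
    return max(version_numbers)


def resolve_pinned(pin_file_text: str) -> int:
    """Pinned resolution: trust only the version recorded in the pin file."""
    return int(json.loads(pin_file_text)["version"])
```

With versions 1 to 5 in dev and version 2 pinned as the version of record, resolve_latest would hand back 5, which may be an abandoned PR candidate, while resolve_pinned stays on 2.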
Why we do it anyway (today)
The Foundry Prompt Agent API has no notion of draft/ephemeral
versions, only persistent, numbered ones. The Cloud Evals API needs an
addressable agent: name:version reference. The dev project is the
only place where staging gives a high-fidelity preview of how the
prompt will actually run (same model deployment, same content
safety, same network rules, same RBAC).
Mitigations explored and discarded so far:
| Option | Why discarded today |
| --- | --- |
| Stage in the author's sandbox project | Each developer has their own; CI cannot pick one; sandbox env can diverge from dev → eval result unrepresentative. |
| Use a dedicated *-pr-staging Foundry project | More cost / RBAC / drift between staging and dev → eval result less faithful unless staging is kept identical to dev anyway. |
| Skip staging, evaluate the seed version already in dev | Eval no longer tests what the PR changed, which defeats the gate. |
Possible directions
In rough order of cost-vs-benefit:
Scheduled cleanup workflow. Ship a generated agentops-cleanup-candidates.yml (cron, weekly) that lists travel-agent:* versions in dev, cross-references the PR they
came from (via the version's metadata / git-sha tag we already
write), and deletes candidates whose PR is closed or merged and that
are older than N days. Keeps the current architecture; just stops the
accumulation.
Tag candidates explicitly in Foundry. When stage creates a
version, add a metadata tag like agentops:candidate=true plus agentops:pr=#<number> so portal viewers can filter, and
downstream consumers can refuse to resolve to a candidate.
Dedicated PR-staging Foundry project. Add a new environment
tier (pr-staging) between sandbox and dev. Generator gains a --stage-env option. Higher operational cost and risk of drift,
but conceptually clean.
Foundry product ask. Push the Foundry team for a first-class
"draft / preview version" concept on Prompt Agents that does not
consume the version number sequence.
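The selection rule for the scheduled cleanup direction can be sketched as a pure function, kept apart from any Foundry or GitHub calls. The Candidate record, its fields, and select_for_deletion are hypothetical names; how versions are listed, how the PR state is looked up, and how a version is deleted are left out because they depend on APIs not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Candidate:
    """One staged version plus the PR it came from (hypothetical record)."""

    version: int
    pr_number: int
    created_at: datetime


def select_for_deletion(candidates: list[Candidate], open_prs: set[int],
                        now: datetime, max_age_days: int) -> list[int]:
    """Return version numbers whose PR is no longer open and that are older than N days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        c.version
        for c in candidates
        if c.pr_number not in open_prs and c.created_at < cutoff
    )


if __name__ == "__main__":
    now = datetime(2025, 1, 31, tzinfo=timezone.utc)
    stale = Candidate(3, 101, datetime(2025, 1, 1, tzinfo=timezone.utc))
    print(select_for_deletion([stale], open_prs=set(), now=now, max_age_days=7))
```

Keeping the rule free of I/O means the weekly workflow could run it in a dry-run mode first and log what it would delete.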
Acceptance criteria (for the next slice of work, whatever direction
we pick)
tutorial-prompt-agent-quickstart.md no longer needs the
"candidates can accumulate" caveat; the chosen mechanism handles it.
A user who has run 10 PRs in a row sees at most 1 candidate version
in dev's portal at any given time (or none, if we go to a separate
staging project).
Any consumer that resolves travel-agent to a candidate version by
mistake gets a clear signal (tag, refusal, or "not deployed of
record" status).
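One way the "clear signal" criterion could look on the consumer side is a guard that refuses tagged candidates. This sketch assumes the agentops:candidate and agentops:pr tags proposed above are available to the consumer as a plain string dictionary; ensure_version_of_record and CandidateVersionError are hypothetical names.

```python
class CandidateVersionError(RuntimeError):
    """Raised when a consumer resolves to an unmerged PR candidate."""


def ensure_version_of_record(agent_ref: str, metadata: dict[str, str]) -> str:
    """Return agent_ref unchanged, or raise if its tags mark it as a candidate."""
    if metadata.get("agentops:candidate") == "true":
        pr = metadata.get("agentops:pr", "unknown")
        raise CandidateVersionError(
            f"{agent_ref} is an unmerged PR candidate (PR {pr}); "
            "pin a version of record instead"
        )
    return agent_ref
```

A consumer would call this right after resolving travel-agent to a name:version reference, so a mistaken resolution fails loudly instead of silently serving a candidate prompt.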
References
Template: src/agentops/templates/workflows/agentops-pr-prompt-agent.yml
(lines 13-19 spell out the known limitation).
src/agentops/pipeline/prompt_deploy.py:312-333
(_create_agent_version → client.agents.create_version).
docs/tutorial-prompt-agent-quickstart.md step 13.
conversation (PO walked through the mental model with the
workflow runner output from a live recording).