Agent UX Patterns for Safe Co‑Execution

Introduction: why “safe co‑execution” matters

Agent co‑execution pairs autonomous agents with human judgment. Done well, it increases leverage without sacrificing control. Done poorly, it amplifies risk. This guide enumerates concrete UX patterns that make agent actions transparent, reversible, auditable, and human‑steerable—grounded in Zypsy’s work with AI‑driven products and our published transparency principles across data, code, transactions, and events. See: Data Transparency, Code Transparency, Transactions, and Smart‑Contract Events.

Design objectives and safety guarantees

  • Reversibility: every agent mutation is versioned and roll‑backable.

  • Verifiability: actions, inputs, and outputs are inspectable with provenance.

  • Consent and scope: users grant minimum necessary capabilities with timeboxing.

  • Feedforward clarity: preview planned changes before execution (Transactions).

  • Fault containment: rate limits, timeouts, and safe‑mode escalation paths.

  • Continuous audit: immutable event logs with human‑readable diffs and who/what/when context (Events).

Patterns catalog (at a glance)

| Pattern | Problem it solves | Key UX affordances | Risk mitigated |
| --- | --- | --- | --- |
| Plan Preview (Feedforward) | Users can’t see what the agent will do | Natural‑language plan + structured diff, estimated blast radius, dependencies | Surprise changes, wrong scope |
| Scoped Capability Grants | Over‑privileged agents | Capability picker, timebox, dataset/namespace selectors | Lateral damage, data exfiltration |
| Dry‑Run / Simulate | Unknown side effects | “Simulate” mode, write‑blocked environment, outcome report | Irreversible errors |
| Step Gates (Audit Gates) | High‑risk transitions | Risk‑tiered approvals, double‑confirm, reason capture | Unreviewed critical changes |
| Change Bundles + Versioning | Fragmented edits | Atomic “bundle,” semantic version, release notes | Partial application, hard rollbacks |
| Rollback & Safe‑Mode | Bad deploys | One‑click revert, auto‑rollback on health checks | Prolonged incidents |
| UI Diff for Co‑Editing | Hidden agent edits | Inline before/after annotations | Undetected drift |
| Least‑Data Execution | Opaque data use | Data provenance panel, source badges (Data Transparency) | Unapproved data use |
| Event Ledger | Who did what, when | Human‑readable event stream with links to inputs/outputs (Events) | Disputed actions |
| Human‑in‑the‑Loop Queue | Bottlenecked reviews | Triage queue, SLAs, bulk approve/annotate | Review backlog risk |
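
To make the Scoped Capability Grants pattern concrete, a grant can be modeled as a timeboxed, least‑privilege token that the orchestrator checks before every tool call. The shape below is a hypothetical sketch, not a Zypsy API; the tool names, scope fields, and helper function are illustrative assumptions.

```typescript
// Hypothetical shape for a scoped, timeboxed capability grant.
interface CapabilityGrant {
  agentId: string;
  tool: "content.edit" | "crm.update" | "flags.write"; // explicit tool allow-list
  scope: { dataset: string; namespace?: string };      // dataset/namespace selectors
  maxRecords: number;                                  // blast-radius cap per run
  expiresAt: Date;                                     // timebox
  grantedBy: string;                                   // human who approved the grant
}

// Deny by default: a tool call proceeds only if a matching, unexpired grant exists.
function isAllowed(
  grants: CapabilityGrant[],
  agentId: string,
  tool: string,
  dataset: string,
  now: Date = new Date()
): boolean {
  return grants.some(
    (g) => g.agentId === agentId && g.tool === tool && g.scope.dataset === dataset && g.expiresAt > now
  );
}
```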

Versioning and rollback

  • Bundle changes: group all agent mutations into an atomic “change set” with a semantic version (e.g., 1.4.0‑agent.3) and human‑authored summary.

  • Immutable history: store input prompts, retrieved context, model version, tool calls, and outputs for each bundle (Code Transparency).

  • Revert semantics: support full rollback (to previous good state) and targeted rollback (revert specific files/records within a bundle).

  • Health‑guarded deploys: auto‑rollback on failed post‑checks; display status in the event ledger.
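
A minimal sketch of how a change bundle might be represented so that both full and targeted rollback are possible. The field names and the revert helper are illustrative assumptions rather than a prescribed schema.

```typescript
// Illustrative change-bundle record: every mutation keeps its "before" state
// so the bundle can be reverted in full or per record.
interface Mutation {
  target: string;            // e.g., "product:4821:description"
  before: unknown;           // prior value, captured at execution time
  after: unknown;            // value written by the agent
}

interface ChangeBundle {
  version: string;           // e.g., "1.4.0-agent.3"
  summary: string;           // human-authored release note
  promptVersion: string;     // provenance: prompt version used for this bundle
  modelVersion: string;      // provenance: model version used for this bundle
  mutations: Mutation[];
}

// Full rollback inverts every mutation; targeted rollback filters to specific targets first.
function revert(bundle: ChangeBundle, targets?: string[]): Mutation[] {
  const selected = targets
    ? bundle.mutations.filter((m) => targets.includes(m.target))
    : bundle.mutations;
  // Returned as inverse mutations for the executor to apply and record in the event ledger.
  return selected.map((m) => ({ target: m.target, before: m.after, after: m.before }));
}
```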

Example: object‑level rollback

# Product description (SKU 4821)
- “Ultra‑light running shoe with breathable mesh.”
+ “Ultra‑light road runner with engineered mesh; ideal for daily training.”
@@ metadata
- tags: [“shoe”, “running”]
+ tags: [“running-shoe”, “daily-trainer”, “road”]

One‑click “Revert bundle 1.4.0‑agent.3” restores the prior content and metadata.

Audit gates

  • Define risk tiers (low/medium/high) by blast radius, compliance surface, and reversibility.

  • Gate types: informational hold (user must view diff), single approver, dual control (two distinct roles), or stakeholder sign‑off with time‑boxed SLAs.

  • Capture reviewer intent: Approve, Edit, or Block with rationale; all recorded in the event ledger.

  • Escalation: if an approval SLA is breached, automatically demote scope or trigger safe‑mode.
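
A sketch of how risk tiers could map to gate types and how reviewer decisions might be captured for the event ledger. The enums and the tier-to-gate mapping below are assumptions for illustration, not a fixed policy.

```typescript
type RiskTier = "low" | "medium" | "high";
type GateType = "informational-hold" | "single-approver" | "dual-control" | "stakeholder-signoff";

// Assumed mapping; real tiers would come from blast radius, compliance surface, and reversibility.
const gateForTier: Record<RiskTier, GateType> = {
  low: "informational-hold",
  medium: "single-approver",
  high: "dual-control",
};

// Reviewer intent is recorded with a rationale and appended to the event ledger.
interface GateDecision {
  bundleVersion: string;
  reviewer: string;
  decision: "approve" | "edit" | "block";
  rationale: string;
  decidedAt: Date;
  slaDeadline: Date;   // if breached, scope is auto-demoted or safe-mode triggers
}
```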

Human‑in‑the‑loop controls

  • Inline controls: Approve, Edit inline, Request change, Defer, or Run in Dry‑Run.

  • Guardrails: daily execution budget, per‑tool rate limits, and kill switch.

  • Explainability: show source snippets, tool outputs, and uncertainties (confidence ranges) next to each proposed change (Data Transparency).
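
The guardrails above can be enforced with a simple pre‑execution check before each tool call. The counters, limits, and function below are hypothetical, shown only to make the budget and kill‑switch behavior concrete.

```typescript
// Hypothetical per-run guardrails: daily budget, per-tool rate limit, kill switch.
interface Guardrails {
  dailyExecutionBudget: number;             // max mutations per day
  perToolRateLimit: Record<string, number>; // max calls per minute, keyed by tool
  killSwitchEngaged: boolean;
}

function canExecute(g: Guardrails, usedToday: number, toolCallsLastMinute: number, tool: string): boolean {
  if (g.killSwitchEngaged) return false;                 // hard stop, human-controlled
  if (usedToday >= g.dailyExecutionBudget) return false; // daily budget exhausted
  const limit = g.perToolRateLimit[tool];
  return limit === undefined || toolCallsLastMinute < limit; // per-tool rate limit
}
```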

Sample UI diffs (before/after)

Content co‑editing

Title:
- "Q2 invoice email"
+ "Your Q2 invoice is ready"
Body:
- "Attached is your invoice."
+ "Your Q2 invoice is ready. View it securely in your account."
CTA:
- "Download"
+ "View invoice"

Git‑like configuration change

# .feature-flags.json
- { "recommendations": false, "betaAgent": false }
+ { "recommendations": true,  "betaAgent": true }

CRM bulk update (previewed sample rows)

- Tier: Standard  | Renewal: 2025-01-31
+ Tier: Premium   | Renewal: 2026-01-31   (reason: discount extended)

Agent orchestration

  • Orchestrator contracts: define allowed tools, data scopes, concurrency, and termination conditions per “lane” (e.g., content, code, CRM).

  • Safety budgets: per‑lane caps on records/files touched per run; escalate to review when exceeded.

  • Deterministic phases: Retrieve → Plan → Dry‑Run → Gate → Execute → Verify → Log → Notify; each phase emits events to the ledger.

  • Isolation: keep lanes and credentials separate; never share tokens across objectives.

  • Provenance surfaces: display which lane and tool performed each action with linked inputs/outputs (Code Transparency).
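
The deterministic phase sequence can be expressed as an explicit state machine so that every transition emits a ledger event. The sketch below assumes hypothetical phase handlers and an emit callback; it is one possible shape, not a required implementation.

```typescript
// The deterministic lane phases, in order. Each phase emits an event to the ledger.
const PHASES = ["retrieve", "plan", "dry-run", "gate", "execute", "verify", "log", "notify"] as const;
type Phase = (typeof PHASES)[number];

interface LaneEvent { lane: string; phase: Phase; at: Date; detail: string }

// Hypothetical driver: runs each phase handler in order and appends one event per phase.
async function runLane(
  lane: string,
  handlers: Record<Phase, () => Promise<string>>,
  emit: (e: LaneEvent) => void
): Promise<void> {
  for (const phase of PHASES) {
    const detail = await handlers[phase]();             // e.g., "dry-run touched 12 records"
    emit({ lane, phase, at: new Date(), detail });      // append-only provenance
    if (phase === "gate" && detail === "blocked") return; // a blocked gate stops before execute
  }
}
```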

Related Zypsy work with complex orchestration and AI assistants: Copilot Travel, Captions, and AI security UX at Robust Intelligence.

Prompt management

  • Versioned templates: assign IDs and changelogs; bind each execution to a specific prompt version.

  • Parameter hygiene: visualize injected variables (user input, retrieved docs, system policy) separately; allow redaction of sensitive fields before a run.

  • Safety schemas: pre‑/post‑conditions expressed as checks the agent must satisfy; block or gate on failure.

  • Evaluation loops: A/B prompt variants in Dry‑Run with quality rubrics; log scores and human ratings.

  • Drift detection: alert when output format or quality deviates; automatically roll back to the previous prompt version.
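
A sketch of binding each execution to a specific prompt version while keeping injected variables separate and redactable. The template shape and render helper are illustrative assumptions.

```typescript
// Illustrative versioned prompt template: variables are injected explicitly so they
// can be visualized (and redacted) separately from the template text.
interface PromptTemplate {
  id: string;        // e.g., "crm-update-summary"
  version: string;   // e.g., "2.1.0", bound to each execution record
  template: string;  // "Summarize {{recordCount}} changes for {{accountName}} ..."
  changelog: string;
}

function render(
  t: PromptTemplate,
  vars: Record<string, string>,
  redact: string[] = []
): { text: string; boundVersion: string } {
  const text = t.template.replace(/\{\{(\w+)\}\}/g, (_, key) =>
    redact.includes(key) ? "[REDACTED]" : vars[key] ?? ""
  );
  return { text, boundVersion: `${t.id}@${t.version}` }; // stored with the change bundle
}
```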

Telemetry, logging, and transparency

  • Event ledger: append‑only log of prompts, tool calls, outputs, approvals, rollbacks, and notifications with actor identity.

  • Data lineage: clickable badges for each datum (source system, time, access path) per Data Transparency.

  • Code path clarity: disclose which parts of the stack are open vs. closed and where to inspect them per Code Transparency.
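
One way to shape an append-only ledger entry so that each datum used by the agent carries its lineage. The field names and the minimal ledger class below are assumptions for illustration.

```typescript
// Assumed shape for an append-only ledger entry with data lineage attached.
interface DataLineage { sourceSystem: string; retrievedAt: Date; accessPath: string }

interface LedgerEntry {
  id: string;
  actor: { kind: "agent" | "human"; name: string }; // who performed the action
  action: "prompt" | "tool-call" | "output" | "approval" | "rollback" | "notification";
  inputs: DataLineage[];                            // provenance for every datum used
  outputRef?: string;                               // link to the produced artifact
  at: Date;
}

// Append-only: the ledger exposes append and read, never update or delete.
class EventLedger {
  private entries: LedgerEntry[] = [];
  append(e: LedgerEntry): void { this.entries.push(e); }
  read(): readonly LedgerEntry[] { return this.entries; }
}
```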

Risk scoring and progressive disclosure

  • Score each bundle by blast radius, reversibility, PII sensitivity, and compliance tags.

  • Map scores to UX: higher risk → stricter gates, fuller diffs, slower rollout; lower risk → lighter review.
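
A sketch of scoring a bundle and mapping the score to review strictness. The weights and thresholds below are assumptions chosen only to make the mapping concrete; real values would be tuned per product and compliance context.

```typescript
// Hypothetical risk score: weights and thresholds are illustrative, not prescriptive.
interface BundleRisk {
  recordsTouched: number;   // blast radius
  reversible: boolean;
  containsPII: boolean;
  complianceTags: string[]; // e.g., ["SOX", "GDPR"]
}

function riskScore(r: BundleRisk): number {
  let score = Math.min(r.recordsTouched / 100, 1) * 40; // blast radius: up to 40 points
  if (!r.reversible) score += 30;
  if (r.containsPII) score += 20;
  score += Math.min(r.complianceTags.length, 2) * 5;    // compliance surface: up to 10 points
  return score;                                         // 0-100
}

// Higher risk buys stricter gates and fuller diffs; lower risk gets lighter review.
function reviewFor(score: number): "lighter-review" | "single-approver" | "dual-control-full-diff" {
  if (score < 30) return "lighter-review";
  if (score < 60) return "single-approver";
  return "dual-control-full-diff";
}
```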

Implementation checklist

  • Define lanes, tools, and scopes; set budgets and timeouts.

  • Implement Plan → Dry‑Run → Gate → Execute → Verify loop.

  • Ship UI diffs for content, config, and data records.

  • Add versioning and one‑click rollback for all bundles.

  • Instrument an event ledger and reviewer SLAs.

  • Stand up prompt versioning, evals, and drift alerts.

  • Test failure modes: rate‑limit, network loss, tool errors, and human override.

Work with Zypsy

Zypsy designs and ships agent UX with enterprise‑grade clarity: versioning, auditability, and human‑in‑the‑loop by default. Explore our capabilities, learn about our services‑for‑equity model in Design Capital, or contact us to co‑design safe co‑execution.