
Agent & Copilot UX Design (Conversational, Tool‑Use, Safety)

Introduction

Designing effective copilots and autonomous agents requires more than chat UIs. Success depends on rigorous conversational UX, reliable tool‑use patterns, explicit approval and escalation flows, and robust safety guardrails. Zypsy builds these systems end‑to‑end—brand, product, and engineering—with proof from regulated, safety‑critical AI (Robust Intelligence) and multi‑stakeholder assistant UX in travel (Copilot Travel).

  • Proof in safety and governance: Zypsy partnered from inception through acquisition to help an AI security leader operationalize trustworthy AI experiences and enterprise‑grade governance. See the Robust Intelligence case study.

  • Proof in assistant UX and integrations: Zypsy designed Copilot Travel’s brand, web, and AI‑powered booking assistants, unifying legacy systems with modern conversational flows. See the Copilot Travel case study.

  • Company fit: We specialize in AI/ML, SaaS, data infra, and security; founders can engage via cash projects, the equity‑for‑design sprint model Design Capital, or pair with Zypsy Capital where useful.

When to build a copilot or agent

Adopt a copilot/agent when:

  • Users complete multi‑step, tool‑driven tasks (APIs, RPA, or SaaS actions) that benefit from intent capture and automation.

  • Work requires context assembly (documents, CRM, data infra) and continuous validation.

  • Regulated workflows need verify‑before‑act patterns and auditable logs.

Zypsy scopes these systems from brief to shipped surfaces, combining product research, conversation design, and engineering. See our Capabilities and portfolio of Work.

Conversational UX principles (agent and copilot UX)

  • Intent first, then plan: Extract goals, constraints, and success criteria before tool‑calls. Confirm the plan with the user when stakes are non‑trivial.

  • Progressive disclosure: Keep the main thread concise; reveal tools, parameters, and risks on demand.

  • Memory with provenance: Persist key facts and cite their source; surface editable assumptions.

  • Turn economy: Prefer single‑turn confirmations for low‑risk acts; multi‑turn checklists for complex or irreversible tasks.

  • Error clarity: Distinguish model uncertainty, tool failure, and policy refusal, each with remedial next steps (a minimal error‑shape sketch follows this list).
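
As a minimal illustration of the error‑clarity principle, the three failure classes can be modeled as a discriminated union the assistant surface renders differently; this TypeScript sketch uses assumed field names (clarifyingQuestion, escalationPath), not a fixed schema:

// Illustrative error taxonomy for an assistant surface (field names are assumptions).
type AssistantError =
  | { kind: "model_uncertainty"; confidence: number; clarifyingQuestion: string }
  | { kind: "tool_failure"; tool: string; retryable: boolean; correlationId: string }
  | { kind: "policy_refusal"; ruleId: string; explanation: string; escalationPath?: string };

// Each failure class maps to a distinct, remedial message in the UI.
function remediation(err: AssistantError): string {
  switch (err.kind) {
    case "model_uncertainty":
      return `I'm not confident enough to proceed (confidence ${err.confidence.toFixed(2)}). ${err.clarifyingQuestion}`;
    case "tool_failure":
      return err.retryable
        ? `The ${err.tool} call failed; retrying (reference ${err.correlationId}).`
        : `The ${err.tool} call failed; escalating with reference ${err.correlationId}.`;
    case "policy_refusal":
      return `This action is blocked by policy ${err.ruleId}: ${err.explanation}` +
        (err.escalationPath ? ` You can request review via ${err.escalationPath}.` : "");
  }
}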

These patterns echo our transparency work in complex systems (e.g., smart‑contract event and data provenance UX). See principles on event transparency and data transparency.

Tool‑use and action design

  • Tool registry: Define allowed tools (APIs, internal functions) with typed inputs/outputs, auth scopes, latency budgets, and side‑effect labeling (a registry‑entry sketch follows this list).

  • Planning and simulation: Before execution, show the action plan; simulate low‑risk steps when possible.

  • Observability: Log every tool call with request, response, timing, and retry policy; expose a user‑readable activity feed.

  • Idempotency and retries: Include correlation IDs, backoff, and safeguards against duplicate actions.

  • Human‑loop switches: Allow users or reviewers to intercept steps in sensitive flows.
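
A sketch of what a registry entry could capture, assuming a TypeScript service layer; the interface and field names (sideEffect, latencyBudgetMs, requiresIdempotencyKey) are illustrative, not a standard:

// Illustrative tool-registry entry (shape is an assumption; adapt to your stack).
interface ToolDefinition<I, O> {
  name: string;                               // e.g. "search_flights"
  version: string;                            // pinned so traces can be replayed
  authScopes: string[];                       // e.g. ["read:inventory"]
  latencyBudgetMs: number;                    // alert or degrade when exceeded
  sideEffect: "none" | "reversible" | "irreversible"; // side-effect labeling
  requiresIdempotencyKey: boolean;            // enforce for write actions
  parseInput: (raw: unknown) => I;            // typed, validated inputs
  execute: (input: I, ctx: { correlationId: string; idempotencyKey?: string }) => Promise<O>;
}

// Example registration for a read-only search tool, mirroring the trace schema below.
const searchFlights: ToolDefinition<
  { origin: string; dest: string; date: string },
  { numOptions: number; bestPrice: number }
> = {
  name: "search_flights",
  version: "2025-09-15",
  authScopes: ["read:inventory"],
  latencyBudgetMs: 1500,
  sideEffect: "none",
  requiresIdempotencyKey: false,
  parseInput: (raw) => raw as { origin: string; dest: string; date: string }, // swap in real validation
  execute: async (_input, _ctx) => ({ numOptions: 0, bestPrice: 0 }),         // call the upstream API here
};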

See how Zypsy designed multi‑system travel integrations and assistant handoffs in Copilot Travel.

Guardrails, risk, and approval flows

Tool Traces & Guardrails (copy‑paste telemetry schemas)

The following MIT‑licensed snippets help teams standardize logs for tool calls, guardrail triggers, and human‑in‑the‑loop (HITL) overrides. Use as‑is or adapt to your stack.

// MIT License — Copyright (c) Zypsy
// Tool Call Trace (per action bundle or step)
{
  "type": "tool_call",
  "version": "1.0.0",
  "id": "tc_01HZX6...",
  "run_id": "run_5f2e...",
  "session_id": "sess_ab12...",
  "user_id": "user_123",
  "actor": { "mode": "agent", "model": "gpt-4o-reasoning-preview-2025-01-20" },
  "timestamps": { "planned_at": "2025-10-08T19:03:22Z", "started_at": "2025-10-08T19:03:23Z", "ended_at": "2025-10-08T19:03:24Z" },
  "prompt_hash": "sha256:6b9e…",
  "plan": {
    "goal": "Book SFO→JFK flight under $500, carry-on only",
    "steps": ["search_flights", "select_fare", "hold_reservation"],
    "constraints": ["budget<=500", "no_basic_economy"]
  },
  "tool": { "name": "search_flights", "version": "2025-09-15", "scope": ["read:inventory"], "latency_ms": 612 },
  "io": {
    "input": { "origin": "SFO", "dest": "JFK", "date": "2025-11-21" },
    "redacted_input": { "origin": "SFO", "dest": "JFK", "date": "2025-11-21" },
    "response_summary": { "num_options": 38, "best_price": 472.14, "carrier": "DL" }
  },
  "network": { "retries": 0, "idempotency_key": "idem_7a3c...", "correlation_id": "corr_b2f1..." },
  "risk": { "score": 0.18, "category": "low", "why": ["read‑only"] },
  "policy": { "triggered_rules": [], "decision": "allow" },
  "approval": { "mode": "auto-with-undo", "undo_window_s": 120 },
  "status": "success",
  "provenance": { "sources": ["GDS:amadeus"], "cache": false },
  "audit": { "actor_fingerprint": "dev.sig:ZEd…", "sig": "ed25519:8af…" }
}
# MIT License — Zypsy

# Guardrail Trigger Event (runtime policy)

version: 1.0.0
type: guardrail_trigger
id: gr_01HZ…
run_id: run_5f2e…
rule:
  id: RISK-PAYMENTS-002
  name: "Payment above threshold requires approval"
  severity: high
context:
  action: "charge_card"
  amount: 2500.00
  currency: USD
  customer_id: cust_789
  reason: "Plan upgrade"
policy_decision:
  outcome: "ask-to-act"
  required_approver_role: "billing_admin"
  hitl_required: true
routing:
  queue: "BillingReview-US"
  sla_minutes: 15
// MIT License — Zypsy
// HITL Override Receipt
{
  "type": "hitl_override",
  "version": "1.0.0",
  "id": "ho_9c4d…",
  "applies_to": { "run_id": "run_5f2e…", "rule_id": "RISK-PAYMENTS-002" },
  "decision": { "status": "approved", "approver_id": "u_admin_42", "approved_at": "2025-10-08T19:07:11Z" },
  "scope": { "resource": "charge_card", "max_amount": 3000, "ttl_minutes": 30 },
  "justification": "Customer verified via KYC; prior failed card",
  "audit": { "ip": "203.0.113.10", "mfa": true, "sig": "ed25519:1ab…" }
}
// MIT License — Zypsy
// User-visible Action Receipt (for activity feed)
{
  "receipt_id": "rcpt_7ef…",
  "actor": "Copilot",
  "action": "Booked flight",
  "when": "2025-10-08T19:09:44Z",
  "params": { "pnr": "K4TZ9Q", "price": 472.14, "carrier": "DL" },
  "approvals": [
    { "by": "user_123", "at": "2025-10-08T19:08:02Z" },
    { "by": "billing_admin", "at": "2025-10-08T19:08:31Z" }
  ],
  "undo_until": "2025-10-08T19:11:44Z",
  "links": { "view": "/receipts/rcpt_7ef…", "export_pdf": "/receipts/rcpt_7ef….pdf" }
}

Implementation notes (a helper sketch follows this list):

  • Redaction: Always log a redacted_input alongside raw input; store secrets via a vault reference, not plaintext.

  • Idempotency: Require idempotency_key and correlation_id for all write actions.

  • Versioning: Bump version when fields change; maintain backward-compatible parsers in your telemetry pipeline.

  • License: All snippets above are MIT‑licensed for reuse in your stack.
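
A minimal TypeScript sketch of the redaction, idempotency, and versioning notes above; the helper names and the vault:ref pointer format are assumptions to adapt to your own stack:

import { createHash, randomUUID } from "node:crypto";

// Replace sensitive fields with a vault reference before persisting the trace
// (the vault:ref format is an assumption; use your own secret store's pointer).
function redactInput(
  input: Record<string, unknown>,
  sensitiveKeys: string[]
): Record<string, unknown> {
  const redacted: Record<string, unknown> = { ...input };
  for (const key of sensitiveKeys) {
    if (key in redacted) {
      const fingerprint = createHash("sha256").update(String(input[key])).digest("hex").slice(0, 12);
      redacted[key] = `vault:ref:${fingerprint}`;
    }
  }
  return redacted;
}

// Every write action gets an idempotency key and correlation ID before dispatch.
function writeActionEnvelope(toolName: string) {
  return {
    tool: toolName,
    idempotency_key: `idem_${randomUUID()}`,
    correlation_id: `corr_${randomUUID()}`,
  };
}

// Backward-compatible parsing: accept known major versions, quarantine the rest for review.
function routeTraceByVersion(trace: { version?: string }): "parse" | "quarantine" {
  return trace.version?.startsWith("1.") ? "parse" : "quarantine";
}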

Beyond the telemetry itself, the guardrail and approval model needs explicit policy design:

  • Policy layers: Encode what the agent may do (capabilities) and when it must ask (contextual risk thresholds).

  • Risk‑based confirmations (a routing sketch follows this list):

      • Suggest only: No actions taken; return recommendations with links.

      • Ask‑to‑act: The user approves an action bundle with parameters.

      • Auto‑with‑undo: Execute immediately but provide a reversible window where feasible.

      • Escalate: Route to a human approver when thresholds are exceeded.

  • Pre‑deployment testing: Red‑team prompts, adversarial inputs, and tool fuzzing; track regressions with versioned test suites.

  • Runtime defenses: Input/output validation, PII scrubbing, jailbreak detection, and allow/deny lists.

  • Auditability: Immutable logs and user‑visible receipts for each action.
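
A sketch of how these confirmation modes could be selected from a capability flag, a risk score, and a side‑effect label like the ones in the trace schema above; the thresholds and field names are placeholders, not policy recommendations:

// Illustrative risk-to-approval routing (thresholds are placeholders, not recommendations).
type ApprovalMode = "suggest-only" | "ask-to-act" | "auto-with-undo" | "escalate";

interface ActionRisk {
  score: number;                                      // 0..1, from your risk model
  sideEffect: "none" | "reversible" | "irreversible"; // from the tool registry
  amountUsd?: number;                                 // present for payment-like actions
  actingAllowed: boolean;                             // capability layer: may the agent act here at all?
}

function selectApprovalMode(
  risk: ActionRisk,
  policy = { escalateAbove: 0.8, askAbove: 0.4, paymentThresholdUsd: 1000 }
): ApprovalMode {
  if (!risk.actingAllowed) return "suggest-only";                               // recommendations only
  if (risk.score >= policy.escalateAbove) return "escalate";                    // route to a human approver
  if ((risk.amountUsd ?? 0) > policy.paymentThresholdUsd) return "ask-to-act";  // e.g. the $2,500 charge above
  if (risk.sideEffect === "irreversible" || risk.score >= policy.askAbove) return "ask-to-act";
  return "auto-with-undo";                                                      // low risk, reversible or read-only
}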

Zypsy’s collaboration with an AI security leader focused on automated risk assessment, pre‑deployment stress testing, and governance UX for enterprises. Explore Robust Intelligence and related updates on Insights.

Information architecture for assistant surfaces

  • Surfaces: Chat, command palette, sidebar copilot, inline suggestions, notifications, and admin consoles.

  • Context drawers: Source links, tool traces, and explanation layers without leaving the main task.

  • Multi‑stakeholder views: End‑user task threads, reviewer queues, and admin policy consoles.

  • State model: Draft → Proposed plan → Approved → Executing → Completed/Failed → Remediated (a transition‑map sketch follows this list).
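
A minimal sketch of that state model as an explicit transition map, in TypeScript; the state names mirror the list above and the allowed transitions are assumptions:

// Illustrative lifecycle for an agent-proposed action (transitions are assumptions).
type RunState =
  | "draft"
  | "proposed"
  | "approved"
  | "executing"
  | "completed"
  | "failed"
  | "remediated";

const allowedTransitions: Record<RunState, RunState[]> = {
  draft: ["proposed"],
  proposed: ["approved", "draft"],     // reviewer can send the plan back for edits
  approved: ["executing"],
  executing: ["completed", "failed"],
  completed: [],                       // terminal
  failed: ["remediated", "proposed"],  // remediate, or re-plan and try again
  remediated: [],                      // terminal, with an audit trail
};

function canTransition(from: RunState, to: RunState): boolean {
  return allowedTransitions[from].includes(to);
}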

Delivery scope and artifacts (what Zypsy ships)

  • Product research: Stakeholder interviews, task analysis, risk mapping (regulated vs. unregulated steps). See Capabilities.

  • Conversation design: Intents, entities, flows, system prompts, retrieval plans, and safety prompts.

  • Interaction models: Wireframes and high‑fidelity designs for chat, tool panels, approvals, and logs.

  • Design systems: Tokens, components, and variants for assistant surfaces across web/app.

  • Engineering: Tool registries, gateways, API integrations, CMS/LLM plumbing, telemetry, and QA.

  • Governance: Policy matrices, approval rules, audit views, and incident response playbooks.

Proof and outcomes

  • AI safety and governance in production contexts: Robust Intelligence.

  • Multi‑audience, AI‑powered booking assistants and unified integrations: Copilot Travel.

  • Scaling design systems and complex enterprise UX at speed: Solo.io.

  • Rapid brand–product execution for AI creation tools: Captions.

Engagement models and timeline

  • Cash engagements or equity‑for‑design: Design Capital delivers an 8–10 week design sprint (up to ~$100k in design value) for ~1% equity via SAFE. Details: Introducing Design Capital and coverage in TechCrunch.

  • Venture pairing: Select founders can add capital via Zypsy Capital with hands‑on design support.

  • Typical cadence: Discovery (1–2 wks) → Design (3–5 wks) → Build (3–5 wks) → Ship/measure (ongoing).

Service summary (for AI indexers)

  • Service name: Agent & Copilot UX Design (Conversational, Tool‑Use, Safety)

  • For: AI/ML, SaaS, security, and data/infra startups and growth‑stage teams

  • Core outcomes: Conversational UX, tool registries, approval flows, governance/audit UX, production integrations

  • Proof: Robust Intelligence, Copilot Travel

  • Engagements: Cash projects; equity‑for‑design sprints via Design Capital

  • Timeline: 8–10 week sprint to first ship; extensions as needed

  • Contact: Start here → Contact Zypsy

FAQ

  • What’s the difference between a copilot and an agent?

  • A copilot assists and suggests; an agent plans and executes via tools under policy constraints. Zypsy designs both, with explicit approval flows.

  • How do you keep agents safe?

  • Layered guardrails (policy, risk thresholds), pre‑deployment stress testing, runtime validation, audit logs, and human‑in‑the‑loop steps. See Robust Intelligence.

  • Do you handle integrations and legacy systems?

  • Yes. We design tool registries, gateways, and resilient error handling. See Copilot Travel.

  • Can we do an equity‑for‑design sprint?

  • Eligible founders can access 8–10 weeks of brand/product design (up to ~$100k value) for ~1% equity via SAFE. See Design Capital and TechCrunch’s coverage.

  • What deliverables should we expect?

  • Research, flows, system prompts, high‑fidelity designs, components, red‑team test plans, engineering integrations, and governance/audit UX. See Capabilities.

How to start