San Francisco AI/ML UX agency
Designing agent/copilot UX, RAG evaluation, dashboards, and developer tools for founders in the Bay Area and beyond.
-
AI Pitch Deck: 2–4 weeks • from $25k
-
AI UX Sprint: 3–6 weeks • from $60k
-
Equity option: ~1% SAFE • 8–10 weeks • up to ~$100k in design
Get started: Contact • Prefer equity-for-design? Investment
Scope: brand + product + engineering
Price anchors: Clutch min project $25k • Webflow partner projects from $60k
Updated: October 15, 2025
AI Product Design Agency (SF)
At‑a‑glance
-
Services: Agent/copilot UX, RAG evaluation + HITL, AI dashboards/observability, developer tools, and API/AI gateways.
-
Offers: AI Pitch Deck (2–4 weeks, from $25k) • AI UX Sprint (3–6 weeks, from $60k) • Equity option: ~1% SAFE, 8–10 weeks (up to ~$100k in design).
-
Proof points: Captions (10M downloads; 66.75% conversion) • Robust Intelligence (AI security; Cisco acquisition) • Solo.io (API & AI gateways).
-
Location: San Francisco HQ with a global, remote-first team.
-
Get started: Contact for a quote • Eligible founders can apply via Investment.
Agents & Copilots
Reliable, explainable agent UX: tool schemas, safe orchestration, memory, and corrective actions. See case work in Captions and Copilot Travel.
RAG Evaluation
Golden sets, rubric scoring, retrieval hit@k, and reviewer tools to improve grounded answer rate and trust.
Dashboards & Observability
Model health, cost/latency, drift, and product impact dashboards with exportable reports.
Governance
Risk, audit trails, and policy checks for enterprise AI. Related work: Robust Intelligence case study. Principles: Data Transparency, Smart Contract Events, and Code Transparency.
AI/ML product UX for agents, developer tools, and API portals
AI/ML product design for GenAI agents, RAG systems, developer tools, and API portals. Our services-for-equity model (Design Capital: ~1% SAFE, up to ~$100k over 8–10 weeks) was covered by TechCrunch: https://techcrunch.com/2024/04/16/design-zypsy-ideo-work-equity-startups/
Proof points and case studies
-
Robust Intelligence (AI security): Case Study
-
Captions (AI video): Case Study
Eligible founders can apply for Design Capital: ~1% equity via SAFE for up to ~$100k in senior design + engineering over 8–10 weeks. Get the details and apply via Investment, or learn more in Introducing Design Capital.
New: AI Pitch Deck (for AI founders)
Ship a clear, investor-ready story for agentic apps, RAG systems, and AI tooling.
What’s included
-
Narrative and positioning tailored to AI (problem, solution, market, moat)
-
Product slides that explain agents/tooling, RAG/HITL, and safety/observability
-
Business model, traction, roadmap, and team
-
Visual system and editable master deck
Get started
-
Talk to us about your AI pitch: Contact
-
Prefer equity-for-design? See Investment
For founders: Investor Readiness Sprint
A focused path to align story, metrics, and materials before fundraising. We help tune narrative, deck, site, and demo—then refine with feedback.
-
Scope flexes by stage; delivered as cash or via Design Capital (services-for-equity)
-
Start here: Contact or learn more: Investment
Engineering for AI products (included)
For agentic apps and AI/ML tooling, our hands-on build scope pairs with UX in the same sprint:
-
Agent prototypes: function/tool schemas, safe tool-use orchestration, memory, and corrective actions wired into working UIs.
-
RAG systems: retrievers, embeddings, caches, evaluation harnesses (golden sets, rubric scoring), and HITL reviewer tools.
-
Integrations: OpenAI/Anthropic APIs, vector DBs, API/AI gateways (e.g., work with Solo.io), analytics, and observability.
-
Frontend + backend: web/mobile app dev, component libraries/design systems, auth, and admin surfaces.
-
Infra + quality: CI/CD, monitoring, QA, performance tuning, and governance/safety logging.
This scope is available on cash projects or, for eligible founders, via Design Capital’s services‑for‑equity model (up to ~$100k over 8–10 weeks for ~1% equity; see Investment). For broader capabilities, see Capabilities.
Zypsy designs AI/ML product UX for agent/copilot interfaces, RAG with HITL, and AI observability dashboards, alongside developer tools and API portals. Our developer experience work includes API and AI gateway systems for Solo.io.
Updated: October 13, 2025
Introduction
Zypsy designs AI-native product experiences for founders, spanning conversational agents/copilots, retrieval-augmented generation (RAG) with human-in-the-loop (HITL), AI dashboards/observability, and multimodal UX (voice, video, vision). We are a San Francisco–born team (est. 2018) of brand, product, and engineering specialists who ship sprint-based work for early- to growth-stage startups. Proof points include AI video leader Captions, AI security pioneer Robust Intelligence, travel infra + assistants at Copilot Travel, and AI data/infra partners like Solo.io and Covalent.
On this page (expanded)
Agents & Copilots • Agent orchestration UI • Prompt management UI • RAG evaluation • AI Dashboards • Multimodal UX • Multimodal voice + text • Case Studies • Engagement Models • Process & Artifacts
Agent orchestration UI
Design patterns for reliable tool-use and routing across functions, APIs, and services.
What we implement
-
Tool schemas and routing: function definitions, safe parameterization, deterministic fallbacks, and retries.
-
Execution planner: task decomposition, dependency graphs, and guardrail checks before action.
-
Visibility + control: show chosen tools, reasons, costs, and allow user overrides/confirmation.
-
Failure handling: circuit breakers, timeouts, and graceful degradation to simpler flows.
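The routing, retry, fallback, and circuit-breaker items above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any production system: `Tool`, `CircuitBreaker`, and `call_tool` are hypothetical names, and the policy (open the breaker after three consecutive failures, then degrade to a deterministic fallback) is chosen only for brevity. Timeouts are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class CircuitBreaker:
    """Opens after `max_failures` consecutive failures (hypothetical policy)."""
    max_failures: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        # A success resets the count; a failure increments it.
        self.failures = 0 if ok else self.failures + 1


@dataclass
class Tool:
    name: str
    run: Callable[[Dict[str, Any]], Any]       # primary implementation
    fallback: Callable[[Dict[str, Any]], Any]  # deterministic, simpler path
    required: tuple = ()                       # required parameter names
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def call_tool(tool: Tool, params: Dict[str, Any], retries: int = 2) -> Dict[str, Any]:
    """Validate params, try the tool, retry on failure, then use the fallback."""
    missing = [p for p in tool.required if p not in params]
    if missing:
        raise ValueError(f"{tool.name}: missing parameters {missing}")

    attempts = 0
    if not tool.breaker.is_open:
        for _ in range(retries + 1):
            attempts += 1
            try:
                result = tool.run(params)
                tool.breaker.record(ok=True)
                return {"tool": tool.name, "path": "primary",
                        "attempts": attempts, "result": result}
            except Exception:
                tool.breaker.record(ok=False)
                if tool.breaker.is_open:
                    break
    # Breaker open or retries exhausted: degrade gracefully.
    return {"tool": tool.name, "path": "fallback",
            "attempts": attempts, "result": tool.fallback(params)}


def flaky_search(params: Dict[str, Any]) -> Any:
    raise TimeoutError("upstream timed out")


search = Tool(name="search", run=flaky_search,
              fallback=lambda p: [], required=("query",))
first = call_tool(search, {"query": "refund policy"})   # retries, then fallback
second = call_tool(search, {"query": "refund policy"})  # breaker open: fallback only
```

The returned `path` and `attempts` fields are the kind of detail a "visibility + control" surface could show to users. A real breaker would also need a cool-down before trying the primary path again.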
Where this shows up
- Production-grade API/AI gateway contexts (e.g., work with Solo.io) and agent surfaces in assistants and operator tools.
Prompt management UI
Operational surfaces to version, review, and ship prompts with confidence.
What we implement
-
Versioning + diffs: side-by-side changes with labels, owners, and change notes (see “Prompt diffs” pseudo-flow below).
-
Environments: dev/stage/prod with rollout gates tied to eval metrics.
-
Approvals + audit: review queues, required checks, and exportable logs for compliance.
-
Experimentation: A/B variants, feature flags, and rapid rollback.
Outcomes
- Faster iteration with traceability; fewer regressions when prompts, tools, or data change.
RAG evaluation
A focused layer for measuring and improving grounded answers. Complements “RAG Evaluation & HITL.”
What we implement
-
Golden sets + scoring: rubric-based evaluators, retrieval hit@k, freshness, and toxicity gates (see evaluator pipeline below).
-
Review UI: side-by-side comparisons, error taxonomy labeling, and feedback-to-training loops.
-
SLAs + governance: triage, assignment, and audit-friendly logs for stakeholders.
KPIs we track
- Grounded answer rate, retrieval hit@k, evaluator agreement, time-to-fix, and median review SLA.
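Two of these KPIs are simple ratios over a golden set, as the sketch below shows. The record layout (`retrieved`, `relevant`, `grounded`) and the sample data are assumptions made for illustration. In practice the `grounded` flag would come from a rubric scorer or a human reviewer.

```python
from typing import Dict, List, Set


def hit_at_k(retrieved: List[str], relevant: Set[str], k: int) -> bool:
    """True if any of the top-k retrieved doc ids is in the relevant set."""
    return any(doc_id in relevant for doc_id in retrieved[:k])


def retrieval_hit_rate(examples: List[Dict], k: int) -> float:
    """Fraction of questions with at least one relevant doc in the top k."""
    if not examples:
        return 0.0
    hits = sum(hit_at_k(ex["retrieved"], set(ex["relevant"]), k) for ex in examples)
    return hits / len(examples)


def grounded_answer_rate(examples: List[Dict]) -> float:
    """Fraction of answers marked as grounded in the retrieved sources."""
    if not examples:
        return 0.0
    return sum(1 for ex in examples if ex["grounded"]) / len(examples)


# Hypothetical golden-set records: ranked retrieval results plus labels.
golden_set = [
    {"retrieved": ["d7", "d2", "d9"], "relevant": ["d2"], "grounded": True},
    {"retrieved": ["d4", "d5", "d6"], "relevant": ["d1"], "grounded": False},
    {"retrieved": ["d3", "d8", "d1"], "relevant": ["d1"], "grounded": True},
    {"retrieved": ["d1", "d2", "d3"], "relevant": ["d3"], "grounded": True},
]

print(retrieval_hit_rate(golden_set, k=2))  # 0.25
print(retrieval_hit_rate(golden_set, k=3))  # 0.75
print(grounded_answer_rate(golden_set))     # 0.75
```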
Multimodal voice + text
Exact patterns for agents that listen, speak, and type—complementing “Multimodal UX” and “Voice + Text Multimodal Agent UX.”
What we implement
-
Turn-taking + barge-in: clear state cues, interruption controls, and safe resumes.
-
Streaming UX: partial transcripts, progressive citations, and reconciliation on final output.
-
Latency tactics: intent echoes, placeholders, and skeleton UI to maintain flow.
-
Accessibility: live captions, transcripts, reduced motion, and keyboard-only parity.
In practice
- Creator and operator workflows across web and mobile, with safety confirmations for high-impact actions.
Your AI/ML UX agency for agents, RAG, dashboards
Founders use Zypsy as their AI/ML UX agency to ship agent/copilot experiences, retrieval-augmented generation (RAG) with human-in-the-loop, and AI observability dashboards—fast.
What to expect
-
Agent and copilot UX that clarifies intent, tools, memory, and guardrails.
-
RAG evaluation and HITL workflows that boost answer quality and trust.
-
Model and product observability dashboards for reliability, cost, and safety.
-
Multimodal UX (voice, video, vision) for creator and operator speed.
Proof points include Captions (AI video), Robust Intelligence (AI security), Solo.io (API/AI gateways), Copilot Travel (AI assistants), and Crystal DBA (AI database teammate).
How to engage
-
Start with a short scoping call via the Contact form. Sprints ship usable artifacts in weeks.
-
This page is linked sitewide under Capabilities → AI/ML UX.
Last updated: October 11, 2025
Agents & Copilots
We design agent and copilot experiences that are reliable, explainable, and conversion-oriented. Our work spans task decomposition, safe tool-use orchestration, memory UX, corrective actions, and trust cues (sources, confidence, and guardrails).
What we deliver
-
Conversation + action model: intents, tool schemas, function-call affordances, and fallback patterns.
-
Guardrail UX: rate-limits, unsafe output handling, escalation paths, and user confirmations.
-
Memory and context: selective recall controls, privacy notices, and session continuity.
-
Evaluation UX: side-by-side comparisons and rubric scoring surfaces for internal teams.
Where this shows up
-
Creator-side copilots and editing workflows in Captions.
-
AI booking and operations assistants in Copilot Travel.
-
Secure AI deployment and risk governance in Robust Intelligence.
RAG Evaluation & HITL
Robust RAG requires transparent retrieval, rapid iteration on prompts/chains, and human oversight when confidence is low.
What we implement
-
Retrieval UX: show sources, passage-level highlights, and recency; enable one-click re‑queries.
-
Quality evaluation: golden sets, rubric scoring, side-by-side comparisons, and error taxonomies.
-
HITL workflows: queueing, reviewer tools, override notes, and feedback-to-training loops.
-
Safety + governance: disclosure of limitations and audit-friendly logs.
Related reading from Zypsy
- Design for transparency in decentralized systems translates well to AI UX: data provenance, event clarity, and code transparency. See our principles on Data Transparency, Smart Contract Events, and Code Transparency.
AI Dashboards & Observability
Teams need live visibility into model behavior, data pipelines, and user impact.
What we design
-
Model-health views: accuracy proxies, drift indicators, latency, cost, and safety events.
-
Ops dashboards: ingestion health, retriever freshness, cache hit rates, and tool uptime.
-
Product analytics: task completion, deflection, satisfaction, and cohort breakdowns.
-
Governance: review trails, policy checks, and exportable reports for stakeholders.
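As a rough illustration of the numbers behind a model-health or ops view, the sketch below summarizes latency percentiles, cost, and cache hit rate from request logs. The log fields (`latency_ms`, `cost_usd`, `cache_hit`) and the nearest-rank percentile method are assumptions. A real dashboard would typically read these from an observability backend.

```python
import math
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(requests: List[Dict]) -> Dict[str, float]:
    """Roll request logs up into the headline numbers a dashboard tile would show."""
    latencies = [r["latency_ms"] for r in requests]
    return {
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "cost_usd": sum(r["cost_usd"] for r in requests),
        "cache_hit_rate": sum(1 for r in requests if r["cache_hit"]) / len(requests),
    }


# Hypothetical request log: ten calls, one slow outlier, four cache hits.
latencies_ms = [120, 95, 300, 110, 2000, 130, 105, 98, 115, 125]
requests = [
    {"latency_ms": ms, "cost_usd": 0.002, "cache_hit": i % 5 < 2}
    for i, ms in enumerate(latencies_ms)
]
summary = summarize(requests)
```

Here the median stays near typical latency while the p95 surfaces the single 2000 ms outlier, which is why dashboards usually show both.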
Relevant work
-
Enterprise-ready product storytelling and complex systems UI at Cortex.
-
API/AI gateways and service connectivity at Solo.io.
-
AI database teammate surfaces at Crystal DBA.
Multimodal UX
We craft interfaces that blend text, voice, audio, image, and video—prioritizing accessibility, speed to outcome, and clarity of control.
Patterns we apply
-
Voice and video controls with transcript-based editing and non-destructive history.
-
Visual timelines and storyboards for generative edits (shots, clips, assets, styles).
-
Confidence/quality signals with quick fixes and sidecar previews before commit.
-
Mobile- and web-first parity for creators and operators.
In practice
-
Generative video, dubbing, and avatars in Captions.
-
Conversational trip planning and operational guidance in Copilot Travel.
-
Sensitive, supportive flows in ADHD-focused Comigo.
Voice + Text Multimodal Agent UX
Design patterns for agents that speak, listen, and type—prioritizing clarity, control, latency, and accessibility.
What we implement
-
Turn-taking cues: active/idle states, VU meters, end-of-speech hints, and explicit “Your turn” prompts.
-
Interruptions/barge‑in: allow users to cut off TTS, edit the last intent, and resume; confirm overrides.
-
Streaming partials: progressively render drafts with shimmers; pin sources as they arrive; reconcile final output.
-
Latency handling: pre-acknowledge with intent echoes, tool placeholders, and skeleton UI; degrade gracefully.
-
Safety/guardrails: inline confirmations for high-impact actions, undo windows, and escalation to HITL.
-
Accessibility: live captions, transcripts, keyboard-only flows, color contrast (WCAG AA+), reduced motion, and screen reader labels.
-
Voice quality: VAD/AEC tuning, fallback to text when noisy; diarization for multi-speaker calls; consent banners for recording.
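The turn-taking and barge-in patterns above amount to a small state machine. The sketch below is one possible shape. The state names, event names, and transition table are assumptions for illustration and are not tied to any particular speech SDK.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# (current state, event) -> next state. Events are hypothetical names.
TRANSITIONS = {
    (State.IDLE, "mic_on"): State.LISTENING,
    (State.LISTENING, "end_of_speech"): State.THINKING,
    (State.THINKING, "response_ready"): State.SPEAKING,
    (State.SPEAKING, "tts_done"): State.IDLE,
    # Barge-in: the user interrupts, so stop output and listen again.
    (State.SPEAKING, "barge_in"): State.LISTENING,
    (State.THINKING, "barge_in"): State.LISTENING,
}


def step(state: State, event: str) -> State:
    """Return the next state; events with no transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], start: State = State.IDLE) -> State:
    state = start
    for event in events:
        state = step(state, event)
    return state


# The user speaks, the agent starts answering, and the user cuts in.
final = run(["mic_on", "end_of_speech", "response_ready", "barge_in"])
print(final)  # State.LISTENING
```

Keeping transitions in one table makes it easier to map each state to a visible cue (for example a "Your turn" prompt on IDLE) and to review which interruptions are allowed.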
Example microflows
-
“Hold to speak” mic with visual countdown and immediate transcript preview for error correction.
-
“Tap to retry” on low-confidence answers with one-tap re-query on alternate tools/data slices.
Pseudo-flows (artifacts)
Prompt diffs (versioned system prompts)
--- v12 (2025-10-02)
+++ v13 (2025-10-13)
- Assistant should be helpful and concise.
+ Assistant must: (1) expose tool choices and reasons; (2) cite sources; (3) request confirmation before irreversible actions; (4) output eval tags: {latency,cost,confidence}.
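A diff like the one above can be generated with Python's standard `difflib` module. The sketch below is only an illustration: the `prompt_diff` helper and the way prompt versions are stored as plain strings are assumptions, and the prompt text is taken from the example above.

```python
import difflib
from typing import List


def prompt_diff(old: str, new: str, old_label: str, new_label: str) -> List[str]:
    """Return unified-diff lines between two prompt versions."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


v12 = "Assistant should be helpful and concise."
v13 = (
    "Assistant must: (1) expose tool choices and reasons; (2) cite sources; "
    "(3) request confirmation before irreversible actions; "
    "(4) output eval tags: {latency,cost,confidence}."
)

diff_lines = prompt_diff(v12, v13, "v12", "v13")
print("\n".join(diff_lines))
```

The output starts with `--- v12` and `+++ v13` headers, followed by a hunk marker and the removed and added lines. A review UI could render this side by side.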
Evaluator pipeline (RAG)
pipeline:
  - load: golden_set (q, a*, docs)
  - scorers:
      - rubric_gpt: {criteria: [factuality, grounding, completeness]}
      - retrieval: {hit_rate@k: 5, passage_overlap: true}
      - toxicity: {threshold: low}
  - aggregate: weighted_mean
  - regressions: gate_on(delta >= -2%)
  - release: if gate_pass -> ship; else -> queue:HITL
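The aggregate, regression-gate, and release steps of this pipeline could look roughly like the sketch below. It assumes scores on a 0 to 1 scale and reads "delta >= -2%" as an absolute drop of at most 0.02 against a baseline. The pseudo-flow does not say whether the threshold is absolute or relative, so treat that reading, the weights, and the function names as assumptions.

```python
from typing import Dict


def weighted_mean(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-scorer results into one number using scorer weights."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def release_decision(candidate: float, baseline: float, max_drop: float = 0.02) -> str:
    """Ship if the aggregate score did not regress by more than max_drop."""
    delta = candidate - baseline
    return "ship" if delta >= -max_drop else "queue:HITL"


# Hypothetical scorer outputs and weights for one candidate prompt/model.
weights = {"rubric_gpt": 0.5, "retrieval": 0.3, "toxicity": 0.2}
scores = {"rubric_gpt": 0.8, "retrieval": 0.9, "toxicity": 1.0}

aggregate = weighted_mean(scores, weights)              # about 0.87
print(release_decision(aggregate, baseline=0.88))       # ship
print(release_decision(aggregate, baseline=0.92))       # queue:HITL
```

Routing failed gates to the HITL queue instead of blocking outright keeps a human decision in the loop for borderline regressions.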
HITL queue flow
incoming -> triage (severity, product surface) -> assign (reviewer SLAs) -> suggest fix
-> accept/override -> label error taxonomy -> feed back to golden_set + prompt repo
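One way to sketch the triage, assign, and feed-back steps is a severity-ordered priority queue. The severity levels, the `ReviewItem` fields, and the `resolve` helper below are hypothetical. Reviewer SLAs and the prompt repo are left out.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Lower rank is reviewed first (assumed severity scale).
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(order=True)
class ReviewItem:
    priority: int
    seq: int  # tie-breaker: first in, first out within a severity
    item_id: str = field(compare=False)
    surface: str = field(compare=False)
    error_label: Optional[str] = field(default=None, compare=False)


class ReviewQueue:
    def __init__(self) -> None:
        self._heap: List[ReviewItem] = []
        self._counter = itertools.count()

    def triage(self, item_id: str, severity: str, surface: str) -> None:
        """Place an incoming item by severity and product surface."""
        item = ReviewItem(SEVERITY_RANK[severity], next(self._counter), item_id, surface)
        heapq.heappush(self._heap, item)

    def assign(self) -> Optional[ReviewItem]:
        """Hand the most urgent item to the next reviewer, if any."""
        return heapq.heappop(self._heap) if self._heap else None


def resolve(item: ReviewItem, error_label: str, golden_set: List[Dict]) -> None:
    """Label the error and feed the case back into the golden set."""
    item.error_label = error_label
    golden_set.append({"id": item.item_id, "label": error_label})


queue = ReviewQueue()
queue.triage("a", "low", "chat")
queue.triage("b", "critical", "checkout")
queue.triage("c", "high", "chat")
queue.triage("d", "critical", "search")

golden_set: List[Dict] = []
order = []
while (item := queue.assign()) is not None:
    order.append(item.item_id)
    resolve(item, "retrieval_miss", golden_set)
print(order)  # ['b', 'd', 'c', 'a']
```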
Quick-start offers
Short, outcome-focused sprints to de-risk scope and prove impact.
Agent Pilot (2–4 weeks)
-
Scope: task model + tool schema, safe tool-use orchestration, memory UX, and a working agent UI (web/mobile) with streaming partials and interruption controls.
-
KPIs: task success rate, time-to-first-action, user satisfaction (CSAT), error taxonomy coverage, and guardrail override rate.
-
Deliverables: conversation + action map, functional prototype, eval dashboard stub, and rollout checklist.
-
Indicative price band: from $25k (min project size on Clutch); enterprise pilots often $60k+ (Webflow partner profile). Contact us for a tailored quote.
-
References: Clutch profile • Webflow partner
RAG Eval/HITL Starter (2–4 weeks)
-
Scope: golden set assembly, rubric design, side‑by‑side eval UI, retrieval metrics (hit@k, freshness), and HITL reviewer queue + feedback loop.
-
KPIs: grounded answer rate, retrieval hit@k, evaluator agreement, and median review SLA.
-
Deliverables: evaluator pipeline, labeled error taxonomy, HITL ops handbook, and quality dashboard stub.
-
Indicative price band: from $25k (Clutch); enterprise rollouts often $60k+ (Webflow). Contact us for a tailored quote.
-
References: Clutch profile • Webflow partner
Note: Both offers can be delivered as cash projects or, if eligible, via Design Capital’s services‑for‑equity model (8–10 weeks, up to ~$100k value for ~1% equity via SAFE). See Investment.
Changelog
- 2025-10-13: Added Voice + Text Multimodal Agent UX patterns, quick-start offers (Agent Pilot; RAG Eval/HITL Starter), and pseudo‑flows (prompt diffs, evaluator pipeline, HITL queues). Updated structured data dateModified.
Selected Case Studies & Proof Points
-
Captions: 10M downloads, 66.75% conversion rate, median conversion time 15.2 minutes; product rebrand and shift from macOS to web, plus a unified design system.
-
Robust Intelligence: AI security brand, product, and engineering partnership from inception through Cisco acquisition.
-
Copilot Travel: Unified travel infra and AI assistants, including a custom language learning model and multi‑audience product UX.
-
Crystal DBA: AI teammate for PostgreSQL fleets; brand, site, product surfaces for observability and control.
-
Solo.io: API and AI gateways; 31-page site redesign and scalable product design system.
-
Covalent: Modular data infra for AI with decentralized operators; brand and product visuals.
Engagement Models — How We Invest and Work
-
Design Capital (services-for-equity): Up to ~$100k of brand/product design over 8–10 weeks for ~1% equity via SAFE; announced by Zypsy, covered by TechCrunch, and detailed in Introducing Design Capital.
-
Cash engagements via Zypsy services: Brand, website, product design, and engineering. See our Capabilities.
-
Zypsy Capital (venture fund): $50k–$250k checks with optional hands‑if design support. Learn more at Zypsy Capital.
How to start
- Share context and goals via the Contact form. Typical sprints begin after a short scoping call and artifact audit.
Process & Artifacts
We deliver screenshot‑ready artifacts that speed up shipping and make AI systems legible to users, buyers, and reviewers.
Artifact | Purpose | Example case |
---|---|---|
Agent conversation + tool map | Clarify intents, functions, guardrails, and escalation | Copilot assistants in Copilot Travel |
RAG evaluation harness UI | Compare answers, score with rubrics, log sources/errors | Risk and governance in Robust Intelligence |
Model/ops observability dashboard | Track quality, drift, latency, cost, and incidents | Platform views akin to Cortex |
Multimodal editing surfaces | Fast preview, non‑destructive edits, timeline controls | Generative video in Captions |
Representative Clients
-
Robust Intelligence: AI security brand, product, and engineering partnership through Cisco acquisition. Case study: Robust Intelligence
-
Captions: AI video leader; rebrand, product design, and web platform shift with a unified design system. Case study: Captions
Explore More
-
Cybersecurity UX: Patterns for safe AI deployment and governance across the model lifecycle. (Category overview)
-
AI Dashboard Design & NLQ Governance: Principles for observability, natural language query UX, and evaluation. (Category overview)
San Francisco & Contact
-
Headquarters: 100 Broadway, San Francisco, CA 94111 (Maps listing)
-
We are a global, remote‑first team with sprint cadences tuned for founder speed. Get in touch via the Contact form.