How it works

BotDoc puts realistic prospects in front of your AI agents, scores how they handle the conversation, and tells you exactly what to change to score higher next time.

This page walks through what BotDoc tests, how scoring works, and how the rollback + updated-prompt flow keeps the agent improving week over week.

What BotDoc tests

BotDoc tests your client's live AI agents — the same web chat or voice agent a real customer would talk to. You give BotDoc the agent's system prompt (for web chat) or its phone number (for voice), pick the scenarios you want covered, and BotDoc takes over.

Each scenario simulates a different kind of customer: a new lead with a specific need, a price-shopper, an urgent after-hours request, a hesitant browser raising objections, an off-topic curveball, a booking request. The AI tester stays in character, sends realistic messages, and ends the conversation only once it has obtained what the customer needed or reached a natural stopping point.

How a test works

Pick the client, the channel (web chat, voice, or both), and the scenarios to run.
For each scenario, an AI "tester" plays the customer and talks to the client's real agent — for voice, this is a real outbound phone call placed via our Twilio number to the agent's number.
When the conversation ends, the full transcript is graded by Claude across five scoring dimensions (relevance, accuracy, tone, lead capture, goal completion) on a 0–100 scale.
Once all scenarios in the run have finished, BotDoc synthesizes a single updated system prompt that incorporates every finding and targets the "Optimized" tier on a re-test.
You see scores, strengths, issues, and a per-scenario recommendation — plus the proposed updated prompt with a one-click Apply button.

How scoring works

Each conversation is graded on five 0–100 dimensions, then rolled into an overall score and a performance tier.

Dimension	What we look for
Relevance	Stayed on point and addressed exactly what the customer asked. No off-topic excursions.
Accuracy	No made-up facts. Honest about limits. Only states information the business has actually provided.
Tone	Warm, professional, on-brand. Not robotic, pushy, or aggressive.
Lead capture	Captured the customer's name AND contact (phone or email) within the conversation, with qualifying questions tied to the customer's goal.
Goal completion	Moved the conversation toward the customer's goal (booking, purchase, resolution) and ended with a specific named next step.

Performance tiers

Optimized (90 and above): the agent is converting and on-brand; minor polish only.
Sub-Optimal (50–89): working but losing leads. The updated prompt focuses on closing specific gaps to push every dimension into the 90s.
Marginal (below 50): the agent needs a serious rewrite; treat the report as the checklist.
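
The rollup and tier mapping can be sketched in a few lines. This is a minimal illustration, not BotDoc's actual implementation: the page does not say how the five dimensions are weighted, so the sketch assumes an unweighted mean, and the names `overall_score` and `tier` are hypothetical.

```python
DIMENSIONS = ("relevance", "accuracy", "tone", "lead_capture", "goal_completion")


def overall_score(scores: dict[str, int]) -> float:
    """Roll five 0-100 dimension scores into one number.

    ASSUMPTION: an unweighted mean. The real weighting is not documented here.
    """
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def tier(score: float) -> str:
    """Map an overall score to a performance tier.

    ASSUMPTION: fractional scores between 89 and 90 count as Sub-Optimal.
    """
    if score >= 90:
        return "Optimized"
    if score >= 50:
        return "Sub-Optimal"
    return "Marginal"


example = {
    "relevance": 92,
    "accuracy": 88,
    "tone": 95,
    "lead_capture": 60,
    "goal_completion": 75,
}
print(overall_score(example), tier(overall_score(example)))
```

Under this assumption, one weak dimension (lead capture at 60) is enough to pull an otherwise strong conversation out of the Optimized tier.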

What's in the report

Overall score and tier for the whole run.
Per-scenario cards with the five dimension scores, strengths, specific issues, and a concrete recommendation.
Transcripts of every conversation (web) or the call summary plus recording link (voice).
An updated system prompt synthesized from all the findings, designed to score 90+ on a re-test, with a one-click Apply that auto-versions the current prompt.
Downloadable PDF branded for sharing with your client, plus a permanent public share link (scores + findings only — no transcripts).
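
The difference between the internal report and the shared one can be pictured as an allowlist filter. The sketch below is illustrative only: the field names (`overall_score`, `scenarios`, `transcript`, `updated_prompt`, and so on) are hypothetical, since the real report schema is not shown on this page.

```python
# ASSUMPTION: hypothetical field names standing in for "scores + findings".
PUBLIC_REPORT_FIELDS = ("overall_score", "tier")
PUBLIC_SCENARIO_FIELDS = ("name", "scores", "strengths", "issues", "recommendation")


def share_view(report: dict) -> dict:
    """Build the shareable view by copying only allowlisted fields.

    Anything not listed (transcripts, recordings, the updated prompt)
    is left out by default.
    """
    public = {k: report[k] for k in PUBLIC_REPORT_FIELDS if k in report}
    public["scenarios"] = [
        {k: scenario[k] for k in PUBLIC_SCENARIO_FIELDS if k in scenario}
        for scenario in report.get("scenarios", [])
    ]
    return public


internal = {
    "overall_score": 82,
    "tier": "Sub-Optimal",
    "updated_prompt": "You are the assistant for ...",
    "scenarios": [
        {
            "name": "price-shopper",
            "scores": {"tone": 95, "lead_capture": 60},
            "issues": ["Never asked for a phone number or email."],
            "transcript": ["Customer: How much is it?", "Agent: ..."],
        }
    ],
}
shared = share_view(internal)
```

An allowlist is used here rather than a denylist so that a field added to the report later stays private until it is explicitly marked shareable.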

The updated prompt

After every run, BotDoc gives Claude the agent's current system prompt plus every scenario's per-dimension scores and findings, and asks for a single complete rewrite that would score 90+ on every dimension on a re-test.

The rewrite preserves your client's brand voice, hard rules, named contacts, prices, and any business facts the current prompt contains. It rewrites aggressively wherever a scoring dimension fell below 90 — and leaves the rest alone.
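
To make the "below 90" rule concrete, here is a small sketch of how the gaps that drive a rewrite could be collected from a run. It is an illustration under assumptions, not BotDoc's code: the input shape and the function name `dimensions_below_target` are hypothetical.

```python
TARGET = 90  # the re-test target named in the text


def dimensions_below_target(
    scenario_scores: dict[str, dict[str, int]],
) -> dict[str, list[str]]:
    """Map each dimension that fell below the target to the scenarios where it did."""
    gaps: dict[str, list[str]] = {}
    for scenario, scores in scenario_scores.items():
        for dimension, score in scores.items():
            if score < TARGET:
                gaps.setdefault(dimension, []).append(scenario)
    return gaps


run = {
    "price-shopper": {"relevance": 94, "tone": 96, "lead_capture": 60},
    "after-hours": {"relevance": 91, "tone": 85, "lead_capture": 72},
}
gaps = dimensions_below_target(run)
```

In this example `gaps` names lead capture (both scenarios) and tone (after-hours only) as rewrite targets, while relevance, at 90 or above everywhere, would be left alone.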

Two buttons on the "Updated prompt" card:

Copy — copies the full prompt to your clipboard for review or pasting elsewhere.
Apply to this client — confirms, then swaps the live prompt and snapshots the prior one as a rollback-able version (see below). After Apply, a "Re-run these scenarios to verify ≥90" link surfaces so you can confirm the rewrite actually hit the target.

Versions and rollback

Every save of a client's web-agent prompt — whether you edit it manually or apply an updated prompt from a test report — auto-snapshots the prior value as a numbered version (v1, v2, v3…). On the client detail page, the "Prompt versions" panel lists every prior version with timestamps and a one-click Rollback button.

Rollback is fully reversible: it auto-snapshots the current prompt as a new version before swapping, so any rollback can itself be rolled forward. You can experiment with an updated prompt safely — if it underperforms, one click restores the previous one.
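
The snapshot-before-swap behaviour can be modelled with a short in-memory sketch. The class and method names below are hypothetical and storage is a plain list; the point is only to show why a rollback can itself be undone.

```python
from dataclasses import dataclass, field


@dataclass
class PromptHistory:
    """Hypothetical in-memory model of prompt saves and rollbacks."""

    current: str
    versions: list[str] = field(default_factory=list)  # versions[0] is v1

    def save(self, new_prompt: str) -> int:
        """Snapshot the current prompt, then swap in the new one.

        Returns the version number assigned to the snapshot.
        """
        self.versions.append(self.current)
        self.current = new_prompt
        return len(self.versions)

    def rollback(self, version: int) -> int:
        """Make a prior version (1-based) live again.

        Goes through save(), so the prompt being replaced is snapshotted first.
        """
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no such version: v{version}")
        return self.save(self.versions[version - 1])


history = PromptHistory(current="original prompt")
history.save("updated prompt")  # "original prompt" becomes v1
history.rollback(1)             # "updated prompt" becomes v2; original is live
history.rollback(2)             # roll forward: "updated prompt" is live again
```

Because `rollback` never deletes anything, the version list only grows, and every earlier state stays reachable.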

Voice testing

Voice tests place real outbound phone calls from a Twilio number to your client's voice agent. Each call lasts up to three minutes; BotDoc shows a confirmation before placing any calls so you don't fire 25 of them by accident. Voice scenarios are tested one at a time to avoid hitting concurrent-call limits.
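
The confirm-first, one-call-at-a-time flow looks roughly like the sketch below. It makes no real phone calls: `place_call` and `confirm` are hypothetical callables supplied by the caller, and nothing here reflects the actual Twilio integration.

```python
from typing import Callable

MAX_CALL_SECONDS = 180  # "up to three minutes" per call


def run_voice_scenarios(
    scenarios: list[str],
    place_call: Callable[[str, int], str],
    confirm: Callable[[int], bool],
) -> list[str]:
    """Ask for confirmation once, then run the calls strictly in sequence.

    place_call(scenario, max_seconds) is assumed to block until the call ends
    and return a result; confirm(n) is assumed to return True if the user
    agrees to place n calls.
    """
    if not scenarios or not confirm(len(scenarios)):
        return []
    results = []
    for scenario in scenarios:  # sequential, so only one call is ever in flight
        results.append(place_call(scenario, MAX_CALL_SECONDS))
    return results
```

Running the calls in a plain loop, rather than in parallel, is what keeps the run under any concurrent-call limit.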

Voice testing is included on Pro and Agency plans. Voice findings appear in the per-scenario report, but the synthesized updated prompt covers only the web chat agent, because Vapi voice configuration lives in a different surface.

FAQ

Does BotDoc store my clients' agent prompts?
Yes. The current prompt and every prior version are stored against the client record in your tenant, and used only by you and your team. They are not shared with other tenants. Each request is scoped by organization ID end to end.

What happens to my data if I cancel?
Your data stays in place during the trial and after cancellation; you can resume by reactivating your subscription. Reach out if you need an export or deletion.

What if the test agent says something off-brand?
The tester plays the customer, not the agent, so it asks questions and pushes back, but it never speaks as your client. If the report flags a tone issue, the synthesized updated prompt will tighten the agent's voice rules.

Can I share a report with a client without exposing the transcript?
Yes. The public share link and the PDF (when set to share mode) show scores + findings only. Internal copies show the full transcript and the updated prompt.

Does BotDoc rewrite my prompt on its own?
Only when you click Apply. The synthesized updated prompt is generated automatically with every run, but it doesn't touch your live agent until you confirm. Even then, the prior prompt is auto-saved as a rollback-able version.

Ready to run a test?

See pricing or start the free trial.
