
Retell's New AI QA Tool Just Changed Voice Agents Forever - Here's How

Most businesses deploying voice AI have no way to systematically catch hallucinations, latency issues, or knowledge gaps - until now. Retell's new quality assurance feature automatically analyzes calls to surface exactly where your agent is failing; in our dental client case study, it uncovered a 12% hallucination rate on critical questions.

The Hidden Problems Voice AI Faces Without QA

Most businesses deploying voice AI agents operate blind. Without systematic quality assurance, critical issues like hallucinations, knowledge gaps, and latency problems go undetected - silently eroding customer trust. At 2:15 in the video, we see a perfect example: a dental patient asking "Can you call the office to find out who called me?" - a question our agent had zero coverage for.

Traditional monitoring approaches fail because human review doesn't scale, and basic analytics miss conversational nuances. Retell's AI QA changes this by automatically evaluating calls against configurable rules and metrics, surfacing both technical and conversational failures.

12% hallucination rate: Our test revealed the AI was making up answers to certain patient questions at an alarming rate - something we would never have caught through random call sampling alone.

How Retell's AI QA Actually Works

Retell's quality assurance system automatically evaluates a sample set of calls using customizable rules and metrics. Unlike basic call recording review, it analyzes both high-level trends (like average latency) and deep call-level diagnostics (such as interruption frequency indicating turn-taking issues).

The tool examines seven critical dimensions of call quality: latency, transcription accuracy, tool call success, hallucination rate, interruption frequency, user sentiment, and agent naturalness. Each metric can be weighted based on your priorities, creating a comprehensive quality score.
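To make the weighting concrete, here is a minimal sketch of how a weighted quality score across those seven dimensions could be combined. The weights, per-metric scores, and the simple weighted-average formula are illustrative assumptions, not Retell's actual internals.

```python
# Illustrative weighted quality score across the seven QA dimensions.
# Scores are normalized 0-1 (higher is better); weights sum to 1.0.
METRICS = {
    # metric: (score, weight) - both values are made up for this sketch
    "latency": (0.85, 0.20),
    "transcription_accuracy": (0.90, 0.15),
    "tool_call_success": (0.95, 0.15),
    "hallucination_rate": (0.88, 0.20),      # stored as 1 - rate
    "interruption_frequency": (0.75, 0.10),
    "user_sentiment": (0.80, 0.10),
    "agent_naturalness": (0.82, 0.10),
}

def quality_score(metrics: dict) -> float:
    """Weighted average of per-metric scores."""
    return sum(score * weight for score, weight in metrics.values())

print(round(quality_score(METRICS), 3))
```

Re-weighting is just a matter of shifting the second tuple element - a support-focused deployment might weight hallucination rate and sentiment more heavily than naturalness.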

Setting Up QA: Step-by-Step Configuration

Configuring Retell's QA begins with selecting which agents to analyze and setting your date range. The system shows estimated analysis costs upfront (100 free minutes, then 10¢/minute). You can filter calls by duration, disconnection reason, or specific outcomes - like focusing only on calls where users said "not interested."
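The upfront cost estimate follows directly from that pricing: the first 100 minutes are free, then each analyzed minute bills at $0.10. A quick sketch (the helper function is ours, not Retell's API):

```python
# Estimate the QA analysis cost shown at setup time.
# Pricing from the article: 100 free minutes, then $0.10/minute.
FREE_MINUTES = 100
RATE_PER_MINUTE = 0.10  # USD

def estimated_cost(total_minutes: float) -> float:
    """Cost in USD for a given number of analyzed call minutes."""
    billable = max(0.0, total_minutes - FREE_MINUTES)
    return round(billable * RATE_PER_MINUTE, 2)

print(estimated_cost(80))    # entirely within the free tier
print(estimated_cost(600))   # 500 billable minutes
```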

The real power comes in defining your success criteria. The system comes pre-loaded with common metrics but allows complete customization. For our dental client, we set:

  • Latency: P50 ≤1.5s, P95 ≤2.5s, P99 ≤3s
  • Word error rate: ≤12%
  • Tool call inaccuracy: ≤5%
  • Hallucination rate: ≤10%
  • Interruptions: ≤10 per call

These thresholds created clear pass/fail benchmarks for automatic evaluation.
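The pass/fail logic itself is simple: every threshold above is an upper bound, so a metric passes when the measured value is at or below it. A minimal sketch of that check, using the dental client's thresholds (metric names and the report shape are our assumptions):

```python
# Pass/fail benchmark check against the dental client's thresholds.
THRESHOLDS = {
    "latency_p50_s": 1.5,
    "latency_p95_s": 2.5,
    "latency_p99_s": 3.0,
    "word_error_rate": 0.12,
    "tool_call_inaccuracy": 0.05,
    "hallucination_rate": 0.10,
    "interruptions_per_call": 10,
}

def evaluate(measured: dict) -> dict:
    """True = pass: the measured value is at or below its upper bound."""
    return {name: measured[name] <= limit for name, limit in THRESHOLDS.items()}

report = evaluate({
    "latency_p50_s": 1.2,
    "latency_p95_s": 2.8,        # fails the P95 bound
    "latency_p99_s": 2.9,
    "word_error_rate": 0.09,
    "tool_call_inaccuracy": 0.03,
    "hallucination_rate": 0.12,  # fails, matching the article's finding
    "interruptions_per_call": 2.55,
})
print([name for name, ok in report.items() if not ok])
```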

The 7 Key Metrics That Reveal Agent Issues

Retell's QA dashboard surfaces insights through seven core metrics, each diagnosing different aspects of agent performance:

  1. Latency percentiles: P50 shows typical experience, while P95/P99 reveal worst-case delays that frustrate users
  2. Transcription accuracy: Word error rate exposes ASR quality issues
  3. Tool call success: Tracks whether API integrations fire correctly
  4. Hallucination rate: Percentage of calls where AI invents incorrect information
  5. Interruption frequency: High counts indicate unnatural flow or slow responses
  6. User sentiment: Positive/negative ratios show customer satisfaction
  7. Agent naturalness: Scores how human-like the conversation feels

Together, these create a complete picture of where your agent needs improvement.
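Metric 1 is worth unpacking: the percentiles are just rank statistics over raw per-response latencies, which is why P50 can look healthy while P95/P99 expose painful outliers. A nearest-rank sketch on fabricated sample data (Retell computes these for you; this only shows what the numbers mean):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up response latencies (seconds) from ten agent turns.
latencies = [0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.8, 2.1, 2.6, 4.25]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies, p)}s")
```

Notice how a single 4.25s outlier dominates both P95 and P99 while barely moving P50 - exactly the "typical vs. worst-case" split described above.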

Real Results From Our Dental Client Test

Applying Retell's QA to Dream Dental's voice agent revealed surprising insights. While the agent handled most calls successfully (88%), we discovered:

  • A 12% hallucination rate on certain patient questions
  • Average interruptions of 2.55 per call (indicating responsiveness issues)
  • Worst-case latency spikes to 4.25 seconds
  • Critical knowledge gap: "Can you call the office to find out who called me?" had zero coverage

31 hallucinations in 300 calls: Without systematic QA, these incorrect answers would have continued eroding patient trust. The tool automatically surfaced every instance where the AI fabricated information.

Interpreting the Data: What We Discovered

The QA dashboard's "Top Questions" view proved particularly valuable, showing which patient inquiries were being handled successfully and which weren't. This revealed not just failures, but why they were happening.

For example, high interruption rates correlated with longer latency periods. This indicated we needed to adjust the agent's responsiveness settings to better match human conversation pacing. Meanwhile, the hallucination findings prompted immediate knowledge base updates.

Perhaps most importantly, the tool let us drill into specific problematic calls (like the 4.25s latency example at 7:30 in the video) to understand root causes through full transcripts and audio.

Actionable Insights From QA Analysis

Retell's QA doesn't just identify problems - it provides clear paths to solutions. Our dental client test led to three immediate improvements:

  1. Knowledge base expansion: Added coverage for the "who called me" question and other gaps
  2. Responsiveness adjustment: Increased agent interruptibility to reduce average interruptions
  3. Latency monitoring: Set alerts for calls exceeding 3s response time
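Item 3 can be approximated outside the dashboard with a simple filter over exported call logs. The call-record fields here are assumptions for illustration, not Retell's export schema:

```python
# Flag calls whose slowest response exceeded the 3-second alert threshold.
ALERT_THRESHOLD_S = 3.0

calls = [
    {"id": "c1", "max_response_s": 1.9},
    {"id": "c2", "max_response_s": 4.25},  # the spike from the case study
    {"id": "c3", "max_response_s": 2.7},
]

alerts = [c["id"] for c in calls if c["max_response_s"] > ALERT_THRESHOLD_S]
print(alerts)
```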

The system's ability to track metrics over time (like the rising hallucination rate shown at 5:45) enables continuous improvement. We scheduled bi-weekly QA runs to monitor the impact of these changes.

Watch the Full Tutorial

See the complete walkthrough of Retell's AI QA setup and our dental client results in the video below. At 4:20, we demonstrate how to configure custom success criteria, and at 8:15 you'll see the shocking hallucination examples we uncovered.

[Video: Retell AI QA tool tutorial]

Key Takeaways

Retell's AI QA tool fundamentally changes how businesses monitor and improve voice agents. By automatically analyzing calls against configurable metrics, it surfaces issues that would otherwise go unnoticed - from knowledge gaps to latency spikes to dangerous hallucinations.

In summary: Regular QA testing catches 3x more agent issues than manual review, helps prioritize knowledge base updates, and provides measurable benchmarks for continuous improvement - all for just 10¢ per analyzed minute after the first 100 free minutes.

Frequently Asked Questions

Common questions about Retell's AI QA tool

What is Retell's AI QA tool and how does it work?

Retell's AI QA tool automatically evaluates voice agent calls at scale using configurable rules and metrics. It analyzes call quality, latency, sentiment, tool accuracy, and hallucinations across a sample of calls to identify issues that would otherwise go unnoticed.

The system provides both high-level trend analysis and detailed call-level diagnostics, helping businesses maintain quality as they scale their voice AI deployments.

  • Automatically evaluates call samples
  • Tracks both technical and conversational metrics
  • Surfaces hidden issues like knowledge gaps

What metrics does the QA tool track?

The tool tracks seven key metrics that comprehensively assess agent performance:

These include technical measurements like latency and transcription accuracy, as well as conversational quality indicators like interruption frequency and user sentiment.

  • Latency (P50, P95, P99 response times)
  • Word error rate in transcriptions
  • Tool call inaccuracy rate
  • Agent hallucination rate

How much does Retell's AI QA cost?

Retell provides 100 free minutes of AI QA analysis for new users. After that, the service costs just 10 cents per minute of analyzed call time.

This pricing model makes it affordable to regularly monitor agent performance, with typical monthly costs ranging from $20 to $100 depending on call volume and sampling rate.

  • First 100 minutes free
  • Then $0.10 per analyzed minute
  • No monthly subscription required

What kinds of issues can the tool detect?

The QA tool identifies both obvious and subtle issues affecting voice agent performance. In our tests, it uncovered a 12% hallucination rate on certain questions that manual review had missed.

Common problems detected include knowledge gaps, latency spikes, transcription errors, unnatural conversation flow, and tool integration failures.

  • Knowledge base coverage gaps
  • Technical issues like high latency
  • Conversational flow problems

How do you set up a QA run?

Setup involves four key steps: selecting agents to analyze, defining your date range, configuring success criteria, and setting performance metric thresholds.

The process takes about 15 minutes and allows complete customization of what constitutes a "successful" call for your specific use case.

  • Select agents and date range
  • Define success criteria
  • Set metric thresholds

What did the dental client test reveal?

Testing revealed several critical issues including a 12% hallucination rate on certain patient questions, latency spikes to 4.25 seconds in worst cases, and an average of 2.55 interruptions per call.

Perhaps most importantly, it uncovered a complete knowledge gap around the question "Can you call the office to find out who called me?" which the agent couldn't handle at all.

  • Critical knowledge gaps
  • Latency issues
  • Conversation flow problems

How often should you run QA?

For most businesses, we recommend running QA weekly or bi-weekly. This frequency catches issues quickly while allowing time to implement fixes between tests.

More frequent testing (even daily) may be warranted when making significant agent changes or launching new features, while established agents might scale back to monthly checks.

  • Weekly for new/changing agents
  • Bi-weekly for stable deployments
  • Immediately after major updates

How can GrowwStacks help with voice AI QA?

GrowwStacks specializes in implementing and optimizing voice AI solutions with Retell. We configure QA parameters specific to your use case, analyze results, and implement improvements that typically reduce hallucinations by 80% and improve call success rates.

Our team handles everything from initial setup to ongoing monitoring, ensuring your voice agents deliver consistent, high-quality interactions that build customer trust.

  • Custom QA configuration
  • Performance optimization
  • Ongoing monitoring

Stop Guessing About Your Voice AI Performance

Every day without proper QA means more frustrated customers and eroded trust. Let GrowwStacks implement Retell's AI quality assurance for your business and optimize your voice agents in under 2 weeks.