How Parakeet Health Scaled Background Noise Filtration for AI Voice Agents
Healthcare AI company Parakeet Health faced a critical challenge - their voice agents kept misinterpreting background noise as patient speech, creating frustrating experiences. Discover how they implemented Kubernetes-powered noise filtration that reduced transcription errors by 40% while optimizing cloud costs through intelligent scaling.
The Background Noise Problem
Parakeet Health was facing a critical customer experience issue with its voice agents. While human receptionists could naturally filter out background noise - TVs, other conversations, medical equipment - the AI agents kept misinterpreting these sounds as patient speech. As a result, agents would respond to background news broadcasts or irrelevant conversations.
As Stephen Cheng explained in his PyData Seattle talk, this wasn't just an annoyance - it became one of their top UX detractors. 99% of human receptionists could naturally ignore background noise, but the AI agents lacked this capability, which caused confusion and forced callers to repeat themselves.
Key Insight: Background noise wasn't just reducing accuracy - it was damaging patient trust in their automated systems. Each time an agent responded to irrelevant noise instead of the patient's actual request, it reinforced skepticism about the technology's reliability.
AI Voice Agent Architecture
Parakeet Health's voice agent system followed a sophisticated architecture centered around a Python backend that coordinated all components. When a patient called:
- The telephony system (Twilio) captured the audio
- A speech-to-text service converted the audio to text
- The text (plus contextual prompts) was sent to their LLM provider
- A text-to-speech service converted the generated response to audio
- Twilio played that audio back to the caller
The critical differentiator from simple chatbots was the agent's ability to take actions (like modifying appointments) and maintain conversation context. However, this sophisticated system was being undermined by a fundamental problem - noisy audio inputs causing poor transcription quality.
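To make the call flow above concrete, here is a minimal Python sketch of one conversational turn. The functions `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for the speech-to-text, LLM, and text-to-speech providers; the talk does not show those APIs, so the stubs only illustrate how data moves between stages and where conversation context is kept.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Per-call state so the agent can keep context across turns."""
    history: list[str] = field(default_factory=list)


def transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a speech-to-text service.
    return audio.decode("utf-8")


def generate_reply(prompt: str, history: list[str]) -> str:
    # Hypothetical stand-in for the LLM provider call.
    return f"You said: {prompt}"


def synthesize(text: str) -> bytes:
    # Hypothetical stand-in for a text-to-speech service.
    return text.encode("utf-8")


def handle_turn(conv: Conversation, caller_audio: bytes) -> bytes:
    """Run one turn: caller audio in, agent audio out."""
    text = transcribe(caller_audio)
    reply = generate_reply(text, conv.history)
    conv.history += [f"caller: {text}", f"agent: {reply}"]
    return synthesize(reply)


conv = Conversation()
audio_out = handle_turn(conv, b"I need to reschedule my appointment")
```

Because `transcribe` is the first stage to see the caller's audio, any background noise that reaches it is turned into text and handed to the LLM as if the patient had said it. That is why the filtering has to sit in front of this step.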
Noise Filtration Techniques
The team explored several noise filtration approaches before settling on their solution. The technical foundation involves Short-Time Fourier Transforms (STFT) that analyze audio in sliding windows to create spectrograms - visual representations of sound frequencies over time.
Modern noise filtration models use deep neural networks to:
- Analyze the spectrogram to identify dominant voice frequencies
- Create a "mask" that preserves those frequencies while suppressing others
- Rebuild clean audio through inverse STFT
Technical Note: While the mathematics behind STFT and spectrogram analysis is complex, the key takeaway is that these transforms enable AI models to separate speech from noise with remarkable accuracy when properly trained.
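The mask-and-reconstruct idea can be sketched with NumPy. This is an illustrative toy, not Crisp's or DeepFilterNet's method: a hand-written frequency cutoff stands in for the mask a trained neural network would predict, and the "voice" and "noise" are synthetic tones. The function names, frame sizes, and cutoff are all assumptions made for the example.

```python
import numpy as np

FRAME_LEN = 256  # samples per analysis window (assumed)
HOP = 64         # samples between window starts (assumed)


def stft(x: np.ndarray) -> np.ndarray:
    """Slice x into overlapping Hann-windowed frames and FFT each one."""
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    frames = np.stack([x[i * HOP : i * HOP + FRAME_LEN] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, FRAME_LEN // 2 + 1)


def istft(spec: np.ndarray) -> np.ndarray:
    """Invert stft() by weighted overlap-add."""
    window = np.hanning(FRAME_LEN)
    frames = np.fft.irfft(spec, n=FRAME_LEN, axis=1)
    out = np.zeros((len(frames) - 1) * HOP + FRAME_LEN)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        start = i * HOP
        out[start : start + FRAME_LEN] += frame * window
        norm[start : start + FRAME_LEN] += window ** 2
    # Divide out the accumulated window energy; guard the zero-weight edges.
    return out / np.maximum(norm, 1e-8)


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
voice = np.sin(2 * np.pi * 440 * t)         # synthetic stand-in for speech
noise = 0.5 * np.sin(2 * np.pi * 3000 * t)  # synthetic stand-in for background noise
noisy = voice + noise

spec = stft(noisy)                           # the spectrogram (complex-valued)
freqs = np.fft.rfftfreq(FRAME_LEN, d=1 / sample_rate)
mask = (freqs < 1000).astype(float)          # a real model would predict this mask
cleaned = istft(spec * mask)                 # rebuild audio from the masked spectrogram
```

In a production model the mask varies from frame to frame and takes values between 0 and 1, which is what lets it follow a moving voice rather than a fixed frequency band.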
Why They Chose Crisp's Model
Parakeet initially tried an open-source solution called DeepFilterNet, but found it sometimes suppressed patient voices along with background noise. They ultimately selected Crisp's commercial AI model for three key reasons:
- Training Data Scale: Crisp's model was trained on 20,000 unique noises and 10,000 clear voices totaling 170+ years of audio
- Conversation Awareness: The model maintained session context to better identify the primary voice
- Accuracy: It preserved speech clarity while aggressively filtering background noise
However, implementing Crisp came with challenges. As Stephen noted, "They don't have very friendly SDKs...it involved building a lot of Cython bindings and our own session manager in Python." The team judged this extra integration work worthwhile given the accuracy improvements.
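A per-call session manager of the kind Stephen describes might look roughly like the sketch below. The real SDK interface is not shown in the talk, so `NoiseFilterSession` is a hypothetical placeholder (it passes audio through unchanged) for whatever the Cython bindings expose, and `SessionManager` is an assumed design rather than Parakeet's actual code.

```python
from contextlib import contextmanager


class NoiseFilterSession:
    """Hypothetical placeholder for a native filter session behind Cython bindings."""

    def __init__(self, call_id: str) -> None:
        self.call_id = call_id
        self.frames_seen = 0
        self.closed = False

    def process(self, frame: bytes) -> bytes:
        # A real session would return denoised audio; this stub passes it through.
        self.frames_seen += 1
        return frame

    def close(self) -> None:
        self.closed = True


class SessionManager:
    """Keeps one filter session per active call and releases it at hang-up."""

    def __init__(self) -> None:
        self._sessions: dict[str, NoiseFilterSession] = {}

    @contextmanager
    def session(self, call_id: str):
        sess = self._sessions.setdefault(call_id, NoiseFilterSession(call_id))
        try:
            yield sess
        finally:
            # Always free the session, even if the call handler raises.
            sess.close()
            self._sessions.pop(call_id, None)

    def active_calls(self) -> int:
        return len(self._sessions)


manager = SessionManager()
with manager.session("call-123") as sess:
    filtered = [sess.process(frame) for frame in (b"\x00\x01", b"\x02\x03")]
    active_during_call = manager.active_calls()
```

Holding a single session open for the whole call is what would let a context-aware model keep tracking the primary voice from one audio frame to the next.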
The Computational Challenges
Adding Crisp's noise filtration fundamentally changed their application's resource profile. Where their backend previously ran as a stable workload, it now exhibited spiky demands:
- CPU usage spiked from 300 to 1500 millicores (0.3 to 1.5 CPU cores) during noisy conversations
- Memory requirements increased significantly
- Latency became critical - the model needed to run inline to avoid unnatural pauses
Their existing Azure App Service infrastructure couldn't efficiently handle these variable demands. As Stephen explained, "You'd have to upgrade to much larger instances that would always be on, paying 3x the compute cost for capacity you only needed during business hours."
Kubernetes as the Scaling Solution
Kubernetes was a good fit for these spiky workload demands. Key benefits they leveraged:
- Horizontal Scaling: Automatic pod scaling based on CPU/memory thresholds
- Self-Healing: Automatic pod replacement if failures occurred
- Cost Efficiency: Capacity scales down outside business hours instead of running large always-on instances
- Service Discovery: Simplified traffic routing to available pods
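For horizontal scaling, the Kubernetes Horizontal Pod Autoscaler documentation gives the core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule in Python. The replica bounds and the 70% CPU target are illustrative assumptions, not Parakeet's settings, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Core HPA scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Business hours: 4 pods averaging 140% of requested CPU against a 70% target.
busy = desired_replicas(4, current_utilization=140, target_utilization=70)

# Overnight: 8 pods averaging 10% of requested CPU against the same target.
quiet = desired_replicas(8, current_utilization=10, target_utilization=70)
```

With these assumed numbers the rule doubles the pod count under load and drops to the configured minimum overnight, which is the behavior that replaces paying for peak capacity around the clock.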
Their Kubernetes configuration used YAML files to define deployments, services, and config maps. Critical settings included:
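Resource Management: They set both requests and limits on each container. Requests tell the scheduler how much CPU and memory to reserve when placing a pod; limits cap what the container may use. The two limits fail differently: a container that exceeds its memory limit is killed (which, mid-call, drops the conversation), while one that hits its CPU limit is only throttled. Careful tuning of these values was essential for stability.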
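As a concrete illustration of requests and limits, the sketch below builds a Deployment and a HorizontalPodAutoscaler as Python dictionaries and serializes them to JSON, which `kubectl apply -f` accepts alongside YAML. The field names follow the Kubernetes `apps/v1` and `autoscaling/v2` APIs, but the resource name, image, replica counts, and CPU target are hypothetical. The CPU figures reuse the 300 and 1500 millicore numbers from the talk, and the memory limit applies the team's 1.5x overprovisioning rule to an assumed 1 GiB estimate.

```python
import json

APP = "voice-agent"           # hypothetical resource name
ESTIMATED_MEMORY_MIB = 1024   # assumed per-pod memory estimate
MEMORY_LIMIT_MIB = int(ESTIMATED_MEMORY_MIB * 1.5)  # 1.5x headroom

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                "containers": [{
                    "name": APP,
                    "image": "registry.example.com/voice-agent:latest",  # placeholder
                    "resources": {
                        # Requests: what the scheduler reserves on a node.
                        "requests": {"cpu": "300m",
                                     "memory": f"{ESTIMATED_MEMORY_MIB}Mi"},
                        # Limits: CPU above this is throttled; memory above
                        # this gets the container OOM-killed.
                        "limits": {"cpu": "1500m",
                                   "memory": f"{MEMORY_LIMIT_MIB}Mi"},
                    },
                }],
            },
        },
    },
}

autoscaler = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": APP},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": APP},
        "minReplicas": 2,    # assumed floor
        "maxReplicas": 20,   # assumed ceiling
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                # Utilization is measured relative to the CPU request above.
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [deployment, autoscaler]},
    indent=2,
)
print(manifest_json)
```

Setting the CPU request at the quiet-call level and the limit at the noisy-call level lets a pod burst during a loud conversation, while the autoscaler adds pods once average usage stays well above the request.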
The Migration Process
Migrating from App Service to Kubernetes required careful planning:
- Phased Rollout: Started with outbound calls only
- Traffic Graduation: Slowly shifted customer segments while monitoring
- Overprovisioning: Initially allocated 1.5x estimated memory needs
- Monitoring: Used K9s to track pod health, restarts, and resource usage
Stephen emphasized that this migration consumed significant engineering bandwidth, requiring months of careful work to avoid disruptions. However, the long-term benefits justified the investment.
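The phased rollout described above can be sketched as a small routing function. This is an assumed design, not Parakeet's actual router: it sends every outbound call to Kubernetes and moves inbound traffic over by hashing a customer identifier into a stable percentage bucket. The function names, backend labels, and `clinic-N` identifiers are hypothetical.

```python
import hashlib


def rollout_bucket(customer_id: str) -> int:
    """Map a customer to a stable bucket in [0, 100) so routing does not flap."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route_call(direction: str, customer_id: str, inbound_percent: int) -> str:
    """Pick a backend for a call during the migration.

    Outbound calls always go to Kubernetes (the first migration phase);
    inbound calls move over as inbound_percent is raised from 0 to 100.
    """
    if direction == "outbound":
        return "kubernetes"
    if rollout_bucket(customer_id) < inbound_percent:
        return "kubernetes"
    return "app-service"


customers = [f"clinic-{n}" for n in range(1000)]
share_at_25 = sum(
    route_call("inbound", c, 25) == "kubernetes" for c in customers
) / len(customers)
```

Because each customer's bucket is fixed, raising `inbound_percent` only ever adds customers to the new infrastructure, so a clinic does not bounce between backends from one call to the next.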
Results and Business Impact
The combined Crisp + Kubernetes solution delivered measurable improvements:
- 40% reduction in transcription errors caused by background noise
- 30% lower cloud costs compared to always-on large instances
- More natural conversation flow as agents stopped responding to irrelevant noise
- Improved patient satisfaction scores on call experiences
As Stephen noted, while the technical implementation was complex, the business impact was clear: "We saw that a lot more conversations were able to flow naturally, and the agent wasn't getting tripped up as much with background noises."
Watch the Full Tutorial
For a deeper dive into Parakeet Health's noise filtration implementation, watch Stephen Cheng's complete PyData Seattle presentation (timestamp 12:45 covers their Kubernetes configuration details).
Key Takeaways
Parakeet Health's journey with AI voice agents highlights several critical lessons for implementing noise filtration at scale:
- Commercial noise models like Crisp can outperform open-source alternatives when trained on sufficient data
- Kubernetes provides the dynamic scaling needed for computationally expensive AI workloads
- Gradual migration with overprovisioning prevents production issues when adopting new infrastructure
Frequently Asked Questions
Common questions about this topic
Background noise caused transcription services to misinterpret conversations, leading to AI agents responding to noise instead of patient voices. In healthcare settings with TVs, conversations, and medical equipment, this created frustrating experiences where agents would respond to background news broadcasts or other irrelevant sounds.
The problem was particularly acute because 99% of human receptionists can naturally filter out background noise, while the AI systems lacked this capability. This discrepancy made the automated systems feel less competent than human operators.
Standard speech-to-text services focus primarily on transcription rather than noise filtration. While they have basic noise reduction, they lack sophisticated models trained specifically to identify and filter diverse background noises while preserving primary speech.
The open-source DeepFilterNet model they initially tried sometimes suppressed patient voices along with noise. This created situations where critical parts of conversations were lost, making the solution worse than having no filtration at all.
Crisp's commercial AI model was trained on 20,000 unique noises and 10,000 clear voices totaling 170+ years of audio data. It uses deep neural networks to identify the dominant voice in real time, create a spectrogram mask, and reconstruct clean audio while preserving speech clarity.
The model maintains conversation context through sessions, allowing it to better identify and track the primary speaker's voice characteristics throughout a call. This session awareness proved critical for healthcare applications where calls might involve multiple speakers.
The Crisp model caused spiky compute demands (300-1500 millicores per call) that made always-on instances sized for peak load cost-prohibitive. Kubernetes allowed dynamic scaling during business hours while automatically reducing capacity overnight, cutting cloud costs by 30% compared to maintaining always-on large instances.
Without Kubernetes, they would have needed to provision for peak loads at all times, resulting in significant wasted capacity during off-hours. The self-healing and automatic scaling capabilities were essential for maintaining service quality during unpredictable demand spikes.
Critical metrics included CPU requests vs limits (with limits triggering pod throttling), memory thresholds (where exceeding would crash pods), and horizontal pod autoscaling policies. They used K9s monitoring to track restarts, CPU/memory usage percentages, and pod health across their node clusters.
The team emphasized the importance of overprovisioning memory initially - if they estimated needing 1GB per pod, they'd provision 1.5GB to avoid out-of-memory crashes during production calls. CPU could be throttled, but memory issues would immediately crash pods during active conversations.
They migrated gradually by first routing outbound calls to Kubernetes pods, then slowly shifting customer traffic while monitoring performance. This phased approach prevented downtime and allowed tuning of resource allocations before full production load.
The process required significant engineering bandwidth over several months. They identified natural seams in their application - like outbound calls - that could serve as safe migration starting points before moving more critical inbound call handling to the new infrastructure.
The solution reduced transcription errors caused by background noise by 40% while cutting cloud costs through dynamic scaling. Conversations flowed more naturally as agents stopped responding to irrelevant noises, significantly improving patient experience scores.
While some challenging cases remained (like multiple simultaneous conversations in noisy rooms), the overall improvement in call quality and reduction in frustrating "AI doesn't understand" moments made the technical investment worthwhile for both the business and its patients.
GrowwStacks helps businesses implement AI voice agents with proper noise filtration and Kubernetes scaling. We design custom solutions that balance performance and cost, from selecting the right noise models to configuring optimal autoscaling policies.
Whether you're building healthcare agents like Parakeet or voice AI for other industries, we can help you:
- Evaluate noise filtration options (open-source vs commercial)
- Design Kubernetes architectures for spiky AI workloads
- Implement gradual migration strategies to minimize risk
- Configure monitoring and alerting for production environments
Book a free consultation to discuss your specific voice AI requirements and challenges.