What This Workflow Does
This n8n template creates a multimodal Telegram bot that replies in the same mode the user writes in. When a user sends a voice message, the bot transcribes it with ElevenLabs speech-to-text, processes the query with an AI model (Groq or Gemini via a LangChain agent), and replies with a natural-sounding voice message generated by ElevenLabs text-to-speech. Text messages skip the audio steps entirely, so they get a faster text reply.
The workflow maintains conversation context and can integrate custom tools like database lookups, API calls, or calculations through LangChain agents. This creates a Siri-like experience within Telegram that can handle customer support, educational tutoring, crypto analytics, or multilingual FAQ automation with human-like interaction quality.
How It Works
1. Message Detection & Routing
The Telegram trigger node listens for incoming text messages and voice notes. The workflow checks the message type and routes it down the matching path: voice messages go through the STT/TTS pipeline, while text messages go straight to the AI agent.
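The routing decision can be sketched in a few lines of Python. This is an illustration only, not the template's actual node configuration (in n8n the check would normally live in a Switch or IF node). The `voice` and `text` fields come from the Telegram Bot API message object; the function name `route_message` and the `ignore` branch are assumptions made for this example.

```python
from typing import Any


def route_message(update: dict[str, Any]) -> str:
    """Return 'voice', 'text', or 'ignore' for a Telegram update payload."""
    message = update.get("message") or {}
    if "voice" in message:
        return "voice"   # voice note: goes to the STT -> agent -> TTS path
    if message.get("text"):
        return "text"    # plain text: goes straight to the agent
    return "ignore"      # stickers, photos, etc. are not handled in this sketch


# Minimal example payloads shaped like Telegram updates.
voice_update = {"message": {"chat": {"id": 1}, "voice": {"file_id": "abc", "duration": 3}}}
text_update = {"message": {"chat": {"id": 1}, "text": "What are your opening hours?"}}

print(route_message(voice_update))
print(route_message(text_update))
```

Checking for the `voice` field first keeps the rule simple: anything with audio takes the voice path, and everything else falls through to text or is skipped.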
2. Voice Processing Pipeline
Voice messages are sent to ElevenLabs for transcription. The resulting text is then passed to the LangChain agent, which can use various tools (database queries, API calls, calculations) to generate a comprehensive response. This response is converted back to natural speech using ElevenLabs TTS before being sent to the user.
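As a rough sketch, the voice path is three stages chained together. The code below does not call ElevenLabs or LangChain; each stage is passed in as a plain function so the flow can be read and run without API keys. All names here (`handle_voice_message`, the `fake_*` stand-ins) are hypothetical.

```python
from typing import Callable


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    run_agent: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages: speech-to-text, agent, text-to-speech."""
    question = transcribe(audio)   # stands in for the ElevenLabs STT step
    answer = run_agent(question)   # stands in for the LangChain agent step
    return synthesize(answer)      # stands in for the ElevenLabs TTS step


# Stand-in stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(question: str) -> str:
    return f"You asked: {question}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


reply_audio = handle_voice_message(
    b"what is n8n?", fake_transcribe, fake_agent, fake_synthesize
)
print(reply_audio)
```

Passing the stages in as arguments mirrors how the workflow lets you swap one provider for another without touching the rest of the pipeline.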
3. AI Agent Processing
The LangChain agent node serves as the brain of the operation. It maintains conversation memory, selects appropriate tools based on the query, and generates contextually relevant responses. The system message can be customized to give the AI a specific personality or area of expertise.
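To make the idea of tools concrete, here is a minimal sketch of a tool registry and the dispatch step. In the template the model decides which tool to call and the LangChain agent node handles the call; this example only shows what happens once a tool name has been chosen. The tools, the order data, and the system message text are all invented for illustration.

```python
from typing import Callable


# Hypothetical tools; a real deployment would query a database or call an API.
def lookup_order(order_id: str) -> str:
    orders = {"A100": "shipped", "A101": "processing"}
    return orders.get(order_id, "unknown order")


def add_numbers(a: str, b: str) -> str:
    return str(float(a) + float(b))


# Registry mapping the names the model may request to the functions behind them.
TOOLS: dict[str, Callable[..., str]] = {
    "lookup_order": lookup_order,
    "add_numbers": add_numbers,
}

# Example system message giving the bot a narrow role (assumed wording).
SYSTEM_MESSAGE = "You are a concise support assistant for an online shop."


def run_tool(name: str, *args: str) -> str:
    """Dispatch a tool call the model has requested by name."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: no tool named {name!r}"
    return tool(*args)


print(run_tool("lookup_order", "A100"))
print(run_tool("add_numbers", "2", "3"))
```

Returning an error string for an unknown tool, rather than raising, lets the agent see the failure and try a different approach.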
4. Multilingual Support & Context Management
The workflow reads the user's language from Telegram metadata (the language code reported by the user's Telegram client) and adjusts responses accordingly. Conversation history is kept in session memory, so multi-turn dialogues stay coherent and the bot remembers earlier exchanges and context.
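A minimal sketch of both ideas, assuming history is kept per chat and capped at a fixed number of turns. The `language_code` field on the sender is part of the Telegram Bot API and is optional, so the sketch falls back to a default. The window size, the in-memory store, and the function names are assumptions; the template's memory node may behave differently.

```python
from collections import defaultdict, deque
from typing import Any

MAX_TURNS = 10  # assumed window size; tune to your model's context limit

# One bounded history per chat, keyed by Telegram chat id.
_sessions: dict[int, deque] = defaultdict(lambda: deque(maxlen=MAX_TURNS))


def user_language(message: dict[str, Any], default: str = "en") -> str:
    """Read the optional language_code from the sender, falling back to a default."""
    return (message.get("from") or {}).get("language_code") or default


def remember(chat_id: int, role: str, text: str) -> list[tuple[str, str]]:
    """Append one turn and return the current history for that chat."""
    _sessions[chat_id].append((role, text))
    return list(_sessions[chat_id])


msg = {"chat": {"id": 42}, "from": {"id": 7, "language_code": "es"}, "text": "Hola"}
lang = user_language(msg)
history = remember(msg["chat"]["id"], "user", msg["text"])
print(lang, history)
```

Because the deque has a maximum length, the oldest turns drop off automatically once a conversation grows past the window.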
Who This Is For
This template is ideal for businesses offering 24/7 customer support across multiple languages, educational platforms providing voice-interactive tutoring, crypto/analytics services needing voice-enabled query systems, and any organization wanting to engage users through natural voice conversations. It's particularly valuable for companies with international audiences who need consistent quality support across different languages without hiring multilingual staff.
What You'll Need
- Telegram Bot Token – Create via @BotFather on Telegram
- ElevenLabs API Key – For high-quality speech-to-text and text-to-speech
- AI Model API Key – Groq, Google Gemini, or alternative (OpenAI, Anthropic, etc.)
- Self-hosted n8n instance – Required for community nodes compatibility
- Optional: Custom tools – Database connections, external APIs, or custom functions you want the agent to use
Quick Setup Guide
- Download the template and import it into your n8n instance
- Configure the Telegram trigger node with your bot token
- Set up credentials for ElevenLabs and your chosen AI model (Groq/Gemini)
- Customize the system message in the LangChain agent node to define your bot's personality and capabilities
- Test with simple voice and text messages to verify both pipelines work correctly
- Add custom tools to the LangChain agent if needed (database connections, API integrations)
- Deploy and share your bot's username with users
Pro tip: Start with text-only functionality first to ensure the AI agent works correctly, then add the voice pipeline. This makes debugging much easier and ensures you have a functional bot even if there are temporary issues with the TTS/STT services.
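One way to keep the bot usable during a TTS outage is to fall back to a text reply when speech synthesis fails. The sketch below illustrates that idea; the template is not confirmed to include such a fallback, and `build_reply` and the two stand-in TTS functions are hypothetical.

```python
from typing import Callable, Union


def build_reply(
    answer: str,
    wants_voice: bool,
    synthesize: Callable[[str], bytes],
) -> tuple[str, Union[str, bytes]]:
    """Return ('voice', audio) when possible, otherwise ('text', answer)."""
    if not wants_voice:
        return ("text", answer)
    try:
        return ("voice", synthesize(answer))
    except Exception:
        # TTS is unavailable: degrade to a text reply instead of failing.
        return ("text", answer)


# Stand-ins for a failing and a working TTS service.
def broken_tts(text: str) -> bytes:
    raise RuntimeError("TTS service unavailable")


def working_tts(text: str) -> bytes:
    return text.encode("utf-8")


print(build_reply("Your order has shipped.", True, broken_tts))
print(build_reply("Your order has shipped.", True, working_tts))
```

In n8n a similar effect can usually be approximated by routing a node's error output to a plain text reply, so the user still gets an answer.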
Key Benefits
Reduce support costs by 30-50% while providing 24/7 multilingual assistance. The AI handles routine queries instantly, freeing human agents for complex issues that require personal attention.
Improve customer satisfaction with natural voice interactions that feel more personal than text-only chatbots. ElevenLabs' human-like TTS creates engaging experiences that users prefer for quick queries.
Scale during peak periods without additional staffing. The bot can handle many simultaneous conversations, within the capacity of your n8n instance and the rate limits of the Telegram, ElevenLabs, and AI model APIs, so response times stay consistent even during surges in demand.
Maintain consistent quality across all languages with automated translation and voice generation. No need to hire and train multilingual support staff for each target market.
Extend functionality easily by adding custom tools to the LangChain agent. Connect to your CRM, knowledge base, inventory systems, or any API to create a truly integrated assistant.