🗣️ Voice Workflows
🎯 Lesson Objectives
After this lesson, you will be able to:
✅ Understand the end-to-end voice pipeline (STT → AI → TTS)
✅ Build a complete Telegram voice assistant
✅ Create a voice FAQ bot with a RAG pipeline
✅ Implement multi-turn voice conversations with memory
✅ Build voice-triggered automation with a confirmation step
✅ Apply voice quality optimization and analytics
Combine STT, AI, and TTS to create complete voice-first workflows: voice assistants, IVR systems, and voice bots.
🔍 Voice Pipeline
Checkpoint
Which layers make up the voice pipeline? What role does NLU (Natural Language Understanding) play?
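To make the STT → AI → TTS flow concrete, here is a minimal sketch of a pipeline that chains the stages in order. The stage names (`transcribe`, `understand`, `respond`, `synthesize`) and the stub implementations are assumptions made for illustration; they do not come from n8n or any provider SDK, and in a real workflow each stage would be a separate node or API call.

```javascript
// Hypothetical sketch: each stage is injected, so real providers can be swapped in.
async function runVoicePipeline(stages, audioInput) {
  const transcript = await stages.transcribe(audioInput);      // STT: audio -> text
  const intent = await stages.understand(transcript);          // NLU: text -> intent
  const replyText = await stages.respond(transcript, intent);  // AI: build the answer
  const replyAudio = await stages.synthesize(replyText);       // TTS: text -> audio
  return { transcript, intent, replyText, replyAudio };
}

// Stub stages for illustration only; they do not call any real service.
const stubStages = {
  transcribe: async (audio) => `transcript of ${audio}`,
  understand: async (text) => (text.includes('?') ? 'question' : 'statement'),
  respond: async (text, intent) => `Heard a ${intent}: ${text}`,
  synthesize: async (text) => ({ format: 'opus', text }),
};
```

Keeping the stages injectable makes it easy to time each one separately, which helps when judging whether the overall response time is acceptable.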
🤖 Telegram Voice Assistant
Implementation
```javascript
// Node 1: Telegram Trigger (on voice message)
// Telegram sends voice messages as .ogg files

// Node 2: HTTP Request - Download voice file
// URL: https://api.telegram.org/file/bot<TOKEN>/<file_path>

// Node 3: OpenAI Whisper - Transcribe
// Input: downloaded audio file
// Output: { text: "..." }

// Node 4: AI Agent - Process
// The transcript is read from `text`, matching the Whisper output in Node 3
const processPrompt = `
User said (voice): "${$json.text}"

You are a voice assistant. Provide a concise,
spoken-friendly response (max 3 sentences).
Avoid lists, tables, or formatting.
Respond in the same language as the user.
`;

// Node 5: OpenAI TTS - Generate speech
// Input: AI response text
// Voice: nova (friendly)
// Format: opus (Ogg/Opus, the encoding Telegram expects for voice messages)

// Node 6: Telegram - Send Voice Message
// Chat ID: original chat id
// Audio: TTS output file
```
Checkpoint
Which nodes make up the Telegram voice assistant? Why does the response need a spoken-friendly format?
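The download URL in Node 2 needs a `file_path`, which the voice message itself does not carry: the Bot API returns it from a `getFile` call made with the message's `file_id`. The sketch below builds both URLs. The helper names `buildGetFileUrl` and `buildDownloadUrl` are hypothetical, and the token shown in the usage note is a placeholder.

```javascript
const TELEGRAM_API = 'https://api.telegram.org';

// Step 1: URL for the Bot API getFile method, which resolves a file_id to a file_path.
function buildGetFileUrl(botToken, fileId) {
  return `${TELEGRAM_API}/bot${botToken}/getFile?file_id=${encodeURIComponent(fileId)}`;
}

// Step 2: URL for downloading the actual audio, built from the getFile response.
function buildDownloadUrl(botToken, getFileResponse) {
  const filePath = getFileResponse && getFileResponse.ok
    ? getFileResponse.result && getFileResponse.result.file_path
    : undefined;
  if (!filePath) {
    throw new Error('getFile did not return a file_path');
  }
  return `${TELEGRAM_API}/file/bot${botToken}/${filePath}`;
}
```

For example, `buildDownloadUrl('<TOKEN>', { ok: true, result: { file_path: 'voice/file_0.oga' } })` yields the URL that the HTTP Request node would fetch.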
💡 Voice FAQ Bot
```javascript
// Process voice questions against FAQ database

// Step 1: Transcribe
const transcript = $json.whisperOutput.text;

// Step 2: Search FAQ (Vector Store)
// Query: transcript

// Step 3: Generate spoken answer
const answerPrompt = `
Question (from voice): "${transcript}"

Answer from knowledge base:
${$json.faqResult}

Create a natural spoken response:
- Direct and concise (30 seconds max when spoken)
- Conversational tone
- No bullet points or formatting
- End with "Is there anything else I can help with?"
`;

// Step 4: Convert to speech and send back
```
Checkpoint
How does the voice FAQ bot combine RAG and TTS? How should the response be optimized for voice output?
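The prompt asks the model for an answer of at most 30 seconds when spoken, but a model may not respect that limit. A rough guard can be sketched as follows, assuming a speaking rate of about 150 words per minute (a rule of thumb, not a measured value for any specific TTS voice). Both function names are hypothetical.

```javascript
// Assumption: ~150 words per minute approximates conversational speech.
function estimateSpeechSeconds(text, wordsPerMinute = 150) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words * 60) / wordsPerMinute;
}

// Keep whole sentences until the estimated duration would exceed the limit.
// The first sentence is always kept so the reply is never empty.
function trimToSpokenLimit(text, maxSeconds = 30, wordsPerMinute = 150) {
  const sentences = (text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);
  const kept = [];
  for (const sentence of sentences) {
    const candidate = [...kept, sentence].join(' ');
    if (kept.length > 0 && estimateSpeechSeconds(candidate, wordsPerMinute) > maxSeconds) {
      break;
    }
    kept.push(sentence);
  }
  return kept.join(' ');
}
```

Trimming on sentence boundaries avoids cutting the audio off mid-thought; the closing question from the prompt would need to be re-appended after trimming if it gets dropped.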
🧠 Multi-Turn Voice Conversation
```javascript
// Manage multi-turn voice conversations
// Using session memory

// Code node: Session manager
const sessionId = `voice_${$json.chatId}`;

// Load conversation history from memory
// Add new transcript
// Send full context to AI Agent
// Save response to memory

// Fall back to an empty history on the first turn of a session
const history = $json.history ?? [];

const conversationContext = `
Previous conversation:
${history.map(m => `${m.role}: ${m.content}`).join('\n')}

New voice message: "${$json.transcript}"

Continue the conversation naturally.
Remember previous context.
`;
```
Checkpoint
How does a multi-turn voice conversation manage session memory? Why is conversation context needed?
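The load, append, and save steps listed in the comments above can be sketched with a small session store. This version keeps history in an in-memory `Map` purely for illustration; a Code node does not keep variables between executions, so a real workflow would persist the history elsewhere (a memory node, a database, or similar). `MAX_TURNS`, `appendTurn`, and `buildContext` are hypothetical names, and the cap of 10 turns is an assumption.

```javascript
// In-memory store for illustration only; replace with persistent storage in practice.
const sessions = new Map();
const MAX_TURNS = 10; // assumed cap so the prompt does not grow without bound

// Append one message to a chat's history and keep only the most recent turns.
function appendTurn(chatId, role, content) {
  const sessionId = `voice_${chatId}`;
  const history = sessions.get(sessionId) || [];
  const updated = [...history, { role, content }].slice(-MAX_TURNS);
  sessions.set(sessionId, updated);
  return updated;
}

// Build the context string for the AI Agent from stored history plus the new transcript.
function buildContext(chatId, transcript) {
  const history = sessions.get(`voice_${chatId}`) || [];
  const lines = history.map((m) => `${m.role}: ${m.content}`).join('\n');
  return `Previous conversation:\n${lines}\n\nNew voice message: "${transcript}"`;
}
```

Capping the history trades some long-range recall for lower latency and cost, which matters more in voice than in text chat because users notice delays immediately.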
⚡ Voice-Triggered Automation
```javascript
// Voice command with confirmation
// Step 1: Parse command
const parseResult = $json.parsedCommand;

if (parseResult.requiresConfirmation) {
  // Generate confirmation question
  const confirmText = `I'm about to ${parseResult.action}. ${parseResult.summary}. Should I proceed?`;

  // TTS → Send voice confirmation → Wait for response
  // Next voice message: "yes" or "no"
  return {
    json: {
      needsConfirmation: true,
      confirmText,
      pendingAction: parseResult,
    },
  };
}

// No confirmation required: pass the command straight through for execution
return { json: { needsConfirmation: false, pendingAction: parseResult } };
```
Checkpoint
Where does voice-triggered automation need a confirmation step? Why must destructive actions be confirmed?
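Once the confirmation question has been sent, the next voice message has to be interpreted as "yes" or "no". A minimal sketch of that step follows. The keyword lists are assumptions, cover English only, and are deliberately conservative: anything ambiguous is treated as unclear so that a destructive action never runs on a misheard reply. `interpretConfirmation` and `resolvePendingAction` are hypothetical names.

```javascript
// Assumed English keyword lists; extend or replace them for other languages.
const YES_WORDS = ['yes', 'yeah', 'yep', 'sure', 'confirm', 'proceed', 'go ahead'];
const NO_WORDS = ['no', 'nope', 'cancel', 'stop', "don't"];

// Classify a transcribed reply as confirmed, cancelled, or unclear.
function interpretConfirmation(transcript) {
  const normalized = transcript
    .toLowerCase()
    .replace(/[^a-z' ]/g, ' ') // drop punctuation so "Yes," matches "yes"
    .replace(/\s+/g, ' ')
    .trim();
  const padded = ` ${normalized} `;
  // Whole-word match, so "know" does not count as "no".
  const has = (words) => words.some((w) => padded.includes(` ${w} `));
  const yes = has(YES_WORDS);
  const no = has(NO_WORDS);
  if (yes && !no) return 'confirmed';
  if (no && !yes) return 'cancelled';
  return 'unclear';
}

// Decide what to do with the pending action based on the reply.
function resolvePendingAction(pendingAction, transcript) {
  const decision = interpretConfirmation(transcript);
  return {
    decision,
    execute: decision === 'confirmed',
    action: decision === 'confirmed' ? pendingAction : null,
  };
}
```

When the decision is `unclear`, the workflow can repeat the confirmation question rather than guessing.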
🔧 Voice Quality Optimization & Analytics
Voice Quality Optimization
```javascript
// Code node: Optimize voice pipeline

// 1. Audio preprocessing
// - Convert to wav 16kHz mono for Whisper
// - Remove background noise (if possible)

// 2. Transcription optimization
// - Set language hint for better accuracy
// - Use prompt field for domain-specific terms:
const whisperPrompt = "n8n, workflow, automation, API, webhook";

// 3. Response optimization for speech
function optimizeForVoice(text) {
  return text
    // Read decimals aloud: "3.14" -> "3 point 14"
    .replace(/\d+\.\d+/g, match => match.replace('.', ' point '))
    // Spell out standalone four-digit numbers digit by digit: "2024" -> "2 0 2 4"
    // (\b keeps longer numbers such as "12345" intact)
    .replace(/\b\d{4}\b/g, match => match.split('').join(' '))
    // Expand common abbreviations
    .replace(/etc\./g, 'et cetera')
    .replace(/i\.e\./g, 'that is')
    .replace(/e\.g\./g, 'for example');
}

// 4. TTS optimization
// - Use HD model for important responses
// - Match voice to context (professional vs friendly)
// - Speed 0.9-1.1 for best comprehension
```
Voice Analytics
```javascript
// Track voice interaction metrics
const analytics = {
  timestamp: new Date().toISOString(),
  userId: $json.userId,
  transcriptionLanguage: $json.language,
  transcriptionConfidence: $json.confidence, // only if an earlier node provides a confidence score
  intent: $json.parsedIntent,
  responseTime: $json.processingMs,
  userSatisfaction: $json.feedback // from follow-up
};

// Save to analytics sheet
return { json: analytics };
```

- Keep responses short: 15-30 seconds max when spoken
- Confirm actions: Always confirm before destructive actions
- Error recovery: If STT fails, ask user to repeat
- Feedback: Play a "processing" sound while AI thinks
- Language matching: Respond in same language as user
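The error-recovery guideline above can be sketched as a small check that runs right after transcription. The retry limit, the reply wording, and the function name `checkTranscript` are all assumptions for illustration; the input shape `{ text }` mirrors the Whisper output used earlier in this lesson.

```javascript
// Decide how to proceed after STT: continue, ask the user to repeat, or give up.
function checkTranscript(sttResult, retryCount = 0, maxRetries = 2) {
  const text = sttResult && typeof sttResult.text === 'string'
    ? sttResult.text.trim()
    : '';

  if (text.length > 0) {
    return { status: 'ok', transcript: text };
  }
  if (retryCount < maxRetries) {
    return {
      status: 'retry',
      replyText: "Sorry, I didn't catch that. Could you say it again?",
      retryCount: retryCount + 1,
    };
  }
  // After repeated failures, offer a fallback instead of looping forever.
  return {
    status: 'give_up',
    replyText: "I'm still having trouble hearing you. Please try typing your message instead.",
  };
}
```

The `replyText` would go through TTS like any other response, so the user hears the request to repeat in the same voice as the rest of the conversation.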
Checkpoint
Which techniques help optimize voice quality? Which metrics does voice analytics track?
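Once analytics records like the one above accumulate, they can be summarized to see whether response time is acceptable and which languages users speak. The sketch below assumes the records have already been loaded as an array of objects with the same field names as the analytics object in this lesson; `summarizeAnalytics` is a hypothetical helper.

```javascript
// Aggregate voice analytics records into a few headline numbers.
function summarizeAnalytics(records) {
  const summary = { total: records.length, avgResponseTimeMs: null, byLanguage: {} };
  let sum = 0;
  let counted = 0;

  for (const record of records) {
    // Only average records that actually carry a numeric response time.
    if (typeof record.responseTime === 'number') {
      sum += record.responseTime;
      counted += 1;
    }
    const language = record.transcriptionLanguage || 'unknown';
    summary.byLanguage[language] = (summary.byLanguage[language] || 0) + 1;
  }

  if (counted > 0) {
    summary.avgResponseTimeMs = sum / counted;
  }
  return summary;
}
```

Leaving `avgResponseTimeMs` as `null` when no record has a timing avoids reporting a misleading average of zero.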
📚 Hands-on Exercises
- Build a Telegram voice assistant (STT, AI, TTS)
- Create a voice FAQ bot with a RAG pipeline
- Implement a voice command system with confirmation
- Build a multi-turn voice conversation with memory
Checkpoint
Have you built a complete voice assistant? Is the voice pipeline's response time acceptable?
🚀 Next Lesson
Next lesson: Capstone Project →
