🗣️ Voice Workflows

🎯 Lesson Objectives

Avg. 5 min

After this lesson, you will be able to:

✅ Understand the Voice Pipeline end to end (STT → AI → TTS)

✅ Build a complete Telegram Voice Assistant

✅ Create a Voice FAQ Bot with a RAG pipeline

✅ Implement Multi-turn Voice Conversation with memory

✅ Build Voice-Triggered Automation with confirmation

✅ Apply Voice Quality Optimization and Analytics

Combine STT, AI, and TTS to build complete voice-first workflows: voice assistants, IVR systems, and voice bots.

Task 0

🔍 Voice Pipeline

Avg. 5 min
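
The STT → AI → TTS chain named in the lesson objectives can be sketched as three functions called in sequence. This is a minimal illustration, not the course's implementation: `runVoicePipeline` and the three stage functions (`transcribe`, `generateReply`, `synthesize`) are hypothetical names, and the stub stages stand in for real service calls.

```javascript
// Sketch of the STT -> AI -> TTS chain. The three stage functions are
// hypothetical stand-ins that you would replace with real service calls.
async function runVoicePipeline(audio, stages) {
  const { transcribe, generateReply, synthesize } = stages;

  const transcript = await transcribe(audio); // STT
  if (!transcript || !transcript.trim()) {
    // Error recovery: nothing usable was heard, so ask the user to repeat.
    const retryText = "Sorry, I didn't catch that. Could you repeat it?";
    return { transcript: '', replyText: retryText, speech: await synthesize(retryText) };
  }

  const replyText = await generateReply(transcript); // AI
  const speech = await synthesize(replyText);        // TTS
  return { transcript, replyText, speech };
}

// Stub stages for local experimentation (no network calls).
const stubStages = {
  transcribe: async () => 'what time is it',
  generateReply: async (text) => `You asked: ${text}`,
  synthesize: async (text) => Buffer.from(text, 'utf8'),
};
```

Passing the stages in as arguments keeps each layer swappable, so a different STT or TTS provider only changes one function.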

Checkpoint

Which layers make up the Voice Pipeline? What is the role of NLU (Natural Language Understanding)?

Task 1

🤖 Telegram Voice Assistant

Avg. 5 min

Implementation

JavaScript

// Node 1: Telegram Trigger (fires on a voice message)
// Telegram delivers voice messages as .ogg files (Opus codec)

// Node 2: HTTP Request - download the voice file
// file_path comes from the Bot API getFile call for the message's voice.file_id
// URL: https://api.telegram.org/file/bot<TOKEN>/<file_path>

// Node 3: OpenAI Whisper - transcribe
// Input: downloaded audio file
// Output: { text: "..." }

// Node 4: AI Agent - process the transcript
// Assumes an earlier step mapped Whisper's `text` output to `transcript`
const processPrompt = `
User said (voice): "${$json.transcript}"

You are a voice assistant. Provide a concise,
spoken-friendly response (max 3 sentences).
Avoid lists, tables, or formatting.
Respond in the same language as the user.
`;

// Node 5: OpenAI TTS - generate speech
// Input: AI response text
// Voice: nova (friendly)
// Format: opus (Telegram voice messages expect OGG/Opus)

// Node 6: Telegram - Send Voice Message
// Chat ID: original chat ID
// Audio: TTS output file
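
The download step in Node 2 needs a `file_path`, which the Telegram Bot API returns from `getFile` for a given `file_id`. The helpers below sketch how the two URLs could be assembled outside n8n. The function names are hypothetical, and the token and IDs in any call are placeholders.

```javascript
// Telegram Bot API endpoints for fetching a voice file.
// Function names are illustrative; tokens and IDs passed in are placeholders.
const TELEGRAM_API = 'https://api.telegram.org';

// Step 1: getFile resolves a file_id to a temporary file_path.
function buildGetFileUrl(token, fileId) {
  return `${TELEGRAM_API}/bot${token}/getFile?file_id=${encodeURIComponent(fileId)}`;
}

// Step 2: the file_path from the getFile response goes into the download URL.
function buildDownloadUrl(token, filePath) {
  return `${TELEGRAM_API}/file/bot${token}/${filePath}`;
}

// Pull the voice file_id out of an incoming update.
// Returns null when the message is not a voice message.
function extractVoiceFileId(update) {
  return update?.message?.voice?.file_id ?? null;
}
```

Returning `null` for non-voice updates lets a workflow branch early instead of failing at the download step.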

Checkpoint

Which nodes make up the Telegram Voice Assistant? Why does the response need a spoken-friendly format?

Task 2

💡 Voice FAQ Bot

Avg. 5 min

JavaScript

// Process voice questions against the FAQ database

// Step 1: Transcribe
const transcript = $json.whisperOutput.text;

// Step 2: Search FAQ (Vector Store)
// Query: transcript

// Step 3: Generate spoken answer
const answerPrompt = `
Question (from voice): "${transcript}"

Answer from knowledge base:
${$json.faqResult}

Create a natural spoken response:
- Direct and concise (30 seconds max when spoken)
- Conversational tone
- No bullet points or formatting
- End with "Is there anything else I can help with?"
`;

// Step 4: Convert to speech and send back
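
Step 2 above delegates retrieval to a Vector Store. To show what that lookup does conceptually, here is a small in-memory sketch using cosine similarity. It is a stand-in, not the Vector Store node's behavior: the function names, the toy 3-dimensional vectors, and the `minScore` threshold of 0.75 are all assumptions for illustration.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the closest FAQ entry, or null when nothing is similar enough.
// minScore is an assumed cutoff; tune it against real data.
function findBestFaq(queryVector, faqEntries, minScore = 0.75) {
  let best = null;
  for (const entry of faqEntries) {
    const score = cosineSimilarity(queryVector, entry.vector);
    if (!best || score > best.score) best = { answer: entry.answer, score };
  }
  return best && best.score >= minScore ? best : null;
}

// Toy entries with hand-written vectors (real embeddings have many more dimensions).
const faqEntries = [
  { answer: 'We are open 9am to 5pm.', vector: [1, 0, 0] },
  { answer: 'Refunds take 5 business days.', vector: [0, 1, 0] },
];
```

When `findBestFaq` returns `null`, the bot can say it does not know rather than read out an unrelated answer.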

Checkpoint

How does the Voice FAQ Bot combine RAG and TTS? How should the response be optimized for voice output?

Task 3

🧠 Multi-Turn Voice Conversation

Avg. 5 min

JavaScript

// Manage multi-turn voice conversations
// using session memory

// Code node: Session manager
const sessionId = `voice_${$json.chatId}`;

// Load conversation history from memory
// Add new transcript
// Send full context to AI Agent
// Save response to memory

// Fall back to an empty history on the first message of a session
const history = $json.history || [];

const conversationContext = `
Previous conversation:
${history.map(m => `${m.role}: ${m.content}`).join('\n')}

New voice message: "${$json.transcript}"

Continue the conversation naturally.
Remember previous context.
`;
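
The load, add, and save steps listed in the comments above can be sketched with an in-memory store. This is an assumption-laden illustration: the `sessions` Map, the helper names, and the 10-turn cap are hypothetical, and a real workflow would persist history in a memory node or database.

```javascript
// In-memory session store keyed by session ID (illustrative only;
// data is lost when the process restarts).
const sessions = new Map();
const MAX_TURNS = 10; // assumed cap so the prompt does not grow without bound

// Append one turn and keep only the most recent MAX_TURNS entries.
function appendTurn(sessionId, role, content) {
  const history = sessions.get(sessionId) ?? [];
  history.push({ role, content });
  const trimmed = history.slice(-MAX_TURNS);
  sessions.set(sessionId, trimmed);
  return trimmed;
}

// Render the stored history in the "role: content" form used by the prompt.
function formatHistory(sessionId) {
  return (sessions.get(sessionId) ?? [])
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
}
```

Capping the history trades long-range recall for predictable prompt size, which also helps keep voice response time steady.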

Checkpoint

How does a Multi-turn Voice Conversation manage session memory? Why is conversation context needed?

Task 4

⚡ Voice-Triggered Automation

Avg. 5 min

JavaScript

// Voice command with confirmation
// Step 1: Parse command
const parseResult = $json.parsedCommand;

if (parseResult.requiresConfirmation) {
  // Generate confirmation question
  const confirmText = `I'm about to ${parseResult.action}. ${parseResult.summary}. Should I proceed?`;

  // TTS → Send voice confirmation → Wait for response
  // Next voice message: "yes" or "no"
  return { json: {
    needsConfirmation: true,
    confirmText,
    pendingAction: parseResult
  }};
}

// No confirmation required: pass the command straight through
return { json: { needsConfirmation: false, pendingAction: parseResult } };
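
Once the confirmation question has been sent, the next voice message has to be interpreted as "yes" or "no". The sketch below shows one way to do that on the transcript. The word lists, the function names, and the English-only matching are assumptions; a multilingual bot would need per-language lists or an AI classification step.

```javascript
// Phrases treated as confirmation or refusal (assumed, English only).
const YES_WORDS = ['yes', 'yeah', 'yep', 'confirm', 'proceed', 'go ahead'];
const NO_WORDS = ['no', 'nope', 'cancel', 'stop'];

// Lowercase, strip punctuation, and collapse whitespace.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z\s]/g, ' ').replace(/\s+/g, ' ').trim();
}

// Whole-word match, so "know" does not count as "no".
function containsPhrase(normalized, phrase) {
  return ` ${normalized} `.includes(` ${phrase} `);
}

// Decide what to do with the pending action based on the spoken reply.
function resolveConfirmation(transcript, pendingAction) {
  const normalized = normalize(transcript);
  const saidNo = NO_WORDS.some((w) => containsPhrase(normalized, w));
  const saidYes = YES_WORDS.some((w) => containsPhrase(normalized, w));
  // If both appear, refusal wins: a destructive action is only run on a clear yes.
  if (saidNo) return { status: 'cancelled', action: null };
  if (saidYes) return { status: 'confirmed', action: pendingAction };
  return { status: 'unclear', action: null };
}
```

An `unclear` status is a cue to repeat the confirmation question rather than guess.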

Checkpoint

Where does Voice-Triggered Automation need a confirmation step? Why must destructive actions be confirmed?

Task 5

🔧 Voice Quality Optimization & Analytics

Avg. 5 min

Voice Quality Optimization

JavaScript

// Code node: Optimize voice pipeline

// 1. Audio preprocessing
// - Convert to 16 kHz mono WAV for Whisper
// - Remove background noise (if possible)

// 2. Transcription optimization
// - Set a language hint for better accuracy
// - Use the prompt field for domain-specific terms:
const whisperPrompt = "n8n, workflow, automation, API, webhook";

// 3. Response optimization for speech
function optimizeForVoice(text) {
  return text
    // Read decimals aloud: "3.5" -> "3 point 5"
    .replace(/(\d+)\.(\d+)/g, '$1 point $2')
    // Spell out standalone 4-digit numbers (codes, PINs): "1234" -> "1 2 3 4"
    // Note: this also affects years such as 2024
    .replace(/\b\d{4}\b/g, match => match.split('').join(' '))
    // Expand abbreviations that TTS engines tend to mispronounce
    .replace(/\betc\./g, 'et cetera')
    .replace(/\bi\.e\./g, 'that is')
    .replace(/\be\.g\./g, 'for example');
}

// 4. TTS optimization
// - Use the HD model for important responses
// - Match voice to context (professional vs friendly)
// - Speed 0.9-1.1 for best comprehension

Voice Analytics

JavaScript

// Track voice interaction metrics
const analytics = {
  timestamp: new Date().toISOString(),
  userId: $json.userId,
  transcriptionLanguage: $json.language,
  transcriptionConfidence: $json.confidence, // only if your STT step reports one
  intent: $json.parsedIntent,
  responseTime: $json.processingMs, // milliseconds
  userSatisfaction: $json.feedback // from follow-up
};

// Save to analytics sheet
return { json: analytics };
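
Individual records become useful once they are aggregated. The following sketch summarizes a batch of records shaped like the `analytics` object above. `summarizeInteractions` is a hypothetical helper, and it assumes `responseTime` is a number of milliseconds and `intent` is a string.

```javascript
// Summarize a list of analytics records: total count, mean response time,
// and how often each intent occurred.
function summarizeInteractions(records) {
  if (records.length === 0) {
    return { count: 0, avgResponseMs: null, intents: {} };
  }
  const totalMs = records.reduce((sum, r) => sum + r.responseTime, 0);
  const intents = {};
  for (const r of records) {
    intents[r.intent] = (intents[r.intent] ?? 0) + 1;
  }
  return {
    count: records.length,
    avgResponseMs: totalMs / records.length,
    intents,
  };
}
```

Tracking the average response time over a day or a week makes it easier to notice when a slower model or a longer prompt has degraded the voice experience.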

Voice UX Tips

Keep responses short: 15-30 seconds max when spoken
Confirm actions: Always confirm before destructive actions
Error recovery: If STT fails, ask user to repeat
Feedback: Play a "processing" sound while AI thinks
Language matching: Respond in same language as user
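
The "keep responses short" tip can be checked before the text is sent to TTS. The sketch below estimates speaking time from word count and trims at sentence boundaries. The helper names are hypothetical, and the rate of 150 words per minute is an assumption; actual pace depends on the voice, language, and TTS speed setting.

```javascript
const WORDS_PER_MINUTE = 150; // assumed average speaking rate

// Rough spoken duration in seconds, based on word count alone.
function estimateSpokenSeconds(text, wpm = WORDS_PER_MINUTE) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words / wpm) * 60;
}

// Keep whole sentences from the start of the text until the limit is reached.
// If even the first sentence is too long, it is returned as-is.
function trimToSpokenLimit(text, maxSeconds = 30, wpm = WORDS_PER_MINUTE) {
  if (estimateSpokenSeconds(text, wpm) <= maxSeconds) return text;
  const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];
  let result = '';
  for (const sentence of sentences) {
    const candidate = (result + sentence).trim();
    if (estimateSpokenSeconds(candidate, wpm) > maxSeconds) break;
    result = candidate + ' ';
  }
  return result.trim() || sentences[0].trim();
}
```

Trimming at sentence boundaries avoids cutting the audio off mid-thought, at the cost of sometimes dropping the closing sentence.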

Checkpoint

Which techniques help optimize voice quality? Which metrics does Voice Analytics track?

Task 6

📚 Practice Exercises

Avg. 5 min

Exercises

Build a Telegram voice assistant (STT, AI, TTS)
Create a voice FAQ bot with a RAG pipeline
Implement a voice command system with confirmation
Build a multi-turn voice conversation with memory

Checkpoint

Have you built a complete voice assistant? Is the voice pipeline's response time acceptable?

Task 7

🚀 Next Lesson

Next lesson: Capstone Project →
