Voice Workflows

Build end-to-end voice workflows: voice assistants, IVR systems, and voice bots

🗣️ Voice Workflows

0

🎯 Learning Objectives


After this lesson, you will be able to:

✅ Understand the end-to-end Voice Pipeline (STT → AI → TTS)

✅ Build a complete Telegram Voice Assistant

✅ Create a Voice FAQ Bot with a RAG pipeline

✅ Implement Multi-turn Voice Conversation with memory

✅ Build Voice-Triggered Automation with confirmation

✅ Apply Voice Quality Optimization and Analytics

Combine speech-to-text (STT), an AI model, and text-to-speech (TTS) into complete voice-first workflows: voice assistants, IVR systems, and voice bots.

1

🔍 Voice Pipeline

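To make the STT → AI → TTS flow concrete, here is a minimal sketch of the pipeline as three chained async stages. The stage functions (`transcribe`, `respond`, `synthesize`) and the stub implementations are hypothetical placeholders, not part of any library; in a real workflow each would be backed by an STT service, an AI model, and a TTS service. The `respond` stage stands in for whatever understanding and answer generation the AI layer performs.

```javascript
// Minimal sketch of an STT -> AI -> TTS pipeline.
// The three stage functions are hypothetical and injected by the caller.
async function runVoicePipeline(audio, { transcribe, respond, synthesize }) {
  const transcript = (await transcribe(audio)).trim();

  // Error recovery: if nothing was transcribed, ask the user to repeat.
  if (transcript === '') {
    const retryText = 'Sorry, I did not catch that. Could you repeat it?';
    return { transcript, replyText: retryText, replyAudio: await synthesize(retryText) };
  }

  const replyText = await respond(transcript);
  const replyAudio = await synthesize(replyText);
  return { transcript, replyText, replyAudio };
}

// Stub stages standing in for real STT, AI, and TTS calls (illustration only).
const stubStages = {
  transcribe: async (audio) => audio.fakeText,
  respond: async (text) => `You said: ${text}`,
  synthesize: async (text) => Buffer.from(text, 'utf8'),
};
```

Injecting the stages keeps the control flow testable without network access, and the empty-transcript branch shows one place where an "ask the user to repeat" fallback could live.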

Checkpoint

Which layers make up the Voice Pipeline? What is the role of NLU (Natural Language Understanding)?

2

🤖 Telegram Voice Assistant


Implementation

JavaScript
// Node 1: Telegram Trigger (on voice message)
// Telegram sends voice messages as .ogg files (Opus codec)

// Node 2: HTTP Request - Download voice file
// <file_path> comes from the Bot API getFile call for the voice file_id
// URL: https://api.telegram.org/file/bot<TOKEN>/<file_path>

// Node 3: OpenAI Whisper - Transcribe
// Input: downloaded audio file
// Output: { text: "..." }

// Node 4: AI Agent - Process
// Assumes the Whisper text has been mapped to $json.transcript upstream
const processPrompt = `
User said (voice): "${$json.transcript}"

You are a voice assistant. Provide a concise,
spoken-friendly response (max 3 sentences).
Avoid lists, tables, or formatting.
Respond in the same language as the user.
`;

// Node 5: OpenAI TTS - Generate speech
// Input: AI response text
// Voice: nova (friendly)
// Format: opus (Telegram voice messages expect OGG audio encoded with Opus)

// Node 6: Telegram - Send Voice Message
// Chat ID: original chat id
// Audio: TTS output file
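Downloading a Telegram voice file takes two requests: a `getFile` call that turns the `file_id` into a `file_path`, then a download from the file endpoint. The helpers below are a small sketch of building those URLs and reading the `file_id` from an incoming update. The function names are illustrative, and the token in the usage comment is a placeholder.

```javascript
// Build the Telegram Bot API URLs needed to fetch a voice file.
const TELEGRAM_API = 'https://api.telegram.org';

// Step 1: getFile resolves a file_id to a file_path.
function getFileUrl(token, fileId) {
  return `${TELEGRAM_API}/bot${token}/getFile?file_id=${encodeURIComponent(fileId)}`;
}

// Step 2: the file_path returned by getFile is appended to the file endpoint.
function downloadUrl(token, filePath) {
  return `${TELEGRAM_API}/file/bot${token}/${filePath}`;
}

// Read the voice file_id from an update, or null if it is not a voice message.
function extractVoiceFileId(update) {
  return update?.message?.voice?.file_id ?? null;
}

// Example (placeholder token and ids):
// getFileUrl('<TOKEN>', 'AwACAgI')
// downloadUrl('<TOKEN>', 'voice/file_0.oga')
```

Returning `null` for non-voice updates lets the workflow branch early, for example to a text-only path, instead of failing at the download step.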

Checkpoint

Which nodes make up the Telegram Voice Assistant? Why does the response need a spoken-friendly format?

3

💡 Voice FAQ Bot

JavaScript
// Process voice questions against FAQ database

// Step 1: Transcribe
const transcript = $json.whisperOutput.text;

// Step 2: Search FAQ (Vector Store)
// Query: transcript

// Step 3: Generate spoken answer
const answerPrompt = `
Question (from voice): "${transcript}"

Answer from knowledge base:
${$json.faqResult}

Create a natural spoken response:
- Direct and concise (30 seconds max when spoken)
- Conversational tone
- No bullet points or formatting
- End with "Is there anything else I can help with?"
`;

// Step 4: Convert to speech and send back
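The "Search FAQ" step is where retrieval happens. As a rough illustration of what a vector store does internally, the sketch below ranks FAQ entries by cosine similarity against a query vector and returns nothing when the best match is too weak. The three-dimensional vectors, the `minScore` threshold of 0.75, and the sample entries are made-up assumptions; real embeddings come from an embedding model and have far more dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best-matching entry, or null if no entry reaches minScore.
function findBestFaq(queryVector, entries, minScore = 0.75) {
  let best = null;
  for (const entry of entries) {
    const score = cosineSimilarity(queryVector, entry.vector);
    if (best === null || score > best.score) best = { ...entry, score };
  }
  return best !== null && best.score >= minScore ? best : null;
}

// Toy FAQ data with hand-written vectors (illustration only).
const faqEntries = [
  { answer: 'We are open 9 to 5 on weekdays.', vector: [1, 0, 0] },
  { answer: 'Refunds take five business days.', vector: [0, 1, 0] },
];
```

The `null` result matters for voice: when nothing relevant is found, the bot can say so instead of reading out an unrelated answer.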

Checkpoint

How does the Voice FAQ Bot combine RAG and TTS? How should the response be optimized for voice output?

4

🧠 Multi-Turn Voice Conversation

JavaScript
// Manage multi-turn voice conversations
// Using session memory

// Code node: Session manager
const sessionId = `voice_${$json.chatId}`;

// Load conversation history from memory
// Add new transcript
// Send full context to AI Agent
// Save response to memory

// Fall back to an empty history on the first turn of a session
const history = $json.history ?? [];

const conversationContext = `
Previous conversation:
${history.map(m => `${m.role}: ${m.content}`).join('\n')}

New voice message: "${$json.transcript}"

Continue the conversation naturally.
Remember previous context.
`;
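The load, append, and save steps listed in the comments above can be sketched with a small in-memory store. This is an illustration under assumptions: the `Map` stands in for whatever persistent memory the workflow uses (an in-process `Map` would not survive between separate executions), and the `maxMessages` cap of 10 is an arbitrary choice to keep prompts short.

```javascript
// In-memory stand-in for a persistent session store (illustration only).
const sessions = new Map();

// Append one message to a session and keep only the most recent ones.
function appendTurn(sessionId, role, content, maxMessages = 10) {
  const history = sessions.get(sessionId) ?? [];
  history.push({ role, content });
  const trimmed = history.slice(-maxMessages);
  sessions.set(sessionId, trimmed);
  return trimmed;
}

// Render a session's history in the "role: content" form used in the prompt.
function formatHistory(sessionId) {
  return (sessions.get(sessionId) ?? [])
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
}
```

Trimming to the latest messages bounds the prompt size, at the cost of forgetting older turns; a summary of earlier turns is a common alternative.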

Checkpoint

How does Multi-turn Voice Conversation manage session memory? Why is conversation context needed?

5

⚡ Voice-Triggered Automation

JavaScript
// Voice command with confirmation
// Step 1: Parse command
const parseResult = $json.parsedCommand;

if (parseResult.requiresConfirmation) {
  // Generate confirmation question
  const confirmText =
    `I'm about to ${parseResult.action}. ${parseResult.summary}. Should I proceed?`;

  // TTS → Send voice confirmation → Wait for response
  // Next voice message: "yes" or "no"
  return { json: {
    needsConfirmation: true,
    confirmText,
    pendingAction: parseResult
  }};
}

// No confirmation required: pass the command through for execution
return { json: { needsConfirmation: false, pendingAction: parseResult } };
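Once the confirmation question has been sent, the next voice message has to be interpreted as a yes or a no. The sketch below shows one simple way to do that with keyword matching on the transcript. The word lists and the reply strings are assumptions for illustration, cover English only, and a production bot might instead ask the AI model to classify the reply.

```javascript
// Hypothetical keyword lists for English confirmations.
const YES_WORDS = new Set(['yes', 'yeah', 'yep', 'confirm', 'proceed', 'ok', 'okay']);
const NO_WORDS = new Set(['no', 'nope', 'cancel', 'stop']);

// Classify a transcript as 'yes', 'no', or 'unclear'.
function classifyConfirmation(transcript) {
  const words = transcript.toLowerCase().match(/[a-z]+/g) ?? [];
  const saysYes = words.some((w) => YES_WORDS.has(w));
  const saysNo = words.some((w) => NO_WORDS.has(w));
  if (saysYes === saysNo) return 'unclear'; // neither word type, or both
  return saysYes ? 'yes' : 'no';
}

// Decide what to do with a pending action given the user's spoken reply.
function resolvePending(pendingAction, transcript) {
  const decision = classifyConfirmation(transcript);
  if (decision === 'yes') return { execute: true, action: pendingAction };
  if (decision === 'no') return { execute: false, reply: 'Okay, cancelled.' };
  return { execute: false, needsConfirmation: true, reply: 'Please answer yes or no.' };
}
```

Matching whole words avoids false positives such as "know" being read as "no", and treating mixed or missing signals as unclear means a destructive action only runs on an unambiguous yes.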

Checkpoint

Where does Voice-Triggered Automation need a confirmation step? Why must destructive actions be confirmed?

6

🔧 Voice Quality Optimization & Analytics


Voice Quality Optimization

JavaScript
// Code node: Optimize voice pipeline

// 1. Audio preprocessing
//    - Convert to wav 16kHz mono for Whisper
//    - Remove background noise (if possible)

// 2. Transcription optimization
//    - Set language hint for better accuracy
//    - Use prompt field for domain-specific terms:
const whisperPrompt = "n8n, workflow, automation, API, webhook";

// 3. Response optimization for speech
function optimizeForVoice(text) {
  return text
    // Decimals: "3.5" -> "3 point 5"
    .replace(/(\d+)\.(\d+)/g, '$1 point $2')
    // Standalone 4-digit numbers are read digit by digit (codes, PINs).
    // Note: this also spells out years such as 2024.
    .replace(/\b\d{4}\b/g, match => match.split('').join(' '))
    .replace(/\betc\./g, 'et cetera')
    .replace(/\bi\.e\./g, 'that is')
    .replace(/\be\.g\./g, 'for example');
}

// 4. TTS optimization
//    - Use HD model for important responses
//    - Match voice to context (professional vs friendly)
//    - Speed 0.9-1.1 for best comprehension
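Even when the prompt asks for no formatting, a model can still emit Markdown, and a TTS engine may read the symbols aloud. As a complement to the text normalization above, here is a rough sketch of a cleanup pass that removes common Markdown markers before synthesis. The function name and the exact set of patterns are assumptions; the regexes are deliberately simple and will not cover every Markdown construct.

```javascript
// Remove common Markdown markers so TTS does not read symbols aloud.
// Naive by design: covers links, list markers, headings, and emphasis only.
function stripFormattingForSpeech(text) {
  return text
    .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')          // [label](url) -> label
    .replace(/^[ \t]*(?:[-*•]|\d+\.)[ \t]+/gm, '')    // bullet and numbered list markers
    .replace(/^#{1,6}[ \t]+/gm, '')                   // heading markers
    .replace(/[*_]{1,3}([^*_]+)[*_]{1,3}/g, '$1')     // bold and italic markers
    .replace(/\s*\n+\s*/g, ' ')                       // line breaks -> single space
    .replace(/\s{2,}/g, ' ')
    .trim();
}

// Example input a model might return despite the "no formatting" instruction.
const sample = '## Hours\n- **Open** 9 to 5\n- See [our site](https://example.com)';
const spoken = stripFormattingForSpeech(sample);
```

Running this before TTS is a safety net rather than a replacement for a good prompt, since stripped list items lose the pauses that sentence punctuation would give.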

Voice Analytics

JavaScript
// Track voice interaction metrics
const analytics = {
  timestamp: new Date().toISOString(),
  userId: $json.userId,
  transcriptionLanguage: $json.language,
  transcriptionConfidence: $json.confidence, // assumes an upstream node supplies this value
  intent: $json.parsedIntent,
  responseTime: $json.processingMs, // milliseconds
  userSatisfaction: $json.feedback // from follow-up
};

// Save to analytics sheet
return { json: analytics };
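Individual records become useful once they are aggregated. The sketch below summarizes a list of records shaped like the `analytics` object above, computing the interaction count, the average response time, and a per-intent breakdown. The function name and the sample records are hypothetical.

```javascript
// Summarize analytics records: count, average response time, and intents.
function summarizeInteractions(records) {
  const total = records.length;
  if (total === 0) return { total: 0, avgResponseMs: null, byIntent: {} };

  const totalMs = records.reduce((sum, r) => sum + r.responseTime, 0);
  const byIntent = {};
  for (const r of records) {
    byIntent[r.intent] = (byIntent[r.intent] ?? 0) + 1;
  }
  return { total, avgResponseMs: totalMs / total, byIntent };
}

// Made-up sample records for illustration.
const sampleRecords = [
  { intent: 'faq', responseTime: 1000 },
  { intent: 'faq', responseTime: 3000 },
  { intent: 'command', responseTime: 2000 },
];
const summary = summarizeInteractions(sampleRecords);
```

Average response time is the metric most directly tied to how the conversation feels, so it is a reasonable first number to watch.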
Voice UX Tips
  • Keep responses short: 15-30 seconds max when spoken
  • Confirm actions: Always confirm before destructive actions
  • Error recovery: If STT fails, ask user to repeat
  • Feedback: Play a "processing" sound while AI thinks
  • Language matching: Respond in same language as user
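The "15-30 seconds" guideline can be checked before sending text to TTS. The sketch below estimates spoken duration from word count and trims a reply to whole sentences that fit a time budget. The speaking rate of 150 words per minute is an assumed average, not a measured value for any particular voice or language, and the sentence splitter is naive (it splits on `.`, `!`, and `?`, so decimals and abbreviations can confuse it).

```javascript
// Assumed average speaking rate; real TTS voices and languages vary.
const WORDS_PER_MINUTE = 150;

// Estimate how long a text takes to speak, in seconds.
function estimateSpeechSeconds(text, wordsPerMinute = WORDS_PER_MINUTE) {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return (words * 60) / wordsPerMinute;
}

// Keep whole sentences from the start of the text while they fit the budget.
// The first sentence is always kept so the reply is never empty.
function truncateToDuration(text, maxSeconds = 30, wordsPerMinute = WORDS_PER_MINUTE) {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter(Boolean);
  const kept = [];
  for (const sentence of sentences) {
    const candidate = [...kept, sentence].join(' ');
    if (kept.length > 0 && estimateSpeechSeconds(candidate, wordsPerMinute) > maxSeconds) break;
    kept.push(sentence);
  }
  return kept.join(' ');
}
```

Cutting at sentence boundaries keeps the spoken reply coherent, which matters more in audio than in text because the listener cannot skim.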

Checkpoint

Which techniques help optimize voice quality? Which metrics does Voice Analytics track?

7

📚 Practice Exercises

Exercises
  1. Build a Telegram voice assistant (STT, AI, TTS)
  2. Create a voice FAQ bot with a RAG pipeline
  3. Implement a voice command system with confirmation
  4. Build a multi-turn voice conversation with memory

Checkpoint

Have you built a complete voice assistant? Is the voice pipeline's response time acceptable?

🚀 Next Lesson

Next lesson: Capstone Project →