
Speech-to-Text

Integrate Speech-to-Text into n8n workflows: Whisper, audio processing

🎤 Speech-to-Text

0

🎯 Learning objectives

Avg. 5 min

After this lesson, you will be able to:

✅ Understand the STT architecture and the audio processing flow

✅ Integrate OpenAI Whisper into n8n workflows

✅ Build a Voice Command Workflow

✅ Create a Meeting Transcription pipeline

✅ Handle multi-language audio

✅ Build a Voice Note Processor with auto-classification

Turn audio into text inside n8n workflows. Typical uses: voice commands, transcription, and meeting notes.

1

🔍 STT Architecture

Avg. 5 min
[Diagram: STT architecture]

Checkpoint

Which steps make up the STT pipeline? Which sources can the audio input come from?

2

🛠️ OpenAI Whisper in n8n

Avg. 5 min

Setup

JavaScript
// OpenAI node: Audio Transcription
// Model: whisper-1
// Input: Audio file (mp3, wav, m4a, webm)
// Max file size: 25MB
// Supported languages: 98+ languages

// Configuration:
// - Model: whisper-1
// - Response Format: json (or text, srt, vtt)
// - Language: auto-detect or specify (vi, en, ja, etc.)
// - Temperature: 0 (for most accurate)
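The format and size limits listed above can be checked before the file ever reaches the Whisper node, so that unsupported uploads fail early with a readable message. The following is a minimal sketch, not part of the lesson's workflow: `validateAudioFile`, `MAX_BYTES`, and `ALLOWED_EXTENSIONS` are hypothetical names, and 25MB is assumed to mean 25 * 1024 * 1024 bytes.

```javascript
// Limits copied from the setup notes; the byte interpretation of "25MB" is an assumption.
const MAX_BYTES = 25 * 1024 * 1024;
const ALLOWED_EXTENSIONS = ["mp3", "wav", "m4a", "webm"];

// Hypothetical helper: checks a file name and size, returns { ok, reason }.
function validateAudioFile(fileName, sizeBytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();

  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, reason: `unsupported format: ${ext || "none"}` };
  }
  if (sizeBytes > MAX_BYTES) {
    return { ok: false, reason: "file exceeds 25MB; split it first" };
  }
  return { ok: true, reason: null };
}
```

In a Code node, a check like this could run on the incoming file's name and size, with the `ok: false` branch routed to an error response instead of the transcription step.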

Basic Transcription Workflow

[Diagram: basic transcription workflow]
JavaScript
// Webhook receives audio file
// OpenAI Whisper transcribes
// Return transcript

// Result format (language and duration appear only when the
// response format includes them, e.g. verbose_json):
{
  "text": "Xin chào, tôi cần hỗ trợ về đơn hàng số 12345.",
  "language": "vi",
  "duration": 5.2
}
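Downstream nodes usually need only the transcript string, while the shape of the result depends on the chosen response format. The sketch below shows one way to bring both shapes into a single object; `normalizeTranscript` is a hypothetical helper, and the field names follow the result example above.

```javascript
// Hypothetical helper: accepts either a plain string response or a JSON
// result object and always returns { text, language, duration }.
function normalizeTranscript(result) {
  // Plain-text style responses arrive as a string.
  if (typeof result === "string") {
    return { text: result.trim(), language: null, duration: null };
  }
  const text = typeof result?.text === "string" ? result.text.trim() : "";
  return {
    text,
    language: result?.language ?? null,
    duration: typeof result?.duration === "number" ? result.duration : null,
  };
}
```

Missing fields come back as `null` rather than `undefined`, which keeps later expressions in the workflow from failing on absent keys.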

Checkpoint

Which audio formats does Whisper support? How do you configure it for the highest accuracy?

3

⚡ Voice Command Workflow

Avg. 5 min
[Diagram: voice command workflow]
JavaScript
// Code node: Parse voice command
// (assumes the node runs in "Run Once for Each Item" mode, where $json is the current item)
const transcript = $json.text;

const parsePrompt = `
Parse this voice command and extract the intent and parameters:

Voice: "${transcript}"

Possible intents:
- send_email (to, subject, body)
- create_task (title, due_date, priority)
- search (query)
- question (question_text)
- reminder (text, time)

Return JSON:
{
  "intent": "...",
  "params": {...},
  "confidence": 0.0-1.0
}`;

// Pass the prompt on to the LLM node that performs the parsing
return { json: { prompt: parsePrompt } };
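The prompt above asks the model for JSON, but model output is not guaranteed to be valid or to use one of the listed intents. The following is a sketch of a defensive parsing step that could sit after the LLM node. `parseIntentResponse` and the `0.6` confidence threshold are assumptions for illustration, not values from the lesson.

```javascript
// Intents copied from the prompt above.
const KNOWN_INTENTS = ["send_email", "create_task", "search", "question", "reminder"];

// Hypothetical helper: turns raw model output into a safe { intent, params, confidence }.
function parseIntentResponse(raw, minConfidence = 0.6) {
  const unknown = (confidence) => ({ intent: "unknown", params: {}, confidence });

  // Models sometimes wrap JSON in a Markdown code fence; strip the fence markers.
  const cleaned = String(raw).replace(/`{3}(?:json)?/gi, "").trim();

  let parsed;
  try {
    parsed = JSON.parse(cleaned);
  } catch {
    return unknown(0);
  }
  if (parsed === null || typeof parsed !== "object") {
    return unknown(0);
  }

  const numeric = Number(parsed.confidence);
  const confidence = Number.isFinite(numeric) ? numeric : 0;

  // Reject intents outside the list and answers the model itself is unsure about.
  if (!KNOWN_INTENTS.includes(parsed.intent) || confidence < minConfidence) {
    return unknown(confidence);
  }
  return { intent: parsed.intent, params: parsed.params ?? {}, confidence };
}
```

Routing on the returned `intent` (for example with a Switch node) then only has to handle the five known values plus `unknown`.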

Checkpoint

How does the Voice Command Workflow parse intent? Which common intents need to be supported?

4

📝 Meeting Transcription

Avg. 5 min
JavaScript
// Workflow: Transcribe meeting audio → Summary → Action items

// Step 1: Transcribe
// OpenAI Whisper node
// (the prompts below assume its output text was mapped to $json.transcript)

// Step 2: Summarize
const summaryPrompt = `
Summarize this meeting:

Transcript:
${$json.transcript}

Generate:
1. Meeting Summary (3-5 sentences)
2. Key Decisions Made
3. Action Items (with assigned person if mentioned)
4. Follow-up Topics
5. Next Steps

Format as Markdown.`;

// Step 3: Extract action items
const actionPrompt = `
From this meeting transcript, extract all action items:

${$json.transcript}

Return JSON array:
[
  {
    "task": "description",
    "assignedTo": "person name or unassigned",
    "deadline": "mentioned deadline or none",
    "priority": "high/medium/low"
  }
]`;

// Hand both prompts to the LLM nodes that follow
return { json: { summaryPrompt, actionPrompt } };
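Once the model answers the action-item prompt, its JSON array still has to be parsed and cleaned before it is sent to a task tool or a notes document. The following is a sketch under the assumption that the model's raw reply arrives as a string; `normalizeActionItems` and `toMarkdownChecklist` are hypothetical helpers, and the fallback priority `medium` is an arbitrary choice.

```javascript
const PRIORITIES = ["high", "medium", "low"];

// Hypothetical helper: parses the model's reply and fills in missing fields.
function normalizeActionItems(raw) {
  let items;
  try {
    items = JSON.parse(raw);
  } catch {
    return []; // unparseable reply: treat as "no action items"
  }
  if (!Array.isArray(items)) return [];

  return items
    .filter((item) => item && typeof item.task === "string" && item.task.trim() !== "")
    .map((item) => ({
      task: item.task.trim(),
      assignedTo: item.assignedTo || "unassigned",
      deadline: item.deadline || "none",
      priority: PRIORITIES.includes(item.priority) ? item.priority : "medium",
    }));
}

// Hypothetical helper: renders the items as a Markdown checklist.
function toMarkdownChecklist(items) {
  return items
    .map((i) => `- [ ] ${i.task} (owner: ${i.assignedTo}, due: ${i.deadline}, priority: ${i.priority})`)
    .join("\n");
}
```

The checklist form matches the Markdown output requested in the summary prompt, so both results can be appended to the same meeting note.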

Checkpoint

Which steps make up the Meeting Transcription pipeline? How do you extract action items from the transcript?

5

🌍 Audio File Processing & Multi-Language

Avg. 5 min

Audio File Processing

JavaScript
// Code node: Handle different audio sources

// Source 1: Direct upload via webhook
// Content-Type: multipart/form-data

// Source 2: Download from URL
// HTTP Request node → Download audio file

// Source 3: Record from Telegram voice message
// Telegram trigger → Download voice file

// Source 4: Google Drive audio
// Google Drive node → Download file

// All feed into Whisper for transcription
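Because each source above may store the file under a different binary property, a small lookup step can locate the audio before it is handed to Whisper. The sketch below assumes the usual n8n item shape, `{ json, binary }`, where each binary entry carries `mimeType` and `fileName`. `findAudioBinary` is a hypothetical helper, and treating `video/webm` as audio is an assumption made for browser recordings.

```javascript
// Hypothetical helper: returns the first binary property that looks like audio,
// or null when the item carries none.
function findAudioBinary(item) {
  const binary = item?.binary ?? {};
  for (const [property, file] of Object.entries(binary)) {
    const mimeType = String(file?.mimeType ?? "");
    if (mimeType.startsWith("audio/") || mimeType === "video/webm") {
      return { property, fileName: file.fileName ?? "audio", mimeType };
    }
  }
  return null;
}
```

The returned `property` name can then be used as the input binary field of the transcription node, whichever source produced the item.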

Multi-Language Support

JavaScript
// Whisper supports 98+ languages
// Auto-detection is usually accurate

// For explicit language setting:
// OpenAI node → Language: "vi" (Vietnamese)

// For multi-language meetings:
const postProcessPrompt = `
This meeting transcript contains multiple languages.
Identify each speaker's language and translate everything to Vietnamese.

Transcript: ${$json.transcript}

Output format:
[Speaker 1 (English)]: Original -> Translation
[Speaker 2 (Vietnamese)]: Text as-is
`;
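The choice between auto-detection and an explicit language can be made per request. The following sketch builds the transcription options and sets a language only when the hint is one of the codes the workflow expects. `buildTranscriptionOptions` and the `EXPECTED_LANGUAGES` allow-list are hypothetical; the model name and temperature come from the setup section.

```javascript
// Hypothetical allow-list; extend it with the ISO 639-1 codes you expect.
const EXPECTED_LANGUAGES = new Set(["vi", "en", "ja"]);

// Hypothetical helper: returns the options for one transcription request.
function buildTranscriptionOptions(languageHint) {
  const options = { model: "whisper-1", temperature: 0 };
  const code = String(languageHint ?? "").trim().toLowerCase();
  if (EXPECTED_LANGUAGES.has(code)) {
    options.language = code; // explicit hint
  }
  // Without a recognized hint no language key is set, which leaves detection to Whisper.
  return options;
}
```

For a multi-language meeting the hint would be left empty, and the post-processing prompt above handles translation afterwards.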

Checkpoint

How many languages does Whisper support? How do you handle multi-language audio?

6

📋 Voice Note Processing

Avg. 5 min
[Diagram: voice note processing]
JavaScript
// Quick voice note processor
// "Email John about the meeting tomorrow at 3pm"
// → Intent: email
// → To: John
// → Subject: Meeting tomorrow
// → Time context: 3pm

// "Remind me to call the client on Friday"
// → Intent: reminder
// → Task: Call client
// → When: Friday
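The examples above map a spoken sentence to an intent. As a rough illustration only, the sketch below classifies notes with a few keyword rules. This is a simplified stand-in and not the lesson's classifier, which may well rely on an LLM prompt like the one in the voice command section. `RULES`, `classifyVoiceNote`, and the fallback intent `note` are all hypothetical.

```javascript
// Hypothetical keyword rules; the first matching rule wins.
const RULES = [
  { intent: "reminder", pattern: /^\s*remind\b/i },
  { intent: "email", pattern: /^\s*(e-?mail|send an e-?mail)\b/i },
  { intent: "task", pattern: /^\s*(create|add)\b.*\btask\b/i },
];

// Hypothetical helper: returns the intent of a transcribed voice note.
function classifyVoiceNote(text) {
  const rule = RULES.find((r) => r.pattern.test(text));
  return rule ? rule.intent : "note"; // anything unmatched is kept as a plain note
}
```

A rule-based pass like this could serve as a cheap first filter or as a fallback when the LLM call fails, while parameter extraction (recipient, time, subject) would still need the model.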
STT Tips
  • Audio quality: microphone quality has a large effect on accuracy
  • File size: Whisper accepts at most 25MB; split larger files
  • Language hint: specify the language when you know it in advance to improve accuracy
  • Post-processing: always run an AI post-processing step to fix transcription errors
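The file size tip says to split larger files but not where to cut. One option is to plan time ranges so that each piece stays under the limit. The sketch below assumes a roughly constant bitrate, so that size scales with duration. `planChunks` and the 24MB safety margin are hypothetical, and the actual cutting would be done by an audio tool outside this snippet.

```javascript
// Hypothetical helper: returns [{ start, end }] ranges in seconds so that each
// chunk should stay below maxBytes (assuming a roughly constant bitrate).
function planChunks(fileSizeBytes, durationSec, maxBytes = 24 * 1024 * 1024) {
  if (fileSizeBytes <= maxBytes) {
    return [{ start: 0, end: durationSec }];
  }
  const chunkCount = Math.ceil(fileSizeBytes / maxBytes);
  const chunkLength = durationSec / chunkCount;

  const chunks = [];
  for (let i = 0; i < chunkCount; i++) {
    chunks.push({
      start: i * chunkLength,
      end: Math.min((i + 1) * chunkLength, durationSec),
    });
  }
  return chunks;
}
```

Each range can be transcribed separately and the resulting texts joined in order. Cutting at silences rather than at fixed times would avoid splitting words, at the cost of a more involved plan.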

Checkpoint

What does the Voice Note Processor base its intent classification on? Which tips help improve STT accuracy?

7

📚 Practice exercises

Avg. 5 min
Exercises
  1. Build a basic STT workflow: upload audio, get a transcript
  2. Create a voice command parser (email, task, search)
  3. Build a meeting transcription and summary workflow
  4. Create a voice note processor with auto-classification

Checkpoint

Have you built a complete STT workflow? Does your voice command parser classify intents correctly?

🚀 Next lesson

Text-to-Speech →