
Text-to-Speech

Integrating Text-to-Speech into n8n workflows with OpenAI TTS and ElevenLabs

🔊 Text-to-Speech

0

🎯 Learning Objectives

Avg. 5 min

After this lesson, you will be able to:

✅ Compare the main TTS options: OpenAI, ElevenLabs, and Google Cloud

✅ Integrate OpenAI TTS into n8n workflows

✅ Choose a voice that fits your use case

✅ Build an AI voice response pipeline

✅ Handle long text with a chunking strategy

✅ Create an audio newsletter generator

Turn text into natural-sounding speech for voice responses, audiobooks, and accessibility.

1

🔍 TTS Options

Avg. 5 min

Checkpoint

Compare the three TTS providers: OpenAI, ElevenLabs, and Google Cloud. When should you use each one?

2

🛠️ OpenAI TTS trong n8n

Avg. 5 min
JavaScript
// HTTP Request node: OpenAI TTS API
// Method: POST
// URL: https://api.openai.com/v1/audio/speech

const ttsRequest = {
  model: "tts-1",          // or "tts-1-hd" for higher quality
  input: $json.text,
  voice: "alloy",          // Options: alloy, echo, fable, onyx, nova, shimmer
  response_format: "mp3",  // mp3, opus, aac, flac
  speed: 1.0               // 0.25 to 4.0
};

// Headers:
//   Authorization: Bearer $OPENAI_API_KEY
//   Content-Type: application/json

// Response: binary audio data (mp3)
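
The request above can also be assembled and validated in plain JavaScript before it reaches the HTTP Request node. The following is a minimal sketch, not the lesson's own implementation: `buildOpenAiTtsRequest` and its option names are hypothetical, while the URL, headers, body fields, speed range, and 4096-character limit are the ones stated in this lesson.

```javascript
// Hypothetical helper: assembles the URL and fetch-style options for the
// OpenAI speech endpoint described above. Nothing is sent over the network.
const OPENAI_TTS_URL = "https://api.openai.com/v1/audio/speech";
const MAX_INPUT_CHARS = 4096; // per-request limit mentioned in this lesson

function buildOpenAiTtsRequest(text, apiKey, options = {}) {
  const { model = "tts-1", voice = "alloy", format = "mp3", speed = 1.0 } = options;

  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text must be a non-empty string");
  }
  if (text.length > MAX_INPUT_CHARS) {
    throw new Error(`text exceeds ${MAX_INPUT_CHARS} characters; split it first`);
  }
  if (speed < 0.25 || speed > 4.0) {
    throw new RangeError("speed must be between 0.25 and 4.0");
  }

  return {
    url: OPENAI_TTS_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, input: text, voice, response_format: format, speed }),
    },
  };
}

// Usage outside n8n (Node 18+ ships a global fetch):
// const { url, init } = buildOpenAiTtsRequest("Hello", process.env.OPENAI_API_KEY);
// const audio = Buffer.from(await (await fetch(url, init)).arrayBuffer());
```

Failing early on an oversized input or an out-of-range speed keeps the error inside the workflow step that caused it, rather than surfacing later as an API error response.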

Voice Options

| Voice | Characteristics | Best For |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Male, warm | Narration |
| fable | Expressive, British | Storytelling |
| onyx | Deep, authoritative | Business |
| nova | Female, friendly | Customer service |
| shimmer | Female, warm | Audiobooks |
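
One way to apply the table in a workflow is a small lookup that maps a use case to a voice. This is only a sketch: the use-case keys and the `pickVoice` name are hypothetical, and the voice names and pairings come from the table above.

```javascript
// Hypothetical use-case keys mapped to the voices listed in the table.
const VOICE_BY_USE_CASE = new Map([
  ["general", "alloy"],
  ["narration", "echo"],
  ["storytelling", "fable"],
  ["business", "onyx"],
  ["customer-service", "nova"],
  ["audiobook", "shimmer"],
]);

// Returns the voice for a use case, falling back to the neutral "alloy" voice.
function pickVoice(useCase) {
  const key = String(useCase).trim().toLowerCase();
  return VOICE_BY_USE_CASE.get(key) ?? "alloy";
}
```

In an n8n Code node, the result could be written to the item (for example as `voice`) and referenced from the HTTP Request node body.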

Checkpoint

Which voices does OpenAI TTS offer? How do you choose the right voice for each use case?

3

⚡ Basic TTS Workflow

Avg. 5 min
JavaScript
// Workflow: convert text to speech and send it via Telegram

// Step 1: Prepare text (clean, add pauses)
function prepareForTTS(text) {
  return text
    .replace(/\n\n/g, '... ')  // Paragraph breaks become pauses
    .replace(/\n/g, '. ')      // Line breaks become periods
    .replace(/[*_#]/g, '')     // Remove markdown formatting
    .substring(0, 4096);       // TTS limit per request
}

// Step 2: Call TTS API (HTTP Request node)
// Step 3: Send audio via Telegram (Telegram node: Send Audio)
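
The cleaning step above appends a period at every line break, so a line that already ends with punctuation ends up with doubled punctuation. A possible variant is sketched below; `prepareForTtsStrict` is a hypothetical name, and treating an existing `.`, `!`, or `?` as a sufficient pause is an assumption rather than something this lesson specifies.

```javascript
// Hypothetical variant of prepareForTTS that avoids doubled punctuation.
function prepareForTtsStrict(text, maxChars = 4096) {
  return text
    .replace(/\r\n/g, "\n")         // normalize Windows line endings
    .replace(/[*_#]/g, "")          // remove markdown markers
    .replace(/[ \t]+\n/g, "\n")     // drop trailing spaces before line breaks
    .replace(/([.!?])\n+/g, "$1 ")  // line already ends a sentence: add only a space
    .replace(/\n{2,}/g, "... ")     // remaining paragraph breaks become pauses
    .replace(/\n/g, ". ")           // remaining line breaks become periods
    .replace(/ {2,}/g, " ")         // collapse repeated spaces
    .trim()
    .substring(0, maxChars);        // stay within the per-request limit
}

const sample = "# Title\n\nFirst line.\nSecond line\nThird *bold* line.";
console.log(prepareForTtsStrict(sample));
// prints "Title... First line. Second line. Third bold line."
```

Like the original, this strips every underscore, so identifiers such as `snake_case` would be altered; text containing code may need a narrower pattern.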

Checkpoint

Why should text be prepared before it is sent to TTS? How should the text be cleaned?

4

🎵 ElevenLabs Integration

Avg. 5 min
JavaScript
// HTTP Request node: ElevenLabs TTS
// More natural voices, voice cloning supported
// Note: voiceId must be defined beforehand (the ID of the ElevenLabs voice to use).

const elevenLabsRequest = {
  url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
  method: "POST",
  headers: {
    "xi-api-key": process.env.ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
  },
  body: {
    text: $json.text,
    model_id: "eleven_multilingual_v2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75,
      style: 0.5,
      use_speaker_boost: true
    }
  }
};

// Response: binary audio stream
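
When voice settings come from user input or another node, it can help to clamp them before building the request. The sketch below assumes that `stability`, `similarity_boost`, and `style` are expected in the 0 to 1 range, which is consistent with the values shown above; `buildElevenLabsRequest` and `clamp01` are hypothetical helpers.

```javascript
// Clamp a value into [0, 1]; use the fallback when it is missing or not numeric.
const clamp01 = (value, fallback) => {
  const n = Number(value);
  return Number.isFinite(n) ? Math.min(1, Math.max(0, n)) : fallback;
};

// Hypothetical helper: returns a request description in the same shape as
// the object above, with defaults taken from this lesson.
function buildElevenLabsRequest(voiceId, text, apiKey, settings = {}) {
  if (!voiceId) throw new Error("voiceId is required");
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: {
      text,
      model_id: "eleven_multilingual_v2",
      voice_settings: {
        stability: clamp01(settings.stability, 0.5),
        similarity_boost: clamp01(settings.similarity_boost, 0.75),
        style: clamp01(settings.style, 0.5),
        use_speaker_boost: settings.use_speaker_boost ?? true,
      },
    },
  };
}
```

For example, `buildElevenLabsRequest("abc", "Hello", key, { stability: 1.7 })` would send a stability of 1 and keep the other defaults.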

Checkpoint

Which features does ElevenLabs offer that OpenAI TTS does not? How do the voice settings affect the output?

5

🤖 AI Voice Response Pipeline

Avg. 5 min
JavaScript
// Code node: optimize text for speech
function optimizeForSpeech(text) {
  let speech = text;

  // Convert abbreviations to spoken form (\b avoids matches inside longer words)
  speech = speech.replace(/\bAPI\b/g, 'A P I');
  speech = speech.replace(/\bURL\b/g, 'U R L');
  speech = speech.replace(/\bSQL\b/g, 'sequel');

  // Add natural pauses, but only where punctuation ends a phrase,
  // so decimals like 3.14 and times like 10:30 stay intact
  speech = speech.replace(/\.(\s+|$)/g, '.\n');
  speech = speech.replace(/:(\s+|$)/g, '... ');

  // Remove elements that don't sound natural
  speech = speech.replace(/\[.*?\]/g, '');  // Remove citations
  speech = speech.replace(/\|/g, ', ');     // Table pipes to commas

  return speech;
}

return { json: { speechText: optimizeForSpeech($json.aiResponse) } };
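
As the list of abbreviations grows, a dictionary can replace the individual `replace` calls. The sketch below is one possible shape: `expandAbbreviations` and `SPOKEN_FORMS` are hypothetical names, the first three entries come from the code above, and the `TTS` entry is an illustrative addition.

```javascript
// Abbreviation -> spoken form. The first three entries mirror the lesson code.
const SPOKEN_FORMS = new Map([
  ["API", "A P I"],
  ["URL", "U R L"],
  ["SQL", "sequel"],
  ["TTS", "T T S"], // illustrative addition, not from the lesson
]);

// Replaces each abbreviation only where it appears as a whole word.
function expandAbbreviations(text, spokenForms = SPOKEN_FORMS) {
  let result = text;
  for (const [abbr, spoken] of spokenForms) {
    // Escape regex metacharacters so entries like "C++" would not break the pattern.
    const escaped = abbr.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    result = result.replace(new RegExp(`\\b${escaped}\\b`, "g"), spoken);
  }
  return result;
}

console.log(expandAbbreviations("The API returns a URL. RAPID is untouched."));
// prints "The A P I returns a U R L. RAPID is untouched."
```

Word boundaries mean plural forms such as `APIs` are left unchanged; they would need their own dictionary entries.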

Checkpoint

Why should text be optimized for speech? Which abbreviations need to be converted to their spoken form?

6

📄 Long Text Processing & Audio Newsletter

Avg. 5 min

Long Text Processing

JavaScript
// OpenAI TTS limit: 4096 characters per request
// For longer texts: split into chunks at sentence boundaries

function splitForTTS(text, maxChars = 4000) {
  // Match full sentences, plus any trailing text without closing punctuation
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [text];
  const chunks = [];
  let current = '';

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    const candidate = current ? current + ' ' + sentence : sentence;
    if (candidate.length > maxChars && current) {
      chunks.push(current);   // close the current chunk before it overflows
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);

  // Note: a single sentence longer than maxChars still becomes one oversized chunk
  return chunks;
}

// Process each chunk → Concatenate audio files
const chunks = splitForTTS($json.longText);
return chunks.map((c, i) => ({ json: { text: c, index: i } }));
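
The final step, joining the audio returned for each chunk, could look like the sketch below. It assumes each part carries the `index` emitted by the chunking code and an `audio` Buffer holding that chunk's MP3 bytes; `concatAudioChunks` and the `audio` field are hypothetical. Plain byte concatenation of MP3 data usually plays back, but it can leave small gaps or stray metadata, so a dedicated audio tool may be a better fit when quality matters.

```javascript
// Joins per-chunk audio buffers in index order.
// parts: array of { index: number, audio: Buffer }
function concatAudioChunks(parts) {
  const ordered = [...parts].sort((a, b) => a.index - b.index);

  // Fail loudly if a chunk is missing or duplicated instead of producing broken audio.
  ordered.forEach((part, position) => {
    if (part.index !== position) {
      throw new Error(`missing audio chunk at index ${position}`);
    }
  });

  return Buffer.concat(ordered.map((part) => part.audio));
}
```

Sorting by `index` matters because parallel TTS requests may finish out of order.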

Audio Newsletter

JavaScript
// Weekly: convert the newsletter to audio format
// Schedule (weekly) → Fetch newsletter → Clean text → TTS → Upload to S3 → Send link

const newsletterPrompt = `
Convert this newsletter into a natural script for audio reading.

Newsletter content:
${$json.newsletterContent}

Script rules:
- Add intro: "Welcome to this week's newsletter"
- Transition between sections: "Moving on to..."
- Add outro: "That's all for this week"
- Make it conversational, not robotic
- Remove URLs and links (say "check the link in the show notes")
`;

TTS Cost Comparison

| Provider | Quality | Cost per 1M chars | Vietnamese |
|---|---|---|---|
| OpenAI TTS-1 | Good | $15 | Limited |
| OpenAI TTS-1-HD | Better | $30 | Limited |
| ElevenLabs | Best | $30-99 | Good |
| Google Cloud | Good | $4-16 | Good |
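
To turn the table into a per-job estimate, multiply the price per million characters by the text length. The sketch below uses only the figures from the table; the provider keys and `estimateTtsCost` are hypothetical names, and actual pricing may differ from these numbers.

```javascript
// USD per 1M characters, copied from the comparison table above.
const PRICE_PER_MILLION_CHARS_USD = new Map([
  ["openai-tts-1", { min: 15, max: 15 }],
  ["openai-tts-1-hd", { min: 30, max: 30 }],
  ["elevenlabs", { min: 30, max: 99 }],
  ["google-cloud", { min: 4, max: 16 }],
]);

// Returns the estimated cost range in USD for a given number of characters.
function estimateTtsCost(provider, charCount) {
  const price = PRICE_PER_MILLION_CHARS_USD.get(provider);
  if (!price) throw new Error(`unknown provider: ${provider}`);
  return {
    min: (price.min * charCount) / 1_000_000,
    max: (price.max * charCount) / 1_000_000,
  };
}

// A 200,000-character newsletter archive with OpenAI TTS-1:
console.log(estimateTtsCost("openai-tts-1", 200_000)); // { min: 3, max: 3 }
```

Providers listed with a price range return different `min` and `max` values, which gives a rough budget window rather than a single figure.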

Checkpoint

When text is longer than 4096 characters, how do you handle chunking? Compare costs across the TTS providers.

7

📚 Practice Exercises

Avg. 5 min
Exercises
  1. Build a basic TTS workflow: text input, audio output
  2. Create a voice response for a RAG chatbot
  3. Build an audio newsletter generator
  4. Test different voices and compare quality

Checkpoint

Have you built a TTS workflow yet? Which voice best fits your use case?

🚀 Next Lesson

Next lesson: Voice Workflows →