🔊 Text-to-Speech
🎯 Learning Objectives
After this lesson, you will:
✅ Understand the TTS options: OpenAI, ElevenLabs, Google Cloud
✅ Integrate OpenAI TTS into n8n workflows
✅ Know how to choose the right voice for a use case
✅ Build an AI Voice Response Pipeline
✅ Handle long text with a chunking strategy
✅ Create an Audio Newsletter generator
Turn text into natural-sounding speech for voice responses, audiobooks, and accessibility.
🔍 TTS Options
Checkpoint
Compare the three TTS providers: OpenAI, ElevenLabs, and Google Cloud. When should you use each one?
🛠️ OpenAI TTS in n8n
```javascript
// HTTP Request node: OpenAI TTS API
// Method: POST
// URL: https://api.openai.com/v1/audio/speech

const ttsRequest = {
  model: "tts-1",          // or "tts-1-hd" for higher quality
  input: $json.text,
  voice: "alloy",          // Options: alloy, echo, fable, onyx, nova, shimmer
  response_format: "mp3",  // mp3, opus, aac, flac
  speed: 1.0               // 0.25 to 4.0
};

// Headers:
// Authorization: Bearer $OPENAI_API_KEY
// Content-Type: application/json

// Response: Binary audio data (mp3)
```
Voice Options
| Voice | Characteristics | Best For |
|---|---|---|
| alloy | Neutral, balanced | General purpose |
| echo | Male, warm | Narration |
| fable | Expressive, British | Storytelling |
| onyx | Deep, authoritative | Business |
| nova | Female, friendly | Customer service |
| shimmer | Female, warm | Audiobooks |
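The table can be turned into a small lookup so a workflow picks a voice from a use-case label instead of hard-coding one. This is a minimal sketch: the voice names and use cases come from the table, while the lookup keys (`"narration"`, `"customer-service"`, and so on) and the helper names `pickVoice` and `buildSpeechBody` are hypothetical and chosen only for illustration.

```javascript
// Voice names and use cases are taken from the table above.
// The lookup keys are hypothetical labels for this sketch.
const VOICE_BY_USE_CASE = {
  general: "alloy",
  narration: "echo",
  storytelling: "fable",
  business: "onyx",
  "customer-service": "nova",
  audiobook: "shimmer",
};

// Returns the voice for a use case, falling back to the neutral "alloy".
function pickVoice(useCase) {
  const key = String(useCase || "").trim().toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(VOICE_BY_USE_CASE, key);
  return known ? VOICE_BY_USE_CASE[key] : "alloy";
}

// Builds the JSON body for the speech request, checking the documented speed range.
function buildSpeechBody(text, useCase, speed = 1.0) {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text must be a non-empty string");
  }
  if (speed < 0.25 || speed > 4.0) {
    throw new RangeError("speed must be between 0.25 and 4.0");
  }
  return {
    model: "tts-1",
    input: text,
    voice: pickVoice(useCase),
    response_format: "mp3",
    speed,
  };
}
```

In an n8n Code node, something like `buildSpeechBody($json.text, $json.useCase)` could produce the body passed to the HTTP Request node, assuming the incoming item carries a `useCase` field.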
Checkpoint
Which voices does OpenAI TTS offer? How do you choose the right voice for each use case?
⚡ Basic TTS Workflow
```javascript
// Workflow: Convert text to speech and send via Telegram

// Step 1: Prepare text (clean, add pauses)
function prepareForTTS(text) {
  return text
    .replace(/\n\n/g, '... ')  // Paragraphs to pauses
    .replace(/\n/g, '. ')      // Line breaks to periods
    .replace(/[*_#]/g, '')     // Remove markdown formatting
    .substring(0, 4096);       // TTS limit per request
}

// Step 2: Call TTS API (HTTP Request node)
// Step 3: Send audio via Telegram (Telegram node: Send Audio)
```
Checkpoint
Why does text need to be prepared before it is sent to TTS? How should the text be cleaned?
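One way to approach the question: a TTS engine reads whatever it receives, so markdown markers, link syntax, and raw URLs are either spoken literally or cause odd pauses. The sketch below goes a little further than the basic preparation step by unwrapping markdown links, dropping bare URLs, and avoiding doubled periods at line breaks. The function name `cleanForSpeech` and the exact rules are assumptions for illustration, not a fixed recipe.

```javascript
// Hypothetical helper: cleans markdown-flavored text before sending it to a TTS API.
function cleanForSpeech(text) {
  return text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1") // [label](url) -> label
    .replace(/https?:\/\/\S+/g, "")          // drop bare URLs
    .replace(/[*_#>]/g, "")                  // strip markdown markers
    // Line breaks become sentence breaks; keep existing punctuation if present
    .replace(/([.!?:;,])?\s*\n\s*/g, (_, punct) => (punct ? punct + " " : ". "))
    .replace(/[ \t]{2,}/g, " ")              // collapse runs of spaces
    .trim();
}

const sample =
  "# Weekly Update\n\nRead the **full post** at [our blog](https://example.com/post).\nThanks!";

console.log(cleanForSpeech(sample));
// prints "Weekly Update. Read the full post at our blog. Thanks!"
```

The 4096-character limit still applies after cleaning, so truncation or chunking would follow this step.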
🎵 ElevenLabs Integration
```javascript
// HTTP Request node: ElevenLabs TTS
// More natural voices, voice cloning supported

const elevenLabsRequest = {
  url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
  method: "POST",
  headers: {
    "xi-api-key": process.env.ELEVENLABS_API_KEY,
    "Content-Type": "application/json"
  },
  body: {
    text: $json.text,
    model_id: "eleven_multilingual_v2",
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.75,
      style: 0.5,
      use_speaker_boost: true
    }
  }
};

// Response: Binary audio stream
```
Checkpoint
Which features does ElevenLabs offer that OpenAI TTS does not? How do the voice settings affect the output?
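When experimenting with voice settings, it helps to merge overrides onto a known baseline and keep values in range before sending the request. The sketch below uses the defaults from the request above and assumes that `stability`, `similarity_boost`, and `style` are expected to lie between 0 and 1; check the ElevenLabs documentation for the exact ranges and semantics. The helper names `clamp01` and `normalizeVoiceSettings` are hypothetical.

```javascript
// Baseline values copied from the lesson's request body.
const DEFAULT_VOICE_SETTINGS = {
  stability: 0.5,
  similarity_boost: 0.75,
  style: 0.5,
  use_speaker_boost: true,
};

// Clamps a value into [0, 1]; non-numeric input falls back to the default.
function clamp01(value, fallback) {
  const n = Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(1, Math.max(0, n));
}

// Merges overrides onto the defaults and clamps the numeric fields.
// Assumption: the three numeric settings accept values from 0 to 1.
function normalizeVoiceSettings(overrides = {}) {
  const merged = { ...DEFAULT_VOICE_SETTINGS, ...overrides };
  return {
    stability: clamp01(merged.stability, DEFAULT_VOICE_SETTINGS.stability),
    similarity_boost: clamp01(merged.similarity_boost, DEFAULT_VOICE_SETTINGS.similarity_boost),
    style: clamp01(merged.style, DEFAULT_VOICE_SETTINGS.style),
    use_speaker_boost: Boolean(merged.use_speaker_boost),
  };
}
```

A practical way to hear what each setting does is to generate the same sentence several times while changing one value at a time, for example `normalizeVoiceSettings({ stability: 0.2 })` versus `normalizeVoiceSettings({ stability: 0.9 })`.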
🤖 AI Voice Response Pipeline
```javascript
// Code node: Optimize text for speech
function optimizeForSpeech(text) {
  let speech = text;

  // Convert abbreviations to spoken form
  // (\b keeps the match to whole words, so "RAPID" is not rewritten)
  speech = speech.replace(/\bAPI\b/g, 'A P I');
  speech = speech.replace(/\bURL\b/g, 'U R L');
  speech = speech.replace(/\bSQL\b/g, 'sequel');

  // Add natural pauses
  speech = speech.replace(/\./g, '.\n');
  speech = speech.replace(/:/g, '...');

  // Remove elements that don't sound natural
  speech = speech.replace(/\[.*?\]/g, ''); // Remove citations
  speech = speech.replace(/\|/g, ', ');    // Table pipes to commas

  return speech;
}

return { json: { speechText: optimizeForSpeech($json.aiResponse) } };
```
Checkpoint
Why does text need to be optimized for speech? Which abbreviations need to be converted to their spoken form?
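As the list of abbreviations grows, a lookup table is easier to maintain than one `replace` call per term. The sketch below is one possible shape, assuming whole-word, case-sensitive matching; the three entries come from the lesson, and the names `SPOKEN_FORMS` and `expandAbbreviations` are hypothetical.

```javascript
// Spoken forms from the lesson; extend this map for your own domain.
const SPOKEN_FORMS = {
  API: "A P I",
  URL: "U R L",
  SQL: "sequel",
};

// Replaces whole-word abbreviations with their spoken form in a single pass.
function expandAbbreviations(text, spokenForms = SPOKEN_FORMS) {
  const keys = Object.keys(spokenForms).sort((a, b) => b.length - a.length);
  if (keys.length === 0) return text;
  // Escape regex metacharacters so keys are matched literally
  const escaped = keys.map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const pattern = new RegExp(`\\b(${escaped.join("|")})\\b`, "g");
  return text.replace(pattern, (match) => spokenForms[match]);
}

console.log(expandAbbreviations("The API returns a URL from SQL; RAPID stays."));
// prints "The A P I returns a U R L from sequel; RAPID stays."
```

Which abbreviations to include depends on how the chosen voice pronounces them, so it is worth listening to a test clip before adding an entry.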
📄 Long Text Processing & Audio Newsletter
Long Text Processing
```javascript
// OpenAI TTS limit: 4096 characters per request
// For longer texts: split into chunks

function splitForTTS(text, maxChars = 4000) {
  // Split on sentence boundaries; the trailing `*` keeps a final
  // fragment that has no closing punctuation
  const sentences = text.match(/[^.!?]+[.!?]*/g) || [text];
  const chunks = [];
  let current = '';

  for (const sentence of sentences) {
    // Only flush when there is something to flush (avoids empty chunks)
    if (current && (current + sentence).length > maxChars) {
      chunks.push(current.trim());
      current = sentence;
    } else {
      current += sentence; // sentences keep their original leading spaces
    }
  }
  if (current.trim()) chunks.push(current.trim());

  // Note: a single sentence longer than maxChars still becomes one oversized chunk
  return chunks;
}

// Process each chunk → Concatenate audio files
const chunks = splitForTTS($json.longText);
return chunks.map((c, i) => ({ json: { text: c, index: i } }));
```
Audio Newsletter
```javascript
// Weekly: Convert newsletter to audio format
// Schedule (weekly) → Fetch newsletter → Clean text → TTS → Upload to S3 → Send link

const newsletterPrompt = `
Convert this newsletter into a natural script for audio reading.

Newsletter content:
${$json.newsletterContent}

Script rules:
- Add intro: "Welcome to this week's newsletter"
- Transition between sections: "Moving on to..."
- Add outro: "That's all for this week"
- Make it conversational, not robotic
- Remove URLs and links (say "check the link in the show notes")
`;
```
| Provider | Quality | Cost per 1M chars | Vietnamese |
|---|---|---|---|
| OpenAI TTS-1 | Good | $15 | Limited |
| OpenAI TTS-1-HD | Better | $30 | Limited |
| ElevenLabs | Best | $30-99 | Good |
| Google Cloud | Good | $4-16 | Good |
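To compare providers for a concrete workload, the per-million-character rates in the table can be scaled to the actual text length. The sketch below copies the rates from the table as low/high ranges; the provider keys and the function name `estimateCost` are hypothetical, and real pricing should be checked against each provider's current price list.

```javascript
// USD per 1M characters, copied from the table above as [low, high] ranges.
const COST_PER_MILLION_CHARS = {
  "openai-tts-1": [15, 15],
  "openai-tts-1-hd": [30, 30],
  elevenlabs: [30, 99],
  "google-cloud": [4, 16],
};

// Returns the estimated cost range in USD for a given number of characters.
function estimateCost(provider, charCount) {
  const range = COST_PER_MILLION_CHARS[provider];
  if (!range) throw new Error(`Unknown provider: ${provider}`);
  if (!Number.isFinite(charCount) || charCount < 0) {
    throw new RangeError("charCount must be a non-negative number");
  }
  const [low, high] = range.map((rate) => (rate * charCount) / 1e6);
  return { low, high };
}

// Example: a 20,000-character newsletter
for (const provider of Object.keys(COST_PER_MILLION_CHARS)) {
  const { low, high } = estimateCost(provider, 20000);
  console.log(`${provider}: $${low.toFixed(2)} to $${high.toFixed(2)}`);
}
```

At these rates, a 20,000-character newsletter costs about $0.30 with OpenAI TTS-1, so for small weekly volumes voice quality and language support are likely to matter more than price.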
Checkpoint
When text is longer than 4096 characters, how do you handle chunking? Compare the cost of the TTS providers.
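After each chunk has been converted, the audio pieces have to be joined back in order. The sketch below shows the simplest approach, assuming each result is available as a Node.js `Buffer` together with the chunk `index`; the `parts` shape and the function name `concatAudioChunks` are hypothetical. Byte-level joining usually plays back fine for MP3, but it is not guaranteed to be seamless, and formats with a file header (such as WAV) generally need a dedicated audio tool instead.

```javascript
// Joins per-chunk audio buffers in their original order.
// parts: array of { index: number, audio: Buffer } (hypothetical shape)
function concatAudioChunks(parts) {
  // Sort a copy so the caller's array is left untouched
  const ordered = [...parts].sort((a, b) => a.index - b.index);

  ordered.forEach((part, position) => {
    // Indices must be exactly 0..n-1, otherwise audio would be missing or repeated
    if (part.index !== position) {
      throw new Error(`Missing or duplicate chunk at position ${position}`);
    }
    if (!Buffer.isBuffer(part.audio)) {
      throw new TypeError(`Chunk ${part.index} has no audio buffer`);
    }
  });

  return Buffer.concat(ordered.map((part) => part.audio));
}
```

Checking the indices before joining guards against a failed TTS request silently dropping a paragraph from the final audio.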
📚 Practice Exercises
- Build a basic TTS workflow: text input, audio output
- Create a voice response for a RAG chatbot
- Build an audio newsletter generator
- Test different voices and compare their quality
Checkpoint
Have you built a TTS workflow yet? Which voice best fits your use case?
🚀 Next Lesson
Next lesson: Voice Workflows →
