Data Transformation
Data từ APIs/databases thường không đúng format bạn cần. Bài này dạy bạn transform data giữa các nodes.
🎯 Mục tiêu bài học
🎯 Mục tiêu
- Set node: Reshape data
- Merge node: Combine data sources
- Split/Aggregate: Handle arrays
- Filter & Sort: Clean data
- Build ETL pipelines
Checkpoint
Data transformation là gì? Tại sao cần transform data giữa các nodes?
📝 Set Node — Reshape Data
1. Set Node — Reshape Data
1.1 What is Set Node?
Set node tạo output mới với các fields bạn chọn — rename, reformat, add, remove fields.
1.2 Keep Only Selected Fields
1Input:2{3 "id": 1,4 "firstName": "Nguyen",5 "lastName": "Van A",6 "email": "a@email.com",7 "password": "hashed123",8 "internal_id": "INT001"9}10 11Set node (Keep Only Set):12- name: "{{ $json.firstName }} {{ $json.lastName }}"13- email: "{{ $json.email }}"14- fullName: "{{ $json.firstName + ' ' + $json.lastName }}"15 16Output:17{18 "name": "Nguyen Van A",19 "email": "a@email.com",20 "fullName": "Nguyen Van A"21}1.3 Add Calculated Fields
1Set node:2- total: {{ $json.price * $json.quantity }}3- taxAmount: {{ $json.price * $json.quantity * 0.1 }}4- grandTotal: {{ $json.price * $json.quantity * 1.1 }}5- status: {{ $json.quantity > 0 ? 'In Stock' : 'Out of Stock' }}6- processedAt: {{ $now.format('yyyy-MM-dd HH:mm') }}1.4 Rename Fields (API → Database mapping)
1API returns:2{ "user_name": "A", "user_email": "a@b.com", "created_at": "..." }3 4Set node:5- name: {{ $json.user_name }}6- email: {{ $json.user_email }}7- createdAt: {{ $json.created_at }}8 9Database expects:10{ "name": "A", "email": "a@b.com", "createdAt": "..." }Checkpoint
Set node có thể thực hiện những thao tác gì? Cho ví dụ "Keep Only Selected Fields".
🔀 Merge Node — Combine Data
2. Merge Node — Combine Data
2.1 Merge Modes
| Mode | Description | SQL Equivalent |
|---|---|---|
| Append | Combine items from both inputs | UNION ALL |
| Combine by Position | Match items by index | - |
| Combine by Fields | Match items by field values | JOIN |
| Choose Branch | Pick one input only | - |
2.2 Append (Union)
1Input 1: [{name: "A"}, {name: "B"}]2Input 2: [{name: "C"}, {name: "D"}]3 4Merge (Append):5Output: [{name: "A"}, {name: "B"}, {name: "C"}, {name: "D"}]Use case: Combine data from multiple sources (sheets, APIs).
2.3 Combine by Fields (Join)
1Input 1 (Orders):2[{orderId: 1, customerId: 101, amount: 500000},3 {orderId: 2, customerId: 102, amount: 300000}]4 5Input 2 (Customers):6[{customerId: 101, name: "Nguyen A", city: "Hanoi"},7 {customerId: 102, name: "Tran B", city: "HCMC"}]8 9Merge (Combine by Fields: customerId):10Output:11[{orderId: 1, customerId: 101, amount: 500000, name: "Nguyen A", city: "Hanoi"},12 {orderId: 2, customerId: 102, amount: 300000, name: "Tran B", city: "HCMC"}]2.4 Join Types
1Inner Join: Chỉ items match cả 2 inputs2Left Join: Tất cả items từ Input 1, match từ Input 23Right Join: Tất cả items từ Input 2, match từ Input 14Full Outer: Tất cả items từ cả 2Checkpoint
Merge node có bao nhiêu modes? "Combine by Fields" tương đương với lệnh SQL nào?
✂️ Split & Aggregate
3. Split & Aggregate
3.1 Split In Batches
1Input: 100 items2 3Split In Batches (batch size: 10):4→ Batch 1: items 1-105→ Batch 2: items 11-206→ ... 7→ Batch 10: items 91-1008 9Use case:10- API rate limits (process 10 at a time, then wait)11- Batch operations (bulk email, bulk update)3.2 Item Lists (Split Array)
1Input:2{3 "orderId": 1,4 "items": ["iPhone", "AirPods", "Case"]5}6 7Item Lists node (Split Out: "items"):8Output:9[{orderId: 1, items: "iPhone"},10 {orderId: 1, items: "AirPods"},11 {orderId: 1, items: "Case"}]3.3 Aggregate Items
1Input: 2[{category: "Electronics", amount: 500},3 {category: "Food", amount: 200},4 {category: "Electronics", amount: 300}]5 6Aggregate (Group by: category, Sum: amount):7Output:8[{category: "Electronics", totalAmount: 800},9 {category: "Food", totalAmount: 200}]Checkpoint
Split In Batches dùng khi nào? Item Lists (Split Out) khác gì so với Split In Batches?
🔍 Filter Node
4. Filter Node
4.1 Basic Filter
1Filter node:2- Condition: $json.status === "active"3 4Input: 10 items (5 active, 5 inactive)5Output: 5 items (only active)4.2 Multiple Conditions
1Filter: AND2- $json.status === "active"3- $json.amount > 1000004- $json.country === "VN"5 6→ Only items matching ALL conditions pass through1Filter: OR2- $json.priority === "high"3- $json.amount > 10000004 5→ Items matching ANY condition pass through4.3 Advanced Filters
1// String contains2{{ $json.name.includes('Nguyen') }}3 4// Regex match5{{ /^VN\d{6}$/.test($json.code) }}6 7// Date filter (last 7 days)8{{ new Date($json.created) > $now.minus(7, 'days').toJSDate() }}9 10// Not null/empty11{{ $json.email !== null && $json.email !== '' }}Checkpoint
Filter AND và Filter OR khác nhau thế nào? Cho ví dụ filter lọc records trong 7 ngày gần nhất.
📊 Sort & Remove Duplicates
5. Sort Node
5.1 Basic Sort
1Sort node:2- Field: amount3- Direction: Descending4 5→ Highest amount first5.2 Multi-level Sort
1Sort:21. Priority (Ascending): high → medium → low32. Due Date (Ascending): soonest first43. Name (Ascending): A → Z6. Remove Duplicates
6.1 Remove Duplicates Node
1Remove Duplicates:2- Compare: email field3- Keep: First occurrence4 5Input:6[{email: "a@b.com", name: "A1"},7 {email: "a@b.com", name: "A2"},8 {email: "c@d.com", name: "C"}]9 10Output:11[{email: "a@b.com", name: "A1"},12 {email: "c@d.com", name: "C"}]Checkpoint
Remove Duplicates node so sánh dựa trên field nào? Khi có duplicates, giữ bản ghi nào?
🔄 ETL Pipeline
7. ETL Pipeline Example
7.1 Full Pipeline: Sales Report
11. EXTRACT:2 Google Sheets → Get all sales data3 Airtable → Get product info4 52. TRANSFORM:6 Merge (Join by productId)7 → Set (calculate totals, format dates)8 → Filter (last 30 days only)9 → Sort (by revenue DESC)10 → Aggregate (group by category)11 123. LOAD:13 → Google Sheets (write summary)14 → Slack (post report)15 → Email (send to managers)7.2 Data Cleaning Pipeline
1Input: Raw CSV data (messy)2 31. Set node: Rename/standardize field names42. Filter: Remove rows with missing email53. Set node: 6 - Trim whitespace7 - Lowercase emails8 - Format phone numbers9 - Parse dates104. Remove Duplicates: By email115. Sort: By name126. Output: Clean data → Database/Sheet8. Hands-on Lab
Lab 1: Customer 360
Merge customer data from 3 sources:
- Google Sheets — Customer names, emails
- Airtable — Order history
- HTTP Request — Payment info from API
- Merge — Join all by customerId
- Set — Calculate total spend, last order date
- Sort — By total spend DESC
- Google Sheets — Write customer report
Lab 2: Data Cleaning
- Webhook — Receive messy form data
- Set — Clean: trim, lowercase email, format phone
- Filter — Remove invalid entries
- Remove Duplicates — By email
- Airtable — Save clean data
Lab 3: Daily Aggregated Report
- Schedule (6 PM daily)
- Google Sheets — Get today's transactions
- Aggregate — Sum by category
- Set — Format numbers, calculate percentages
- Sort — By revenue DESC
- Slack — Post formatted report
📝 Quiz
-
Merge node mode "Combine by Fields" giống gì trong SQL?
- SELECT
- INSERT
- JOIN
- GROUP BY
-
Set node dùng để?
- Gửi email
- Reshape/transform data fields
- Filter data
- Sort data
-
Khi API trả về array nested trong 1 field, dùng node nào?
- Merge
- Filter
- Item Lists (Split Out)
- Sort
🎯 Key Takeaways
- Set — Reshape, rename, calculate fields
- Merge — Join/combine data from multiple sources
- Split/Aggregate — Handle arrays and batches
- Filter — Remove unwanted items
- Sort + Deduplicate — Clean and order data
- ETL pattern — Extract → Transform → Load, the core of data workflows
Checkpoint
Mô tả pipeline ETL hoàn chỉnh cho Sales Report: gồm những bước nào ở mỗi giai đoạn Extract, Transform, Load?
🚀 Bài tiếp theo
Code Node — JavaScript trong n8n cho logic phức tạp!
