MinAI - Về trang chủ
Hướng dẫn
10/1345 phút
Đang tải...

Data Transformation

Set, Merge, Split, Filter, Sort, và Aggregate nodes cho data processing

Data Transformation

Data từ APIs/databases thường không đúng format bạn cần. Bài này dạy bạn transform data giữa các nodes.

0

🎯 Mục tiêu bài học

TB5 min

🎯 Mục tiêu

  • Set node: Reshape data
  • Merge node: Combine data sources
  • Split/Aggregate: Handle arrays
  • Filter & Sort: Clean data
  • Build ETL pipelines

Checkpoint

Data transformation là gì? Tại sao cần transform data giữa các nodes?

1

📝 Set Node — Reshape Data

TB5 min

1. Set Node — Reshape Data

1.1 What is Set Node?

Set node tạo output mới với các fields bạn chọn — rename, reformat, add, remove fields.

1.2 Keep Only Selected Fields

Ví dụ
1Input:
2{
3 "id": 1,
4 "firstName": "Nguyen",
5 "lastName": "Van A",
6 "email": "a@email.com",
7 "password": "hashed123",
8 "internal_id": "INT001"
9}
10
11Set node (Keep Only Set):
12- name: "{{ $json.firstName }} {{ $json.lastName }}"
13- email: "{{ $json.email }}"
14- fullName: "{{ $json.firstName + ' ' + $json.lastName }}"
15
16Output:
17{
18 "name": "Nguyen Van A",
19 "email": "a@email.com",
20 "fullName": "Nguyen Van A"
21}

1.3 Add Calculated Fields

Ví dụ
1Set node:
2- total: {{ $json.price * $json.quantity }}
3- taxAmount: {{ $json.price * $json.quantity * 0.1 }}
4- grandTotal: {{ $json.price * $json.quantity * 1.1 }}
5- status: {{ $json.quantity > 0 ? 'In Stock' : 'Out of Stock' }}
6- processedAt: {{ $now.format('yyyy-MM-dd HH:mm') }}

1.4 Rename Fields (API → Database mapping)

Ví dụ
1API returns:
2{ "user_name": "A", "user_email": "a@b.com", "created_at": "..." }
3
4Set node:
5- name: {{ $json.user_name }}
6- email: {{ $json.user_email }}
7- createdAt: {{ $json.created_at }}
8
9Database expects:
10{ "name": "A", "email": "a@b.com", "createdAt": "..." }

Checkpoint

Set node có thể thực hiện những thao tác gì? Cho ví dụ "Keep Only Selected Fields".

2

🔀 Merge Node — Combine Data

TB5 min

2. Merge Node — Combine Data

2.1 Merge Modes

ModeDescriptionSQL Equivalent
AppendCombine items from both inputsUNION ALL
Combine by PositionMatch items by index-
Combine by FieldsMatch items by field valuesJOIN
Choose BranchPick one input only-

2.2 Append (Union)

Ví dụ
1Input 1: [{name: "A"}, {name: "B"}]
2Input 2: [{name: "C"}, {name: "D"}]
3
4Merge (Append):
5Output: [{name: "A"}, {name: "B"}, {name: "C"}, {name: "D"}]

Use case: Combine data from multiple sources (sheets, APIs).

2.3 Combine by Fields (Join)

Ví dụ
1Input 1 (Orders):
2[{orderId: 1, customerId: 101, amount: 500000},
3 {orderId: 2, customerId: 102, amount: 300000}]
4
5Input 2 (Customers):
6[{customerId: 101, name: "Nguyen A", city: "Hanoi"},
7 {customerId: 102, name: "Tran B", city: "HCMC"}]
8
9Merge (Combine by Fields: customerId):
10Output:
11[{orderId: 1, customerId: 101, amount: 500000, name: "Nguyen A", city: "Hanoi"},
12 {orderId: 2, customerId: 102, amount: 300000, name: "Tran B", city: "HCMC"}]

2.4 Join Types

Ví dụ
1Inner Join: Chỉ items match cả 2 inputs
2Left Join: Tất cả items từ Input 1, match từ Input 2
3Right Join: Tất cả items từ Input 2, match từ Input 1
4Full Outer: Tất cả items từ cả 2

Checkpoint

Merge node có bao nhiêu modes? "Combine by Fields" tương đương với lệnh SQL nào?

3

✂️ Split & Aggregate

TB5 min

3. Split & Aggregate

3.1 Split In Batches

Ví dụ
1Input: 100 items
2
3Split In Batches (batch size: 10):
4→ Batch 1: items 1-10
5→ Batch 2: items 11-20
6→ ...
7→ Batch 10: items 91-100
8
9Use case:
10- API rate limits (process 10 at a time, then wait)
11- Batch operations (bulk email, bulk update)

3.2 Item Lists (Split Array)

Ví dụ
1Input:
2{
3 "orderId": 1,
4 "items": ["iPhone", "AirPods", "Case"]
5}
6
7Item Lists node (Split Out: "items"):
8Output:
9[{orderId: 1, items: "iPhone"},
10 {orderId: 1, items: "AirPods"},
11 {orderId: 1, items: "Case"}]

3.3 Aggregate Items

Ví dụ
1Input:
2[{category: "Electronics", amount: 500},
3 {category: "Food", amount: 200},
4 {category: "Electronics", amount: 300}]
5
6Aggregate (Group by: category, Sum: amount):
7Output:
8[{category: "Electronics", totalAmount: 800},
9 {category: "Food", totalAmount: 200}]

Checkpoint

Split In Batches dùng khi nào? Item Lists (Split Out) khác gì so với Split In Batches?

4

🔍 Filter Node

TB5 min

4. Filter Node

4.1 Basic Filter

Ví dụ
1Filter node:
2- Condition: $json.status === "active"
3
4Input: 10 items (5 active, 5 inactive)
5Output: 5 items (only active)

4.2 Multiple Conditions

Ví dụ
1Filter: AND
2- $json.status === "active"
3- $json.amount > 100000
4- $json.country === "VN"
5
6→ Only items matching ALL conditions pass through
Ví dụ
1Filter: OR
2- $json.priority === "high"
3- $json.amount > 1000000
4
5→ Items matching ANY condition pass through

4.3 Advanced Filters

Ví dụ
1// String contains
2{{ $json.name.includes('Nguyen') }}
3
4// Regex match
5{{ /^VN\d{6}$/.test($json.code) }}
6
7// Date filter (last 7 days)
8{{ new Date($json.created) > $now.minus(7, 'days').toJSDate() }}
9
10// Not null/empty
11{{ $json.email !== null && $json.email !== '' }}

Checkpoint

Filter AND và Filter OR khác nhau thế nào? Cho ví dụ filter lọc records trong 7 ngày gần nhất.

5

📊 Sort & Remove Duplicates

TB5 min

5. Sort Node

5.1 Basic Sort

Ví dụ
1Sort node:
2- Field: amount
3- Direction: Descending
4
5→ Highest amount first

5.2 Multi-level Sort

Ví dụ
1Sort:
21. Priority (Ascending): high → medium → low
32. Due Date (Ascending): soonest first
43. Name (Ascending): A → Z

6. Remove Duplicates

6.1 Remove Duplicates Node

Ví dụ
1Remove Duplicates:
2- Compare: email field
3- Keep: First occurrence
4
5Input:
6[{email: "a@b.com", name: "A1"},
7 {email: "a@b.com", name: "A2"},
8 {email: "c@d.com", name: "C"}]
9
10Output:
11[{email: "a@b.com", name: "A1"},
12 {email: "c@d.com", name: "C"}]

Checkpoint

Remove Duplicates node so sánh dựa trên field nào? Khi có duplicates, giữ bản ghi nào?

6

🔄 ETL Pipeline

TB5 min

7. ETL Pipeline Example

7.1 Full Pipeline: Sales Report

Ví dụ
11. EXTRACT:
2 Google Sheets → Get all sales data
3 Airtable → Get product info
4
52. TRANSFORM:
6 Merge (Join by productId)
7 → Set (calculate totals, format dates)
8 → Filter (last 30 days only)
9 → Sort (by revenue DESC)
10 → Aggregate (group by category)
11
123. LOAD:
13 → Google Sheets (write summary)
14 → Slack (post report)
15 → Email (send to managers)

7.2 Data Cleaning Pipeline

Ví dụ
1Input: Raw CSV data (messy)
2
31. Set node: Rename/standardize field names
42. Filter: Remove rows with missing email
53. Set node:
6 - Trim whitespace
7 - Lowercase emails
8 - Format phone numbers
9 - Parse dates
104. Remove Duplicates: By email
115. Sort: By name
126. Output: Clean data → Database/Sheet

8. Hands-on Lab

Lab 1: Customer 360

Merge customer data from 3 sources:

  1. Google Sheets — Customer names, emails
  2. Airtable — Order history
  3. HTTP Request — Payment info from API
  4. Merge — Join all by customerId
  5. Set — Calculate total spend, last order date
  6. Sort — By total spend DESC
  7. Google Sheets — Write customer report

Lab 2: Data Cleaning

  1. Webhook — Receive messy form data
  2. Set — Clean: trim, lowercase email, format phone
  3. Filter — Remove invalid entries
  4. Remove Duplicates — By email
  5. Airtable — Save clean data

Lab 3: Daily Aggregated Report

  1. Schedule (6 PM daily)
  2. Google Sheets — Get today's transactions
  3. Aggregate — Sum by category
  4. Set — Format numbers, calculate percentages
  5. Sort — By revenue DESC
  6. Slack — Post formatted report

📝 Quiz

  1. Merge node mode "Combine by Fields" giống gì trong SQL?

    • SELECT
    • INSERT
    • JOIN
    • GROUP BY
  2. Set node dùng để?

    • Gửi email
    • Reshape/transform data fields
    • Filter data
    • Sort data
  3. Khi API trả về array nested trong 1 field, dùng node nào?

    • Merge
    • Filter
    • Item Lists (Split Out)
    • Sort

🎯 Key Takeaways

  1. Set — Reshape, rename, calculate fields
  2. Merge — Join/combine data from multiple sources
  3. Split/Aggregate — Handle arrays and batches
  4. Filter — Remove unwanted items
  5. Sort + Deduplicate — Clean and order data
  6. ETL pattern — Extract → Transform → Load, the core of data workflows

Checkpoint

Mô tả pipeline ETL hoàn chỉnh cho Sales Report: gồm những bước nào ở mỗi giai đoạn Extract, Transform, Load?


🚀 Bài tiếp theo

Code Node — JavaScript trong n8n cho logic phức tạp!