Python Setup & Environment

🎯 Learning Objectives

After this lesson, you will be able to:

✅ Install Python and the data analysis packages

✅ Set up a virtual environment properly

✅ Use Jupyter Notebook/Lab for analysis

✅ Understand Python basics: types, collections, functions

✅ Understand NumPy for numerical computing

Duration: 1.5 hours | Difficulty: Beginner | Tools: Python 3.10+, Anaconda/pip, Jupyter

📖 Key Terms

Term	Vietnamese	Description
Virtual Environment	Môi trường ảo	Isolated Python environment per project
Anaconda	-	Python distribution with 250+ data science packages
Jupyter Notebook	-	Interactive coding: code + text + visualizations
pip	-	Python package manager
conda	-	Package + environment manager
NumPy	-	Numerical Python: fast array operations
Pandas	-	Data manipulation library (DataFrames)
DataFrame	Bảng dữ liệu	2D labeled data structure
JupyterLab	-	Next-generation Jupyter with a tabbed interface
Google Colab	-	Cloud-hosted Jupyter: free GPU, no setup

Checkpoint

Virtual env = isolated packages per project. Anaconda = all-in-one data science distribution. Jupyter = interactive analysis. pip and conda both install packages, but conda also manages environments!

🐍 1. Why Python?

1.1 Python for Data Analysis

Advantage	Description
Easy to learn	Clear, readable syntax
Rich ecosystem	Pandas, NumPy, Scikit-learn, etc.
Community	Large community, abundant documentation
Versatile	Analysis, ML, Web, Automation
Job market	Among the most in-demand skills

1.2 Data Analysis Stack

🐍Python Data Stack

⚙️Python 3.10+

🔢NumPy (Arrays)

📊Pandas (DataFrames)

📈Matplotlib (Charts)

📦Supporting: Seaborn, Plotly, Scipy, Statsmodels

Checkpoint

Python is a leading choice for Data Analysis thanks to its ecosystem (Pandas, NumPy, Matplotlib), its community, and the job market. Stack: NumPy (arrays) → Pandas (DataFrames) → Matplotlib (charts).

💻 2. Installing Python

2.1 Method 1: Anaconda (Recommended)

Bash

# Download Anaconda from https://www.anaconda.com/download

# After installing, verify:
conda --version

# Create a new environment
conda create -n data-analysis python=3.10

# Activate the environment
conda activate data-analysis

# Install packages
conda install pandas numpy matplotlib seaborn jupyter

Anaconda is a Python distribution that bundles 250+ data science packages, a package manager (conda), and virtual environments, so you can get started quickly.

2.2 Method 2: Python + pip

Bash

# Download Python from https://www.python.org/downloads/
python --version

# Create virtual environment
python -m venv data-analysis-env

# Activate (Windows)
data-analysis-env\Scripts\activate

# Activate (macOS/Linux)
source data-analysis-env/bin/activate

# Install packages
pip install pandas numpy matplotlib seaborn jupyter notebook

2.3 Required Packages

Bash

# Core
pip install pandas numpy scipy

# Visualization
pip install matplotlib seaborn plotly

# Jupyter
pip install jupyter notebook jupyterlab

# Database + Excel (openpyxl reads .xlsx; xlrd reads legacy .xls)
pip install sqlalchemy psycopg2-binary openpyxl xlrd

# Statistics
pip install statsmodels scikit-learn

Checkpoint

Two options: Anaconda (recommended, all-in-one) or Python + pip (lightweight). Always use a virtual environment to isolate projects!
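One way to confirm that an environment is really active is to ask Python itself. The sketch below compares `sys.prefix` with `sys.base_prefix`, which differ inside an environment created with `venv`, and reads the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. The helper names `in_virtualenv` and `active_conda_env` are made up for this illustration.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def active_conda_env():
    """Return the active conda environment name, or None if none is set."""
    return os.environ.get("CONDA_DEFAULT_ENV")


print(f"Python executable: {sys.executable}")
print(f"Inside a venv: {in_virtualenv()}")
print(f"Conda environment: {active_conda_env()}")
```

If both checks come back negative, packages are probably being installed into the system-wide Python rather than a project environment.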

📓 3. Jupyter Notebook

3.1 Getting Started

Bash

# Classic Notebook
jupyter notebook

# JupyterLab (recommended)
jupyter lab
# → http://localhost:8888

3.2 Keyboard Shortcuts

Shortcut	Action
`Shift + Enter`	Run cell, move to next
`Ctrl + Enter`	Run cell, stay
`Alt + Enter`	Run cell, insert below
`A` / `B`	Insert cell above / below (command mode)
`DD`	Delete cell (command mode)
`M` / `Y`	Convert cell to Markdown / Code (command mode)

3.3 Notebook Best Practices

Python

# Cell 1: Imports (always at top)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline
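The style name `seaborn-v0_8-whitegrid` exists only in Matplotlib 3.6 and later; older releases call it `seaborn-whitegrid`. A hedged sketch of a fallback that avoids an error on either version (the `candidates` list and the choice of `"default"` as a last resort are assumptions for illustration, not part of the lesson):

```python
import matplotlib.pyplot as plt

# Try the new style name first, then the pre-3.6 name, then Matplotlib's default
candidates = ["seaborn-v0_8-whitegrid", "seaborn-whitegrid"]
chosen = next((name for name in candidates if name in plt.style.available), "default")

plt.style.use(chosen)
print(f"Using Matplotlib style: {chosen}")
```

`plt.style.available` lists the style sheets shipped with the installed Matplotlib, so the check adapts to whatever version the environment has.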

3.4 IDE Alternatives

IDE	Best For
VS Code	Full IDE + Jupyter + Git integration
PyCharm	Large projects, excellent debugging
Google Colab	Free GPU, no setup, easy sharing

Checkpoint

Jupyter = interactive analysis (code + text + viz). Shortcuts: Shift+Enter (run), A/B (insert), DD (delete). VS Code with the Jupyter extension is a strong combination for professional work!

🔤 4. Python Basics for Analysis

4.1 Data Types

Python

integer_val = 42
float_val = 3.14
name = "Data Analysis"
is_valid = True
missing_value = None

print(type(42))         # <class 'int'>
print(type(3.14))       # <class 'float'>
print(type("hello"))    # <class 'str'>
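Raw data often arrives as strings, with some values missing. As a small sketch of how these types interact, the example below converts text to numbers and treats empty input as `None`. The helper `to_float` and the sample values are hypothetical.

```python
raw_values = ["42", "3.14", "", None]


def to_float(value):
    """Convert a raw value to float, returning None for missing or empty input."""
    if value is None or value == "":
        return None
    return float(value)


cleaned = [to_float(v) for v in raw_values]
print(cleaned)  # [42.0, 3.14, None, None]

# isinstance is the idiomatic way to check a value's type
print(isinstance(42, int))      # True
print(isinstance(3.14, float))  # True
print(isinstance(True, int))    # True, because bool is a subclass of int
```

Comparing with `is None` rather than `== None` is the conventional way to test for a missing value.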

4.2 Collections

Python

# List - ordered, mutable
numbers = [1, 2, 3, 4, 5]
numbers.append(6)
print(numbers[1:3])  # [2, 3]

# Tuple - ordered, immutable
coordinates = (10.5, 20.3)
x, y = coordinates  # Unpacking

# Dictionary - key-value pairs
person = {'name': 'Alice', 'age': 30, 'city': 'Hanoi'}
print(person['name'])  # Alice

# Set - unique values
unique_ids = {1, 2, 3, 2, 1}  # {1, 2, 3}
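To see how the four collections work together, here is a minimal sketch of a small counting task. The `orders` data is invented for illustration.

```python
# Hypothetical sample data: a list of (customer, city) tuples
orders = [("Alice", "Hanoi"), ("Bob", "Hue"), ("Alice", "Hanoi"), ("Chi", "Hanoi")]

# Set: unique customers
customers = {name for name, _ in orders}

# Dictionary: number of orders per city
orders_per_city = {}
for _, city in orders:
    orders_per_city[city] = orders_per_city.get(city, 0) + 1

# List: (city, count) pairs sorted by count, highest first
ranked = sorted(orders_per_city.items(), key=lambda item: item[1], reverse=True)

print(sorted(customers))  # ['Alice', 'Bob', 'Chi']
print(orders_per_city)    # {'Hanoi': 3, 'Hue': 1}
print(ranked[0])          # ('Hanoi', 3)
```

The set removes the duplicate customer, the dictionary accumulates counts by key, and the list keeps the final ranking in order.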

4.3 Control Flow & Functions

Python

# List comprehension (Pythonic!)
squares = [x**2 for x in range(10)]
even_squares = [x**2 for x in range(10) if x % 2 == 0]

# Functions
def calculate_tax(amount, rate=0.1):
    return amount * rate

# Lambda
square = lambda x: x ** 2

# Multiple returns
def get_statistics(numbers):
    return min(numbers), max(numbers), sum(numbers)/len(numbers)
minimum, maximum, average = get_statistics([1, 2, 3, 4, 5])

# *args and **kwargs
def flexible_func(*args, **kwargs):
    print(f"Args: {args}, Kwargs: {kwargs}")
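The definitions above become clearer when called. The sketch below repeats `calculate_tax` and `flexible_func` so it runs on its own; the `prices` list is made up for illustration.

```python
def calculate_tax(amount, rate=0.1):
    return amount * rate


def flexible_func(*args, **kwargs):
    print(f"Args: {args}, Kwargs: {kwargs}")


# Default rate vs. an explicit keyword argument
print(calculate_tax(200))             # 20.0
print(calculate_tax(200, rate=0.25))  # 50.0

# Positional arguments arrive as a tuple, keyword arguments as a dict
flexible_func(1, 2, sep=",")  # Args: (1, 2), Kwargs: {'sep': ','}

# A lambda is handy as a one-off sort key
prices = [("tea", 3.5), ("coffee", 2.0), ("juice", 4.25)]
cheapest_first = sorted(prices, key=lambda item: item[1])
print(cheapest_first[0])  # ('coffee', 2.0)
```

Lambdas suit short throwaway expressions like the sort key here; anything longer usually reads better as a named `def`.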

Checkpoint

Collections: List (mutable), Tuple (immutable), Dict (key-value), Set (unique). List comprehension = Pythonic filtering. Lambda = anonymous function!

🔢 5. NumPy Basics

5.1 Arrays

Python

import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.zeros(5)          # [0., 0., 0., 0., 0.] (floats by default)
arr3 = np.ones(5)           # [1., 1., 1., 1., 1.] (floats by default)
arr4 = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
arr5 = np.linspace(0, 1, 5) # [0, 0.25, 0.5, 0.75, 1]

# 2D arrays
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix.shape)  # (3, 3)
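Every NumPy array has a single element type (`dtype`) and a shape, and both are worth checking when results look odd. A minimal sketch, with variable names chosen for illustration:

```python
import numpy as np

zeros = np.zeros(5)
int_zeros = np.zeros(5, dtype=int)  # request integers explicitly

print(zeros.dtype)           # float64
print(int_zeros.dtype.kind)  # i (a signed integer type; exact width depends on the platform)

# reshape rearranges the same values into a new shape
grid = np.arange(6).reshape(2, 3)
print(grid.shape, grid.ndim, grid.size)  # (2, 3) 2 6
```

`np.zeros` and `np.ones` return floating-point arrays unless a `dtype` is passed.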

5.2 Operations & Statistics

Python

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Element-wise
print(a + b)      # [6, 8, 10, 12]
print(a * b)      # [5, 12, 21, 32]
print(a ** 2)     # [1, 4, 9, 16]

# Statistics
print(np.mean(a))   # 2.5
print(np.std(a))    # 1.118
print(np.median(a)) # 2.5

# Boolean indexing
arr = np.arange(10)
print(arr[arr > 5])  # [6, 7, 8, 9]

# 2D indexing
matrix = np.arange(12).reshape(3, 4)
print(matrix[1, 2])     # 6
print(matrix[0, :])     # [0, 1, 2, 3]
print(matrix[:, 1])     # [1, 5, 9]
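Two details of the statistics functions are easy to miss: `np.std` computes the population standard deviation by default (`ddof=0`), while pandas defaults to the sample version (`ddof=1`), and most reductions accept an `axis` argument for 2D arrays. A short sketch of both:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

population_std = np.std(a)      # ddof=0: divides by n
sample_std = np.std(a, ddof=1)  # ddof=1: divides by n - 1
print(round(population_std, 3))  # 1.118
print(round(sample_std, 3))      # 1.291

matrix = np.arange(12).reshape(3, 4)
col_sums = matrix.sum(axis=0)    # collapse rows: one sum per column
row_means = matrix.mean(axis=1)  # collapse columns: one mean per row
print(col_sums)   # column sums: 12, 15, 18, 21
print(row_means)  # row means: 1.5, 5.5, 9.5
```

Passing `ddof=1` explicitly keeps NumPy results consistent with what pandas reports for the same data.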

Checkpoint

NumPy = fast vectorized operations (no loops!). Boolean indexing = filter with conditions. NumPy is the foundation for Pandas!
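To make "no loops" concrete, the sketch below computes the same result twice, once with a Python loop and once with a single vectorized expression. The `prices` and `quantities` arrays are invented sample data.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])
quantities = np.array([1, 2, 3, 4])

# Loop version: one multiplication per iteration
revenue_loop = []
for p, q in zip(prices, quantities):
    revenue_loop.append(p * q)

# Vectorized version: one expression over whole arrays
revenue_vec = prices * quantities

print(np.array_equal(revenue_vec, np.array(revenue_loop)))  # True

# Boolean indexing combines naturally with vectorized results
large = revenue_vec[revenue_vec > 50]
print(large.tolist())  # [90.0, 160.0]
```

Both versions give identical values; the vectorized form is shorter and, on large arrays, usually much faster because the loop runs in compiled code.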

✅ 6. Verify Installation

Python

print("Testing Python Data Analysis Setup...")
print("=" * 50)

import pandas as pd
print(f"✅ Pandas {pd.__version__}")

import numpy as np
print(f"✅ NumPy {np.__version__}")

import matplotlib
print(f"✅ Matplotlib {matplotlib.__version__}")

import seaborn as sns
print(f"✅ Seaborn {sns.__version__}")

# Test operations
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(f"✅ DataFrame created:\n{df}")

print("\n🎉 Setup complete! Ready for Data Analysis!")

If any import fails, rerun pip install package_name or conda install package_name. Make sure the virtual environment is activated!
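The verification script stops at the first failed import. As an alternative sketch, the snippet below reports every missing package in one pass using the standard library's `importlib.util.find_spec`. The `required` list and the helper name `find_missing` are assumptions for illustration.

```python
import importlib.util

required = ["pandas", "numpy", "matplotlib", "seaborn"]


def find_missing(packages):
    """Return the package names that cannot be imported in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(required)
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```

`find_spec` only locates a package without importing it, so the check is fast and does not fail partway through.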

📋 Summary

What You Learned

Topic	Key Points
Installation	Anaconda (recommended) or Python + pip
Environment	Virtual environments to isolate projects
Jupyter	Interactive analysis, export reports
Python Basics	Types, collections, functions, comprehensions
NumPy	Efficient numerical operations, arrays

Self-Check Questions

What problem do virtual environments solve?
How does a Jupyter Notebook differ from a Python script?
How does a NumPy array differ from a Python list?
Anaconda vs pip: when should you use each?

Next lesson: Pandas Fundamentals (DataFrames and basic operations) →

🎉 Great work! You have finished setting up your Python environment for Data Analysis!

Remember: virtual environment + Jupyter + NumPy are an essential trio. Always create a new environment for each project!
