
Python Setup & Environment

Set up a professional Python environment for data analysis

🎯 Lesson Objectives

After this lesson, you will be able to:

✅ Install Python and the core data analysis packages

✅ Set up a virtual environment for each project

✅ Use Jupyter Notebook/Lab for analysis

✅ Use Python basics: types, collections, functions

✅ Understand NumPy for numerical computing

Duration: 1.5 hours | Difficulty: Beginner | Tools: Python 3.10+, Anaconda/pip, Jupyter

📖 Key Terms

| Term | Vietnamese | Description |
|---|---|---|
| Virtual Environment | Môi trường ảo | Isolated Python environment per project |
| Anaconda | - | Python distribution with 250+ data science packages |
| Jupyter Notebook | - | Interactive coding: code + text + visualizations |
| pip | - | Python package manager |
| conda | - | Package and environment manager |
| NumPy | - | Numerical Python: fast array operations |
| Pandas | - | Data manipulation library (DataFrames) |
| DataFrame | Bảng dữ liệu | 2D labeled data structure |
| JupyterLab | - | Next-generation Jupyter with a tabbed interface |
| Google Colab | - | Cloud-hosted Jupyter: free GPU, no setup |

Checkpoint

Virtual environment = isolated packages per project. Anaconda = all-in-one data science distribution. Jupyter = interactive analysis. Both pip and conda install packages, but conda also manages environments.

🐍 1. Why Python?

1.1 Python for Data Analysis

| Advantage | Description |
|---|---|
| Easy to learn | Clear, readable syntax |
| Rich ecosystem | Pandas, NumPy, Scikit-learn, etc. |
| Community | Large community, extensive documentation |
| Versatile | Analysis, ML, web, automation |
| Job market | A highly sought-after skill |

1.2 Data Analysis Stack

🐍 Python Data Stack
⚙️ Python 3.10+
🔢 NumPy (Arrays)
📊 Pandas (DataFrames)
📈 Matplotlib (Charts)
📦 Supporting: Seaborn, Plotly, SciPy, Statsmodels

Checkpoint

Python is a leading choice for data analysis thanks to its ecosystem (Pandas, NumPy, Matplotlib), community, and job market. Stack: NumPy (arrays) → Pandas (DataFrames) → Matplotlib (charts).

💻 2. Installing Python

2.1 Method 1: Anaconda (Recommended)

Bash
# Download Anaconda from https://www.anaconda.com/download

# After installing, verify:
conda --version

# Create a new environment
conda create -n data-analysis python=3.10

# Activate the environment
conda activate data-analysis

# Install packages
conda install pandas numpy matplotlib seaborn jupyter

Anaconda is a Python distribution that bundles 250+ data science packages, a package manager (conda), and virtual environment support, so you can get started quickly.

2.2 Method 2: Python + pip

Bash
# Download Python from https://www.python.org/downloads/
python --version

# Create virtual environment
python -m venv data-analysis-env

# Activate (Windows)
data-analysis-env\Scripts\activate

# Activate (macOS/Linux)
source data-analysis-env/bin/activate

# Install packages
pip install pandas numpy matplotlib seaborn jupyter notebook
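The `python -m venv` command runs the standard-library `venv` module, so the same kind of environment can also be created from Python. The sketch below is illustrative only: it builds a throwaway environment in a temporary directory (the name `demo-env` and the helper `create_env` are hypothetical) and skips the pip bootstrap to stay fast.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent_dir: str, name: str = "demo-env") -> Path:
    """Create a virtual environment under parent_dir and return its path."""
    env_path = Path(parent_dir) / name
    # with_pip=False keeps the demo fast; a real project would use with_pip=True
    venv.create(env_path, with_pip=False)
    return env_path


with tempfile.TemporaryDirectory() as tmp:
    env_path = create_env(tmp)
    # A venv keeps its settings in a pyvenv.cfg file at the environment root
    has_config = (env_path / "pyvenv.cfg").exists()
    print(f"Created {env_path.name}, pyvenv.cfg present: {has_config}")
```

For everyday work the command-line form is simpler; the programmatic form is mostly useful in setup scripts.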

2.3 Required packages

Bash
# Core
pip install pandas numpy scipy

# Visualization
pip install matplotlib seaborn plotly

# Jupyter
pip install jupyter notebook jupyterlab

# Database + Excel
pip install sqlalchemy psycopg2-binary openpyxl xlrd

# Statistics
pip install statsmodels scikit-learn

Checkpoint

Two options: Anaconda (recommended, all-in-one) or Python + pip (lightweight). Always use a virtual environment to isolate projects.
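A quick way to confirm that an environment is actually active is to ask the interpreter itself. The following is a minimal sketch, not an official check: the helper name `in_virtual_environment` is hypothetical, and it only detects `venv`-style environments. For conda, the sketch assumes the `CONDA_DEFAULT_ENV` variable that conda sets on activation.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a venv: {in_virtual_environment()}")

# Assumption: an activated conda environment exposes its name here
conda_env = os.environ.get("CONDA_DEFAULT_ENV")
print(f"Active conda environment: {conda_env}")
```

If both lines report nothing active, packages are being installed into the base interpreter rather than a project environment.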

📓 3. Jupyter Notebook

3.1 Starting Jupyter

Bash
# Classic Notebook
jupyter notebook

# JupyterLab (recommended)
jupyter lab
# → http://localhost:8888

3.2 Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Shift + Enter | Run cell, move to next |
| Ctrl + Enter | Run cell, stay |
| Alt + Enter | Run cell, insert below |
| A / B | Insert cell above / below (command mode) |
| D, D | Delete cell (command mode) |
| M / Y | Switch cell to Markdown / Code (command mode) |

3.3 Notebook Best Practices

Python
# Cell 1: Imports (always at top)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Settings
pd.set_option('display.max_columns', None)   # show all columns
pd.set_option('display.max_rows', 100)       # show up to 100 rows
plt.style.use('seaborn-v0_8-whitegrid')      # style name used by Matplotlib 3.6+
# IPython magic: works in notebooks only, not in plain .py scripts
%matplotlib inline

3.4 IDE Alternatives

| IDE | Best For |
|---|---|
| VS Code | Full IDE + Jupyter + Git integration |
| PyCharm | Large projects, excellent debugging |
| Google Colab | Free GPU, no setup, easy sharing |

Checkpoint

Jupyter = interactive analysis (code + text + viz). Shortcuts: Shift+Enter (run), A/B (insert), D,D (delete). VS Code with the Jupyter extension is a strong combination for professional work.

🔤 4. Python Basics for Analysis

4.1 Data Types

Python
integer_val = 42
float_val = 3.14
name = "Data Analysis"
is_valid = True
missing_value = None

print(type(42))       # <class 'int'>
print(type(3.14))     # <class 'float'>
print(type("hello"))  # <class 'str'>
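Raw data usually arrives as text, so in practice these types show up through conversion. The sketch below is a small illustration, with a hypothetical helper `to_float`, of turning text fields into floats and using `None` to mark values that cannot be parsed.

```python
def to_float(raw):
    """Convert a raw text field to float, returning None when it cannot be parsed."""
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


raw_values = ["42", "3.14", "n/a", None]
parsed = [to_float(v) for v in raw_values]
print(parsed)  # [42.0, 3.14, None, None]

# isinstance is the idiomatic way to test a value's type
print(isinstance(parsed[0], float))  # True
# Compare against None with "is", not "=="
print(parsed[2] is None)             # True
```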

4.2 Collections

Python
# List - ordered, mutable
numbers = [1, 2, 3, 4, 5]
numbers.append(6)
print(numbers[1:3])  # [2, 3]

# Tuple - ordered, immutable
coordinates = (10.5, 20.3)
x, y = coordinates  # Unpacking

# Dictionary - key-value pairs
person = {'name': 'Alice', 'age': 30, 'city': 'Hanoi'}
print(person['name'])  # Alice

# Set - unique values
unique_ids = {1, 2, 3, 2, 1}  # {1, 2, 3}
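To see how these collections divide the work in a typical analysis task, here is a short sketch on made-up sample data (the city list and the `targets` set are purely illustrative): a dictionary counts occurrences, a set answers membership questions, and a tuple serves as a dictionary key.

```python
cities = ["Hanoi", "Hue", "Hanoi", "Da Nang", "Hue", "Hanoi"]

# Dictionary as a counter: dict.get supplies 0 for keys not seen yet
counts = {}
for city in cities:
    counts[city] = counts.get(city, 0) + 1
print(counts)  # {'Hanoi': 3, 'Hue': 2, 'Da Nang': 1}

# Set for deduplication and set algebra
seen = set(cities)
targets = {"Hanoi", "Saigon"}
print(sorted(seen & targets))  # ['Hanoi']   (in both sets)
print(sorted(targets - seen))  # ['Saigon']  (wanted but never seen)

# Tuples are immutable, so they can be used as dictionary keys
grid = {(0, 0): "origin"}
print(grid[(0, 0)])  # origin
```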

4.3 Control Flow & Functions

Python
# List comprehension (Pythonic!)
squares = [x**2 for x in range(10)]
even_squares = [x**2 for x in range(10) if x % 2 == 0]

# Functions
def calculate_tax(amount, rate=0.1):
    return amount * rate

# Lambda
square = lambda x: x ** 2

# Multiple returns
def get_statistics(numbers):
    return min(numbers), max(numbers), sum(numbers)/len(numbers)
minimum, maximum, average = get_statistics([1, 2, 3, 4, 5])

# *args and **kwargs
def flexible_func(*args, **kwargs):
    print(f"Args: {args}, Kwargs: {kwargs}")
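These pieces tend to be combined rather than used in isolation. The sketch below, on made-up records, shows a lambda as a sort key, a default argument being overridden by keyword, and `*args`/`**kwargs` in a small helper. The names `records` and `summarize` are hypothetical.

```python
records = [
    {"item": "laptop", "price": 1200.0},
    {"item": "mouse", "price": 25.0},
    {"item": "monitor", "price": 300.0},
]


def calculate_tax(amount, rate=0.1):
    return amount * rate


# Lambda as a sort key: order records by price, highest first
by_price = sorted(records, key=lambda r: r["price"], reverse=True)
print([r["item"] for r in by_price])  # ['laptop', 'monitor', 'mouse']

# Default rate vs. keyword override (rounded to avoid float noise)
default_taxes = [round(calculate_tax(r["price"]), 2) for r in records]
custom_taxes = [round(calculate_tax(r["price"], rate=0.2), 2) for r in records]
print(default_taxes)  # [120.0, 2.5, 30.0]
print(custom_taxes)   # [240.0, 5.0, 60.0]


def summarize(*values, **options):
    """Average any number of values; 'precision' is an optional keyword."""
    precision = options.get("precision", 2)
    return round(sum(values) / len(values), precision)


print(summarize(1, 2, 4))               # 2.33
print(summarize(1, 2, 4, precision=1))  # 2.3
```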

Checkpoint

Collections: List (mutable), Tuple (immutable), Dict (key-value), Set (unique). List comprehension = Pythonic filtering. Lambda = anonymous function!


🔢 5. NumPy Basics


5.1 Arrays

Python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.zeros(5)            # [0. 0. 0. 0. 0.]  (floats by default)
arr3 = np.ones(5)             # [1. 1. 1. 1. 1.]
arr4 = np.arange(0, 10, 2)    # [0 2 4 6 8]  (stop value is excluded)
arr5 = np.linspace(0, 1, 5)   # [0.   0.25 0.5  0.75 1.  ]  (stop value is included)

# 2D arrays
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix.shape)  # (3, 3)
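Every array carries a single element type and a shape, and both matter once data moves into Pandas. The sketch below inspects these attributes and reshapes a 1D array. It assumes NumPy is installed; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(6)            # [0 1 2 3 4 5]
print(arr.dtype.kind)         # i  (an integer type; exact width is platform-dependent)
print(np.zeros(3).dtype)      # float64

# reshape changes the layout without changing the data
grid = arr.reshape(2, 3)
print(grid.shape)             # (2, 3)
print(grid.ndim)              # 2
print(grid.size)              # 6

# astype returns a converted copy; the original array is unchanged
as_float = arr.astype(np.float64)
print(as_float.dtype)         # float64
print(arr.dtype.kind)         # i
```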

5.2 Operations & Statistics

Python
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Element-wise
print(a + b)   # [ 6  8 10 12]
print(a * b)   # [ 5 12 21 32]
print(a ** 2)  # [ 1  4  9 16]

# Statistics
print(np.mean(a))    # 2.5
print(np.std(a))     # 1.118 (rounded)
print(np.median(a))  # 2.5

# Boolean indexing
arr = np.arange(10)
print(arr[arr > 5])  # [6 7 8 9]

# 2D indexing
matrix = np.arange(12).reshape(3, 4)
print(matrix[1, 2])  # 6
print(matrix[0, :])  # [0 1 2 3]
print(matrix[:, 1])  # [1 5 9]
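Two details of these statistics are easy to miss. `np.std` computes the population standard deviation by default, and most reductions accept an `axis` argument for per-column or per-row results. A minimal sketch, assuming NumPy is installed:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

population_std = np.std(a)       # divides by n (ddof=0, the default)
sample_std = np.std(a, ddof=1)   # divides by n - 1
print(round(float(population_std), 3))  # 1.118
print(round(float(sample_std), 3))      # 1.291

matrix = np.arange(12).reshape(3, 4)
print(matrix.sum(axis=0))   # column sums: [12 15 18 21]
print(matrix.mean(axis=1))  # row means:   [1.5 5.5 9.5]
```

The `ddof` distinction matters when comparing results across tools, since some libraries default to the sample form.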

Checkpoint

NumPy = fast vectorized operations (no explicit loops). Boolean indexing = filtering with conditions. NumPy is the foundation for Pandas.
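To make "vectorized" concrete, the sketch below computes the same result twice on made-up data: once with a Python loop and once with a single array expression, then filters with a boolean mask. It assumes NumPy is installed; `prices` and `quantities` are illustrative names.

```python
import numpy as np

prices = np.array([100.0, 250.0, 80.0, 40.0])
quantities = np.array([2, 1, 5, 10])

# Loop version: one multiplication per iteration, in Python
totals_loop = []
for p, q in zip(prices, quantities):
    totals_loop.append(p * q)

# Vectorized version: one expression over the whole arrays
totals_vec = prices * quantities
print(totals_vec)  # [200. 250. 400. 400.]

# Both approaches give the same numbers
print(np.array_equal(totals_vec, np.array(totals_loop)))  # True

# Boolean mask: prices of the rows whose total is at least 400
print(prices[totals_vec >= 400])  # [80. 40.]
```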


✅ 6. Verify Installation

Python
print("Testing Python Data Analysis Setup...")
print("=" * 50)

import pandas as pd
print(f"✅ Pandas {pd.__version__}")

import numpy as np
print(f"✅ NumPy {np.__version__}")

import matplotlib
print(f"✅ Matplotlib {matplotlib.__version__}")

import seaborn as sns
print(f"✅ Seaborn {sns.__version__}")

# Test operations
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(f"✅ DataFrame created:\n{df}")

print("\n🎉 Setup complete! Ready for Data Analysis!")

If any import fails, re-run pip install package_name or conda install package_name. Make sure the virtual environment is activated!
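When an import fails, it helps to list every missing package at once instead of stopping at the first error. The sketch below uses only the standard library; the `REQUIRED` list and the helper `missing_packages` are hypothetical and should be adjusted to the packages a project actually needs.

```python
from importlib.util import find_spec

# Assumption: the import names checked by the verification script
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
    print("Try: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```

Note that the import name and the install name can differ for some packages (for example, `sklearn` is installed as `scikit-learn`), so the suggested command is only a starting point.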

📋 Summary

What you learned

| Topic | Key Points |
|---|---|
| Installation | Anaconda (recommended) or Python + pip |
| Environment | Virtual environments to isolate projects |
| Jupyter | Interactive analysis, export reports |
| Python Basics | Types, collections, functions, comprehensions |
| NumPy | Efficient numerical operations, arrays |

Completion checklist

  • Python 3.10+ installed
  • Virtual environment created
  • Core packages installed
  • Jupyter working
  • Test script passed

Self-check questions

  1. What problem does a virtual environment solve?
  2. How does a Jupyter Notebook differ from a Python script?
  3. How does a NumPy array differ from a Python list?
  4. Anaconda vs pip: when should you use each?

Next lesson: Pandas Fundamentals: DataFrames and basic operations →

🎉 Great work! You have finished setting up your Python environment for data analysis!

Remember: a virtual environment, Jupyter, and NumPy are the three essentials. Always create a new environment for each project!