Day 3: Overview of ML Algorithms
Table of Contents
- Welcome to Day 3
- Classical Machine Learning Paradigms
  - Supervised Learning
  - Unsupervised Learning
  - Semi-Supervised Learning
  - Reinforcement Learning
- Supervised Learning Algorithms
  - Linear Regression
  - Logistic Regression
  - Decision Trees
  - Random Forests
  - Support Vector Machines
  - k-Nearest Neighbors
  - Gradient Boosting Machines (XGBoost, LightGBM, CatBoost)
  - Neural Networks (Intro)
- Unsupervised Learning Algorithms
  - k-Means Clustering
  - DBSCAN
  - Hierarchical Clustering
  - Principal Component Analysis (PCA)
  - t-SNE and UMAP
- Semi-Supervised and Reinforcement Learning
  - Label Propagation
  - Q-Learning (High-Level)
- Practical Examples and Use Cases
  - Regression Example
  - Classification Example
  - Clustering Example
  - Dimensionality Reduction Example
  - Evaluation Metrics
- Resources
- Tips and Tricks
1. Welcome to Day 3
Welcome to Day 3 of your 90-day machine learning journey! Today, we're taking a grand tour of the machine learning algorithm ecosystem, exploring different approaches and methods to solve various data-driven problems. Whether you're interested in predicting prices, grouping customers, reducing dimensionality, or learning from minimal labels, there's a family of algorithms tailored to your needs.
2. Classical Machine Learning Paradigms
Supervised Learning
- Description: Models learn patterns from labeled data.
- Common Tasks: Classification (predicting discrete categories), Regression (predicting continuous values).
- Related Image (Search: "Supervised Learning Diagram"):
Unsupervised Learning
- Description: Extracts structure from unlabeled data.
- Common Tasks: Clustering (grouping similar items), Dimensionality Reduction (compressing features).
- Related Image (Search: "Unsupervised Learning Diagram"):
Semi-Supervised Learning
- Description: Utilizes a mix of labeled and unlabeled data.
- Application: When labeling is expensive and only a small portion of data is labeled.
- Related Image (Search: "Semi Supervised Learning Diagram"):
Reinforcement Learning
- Description: Agents learn optimal actions by interacting with an environment to maximize rewards.
- Application: Robotics, autonomous navigation, game playing.
- Related Image (Search: "Reinforcement Learning Diagram"):
3. Supervised Learning Algorithms
Linear Regression
- What: Fits a straight line to predict continuous values.
- Use Cases: Predicting house prices, sales forecasting.
- Code Example:
from sklearn.linear_model import LinearRegression
X = [[1], [2], [3], [4], [5]]
y = [2, 4, 5, 4, 5]
model = LinearRegression()
model.fit(X, y)
print(model.predict([[6]]))
- Related Image (Search: "Linear Regression Line"):
Logistic Regression
- What: Estimates probabilities for classification.
- Use Cases: Spam detection, disease prediction.
- Code Example:
from sklearn.linear_model import LogisticRegression
X = [[0], [1], [2], [3], [4]]
y = [0, 0, 1, 1, 1]
clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[2.5]]))
- Related Image (Search: "Logistic Regression Sigmoid Curve"):
Decision Trees
- What: Uses feature-based splits to create a tree of decisions.
- Use Cases: Credit risk assessment, medical diagnosis rules.
- Code Example:
from sklearn.tree import DecisionTreeClassifier
X = [[0, 0], [1, 0], [1, 1], [0, 1]]
y = [0, 0, 1, 1]
tree = DecisionTreeClassifier()
tree.fit(X, y)
print(tree.predict([[0.5, 0.5]]))
- Related Image (Search: "Decision Tree Diagram"):
Random Forests
- What: Combines multiple decision trees for robust predictions.
- Use Cases: Loan approval, stock market predictions.
- Code Example:
from sklearn.ensemble import RandomForestClassifier
X = [[0, 0], [1, 0], [1, 1], [0, 1]]
y = [0, 0, 1, 1]
rf = RandomForestClassifier()
rf.fit(X, y)
print(rf.predict([[0.5, 0.5]]))
- Related Image (Search: "Random Forest Ensemble"):
Support Vector Machines (SVMs)
- What: Finds the optimal separating hyperplane for classification.
- Use Cases: Image classification, text categorization.
- Code Example:
from sklearn.svm import SVC
X = [[0, 0], [1, 0], [1, 1], [0, 1]]
y = [0, 0, 1, 1]
svc = SVC(kernel='linear')
svc.fit(X, y)
print(svc.predict([[0.5, 0.5]]))
- Related Image (Search: "SVM Margin"):
k-Nearest Neighbors (kNN)
- What: Classifies based on proximity to known samples.
- Use Cases: Recommender systems, anomaly detection.
- Code Example:
from sklearn.neighbors import KNeighborsClassifier
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[1.5]]))
- Related Image (Search: "kNN Visualization"):
Gradient Boosting Machines (XGBoost, LightGBM, CatBoost)
- What: Sequentially improve weak learners to create a strong model.
- Use Cases: Structured data challenges, Kaggle competitions.
- Code Example (XGBoost):
import xgboost as xgb
X = [[1, 2], [2, 3], [3, 4], [4, 5]]
y = [0, 1, 1, 0]
model = xgb.XGBClassifier()
model.fit(X, y)
print(model.predict([[2.5, 3.5]]))
- Related Image (Search: "Gradient Boosting Diagram"):
Neural Networks (Intro)
- What: Layers of neurons that learn complex representations.
- Use Cases: Image classification, sentiment analysis.
- Code Example (Simple MLP):
from sklearn.neural_network import MLPClassifier
X = [[0, 0], [1, 0], [1, 1], [0, 1]]
y = [0, 0, 1, 1]
mlp = MLPClassifier(hidden_layer_sizes=(5,), max_iter=500)
mlp.fit(X, y)
print(mlp.predict([[0.5, 0.5]]))
- Related Image (Search: "Neural Network Layers"):
4. Unsupervised Learning Algorithms
k-Means Clustering
- What: Divides data into k clusters based on similarity.
- Use Cases: Customer segmentation, grouping documents.
- Code Example:
from sklearn.cluster import KMeans
X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
km = KMeans(n_clusters=2)
km.fit(X)
print(km.labels_)
- Related Image (Search: "k-Means Visualization"):
DBSCAN
- What: Density-based clustering that detects outliers.
- Use Cases: Identifying unusual patterns, noise filtering.
- Code Example:
from sklearn.cluster import DBSCAN
X = [[1, 2], [1, 3], [2, 2], [10, 10], [10, 11], [11, 10]]
db = DBSCAN(eps=1.5, min_samples=2)
db.fit(X)
print(db.labels_)
- Related Image (Search: "DBSCAN Clustering"):
Hierarchical Clustering
- What: Builds a hierarchy of clusters without pre-specifying k.
- Use Cases: Gene expression analysis, text analysis.
- Code Example (Agglomerative):
from sklearn.cluster import AgglomerativeClustering
X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
hc = AgglomerativeClustering(n_clusters=2)
hc.fit(X)
print(hc.labels_)
- Related Image (Search: "Dendrogram Hierarchical Clustering"):
Principal Component Analysis (PCA)
- What: Reduces dimensionality by projecting onto directions of max variance.
- Use Cases: Visualization, noise reduction.
- Code Example:
from sklearn.decomposition import PCA
X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
pca = PCA(n_components=1)  # project the 2-D points onto the single direction of maximum variance
X_reduced = pca.fit_transform(X)
print(X_reduced)
- Related Image (Search: "PCA Visualization"):
t-SNE and UMAP
- What: Non-linear methods for high-dimensional data visualization.
- Use Cases: Visualizing word embeddings, image embeddings.
- Code Example (t-SNE):
from sklearn.manifold import TSNE
X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
tsne = TSNE(n_components=2, perplexity=2, random_state=42)  # perplexity must be smaller than the number of samples
X_tsne = tsne.fit_transform(X)
print(X_tsne)
- Related Image (Search: "t-SNE Visualization"):
5. Semi-Supervised and Reinforcement Learning
Label Propagation (Semi-Supervised)
- What: Spreads labels from a small labeled set to a larger unlabeled set.
- Use Cases: Partially labeled datasets in text or image classification.
- Code Example:
from sklearn.semi_supervised import LabelPropagation
X = [[0, 0], [1, 0], [1, 1], [0, 1]]
y = [0, -1, 1, -1]  # -1 marks unlabeled samples
lp = LabelPropagation()
lp.fit(X, y)
print(lp.predict(X))
- Related Image (Search: "Semi-Supervised Label Propagation"):
Q-Learning (High-Level, RL)
- What: Learns action values to maximize rewards over time.
- Use Cases: Game AI, robotic control.
- Note: Generally requires custom implementations or specialized libraries.
- Related Image (Search: "Q Learning Diagram"):
6. Practical Examples and Use Cases
Regression Example (Predicting House Prices)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
X = np.random.rand(100, 1)*10
y = 3*X.squeeze() + np.random.randn(100)*2
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test,y_pred))
Classification Example (Iris)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
data = load_iris()
X_train,X_test,y_train,y_test = train_test_split(data.data,data.target,random_state=42)
clf = RandomForestClassifier()
clf.fit(X_train,y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test,y_pred))
Clustering Example (Iris)
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
data = load_iris()
X = data.data
km = KMeans(n_clusters=3,random_state=42)
labels = km.fit_predict(X)
print("ARI:", adjusted_rand_score(data.target,labels))
Dimensionality Reduction Example (PCA on Iris)
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
data = load_iris()
X_pca = PCA(n_components=2).fit_transform(data.data)
print("Reduced shape:", X_pca.shape)
Evaluation Metrics
- Regression: Mean Squared Error (MSE), Mean Absolute Error (MAE), R².
- Classification: Accuracy, Precision, Recall, F1-score, ROC AUC.
- Clustering: Silhouette Score, Adjusted Rand Index.
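- Code Example (computing a few of these metrics with scikit-learn; the toy predictions and cluster labels below are made up purely for illustration):
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import silhouette_score
# Regression metrics compare true and predicted continuous values
y_true, y_pred = [3.0, 5.0, 2.5], [2.8, 5.3, 2.0]
print("MSE:", mean_squared_error(y_true, y_pred), "MAE:", mean_absolute_error(y_true, y_pred), "R2:", r2_score(y_true, y_pred))
# Classification metrics compare true and predicted class labels
y_true_c, y_pred_c = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
print("Accuracy:", accuracy_score(y_true_c, y_pred_c), "Precision:", precision_score(y_true_c, y_pred_c), "Recall:", recall_score(y_true_c, y_pred_c), "F1:", f1_score(y_true_c, y_pred_c))
# Silhouette score needs the data points themselves plus the assigned cluster labels
X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
labels = [0, 0, 0, 1, 1, 1]
print("Silhouette:", silhouette_score(X, labels))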
7. Resources
- Scikit-Learn Official Documentation
- Kaggle Datasets
- Google ML Crash Course
- Stanford CS229
- Machine Learning Mastery
8. Tips and Tricks
- Start Simple: Begin with straightforward models before moving to complex algorithms.
- Hyperparameter Tuning: Use GridSearchCV or RandomizedSearchCV to improve model performance (see the sketch after this list).
- Feature Engineering: Good features often matter more than fancy models.
- Visualizations: Use libraries like matplotlib or seaborn to visualize data and model predictions.
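A minimal GridSearchCV sketch on the Iris data (the parameter grid here is only an illustrative assumption; real searches usually cover more values):
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
data = load_iris()
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 3]}  # illustrative grid
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(data.data, data.target)
print(search.best_params_, search.best_score_)  # best settings found and their cross-validated accuracy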