
Scikit-Learn Boss in 90 Days

Day 3: Overview of ML Algorithms

ML Algorithms Overview

πŸ“‘ Table of Contents

  1. 🌟 Welcome to Day 3
  2. πŸ“œ Classical Machine Learning Paradigms
    • Supervised Learning
    • Unsupervised Learning
    • Semi-Supervised Learning
    • Reinforcement Learning
  3. πŸ‹οΈ Supervised Learning Algorithms
    • Linear Regression
    • Logistic Regression
    • Decision Trees
    • Random Forests
    • Support Vector Machines
    • k-Nearest Neighbors
    • Gradient Boosting Machines (XGBoost, LightGBM, CatBoost)
    • Neural Networks (Intro)
  4. πŸ”Ž Unsupervised Learning Algorithms
    • k-Means Clustering
    • DBSCAN
    • Hierarchical Clustering
    • Principal Component Analysis (PCA)
    • t-SNE and UMAP
  5. πŸ”„ Semi-Supervised and Reinforcement Learning
    • Label Propagation
    • Q-Learning (High-Level)
  6. πŸ’» Practical Examples and Use Cases
    • Regression Example
    • Classification Example
    • Clustering Example
    • Dimensionality Reduction Example
    • Evaluation Metrics
  7. πŸ“š Resources
  8. πŸ’‘ Tips and Tricks

1. 🌟 Welcome to Day 3

Welcome to Day 3 of your 90-day machine learning journey! Today, we're taking a grand tour of the machine learning algorithm ecosystem, exploring the different approaches and methods used to solve data-driven problems. Whether you're interested in predicting prices, grouping customers, reducing dimensionality, or learning from minimal labels, there's a family of algorithms tailored to your needs.


2. πŸ“œ Classical Machine Learning Paradigms

πŸ“ Supervised Learning

  • Description: Models learn patterns from labeled data.
  • Common Tasks: Classification (predicting discrete categories), Regression (predicting continuous values).
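A minimal sketch of the distinction (the toy numbers are made up purely for illustration): the targets y are what make a problem supervised, and the task type follows from whether those targets are discrete or continuous.

    # Labeled data: every sample in X is paired with a target in y.
    X = [[1], [2], [3], [4]]
    y_class = [0, 0, 1, 1]            # discrete labels  -> classification
    y_reg   = [1.9, 4.1, 6.0, 8.2]    # continuous targets -> regression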

πŸ“ Unsupervised Learning

  • Description: Extracts structure from unlabeled data.
  • Common Tasks: Clustering (grouping similar items), Dimensionality Reduction (compressing features).
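In scikit-learn this paradigm shows up directly in the API, as in this minimal sketch (using k-Means, covered below): unsupervised estimators are fit on X alone, with no y.

    from sklearn.cluster import KMeans
    X = [[1, 2], [1, 3], [8, 8], [9, 9]]                # no labels anywhere
    km = KMeans(n_clusters=2, random_state=42).fit(X)   # fit(X), not fit(X, y)
    print(km.labels_)   # structure inferred from the data itself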

πŸ“ Semi-Supervised Learning

  • Description: Utilizes a mix of labeled and unlabeled data.
  • Application: When labeling is expensive and only a small portion of data is labeled.

πŸ“ Reinforcement Learning

  • Description: Agents learn optimal actions by interacting with an environment to maximize rewards.
  • Application: Robotics, autonomous navigation, game playing.

3. πŸ‹οΈ Supervised Learning Algorithms

πŸ“ Linear Regression

  • What: Fits a straight line to predict continuous values.
  • Use Cases: Predicting house prices, sales forecasting.
  • Code Example:
    from sklearn.linear_model import LinearRegression
    X = [[1], [2], [3], [4], [5]]  # one feature per sample
    y = [2, 4, 5, 4, 5]            # continuous targets
    model = LinearRegression()
    model.fit(X, y)                # learns slope and intercept
    print(model.predict([[6]]))    # predict for an unseen input
    

πŸ“ Logistic Regression

  • What: Estimates probabilities for classification.
  • Use Cases: Spam detection, disease prediction.
  • Code Example:
    from sklearn.linear_model import LogisticRegression
    X = [[0], [1], [2], [3], [4]]
    y = [0, 0, 1, 1, 1]            # binary class labels
    clf = LogisticRegression()
    clf.fit(X, y)
    print(clf.predict([[2.5]]))    # class with the higher predicted probability
    

πŸ“ Decision Trees

  • What: Uses feature-based splits to create a tree of decisions.
  • Use Cases: Credit risk assessment, medical diagnosis rules.
  • Code Example:
    from sklearn.tree import DecisionTreeClassifier
    X = [[0, 0], [1, 0], [1, 1], [0, 1]]
    y = [0, 0, 1, 1]               # label follows the second feature
    tree = DecisionTreeClassifier()
    tree.fit(X, y)                 # learns axis-aligned splits
    print(tree.predict([[0.5, 0.5]]))
    

πŸ“ Random Forests

  • What: Combines multiple decision trees for robust predictions.
  • Use Cases: Loan approval, stock market predictions.
  • Code Example:
    from sklearn.ensemble import RandomForestClassifier
    X = [[0, 0], [1, 0], [1, 1], [0, 1]]
    y = [0, 0, 1, 1]
    rf = RandomForestClassifier(random_state=42)  # fixed seed for reproducibility
    rf.fit(X, y)                                  # averages many randomized trees
    print(rf.predict([[0.5, 0.5]]))
    

πŸ“ Support Vector Machines (SVMs)

  • What: Finds the optimal separating hyperplane for classification.
  • Use Cases: Image classification, text categorization.
  • Code Example:
    from sklearn.svm import SVC
    X = [[0, 0], [1, 0], [1, 1], [0, 1]]
    y = [0, 0, 1, 1]
    svc = SVC(kernel='linear')     # linear kernel: a straight separating hyperplane
    svc.fit(X, y)
    print(svc.predict([[0.5, 0.5]]))
    

πŸ“ k-Nearest Neighbors (kNN)

  • What: Classifies based on proximity to known samples.
  • Use Cases: Recommender systems, anomaly detection.
  • Code Example:
    from sklearn.neighbors import KNeighborsClassifier
    X = [[0], [1], [2], [3]]
    y = [0, 0, 1, 1]
    knn = KNeighborsClassifier(n_neighbors=3)  # vote among the 3 closest samples
    knn.fit(X, y)                              # simply stores the training data
    print(knn.predict([[1.5]]))
    

πŸ“ Gradient Boosting Machines (XGBoost, LightGBM, CatBoost)

  • What: Sequentially improves weak learners to build a strong model.
  • Use Cases: Structured data challenges, Kaggle competitions.
  • Code Example (XGBoost):
    import xgboost as xgb         # requires the xgboost package (pip install xgboost)
    X = [[1, 2], [2, 3], [3, 4], [4, 5]]
    y = [0, 1, 1, 0]
    model = xgb.XGBClassifier()
    model.fit(X, y)
    print(model.predict([[2.5, 3.5]]))
    

πŸ“ Neural Networks (Intro)

  • What: Layers of neurons that learn complex representations.
  • Use Cases: Image classification, sentiment analysis.
  • Code Example (Simple MLP):
    from sklearn.neural_network import MLPClassifier
    X = [[0, 0], [1, 0], [1, 1], [0, 1]]
    y = [0, 0, 1, 1]
    mlp = MLPClassifier(hidden_layer_sizes=(5,), max_iter=500, random_state=42)  # one hidden layer of 5 neurons
    mlp.fit(X, y)
    print(mlp.predict([[0.5, 0.5]]))
    

4. πŸ”Ž Unsupervised Learning Algorithms

πŸ“ k-Means Clustering

  • What: Divides data into k clusters based on similarity.
  • Use Cases: Customer segmentation, grouping documents.
  • Code Example:
    from sklearn.cluster import KMeans
    X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
    km = KMeans(n_clusters=2, random_state=42)  # fixed seed so cluster ids are stable
    km.fit(X)
    print(km.labels_)   # cluster assignment for each sample
    

πŸ“ DBSCAN

  • What: Density-based clustering that detects outliers.
  • Use Cases: Identifying unusual patterns, noise filtering.
  • Code Example:
    from sklearn.cluster import DBSCAN
    X = [[1, 2], [1, 3], [2, 2], [10, 10], [10, 11], [11, 10]]
    db = DBSCAN(eps=1.5, min_samples=2)  # eps: neighborhood radius; min_samples: density threshold
    db.fit(X)
    print(db.labels_)   # a label of -1 marks points treated as noise/outliers
    

πŸ“ Hierarchical Clustering

  • What: Builds a hierarchy of clusters without pre-specifying k.
  • Use Cases: Gene expression analysis, text analysis.
  • Code Example (Agglomerative):
    from sklearn.cluster import AgglomerativeClustering
    X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
    hc = AgglomerativeClustering(n_clusters=2)  # merges the closest clusters bottom-up
    hc.fit(X)
    print(hc.labels_)
    
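AgglomerativeClustering above returns flat labels for a chosen number of clusters; to inspect the full hierarchy, one common route (sketched here with SciPy and matplotlib, assuming both are installed) is to build a linkage matrix and plot a dendrogram.

    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt
    X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
    Z = linkage(X, method='ward')   # records the full sequence of merges
    dendrogram(Z)                   # tree view; cutting at any height yields a clustering
    plt.show()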

πŸ“ Principal Component Analysis (PCA)

  • What: Reduces dimensionality by projecting onto directions of max variance.
  • Use Cases: Visualization, noise reduction.
  • Code Example:
    from sklearn.decomposition import PCA
    X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
    pca = PCA(n_components=2)          # with only 2 input features this rotates onto the principal axes;
    X_reduced = pca.fit_transform(X)   # choose n_components < n_features for true reduction
    print(X_reduced)
    

πŸ“ t-SNE and UMAP

  • What: Non-linear methods for high-dimensional data visualization.
  • Use Cases: Visualizing word embeddings, image embeddings.
  • Code Example (t-SNE):
    from sklearn.manifold import TSNE
    X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
    # perplexity must be smaller than the number of samples (the default of 30 would fail here)
    tsne = TSNE(n_components=2, perplexity=2, random_state=42)
    X_tsne = tsne.fit_transform(X)
    print(X_tsne)
    
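UMAP is not part of scikit-learn; a minimal sketch using the third-party umap-learn package (assuming it is installed) mirrors the t-SNE call above. n_neighbors is lowered here because the toy set has only six samples.

    import umap  # from the umap-learn package (pip install umap-learn)
    X = [[1, 2], [1, 3], [2, 2], [5, 8], [6, 9], [5, 7]]
    reducer = umap.UMAP(n_components=2, n_neighbors=3, random_state=42)
    X_umap = reducer.fit_transform(X)
    print(X_umap)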

5. πŸ”„ Semi-Supervised and Reinforcement Learning

πŸ“ Label Propagation (Semi-Supervised)

  • What: Spreads labels from a small labeled set to a larger unlabeled set.
  • Use Cases: Partially labeled datasets in text or image classification.
  • Code Example:
    from sklearn.semi_supervised import LabelPropagation
    X = [[0, 0], [1, 0], [1, 1], [0, 1]]
    y = [0, -1, 1, -1]    # -1 marks unlabeled samples
    lp = LabelPropagation()
    lp.fit(X, y)
    print(lp.predict(X))  # the unlabeled points receive propagated labels
    

πŸ“ Q-Learning (High-Level, RL)

  • What: Learns action values to maximize rewards over time.
  • Use Cases: Game AI, robotic control.
  • Note: Generally requires custom implementations or specialized libraries; a minimal tabular sketch follows below.
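For intuition, here is a minimal tabular Q-learning sketch on a made-up one-dimensional walk (the environment, rewards, and hyperparameters are all hypothetical, chosen only to show the update rule Q(s,a) ← Q(s,a) + α·[r + γ·max Q(s',·) − Q(s,a)]):

    import random
    random.seed(0)

    n_states = 5                                # states 0..4 on a line; state 4 is the goal
    Q = [[0.0, 0.0] for _ in range(n_states)]   # Q[s][a]; action 0 = left, 1 = right
    alpha, gamma, epsilon = 0.1, 0.9, 0.2       # learning rate, discount, exploration rate

    for episode in range(200):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy: mostly exploit current Q-values, sometimes explore
            a = random.randint(0, 1) if random.random() < epsilon else (0 if Q[s][0] >= Q[s][1] else 1)
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0   # reward only on reaching the goal
            # core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next

    print(Q)   # after training, moving right (action 1) should score higher in every state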

6. πŸ’» Practical Examples and Use Cases

πŸ“ Regression Example (Predicting House Prices)

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

np.random.seed(42)                               # reproducible synthetic data
X = np.random.rand(100, 1) * 10                  # 100 samples of one feature in [0, 10)
y = 3 * X.squeeze() + np.random.randn(100) * 2   # linear signal plus Gaussian noise
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))

πŸ“ Classification Example (Iris)

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

πŸ“ Clustering Example (Iris)

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

data = load_iris()
X = data.data
km = KMeans(n_clusters=3, random_state=42)
labels = km.fit_predict(X)
print("ARI:", adjusted_rand_score(data.target, labels))   # agreement with the true species

πŸ“ Dimensionality Reduction Example (PCA on Iris)

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris()                                     # reload so this snippet stands alone
X_pca = PCA(n_components=2).fit_transform(data.data)   # 4 features -> 2 components
print("Reduced shape:", X_pca.shape)

πŸ“ Evaluation Metrics

  • Regression: Mean Squared Error (MSE), Mean Absolute Error (MAE), RΒ².
  • Classification: Accuracy, Precision, Recall, F1-score, ROC AUC.
  • Clustering: Silhouette Score, Adjusted Rand Index.
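All of these live in sklearn.metrics; here is a minimal sketch with made-up predictions (the numbers carry no meaning beyond illustrating the calls):

from sklearn.metrics import (mean_squared_error, r2_score,
                             precision_score, recall_score, f1_score,
                             silhouette_score)

# Regression: compare continuous predictions against true values
print("MSE:", mean_squared_error([3.0, 5.0, 7.0], [2.8, 5.3, 6.6]))
print("R2:", r2_score([3.0, 5.0, 7.0], [2.8, 5.3, 6.6]))

# Classification: compare predicted labels against true labels
y_true, y_pred = [0, 1, 1, 0], [0, 1, 0, 0]
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))

# Clustering: silhouette uses the data and predicted clusters, no ground truth needed
X = [[1, 2], [1, 3], [8, 8], [9, 9]]
print("Silhouette:", silhouette_score(X, [0, 0, 1, 1]))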

7. πŸ“š Resources


8. πŸ’‘ Tips and Tricks

  • Start Simple: Begin with straightforward models before moving to complex algorithms.
  • Hyperparameter Tuning: Use GridSearchCV or RandomizedSearchCV to improve model performance (see the sketch after this list).
  • Feature Engineering: Good features often matter more than fancy models.
  • Visualizations: Use libraries like matplotlib or seaborn to visualize data and model predictions.
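As a quick sketch of the hyperparameter-tuning tip (the parameter grid here is an arbitrary example, not a recommendation):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}   # example values only
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)   # cross-validates every grid combination
print(search.best_params_, search.best_score_)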