Q&A: 3

Machine Learning Concepts

What is a decision tree, and how does it work?

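A decision tree is a supervised learning algorithm used for both classification and regression tasks. It recursively splits the dataset on feature values, choosing at each step the split that best separates the target (for example, by minimizing Gini impurity for classification or variance for regression), which produces a tree-like model. Each internal node tests a feature, each branch corresponds to an outcome of that test, and each leaf node holds a prediction: a class label for classification or a numeric value for regression.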
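
As a minimal sketch of how a single split might be chosen, the snippet below scores every candidate threshold on one numeric feature by weighted Gini impurity and keeps the best one. The function names (`gini`, `best_split`) and the toy data are hypothetical, and a real tree would repeat this search recursively over all features.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' rule."""
    best_threshold, best_score = None, float("inf")
    n = len(values)
    # Skip the largest value: splitting there would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical toy data: one numeric feature and a binary label.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, bought)
```

On this toy data the chosen rule is `age <= 30` with an impurity of 0.0, meaning both branches end in pure leaves.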

What is regularization in machine learning?

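Regularization is a technique used to prevent overfitting by adding a penalty term to the model’s loss function. The penalty grows with the size of the model’s coefficients, so it discourages overly complex models and keeps the coefficients small. Common methods include L1 regularization (Lasso), which penalizes the sum of the absolute coefficient values and can shrink some coefficients to exactly zero, and L2 regularization (Ridge), which penalizes the sum of the squared coefficient values.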
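
To make the penalty term concrete, here is a small sketch that adds an L1 or L2 penalty to the mean squared error of a linear model. The helper names, the penalty strength `lam`, and the toy data are assumptions chosen for illustration. The two features are duplicates of each other, so two very different weight vectors fit the data equally well and only the penalty tells them apart.

```python
def mse(weights, rows, targets):
    """Mean squared error of a linear model without an intercept."""
    total = 0.0
    for row, target in zip(rows, targets):
        prediction = sum(w * x for w, x in zip(weights, row))
        total += (prediction - target) ** 2
    return total / len(rows)

def l1_penalty(weights):
    """Lasso-style penalty: sum of absolute coefficient values."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Ridge-style penalty: sum of squared coefficient values."""
    return sum(w * w for w in weights)

def regularized_loss(weights, rows, targets, lam, penalty):
    """Data-fit term plus lam times the chosen penalty."""
    return mse(weights, rows, targets) + lam * penalty(weights)

# Hypothetical toy data: the two features are identical copies.
rows = [[1, 1], [2, 2]]
targets = [2, 4]
small_weights = [1, 1]    # fits the data exactly
large_weights = [5, -3]   # also fits the data exactly, with larger coefficients

ridge_small = regularized_loss(small_weights, rows, targets, 0.5, l2_penalty)
ridge_large = regularized_loss(large_weights, rows, targets, 0.5, l2_penalty)
lasso_small = regularized_loss(small_weights, rows, targets, 0.5, l1_penalty)
lasso_large = regularized_loss(large_weights, rows, targets, 0.5, l1_penalty)
```

Both weight vectors have zero training error, yet the regularized loss is 1.0 versus 17.0 under the L2 penalty and 1.0 versus 4.0 under the L1 penalty, so an optimizer minimizing either loss would prefer the smaller coefficients.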

What is the difference between bagging and boosting?

Bagging (Bootstrap Aggregating) is an ensemble technique that trains multiple models independently on bootstrap samples of the data (random samples drawn with replacement) and combines their outputs by voting or averaging, which mainly reduces variance and improves stability. Boosting is an ensemble method that trains models sequentially, with each new model correcting errors made by the previous ones by giving more weight to the cases they got wrong, which mainly reduces bias.
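
The contrast can be sketched with one shared base learner, a one-feature threshold rule ("stump"). The bagging half fits each stump independently on a bootstrap sample and takes a majority vote. The boosting half uses an AdaBoost-style weight update, which is one common boosting scheme and is chosen here as an assumption, not the only option. All function names and the toy data are hypothetical.

```python
import math
import random

def stump_predict(stump, x):
    """Predict +1 or -1 with a one-feature threshold rule."""
    threshold, polarity = stump
    return polarity if x <= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (stump, weighted_error) for the rule with the lowest weighted error."""
    best_stump, best_error = None, float("inf")
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            stump = (threshold, polarity)
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(stump, x) != y)
            if error < best_error:
                best_stump, best_error = stump, error
    return best_stump, best_error

def train_bagging(xs, ys, n_models, rng):
    """Bagging: every stump is fit independently on its own bootstrap sample."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        picks = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        sample_x = [xs[i] for i in picks]
        sample_y = [ys[i] for i in picks]
        stump, _ = fit_stump(sample_x, sample_y, [1.0 / n] * n)
        models.append(stump)
    return models

def predict_bagging(models, x):
    """Combine the independent stumps by unweighted majority vote."""
    votes = sum(stump_predict(stump, x) for stump in models)
    return 1 if votes >= 0 else -1

def train_boosting(xs, ys, n_rounds):
    """Boosting (AdaBoost-style): each round upweights the points misclassified so far."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, error = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # this stump's say in the final vote
        ensemble.append((alpha, stump))
        # Misclassified points (y * prediction == -1) gain weight, the rest lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict_boosting(ensemble, x):
    """Combine the sequential stumps by alpha-weighted vote."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]

bagged = train_bagging(xs, ys, n_models=25, rng=random.Random(0))
boosted = train_boosting(xs, ys, n_rounds=3)
boosted_predictions = [predict_boosting(boosted, x) for x in xs]
```

The structural difference is in the training loops: `train_bagging` never looks at earlier models, so its stumps could be trained in parallel, while `train_boosting` needs the previous round's mistakes before it can fit the next stump. On this toy data three boosting rounds are enough to classify every training point correctly, even though the best single stump misclassifies three of the ten points.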

What is a confusion matrix, and why is it important?

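A confusion matrix is a table used to evaluate the performance of a classification model by comparing predicted labels with actual labels. For a binary classifier it shows the counts of true positives, true negatives, false positives, and false negatives. It is important because accuracy, precision, recall, and other metrics are computed from these counts, and because it shows which kinds of errors the model makes, something a single accuracy figure hides.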
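
A short sketch of how the four counts and the derived metrics could be computed for a binary classifier follows. The function name `confusion_counts` and the label lists are hypothetical, with 1 treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Hypothetical labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
precision = counts["tp"] / (counts["tp"] + counts["fp"])  # of predicted positives, how many were right
recall = counts["tp"] / (counts["tp"] + counts["fn"])     # of actual positives, how many were found
```

Here the counts are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, giving an accuracy of 0.8 and a precision and recall of 0.75 each.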

What is the ROC curve?

The ROC (Receiver Operating Characteristic) curve is a graphical representation of a binary classification model’s performance across different decision thresholds. It plots the true positive rate (sensitivity) against the false positive rate (which equals 1 - specificity), showing the tradeoff between sensitivity and specificity. The area under the curve (AUC) summarizes the curve in a single number: 0.5 corresponds to random guessing and 1.0 to perfect separation of the two classes.
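
As a rough illustration, the sketch below builds the ROC points by sweeping the threshold over every distinct score and then estimates the AUC with the trapezoidal rule. The function names and the toy scores are hypothetical, and the code assumes labels are 0 or 1 with at least one example of each class.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical model scores (higher means "more likely positive") and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(labels, scores)
area = auc(points)
```

For these scores the AUC is 8/9 (about 0.89), which matches the fraction of positive-negative pairs in which the positive example receives the higher score.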