# Introduction to Machine Learning: One-Day Course
This is a one-day machine learning introductory course for beginners. The course covers the basics of supervised and unsupervised learning, including regression, classification, clustering, dimensionality reduction, and anomaly detection. It also includes hands-on exercises and examples using popular Machine Learning (ML) libraries such as Scikit-learn.

The [slides](presentation/ML_intro.pdf) guide the instructor through the course, providing a structured outline of the topics covered.

## Table of Contents
1. [Introduction to Machine Learning](#1-introduction-to-machine-learning)
2. [Understanding the Machine Learning Workflow](#2-understanding-the-machine-learning-workflow)
3. [Supervised Learning](#3-supervised-learning)
   - [3.1 Regression](#31-regression)
   - [3.2 Classification](#32-classification)
4. [Unsupervised Learning](#4-unsupervised-learning)
   - [4.1 Clustering](#41-clustering)
   - [4.2 Other Unsupervised Learning Techniques](#42-other-unsupervised-learning-techniques)
5. [In-Class Assignment](#5-in-class-assignment)

---

## 1. Introduction to Machine Learning
- **What is AI and ML?**
  - AI: The ability of machines to simulate intelligent behavior.
  - ML: A branch of AI where models are trained to learn from data and improve over time.
- **Applications**:
  - ChatGPT, Netflix recommendations, fraud detection, self-driving cars, etc.
- **Types of ML**:
  - Supervised Learning, Unsupervised Learning, Reinforcement Learning.
- **Key Terminology**:
  - Dataset, Features, Labels, Model, Training, Testing, Hyperparameters, Overfitting, Underfitting.

### Highlights
- Comparison between supervised and unsupervised learning using Linear Regression and K-Means examples; see the sketch below.
- Basic visualizations of regression and clustering tasks.

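To make the contrast concrete, here is a minimal, self-contained sketch (the synthetic data and its parameters are illustrative choices, not part of the course materials) that fits a supervised Linear Regression on labeled points and an unsupervised K-Means on unlabeled ones:

```python
# Supervised vs. unsupervised learning on small synthetic datasets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)

# Supervised: features X come with labels y, and the model learns the mapping.
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=100)   # noisy linear relationship
reg = LinearRegression().fit(X, y)
print("Learned slope:", reg.coef_[0])            # close to the true slope of 3

# Unsupervised: only features, no labels; the model finds structure on its own.
points = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("Cluster centers:\n", km.cluster_centers_)  # near (0, 0) and (5, 5)
```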

---

## 2. Understanding the Machine Learning Workflow

### Steps in the Workflow
1. **Define the Problem**: Set objectives (e.g., regression, classification).
2. **Collect and Clean Data**: Handle missing values, duplicates, and outliers.
3. **Explore and Visualize Data**: Use summary statistics and visual tools like histograms and scatterplots.
4. **Feature Engineering**:
   - **Selection**: Remove irrelevant features.
   - **Transformation**: Normalize and encode data.
   - **Creation**: Generate new features.
5. **Split Data**: Divide into training, validation, and test sets.
6. **Choose and Train a Model**: Select an algorithm based on the task.
7. **Evaluate the Model**: Use metrics like RMSE, Accuracy, and Silhouette Score.
8. **Hyperparameter Optimization**: Use `GridSearchCV` or `RandomizedSearchCV` for fine-tuning; see the pipeline sketch below.

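As an illustration of these steps, here is a minimal sketch of a Scikit-learn pipeline tuned with `GridSearchCV`; the built-in Iris dataset, the Logistic Regression model, and the parameter grid are placeholder choices, not part of the course materials:

```python
# Split, preprocess, train, tune, and evaluate in one short workflow.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Step 5: hold out a test set (validation is handled by cross-validation below).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Steps 4 and 6: chain preprocessing and the model so they are fit together.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step 8: grid-search the regularization strength with 5-fold cross-validation.
search = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# Step 7: evaluate the best model on the held-out test data.
print("Best C:", search.best_params_["clf__C"])
print("Test accuracy:", search.score(X_test, y_test))
```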
### Highlights
- End-to-end example of an ML pipeline using Scikit-learn.
- Visualization of preprocessing and evaluation results.

---

## 3. Supervised Learning

### 3.1 Regression

- **Goal**: Predict continuous outputs (e.g., house prices, temperature).
- **Common Algorithms**:
  - Linear Regression, Polynomial Regression, Ridge, and Lasso.
- **Evaluation Metrics**:
  - MAE, MSE, RMSE, \( R^2 \).

#### Highlights
- Hands-on example of Linear Regression with visualization of results; see the sketch below.
- Analysis of regression coefficients.
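
A minimal sketch of such an example on synthetic data; the true relationship `y ≈ 2.5x + 4` and the noise level are arbitrary choices for illustration:

```python
# Fit Linear Regression and report the metrics listed above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(seed=1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.5 * X.ravel() + 4 + rng.normal(0, 2, size=200)  # y = 2.5x + 4 plus noise

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mse = mean_squared_error(y, y_pred)
print("Coefficient:", model.coef_[0], "Intercept:", model.intercept_)
print("MAE :", mean_absolute_error(y, y_pred))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("R^2 :", r2_score(y, y_pred))
```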

---

### 3.2 Classification

- **Goal**: Predict discrete categories (e.g., spam detection, disease diagnosis).
- **Types**:
  - Binary, Multi-Class, and Multi-Label Classification.
- **Evaluation Metrics**:
  - Accuracy, Precision, Recall, F1-Score, Confusion Matrix.

#### Highlights
- Logistic Regression example for binary classification; see the sketch below.
- Hands-on exercise with a Random Forest Classifier.
- Visualization of confusion matrix results.
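
A minimal sketch of a binary-classification example; the built-in breast-cancer dataset is a placeholder choice standing in for the course data:

```python
# Binary classification with Logistic Regression, scored with the
# metrics listed above plus a confusion matrix.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, and F1-score, plus overall accuracy.
print(classification_report(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```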

---

## 4. Unsupervised Learning

### 4.1 Clustering

- **Goal**: Group data points into clusters based on similarity, without labels.
- **Types**:
  - Partition-Based: K-Means.
  - Hierarchical: Agglomerative Clustering.
  - Density-Based: DBSCAN.
- **Evaluation Metrics**:
  - Silhouette Score, Inertia, Visualization.

#### Highlights
- K-Means Clustering example with synthetic data; see the sketch below.
- Visualizing clusters and centroids.
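
A minimal sketch using `make_blobs` to generate the synthetic data; the number of blobs and their spread are arbitrary choices:

```python
# K-Means on synthetic blobs, scored with silhouette and inertia.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Unlabeled synthetic data: three well-separated blobs in 2D.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

print("Inertia:", km.inertia_)                      # within-cluster sum of squares
print("Silhouette:", silhouette_score(X, labels))   # closer to 1 is better
print("Centroids:\n", km.cluster_centers_)
```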

---

### 4.2 Other Unsupervised Learning Techniques

- **Dimensionality Reduction**:
  - Reduces the number of input features while preserving the main patterns.
  - Example: PCA (Principal Component Analysis).
- **Anomaly Detection**:
  - Identifies outliers or unusual patterns.
  - Examples: Isolation Forest, Z-scores.
- **Association Rule Mining**:
  - Finds relationships between items (e.g., market basket analysis).

#### Highlights
- PCA visualization of high-dimensional data projected into 2D.
- Hands-on example of Isolation Forest for anomaly detection; see the sketch below.
- Apriori algorithm for discovering association rules.
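
A minimal anomaly-detection sketch with Isolation Forest; the synthetic inlier/outlier mix and the `contamination` value are illustrative assumptions:

```python
# Isolation Forest on mostly normal points with a few obvious outliers.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=2)
normal = rng.normal(0, 1, size=(200, 2))        # inliers around the origin
outliers = rng.uniform(6, 8, size=(5, 2))       # a handful of far-away points
X = np.vstack([normal, outliers])

# contamination ~ expected fraction of anomalies (5 of 205 here).
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
pred = iso.predict(X)                           # +1 = inlier, -1 = anomaly
print("Flagged as anomalies:", np.where(pred == -1)[0])
```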

---

## 5. In-Class Assignment

- **Objective**:
  - Develop a classification model using a dataset of your choice.
  - Save the model as a pickle file.
- **Steps**:
  - Preprocess the data (handle missing values, encode categorical variables).
  - Train, evaluate, and optimize the model.
  - Submit the pickle file of the trained model; see the saving sketch below.
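
A minimal sketch of the saving step, assuming a Random Forest trained on the built-in Iris dataset as a stand-in for your own model and data:

```python
# Train a classifier, serialize it to a pickle file, and verify it loads back.
import pickle
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Save the trained model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and reload it to confirm the file is usable before submitting.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)
print("Reloaded model accuracy:", restored.score(X, y))
```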

---

### Usage
This README serves as:
1. A reference for instructors organizing class materials.
2. A guide for students reviewing ML concepts and techniques.
3. A roadmap for practical ML workflows.

### Acknowledgements
Thanks to [**Leon Boschman**](https://github.com/lboschman) for contributing his ideas, slides, and feedback to this course material.