
Commit c088f81: update readme (1 parent bf8f419)

3 files changed: 148 lines added, 2 lines deleted


README.md

Lines changed: 148 additions & 2 deletions
# Introduction to Machine Learning: One-Day Course

This is a one-day introductory machine learning course for beginners. The course covers the basics of supervised and unsupervised learning, including regression, classification, clustering, dimensionality reduction, and anomaly detection. It also includes hands-on exercises and examples using popular Machine Learning (ML) libraries such as Scikit-learn.

The [slides](presentation/ML_intro.pdf) guide the instructor through the course with a structured outline of the topics to cover.

## Table of Contents

1. [Introduction to Machine Learning](#1-introduction-to-machine-learning)
2. [Understanding the Machine Learning Workflow](#2-understanding-the-machine-learning-workflow)
3. [Supervised Learning](#3-supervised-learning)
   - [3.1 Regression](#31-regression)
   - [3.2 Classification](#32-classification)
4. [Unsupervised Learning](#4-unsupervised-learning)
   - [4.1 Clustering](#41-clustering)
   - [4.2 Other Unsupervised Learning Techniques](#42-other-unsupervised-learning-techniques)
5. [In-Class Assignment](#5-in-class-assignment)

---

## 1. Introduction to Machine Learning

- **What are AI and ML?**
  - AI: The ability of machines to simulate intelligent behavior.
  - ML: A branch of AI in which models learn patterns from data and improve their performance with experience.
- **Applications**:
  - ChatGPT, Netflix recommendations, fraud detection, self-driving cars, etc.
- **Types of ML**:
  - Supervised Learning, Unsupervised Learning, Reinforcement Learning.
- **Key Terminology**:
  - Dataset, Features, Labels, Model, Training, Testing, Hyperparameters, Overfitting, Underfitting.

### Highlights

- Comparison between supervised and unsupervised learning using Linear Regression and K-Means examples.
- Basic visualizations of regression and clustering tasks.

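A minimal sketch of the supervised vs. unsupervised comparison, assuming Scikit-learn and NumPy are installed. The synthetic data (a noisy line y = 3x + 2 and three 2D blobs with hand-picked centers) is invented for illustration and is not the course's own dataset. The key difference is visible in the `fit` calls: `LinearRegression` receives labels `y`, while `KMeans` sees only the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Supervised: features X AND labels y (hypothetical noisy line y = 3x + 2)
X_reg = rng.uniform(0, 10, size=(100, 1))
y_reg = 3 * X_reg[:, 0] + 2 + rng.normal(0, 0.5, size=100)
reg = LinearRegression().fit(X_reg, y_reg)
print(f"slope={reg.coef_[0]:.2f}, intercept={reg.intercept_:.2f}")

# Unsupervised: only features X (the true blob labels are discarded);
# K-Means has to discover the three groups on its own
X_clu, _ = make_blobs(n_samples=150, centers=[[0, 0], [5, 5], [-5, 5]],
                      cluster_std=0.8, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_clu)
print("cluster sizes:", np.bincount(km.labels_))
```

Plotting `X_reg` against `y_reg` with the fitted line, and `X_clu` colored by `km.labels_`, gives the two basic visualizations mentioned above (Matplotlib would be an extra dependency).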
---

## 2. Understanding the Machine Learning Workflow

### Steps in the Workflow

1. **Define the Problem**: Set objectives and identify the task type (e.g., regression, classification).
2. **Collect and Clean Data**: Handle missing values, duplicates, and outliers.
3. **Explore and Visualize Data**: Use summary statistics and visual tools such as histograms and scatterplots.
4. **Feature Engineering**:
   - **Selection**: Remove irrelevant features.
   - **Transformation**: Normalize numerical features and encode categorical ones.
   - **Creation**: Generate new features.
5. **Split Data**: Divide into training, validation, and test sets.
6. **Choose and Train a Model**: Select an algorithm based on the task.
7. **Evaluate the Model**: Use metrics such as RMSE, Accuracy, and Silhouette Score.
8. **Hyperparameter Optimization**: Use `GridSearchCV` or `RandomizedSearchCV` for fine-tuning.

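Steps 4 and 5 interact in a way that is easy to get wrong: a transformation should be fitted on the training set only and then applied unchanged to the validation and test sets. A minimal sketch, assuming a hypothetical toy dataset and a 60/20/20 split obtained with two calls to `train_test_split`:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 100 samples, 3 numeric features, binary labels
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Step 5: hold out 20% as the test set, then 25% of the remainder (20% overall) as validation
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 60 20 20

# Step 4 (transformation): learn mean/std from the training set only,
# then apply the same scaling to validation and test data to avoid data leakage
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s, X_test_s = (scaler.transform(a) for a in (X_train, X_val, X_test))
```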

### Highlights

- End-to-end example of an ML pipeline using Scikit-learn.
- Visualization of preprocessing and evaluation results.

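One way to sketch an end-to-end pipeline, assuming the Iris dataset bundled with Scikit-learn and a k-nearest-neighbors classifier chosen here only for illustration (the course's own example may use a different model). Wrapping scaling and the model in a `Pipeline` means `GridSearchCV` re-fits the scaler inside every cross-validation fold, which keeps validation data out of preprocessing.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Iris ships with Scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Preprocessing and model combined into a single estimator
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hyperparameter optimization: grid over the number of neighbors (illustrative values);
# parameters of a pipeline step are addressed as "<step name>__<parameter>"
grid = GridSearchCV(pipe, param_grid={"knn__n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X_train, y_train)

# Evaluate the best pipeline (refit on the full training set) on the untouched test set
y_pred = grid.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("best params:", grid.best_params_)
print("test accuracy:", round(test_accuracy, 3))
```

`RandomizedSearchCV` accepts the same pipeline and is usually preferred when the grid becomes large.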
---

## 3. Supervised Learning

### 3.1 Regression

- **Goal**: Predict continuous outputs (e.g., house prices, temperature).
- **Common Algorithms**:
  - Linear Regression, Polynomial Regression, Ridge, and Lasso.
- **Evaluation Metrics**:
  - MAE, MSE, RMSE, $R^2$.

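For reference, the standard definitions of these metrics, for $n$ samples with true values $y_i$, predictions $\hat{y}_i$, and mean true value $\bar{y}$:

```math
\begin{aligned}
\text{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{aligned}
```

RMSE is in the same units as the target, which usually makes it easier to interpret than MSE; $R^2 = 1$ means a perfect fit, and $R^2 = 0$ means the model does no better than always predicting $\bar{y}$.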

#### Highlights

- Hands-on example of Linear Regression with visualization of results.
- Analysis of regression coefficients.

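A hands-on sketch along these lines, assuming hypothetical synthetic data generated from y = 4 + 3·x1 − 2·x2 + noise, so that the fitted coefficients can be checked against known values. The `alpha` values for Ridge and Lasso are illustrative, not tuned.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data: y = 4 + 3*x1 - 2*x2 + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # L2 penalty shrinks coefficients toward 0
    "lasso": Lasso(alpha=0.1),   # L1 penalty can set coefficients exactly to 0
    "poly2": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[name] = {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),  # computed by hand to work across Scikit-learn versions
        "R2": r2_score(y_test, y_pred),
    }
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})

# Coefficient analysis: the plain linear model should recover roughly [3, -2] and 4
lin = models["linear"]
print("coefficients:", lin.coef_, "intercept:", lin.intercept_)
print("ridge coefficients:", models["ridge"].coef_)
print("lasso coefficients:", models["lasso"].coef_)
```

Because the true relationship here is linear, the degree-2 polynomial model gains little; on curved data the comparison would favor it.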
---

### 3.2 Classification

- **Goal**: Predict discrete categories (e.g., spam detection, disease diagnosis).
- **Types**:
  - Binary, Multi-Class, and Multi-Label Classification.
- **Evaluation Metrics**:
  - Accuracy, Precision, Recall, F1-Score, Confusion Matrix.

#### Highlights

- Logistic Regression example for binary classification.
- Hands-on exercise with a Random Forest Classifier.
- Visualization of confusion matrix results.

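A minimal sketch of both highlighted classifiers on a binary task, assuming the breast cancer dataset bundled with Scikit-learn (the course may use a different dataset). Logistic regression is wrapped with `StandardScaler` because unscaled features can slow its solver's convergence; the Random Forest settings are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary task: malignant (label 0) vs. benign (label 1) tumors
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

classifiers = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    scores[name] = accuracy_score(y_test, y_pred)
    # Precision, recall, and F1 are reported for the positive class (label 1) by default
    print(f"{name}: accuracy={scores[name]:.3f} "
          f"precision={precision_score(y_test, y_pred):.3f} "
          f"recall={recall_score(y_test, y_pred):.3f} "
          f"f1={f1_score(y_test, y_pred):.3f}")
    # Rows = true class, columns = predicted class
    print(confusion_matrix(y_test, y_pred))
```

For the visual version of the confusion matrix, Scikit-learn's `ConfusionMatrixDisplay` can plot it (this requires Matplotlib).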
---

## 4. Unsupervised Learning

### 4.1 Clustering

- **Goal**: Group unlabeled data points into clusters based on similarity.
- **Types**:
  - Partition-Based: K-Means.
  - Hierarchical: Agglomerative Clustering.
  - Density-Based: DBSCAN.
- **Evaluation Metrics**:
  - Silhouette Score, Inertia, Visualization.

#### Highlights

- K-Means Clustering example with synthetic data.
- Visualizing clusters and centroids.

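A sketch of the K-Means example with synthetic data, assuming three well-separated blobs with hand-picked centers. It prints inertia and silhouette score for several values of k (the "elbow" and silhouette heuristics for choosing k), then runs the other two clustering families listed above for comparison. The DBSCAN `eps` and `min_samples` values are assumptions tuned to this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated groups of 100 points each
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6], [-6, 6]],
                  cluster_std=0.8, random_state=0)

# Inertia (within-cluster sum of squared distances) always decreases with k;
# look for the "elbow", and for the k with the highest silhouette score
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.1f}, "
          f"silhouette={silhouette_score(X, km.labels_):.3f}")

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
kmeans_silhouette = silhouette_score(X, kmeans.labels_)
print("centroids:\n", kmeans.cluster_centers_)

# Hierarchical and density-based alternatives on the same data
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)  # label -1 marks noise
print("silhouette (agglomerative):", round(float(silhouette_score(X, agglo_labels)), 3))
print("DBSCAN clusters found:", len(set(dbscan_labels) - {-1}))
```

A scatter plot of `X` colored by `kmeans.labels_`, with `kmeans.cluster_centers_` overlaid, gives the cluster and centroid visualization. Note that DBSCAN does not take the number of clusters as input.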
---

### 4.2 Other Unsupervised Learning Techniques

- **Dimensionality Reduction**:
  - Reduces the number of input features while preserving the main patterns in the data.
  - Example: PCA (Principal Component Analysis).
- **Anomaly Detection**:
  - Identifies outliers or unusual patterns.
  - Examples: Isolation Forest, Z-scores.
- **Association Rule Mining**:
  - Finds relationships between items (e.g., market basket analysis).

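The first two techniques can be sketched with Scikit-learn and NumPy. This example assumes the bundled digits dataset (64 pixel features per image) for PCA, and hypothetical 2D data with five injected outliers for anomaly detection. The `contamination=0.03` value and the common |z| > 3 cutoff are illustrative choices, not fixed rules.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# --- Dimensionality reduction: 64-pixel digit images -> 2 principal components
X_digits, y_digits = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X_digits)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("shape:", X_digits.shape, "->", X_2d.shape)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum().round(3))

# --- Anomaly detection on hypothetical data: 200 normal points + 5 injected outliers
rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8, 8], [-8, 8], [8, -8], [-8, -8], [0, 10]])
X = np.vstack([normal, outliers])

# Isolation Forest: anomalies are separated by fewer random splits;
# fit_predict returns -1 for anomalies and 1 for normal points
iso = IsolationForest(contamination=0.03, random_state=0)
iso_flags = iso.fit_predict(X) == -1

# Z-scores: flag a point if any feature is more than 3 standard deviations from its mean
z = (X - X.mean(axis=0)) / X.std(axis=0)
z_flags = (np.abs(z) > 3).any(axis=1)

print("Isolation Forest flagged indices:", np.flatnonzero(iso_flags))
print("Z-score flagged indices:", np.flatnonzero(z_flags))
```

Scattering `X_2d` colored by `y_digits` gives the 2D projection plot. Z-scores assume roughly bell-shaped features, while Isolation Forest makes no such assumption.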

#### Highlights

- PCA visualization of high-dimensional data projected into 2D.
- Hands-on example of Isolation Forest for anomaly detection.
- Apriori algorithm for discovering association rules.

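Apriori is not part of Scikit-learn. Dedicated libraries exist (for example, the `apriori` function in mlxtend's `frequent_patterns` module), but the idea fits in a short pure-Python sketch. The five shopping baskets and the thresholds below are made up for illustration. Apriori builds frequent itemsets level by level and prunes any candidate that has an infrequent subset, because a superset of an infrequent itemset cannot be frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        # Support = fraction of baskets that contain the candidate itemset
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates ...
        k += 1
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # ... and prune candidates that have an infrequent (k-1)-subset
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for left in combinations(itemset, r):
                antecedent = frozenset(left)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Hypothetical market baskets
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, support, confidence in rules:
    print(f"{sorted(antecedent)} -> {sorted(consequent)} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

Every subset of a frequent itemset is itself frequent, so `frequent[antecedent]` is always defined when rules are generated.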
---

## 5. In-Class Assignment

- **Objective**:
  - Develop a classification model using a dataset of your choice.
  - Save the model as a pickle file.
- **Steps**:
  - Preprocess the data (handle missing values, encode categorical variables).
  - Train, evaluate, and optimize the model.
  - Submit the pickle file of the trained model.

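A possible starting point, sketched under several assumptions: the breast cancer dataset stands in for "a dataset of your choice", missing values are simulated by blanking about 5% of entries, and the small hyperparameter grid is illustrative. This dataset is fully numeric, so the categorical-encoding step (for example, `OneHotEncoder` inside a `ColumnTransformer`) is omitted. Putting preprocessing and the model in one `Pipeline` means the pickle contains everything needed to make predictions.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Example data; substitute your own dataset for the assignment
X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate ~5% missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Preprocess (impute), train, and optimize in one object
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("model", RandomForestClassifier(random_state=0)),
])
search = GridSearchCV(
    pipe, {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]}, cv=3)
search.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("test accuracy:", round(test_accuracy, 3))

# Save the best pipeline. A temp directory is used here; in your project,
# something like Path("model.pkl") would be the usual choice.
model_path = Path(tempfile.gettempdir()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(search.best_estimator_, f)

# Reload and confirm the restored model makes identical predictions
with open(model_path, "rb") as f:
    restored = pickle.load(f)
assert (restored.predict(X_test) == search.predict(X_test)).all()
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code. A model pickled with one Scikit-learn version may also fail to load with another.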
---

### Usage

This README serves as:

1. A reference for instructors to organize class materials.
2. A guide for students reviewing ML concepts and techniques.
3. A roadmap for practical ML workflows.

### Acknowledgements

Thanks to [**Leon Boschman**](https://github.com/lboschman) for contributing his ideas, slides, and feedback to this course material.

model.pkl (binary file): -553 KB

presentation/ML_intro.pdf (binary file): 2.55 MB
