Thoughts after DUSC workshop 15/11/23:
- Using a different dataset with a more even (closer to 50:50) split between the two classes would make the bias-variance tradeoff clearer
- Using a non-medical dataset would also broaden the lesson's applicability
- The lesson generally needs more programming tasks. If it follows on from the intro to ML course, you could:
  - Set the data pre-processing as a task
  - Get learners to compute accuracy on both the training and test data from the outset (first sketch after this list)
  - Set an open-ended task at the end that lets learners train on more data, or on different data types
- The course would really benefit from highlighting the advantages of random forests and gradient boosting; this can only be done by introducing more features earlier
- Reduce the amount of plotting. Visualising decision trees is effective early on, but plotting becomes time-consuming and less informative when comparing the later models
- Perhaps drop gradient boosting entirely; it is skimmed over so quickly that it conveys none of its benefits over, or differences from, random forests
- To show the power of random forests, try running the model on highly correlated features (second sketch after this list)
- Ideally the code should not keep reassigning the mdl variable; creating a new variable for each model would make comparison easier (third sketch after this list)
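
For the training/test accuracy task, a minimal sketch (assuming scikit-learn, which the `mdl`-based code suggests the lesson uses; the synthetic dataset is a stand-in, not the lesson's data):

```python
# Sketch: report accuracy on both splits from the outset, so learners can
# watch the train/test gap (overfitting) as model complexity grows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)  # fully grown tree
tree.fit(x_train, y_train)

print("train accuracy:", tree.score(x_train, y_train))  # ~1.0: memorised
print("test accuracy: ", tree.score(x_test, y_test))    # noticeably lower
```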
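
For the correlated-features point, a hedged sketch (the data generation and variable names are invented for illustration):

```python
# Sketch: build ten features that are noisy copies of one signal (so they
# are highly correlated), then compare a single tree against a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
x = np.column_stack([signal + 0.5 * rng.normal(size=1000) for _ in range(10)])
y = (signal > 0).astype(int)  # label driven by the shared signal

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

tree_mdl = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)
forest_mdl = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    x_train, y_train
)

# Averaging over many trees usually gives the forest an edge on noisy,
# correlated inputs like these.
print("tree test accuracy:  ", tree_mdl.score(x_test, y_test))
print("forest test accuracy:", forest_mdl.score(x_test, y_test))
```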
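
And one possible pattern for keeping every model in its own variable rather than overwriting `mdl` (again a sketch, not the lesson's actual code):

```python
# Sketch: keep each fitted model under its own key so nothing is overwritten
# and all models can be scored side by side at the end.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(x_train, y_train)  # every fitted model is retained
    print(name, "test accuracy:", round(model.score(x_test, y_test), 3))
```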