
DUSC Instructor thoughts #24

@DimmestP


Thoughts after DUSC workshop 15/11/23:

  • Using a different dataset that has a more even spread (50:50) between the two classes would make the bias-variance tradeoff clearer
  • Using a non-medical dataset would also broaden the course's applicability
  • Generally needs more programming tasks. If this lesson follows on from the intro to ML course then you can:
  1. Set the data pre-processing as a task
  2. Get learners to report accuracy on both the training and test data from the outset
  3. Set a general task at the end allowing learners to train on more data, or on different data types
  • The course would really benefit from highlighting the advantages of random forests and gradient boosting. This can only be done by introducing more features sooner.
  • Reduce the amount of plotting. It is effective early on for visualising decision trees, but ineffective and time-consuming when comparing the later models.
  • Perhaps drop gradient boosting entirely. It is skimmed over so quickly that it conveys none of its benefits or differences relative to random forests.
  • To show the power of random forests, try running the model on highly correlated features
  • Ideally the code should not keep reassigning the mdl variable; instead, create a new variable for each model to make comparison easier
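The "one variable per model, with train/test accuracy from the outset" suggestions above could look something like the following minimal sketch. It assumes scikit-learn; the dataset is synthetic (and balanced 50:50, per the first bullet), and the names tree_mdl and forest_mdl are illustrative replacements for the reused mdl variable, not names from the course material.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with an even (50:50) class split, as suggested above
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.5, 0.5], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One variable per model (hypothetical names), so both stay available
tree_mdl = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest_mdl = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Report training and test accuracy side by side from the start
for name, model in [("tree", tree_mdl), ("forest", forest_mdl)]:
    print(f"{name}: train={model.score(X_train, y_train):.3f} "
          f"test={model.score(X_test, y_test):.3f}")
```

Keeping both fitted models in scope makes the bias-variance point visible: the unpruned tree scores perfectly on training data while its test score lags, which is exactly the comparison learners are asked to make.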
