Identify Repositories for ML model training #11

@kiranbaby14

Description

Datasets are crucial for training any machine learning model, and we need a lot of data. Currently we have identified only one repository that contains GAP files, which can be seen in the code: https://github.com/kiranbaby14/Analysis-of-GAP-programming-practices-on-GitHub/blob/main/scripts/get_GAP_files.py#L57.

But in order for our model to be effective, we need to train on varied datasets; otherwise the model will be biased. So identify repositories that contain other programming languages such as Java, JavaScript, and Python, and include them in the list in the code linked above.

TASK

  • Identify repositories containing other languages
  • Rename the script from 'get_GAP_files' to a more appropriate name

Also, it might be better to post the candidate repositories as a reply in this issue first.
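
As a rough idea of what the extended list could look like, here is a minimal sketch. It does not reflect the current contents of the script: the repository names are hypothetical placeholders (to be replaced by whatever gets agreed on in this issue), the `REPOSITORIES`, `repos_per_language` and `matches_repo_language` names are made up for illustration, and the GAP extensions listed are an assumption.

```python
from collections import Counter

# Hypothetical placeholders, NOT real repositories: replace them with the
# repositories proposed and agreed on in this issue.
REPOSITORIES = [
    # The GAP extensions below are an assumption, not taken from the script.
    {"repo": "example-org/gap-project", "language": "GAP", "extensions": (".g", ".gi", ".gd")},
    {"repo": "example-org/java-project", "language": "Java", "extensions": (".java",)},
    {"repo": "example-org/js-project", "language": "JavaScript", "extensions": (".js",)},
    {"repo": "example-org/python-project", "language": "Python", "extensions": (".py",)},
]


def repos_per_language(repositories):
    """Count how many repositories are listed for each language."""
    return Counter(entry["language"] for entry in repositories)


def matches_repo_language(filename, entry):
    """Return True if the file name ends with one of the entry's extensions."""
    return filename.lower().endswith(entry["extensions"])


if __name__ == "__main__":
    # A quick look at how balanced the list is across languages.
    for language, count in sorted(repos_per_language(REPOSITORIES).items()):
        print(f"{language}: {count} repositories")
```

Keeping the language next to each repository makes it easy to see at a glance whether one language dominates the list, which is the bias concern raised above.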

Metadata

Assignees

No one assigned

    Labels

    documentation: Improvements or additions to documentation
