Open
Labels
documentation: Improvements or additions to documentation
Description
Datasets are crucial for training any machine learning model, and we need a lot of data. So far we have identified only one repository that contains GAP files; it is listed in the code at https://github.com/kiranbaby14/Analysis-of-GAP-programming-practices-on-GitHub/blob/main/scripts/get_GAP_files.py#L57.
For our model to be effective, we need to train on a variety of datasets; otherwise the model will be biased. So identify repositories that contain other programming languages, such as Java, JavaScript, and Python, and include them in the list in the code linked above.
TASK
- Identify repositories containing other languages
- Rename the script from 'get_GAP_files' to a more appropriate name, since it will no longer fetch only GAP files
It may also be better to post the candidate repositories as a reply to this issue first.
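As a starting point for discussion, here is a minimal sketch of how the repository list could be extended to cover several languages. It does not reflect the current contents of the script: the `RepoSource` class, the `SOURCES` list, the helper names, and all `example-owner/...` repository names are hypothetical placeholders, to be replaced with the repositories agreed on in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoSource:
    """One repository to collect files from (hypothetical structure)."""
    language: str
    repo: str          # "owner/name" on GitHub
    extensions: tuple  # file suffixes to collect for this language


# Placeholder entries only: these are NOT real repositories.
# Replace them with the ones proposed in the replies to this issue.
SOURCES = [
    RepoSource("Java", "example-owner/example-java-repo", (".java",)),
    RepoSource("JavaScript", "example-owner/example-js-repo", (".js",)),
    RepoSource("Python", "example-owner/example-python-repo", (".py",)),
]


def sources_by_language(sources):
    """Group sources by language so dataset balance is easy to inspect."""
    grouped = {}
    for source in sources:
        grouped.setdefault(source.language, []).append(source)
    return grouped


def is_wanted(path, source):
    """Return True if the file path has one of the source's extensions."""
    return path.lower().endswith(source.extensions)
```

Keeping the language and file extensions next to each repository would let a single, more generally named script handle every language, rather than one script per language.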