From 97612ae589cc877fd7a44376115623dd82d039d8 Mon Sep 17 00:00:00 2001
From: Andrew Head
Date: Thu, 12 Apr 2018 19:08:19 -0700
Subject: [PATCH] Update "file" -> "project" typo for Java dataset

To my surprise, this dataset includes around 14,000 projects, with
~2,000,000 files. Wow!!
---
 datasets/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/datasets/index.md b/datasets/index.md
index d2ac4f3..5cb34c9 100644
--- a/datasets/index.md
+++ b/datasets/index.md
@@ -34,7 +34,7 @@ The datasets here should not require sign-up for web services or writing emails

Java GitHub corpus

-This dataset includes about 14'000 Java files from GitHub, split into training and test set.
+This dataset includes about 14'000 Java projects from GitHub, split into training and test set.
The files are from open source projects that have been forked at least once.
[download dataset]