Commit 5d685bc

Add hepatitis data set
1 parent 0d9e5a2 commit 5d685bc

6 files changed: 221 additions and 0 deletions

R/hepatitis_docs.R

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+#' Hepatitis Data Set
+#'
+#' This data set contains information on patients with hepatitis.
+#'
+#' @format A data frame with 155 observations on the following 20 variables.
+#' - `class`
+#'   - Die or Live
+#' - `age`
+#'   - Integer
+#' - `sex`
+#'   - Male, Female
+#' - `steroid`
+#'   - No, Yes
+#' - `antivirals`
+#'   - No, Yes
+#' - `fatigue`
+#'   - No, Yes
+#' - `malaise`
+#'   - No, Yes
+#' - `anorexia`
+#'   - No, Yes
+#' - `liver_big`
+#'   - No, Yes
+#' - `liver_firm`
+#'   - No, Yes
+#' - `spleen_palpable`
+#'   - No, Yes
+#' - `spiders`
+#'   - No, Yes
+#' - `ascites`
+#'   - No, Yes
+#' - `varices`
+#'   - No, Yes
+#' - `bilirubin`
+#'   - Numeric
+#'   - This can also be treated as a factor
+#' - `alk_phosphate`
+#'   - Integer
+#' - `sgot`
+#'   - Integer
+#' - `albumin`
+#'   - Numeric
+#' - `protime`
+#'   - Integer
+#' - `histology`
+#'   - No, Yes
+#' @source
+#' G. Gong (Carnegie-Mellon University) via
+#' Bojan Cestnik
+#' Jozef Stefan Institute
+#' Jamova 39
+#' 61000 Ljubljana
+#' Yugoslavia (tel.: (38)(+61) 214-399 ext.287)
+#'
+#' @references
+#' <https://archive.ics.uci.edu/ml/machine-learning-databases/hepatitis/hepatitis.data>
+#' <https://archive.ics.uci.edu/ml/datasets/hepatitis>
+"hepatitis"

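The variable names documented above are lower-case, underscore-separated versions of the attribute labels published with the data. A minimal sketch of one way to derive such names in R follows; the three input lines are abbreviated illustrations of the "number, label, colon, values" layout and are an assumption, not a full copy of the UCI attribute list.

```r
# Illustrative attribute lines in the "N. LABEL: values" layout.
# Assumption: abbreviated examples, not the complete UCI listing.
attr_lines = c(
  "1. Class: DIE, LIVE",
  "9. LIVER BIG: no, yes",
  "16. ALK PHOSPHATE: 33, 80, 120, 160, 200, 250"
)

# Keep only the label between the leading number and the colon.
attr_labels = sub("^[0-9]{1,2}\\. (.*):.*$", "\\1", attr_lines)

# Replace whitespace with underscores, then lower-case the result.
safe_names = tolower(gsub("[[:space:]]", "_", attr_labels))

print(safe_names)
# "class" "liver_big" "alk_phosphate"
```

Doing the clean-up in code rather than by hand keeps the documented names and the column names of the data frame in step if the label list is ever edited.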
README.Rmd

Lines changed: 1 addition & 0 deletions
@@ -77,6 +77,7 @@ The following data sets are included in the `ucidata` package:
 - [`bcw_original` (Breast Cancer Wisconsin Original)](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset)
 - [`bike_sharing_daily`](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset)
 - [`bridges`](https://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges)
+- [`hepatitis`](https://archive.ics.uci.edu/ml/datasets/hepatitis)
 - [`wine`](https://archive.ics.uci.edu/ml/datasets/wine)
 
 ## Build Scripts

README.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ The following data sets are included in the `ucidata` package:
 - [`bcw_original` (Breast Cancer Wisconsin Original)](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset)
 - [`bike_sharing_daily`](https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset)
 - [`bridges`](https://archive.ics.uci.edu/ml/datasets/Pittsburgh+Bridges)
+- [`hepatitis`](https://archive.ics.uci.edu/ml/datasets/hepatitis)
 - [`wine`](https://archive.ics.uci.edu/ml/datasets/wine)
 
 Build Scripts

data-raw/hepatitis_build.R

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+## UCI Data
+# Hepatitis Data http://archive.ics.uci.edu/ml/machine-learning-databases/hepatitis/hepatitis.data
+
+url_hepatitis = "http://archive.ics.uci.edu/ml/machine-learning-databases/hepatitis/hepatitis.data"
+
+hepatitis = read.csv(url_hepatitis,
+                     header = FALSE, na.strings = "?")
+
+# Columns taken verbatim from ML page
+# Regex search with: [0-9]{1,2}\. (.*):.*
+# Replacement: "\1",
+var_names = c(
+  "Class",
+  "AGE",
+  "SEX",
+  "STEROID",
+  "ANTIVIRALS",
+  "FATIGUE",
+  "MALAISE",
+  "ANOREXIA",
+  "LIVER BIG",
+  "LIVER FIRM",
+  "SPLEEN PALPABLE",
+  "SPIDERS",
+  "ASCITES",
+  "VARICES",
+  "BILIRUBIN",
+  "ALK PHOSPHATE",
+  "SGOT",
+  "ALBUMIN",
+  "PROTIME",
+  "HISTOLOGY"
+)
+
+var_names_safe = gsub("[[:space:]]", "_", var_names)
+
+# Label columns
+colnames(hepatitis) = tolower(var_names_safe)
+
+# Make into a dichotomous variable marked by a factor
+hepatitis[, c(4:14, 20)] = lapply(hepatitis[, c(4:14, 20)], factor, labels = c("No", "Yes"))
+
+# Switch to being factor based
+hepatitis = within(hepatitis,{
+  class = factor(class, labels = c("Die", "Live"))
+  sex = factor(sex, labels = c("Male", "Female"))
+})
+
+devtools::use_data(hepatitis, overwrite = TRUE)
+
+## output colnames
+cat(paste0(colnames(hepatitis),"\n"), sep="")
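The recoding step in the script above lets `factor()` infer its levels from the codes observed in each column, which works as long as every recoded column contains both codes. A minimal sketch of a more defensive variant passes the levels explicitly. The toy values and the three-column layout below are made up for illustration, and the 1 = No, 2 = Yes coding is assumed from the label order used in the script.

```r
# Toy input: made-up values, NOT records from the hepatitis file.
# Assumed coding: 1 = No, 2 = Yes; "?" marks a missing value.
toy = read.csv(text = "1,2,?\n2,2,1\n1,2,2",
               header = FALSE, na.strings = "?")
colnames(toy) = c("steroid", "antivirals", "fatigue")

# Explicit levels keep the code-to-label mapping fixed even when a column
# contains only one of the two codes (here: antivirals is always 2).
toy[] = lapply(toy, factor, levels = c(1, 2), labels = c("No", "Yes"))

str(toy)
```

Without `levels`, the `antivirals` column in this toy input would make `factor()` fail, because two labels would be supplied for a single observed level. Separately, recent devtools releases no longer export `use_data()`; the equivalent call now lives in usethis as `usethis::use_data(hepatitis, overwrite = TRUE)`.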

data/hepatitis.rda

2 KB
Binary file not shown.

man/hepatitis.Rd

Lines changed: 109 additions & 0 deletions
Generated file; diff not rendered by default.
