diff --git a/_freeze/learn/models/parsnip-predictions/index/execute-results/html.json b/_freeze/learn/models/parsnip-predictions/index/execute-results/html.json
new file mode 100644
index 00000000..9099464d
--- /dev/null
+++ b/_freeze/learn/models/parsnip-predictions/index/execute-results/html.json
@@ -0,0 +1,17 @@
+{
+ "hash": "0fa75413e84db534cedd43cb05c12d53",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Fitting and predicting with parsnip\"\ncategories:\n - model fitting\n - parsnip\n - regression\n - classification\ntype: learn-subsection\nweight: 1\ndescription: | \n Examples that show how to fit and predict with different combinations of model, mode, and engine.\ntoc: true\ntoc-depth: 3\ninclude-after-body: ../../../resources.html\nformat:\n html:\n theme: [\"style.scss\"]\n---\n\n\n\n\n\n\n# Introduction\n\nThis page shows examples of how to *fit* and *predict* with different combinations of model, mode, and engine. As a reminder, in parsnip, \n\n- the **model type** differentiates basic modeling approaches, such as random forests, logistic regression, linear support vector machines, etc.,\n\n- the **mode** denotes in what kind of modeling context it will be used (most commonly, classification or regression), and\n\n- the computational **engine** indicates how the model is fit, such as with a specific R package implementation or even methods outside of R like Keras or Stan.\n\nWe'll break the examples up by their mode. For each model, we'll show different data sets used across the different engines. \n\nTo use code in this article, you will need to install the following packages: agua, baguette, bonsai, censored, discrim, HSAUR3, lme4, multilevelmod, plsmod, poissonreg, prodlim, rules, sparklyr, survival, and tidymodels. There are numerous other \"engine\" packages that are required. If you use a model that is missing one or more installed packages, parsnip will prompt you to install them. There are some packages that require non-standard installation or rely on external dependencies. We'll describe these next. \n\n## External Dependencies\n\nSome models available in parsnip use other computational frameworks for computations. There may be some additional downloads for engines using **catboost**, **Spark**, **h2o**, **tensorflow**/**keras**, and **torch**. You can expand the sections below to get basic installation instructions.\n\n\n\n### catboost\n\ncatboost is a popular boosting framework. Unfortunately, the R package is not available on CRAN. First, go to [https://github.com/catboost/catboost/releases/](\"https://github.com/catboost/catboost/releases/) and search for \"`[R-package]`\" to find the most recent release. 
\n\nThe following code and be used to install and test the package (which requires the glue package to be installed): \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(glue)\n\n# Put the current version number in this variable: \nversion_number <- \"#.##\"\n\ntemplate <- \"https://github.com/catboost/catboost/releases/download/v{version}/catboost-R-darwin-universal2-{version}.tgz\"\n\ntarget_url <- glue::glue(template)\ntarget_dest <- tempfile()\ndownload.file(target_url, target_dest)\n\nif (grepl(\"^mac\", .Platform$pkgType)) {\n options <- \"--no-staged-install\"\n} else {\n options <- character(0)\n}\n\ninst <- glue::glue(\"R CMD INSTALL {options} {target_dest}\")\nsystem(inst)\n```\n:::\n\n\nTo test, fit an example model: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(catboost)\n\ntrain_pool_path <- system.file(\"extdata\", \"adult_train.1000\", package = \"catboost\")\ntest_pool_path <- system.file(\"extdata\", \"adult_test.1000\", package = \"catboost\")\ncd_path <- system.file(\"extdata\", \"adult.cd\", package = \"catboost\")\ntrain_pool <- catboost.load_pool(train_pool_path, column_description = cd_path)\ntest_pool <- catboost.load_pool(test_pool_path, column_description = cd_path)\nfit_params <- list(\n iterations = 100,\n loss_function = 'Logloss',\n ignored_features = c(4, 9),\n border_count = 32,\n depth = 5,\n learning_rate = 0.03,\n l2_leaf_reg = 3.5,\n train_dir = tempdir())\nfit_params\n```\n:::\n\n\n### Apache Spark\n\nTo use [Apache Spark](https://spark.apache.org/) as an engine, we will first install Spark and then need a connection to a cluster. For this article, we will set up and use a single-node Spark cluster running on a laptop.\n\nTo install, first install sparklyr:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ninstall.packages(\"sparklyr\")\n```\n:::\n\n\nand then install the Spark backend. For example, you might use: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(sparklyr)\nspark_install(version = \"4.0\")\n```\n:::\n\n\nOnce that is working, you can get ready to fit models using: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(sparklyr)\nsc <- spark_connect(\"local\")\n#> Warning in sprintf(version$pattern, version$spark, version$hadoop): 2 arguments\n#> not used by format 'spark-4.1.0-preview3-bin-hadoop3'\n```\n:::\n\n\n### h2o \n\nh2o.ai offers a Java-based high-performance computing server for machine learning. This can be run locally or externally. There are general installation instructions at [https://docs.h2o.ai/](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html). There is a package on CRAN, but you can also install directly from [h2o](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html#install-in-r) via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ninstall.packages(\n \"h2o\",\n type = \"source\",\n repos = \"http://h2o-release.s3.amazonaws.com/h2o/latest_stable_R\"\n)\n```\n:::\n\n\nAfter installation is complete, you can start a local server via `h2o::h2o.init()`. \n\nThe tidymodels [agua](https://agua.tidymodels.org/) package contains some helpers and will also need to be installed. 
You can also use its `h2o_start()` function to start a server: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n#> \n#> Attaching package: 'agua'\n#> The following object is masked from 'package:workflowsets':\n#> \n#>     rank_results\nh2o_start()\n#> Warning: JAVA not found, H2O may take minutes trying to connect.\n#> Warning in h2o.clusterInfo(): \n#> Your H2O cluster version is (1 year, 11 months and 5 days) old. There may be a newer version available.\n#> Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html\n```\n:::\n\n\n### Tensorflow and Keras\n\nR's tensorflow and keras3 packages call Python directly. To enable this, you'll first have to install the keras3 R package: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ninstall.packages(\"keras3\")\n```\n:::\n\n\nOnce that is done, use: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nkeras3::install_keras(backend = \"tensorflow\")\n```\n:::\n\n\nThere are other options for installation; see [https://tensorflow.rstudio.com/install/index.html](https://tensorflow.rstudio.com/install/index.html) for more details. If you already have a suitable virtual environment, you can point reticulate at it: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Assumes you are going to use a virtual environment whose name contains \"tensorflow\":\npve <- grep(\"tensorflow\", reticulate::virtualenv_list(), value = TRUE)\nreticulate::use_virtualenv(pve)\n```\n:::\n\n\n### Torch\n\nR's torch package is the low-level package that contains the framework. Once you have installed it, you will get this message the first time you load the package: \n\n> Additional software needs to be downloaded and installed for torch to work correctly.\n\nChoosing \"Yes\" will do the _one-time_ installation. \n\nTo get started, let's load the tidymodels package: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(tidymodels)\ntheme_set(theme_bw() + theme(legend.position = \"top\"))\n```\n:::\n\n\n# Classification Models\n\nTo demonstrate classification, let's make small training and test sets for a binary outcome. We'll center and scale the predictors since some models require them to be in the same units.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(207)\nbin_split <- \n\tmodeldata::two_class_dat |> \n\trename(class = Class) |> \n\tinitial_split(prop = 0.994, strata = class)\nbin_split\n#> \n#> <785/6/791>\n\nbin_rec <- \n  recipe(class ~ ., data = training(bin_split)) |> \n  step_normalize(all_numeric_predictors()) |> \n  prep()\n\nbin_train <- bake(bin_rec, new_data = NULL)\nbin_test <- bake(bin_rec, new_data = testing(bin_split))\n```\n:::\n\n\nFor models that _only_ work for three or more classes, we'll simulate a data set:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(1752)\nmtl_data <-\n  sim_multinomial(\n    200,\n    ~ -0.5 + 0.6 * abs(A),\n    ~ ifelse(A > 0 & B > 0, 1.0 + 0.2 * A / B, - 2),\n    ~ A + B - A * B)\n\nmtl_split <- initial_split(mtl_data, prop = 0.967, strata = class)\nmtl_split\n#> \n#> <192/8/200>\n\n# Predictors are in the same units\nmtl_train <- training(mtl_split)\nmtl_test <- testing(mtl_split)\n```\n:::\n\n\nFinally, we have some models that handle hierarchical data, where some rows are statistically correlated with other rows. For these examples, we'll use data from a clinical trial where patients were followed over time. The outcome is binary. The data are in the HSAUR3 package. 
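Since each patient contributes multiple rows, an ordinary row-wise split could put measurements from the same patient in both the training and test sets. To get a quick sense of the grouping structure, we can count the distinct patients (a small sketch; dplyr's `n_distinct()` is attached along with tidymodels): \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# One row per clinic visit, with repeated visits per patient:\nn_distinct(HSAUR3::toenail$patientID)\n```\n:::\n\n\n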
We'll split these data so that all rows for a given subject land in either the training set or the test set: \n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nset.seed(72)\ncls_group_split <- \n  HSAUR3::toenail |> \n  group_initial_split(group = patientID)\ncls_group_train <- training(cls_group_split)\ncls_group_test <- testing(cls_group_split)\n```\n:::\n\n\nThere are 219 subjects in the training set and 75 in the test set. \n\nIf using the **Apache Spark** engine, we will need to identify the data source and then use it to create the splits. For this article, we will copy the `two_class_dat` and the `mtl_data` data sets into the Spark session.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(sparklyr)\nsc <- spark_connect(\"local\")\n#> Re-using existing Spark connection to local\n\ntbl_two_class <- copy_to(sc, modeldata::two_class_dat)\n\ntbl_bin <- sdf_random_split(tbl_two_class, training = 0.994, test = 1 - 0.994, seed = 100)\n\ntbl_sim_mtl <- copy_to(sc, mtl_data)\n\ntbl_mtl <- sdf_random_split(tbl_sim_mtl, training = 0.967, test = 1 - 0.967, seed = 100)\n```\n:::\n\n\n\n## Bagged MARS (`bag_mars()`) \n\n:::{.panel-tabset}\n\n## `earth` \n\nThis engine requires the baguette extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(baguette)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbag_mars_spec <- bag_mars() |>\n  # We need to set the mode since this engine works with multiple modes\n  # and earth is the default engine so there is no need to set that either.\n  set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(268)\nbag_mars_fit <- bag_mars_spec |> fit(class ~ ., data = bin_train)\n#> \n#> Attaching package: 'plotrix'\n#> The following object is masked from 'package:scales':\n#> \n#>     rescale\n#> Registered S3 method overwritten by 'butcher':\n#>   method                  from     \n#>   as.character.dev_topic  generics\nbag_mars_fit\n#> parsnip model object\n#> \n#> Bagged MARS (classification with 11 members)\n#> \n#> Variable importance scores include:\n#> \n#> # A tibble: 2 × 4\n#>   term  value std.error  used\n#>   \n#> 1 B     100        0       11\n#> 2 A      40.4      1.60    11\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(bag_mars_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#>   .pred_class\n#>   \n#> 1 Class2     \n#> 2 Class1     \n#> 3 Class2     \n#> 4 Class1     \n#> 5 Class1     \n#> 6 Class1\npredict(bag_mars_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#>   .pred_Class1 .pred_Class2\n#>   \n#> 1        0.452       0.548 \n#> 2        0.854       0.146 \n#> 3        0.455       0.545 \n#> 4        0.968       0.0316\n#> 5        0.939       0.0610\n#> 6        0.872       0.128\n```\n:::\n\n\n:::\n\n## Bagged Neural Networks (`bag_mlp()`) \n\n:::{.panel-tabset}\n\n## `nnet` \n\nThis engine requires the baguette extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(baguette)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbag_mlp_spec <- bag_mlp() |>\n  # We need to set the mode since this engine works with multiple modes\n  # and nnet is the default engine so there is no need to set that either.\n  set_mode(\"classification\")\n```\n:::\n\n\nNow we create the 
model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(318)\nbag_mlp_fit <- bag_mlp_spec |> fit(class ~ ., data = bin_train)\nbag_mlp_fit\n#> parsnip model object\n#> \n#> Bagged nnet (classification with 11 members)\n#> \n#> Variable importance scores include:\n#> \n#> # A tibble: 2 × 4\n#> term value std.error used\n#> \n#> 1 A 52.1 2.16 11\n#> 2 B 47.9 2.16 11\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(bag_mlp_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(bag_mlp_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.439 0.561\n#> 2 0.676 0.324\n#> 3 0.428 0.572\n#> 4 0.727 0.273\n#> 5 0.709 0.291\n#> 6 0.660 0.340\n```\n:::\n\n\n:::\n\n## Bagged Decision Trees (`bag_tree()`) \n\n:::{.panel-tabset}\n\n## `rpart` \n\nThis engine requires the baguette extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(baguette)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbag_tree_spec <- bag_tree() |>\n # We need to set the mode since this engine works with multiple modes\n # and rpart is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(985)\nbag_tree_fit <- bag_tree_spec |> fit(class ~ ., data = bin_train)\nbag_tree_fit\n#> parsnip model object\n#> \n#> Bagged CART (classification with 11 members)\n#> \n#> Variable importance scores include:\n#> \n#> # A tibble: 2 × 4\n#> term value std.error used\n#> \n#> 1 B 271. 4.35 11\n#> 2 A 237. 
5.58 11\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(bag_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(bag_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0 1 \n#> 2 1 0 \n#> 3 0.0909 0.909 \n#> 4 1 0 \n#> 5 0.727 0.273 \n#> 6 0.909 0.0909\n```\n:::\n\n\n## `C5.0` \n\nThis engine requires the baguette extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(baguette)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbag_tree_spec <- bag_tree() |> \n set_mode(\"classification\") |> \n set_engine(\"C5.0\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(937)\nbag_tree_fit <- bag_tree_spec |> fit(class ~ ., data = bin_train)\nbag_tree_fit\n#> parsnip model object\n#> \n#> Bagged C5.0 (classification with 11 members)\n#> \n#> Variable importance scores include:\n#> \n#> # A tibble: 2 × 4\n#> term value std.error used\n#> \n#> 1 B 100 0 11\n#> 2 A 48.7 7.33 11\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(bag_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(bag_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.269 0.731\n#> 2 0.863 0.137\n#> 3 0.259 0.741\n#> 4 0.897 0.103\n#> 5 0.897 0.103\n#> 6 0.870 0.130\n```\n:::\n\n\n:::\n\n## Bayesian Additive Regression Trees (`bart()`) \n\n:::{.panel-tabset}\n\n## `dbarts` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nbart_spec <- bart() |>\n # We need to set the mode since this engine works with multiple modes\n # and dbarts is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(217)\nbart_fit <- bart_spec |> fit(class ~ ., data = bin_train)\nbart_fit\n#> parsnip model object\n#> \n#> \n#> Call:\n#> `NULL`()\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(bart_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(bart_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.439 0.561\n#> 2 0.734 0.266\n#> 3 0.34 0.66 \n#> 4 0.957 0.043\n#> 5 0.931 0.069\n#> 6 0.782 0.218\npredict(bart_fit, type = \"conf_int\", new_data = bin_test)\n#> # A tibble: 6 × 4\n#> .pred_lower_Class1 .pred_lower_Class2 .pred_upper_Class1 .pred_upper_Class2\n#> \n#> 1 0.815 0.00280 0.997 0.185\n#> 2 0.781 0.0223 0.978 0.219\n#> 3 0.558 0.0702 0.930 0.442\n#> 4 0.540 0.105 0.895 0.460\n#> 5 0.239 0.345 0.655 0.761\n#> 6 0.195 0.469 0.531 
0.805\npredict(bart_fit, type = \"pred_int\", new_data = bin_test)\n#> # A tibble: 6 × 4\n#> .pred_lower_Class1 .pred_lower_Class2 .pred_upper_Class1 .pred_upper_Class2\n#> \n#> 1 0 0 1 1\n#> 2 0 0 1 1\n#> 3 0 0 1 1\n#> 4 0 0 1 1\n#> 5 0 0 1 1\n#> 6 0 0 1 1\n```\n:::\n\n\n:::\n\n## Boosted Decision Trees (`boost_tree()`) \n\n:::{.panel-tabset}\n\n## `xgboost` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |>\n # We need to set the mode since this engine works with multiple modes\n # and xgboost is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(738)\nboost_tree_fit <- boost_tree_spec |> fit(class ~ ., data = bin_train)\nboost_tree_fit\n#> parsnip model object\n#> \n#> ##### xgb.Booster\n#> raw: 40.4 Kb \n#> call:\n#> xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, \n#> colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, \n#> subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, \n#> verbose = 0, nthread = 1, objective = \"binary:logistic\")\n#> params (as set within xgb.train):\n#> eta = \"0.3\", max_depth = \"6\", gamma = \"0\", colsample_bytree = \"1\", colsample_bynode = \"1\", min_child_weight = \"1\", subsample = \"1\", nthread = \"1\", objective = \"binary:logistic\", validate_parameters = \"TRUE\"\n#> xgb.attributes:\n#> niter\n#> callbacks:\n#> cb.evaluation.log()\n#> # of features: 2 \n#> niter: 15\n#> nfeatures : 2 \n#> evaluation_log:\n#> iter training_logloss\n#> \n#> 1 0.5546750\n#> 2 0.4719804\n#> --- ---\n#> 14 0.2587640\n#> 15 0.2528938\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(boost_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.244 0.756 \n#> 2 0.770 0.230 \n#> 3 0.307 0.693 \n#> 4 0.944 0.0565\n#> 5 0.821 0.179 \n#> 6 0.938 0.0621\n```\n:::\n\n\n## `C5.0` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |> \n set_mode(\"classification\") |> \n set_engine(\"C5.0\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(984)\nboost_tree_fit <- boost_tree_spec |> fit(class ~ ., data = bin_train)\nboost_tree_fit\n#> parsnip model object\n#> \n#> \n#> Call:\n#> C5.0.default(x = x, y = y, trials = 15, control = C50::C5.0Control(minCases\n#> = 2, sample = 0))\n#> \n#> Classification Tree\n#> Number of samples: 785 \n#> Number of predictors: 2 \n#> \n#> Number of boosting iterations: 15 requested; 7 used due to early stopping\n#> Average tree size: 3.1 \n#> \n#> Non-standard options: attempt to group attributes\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 
Class2     \n#> 4 Class1     \n#> 5 Class1     \n#> 6 Class1\npredict(boost_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#>   .pred_Class1 .pred_Class2\n#>   \n#> 1        0.307        0.693\n#> 2        0.756        0.244\n#> 3        0.281        0.719\n#> 4        1            0    \n#> 5        1            0    \n#> 6        0.626        0.374\n```\n:::\n\n\n## `catboost` \n\nThis engine requires the bonsai extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(bonsai)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |>\n  # We need to set the mode since this engine works with multiple modes\n  set_mode(\"classification\") |>\n  set_engine(\"catboost\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(644)\nboost_tree_fit <- boost_tree_spec |> fit(class ~ ., data = bin_train)\nboost_tree_fit\n#> parsnip model object\n#> \n#> CatBoost model (1000 trees)\n#> Loss function: Logloss\n#> Fit to 2 feature(s)\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#>   .pred_class\n#>   \n#> 1 Class2     \n#> 2 Class1     \n#> 3 Class2     \n#> 4 Class1     \n#> 5 Class1     \n#> 6 Class1\npredict(boost_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#>   .pred_Class1 .pred_Class2\n#>   \n#> 1        0.291      0.709  \n#> 2        0.836      0.164  \n#> 3        0.344      0.656  \n#> 4        0.998      0.00245\n#> 5        0.864      0.136  \n#> 6        0.902      0.0983\n```\n:::\n\n\n## `h2o` \n\nThis engine requires the agua extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |>\n  # We need to set the mode since this engine works with multiple modes\n  set_mode(\"classification\") |>\n  set_engine(\"h2o\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(186)\nboost_tree_fit <- boost_tree_spec |> fit(class ~ ., data = bin_train)\nboost_tree_fit\n#> parsnip model object\n#> \n#> Model Details:\n#> ==============\n#> \n#> H2OBinomialModel: gbm\n#> Model ID:  GBM_model_R_1763571327438_5515 \n#> Model Summary: \n#>   number_of_trees number_of_internal_trees model_size_in_bytes min_depth\n#> 1              50                       50               25377         6\n#>   max_depth mean_depth min_leaves max_leaves mean_leaves\n#> 1         6    6.00000         21         55    35.70000\n#> \n#> \n#> H2OBinomialMetrics: gbm\n#> ** Reported on training data. 
**\n#> \n#> MSE: 0.007948832\n#> RMSE: 0.08915622\n#> LogLoss: 0.05942305\n#> Mean Per-Class Error: 0\n#> AUC: 1\n#> AUCPR: 1\n#> Gini: 1\n#> R^2: 0.9678452\n#> \n#> Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:\n#> Class1 Class2 Error Rate\n#> Class1 434 0 0.000000 =0/434\n#> Class2 0 351 0.000000 =0/351\n#> Totals 434 351 0.000000 =0/785\n#> \n#> Maximum Metrics: Maximum metrics at their respective thresholds\n#> metric threshold value idx\n#> 1 max f1 0.598690 1.000000 200\n#> 2 max f2 0.598690 1.000000 200\n#> 3 max f0point5 0.598690 1.000000 200\n#> 4 max accuracy 0.598690 1.000000 200\n#> 5 max precision 0.998192 1.000000 0\n#> 6 max recall 0.598690 1.000000 200\n#> 7 max specificity 0.998192 1.000000 0\n#> 8 max absolute_mcc 0.598690 1.000000 200\n#> 9 max min_per_class_accuracy 0.598690 1.000000 200\n#> 10 max mean_per_class_accuracy 0.598690 1.000000 200\n#> 11 max tns 0.998192 434.000000 0\n#> 12 max fns 0.998192 349.000000 0\n#> 13 max fps 0.000831 434.000000 399\n#> 14 max tps 0.598690 351.000000 200\n#> 15 max tnr 0.998192 1.000000 0\n#> 16 max fnr 0.998192 0.994302 0\n#> 17 max fpr 0.000831 1.000000 399\n#> 18 max tpr 0.598690 1.000000 200\n#> \n#> Gains/Lift Table: Extract with `h2o.gainsLift(, )` or `h2o.gainsLift(, valid=, xval=)`\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(boost_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.0496 0.950 \n#> 2 0.905 0.0953 \n#> 3 0.0738 0.926 \n#> 4 0.997 0.00273\n#> 5 0.979 0.0206 \n#> 6 0.878 0.122\n```\n:::\n\n\n## `h2o_gbm` \n\nThis engine requires the agua extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"h2o_gbm\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(724)\nboost_tree_fit <- boost_tree_spec |> fit(class ~ ., data = bin_train)\nboost_tree_fit\n#> parsnip model object\n#> \n#> Model Details:\n#> ==============\n#> \n#> H2OBinomialModel: gbm\n#> Model ID: GBM_model_R_1763571327438_5567 \n#> Model Summary: \n#> number_of_trees number_of_internal_trees model_size_in_bytes min_depth\n#> 1 50 50 25378 6\n#> max_depth mean_depth min_leaves max_leaves mean_leaves\n#> 1 6 6.00000 21 55 35.70000\n#> \n#> \n#> H2OBinomialMetrics: gbm\n#> ** Reported on training data. 
**\n#> \n#> MSE: 0.007948832\n#> RMSE: 0.08915622\n#> LogLoss: 0.05942305\n#> Mean Per-Class Error: 0\n#> AUC: 1\n#> AUCPR: 1\n#> Gini: 1\n#> R^2: 0.9678452\n#> \n#> Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:\n#> Class1 Class2 Error Rate\n#> Class1 434 0 0.000000 =0/434\n#> Class2 0 351 0.000000 =0/351\n#> Totals 434 351 0.000000 =0/785\n#> \n#> Maximum Metrics: Maximum metrics at their respective thresholds\n#> metric threshold value idx\n#> 1 max f1 0.598690 1.000000 200\n#> 2 max f2 0.598690 1.000000 200\n#> 3 max f0point5 0.598690 1.000000 200\n#> 4 max accuracy 0.598690 1.000000 200\n#> 5 max precision 0.998192 1.000000 0\n#> 6 max recall 0.598690 1.000000 200\n#> 7 max specificity 0.998192 1.000000 0\n#> 8 max absolute_mcc 0.598690 1.000000 200\n#> 9 max min_per_class_accuracy 0.598690 1.000000 200\n#> 10 max mean_per_class_accuracy 0.598690 1.000000 200\n#> 11 max tns 0.998192 434.000000 0\n#> 12 max fns 0.998192 349.000000 0\n#> 13 max fps 0.000831 434.000000 399\n#> 14 max tps 0.598690 351.000000 200\n#> 15 max tnr 0.998192 1.000000 0\n#> 16 max fnr 0.998192 0.994302 0\n#> 17 max fpr 0.000831 1.000000 399\n#> 18 max tpr 0.598690 1.000000 200\n#> \n#> Gains/Lift Table: Extract with `h2o.gainsLift(, )` or `h2o.gainsLift(, valid=, xval=)`\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(boost_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.0496 0.950 \n#> 2 0.905 0.0953 \n#> 3 0.0738 0.926 \n#> 4 0.997 0.00273\n#> 5 0.979 0.0206 \n#> 6 0.878 0.122\n```\n:::\n\n\n## `lightgbm` \n\nThis engine requires the bonsai extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(bonsai)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"lightgbm\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(906)\nboost_tree_fit <- boost_tree_spec |> fit(class ~ ., data = bin_train)\nboost_tree_fit\n#> parsnip model object\n#> \n#> LightGBM Model (100 trees)\n#> Objective: binary\n#> Fitted to dataset with 2 columns\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(boost_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.147 0.853 \n#> 2 0.930 0.0699\n#> 3 0.237 0.763 \n#> 4 0.990 0.0101\n#> 5 0.929 0.0714\n#> 6 0.956 0.0445\n```\n:::\n\n\n## `spark` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nboost_tree_spec <- boost_tree() |> \n set_mode(\"classification\") |> \n set_engine(\"spark\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell 
layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(285)\nboost_tree_fit <- boost_tree_spec |> fit(Class ~ ., data = tbl_bin$training)\nboost_tree_fit\n#> parsnip model object\n#> \n#> Formula: Class ~ .\n#> \n#> GBTClassificationModel: uid = gradient_boosted_trees__0d66c197_daaa_47eb_ba06_62029801a638, numTrees=20, numClasses=2, numFeatures=2\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(boost_tree_fit, type = \"class\", new_data = tbl_bin$test)\n#> # Source: SQL [?? x 1]\n#> # Database: spark_connection\n#> pred_class\n#> \n#> 1 Class2 \n#> 2 Class2 \n#> 3 Class1 \n#> 4 Class2 \n#> 5 Class2 \n#> 6 Class1 \n#> 7 Class2\npredict(boost_tree_fit, type = \"prob\", new_data = tbl_bin$test)\n#> # Source: SQL [?? x 2]\n#> # Database: spark_connection\n#> pred_Class1 pred_Class2\n#> \n#> 1 0.307 0.693 \n#> 2 0.292 0.708 \n#> 3 0.856 0.144 \n#> 4 0.192 0.808 \n#> 5 0.332 0.668 \n#> 6 0.952 0.0476\n#> 7 0.0865 0.914\n```\n:::\n\n\n:::\n\n## C5 Rules (`C5_rules()`) \n\n:::{.panel-tabset}\n\n## `C5.0` \n\nThis engine requires the rules extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(rules)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# This engine works with a single mode so no need to set that\n# and C5.0 is the default engine so there is no need to set that either.\nC5_rules_spec <- C5_rules()\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(93)\nC5_rules_fit <- C5_rules_spec |> fit(class ~ ., data = bin_train)\nC5_rules_fit\n#> parsnip model object\n#> \n#> \n#> Call:\n#> C5.0.default(x = x, y = y, trials = trials, rules = TRUE, control\n#> = C50::C5.0Control(minCases = minCases, seed = sample.int(10^5,\n#> 1), earlyStopping = FALSE))\n#> \n#> Rule-Based Model\n#> Number of samples: 785 \n#> Number of predictors: 2 \n#> \n#> Number of Rules: 4 \n#> \n#> Non-standard options: attempt to group attributes\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(C5_rules_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class1 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(C5_rules_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 1 0\n#> 2 1 0\n#> 3 0 1\n#> 4 1 0\n#> 5 1 0\n#> 6 1 0\n```\n:::\n\n\n:::\n\n## Decision Tree (`decision_tree()`) \n\n:::{.panel-tabset}\n\n## `rpart` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_spec <- decision_tree() |>\n # We need to set the mode since this engine works with multiple modes\n # and rpart is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_fit <- decision_tree_spec |> fit(class ~ ., data = bin_train)\ndecision_tree_fit\n#> parsnip model object\n#> \n#> n= 785 \n#> \n#> node), split, n, loss, yval, (yprob)\n#> * denotes terminal node\n#> \n#> 1) root 785 351 Class1 (0.5528662 0.4471338) \n#> 2) B< 
-0.06526451 399 61 Class1 (0.8471178 0.1528822) *\n#> 3) B>=-0.06526451 386 96 Class2 (0.2487047 0.7512953) \n#> 6) B< 0.7339337 194 72 Class2 (0.3711340 0.6288660) \n#> 12) A>=0.6073948 49 13 Class1 (0.7346939 0.2653061) *\n#> 13) A< 0.6073948 145 36 Class2 (0.2482759 0.7517241) *\n#> 7) B>=0.7339337 192 24 Class2 (0.1250000 0.8750000) *\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(decision_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class1 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(decision_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.735 0.265\n#> 2 0.847 0.153\n#> 3 0.248 0.752\n#> 4 0.847 0.153\n#> 5 0.847 0.153\n#> 6 0.847 0.153\n```\n:::\n\n\n## `C5.0` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_spec <- decision_tree() |> \n set_mode(\"classification\") |> \n set_engine(\"C5.0\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_fit <- decision_tree_spec |> fit(class ~ ., data = bin_train)\ndecision_tree_fit\n#> parsnip model object\n#> \n#> \n#> Call:\n#> C5.0.default(x = x, y = y, trials = 1, control = C50::C5.0Control(minCases =\n#> 2, sample = 0))\n#> \n#> Classification Tree\n#> Number of samples: 785 \n#> Number of predictors: 2 \n#> \n#> Tree size: 4 \n#> \n#> Non-standard options: attempt to group attributes\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(decision_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class1 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(decision_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.732 0.268\n#> 2 0.846 0.154\n#> 3 0.236 0.764\n#> 4 0.846 0.154\n#> 5 0.846 0.154\n#> 6 0.846 0.154\n```\n:::\n\n\n## `partykit` \n\nThis engine requires the bonsai extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(bonsai)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_spec <- decision_tree() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"partykit\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_fit <- decision_tree_spec |> fit(class ~ ., data = bin_train)\ndecision_tree_fit\n#> parsnip model object\n#> \n#> \n#> Model formula:\n#> class ~ A + B\n#> \n#> Fitted party:\n#> [1] root\n#> | [2] B <= -0.06906\n#> | | [3] B <= -0.50486: Class1 (n = 291, err = 8.2%)\n#> | | [4] B > -0.50486\n#> | | | [5] A <= -0.07243: Class1 (n = 77, err = 45.5%)\n#> | | | [6] A > -0.07243: Class1 (n = 31, err = 6.5%)\n#> | [7] B > -0.06906\n#> | | [8] B <= 0.72938\n#> | | | [9] A <= 0.60196: Class2 (n = 145, err = 24.8%)\n#> | | | [10] A > 0.60196\n#> | | | | [11] B <= 0.44701: Class1 (n = 23, err = 4.3%)\n#> | | | | [12] B > 0.44701: Class1 (n = 26, err = 46.2%)\n#> | | [13] B > 0.72938: Class2 (n = 192, err = 12.5%)\n#> \n#> Number of inner nodes: 6\n#> Number of terminal 
nodes: 7\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(decision_tree_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class1 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(decision_tree_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.538 0.462 \n#> 2 0.935 0.0645\n#> 3 0.248 0.752 \n#> 4 0.918 0.0825\n#> 5 0.918 0.0825\n#> 6 0.935 0.0645\n```\n:::\n\n\n\n## `spark` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_spec <- decision_tree() |>\n set_mode(\"classification\") |> \n set_engine(\"spark\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndecision_tree_fit <- decision_tree_spec |> fit(Class ~ ., data = tbl_bin$training)\ndecision_tree_fit\n#> parsnip model object\n#> \n#> Formula: Class ~ .\n#> \n#> DecisionTreeClassificationModel: uid=decision_tree_classifier__1e1401b8_a95f_48a9_8969_2fd48eb813d7, depth=5, numNodes=43, numClasses=2, numFeatures=2\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(decision_tree_fit, type = \"class\", new_data = tbl_bin$test)\n#> # Source: SQL [?? x 1]\n#> # Database: spark_connection\n#> pred_class\n#> \n#> 1 Class2 \n#> 2 Class2 \n#> 3 Class1 \n#> 4 Class2 \n#> 5 Class2 \n#> 6 Class1 \n#> 7 Class2\npredict(decision_tree_fit, type = \"prob\", new_data = tbl_bin$test)\n#> # Source: SQL [?? x 2]\n#> # Database: spark_connection\n#> pred_Class1 pred_Class2\n#> \n#> 1 0.260 0.740 \n#> 2 0.260 0.740 \n#> 3 0.860 0.140 \n#> 4 0.260 0.740 \n#> 5 0.260 0.740 \n#> 6 0.923 0.0769\n#> 7 0.0709 0.929\n```\n:::\n\n\n:::\n\n## Flexible Discriminant Analysis (`discrim_flexible()`) \n\n:::{.panel-tabset}\n\n## `earth` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# This engine works with a single mode so no need to set that\n# and earth is the default engine so there is no need to set that either.\ndiscrim_flexible_spec <- discrim_flexible()\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_flexible_fit <- discrim_flexible_spec |> fit(class ~ ., data = bin_train)\ndiscrim_flexible_fit\n#> parsnip model object\n#> \n#> Call:\n#> mda::fda(formula = class ~ ., data = data, method = earth::earth)\n#> \n#> Dimension: 1 \n#> \n#> Percent Between-Group Variance Explained:\n#> v1 \n#> 100 \n#> \n#> Training Misclassification Error: 0.1707 ( N = 785 )\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_flexible_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_flexible_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.339 0.661 \n#> 2 0.848 0.152 \n#> 3 0.342 0.658 \n#> 4 0.964 0.0360\n#> 5 0.964 0.0360\n#> 6 0.875 0.125\n```\n:::\n\n\n:::\n\n## Linear Discriminant Analysis (`discrim_linear()`) 
\n\n:::{.panel-tabset}\n\n## `MASS` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# This engine works with a single mode so no need to set that\n# and MASS is the default engine so there is no need to set that either.\ndiscrim_linear_spec <- discrim_linear()\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_fit <- discrim_linear_spec |> fit(class ~ ., data = bin_train)\ndiscrim_linear_fit\n#> parsnip model object\n#> \n#> Call:\n#> lda(class ~ ., data = data)\n#> \n#> Prior probabilities of groups:\n#> Class1 Class2 \n#> 0.5528662 0.4471338 \n#> \n#> Group means:\n#> A B\n#> Class1 -0.2982900 -0.5573140\n#> Class2 0.3688258 0.6891006\n#> \n#> Coefficients of linear discriminants:\n#> LD1\n#> A -0.6068479\n#> B 1.7079953\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_linear_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_linear_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.369 0.631 \n#> 2 0.868 0.132 \n#> 3 0.541 0.459 \n#> 4 0.984 0.0158\n#> 5 0.928 0.0718\n#> 6 0.854 0.146\n```\n:::\n\n\n## `mda` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_spec <- discrim_linear() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"mda\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_fit <- discrim_linear_spec |> fit(class ~ ., data = bin_train)\ndiscrim_linear_fit\n#> parsnip model object\n#> \n#> Call:\n#> mda::fda(formula = class ~ ., data = data, method = mda::gen.ridge, \n#> keep.fitted = FALSE)\n#> \n#> Dimension: 1 \n#> \n#> Percent Between-Group Variance Explained:\n#> v1 \n#> 100 \n#> \n#> Degrees of Freedom (per dimension): 1.99423 \n#> \n#> Training Misclassification Error: 0.17707 ( N = 785 )\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_linear_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_linear_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.368 0.632 \n#> 2 0.867 0.133 \n#> 3 0.542 0.458 \n#> 4 0.984 0.0158\n#> 5 0.928 0.0718\n#> 6 0.853 0.147\n```\n:::\n\n\n## `sda` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_spec <- discrim_linear() |> \n # This engine works with a single mode so no need to set that\n 
set_engine(\"sda\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_fit <- discrim_linear_spec |> fit(class ~ ., data = bin_train)\ndiscrim_linear_fit\n#> parsnip model object\n#> \n#> $regularization\n#> lambda lambda.var lambda.freqs \n#> 0.003136201 0.067551534 0.112819609 \n#> \n#> $freqs\n#> Class1 Class2 \n#> 0.5469019 0.4530981 \n#> \n#> $alpha\n#> Class1 Class2 \n#> -0.8934125 -1.2349286 \n#> \n#> $beta\n#> A B\n#> Class1 0.4565325 -1.298858\n#> Class2 -0.5510473 1.567757\n#> attr(,\"class\")\n#> [1] \"shrinkage\"\n#> \n#> attr(,\"class\")\n#> [1] \"sda\"\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_linear_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_linear_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.366 0.634 \n#> 2 0.860 0.140 \n#> 3 0.536 0.464 \n#> 4 0.982 0.0176\n#> 5 0.923 0.0768\n#> 6 0.845 0.155\n```\n:::\n\n\n## `sparsediscrim` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_spec <- discrim_linear() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"sparsediscrim\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_linear_fit <- discrim_linear_spec |> fit(class ~ ., data = bin_train)\ndiscrim_linear_fit\n#> parsnip model object\n#> \n#> Diagonal LDA\n#> \n#> Sample Size: 785 \n#> Number of Features: 2 \n#> \n#> Classes and Prior Probabilities:\n#> Class1 (55.29%), Class2 (44.71%)\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_linear_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_linear_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.182 0.818 \n#> 2 0.755 0.245 \n#> 3 0.552 0.448 \n#> 4 0.996 0.00372\n#> 5 0.973 0.0274 \n#> 6 0.629 0.371\n```\n:::\n\n\n:::\n\n## Quandratic Discriminant Analysis (`discrim_quad()`) \n\n:::{.panel-tabset}\n\n## `MASS` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_quad_spec <- discrim_quad()\n # This engine works with a single mode so no need to set that\n # and MASS is the default engine so there is no need to set that either.\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_quad_fit <- discrim_quad_spec |> fit(class ~ ., data = bin_train)\ndiscrim_quad_fit\n#> parsnip model object\n#> \n#> Call:\n#> qda(class ~ ., data = data)\n#> \n#> Prior probabilities of groups:\n#> Class1 Class2 \n#> 0.5528662 0.4471338 \n#> \n#> Group means:\n#> A B\n#> 
Class1 -0.2982900 -0.5573140\n#> Class2 0.3688258 0.6891006\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_quad_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_quad_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.340 0.660 \n#> 2 0.884 0.116 \n#> 3 0.500 0.500 \n#> 4 0.965 0.0349\n#> 5 0.895 0.105 \n#> 6 0.895 0.105\n```\n:::\n\n\n## `sparsediscrim` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_quad_spec <- discrim_quad() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"sparsediscrim\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_quad_fit <- discrim_quad_spec |> fit(class ~ ., data = bin_train)\ndiscrim_quad_fit\n#> parsnip model object\n#> \n#> Diagonal QDA\n#> \n#> Sample Size: 785 \n#> Number of Features: 2 \n#> \n#> Classes and Prior Probabilities:\n#> Class1 (55.29%), Class2 (44.71%)\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_quad_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_quad_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.180 0.820 \n#> 2 0.750 0.250 \n#> 3 0.556 0.444 \n#> 4 0.994 0.00634\n#> 5 0.967 0.0328 \n#> 6 0.630 0.370\n```\n:::\n\n\n:::\n\n## Regularized Discriminant Analysis (`discrim_regularized()`) \n\n:::{.panel-tabset}\n\n## `klaR` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# This engine works with a single mode so no need to set that\n# and klaR is the default engine so there is no need to set that either.\ndiscrim_regularized_spec <- discrim_regularized()\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndiscrim_regularized_fit <- discrim_regularized_spec |> fit(class ~ ., data = bin_train)\ndiscrim_regularized_fit\n#> parsnip model object\n#> \n#> Call: \n#> rda(formula = class ~ ., data = data)\n#> \n#> Regularization parameters: \n#> gamma lambda \n#> 3.348721e-05 3.288193e-04 \n#> \n#> Prior probabilities of groups: \n#> Class1 Class2 \n#> 0.5528662 0.4471338 \n#> \n#> Misclassification rate: \n#> apparent: 17.707 %\n#> cross-validated: 17.566 %\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(discrim_regularized_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(discrim_regularized_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> 
.pred_Class1 .pred_Class2\n#> \n#> 1 0.340 0.660 \n#> 2 0.884 0.116 \n#> 3 0.501 0.499 \n#> 4 0.965 0.0349\n#> 5 0.895 0.105 \n#> 6 0.895 0.105\n```\n:::\n\n\n:::\n\n## Generalized Additive Models (`gen_additive_mod()`) \n\n:::{.panel-tabset}\n\n## `mgcv` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngen_additive_mod_spec <- gen_additive_mod() |>\n # We need to set the mode since this engine works with multiple modes\n # and mgcv is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ngen_additive_mod_fit <- \n gen_additive_mod_spec |> \n fit(class ~ s(A) + s(B), data = bin_train)\ngen_additive_mod_fit\n#> parsnip model object\n#> \n#> \n#> Family: binomial \n#> Link function: logit \n#> \n#> Formula:\n#> class ~ s(A) + s(B)\n#> \n#> Estimated degrees of freedom:\n#> 2.76 4.22 total = 7.98 \n#> \n#> UBRE score: -0.153537\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(gen_additive_mod_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(gen_additive_mod_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.400 0.600 \n#> 2 0.826 0.174 \n#> 3 0.454 0.546 \n#> 4 0.975 0.0250\n#> 5 0.929 0.0711\n#> 6 0.829 0.171\npredict(gen_additive_mod_fit, type = \"conf_int\", new_data = bin_test)\n#> # A tibble: 6 × 4\n#> .pred_lower_Class1 .pred_upper_Class1 .pred_lower_Class2 .pred_upper_Class2\n#> \n#> 1 0.304 0.504 0.496 0.696\n#> 2 0.739 0.889 0.111 0.261\n#> 3 0.364 0.546 0.454 0.636\n#> 4 0.846 0.996 0.00358 0.154\n#> 5 0.881 0.958 0.0416 0.119\n#> 6 0.735 0.894 0.106 0.265\n```\n:::\n\n\n:::\n\n## Logistic Regression (`logistic_reg()`) \n\n:::{.panel-tabset}\n\n## `glm` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg()\n # This engine works with a single mode so no need to set that\n # and glm is the default engine so there is no need to set that either.\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- logistic_reg_spec |> fit(class ~ ., data = bin_train)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> \n#> Call: stats::glm(formula = class ~ ., family = stats::binomial, data = data)\n#> \n#> Coefficients:\n#> (Intercept) A B \n#> -0.3563 -1.1250 2.8154 \n#> \n#> Degrees of Freedom: 784 Total (i.e. 
Null); 782 Residual\n#> Null Deviance:\t 1079 \n#> Residual Deviance: 666.9 \tAIC: 672.9\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(logistic_reg_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.400 0.600 \n#> 2 0.862 0.138 \n#> 3 0.541 0.459 \n#> 4 0.977 0.0234\n#> 5 0.909 0.0905\n#> 6 0.853 0.147\npredict(logistic_reg_fit, type = \"conf_int\", new_data = bin_test)\n#> # A tibble: 6 × 4\n#> .pred_lower_Class1 .pred_upper_Class1 .pred_lower_Class2 .pred_upper_Class2\n#> \n#> 1 0.339 0.465 0.535 0.661 \n#> 2 0.816 0.897 0.103 0.184 \n#> 3 0.493 0.588 0.412 0.507 \n#> 4 0.960 0.986 0.0137 0.0395\n#> 5 0.875 0.935 0.0647 0.125 \n#> 6 0.800 0.894 0.106 0.200\n```\n:::\n\n\n## `brulee` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"brulee\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(466)\nlogistic_reg_fit <- logistic_reg_spec |> fit(class ~ ., data = bin_train)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> Logistic regression\n#> \n#> 785 samples, 2 features, 2 classes \n#> class weights Class1=1, Class2=1 \n#> weight decay: 0.001 \n#> batch size: 707 \n#> validation loss after 1 epoch: 0.283\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(logistic_reg_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.412 0.588 \n#> 2 0.854 0.146 \n#> 3 0.537 0.463 \n#> 4 0.971 0.0294\n#> 5 0.896 0.104 \n#> 6 0.848 0.152\n```\n:::\n\n\n## `gee` \n\nThis engine requires the multilevelmod extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(multilevelmod)\n\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"gee\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- \n logistic_reg_spec |> \n fit(outcome ~ treatment * visit + id_var(patientID), data = cls_group_train)\n#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27\n#> running glm to get initial regression estimate\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> \n#> GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA\n#> gee S-function, version 4.13 modified 98/01/27 (1998) \n#> \n#> Model:\n#> Link: Logit \n#> Variance to Mean Relation: Binomial \n#> Correlation Structure: Independent \n#> \n#> Call:\n#> gee::gee(formula = outcome ~ treatment + visit, id = data$patientID, \n#> data = data, family = binomial)\n#> \n#> Number of observations : 1433 \n#> \n#> 
Maximum cluster size : 7 \n#> \n#> \n#> Coefficients:\n#> (Intercept) treatmentterbinafine visit \n#> -0.06853546 -0.25700680 -0.35646522 \n#> \n#> Estimated Scale Parameter: 0.9903994\n#> Number of Iterations: 1\n#> \n#> Working Correlation[1:4,1:4]\n#> [,1] [,2] [,3] [,4]\n#> [1,] 1 0 0 0\n#> [2,] 0 1 0 0\n#> [3,] 0 0 1 0\n#> [4,] 0 0 0 1\n#> \n#> \n#> Returned Error Value:\n#> [1] 0\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = cls_group_test)\n#> # A tibble: 475 × 1\n#> .pred_class \n#> \n#> 1 none or mild\n#> 2 none or mild\n#> 3 none or mild\n#> 4 none or mild\n#> 5 none or mild\n#> 6 none or mild\n#> 7 none or mild\n#> 8 none or mild\n#> 9 none or mild\n#> 10 none or mild\n#> # ℹ 465 more rows\npredict(logistic_reg_fit, type = \"prob\", new_data = cls_group_test)\n#> # A tibble: 475 × 2\n#> `.pred_none or mild` `.pred_moderate or severe`\n#> \n#> 1 0.664 0.336 \n#> 2 0.739 0.261 \n#> 3 0.801 0.199 \n#> 4 0.852 0.148 \n#> 5 0.892 0.108 \n#> 6 0.922 0.0784\n#> 7 0.944 0.0562\n#> 8 0.605 0.395 \n#> 9 0.686 0.314 \n#> 10 0.757 0.243 \n#> # ℹ 465 more rows\n```\n:::\n\n\n## `glmer` \n\nThis engine requires the multilevelmod extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(multilevelmod)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"glmer\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- \n logistic_reg_spec |> \n fit(outcome ~ treatment * visit + (1 | patientID), data = cls_group_train)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> Generalized linear mixed model fit by maximum likelihood (Laplace\n#> Approximation) [glmerMod]\n#> Family: binomial ( logit )\n#> Formula: outcome ~ treatment * visit + (1 | patientID)\n#> Data: data\n#> AIC BIC logLik -2*log(L) df.resid \n#> 863.8271 890.1647 -426.9135 853.8271 1428 \n#> Random effects:\n#> Groups Name Std.Dev.\n#> patientID (Intercept) 8.35 \n#> Number of obs: 1433, groups: patientID, 219\n#> Fixed Effects:\n#> (Intercept) treatmentterbinafine \n#> -4.57420 -0.51193 \n#> visit treatmentterbinafine:visit \n#> -0.98725 -0.00112\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = cls_group_test)\n#> # A tibble: 475 × 1\n#> .pred_class \n#> \n#> 1 none or mild\n#> 2 none or mild\n#> 3 none or mild\n#> 4 none or mild\n#> 5 none or mild\n#> 6 none or mild\n#> 7 none or mild\n#> 8 none or mild\n#> 9 none or mild\n#> 10 none or mild\n#> # ℹ 465 more rows\npredict(logistic_reg_fit, type = \"prob\", new_data = cls_group_test)\n#> # A tibble: 475 × 2\n#> `.pred_none or mild` `.pred_moderate or severe`\n#> \n#> 1 0.998 0.00230 \n#> 2 0.999 0.000856 \n#> 3 1.000 0.000319 \n#> 4 1.000 0.000119 \n#> 5 1.000 0.0000441 \n#> 6 1.000 0.0000164 \n#> 7 1.000 0.00000612\n#> 8 0.996 0.00383 \n#> 9 0.999 0.00143 \n#> 10 0.999 0.000533 \n#> # ℹ 465 more rows\n```\n:::\n\n\n## `glmnet` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg(penalty = 0.01) |> \n # This engine works with a single mode so 
no need to set that\n set_engine(\"glmnet\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- logistic_reg_spec |> fit(class ~ ., data = bin_train)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> \n#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = \"binomial\") \n#> \n#> Df %Dev Lambda\n#> 1 0 0.00 0.308300\n#> 2 1 4.75 0.280900\n#> 3 1 8.73 0.256000\n#> 4 1 12.10 0.233200\n#> 5 1 14.99 0.212500\n#> 6 1 17.46 0.193600\n#> 7 1 19.60 0.176400\n#> 8 1 21.45 0.160800\n#> 9 1 23.05 0.146500\n#> 10 1 24.44 0.133500\n#> 11 1 25.65 0.121600\n#> 12 1 26.70 0.110800\n#> 13 1 27.61 0.101000\n#> 14 1 28.40 0.091990\n#> 15 1 29.08 0.083820\n#> 16 1 29.68 0.076370\n#> 17 1 30.19 0.069590\n#> 18 1 30.63 0.063410\n#> 19 1 31.00 0.057770\n#> 20 1 31.33 0.052640\n#> 21 1 31.61 0.047960\n#> 22 1 31.85 0.043700\n#> 23 1 32.05 0.039820\n#> 24 2 32.62 0.036280\n#> 25 2 33.41 0.033060\n#> 26 2 34.10 0.030120\n#> 27 2 34.68 0.027450\n#> 28 2 35.19 0.025010\n#> 29 2 35.63 0.022790\n#> 30 2 36.01 0.020760\n#> 31 2 36.33 0.018920\n#> 32 2 36.62 0.017240\n#> 33 2 36.86 0.015710\n#> 34 2 37.06 0.014310\n#> 35 2 37.24 0.013040\n#> 36 2 37.39 0.011880\n#> 37 2 37.52 0.010830\n#> 38 2 37.63 0.009864\n#> 39 2 37.72 0.008988\n#> 40 2 37.80 0.008189\n#> 41 2 37.86 0.007462\n#> 42 2 37.92 0.006799\n#> 43 2 37.97 0.006195\n#> 44 2 38.01 0.005644\n#> 45 2 38.04 0.005143\n#> 46 2 38.07 0.004686\n#> 47 2 38.10 0.004270\n#> 48 2 38.12 0.003891\n#> 49 2 38.13 0.003545\n#> 50 2 38.15 0.003230\n#> 51 2 38.16 0.002943\n#> 52 2 38.17 0.002682\n#> 53 2 38.18 0.002443\n#> 54 2 38.18 0.002226\n#> 55 2 38.19 0.002029\n#> 56 2 38.19 0.001848\n#> 57 2 38.20 0.001684\n#> 58 2 38.20 0.001534\n#> 59 2 38.20 0.001398\n#> 60 2 38.21 0.001274\n#> 61 2 38.21 0.001161\n#> 62 2 38.21 0.001058\n#> 63 2 38.21 0.000964\n#> 64 2 38.21 0.000878\n#> 65 2 38.21 0.000800\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(logistic_reg_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.383 0.617 \n#> 2 0.816 0.184 \n#> 3 0.537 0.463 \n#> 4 0.969 0.0313\n#> 5 0.894 0.106 \n#> 6 0.797 0.203\n```\n:::\n\n\n## `h2o` \n\nThis engine requires the agua extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"h2o\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- logistic_reg_spec |> fit(class ~ ., data = bin_train)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> Model Details:\n#> ==============\n#> \n#> H2OBinomialModel: glm\n#> Model ID: GLM_model_R_1763571327438_5619 \n#> GLM Model: summary\n#> family link regularization\n#> 1 binomial logit Elastic Net (alpha = 0.5, lambda = 6.162E-4 )\n#> number_of_predictors_total number_of_active_predictors number_of_iterations\n#> 1 2 2 4\n#> training_frame\n#> 1 object_zkelygexok\n#> \n#> Coefficients: glm coefficients\n#> 
names coefficients standardized_coefficients\n#> 1 Intercept -0.350788 -0.350788\n#> 2 A -1.084233 -1.084233\n#> 3 B 2.759366 2.759366\n#> \n#> H2OBinomialMetrics: glm\n#> ** Reported on training data. **\n#> \n#> MSE: 0.130451\n#> RMSE: 0.3611799\n#> LogLoss: 0.4248206\n#> Mean Per-Class Error: 0.1722728\n#> AUC: 0.8889644\n#> AUCPR: 0.8520865\n#> Gini: 0.7779288\n#> R^2: 0.4722968\n#> Residual Deviance: 666.9684\n#> AIC: 672.9684\n#> \n#> Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:\n#> Class1 Class2 Error Rate\n#> Class1 350 84 0.193548 =84/434\n#> Class2 53 298 0.150997 =53/351\n#> Totals 403 382 0.174522 =137/785\n#> \n#> Maximum Metrics: Maximum metrics at their respective thresholds\n#> metric threshold value idx\n#> 1 max f1 0.411045 0.813097 213\n#> 2 max f2 0.229916 0.868991 279\n#> 3 max f0point5 0.565922 0.816135 166\n#> 4 max accuracy 0.503565 0.826752 185\n#> 5 max precision 0.997356 1.000000 0\n#> 6 max recall 0.009705 1.000000 395\n#> 7 max specificity 0.997356 1.000000 0\n#> 8 max absolute_mcc 0.411045 0.652014 213\n#> 9 max min_per_class_accuracy 0.454298 0.822581 201\n#> 10 max mean_per_class_accuracy 0.411045 0.827727 213\n#> 11 max tns 0.997356 434.000000 0\n#> 12 max fns 0.997356 349.000000 0\n#> 13 max fps 0.001723 434.000000 399\n#> 14 max tps 0.009705 351.000000 395\n#> 15 max tnr 0.997356 1.000000 0\n#> 16 max fnr 0.997356 0.994302 0\n#> 17 max fpr 0.001723 1.000000 399\n#> 18 max tpr 0.009705 1.000000 395\n#> \n#> Gains/Lift Table: Extract with `h2o.gainsLift(, )` or `h2o.gainsLift(, valid=, xval=)`\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(logistic_reg_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.399 0.601 \n#> 2 0.857 0.143 \n#> 3 0.540 0.460 \n#> 4 0.976 0.0243\n#> 5 0.908 0.0925\n#> 6 0.848 0.152\n```\n:::\n\n\n## `keras` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"keras\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(730)\nlogistic_reg_fit <- logistic_reg_spec |> fit(class ~ ., data = bin_train)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> Model: \"sequential\"\n#> ________________________________________________________________________________\n#> Layer (type) Output Shape Param # \n#> ================================================================================\n#> dense (Dense) (None, 1) 3 \n#> dense_1 (Dense) (None, 2) 4 \n#> ================================================================================\n#> Total params: 7 (28.00 Byte)\n#> Trainable params: 7 (28.00 Byte)\n#> Non-trainable params: 0 (0.00 Byte)\n#> ________________________________________________________________________________\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = bin_test)\n#> 1/1 - 0s - 
91ms/epoch - 91ms/step\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class2\npredict(logistic_reg_fit, type = \"prob\", new_data = bin_test)\n#> 1/1 - 0s - 7ms/epoch - 7ms/step\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.212 0.788 \n#> 2 0.626 0.374 \n#> 3 0.579 0.421 \n#> 4 0.990 0.0103\n#> 5 0.953 0.0467\n#> 6 0.471 0.529\n```\n:::\n\n\n## `LiblineaR` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"LiblineaR\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- logistic_reg_spec |> fit(class ~ ., data = bin_train)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> $TypeDetail\n#> [1] \"L2-regularized logistic regression primal (L2R_LR)\"\n#> \n#> $Type\n#> [1] 0\n#> \n#> $W\n#> A B Bias\n#> [1,] 1.014233 -2.65166 0.3363362\n#> \n#> $Bias\n#> [1] 1\n#> \n#> $ClassNames\n#> [1] Class1 Class2\n#> Levels: Class1 Class2\n#> \n#> $NbClass\n#> [1] 2\n#> \n#> attr(,\"class\")\n#> [1] \"LiblineaR\"\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(logistic_reg_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.397 0.603 \n#> 2 0.847 0.153 \n#> 3 0.539 0.461 \n#> 4 0.973 0.0267\n#> 5 0.903 0.0974\n#> 6 0.837 0.163\n```\n:::\n\n\n## `stan` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"stan\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(96)\nlogistic_reg_fit <- \n logistic_reg_spec |> \n fit(outcome ~ treatment * visit, data = cls_group_train)\nlogistic_reg_fit |> print(digits = 3)\n#> parsnip model object\n#> \n#> stan_glm\n#> family: binomial [logit]\n#> formula: outcome ~ treatment * visit\n#> observations: 1433\n#> predictors: 4\n#> ------\n#> Median MAD_SD\n#> (Intercept) -0.137 0.187\n#> treatmentterbinafine -0.108 0.264\n#> visit -0.335 0.050\n#> treatmentterbinafine:visit -0.048 0.073\n#> \n#> ------\n#> * For help interpreting the printed output see ?print.stanreg\n#> * For info on the priors used see ?prior_summary.stanreg\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = cls_group_test)\n#> # A tibble: 475 × 1\n#> .pred_class \n#> \n#> 1 none or mild\n#> 2 none or mild\n#> 3 none or mild\n#> 4 none or mild\n#> 5 none or mild\n#> 6 none or mild\n#> 7 none or mild\n#> 8 none or mild\n#> 9 none or mild\n#> 10 none or mild\n#> # ℹ 465 more rows\npredict(logistic_reg_fit, type = \"prob\", new_data = cls_group_test)\n#> # A tibble: 475 × 2\n#> `.pred_none or mild` `.pred_moderate or severe`\n#> \n#> 1 0.652 0.348 \n#> 2 0.734 0.266 \n#> 3 0.802 0.198 \n#> 4 0.856 0.144 \n#> 5 0.898 0.102 \n#> 6 
0.928 0.0721\n#> 7 0.950 0.0502\n#> 8 0.617 0.383 \n#> 9 0.692 0.308 \n#> 10 0.759 0.241 \n#> # ℹ 465 more rows\npredict(logistic_reg_fit, type = \"conf_int\", new_data = cls_group_test)\n#> # A tibble: 475 × 4\n#> `.pred_lower_none or mild` `.pred_upper_none or mild` .pred_lower_moderate …¹\n#> \n#> 1 0.583 0.715 0.285 \n#> 2 0.689 0.776 0.224 \n#> 3 0.771 0.832 0.168 \n#> 4 0.827 0.883 0.117 \n#> 5 0.868 0.924 0.0761\n#> 6 0.899 0.952 0.0482\n#> 7 0.922 0.970 0.0302\n#> 8 0.547 0.683 0.317 \n#> 9 0.644 0.736 0.264 \n#> 10 0.723 0.791 0.209 \n#> # ℹ 465 more rows\n#> # ℹ abbreviated name: ¹`.pred_lower_moderate or severe`\n#> # ℹ 1 more variable: `.pred_upper_moderate or severe` \npredict(logistic_reg_fit, type = \"pred_int\", new_data = cls_group_test)\n#> # A tibble: 475 × 4\n#> `.pred_lower_none or mild` `.pred_upper_none or mild` .pred_lower_moderate …¹\n#> \n#> 1 0 1 0\n#> 2 0 1 0\n#> 3 0 1 0\n#> 4 0 1 0\n#> 5 0 1 0\n#> 6 0 1 0\n#> 7 0 1 0\n#> 8 0 1 0\n#> 9 0 1 0\n#> 10 0 1 0\n#> # ℹ 465 more rows\n#> # ℹ abbreviated name: ¹`.pred_lower_moderate or severe`\n#> # ℹ 1 more variable: `.pred_upper_moderate or severe` \n```\n:::\n\n\n## `stan_glmer` \n\nThis engine requires the multilevelmod extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(multilevelmod)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"stan_glmer\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(484)\nlogistic_reg_fit <- \n logistic_reg_spec |> \n fit(outcome ~ treatment * visit + (1 | patientID), data = cls_group_train)\nlogistic_reg_fit |> print(digits = 3)\n#> parsnip model object\n#> \n#> stan_glmer\n#> family: binomial [logit]\n#> formula: outcome ~ treatment * visit + (1 | patientID)\n#> observations: 1433\n#> ------\n#> Median MAD_SD\n#> (Intercept) -0.628 0.585\n#> treatmentterbinafine -0.686 0.821\n#> visit -0.830 0.105\n#> treatmentterbinafine:visit -0.023 0.143\n#> \n#> Error terms:\n#> Groups Name Std.Dev.\n#> patientID (Intercept) 4.376 \n#> Num. 
levels: patientID 219 \n#> \n#> ------\n#> * For help interpreting the printed output see ?print.stanreg\n#> * For info on the priors used see ?prior_summary.stanreg\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = cls_group_test)\n#> # A tibble: 475 × 1\n#> .pred_class \n#> \n#> 1 none or mild\n#> 2 none or mild\n#> 3 none or mild\n#> 4 none or mild\n#> 5 none or mild\n#> 6 none or mild\n#> 7 none or mild\n#> 8 none or mild\n#> 9 none or mild\n#> 10 none or mild\n#> # ℹ 465 more rows\npredict(logistic_reg_fit, type = \"prob\", new_data = cls_group_test)\n#> # A tibble: 475 × 2\n#> `.pred_none or mild` `.pred_moderate or severe`\n#> \n#> 1 0.671 0.329 \n#> 2 0.730 0.270 \n#> 3 0.796 0.204 \n#> 4 0.847 0.153 \n#> 5 0.882 0.118 \n#> 6 0.909 0.0908\n#> 7 0.934 0.0655\n#> 8 0.613 0.387 \n#> 9 0.681 0.319 \n#> 10 0.744 0.256 \n#> # ℹ 465 more rows\npredict(logistic_reg_fit, type = \"conf_int\", new_data = cls_group_test)\n#> # A tibble: 475 × 4\n#> `.pred_lower_none or mild` `.pred_upper_none or mild` .pred_lower_moderate …¹\n#> \n#> 1 0.00184 1.000 0.0000217 \n#> 2 0.00417 1.000 0.00000942 \n#> 3 0.00971 1.000 0.00000412 \n#> 4 0.0214 1.000 0.00000169 \n#> 5 0.0465 1.000 0.000000706\n#> 6 0.101 1.000 0.000000300\n#> 7 0.203 1.000 0.000000120\n#> 8 0.000923 1.000 0.0000440 \n#> 9 0.00196 1.000 0.0000175 \n#> 10 0.00447 1.000 0.00000724 \n#> # ℹ 465 more rows\n#> # ℹ abbreviated name: ¹`.pred_lower_moderate or severe`\n#> # ℹ 1 more variable: `.pred_upper_moderate or severe` \npredict(logistic_reg_fit, type = \"pred_int\", new_data = cls_group_test)\n#> # A tibble: 475 × 4\n#> `.pred_lower_none or mild` `.pred_upper_none or mild` .pred_lower_moderate …¹\n#> \n#> 1 0 1 0\n#> 2 0 1 0\n#> 3 0 1 0\n#> 4 0 1 0\n#> 5 0 1 0\n#> 6 0 1 0\n#> 7 0 1 0\n#> 8 0 1 0\n#> 9 0 1 0\n#> 10 0 1 0\n#> # ℹ 465 more rows\n#> # ℹ abbreviated name: ¹`.pred_lower_moderate or severe`\n#> # ℹ 1 more variable: `.pred_upper_moderate or severe` \n```\n:::\n\n\n## `spark` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_spec <- logistic_reg() |> \n set_engine(\"spark\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlogistic_reg_fit <- logistic_reg_spec |> fit(Class ~ ., data = tbl_bin$training)\nlogistic_reg_fit\n#> parsnip model object\n#> \n#> Formula: Class ~ .\n#> \n#> Coefficients:\n#> (Intercept) A B \n#> -3.731170 -1.214355 3.794186\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(logistic_reg_fit, type = \"class\", new_data = tbl_bin$test)\n#> # Source: SQL [?? x 1]\n#> # Database: spark_connection\n#> pred_class\n#> \n#> 1 Class2 \n#> 2 Class2 \n#> 3 Class1 \n#> 4 Class2 \n#> 5 Class2 \n#> 6 Class1 \n#> 7 Class2\npredict(logistic_reg_fit, type = \"prob\", new_data = tbl_bin$test)\n#> # Source: SQL [?? 
x 2]\n#> # Database: spark_connection\n#> pred_Class1 pred_Class2\n#> \n#> 1 0.130 0.870\n#> 2 0.262 0.738\n#> 3 0.787 0.213\n#> 4 0.279 0.721\n#> 5 0.498 0.502\n#> 6 0.900 0.100\n#> 7 0.161 0.839\n```\n:::\n\n\n:::\n\n## Multivariate Adaptive Regression Splines (`mars()`) \n\n:::{.panel-tabset}\n\n## `earth` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmars_spec <- mars() |>\n # We need to set the mode since this engine works with multiple modes\n # and earth is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmars_fit <- mars_spec |> fit(class ~ ., data = bin_train)\nmars_fit\n#> parsnip model object\n#> \n#> GLM (family binomial, link logit):\n#> nulldev df dev df devratio AIC iters converged\n#> 1079.45 784 638.975 779 0.408 651 5 1\n#> \n#> Earth selected 6 of 13 terms, and 2 of 2 predictors\n#> Termination condition: Reached nk 21\n#> Importance: B, A\n#> Number of terms at each degree of interaction: 1 5 (additive model)\n#> Earth GCV 0.1342746 RSS 102.4723 GRSq 0.4582121 RSq 0.4719451\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(mars_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(mars_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.410 0.590 \n#> 2 0.794 0.206 \n#> 3 0.356 0.644 \n#> 4 0.927 0.0729\n#> 5 0.927 0.0729\n#> 6 0.836 0.164\n```\n:::\n\n\n:::\n\n## Neural Networks (`mlp()`) \n\n:::{.panel-tabset}\n\n## `nnet` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmlp_spec <- mlp() |>\n # We need to set the mode since this engine works with multiple modes\n # and nnet is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(839)\nmlp_fit <- mlp_spec |> fit(class ~ ., data = bin_train)\nmlp_fit\n#> parsnip model object\n#> \n#> a 2-5-1 network with 21 weights\n#> inputs: A B \n#> output(s): class \n#> options were - entropy fitting\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(mlp_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(mlp_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.390 0.610\n#> 2 0.685 0.315\n#> 3 0.433 0.567\n#> 4 0.722 0.278\n#> 5 0.720 0.280\n#> 6 0.684 0.316\n```\n:::\n\n\n## `brulee` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmlp_spec <- mlp() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"brulee\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(38)\nmlp_fit <- 
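 # fit() is translated to a brulee::brulee_mlp() call, which trains the network with torch.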
mlp_spec |> fit(class ~ ., data = bin_train)\nmlp_fit\n#> parsnip model object\n#> \n#> Multilayer perceptron\n#> \n#> relu activation,\n#> 3 hidden units,\n#> 17 model parameters\n#> 785 samples, 2 features, 2 classes \n#> class weights Class1=1, Class2=1 \n#> weight decay: 0.001 \n#> dropout proportion: 0 \n#> batch size: 707 \n#> learn rate: 0.01 \n#> validation loss after 5 epochs: 0.427\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(mlp_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(mlp_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.387 0.613 \n#> 2 0.854 0.146 \n#> 3 0.540 0.460 \n#> 4 0.941 0.0589\n#> 5 0.882 0.118 \n#> 6 0.842 0.158\n```\n:::\n\n\n## `brulee_two_layer` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmlp_spec <- mlp() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"brulee_two_layer\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(336)\nmlp_fit <- mlp_spec |> fit(class ~ ., data = bin_train)\nmlp_fit\n#> parsnip model object\n#> \n#> Multilayer perceptron\n#> \n#> c(relu,relu) activation,\n#> c(3,3) hidden units,\n#> 29 model parameters\n#> 785 samples, 2 features, 2 classes \n#> class weights Class1=1, Class2=1 \n#> weight decay: 0.001 \n#> dropout proportion: 0 \n#> batch size: 707 \n#> learn rate: 0.01 \n#> validation loss after 17 epochs: 0.405\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(mlp_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(mlp_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.392 0.608 \n#> 2 0.835 0.165 \n#> 3 0.440 0.560 \n#> 4 0.938 0.0620\n#> 5 0.938 0.0620\n#> 6 0.848 0.152\n```\n:::\n\n\n## `h2o` \n\nThis engine requires the agua extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmlp_spec <- mlp() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"h2o\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(306)\nmlp_fit <- mlp_spec |> fit(class ~ ., data = bin_train)\nmlp_fit\n#> parsnip model object\n#> \n#> Model Details:\n#> ==============\n#> \n#> H2OBinomialModel: deeplearning\n#> Model ID: DeepLearning_model_R_1763571327438_5621 \n#> Status of Neuron Layers: predicting .outcome, 2-class classification, bernoulli distribution, CrossEntropy loss, 1,002 weights/biases, 16.9 KB, 7,850 training samples, mini-batch size 1\n#> layer units type dropout l1 l2 mean_rate rate_rms momentum\n#> 1 1 2 Input 0.00 % NA NA NA NA NA\n#> 2 2 200 Rectifier 0.00 % 
0.000000 0.000000 0.008580 0.016179 0.000000\n#> 3 3 2 Softmax NA 0.000000 0.000000 0.003447 0.000623 0.000000\n#> mean_weight weight_rms mean_bias bias_rms\n#> 1 NA NA NA NA\n#> 2 0.001886 0.102603 0.497570 0.009971\n#> 3 0.003765 0.404187 0.013307 0.017630\n#> \n#> \n#> H2OBinomialMetrics: deeplearning\n#> ** Reported on training data. **\n#> ** Metrics reported on full training frame **\n#> \n#> MSE: 0.1322443\n#> RMSE: 0.3636541\n#> LogLoss: 0.4297999\n#> Mean Per-Class Error: 0.1780102\n#> AUC: 0.8891613\n#> AUCPR: 0.8503254\n#> Gini: 0.7783226\n#> \n#> Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:\n#> Class1 Class2 Error Rate\n#> Class1 324 110 0.253456 =110/434\n#> Class2 36 315 0.102564 =36/351\n#> Totals 360 425 0.185987 =146/785\n#> \n#> Maximum Metrics: Maximum metrics at their respective thresholds\n#> metric threshold value idx\n#> 1 max f1 0.305430 0.811856 245\n#> 2 max f2 0.235210 0.871535 274\n#> 3 max f0point5 0.456176 0.820152 193\n#> 4 max accuracy 0.456176 0.834395 193\n#> 5 max precision 0.992141 1.000000 0\n#> 6 max recall 0.007261 1.000000 395\n#> 7 max specificity 0.992141 1.000000 0\n#> 8 max absolute_mcc 0.456176 0.664266 193\n#> 9 max min_per_class_accuracy 0.412899 0.823362 210\n#> 10 max mean_per_class_accuracy 0.456176 0.830888 193\n#> 11 max tns 0.992141 434.000000 0\n#> 12 max fns 0.992141 349.000000 0\n#> 13 max fps 0.001274 434.000000 399\n#> 14 max tps 0.007261 351.000000 395\n#> 15 max tnr 0.992141 1.000000 0\n#> 16 max fnr 0.992141 0.994302 0\n#> 17 max fpr 0.001274 1.000000 399\n#> 18 max tpr 0.007261 1.000000 395\n#> \n#> Gains/Lift Table: Extract with `h2o.gainsLift(, )` or `h2o.gainsLift(, valid=, xval=)`\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(mlp_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(mlp_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.491 0.509 \n#> 2 0.884 0.116 \n#> 3 0.595 0.405 \n#> 4 0.971 0.0294\n#> 5 0.908 0.0923\n#> 6 0.883 0.117\n```\n:::\n\n\n## `keras` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmlp_spec <- mlp() |>\n # We need to set the mode since this engine works with multiple modes\n set_mode(\"classification\") |>\n set_engine(\"keras\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(216)\nmlp_fit <- mlp_spec |> fit(class ~ ., data = bin_train)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmlp_fit\n#> parsnip model object\n#> \n#> Model: \"sequential_1\"\n#> ________________________________________________________________________________\n#> Layer (type) Output Shape Param # \n#> ================================================================================\n#> dense_2 (Dense) (None, 5) 15 \n#> dense_3 (Dense) (None, 2) 12 \n#> ================================================================================\n#> Total params: 27 (108.00 Byte)\n#> Trainable params: 27 (108.00 Byte)\n#> Non-trainable params: 0 (0.00 Byte)\n#> ________________________________________________________________________________\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell 
layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(mlp_fit, type = \"class\", new_data = bin_test)\n#> 1/1 - 0s - 42ms/epoch - 42ms/step\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class2\npredict(mlp_fit, type = \"prob\", new_data = bin_test)\n#> 1/1 - 0s - 6ms/epoch - 6ms/step\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.315 0.685\n#> 2 0.579 0.421\n#> 3 0.505 0.495\n#> 4 0.892 0.108\n#> 5 0.867 0.133\n#> 6 0.471 0.529\n```\n:::\n\n\n:::\n\n## Multinom Regression (`multinom_reg()`) \n\n:::{.panel-tabset}\n\n## `nnet` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# This engine works with a single mode so no need to set that\n# and nnet is the default engine so there is no need to set that either.\nmultinom_reg_spec <- multinom_reg()\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(634)\nmultinom_reg_fit <- multinom_reg_spec |> fit(class ~ ., data = mtl_train)\nmultinom_reg_fit\n#> parsnip model object\n#> \n#> Call:\n#> nnet::multinom(formula = class ~ ., data = data, trace = FALSE)\n#> \n#> Coefficients:\n#> (Intercept) A B\n#> two -0.5868435 1.881920 1.379106\n#> three 0.2910810 1.129622 1.292802\n#> \n#> Residual Deviance: 315.8164 \n#> AIC: 327.8164\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(multinom_reg_fit, type = \"class\", new_data = mtl_test)\n#> # A tibble: 8 × 1\n#> .pred_class\n#> \n#> 1 three \n#> 2 three \n#> 3 three \n#> 4 one \n#> 5 one \n#> 6 two \n#> 7 three \n#> 8 one\npredict(multinom_reg_fit, type = \"prob\", new_data = mtl_test)\n#> # A tibble: 8 × 3\n#> .pred_one .pred_two .pred_three\n#> \n#> 1 0.145 0.213 0.641 \n#> 2 0.308 0.178 0.514 \n#> 3 0.350 0.189 0.461 \n#> 4 0.983 0.00123 0.0155\n#> 5 0.956 0.00275 0.0415\n#> 6 0.00318 0.754 0.243 \n#> 7 0.0591 0.414 0.527 \n#> 8 0.522 0.0465 0.431\n```\n:::\n\n\n## `brulee` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_spec <- multinom_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"brulee\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(837)\nmultinom_reg_fit <- multinom_reg_spec |> fit(class ~ ., data = mtl_train)\nmultinom_reg_fit\n#> parsnip model object\n#> \n#> Multinomial regression\n#> \n#> 192 samples, 2 features, 3 classes \n#> class weights one=1, two=1, three=1 \n#> weight decay: 0.001 \n#> batch size: 173 \n#> validation loss after 1 epoch: 0.953\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(multinom_reg_fit, type = \"class\", new_data = mtl_test)\n#> # A tibble: 8 × 1\n#> .pred_class\n#> \n#> 1 three \n#> 2 three \n#> 3 three \n#> 4 one \n#> 5 one \n#> 6 two \n#> 7 three \n#> 8 three\npredict(multinom_reg_fit, type = \"prob\", new_data = mtl_test)\n#> # A tibble: 8 × 3\n#> .pred_one .pred_two .pred_three\n#> \n#> 1 0.131 0.190 0.679 \n#> 2 0.303 0.174 0.523 \n#> 3 0.358 0.192 0.449 \n#> 4 0.983 0.00125 0.0154\n#> 5 0.948 0.00275 0.0491\n#> 6 0.00344 0.796 0.200 \n#> 7 0.0611 0.420 0.518 \n#> 8 0.443 0.0390 
0.518\n```\n:::\n\n\n## `glmnet` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_spec <- multinom_reg(penalty = 0.01) |> \n # This engine works with a single mode so no need to set that\n set_engine(\"glmnet\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_fit <- multinom_reg_spec |> fit(class ~ ., data = mtl_train)\nmultinom_reg_fit\n#> parsnip model object\n#> \n#> \n#> Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = \"multinomial\") \n#> \n#> Df %Dev Lambda\n#> 1 0 0.00 0.219200\n#> 2 1 1.61 0.199700\n#> 3 2 3.90 0.181900\n#> 4 2 6.07 0.165800\n#> 5 2 7.93 0.151100\n#> 6 2 9.52 0.137600\n#> 7 2 10.90 0.125400\n#> 8 2 12.09 0.114300\n#> 9 2 13.13 0.104100\n#> 10 2 14.22 0.094870\n#> 11 2 15.28 0.086440\n#> 12 2 16.20 0.078760\n#> 13 2 16.99 0.071760\n#> 14 2 17.68 0.065390\n#> 15 2 18.28 0.059580\n#> 16 2 18.80 0.054290\n#> 17 2 19.24 0.049460\n#> 18 2 19.63 0.045070\n#> 19 2 19.96 0.041070\n#> 20 2 20.25 0.037420\n#> 21 2 20.49 0.034090\n#> 22 2 20.70 0.031070\n#> 23 2 20.88 0.028310\n#> 24 2 21.04 0.025790\n#> 25 2 21.17 0.023500\n#> 26 2 21.28 0.021410\n#> 27 2 21.38 0.019510\n#> 28 2 21.46 0.017780\n#> 29 2 21.53 0.016200\n#> 30 2 21.58 0.014760\n#> 31 2 21.63 0.013450\n#> 32 2 21.67 0.012250\n#> 33 2 21.71 0.011160\n#> 34 2 21.74 0.010170\n#> 35 2 21.77 0.009269\n#> 36 2 21.79 0.008445\n#> 37 2 21.82 0.007695\n#> 38 2 21.83 0.007011\n#> 39 2 21.85 0.006389\n#> 40 2 21.86 0.005821\n#> 41 2 21.87 0.005304\n#> 42 2 21.88 0.004833\n#> 43 2 21.89 0.004403\n#> 44 2 21.89 0.004012\n#> 45 2 21.90 0.003656\n#> 46 2 21.90 0.003331\n#> 47 2 21.91 0.003035\n#> 48 2 21.91 0.002765\n#> 49 2 21.91 0.002520\n#> 50 2 21.91 0.002296\n#> 51 2 21.92 0.002092\n#> 52 2 21.92 0.001906\n#> 53 2 21.92 0.001737\n#> 54 2 21.92 0.001582\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(multinom_reg_fit, type = \"class\", new_data = mtl_test)\n#> # A tibble: 8 × 1\n#> .pred_class\n#> \n#> 1 three \n#> 2 three \n#> 3 three \n#> 4 one \n#> 5 one \n#> 6 two \n#> 7 three \n#> 8 one\npredict(multinom_reg_fit, type = \"prob\", new_data = mtl_test)\n#> # A tibble: 8 × 3\n#> .pred_one .pred_two .pred_three\n#> \n#> 1 0.163 0.211 0.626 \n#> 2 0.318 0.185 0.496 \n#> 3 0.358 0.198 0.444 \n#> 4 0.976 0.00268 0.0217\n#> 5 0.940 0.00529 0.0544\n#> 6 0.00617 0.699 0.295 \n#> 7 0.0757 0.390 0.534 \n#> 8 0.506 0.0563 0.438\n```\n:::\n\n\n## `h2o` \n\nThis engine requires the agua extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_spec <- multinom_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"h2o\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_fit <- multinom_reg_spec |> fit(class ~ ., data = mtl_train)\nmultinom_reg_fit\n#> parsnip model object\n#> \n#> Model Details:\n#> ==============\n#> \n#> H2OMultinomialModel: glm\n#> Model ID: GLM_model_R_1763571327438_5625 \n#> GLM Model: summary\n#> family link regularization\n#> 1 multinomial multinomial Elastic Net (alpha = 0.5, lambda = 4.372E-4 )\n#> number_of_predictors_total number_of_active_predictors number_of_iterations\n#> 1 
9 6 4\n#> training_frame\n#> 1 object_jbhwnlsrno\n#> \n#> Coefficients: glm multinomial coefficients\n#> names coefs_class_0 coefs_class_1 coefs_class_2 std_coefs_class_0\n#> 1 Intercept -1.119482 -0.831434 -1.706488 -1.083442\n#> 2 A -1.119327 0.002894 0.750746 -1.029113\n#> 3 B -1.208210 0.078752 0.162842 -1.187423\n#> std_coefs_class_1 std_coefs_class_2\n#> 1 -0.819868 -1.830487\n#> 2 0.002661 0.690238\n#> 3 0.077397 0.160041\n#> \n#> H2OMultinomialMetrics: glm\n#> ** Reported on training data. **\n#> \n#> Training Set Metrics: \n#> =====================\n#> \n#> Extract training frame with `h2o.getFrame(\"object_jbhwnlsrno\")`\n#> MSE: (Extract with `h2o.mse`) 0.2982118\n#> RMSE: (Extract with `h2o.rmse`) 0.5460878\n#> Logloss: (Extract with `h2o.logloss`) 0.822443\n#> Mean Per-Class Error: 0.4583896\n#> AUC: (Extract with `h2o.auc`) NaN\n#> AUCPR: (Extract with `h2o.aucpr`) NaN\n#> Null Deviance: (Extract with `h2o.nulldeviance`) 404.5036\n#> Residual Deviance: (Extract with `h2o.residual_deviance`) 315.8181\n#> R^2: (Extract with `h2o.r2`) 0.4682043\n#> AIC: (Extract with `h2o.aic`) NaN\n#> Confusion Matrix: Extract with `h2o.confusionMatrix(,train = TRUE)`)\n#> =========================================================================\n#> Confusion Matrix: Row labels: Actual class; Column labels: Predicted class\n#> one three two Error Rate\n#> one 59 18 1 0.2436 = 19 / 78\n#> three 19 52 5 0.3158 = 24 / 76\n#> two 7 24 7 0.8158 = 31 / 38\n#> Totals 85 94 13 0.3854 = 74 / 192\n#> \n#> Hit Ratio Table: Extract with `h2o.hit_ratio_table(,train = TRUE)`\n#> =======================================================================\n#> Top-3 Hit Ratios: \n#> k hit_ratio\n#> 1 1 0.614583\n#> 2 2 0.890625\n#> 3 3 1.000000\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(multinom_reg_fit, type = \"class\", new_data = mtl_test)\n#> # A tibble: 8 × 1\n#> .pred_class\n#> \n#> 1 three \n#> 2 three \n#> 3 three \n#> 4 one \n#> 5 one \n#> 6 two \n#> 7 three \n#> 8 one\npredict(multinom_reg_fit, type = \"prob\", new_data = mtl_test)\n#> # A tibble: 8 × 3\n#> .pred_one .pred_three .pred_two\n#> \n#> 1 0.146 0.641 0.213 \n#> 2 0.308 0.513 0.179 \n#> 3 0.350 0.460 0.190 \n#> 4 0.983 0.0158 0.00128\n#> 5 0.955 0.0422 0.00284\n#> 6 0.00329 0.244 0.752 \n#> 7 0.0599 0.527 0.413 \n#> 8 0.521 0.432 0.0469\n```\n:::\n\n\n## `keras` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_spec <- multinom_reg() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"keras\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_fit <- multinom_reg_spec |> fit(class ~ ., data = mtl_train)\n```\n:::\n\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_fit\n#> parsnip model object\n#> \n#> Model: \"sequential_2\"\n#> ________________________________________________________________________________\n#> Layer (type) Output Shape Param # \n#> ================================================================================\n#> dense_4 (Dense) (None, 1) 3 \n#> dense_5 (Dense) (None, 3) 6 \n#> ================================================================================\n#> Total params: 9 (36.00 Byte)\n#> Trainable params: 9 (36.00 Byte)\n#> Non-trainable params: 0 (0.00 Byte)\n#> 
________________________________________________________________________________\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(multinom_reg_fit, type = \"class\", new_data = mtl_test)\n#> 1/1 - 0s - 41ms/epoch - 41ms/step\n#> # A tibble: 8 × 1\n#> .pred_class\n#> \n#> 1 three \n#> 2 one \n#> 3 one \n#> 4 one \n#> 5 one \n#> 6 three \n#> 7 three \n#> 8 one\npredict(multinom_reg_fit, type = \"prob\", new_data = mtl_test)\n#> 1/1 - 0s - 6ms/epoch - 6ms/step\n#> # A tibble: 8 × 3\n#> .pred_one .pred_two .pred_three\n#> \n#> 1 0.264 0.342 0.394 \n#> 2 0.338 0.325 0.337 \n#> 3 0.355 0.321 0.325 \n#> 4 0.753 0.155 0.0914\n#> 5 0.684 0.191 0.125 \n#> 6 0.0930 0.338 0.569 \n#> 7 0.205 0.349 0.446 \n#> 8 0.421 0.301 0.279\n```\n:::\n\n\n## `spark` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_spec <- multinom_reg() |> \n set_engine(\"spark\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nmultinom_reg_fit <- multinom_reg_spec |> fit(class ~ ., data = tbl_mtl$training)\nmultinom_reg_fit\n#> parsnip model object\n#> \n#> Formula: class ~ .\n#> \n#> Coefficients:\n#> (Intercept) A B\n#> one 0.05447853 -1.0569131 -0.9049194\n#> three 0.41207949 0.1458870 0.3959664\n#> two -0.46655802 0.9110261 0.5089529\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(multinom_reg_fit, type = \"class\", new_data = tbl_mtl$test)\n#> # Source: SQL [?? x 1]\n#> # Database: spark_connection\n#> pred_class\n#> \n#> 1 one \n#> 2 one \n#> 3 three \n#> 4 three \n#> 5 three \n#> 6 three \n#> 7 three\npredict(multinom_reg_fit, type = \"prob\", new_data = tbl_mtl$test)\n#> # Source: SQL [?? x 3]\n#> # Database: spark_connection\n#> pred_one pred_three pred_two\n#> \n#> 1 0.910 0.0814 0.00904\n#> 2 0.724 0.233 0.0427 \n#> 3 0.124 0.620 0.256 \n#> 4 0.0682 0.610 0.322 \n#> 5 0.130 0.571 0.300 \n#> 6 0.115 0.549 0.336 \n#> 7 0.0517 0.524 0.424\n```\n:::\n\n\n:::\n\n## Naive Bayes (`naive_Bayes()`) \n\n:::{.panel-tabset}\n\n## `h2o` \n\nThis engine requires the agua extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(agua)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnaive_Bayes_spec <- naive_Bayes() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"h2o\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnaive_Bayes_fit <- naive_Bayes_spec |> fit(class ~ ., data = bin_train)\nnaive_Bayes_fit\n#> parsnip model object\n#> \n#> Model Details:\n#> ==============\n#> \n#> H2OBinomialModel: naivebayes\n#> Model ID: NaiveBayes_model_R_1763571327438_5626 \n#> Model Summary: \n#> number_of_response_levels min_apriori_probability max_apriori_probability\n#> 1 2 0.44713 0.55287\n#> \n#> \n#> H2OBinomialMetrics: naivebayes\n#> ** Reported on training data. 
**\n#> \n#> MSE: 0.1737113\n#> RMSE: 0.4167869\n#> LogLoss: 0.5473431\n#> Mean Per-Class Error: 0.2356138\n#> AUC: 0.8377152\n#> AUCPR: 0.788608\n#> Gini: 0.6754303\n#> \n#> Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:\n#> Class1 Class2 Error Rate\n#> Class1 274 160 0.368664 =160/434\n#> Class2 36 315 0.102564 =36/351\n#> Totals 310 475 0.249682 =196/785\n#> \n#> Maximum Metrics: Maximum metrics at their respective thresholds\n#> metric threshold value idx\n#> 1 max f1 0.175296 0.762712 286\n#> 2 max f2 0.133412 0.851119 306\n#> 3 max f0point5 0.497657 0.731343 183\n#> 4 max accuracy 0.281344 0.765605 248\n#> 5 max precision 0.999709 1.000000 0\n#> 6 max recall 0.020983 1.000000 390\n#> 7 max specificity 0.999709 1.000000 0\n#> 8 max absolute_mcc 0.280325 0.541898 249\n#> 9 max min_per_class_accuracy 0.398369 0.758065 215\n#> 10 max mean_per_class_accuracy 0.280325 0.771945 249\n#> 11 max tns 0.999709 434.000000 0\n#> 12 max fns 0.999709 347.000000 0\n#> 13 max fps 0.006522 434.000000 399\n#> 14 max tps 0.020983 351.000000 390\n#> 15 max tnr 0.999709 1.000000 0\n#> 16 max fnr 0.999709 0.988604 0\n#> 17 max fpr 0.006522 1.000000 399\n#> 18 max tpr 0.020983 1.000000 390\n#> \n#> Gains/Lift Table: Extract with `h2o.gainsLift(, )` or `h2o.gainsLift(, valid=, xval=)`\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(naive_Bayes_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class2 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class2\npredict(naive_Bayes_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.181 0.819 \n#> 2 0.750 0.250 \n#> 3 0.556 0.444 \n#> 4 0.994 0.00643\n#> 5 0.967 0.0331 \n#> 6 0.630 0.370\n```\n:::\n\n\n## `klaR` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# This engine works with a single mode so no need to set that\n# and klaR is the default engine so there is no need to set that either.\nnaive_Bayes_spec <- naive_Bayes()\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnaive_Bayes_fit <- naive_Bayes_spec |> fit(class ~ ., data = bin_train)\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(naive_Bayes_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(naive_Bayes_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.250 0.750 \n#> 2 0.593 0.407 \n#> 3 0.333 0.667 \n#> 4 0.993 0.00658\n#> 5 0.978 0.0223 \n#> 6 0.531 0.469\n```\n:::\n\n\n## `naivebayes` \n\nThis engine requires the discrim extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(discrim)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnaive_Bayes_spec <- naive_Bayes() |> \n # This engine works with a single mode so no need to set that\n set_engine(\"naivebayes\")\n```\n:::\n\n\nNow we create the model fit 
object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnaive_Bayes_fit <- naive_Bayes_spec |> fit(class ~ ., data = bin_train)\nnaive_Bayes_fit\n#> parsnip model object\n#> \n#> \n#> ================================= Naive Bayes ==================================\n#> \n#> Call:\n#> naive_bayes.default(x = maybe_data_frame(x), y = y, usekernel = TRUE)\n#> \n#> -------------------------------------------------------------------------------- \n#> \n#> Laplace smoothing: 0\n#> \n#> -------------------------------------------------------------------------------- \n#> \n#> A priori probabilities: \n#> \n#> Class1 Class2 \n#> 0.5528662 0.4471338 \n#> \n#> -------------------------------------------------------------------------------- \n#> \n#> Tables: \n#> \n#> -------------------------------------------------------------------------------- \n#> :: A::Class1 (KDE)\n#> -------------------------------------------------------------------------------- \n#> \n#> Call:\n#> \tdensity.default(x = x, na.rm = TRUE)\n#> \n#> Data: x (434 obs.);\tBandwidth 'bw' = 0.2548\n#> \n#> x y \n#> Min. :-2.5638 Min. :0.0002915 \n#> 1st Qu.:-1.2013 1st Qu.:0.0506201 \n#> Median : 0.1612 Median :0.1619843 \n#> Mean : 0.1612 Mean :0.1831190 \n#> 3rd Qu.: 1.5237 3rd Qu.:0.2581668 \n#> Max. : 2.8862 Max. :0.5370762 \n#> -------------------------------------------------------------------------------- \n#> :: A::Class2 (KDE)\n#> -------------------------------------------------------------------------------- \n#> \n#> Call:\n#> \tdensity.default(x = x, na.rm = TRUE)\n#> \n#> Data: x (351 obs.);\tBandwidth 'bw' = 0.2596\n#> \n#> x y \n#> Min. :-2.5428 Min. :4.977e-05 \n#> 1st Qu.:-1.1840 1st Qu.:2.672e-02 \n#> Median : 0.1748 Median :2.239e-01 \n#> Mean : 0.1748 Mean :1.836e-01 \n#> 3rd Qu.: 1.5336 3rd Qu.:2.926e-01 \n#> Max. : 2.8924 Max. :3.740e-01 \n#> \n#> -------------------------------------------------------------------------------- \n#> :: B::Class1 (KDE)\n#> -------------------------------------------------------------------------------- \n#> \n#> Call:\n#> \tdensity.default(x = x, na.rm = TRUE)\n#> \n#> Data: x (434 obs.);\tBandwidth 'bw' = 0.1793\n#> \n#> x y \n#> Min. :-2.4501 Min. :5.747e-05 \n#> 1st Qu.:-1.0894 1st Qu.:1.424e-02 \n#> Median : 0.2713 Median :8.798e-02 \n#> Mean : 0.2713 Mean :1.834e-01 \n#> 3rd Qu.: 1.6320 3rd Qu.:2.758e-01 \n#> Max. : 2.9927 Max. :6.872e-01 \n#> \n#> -------------------------------------------------------------------------------- \n#> :: B::Class2 (KDE)\n#> -------------------------------------------------------------------------------- \n#> \n#> Call:\n#> \tdensity.default(x = x, na.rm = TRUE)\n#> \n#> Data: x (351 obs.);\tBandwidth 'bw' = 0.2309\n#> \n#> x y \n#> Min. :-2.4621 Min. :5.623e-05 \n#> 1st Qu.:-0.8979 1st Qu.:1.489e-02 \n#> Median : 0.6663 Median :7.738e-02 \n#> Mean : 0.6663 Mean :1.595e-01 \n#> 3rd Qu.: 2.2305 3rd Qu.:3.336e-01 \n#> Max. : 3.7948 Max. 
:4.418e-01 \n#> \n#> --------------------------------------------------------------------------------\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(naive_Bayes_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(naive_Bayes_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.249 0.751 \n#> 2 0.593 0.407 \n#> 3 0.332 0.668 \n#> 4 0.993 0.00674\n#> 5 0.978 0.0224 \n#> 6 0.532 0.468\n```\n:::\n\n\n:::\n\n## K-Nearest Neighbors (`nearest_neighbor()`) \n\n:::{.panel-tabset}\n\n## `kknn` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnearest_neighbor_spec <- nearest_neighbor() |>\n # We need to set the mode since this engine works with multiple modes\n # and kknn is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnearest_neighbor_fit <- nearest_neighbor_spec |> fit(class ~ ., data = bin_train)\nnearest_neighbor_fit\n#> parsnip model object\n#> \n#> \n#> Call:\n#> kknn::train.kknn(formula = class ~ ., data = data, ks = min_rows(5, data, 5))\n#> \n#> Type of response variable: nominal\n#> Minimal misclassification: 0.2101911\n#> Best kernel: optimal\n#> Best k: 5\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(nearest_neighbor_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(nearest_neighbor_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.2 0.8 \n#> 2 0.72 0.28\n#> 3 0.32 0.68\n#> 4 1 0 \n#> 5 1 0 \n#> 6 1 0\n```\n:::\n\n\n:::\n\n## Null Model (`null_model()`) \n\n:::{.panel-tabset}\n\n## `parsnip` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnull_model_spec <- null_model() |>\n # We need to set the mode since this engine works with multiple modes\n # and parsnip is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nnull_model_fit <- null_model_spec |> fit(class ~ ., data = bin_train)\nnull_model_fit\n#> parsnip model object\n#> \n#> Null Regression Model\n#> Predicted Value: Class1\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(null_model_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class1 \n#> 2 Class1 \n#> 3 Class1 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(null_model_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.553 0.447\n#> 2 0.553 0.447\n#> 3 0.553 0.447\n#> 4 0.553 0.447\n#> 5 0.553 0.447\n#> 6 0.553 0.447\n```\n:::\n\n\n:::\n\n## Partial Least Squares (`pls()`) \n\n:::{.panel-tabset}\n\n## `mixOmics` \n\nThis engine requires the plsmod extension package, so let's load this first:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r 
.cell-code}\nlibrary(plsmod)\n```\n:::\n\n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npls_spec <- pls() |>\n # We need to set the mode since this engine works with multiple modes\n # and mixOmics is the default engine so there is no need to set that either.\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npls_fit <- pls_spec |> fit(class ~ ., data = bin_train)\npls_fit\n#> parsnip model object\n#> \n#> \n#> Call:\n#> mixOmics::splsda(X = x, Y = y, ncomp = ncomp, keepX = keepX) \n#> \n#> sPLS-DA (regression mode) with 2 sPLS-DA components. \n#> You entered data X of dimensions: 785 2 \n#> You entered data Y with 2 classes. \n#> \n#> Selection of [2] [2] variables on each of the sPLS-DA components on the X data set. \n#> No Y variables can be selected. \n#> \n#> Main numerical outputs: \n#> -------------------- \n#> loading vectors: see object$loadings \n#> variates: see object$variates \n#> variable names: see object$names \n#> \n#> Functions to visualise samples: \n#> -------------------- \n#> plotIndiv, plotArrow, cim \n#> \n#> Functions to visualise variables: \n#> -------------------- \n#> plotVar, plotLoadings, network, cim \n#> \n#> Other functions: \n#> -------------------- \n#> selectVar, tune, perf, auc\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(pls_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(pls_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.462 0.538\n#> 2 0.631 0.369\n#> 3 0.512 0.488\n#> 4 0.765 0.235\n#> 5 0.675 0.325\n#> 6 0.624 0.376\n```\n:::\n\n\n:::\n\n## Random Forests (`rand_forest()`) \n\n:::{.panel-tabset}\n\n## `ranger` \n\nWe create a model specification via:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nrand_forest_spec <- rand_forest() |>\n # We need to set the mode since this engine works with multiple modes\n # and ranger is the default engine so there is no need to set that either.\n set_engine(\"ranger\", keep.inbag = TRUE) |> \n # However, we'll set the engine and use the keep.inbag=TRUE option so that we \n # can produce interval predictions. This is not generally required. 
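 # (keep.inbag is handed straight through to ranger::ranger(), as the fitted call below shows.)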
\n set_mode(\"classification\")\n```\n:::\n\n\nNow we create the model fit object:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n# Set the random number seed to an integer for reproducibility: \nset.seed(841)\nrand_forest_fit <- rand_forest_spec |> fit(class ~ ., data = bin_train)\nrand_forest_fit\n#> parsnip model object\n#> \n#> Ranger result\n#> \n#> Call:\n#> ranger::ranger(x = maybe_data_frame(x), y = y, keep.inbag = ~TRUE, num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE) \n#> \n#> Type: Probability estimation \n#> Number of trees: 500 \n#> Sample size: 785 \n#> Number of independent variables: 2 \n#> Mtry: 1 \n#> Target node size: 10 \n#> Variable importance mode: none \n#> Splitrule: gini \n#> OOB prediction error (Brier s.): 0.1477679\n```\n:::\n\n\nThe holdout data can be predicted:\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npredict(rand_forest_fit, type = \"class\", new_data = bin_test)\n#> # A tibble: 6 × 1\n#> .pred_class\n#> \n#> 1 Class2 \n#> 2 Class1 \n#> 3 Class2 \n#> 4 Class1 \n#> 5 Class1 \n#> 6 Class1\npredict(rand_forest_fit, type = \"prob\", new_data = bin_test)\n#> # A tibble: 6 × 2\n#> .pred_Class1 .pred_Class2\n#> \n#> 1 0.220 0.780 \n#> 2 0.837 0.163 \n#> 3 0.220 0.780 \n#> 4 0.951 0.0485\n#> 5 0.785 0.215 \n#> 6 0.913 0.0868\npredict(rand_forest_fit, type = \"conf_int\", new_data = bin_test)\n#> Warning in rInfJack(x, inbag.counts): Sample size <=20, no calibration\n#> performed.\n#> Warning in rInfJack(x, inbag.counts): Sample size <=20, no calibration\n#> performed.\n#> Warning in sqrt(infjack): NaNs produced\n#> # A tibble: 6 × 4\n#> .pred_lower_Class1 .pred_upper_Class1 .pred_lower_Class2 .pred_upper_Class2\n#>