This is a simple overview of machine learning in R. It uses the tidymodels packages and is intended as a personal reminder and a help sheet for new learners. I intend to keep improving the site as I learn more of tidymodels' features!
The table below lists packages in the tidymodels ecosystem and the functions I use most often.
| Package | Function | Description |
|---|---|---|
| rsample | initial_split | Splits data into a training set and a testing set |
| rsample | training | Returns the training set from a split |
| rsample | testing | Returns the testing set from a split |
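| parsnip | various parsnip | Model specifications (for example linear_reg, rand_forest) used with set_engine, set_mode and fit |
| recipes | recipe | Defines a preprocessing specification from a formula and a data set |
| recipes | `step_<name>` | Adds a preprocessing step to a recipe (for example step_normalize, step_dummy) |
| recipes | prep | Estimates the parameters needed by the recipe steps from the training data |
| recipes | bake | Applies a prepped recipe to a data set |
| recipes | juice | Returns the preprocessed training data from a prepped recipe (superseded by bake with new_data = NULL) |
| workflows | add_recipe | Adds a recipe to a workflow |
| workflows | add_model | Adds a parsnip model specification to a workflow |
| tune | various tune | Hyperparameter tuning (for example tune, tune_grid, select_best) |
| yardstick | metrics | Computes a default set of performance metrics from the truth and estimate columns |
| yardstick | roc_auc | Computes the area under the ROC curve |
| broom | tidy | Summarises a model object as a tibble |
| dials | various dials | Parameter objects and grids for tuning (for example penalty, grid_regular) |
| corrr | corrr_stuff | Correlation helpers (for example correlate) |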
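
As a rough illustration of how several of the functions in the table fit together, the sketch below splits a data set, preprocesses it with a recipe, fits a model inside a workflow, and scores it on the test set. The built-in `mtcars` data, the `mpg ~ .` formula, the 80/20 split, the seed, and the choice of a plain linear regression are all my assumptions for demonstration, not a recommended setup.

```r
library(tidymodels)

set.seed(123)  # assumption: arbitrary seed so the split is reproducible

# rsample: split the data, then pull out each part
car_split <- initial_split(mtcars, prop = 0.8)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare the preprocessing; nothing is estimated yet
car_recipe <- recipe(mpg ~ ., data = car_train) %>%
  step_normalize(all_numeric_predictors())

# prep/bake: estimate the steps on the training data and look at the result
car_baked <- car_recipe %>%
  prep() %>%
  bake(new_data = NULL)

# parsnip: a model specification (assumption: linear regression via lm)
lm_spec <- linear_reg() %>%
  set_engine("lm")

# workflows: bundle the recipe and the model, then fit on the training set
car_fit <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(lm_spec) %>%
  fit(data = car_train)

# broom: coefficients of the fitted model as a tibble
car_coefs <- tidy(extract_fit_parsnip(car_fit))

# yardstick: compare predictions with the true values on the test set
car_preds <- predict(car_fit, new_data = car_test) %>%
  bind_cols(car_test)

car_metrics <- metrics(car_preds, truth = mpg, estimate = .pred)
car_metrics
```

Fitting through a workflow means the recipe is prepped on the training data and applied to new data automatically at prediction time, so the explicit `prep()` and `bake()` calls above are only there to inspect the preprocessed data.
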
Additional useful packages
| Package | Function | Description |
|---|---|---|
| corrplot | corrplot | Creates a graphical display of a correlation matrix |
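
A minimal sketch of a correlation plot, assuming the built-in `mtcars` data as a stand-in for your own numeric data. The `method` and `type` arguments are optional styling choices I picked for the example.

```r
library(corrplot)

# Assumption: mtcars stands in for your own data; all of its columns are numeric
cor_matrix <- cor(mtcars)

# Draw the upper triangle of the correlation matrix as circles
corrplot(cor_matrix, method = "circle", type = "upper")
```
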
To decide which machine learning algorithm to use, you must first study your data! Tidy data is fundamental to machine learning. The following points must be understood:
Supervised machine learning requires labelled data…