This is a simple overview of machine learning in R. It uses the tidymodels libraries and is intended as a personal reminder and as a help sheet for new learners. I intend to keep improving the site as I learn more of the tidymodels features!
The table below shows the libraries in the tidymodels ecosystem and the methods I use most often.
Packages | Methods | Description |
---|---|---|
rsample | initial_split | Splits data into training and testing sets |
rsample | training | Returns the training data set from a split |
rsample | testing | Returns the testing data set from a split |
parsnip | various parsnip | pending description |
recipes | recipe | pending description |
recipes | step_<name> | pending description |
recipes | prep | pending description |
recipes | bake | pending description |
recipes | juice | pending description |
workflows | add_recipe | pending description |
workflows | add_model | pending description |
tune | various tune | pending description |
yardstick | metrics | pending description |
yardstick | roc_auc | pending description |
broom | tidy | pending description |
dials | various dials | pending description |
corrr | corrr_stuff | pending description |
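
To show how several of the methods in the table fit together, here is a minimal sketch of one possible end-to-end flow. It assumes the built-in `mtcars` data, a linear regression model, and a hypothetical choice of predictors (`wt`, `hp`, `disp`); none of these come from my own projects, and the object names (`car_split`, `car_recipe`, and so on) are purely illustrative.

```r
library(tidymodels)  # attaches rsample, recipes, parsnip, workflows, yardstick, ...

set.seed(123)  # make the random split reproducible

# rsample: split the data, then pull out each partition
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# recipes: declare the preprocessing; step_normalize() is one example of step_<name>
car_recipe <- recipe(mpg ~ wt + hp + disp, data = car_train) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the step parameters from the training data;
# bake(new_data = NULL) returns the processed training data
# (juice() is the older, superseded way of getting the same result)
car_train_baked <- car_recipe %>%
  prep() %>%
  bake(new_data = NULL)

# parsnip: specify the model and the engine that fits it
lm_model <- linear_reg() %>%
  set_engine("lm")

# workflows: bundle the recipe and the model together
car_workflow <- workflow() %>%
  add_recipe(car_recipe) %>%
  add_model(lm_model)

# fit on the training data, then predict on the held-out test data
car_fit <- fit(car_workflow, data = car_train)
car_preds <- predict(car_fit, new_data = car_test) %>%
  bind_cols(car_test)

# yardstick: metrics() reports the default regression metrics (rmse, rsq, mae)
car_metrics <- metrics(car_preds, truth = mpg, estimate = .pred)
car_metrics
```

When the recipe is used inside a workflow, `fit()` preps it and `predict()` bakes the new data automatically, so the explicit `prep()` and `bake()` calls above are only there to show what those two methods do.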
Additional useful libraries
Packages | Methods | Description |
---|---|---|
corrplot | corrplot | Creates a pretty correlation plot |
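
As a quick illustration of the `corrplot` method, the sketch below plots the correlations between the numeric columns of the built-in `mtcars` data. The data set and the `method = "circle"` style are assumptions chosen for the example, not a recommendation.

```r
library(corrplot)

# Base R cor() computes the pairwise correlation matrix (all mtcars columns are numeric)
cor_matrix <- cor(mtcars)

# Draw the matrix; each circle's size and colour reflect the correlation strength
corrplot(cor_matrix, method = "circle")
```

Strongly correlated predictors stand out at a glance, which helps when studying the data before choosing a model.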
To decide which machine learning algorithm to use, you must first study your data! Creating tidy data is fundamental to machine learning. The following points must be understood:
Supervised machine learning requires labelled data…