This is a simple overview of machine learning in R. It uses the tidymodels packages and is intended as a personal reminder and a help sheet for new learners. I intend to keep improving the site as I learn more of tidymodels' features!

The table below lists the packages included with tidymodels and the functions I use most often.

Package    Function       Description
rsample    initial_split  Splits data into training and testing sets
rsample    training       Returns the training data set
rsample    testing        Returns the testing data set
parsnip    various        parsnip pending description
recipes    recipe         pending description
recipes    step_<name>    pending description
recipes    prep           pending description
recipes    bake           pending description
recipes    juice          pending description
workflows  add_recipe     pending description
workflows  add_model      pending description
tune       various        tune pending description
yardstick  metrics        pending description
yardstick  roc_auc        pending description
broom      tidy           pending description
dials      various        dials pending description
corrr      corrr_stuff    pending description
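
As a rough illustration of how the rsample and recipes functions in the table fit together, here is a minimal sketch. The built-in mtcars data, the 75/25 split, the choice of step_normalize and all object names are my own assumptions for the example, not a recommended setup.

```r
library(rsample)
library(recipes)

set.seed(123)  # make the random split reproducible

# Hold out 25% of the rows for testing (prop = 0.75 is an assumed choice)
car_split <- initial_split(mtcars, prop = 0.75)
car_train <- training(car_split)
car_test  <- testing(car_split)

# Define the preprocessing: model mpg from all other columns,
# then centre and scale every numeric predictor
car_recipe <- recipe(mpg ~ ., data = car_train) |>
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations from the training data
car_prepped <- prep(car_recipe, training = car_train)

# bake() applies the trained steps; new_data = NULL returns the processed
# training set (juice() is the older shortcut for the same thing)
car_train_baked <- bake(car_prepped, new_data = NULL)
car_test_baked  <- bake(car_prepped, new_data = car_test)
```

Note that the test set is scaled with the statistics learned from the training set, so no information from the test rows leaks into preprocessing.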

Additional useful libraries

Package   Function  Description
corrplot  corrplot  Creates a pretty correlation plot
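
A quick sketch of what a corrplot call might look like, assuming the built-in mtcars data; the method and type values are just one possible styling.

```r
library(corrplot)

# Pairwise Pearson correlations between all mtcars columns
cor_matrix <- cor(mtcars)

# Draw the upper triangle, with circle size and colour showing
# the strength and sign of each correlation
corrplot(cor_matrix, method = "circle", type = "upper")
```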

Data Preprocessing

To decide which machine learning algorithm to use, you must first study your data! Creating tidy data is fundamental to machine learning. The following topics must be understood:

  1. Dealing with missingness
  2. Correlations within data
  3. How sampling works
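
The three topics above can be sketched in a few lines. This is only an illustration under my own assumptions: the built-in airquality and iris data sets, median imputation as the strategy for missing values, and an 80/20 stratified split. Other choices may suit your data better.

```r
library(recipes)
library(corrr)
library(rsample)

# 1. Missingness: count the NA values in each column
air <- airquality
air[] <- lapply(air, as.numeric)  # use doubles so a fractional median is valid
na_counts <- colSums(is.na(air))

# Fill the gaps with each column's median (one assumed strategy among many)
air_imputed <- recipe(~ ., data = air) |>
  step_impute_median(all_numeric()) |>
  prep() |>
  bake(new_data = NULL)

# 2. Correlations: pairwise correlations returned as a tidy data frame
air_cor <- correlate(air_imputed)

# 3. Sampling: stratify on Species so each class keeps its share of rows
set.seed(42)
iris_split <- initial_split(iris, prop = 0.8, strata = Species)
train_counts <- table(training(iris_split)$Species)
```

Printing na_counts, air_cor and train_counts is a simple way to check each step before moving on to modelling.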

Other Useful Concepts

  1. Parallel Processing
  2. Metrics
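
For the Metrics item, a minimal sketch with yardstick might look like the following. It assumes the two_class_example data set that ships with yardstick for the classification case, and a plain lm() fit on mtcars (my own choice) for the regression case. Parallel processing is not covered here.

```r
library(yardstick)

# Classification: truth and predicted are factors, Class1 holds the
# predicted probability of the first factor level
data(two_class_example)
class_metrics <- metrics(two_class_example, truth = truth, estimate = predicted)
auc <- roc_auc(two_class_example, truth, Class1)

# Regression: compare observed mpg with predictions from a simple linear model
car_fit   <- lm(mpg ~ wt + hp, data = mtcars)
car_preds <- data.frame(mpg = mtcars$mpg, .pred = predict(car_fit, mtcars))
reg_metrics <- metrics(car_preds, truth = mpg, estimate = .pred)
```

metrics() picks a default set of measures based on the type of the truth column, while roc_auc() needs the class probability column rather than the hard class prediction.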

Supervised Learning

Supervised machine learning requires labelled data…

  1. Linear Regression
  2. Logistic Regression
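
Both models can be sketched with parsnip. The example below is an assumption-laden illustration: it uses mtcars, picks wt and hp as predictors arbitrarily, and recodes am into a factor with labels I made up for readability. The linear model is wrapped in a workflow to show add_recipe and add_model; the logistic model is fitted directly.

```r
library(parsnip)
library(recipes)
library(workflows)

# Linear regression: numeric outcome (mpg)
lm_spec <- linear_reg() |> set_engine("lm")

lm_wf <- workflow() |>
  add_recipe(recipe(mpg ~ wt + hp, data = mtcars)) |>
  add_model(lm_spec)

lm_fit   <- fit(lm_wf, data = mtcars)
lm_preds <- predict(lm_fit, new_data = mtcars)  # tibble with a .pred column

# Logistic regression: the outcome must be a factor
cars_cls <- mtcars
cars_cls$am <- factor(cars_cls$am, levels = c(0, 1),
                      labels = c("automatic", "manual"))

glm_spec  <- logistic_reg() |> set_engine("glm")
glm_fit   <- fit(glm_spec, am ~ wt + hp, data = cars_cls)
glm_probs <- predict(glm_fit, new_data = cars_cls, type = "prob")
```

In practice you would fit on the training split and predict on the testing split; the full data set is used here only to keep the sketch short.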

Unsupervised Learning


Additional Resources