This demo will use the starwars dataset from the dplyr library with list columns dropped, renamed to sw.

regex is a powerful tool in SQL. It is one of the more common uses day-to-day. The following special characters are particularly important:

Character Definition
% zero or multiple characters
_ single character
[] any single character within brackets
^ any character not in brackets
range of characters

The best way to understand how this works is with some practical(ish) examples.

Example 1: How many characters have eye colours beginning with a b?

sqldf("
  SELECT COUNT(name) FROM sw
  WHERE eye_color LIKE 'b%';
")
##   COUNT(name)
## 1          51

Example 2: List all the planets containing “oo”

sqldf("
  SELECT DISTINCT homeworld FROM sw
  WHERE homeworld LIKE '%oo%';
")
##   homeworld
## 1  Tatooine
## 2     Naboo

The following example uses NOT NULL to filter rows containing NA.

Example 3: Who is the youngest character from any planet beginning with N?

sqldf("
  SELECT name, homeworld, birth_year FROM sw
  WHERE homeworld LIKE 'N%'
  AND birth_year NOT NULL
  ORDER BY birth_year ASC
  LIMIT 1;
")
##    name homeworld birth_year
## 1 R2-D2     Naboo         33

Example 5: What is the average height of characters whose names don’t end in an “a”

sqldf("SELECT AVG(height) AS average_height FROM sw
      WHERE name NOT LIKE '%a'")
##   average_height
## 1        175.942