This demo will use the starwars dataset from the dplyr library with list columns dropped, renamed to sw.
regex is a powerful tool in SQL. It is one of the more common uses day-to-day. The following special characters are particularly important:
Character | Definition |
---|---|
% | zero or multiple characters |
_ | single character |
[] | any single character within brackets |
^ | any character not in brackets |
|
range of characters |
The best way to understand how this works is with some practical(ish) examples.
Example 1: How many characters have eye colours beginning with a b?
sqldf("
SELECT COUNT(name) FROM sw
WHERE eye_color LIKE 'b%';
")
## COUNT(name)
## 1 51
Example 2: List all the planets containing “oo”
sqldf("
SELECT DISTINCT homeworld FROM sw
WHERE homeworld LIKE '%oo%';
")
## homeworld
## 1 Tatooine
## 2 Naboo
The following example uses NOT NULL to filter rows containing NA.
Example 3: Who is the youngest character from any planet beginning with N?
sqldf("
SELECT name, homeworld, birth_year FROM sw
WHERE homeworld LIKE 'N%'
AND birth_year NOT NULL
ORDER BY birth_year ASC
LIMIT 1;
")
## name homeworld birth_year
## 1 R2-D2 Naboo 33
Example 5: What is the average height of characters whose names don’t end in an “a”
sqldf("SELECT AVG(height) AS average_height FROM sw
WHERE name NOT LIKE '%a'")
## average_height
## 1 175.942