This demo uses both the iris and dplyr::starwars datasets. The startwars dataset has list columns removed.
The GROUP BY statement allows data to be aggregated in subsets.
Example: What is the average petal length of each iris species?
sqldf("
SELECT Species, AVG(`Petal.Length`) AS average_petal_length
FROM iris
GROUP BY Species;
")
## Species average_petal_length
## 1 setosa 1.462
## 2 versicolor 4.260
## 3 virginica 5.552
Example: For the virginica and setosa species what is the median sepal length? List the longest first.
sqldf("
SELECT MEDIAN(`Sepal.Width`) AS mean_sepal_width, Species FROM iris
WHERE Species IN ('virginica','setosa')
GROUP BY Species
ORDER BY mean_sepal_width DESC
")
## mean_sepal_width Species
## 1 3.4 setosa
## 2 3.0 virginica
Example: From the star wars dataset, what is the average character height from each homeworld? List the 7 with the shortest average.
sqldf("
SELECT AVG(height) AS average_height, homeworld FROM sw
WHERE homeworld NOT NULL AND
height NOT NULL
GROUP BY homeworld
ORDER BY average_height ASC
LIMIT 7
")
## average_height homeworld
## 1 79 Aleen Minor
## 2 88 Endor
## 3 94 Vulpter
## 4 112 Malastare
## 5 122 Troiken
## 6 137 Toydaria
## 7 150 Chandrila
Example: What is the maximum height from species from each homeworld? List the top 3
sqldf("
SELECT MAX(height) AS max_height, homeworld FROM sw
WHERE homeworld NOT NULL AND
height NOT NULL
GROUP BY homeworld
ORDER BY max_height DESC
LIMIT 3
")
## max_height homeworld
## 1 264 Quermia
## 2 234 Kashyyyk
## 3 229 Kamino