This article discusses MySQL aggregate functions,
This Articles Contents
Aggregate Methods
In this lesson, we’ll demonstrate working with a few of the important aggregate functions.
Example Syntax :
SELECT AggregateFunction(col1)
FROM table;
-- Query 1
SELECT COUNT(*) FROM Actors;
-- Query 2
SELECT SUM(NetworthInMillions) FROM Actors;
-- Query 3
SELECT AVG(NetWorthInMillions) FROM Actors;
-- Query 4
SELECT MIN(NetWorthInMillions) FROM Actors;
-- Query 5
SELECT MAX(NetWorthInMillions) FROM Actors;
-- Query 6
SELECT STDDEV(NetWorthInMillions) FROM Actors;
SELECT COUNT(*) FROM tablename;
We can count the number of rows in a table using the COUNT function.
SELECT COUNT(*) FROM Actors;
Note the output of the query is a single value rather than rows.
SELECT SUM(column) FROM table;
Using the SUM function, we can add up the numeric values of a column. For instance, the following query will sum the net worth of all the actors in our example setup to report the cumulative worth of all the actors.
SELECT SUM(NetworthInMillions) FROM Actors;
SELECT AVG(column) FROM table;
We can use the AVG function to calculate the average net worth of actors as follows:
SELECT AVG(NetWorthInMillions) FROM Actors;
SELECT MIN(column) FROM table;
We can find the actor with the least net worth as follows:
SELECT MIN(NetWorthInMillions) FROM Actors;
SELECT MAX(column) FROM table;
Similarly, we can find the actor with the most net worth as follows:
SELECT MAX(NetWorthInMillions) FROM Actors;
Note that we can also apply the MIN and MAX functions to non-numerical columns such as FirstName. MySQL would return the actor with the first name that occurs first or last when first names are sorted for MIN and MAX respectively. The queries are shown below:
SELECT STDDEV(column) FROM table;
We can find the income disparity among actors using the standard deviation function STD or STDDEV as follows:
SELECT STDDEV(NetWorthInMillions) FROM Actors;
GROUP BY
The GROUP BY, as the name suggests, sorts rows together into groups. The clause returns one row for each group. Data is organized using a comma separated list of columns as the criteria specified after the GROUP BY clause. The GROUP BY statement is often used with aggregate functions such as COUNT, MAX, MIN, SUM, and AVG to calculate an aggregated stat for each group.
Syntactically, the GROUP BY clause must appear after the FROM and WHERE clauses and is also evaluated after them. However, GROUP BY is evaluated before the ORDER BY, LIMIT, and HAVING clauses.
Example Syntax :
SELECT col1, AggregateFunction(col3)
FROM table;
GROUP BY col1, col2, ... coln
ORDER BY col2;
-- Query 1
SELECT FirstName FROM Actors GROUP BY FirstName;
-- Query 2
SELECT FirstName, SecondName FROM Actors GROUP BY FirstName;
-- Query 3
SELECT Gender, COUNT(*) FROM Actors GROUP BY Gender;
-- Query 4
SELECT Gender FROM Actors GROUP BY Gender;
-- Query 5
SELECT MaritalStatus, AVG(NetworthInMillions) FROM Actors GROUP BY MaritalStatus ORDER BY Mar
SELECT column FROM table GROUP BY column;
To get a feel for using the GROUP BY clause, we’ll write a query to group results by first name. We already know the table consists of rows with all unique first names and we’ll be returned eleven groups equal to the number of rows in the table.
SELECT FirstName FROM Actors GROUP BY FirstName;
SELECT column1, column2 FROM table GROUP BY column1;
Note that in the previous query the SELECT clause also specifies the same column name as the GROUP BY clause. The following query would fail:
SELECT FirstName, SecondName FROM Actors GROUP BY FirstName;
Recall that GROUP BY returns one row for each group. Each group will have the same value for FirstName but may have different values for the SecondName. Thus, it is nonsensical to associate any one value for SecondName with the entire group and display SecondName in the output. Attempting to do so results in a syntax error as shown below:
ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUPBY clause and contains nonaggregated column ‘MovieIndustry.Actors.SecondName’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
We can’t have non-aggregated columns in the SELECT, ORDER BY,and HAVING clauses when these columns don’t appear in the GROUP BY clause or are functionally dependent on columns that do appear. Though there are exceptions, when the non-aggregated column has a single value it can appear in the SELECT, ORDER BY, and HAVING clauses. This restriction is ensured by setting sql_mode to only_full_group_by. We can unset this setting and retry our query, but the value chosen for SecondName would be arbitrary if multiple actors share the same first name.
SELECT Gender, COUNT(*) FROM Actors GROUP BY Gender;
Now we’ll try something more useful with the GROUP BY clause. If we are asked to find the number of male and female actors, we can use group by gender to fulfill the query.
SELECT Gender, COUNT(*) FROM Actors GROUP BY Gender;
Each group has the count function applied to it which outputs the rows in each group. If we don’t ask for the count, the query will return the two groups it finds:
SELECT Gender FROM Actors GROUP BY Gender;
SELECT column, AVG(column) FROM table GROUP BY column ORDER BY column ASC;
As another example, we can find the average net worth of actors according to their marital status as follows:
SELECT MaritalStatus, AVG(NetworthInMillions) FROM Actors GROUP BY MaritalStatus ORDER BY MaritalStatus ASC;
HAVING
We are familiar with the WHERE clause that can be used to filter rows. In the same way, the HAVING clause allows us to filter groups. At times, the HAVING clause can be used to filter rows to display but that is not the intended use and can make the query slower. The HAVING clause should be used to decide what rows form each group. Remember the HAVING clause works on groups of rows whereas the WHERE clause works on individual rows. We’ll work with a couple of queries in this lesson to clarify the concept.
Example Syntax #
SELECT col1, AggregateFunction(col3) AS count
FROM table
GROUP BY col1, col2, ... coln
HAVING count > 75
ORDER BY col2;
-- Query 1
SELECT MaritalStatus, AVG(NetworthInMillions) AS NetWorth FROM Actors GROUP BY MaritalStatus HAVING NetWorth > 450 OR NetWorth < 250;
-- Query 2
SELECT MaritalStatus, AVG(NetworthInMillions) AS NetWorth FROM Actors GROUP BY MaritalStatus HAVING MaritalStatus='Married';
-- Query 3
SELECT MaritalStatus, AVG(NetworthInMillions) AS NetWorth FROM Actors WHERE MaritalStatus='Married' GROUP BY MaritalStatus;
SELECT column1, AVG(column2) AS column3 FROM table GROUP BY column1 HAVING column3 > 450 OR column3 < 250;
In the lesson on the GROUP BY clause we wrote a query to find out the average net worth of actors by their marital status. Now we can filter on the group results so that we only see those groups whose net worth is either greater than 450 million or less than 250 million.
SELECT MaritalStatus, AVG(NetworthInMillions) AS NetWorth FROM Actors GROUP BY MaritalStatus HAVING NetWorth > 450 OR NetWorth < 250;
Observe that one group has been filtered out because it didn’t meet the conditions specified in the HAVING clause. Also notice, we define an alias for the function result so that we don’t have to rewrite the function in the conditions for the HAVING clause.
SELECT column1, AVG(column2) AS column3 FROM table GROUP BY column1 HAVING column1=’Married’;
Usually, the HAVING clause is used with aggregate functions. If you find yourself writing a HAVING clause that uses a column or expression that isn’t in the SELECT clause, it is likely you should be using the WHERE clause instead. For instance, consider the following query, which uses the marital status column in the HAVING clause.
SELECT MaritalStatus, AVG(NetworthInMillions) AS NetWorth FROM Actors GROUP BY MaritalStatus HAVING MaritalStatus='Married';
The same query using the WHERE clause can be rewritten as follows:
SELECT MaritalStatus, AVG(NetworthInMillions) AS NetWorth FROM Actors WHERE MaritalStatus='Married' GROUP BY MaritalStatus;