The GROUP BY statement in SQL is used to group the rows in a result set based on the values in one or more columns. The GROUP BY statement is used in combination with aggregate functions (such as SUM, AVG, MAX, MIN, and COUNT) to produce summarized results for each group of rows in the result set.
Syntax of SQL GROUP BY Statement
SELECT column1, aggregate_function(column2) FROM table_name WHERE condition GROUP BY column1;
In the above syntax, the “column1” is used to group the rows in the result set, and the “aggregate_function(column2)” is used to produce summarized results for each group of rows. The “WHERE” clause is used to filter the rows based on a specified condition, and it is optional.
Example of SQL GROUP BY Statement
Consider the following table, “orders”:
CustomerID | ProductID | Quantity |
---|---|---|
C001 | P001 | 2 |
C002 | P002 | 3 |
C001 | P002 | 1 |
C002 | P001 | 4 |
Now, if we want to find the total quantity of each product ordered by each customer, we can use the following SQL statement with the GROUP BY statement:
SELECT CustomerID, ProductID, SUM(Quantity) as TotalQuantity FROM orders GROUP BY CustomerID, ProductID;
The output of this SQL statement would be:
CustomerID | ProductID | TotalQuantity |
---|---|---|
C001 | P001 | 2 |
C001 | P002 | 1 |
C002 | P001 | 4 |
C002 | P002 | 3 |
In the above output, the rows in the “orders” table are grouped by the values in the “CustomerID” and “ProductID” columns, and the total quantity of each product ordered by each customer is calculated using the SUM aggregate function. The result set contains one row for each unique combination of values in the “CustomerID” and “ProductID” columns.
Using GROUP BY with HAVING Clause
The HAVING clause in SQL can be used in combination with the GROUP BY statement to filter the groups based on a specified condition. The HAVING clause is used to specify the conditions that the groups must meet in order to be included in the result set.
For example, if we want to find the total quantity of each product ordered by each customer, but only for customers who have ordered a total quantity of more than 3, we can use the following SQL statement:
SELECT CustomerID, ProductID, SUM(Quantity) as TotalQuantity FROM orders GROUP BY CustomerID, ProductID HAVING SUM(Quantity) > 3;
The output of this SQL statement would be:
CustomerID | ProductID | TotalQuantity |
---|---|---|
C002 | P001 | 4 |
In the above output, the rows in the “orders” table are grouped by the values in the “CustomerID” and “ProductID” columns, and the total quantity of each product ordered by each customer is calculated using the SUM aggregate function. The HAVING clause is used to filter the groups and only the groups that meet the specified condition (i.e. total quantity is more than 3) are included in the result set.
Conclusion
The SQL GROUP BY statement is an important tool for summarizing and aggregating data in a result set. It is used in combination with aggregate functions to produce summarized results for each group of rows in the result set. When used with the HAVING clause, the GROUP BY statement provides even more powerful functionality for filtering and aggregating data based on specified conditions.