Pages

Thursday, November 10, 2011

GROUP BY with ROLLUP, CUBE, and GROUPING SETS in SQL Server 2008

Introduction:
This post describes a new feature in SQL Server 2008 – GROUP BY with ROLLUP, CUBE, and GROUPING SETS


The ROLLUP, CUBE, and GROUPING SETS operators are extensions of the GROUP BY clause. The ROLLUP, CUBE, or GROUPING SETS operators can generate the same result set as when you use UNION ALL to combine single grouping queries; however, using one of the GROUP BY operators is usually more efficient.


ROLLUP: 
ROLLUP clause is used to do aggregate operation on multiple levels in hierarchy. 


CUBE: 
CUBE is similar to the WITH ROLLUP clause in its operation as it adds new summary rows for groups of results. However, rather than creating a hierarchy of results, a cube contains totals for every possible permutation of group. Returning to our pivot table layout, every total will be calculated.


GROUPING SETS:
The GROUPING SETS operator can generate the same result set as that generated by using a simple GROUP BY, ROLLUP, or CUBE operator. When all the groupings that are generated by using a full ROLLUP or CUBE operator are not required, you can use GROUPING SETS to specify only the groupings that you want. The GROUPING SETS list can contain duplicate groupings; and, when GROUPING SETS is used with ROLLUP and CUBE, it might generate duplicate groupings. Duplicate groupings are retained as they would be by using UNION ALL.



Multiple columns that are in inner parentheses in theGROUPING SETS list are treated as a single set. For example, in the clause GROUP BY GROUPING SETS ((Colum1, Column2), Column3, Column4), Column1 and Column2 are treated as one column. For an example of how to use GROUPING SETS with composite elements.


When the GROUPING SETS list contains multiple sets in inner parentheses, separated by commas, the output of the sets are concatenated. The result set is the cross product or Cartesian product of the grouping sets. For an example of how to use GROUP BY with concatenated ROLLUP operations.


Example:
First, let's see how we rewrite simple WITH ROLLUP and CUBE queries using the new syntax.  I'll use the same schema and queries as in my previous posts:


CREATE TABLE #Product (ProductId INT, ProductName Varchar(500), Price MONEY)
INSERT INTO #Product (ProductId ,ProductName,Price)
VALUES(1, 'Test1', 12000),
 (2, 'Test2', 18000),
 (3, 'Test3', 25000),
 (4, 'Test4', 15000),
 (1, 'Test4', 15000)


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product
GROUP BY ProductId ,ProductName WITH ROLLUP


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product
GROUP BY ProductId ,ProductName WITH CUBE


We can rewrite these two queries using the new syntax as:


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product
GROUP BY ROLLUP(ProductId ,ProductName)


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product
GROUP BY CUBE(ProductId ,ProductName)


These new queries are semantically equivalent to and use the same query plans as the original queries.  Note that the new ROLLUP and CUBE syntax is only available in compatibility level 100.  The more general GROUPING SETS syntax, which I will discuss next, is also available in earlier compatibility levels.


The new GROUPING SETS syntax is considerably more powerful.  It allows us to specify precisely which aggregations we want to compute. 


ROLLUP and CUBE are just shorthand for two common usages of GROUPING SETS.  We can express the above ROLLUP query as:
SELECT ProductId ,ProductName, SUM(Price) AS Price


FROM #Product Group By
GROUPING SETS((ProductId, ProductName), (ProductId), ())
Result:



This query explicitly asks SQL Server to aggregate price by productID and product, to aggregate by product only, and to compute the total for all productID for all product. The () syntax with no GROUP BY columns denotes the total.  Similarly, we can express the above CUBE query by asking SQL Server to compute all possible aggregate combinations:


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product Group By
GROUPING SETS((ProductId, ProductName), (ProductId), (productName), ())
Result:






















We can skip certain rollup levels.  For example, we can compute the total sales by employee and year and the total sales for all employees and all years without computing any of the intermediate results:


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product Group By
GROUPING SETS((ProductId, ProductName), ())
Result:






1 comment:

dharmendra_mohapatra said...


SELECT ProductId ,ProductName, SUM(Price) AS Price
FROM #Product Group By
GROUPING SETS((ProductId, ProductName), ())

gives two rows for against productid 1