Commit c40e6a8

jovanpop-msft authored and VanMSFT committed
Refactor GROUP BY documentation for clarity
Updated section headers for consistency and clarity in the documentation regarding the GROUP BY clause in Transact-SQL. Added details about support for ISO and ANSI SQL-2006 features.

Revise GROUP BY options in T-SQL documentation

Updated the documentation for GROUP BY options in T-SQL, including allowed and disallowed statements, and added details on hierarchical aggregation and multidimensional summarization.

Update documentation for GROUP BY syntax in T-SQL

Clarified syntax for Analytics Platform System and updated explanation for WITH (DISTRIBUTED_AGG).

Revise GROUP BY ROLLUP and grouping sets documentation

Updated the documentation for GROUP BY ROLLUP and clarified restrictions on grouping sets in Transact-SQL.

Clarify non-aggregate and compatibility terms

Update documentation for GROUP BY syntax

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Update docs/t-sql/queries/select-group-by-transact-sql.md
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Apply suggestion from @rwestMSFT
Co-authored-by: Randolph West MSFT <97149825+rwestMSFT@users.noreply.github.com>

Fix formatting in GROUP BY restrictions section

Apply suggestions from code review
Co-authored-by: Van To <40007119+VanMSFT@users.noreply.github.com>
1 parent bf148ba commit c40e6a8

1 file changed

Lines changed: 88 additions & 68 deletions

docs/t-sql/queries/select-group-by-transact-sql.md

@@ -90,7 +90,7 @@ GROUP BY {
9090
} [ , ...n ]
9191
```
9292

93-
Syntax for Analytics Platform System (PDW):
93+
Syntax for Analytics Platform System/Parallel Data Warehouse (APS/PDW):
9494

9595
```syntaxsql
9696
GROUP BY {
@@ -103,55 +103,32 @@ GROUP BY {
103103

104104
### *column-expression*
105105

106-
Specifies a column or a non-aggregate calculation on a column. This column can belong to a table, derived table, or view. The column must appear in the `FROM` clause of the `SELECT` statement, but doesn't need to appear in the `SELECT` list.
106+
Specifies a column or a nonaggregate calculation on a column. This column can belong to a table, derived table, or view. The column must appear in the `FROM` clause of the `SELECT` statement, but doesn't need to appear in the `SELECT` list.
107107

108108
For valid expressions, see [expression](../language-elements/expressions-transact-sql.md).
109109

110110
The column must appear in the `FROM` clause of the `SELECT` statement, but isn't required to appear in the `SELECT` list. However, each table or view column in any nonaggregate expression in the `<select>` list must be included in the `GROUP BY` list.
111111

112-
The following statements are allowed:
113-
114-
```sql
115-
SELECT ColumnA,
116-
ColumnB
117-
FROM T
118-
GROUP BY ColumnA, ColumnB;
119-
120-
SELECT ColumnA + ColumnB
121-
FROM T
122-
GROUP BY ColumnA, ColumnB;
123-
124-
SELECT ColumnA + ColumnB
125-
FROM T
126-
GROUP BY ColumnA + ColumnB;
127-
128-
SELECT ColumnA + ColumnB + constant
129-
FROM T
130-
GROUP BY ColumnA, ColumnB;
131-
```
132-
133-
The following statements aren't allowed:
134-
135-
```sql
136-
SELECT ColumnA,
137-
ColumnB
138-
FROM T
139-
GROUP BY ColumnA + ColumnB;
140-
141-
SELECT ColumnA + constant + ColumnB
142-
FROM T
143-
GROUP BY ColumnA + ColumnB;
144-
```
112+
### GROUP BY options
145113

146-
The column expression can't contain:
114+
The following options extend the basic `GROUP BY` clause to support hierarchical aggregation, multidimensional summarization, custom grouping combinations, and platform‑specific execution behaviors. These options allow queries to produce subtotals and grand totals in a single logical operation.
147115

148-
- A column alias that you define in the `SELECT` list. It can use a column alias for a derived table that is defined in the `FROM` clause.
149-
- A column of type **text**, **ntext**, or **image**. However, you can use a column of text, ntext, or image as an argument to a function that returns a value of a valid data type. For example, the expression can use `SUBSTRING()` and `CAST()`. This rule also applies to expressions in the `HAVING` clause.
150-
- xml data type methods. It can include a user-defined function that uses xml data type methods. It can include a computed column that uses xml data type methods.
151-
- A subquery. Error 144 is returned.
152-
- A column from an indexed view.
116+
- **ROLLUP ( <group_by_expression> [ , ...n ] )**
117+
Generates hierarchical subtotals for the listed columns and a final grand total (for example, `(a,b,c)`, `(a,b)`, `(a)`, `()`). Use for drill‑up reports like **year** > **quarter** > **month**.
118+
- **CUBE ( <group_by_expression> [ , ...n ] )**
119+
Produces all combinations of the specified columns (the full 2^n lattice), including the grand total. Best suited for multi‑dimensional analysis across every slice.
120+
- **GROUPING SETS ( <grouping_set> [ , ...n ] )**
121+
Defines the exact groupings to compute (including `()` for grand total) in one pass; functionally similar to a `UNION ALL` of multiple `GROUP BY` queries but optimized together.
122+
- **`()` (empty grouping set)**
123+
Computes only the **grand total** across all rows. Use it alone as `GROUP BY ()` or inside `GROUPING SETS`.
124+
- **ALL column-expression [ , ...n ]** *(non‑ISO; backward compatibility)*
125+
Includes all groups in the results, regardless of whether they meet the search conditions in the `WHERE` clause. Retained only for backward compatibility.
126+
- **column-expression [ , ...n ] WITH { CUBE | ROLLUP }** *(legacy form)*
127+
Older, non‑ISO syntax equivalent to `GROUP BY CUBE(...)` or `GROUP BY ROLLUP(...)`. Supported for backward compatibility; use the ISO subclauses when possible.
128+
- **WITH (DISTRIBUTED_AGG)**
129+
Hints distributed execution for aggregations when grouping by a single column. It's supported only in Azure Synapse Analytics dedicated SQL pools and Analytics Platform System/Parallel Data Warehouse (APS/PDW).
153130

154-
### GROUP BY *column-expression* [ ,...n ]
131+
## GROUP BY *column-expression* [ ,...n ]
155132

156133
Groups the `SELECT` statement results according to the values in a list of one or more column expressions.
157134

@@ -199,9 +176,51 @@ The query result has three rows since there are three combinations of values for
199176
| Canada | British Columbia | 500 |
200177
| United States | Montana | 100 |
201178

202-
### GROUP BY ROLLUP
179+
The column expression in `GROUP BY` can't contain:
180+
181+
- A column alias that you define in the `SELECT` list. It can use a column alias for a derived table that's defined in the `FROM` clause.
182+
- A column of type **text**, **ntext**, or **image**. However, you can use a column of **text**, **ntext**, or **image** as an argument to a function that returns a value of a valid data type. For example, the expression can use `SUBSTRING()` and `CAST()`. This rule also applies to expressions in the `HAVING` clause.
183+
- **xml** data type methods. It can include a user-defined function that uses **xml** data type methods. It can include a computed column that uses **xml** data type methods.
184+
- A subquery. The query returns error 144.
185+
- A column from an indexed view.
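
The alias restriction in the list above can be illustrated with a minimal sketch. The table variable `@Orders` and its columns are hypothetical names used only for this illustration; they don't come from the article.

```sql
-- Hypothetical sample data (names are assumptions for illustration only).
DECLARE @Orders TABLE (OrderDate DATE, Amount INT);

INSERT INTO @Orders (OrderDate, Amount)
VALUES ('2023-01-15', 10), ('2023-06-01', 20), ('2024-02-01', 5);

-- Not allowed: OrderYear is an alias defined in the SELECT list.
-- SELECT YEAR(OrderDate) AS OrderYear, SUM(Amount) AS Total
-- FROM @Orders
-- GROUP BY OrderYear;

-- Allowed: repeat the expression in the GROUP BY clause.
SELECT YEAR(OrderDate) AS OrderYear, SUM(Amount) AS Total
FROM @Orders
GROUP BY YEAR(OrderDate);

-- Allowed: the alias is defined in a derived table in the FROM clause.
SELECT d.OrderYear, SUM(d.Amount) AS Total
FROM (SELECT YEAR(OrderDate) AS OrderYear, Amount FROM @Orders) AS d
GROUP BY d.OrderYear;
```

Both allowed forms return one row per year. The derived-table form avoids repeating a long expression.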
186+
187+
The following statements are allowed:
188+
189+
```sql
190+
SELECT ColumnA,
191+
ColumnB
192+
FROM T
193+
GROUP BY ColumnA, ColumnB;
194+
195+
SELECT ColumnA + ColumnB
196+
FROM T
197+
GROUP BY ColumnA, ColumnB;
198+
199+
SELECT ColumnA + ColumnB
200+
FROM T
201+
GROUP BY ColumnA + ColumnB;
202+
203+
SELECT ColumnA + ColumnB + constant
204+
FROM T
205+
GROUP BY ColumnA, ColumnB;
206+
```
207+
208+
The following statements aren't allowed:
209+
210+
```sql
211+
SELECT ColumnA,
212+
ColumnB
213+
FROM T
214+
GROUP BY ColumnA + ColumnB;
215+
216+
SELECT ColumnA + constant + ColumnB
217+
FROM T
218+
GROUP BY ColumnA + ColumnB;
219+
```
220+
221+
## GROUP BY ROLLUP ()
203222

204-
Creates a group for each combination of column expressions. In addition, it "rolls up" the results into subtotals and grand totals. To do this, it moves from right to left, decreasing the number of column expressions over which it creates groups and the aggregations.
223+
Creates a group for each combination of column expressions. In addition, it *rolls up* the results into subtotals and grand totals. While creating the groups, it moves from right to left, decreasing the number of column expressions over which it creates groups and the aggregations.
205224

206225
The column order affects the `ROLLUP` output and can affect the number of rows in the result set.
207226

@@ -211,7 +230,7 @@ For example, `GROUP BY ROLLUP (col1, col2, col3, col4)` creates groups for each
211230
- col1, col2, col3, NULL
212231
- col1, col2, NULL, NULL
213232
- col1, NULL, NULL, NULL
214-
- NULL, NULL, NULL, NULL (This is the grand total)
233+
- NULL, NULL, NULL, NULL (The group with the NULL values is the grand total)
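
As a rough sketch of the right-to-left roll-up described above, a two-column `ROLLUP` can be written as an explicit `GROUPING SETS` list. The table variable `@Sales` and its data are assumptions chosen to resemble the article's sample rows.

```sql
-- Hypothetical sample data (an assumption modeled on the article's example rows).
DECLARE @Sales TABLE (Region VARCHAR(50), Territory VARCHAR(50), Sales INT);

INSERT INTO @Sales (Region, Territory, Sales)
VALUES ('Canada', 'Alberta', 100),
       ('Canada', 'British Columbia', 200),
       ('Canada', 'British Columbia', 300),
       ('United States', 'Montana', 100);

-- ROLLUP drops columns from right to left.
SELECT Region, Territory, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY ROLLUP (Region, Territory);

-- The same groups written out explicitly:
-- (Region, Territory), then (Region), then the grand total ().
SELECT Region, Territory, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS ((Region, Territory), (Region), ());
```

Both queries should return the same set of rows. Row order isn't guaranteed without an `ORDER BY` clause.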
215234

216235
Using the table from the previous example, this code runs a `GROUP BY ROLLUP` operation instead of a simple `GROUP BY`.
217236

@@ -234,7 +253,7 @@ The query result has the same aggregations as the simple `GROUP BY` without the
234253
| United States | NULL | 100 |
235254
| NULL | NULL | 700 |
236255

237-
### GROUP BY CUBE ()
256+
## GROUP BY CUBE ()
238257

239258
`GROUP BY CUBE` creates groups for all possible combinations of columns. For `GROUP BY CUBE (a, b)`, the results have groups for unique values of `(a, b)`, `(NULL, b)`, `(a, NULL)`, and `(NULL, NULL)`.
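
The four groups listed above for `CUBE (a, b)` can be spelled out as grouping sets. This is a sketch only; the table variable `@T` and its columns `a`, `b`, and `x` are hypothetical.

```sql
-- Hypothetical table (names are assumptions for illustration only).
DECLARE @T TABLE (a INT, b INT, x INT);

INSERT INTO @T (a, b, x)
VALUES (1, 1, 10), (1, 2, 20), (2, 1, 30);

-- CUBE over two columns...
SELECT a, b, SUM(x) AS Total
FROM @T
GROUP BY CUBE (a, b);

-- ...produces the same groups as listing all 2^2 = 4 combinations.
SELECT a, b, SUM(x) AS Total
FROM @T
GROUP BY GROUPING SETS ((a, b), (a), (b), ());
```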
240259

@@ -262,7 +281,7 @@ The query result has groups for unique values of `(Region, Territory)`, `(NULL,
262281
| Canada | NULL | 600 |
263282
| United States | NULL | 100 |
264283

265-
### GROUP BY GROUPING SETS ()
284+
## GROUP BY GROUPING SETS ()
266285

267286
The `GROUPING SETS` option combines multiple `GROUP BY` clauses into one `GROUP BY` clause. The results are the same as using `UNION ALL` on the specified groups.
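
To make the `UNION ALL` comparison concrete, the following sketch pairs a `GROUPING SETS` query with a hand-written equivalent. The table variable `@Sales` and its data are assumptions for illustration.

```sql
-- Hypothetical sample data (an assumption modeled on the article's example rows).
DECLARE @Sales TABLE (Region VARCHAR(50), Territory VARCHAR(50), Sales INT);

INSERT INTO @Sales (Region, Territory, Sales)
VALUES ('Canada', 'Alberta', 100),
       ('Canada', 'British Columbia', 500),
       ('United States', 'Montana', 100);

-- One GROUP BY clause with two grouping sets.
SELECT Region, Territory, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS (Region, Territory);

-- The same rows from two separate GROUP BY queries combined with UNION ALL.
-- The column that isn't grouped is returned as NULL.
SELECT Region, CAST(NULL AS VARCHAR(50)) AS Territory, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY Region
UNION ALL
SELECT CAST(NULL AS VARCHAR(50)) AS Region, Territory, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY Territory;
```

The explicit casts on `NULL` keep the column types aligned across the two branches of the `UNION ALL`.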
268287

@@ -296,6 +315,14 @@ GROUP BY CUBE(Region, Territory);
296315

297316
SQL doesn't consolidate duplicate groups generated for a `GROUPING SETS` list. For example, in `GROUP BY ((), CUBE (Region, Territory))`, both elements return a row for the grand total, and both rows appear in the results.
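
A small sketch of the duplicate-group behavior, written here with an explicit `GROUPING SETS` list. The table variable `@Sales` and its data are assumptions for illustration.

```sql
-- Hypothetical sample data (names are assumptions for illustration only).
DECLARE @Sales TABLE (Region VARCHAR(50), Territory VARCHAR(50), Sales INT);

INSERT INTO @Sales (Region, Territory, Sales)
VALUES ('Canada', 'Alberta', 100),
       ('United States', 'Montana', 100);

-- Both () and CUBE (Region, Territory) generate the grand total group,
-- so the result should contain two rows where Region and Territory are NULL.
SELECT Region, Territory, SUM(Sales) AS TotalSales
FROM @Sales
GROUP BY GROUPING SETS ((), CUBE (Region, Territory));
```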
298317

318+
### Support for ISO and ANSI SQL-2006 GROUP BY features
319+
320+
The `GROUP BY` clause supports all `GROUP BY` features that are included in the SQL-2006 standard with the following syntax exceptions:
321+
322+
- Grouping sets aren't allowed in the `GROUP BY` clause unless they're part of an explicit `GROUPING SETS` list. For example, `GROUP BY Column1, (Column2, ...ColumnN)` is allowed in the standard but not in Transact-SQL. Transact-SQL supports `GROUP BY Column1, GROUPING SETS ((Column2, ...ColumnN))` and `GROUP BY Column1, Column2, ... ColumnN`, both of which are semantically equivalent to the previous `GROUP BY` example. This restriction avoids the possibility that `GROUP BY Column1, (Column2, ...ColumnN)` could be misinterpreted as `GROUP BY Column1, GROUPING SETS ((Column2, ...ColumnN))`, which aren't semantically equivalent.
323+
324+
- Grouping sets aren't allowed inside grouping sets. For example, `GROUP BY GROUPING SETS (A1, A2,...An, GROUPING SETS (C1, C2, ...Cn))` is allowed in the SQL-2006 standard but not in Transact-SQL. Transact-SQL allows `GROUP BY GROUPING SETS( A1, A2,...An, C1, C2, ...Cn)` or `GROUP BY GROUPING SETS( (A1), (A2), ... (An), (C1), (C2), ... (Cn))`, which are semantically equivalent to the first `GROUP BY` example and have clearer syntax.
325+
299326
### GROUP BY ()
300327

301328
Specifies the empty group, which generates the grand total. This group is useful as one of the elements of a `GROUPING SET`. For example, this statement gives the total sales for each region and then gives the grand total for all regions.
@@ -307,7 +334,7 @@ FROM Sales
307334
GROUP BY GROUPING SETS(Region, ());
308335
```
309336

310-
### GROUP BY ALL column-expression [ ,...n ]
337+
## GROUP BY ALL column-expression [ ,...n ]
311338

312339
**Applies to**: SQL Server and Azure SQL Database
313340

@@ -321,21 +348,26 @@ Specifies whether to include all groups in the results, regardless of whether th
321348
- Isn't supported in queries that access remote tables if there's also a `WHERE` clause in the query.
322349
- Fails on columns that have the FILESTREAM attribute.
323350

324-
### GROUP BY column-expression [ ,...n ] WITH { CUBE | ROLLUP }
351+
### Support for ISO and ANSI SQL-2006 GROUP BY features
352+
353+
The `GROUP BY` clause supports all `GROUP BY` features that are included in the SQL-2006 standard with the following syntax exceptions:
354+
- `GROUP BY ALL` and `GROUP BY DISTINCT` are only allowed in a simple `GROUP BY` clause that contains column expressions. You can't use them with the `GROUPING SETS`, `ROLLUP`, `CUBE`, `WITH CUBE`, or `WITH ROLLUP` constructs. `ALL` is the default and is implicit. It's also only allowed in the backward compatible syntax.
355+
356+
## GROUP BY column-expression [ ,...n ] WITH { CUBE | ROLLUP }
325357

326358
**Applies to**: SQL Server and Azure SQL Database
327359

328360
> [!NOTE]
329361
> Use this syntax only for backward compatibility. Avoid using this syntax in new development work, and plan to modify applications that currently use this syntax.
330362
331-
### WITH (DISTRIBUTED_AGG)
363+
## WITH (DISTRIBUTED_AGG)
332364

333365
**Applies to**: [!INCLUDE [ssazuresynapse-md](../../includes/ssazuresynapse-md.md)] and [!INCLUDE [ssPDW](../../includes/sspdw-md.md)]
334366

335367
The `DISTRIBUTED_AGG` query hint forces the massively parallel processing (MPP) system to redistribute a table on a specific column before performing an aggregation. Only one column in the `GROUP BY` clause can have a `DISTRIBUTED_AGG` query hint. After the query finishes, the redistributed table is dropped. The original table isn't changed.
336368

337369
> [!NOTE]
338-
> The `DISTRIBUTED_AGG` query hint is provided for backward compatibility with earlier [!INCLUDE [ssPDW](../../includes/sspdw-md.md)] versions and doesn't improve performance for most queries. By default, MPP already redistributes data as necessary to improve performance for aggregations.
370+
> The `DISTRIBUTED_AGG` query hint provides backward compatibility with earlier [!INCLUDE [ssPDW](../../includes/sspdw-md.md)] versions and doesn't improve performance for most queries. By default, MPP already redistributes data as necessary to improve performance for aggregations.
339371
340372
## Remarks
341373

@@ -362,12 +394,10 @@ The `DISTRIBUTED_AGG` query hint forces the massively parallel processing (MPP)
362394

363395
- If a grouping column contains `NULL` values, all `NULL` values are considered equal, and they're collected into a single group.
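
The `NULL` handling in the last item can be seen with a minimal sketch. The table variable `@T` and its columns are hypothetical names for illustration.

```sql
-- Hypothetical sample data (names are assumptions for illustration only).
DECLARE @T TABLE (Category VARCHAR(10), Amount INT);

INSERT INTO @T (Category, Amount)
VALUES ('A', 1), (NULL, 2), (NULL, 3);

-- The two NULL rows are collected into a single group:
-- one row for 'A' (1 row, total 1) and one row for NULL (2 rows, total 5).
SELECT Category, COUNT(*) AS RowsInGroup, SUM(Amount) AS Total
FROM @T
GROUP BY Category;
```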
364396

365-
## Limitations
397+
### Limitations
366398

367399
**Applies to**: SQL Server and [!INCLUDE [ssazuresynapse-md](../../includes/ssazuresynapse-md.md)]
368400

369-
### Maximum capacity
370-
371401
For a `GROUP BY` clause that uses `ROLLUP`, `CUBE`, or `GROUPING SETS`, the maximum number of expressions is 32. The maximum number of groups is 4,096 (2<sup>12</sup>). The following examples fail because the `GROUP BY` clause has more than 4,096 groups.
372402

373403
- The following example generates 4,097 (2<sup>12</sup> + 1) grouping sets and then fails.
@@ -389,17 +419,7 @@ For a `GROUP BY` clause that uses `ROLLUP`, `CUBE`, or `GROUPING SETS`, the maxi
389419
GROUP BY a1, ..., a13 WITH CUBE
390420
```
391421

392-
For backward compatible `GROUP BY` clauses that don't contain `CUBE` or `ROLLUP`, the number of `GROUP BY` items is limited by the `GROUP BY` column sizes, the aggregated columns, and the aggregate values involved in the query. This limit originates from the limit of 8,060 bytes on the intermediate worktable that holds intermediate query results. A maximum of 12 grouping expressions is permitted when `CUBE` or `ROLLUP` is specified.
393-
394-
### Support for ISO and ANSI SQL-2006 GROUP BY Features
395-
396-
The `GROUP BY` clause supports all `GROUP BY` features that are included in the SQL-2006 standard with the following syntax exceptions:
397-
398-
- Grouping sets aren't allowed in the `GROUP BY` clause unless they're part of an explicit `GROUPING SETS` list. For example, `GROUP BY Column1, (Column2, ...ColumnN)` is allowed in the standard but not in Transact-SQL. Transact-SQL supports `GROUP BY C1, GROUPING SETS ((Column2, ...ColumnN))` and `GROUP BY Column1, Column2, ... ColumnN`, which are semantically equivalent. These clauses are semantically equivalent to the previous `GROUP BY` example. This restriction avoids the possibility that `GROUP BY Column1, (Column2, ...ColumnN)` might be misinterpreted as `GROUP BY C1, GROUPING SETS ((Column2, ...ColumnN))`, which aren't semantically equivalent.
399-
400-
- Grouping sets aren't allowed inside grouping sets. For example, `GROUP BY GROUPING SETS (A1, A2,...An, GROUPING SETS (C1, C2, ...Cn))` is allowed in the SQL-2006 standard but not in Transact-SQL. Transact-SQL allows `GROUP BY GROUPING SETS( A1, A2,...An, C1, C2, ...Cn)` or `GROUP BY GROUPING SETS( (A1), (A2), ... (An), (C1), (C2), ... (Cn))`, which are semantically equivalent to the first `GROUP BY` example and have clearer syntax.
401-
402-
- `GROUP BY ALL` and `GROUP BY DISTINCT` are only allowed in a simple `GROUP BY` clause that contains column expressions. You can't use them with the `GROUPING SETS`, `ROLLUP`, `CUBE`, `WITH CUBE`, or `WITH ROLLUP` constructs. `ALL` is the default and is implicit. It's also only allowed in the backward compatible syntax.
422+
For backward compatible `GROUP BY` clauses that don't contain `CUBE` or `ROLLUP`, the `GROUP BY` column sizes, the aggregated columns, and the aggregate values involved in the query limit the number of `GROUP BY` items. This limit originates from the limit of 8,060 bytes on the intermediate worktable that holds intermediate query results. You can use a maximum of 12 grouping expressions when you specify `CUBE` or `ROLLUP`.
403423

404424
### Comparison of supported `GROUP BY` features
405425

@@ -408,7 +428,7 @@ The following table describes the `GROUP BY` features that different SQL Server
408428
| Feature | SQL Server Integration Services | SQL Server compatibility level 100 or higher |
409429
| --- | --- | --- |
410430
| `DISTINCT` aggregates | Not supported for `WITH CUBE` or `WITH ROLLUP`. | Supported for `WITH CUBE`, `WITH ROLLUP`, `GROUPING SETS`, `CUBE`, or `ROLLUP`. |
411-
| User-defined function with `CUBE` or `ROLLUP` name in the `GROUP BY` clause | User-defined function `dbo.cube(<arg1>, ...<argN>)` or `dbo.rollup(<arg1>, ...<argN>)` in the `GROUP BY` clause is allowed.<br /><br />For example: `SELECT SUM (x) FROM T GROUP BY dbo.cube(y);` | User-defined function `dbo.cube (<arg1>, ...<argN>)` or `dbo.rollup(arg1>, ...<argN>)` in the `GROUP BY` clause isn't allowed.<br /><br />For example: `SELECT SUM (x) FROM T GROUP BY dbo.cube(y);`<br /><br />The following error message is returned: "Incorrect syntax near the keyword 'cube'&#124;'rollup'."<br /><br />To avoid this problem, replace `dbo.cube` with `[dbo].[cube]` or `dbo.rollup` with `[dbo].[rollup]`.<br /><br />The following example is allowed: `SELECT SUM (x) FROM T GROUP BY [dbo].[cube](y);` |
431+
| User-defined function with `CUBE` or `ROLLUP` name in the `GROUP BY` clause | User-defined function `dbo.cube(<arg1>, ...<argN>)` or `dbo.rollup(<arg1>, ...<argN>)` in the `GROUP BY` clause is allowed.<br /><br />For example: `SELECT SUM (x) FROM T GROUP BY dbo.cube(y);` | User-defined function `dbo.cube (<arg1>, ...<argN>)` or `dbo.rollup(<arg1>, ...<argN>)` in the `GROUP BY` clause isn't allowed.<br /><br />For example: `SELECT SUM (x) FROM T GROUP BY dbo.cube(y);`<br /><br />SQL Server returns the following error message: "Incorrect syntax near the keyword 'cube'&#124;'rollup'."<br /><br />To avoid this problem, replace `dbo.cube` with `[dbo].[cube]` or `dbo.rollup` with `[dbo].[rollup]`.<br /><br />The following example is allowed: `SELECT SUM (x) FROM T GROUP BY [dbo].[cube](y);` |
412432
| `GROUPING SETS` | Not supported | Supported |
413433
| `CUBE` | Not supported | Supported |
414434
| `ROLLUP` | Not supported | Supported |
@@ -472,11 +492,11 @@ HAVING DATEPART(yyyy, OrderDate) >= N'2003'
472492
ORDER BY DATEPART(yyyy, OrderDate);
473493
```
474494

475-
## Examples: Azure Synapse Analytics and Analytics Platform System (PDW)
495+
## Examples: Azure Synapse Analytics and Analytics Platform System/Parallel Data Warehouse (APS/PDW)
476496

477497
### E. Basic use of the GROUP BY clause
478498

479-
The following example finds the total amount for all sales on each day. One row containing the sum of all sales is returned for each day.
499+
The following example finds the total amount for all sales on each day. The query returns one row containing the sum of all sales for each day.
480500

481501
```sql
482502
-- Uses AdventureWorksDW
