Commit 10d1394

Fix: Make VECTOR_SEARCH TOP_N parameter optional and add WITH (FORCE_ANN_ONLY) support
## Summary

Fixes ScriptDOM parser to correctly handle optional `TOP_N` parameter and `WITH (FORCE_ANN_ONLY)` query hint in `VECTOR_SEARCH` syntax, aligning with SQL Server 2025 behavior.

## Changes

**Grammar (TSql170.g)**
- Made `TOP_N` parameter optional
- Added `WITH (FORCE_ANN_ONLY)` clause parsing

**AST & Script Generator**
- Added `ForceAnnOnly` boolean to `VectorSearchTableReference`
- Updated script generator for nullable `TOP_N` and `WITH` clause output
- Full round-trip fidelity maintained

**Tests**
- Added `VectorSearchOptionalTopNTests170.sql` (4 test scenarios)
- Removed obsolete error test expecting mandatory `TOP_N`

## Example Syntax Now Supported

```sql
-- Without TOP_N
SELECT * FROM VECTOR_SEARCH(...) AS ann

-- With query hint
SELECT * FROM VECTOR_SEARCH(...) WITH (FORCE_ANN_ONLY) AS ann

-- With TOP_N (backward compatible)
SELECT * FROM VECTOR_SEARCH(..., TOP_N = 10) AS ann
```

Backward compatible - No breaking changes.
1 parent af6556c commit 10d1394

10 files changed: +205 -28 lines changed

.github/instructions/testing.guidelines.instructions.md

Lines changed: 56 additions & 1 deletion
@@ -321,6 +321,25 @@ Expected: 'SELECT JSON_ARRAY('value1', 'value2');'
 
 **Solution**: Copy the "Actual" output to your baseline file (note spacing differences).
 
+**CRITICAL CHECK**: After copying baseline, compare it against your test script:
+```sql
+-- Input script (TestScripts/MyTest.sql)
+SELECT * FROM FUNC() WITH (HINT);
+
+-- Generated baseline (Baselines170/MyTest.sql)
+SELECT *
+FROM FUNC();
+-- ⚠️ WHERE IS 'WITH (HINT)'?
+```
+
+**If baseline is missing syntax from input:**
+1. **This is likely a BUG** - not just formatting difference
+2. Check if AST has member to store the missing syntax
+3. Verify grammar stores value: `vResult.PropertyName = vValue;`
+4. Check script generator outputs the value: `GenerateFragmentIfNotNull(node.Property)`
+5. If syntax should be preserved, add AST storage and script generation
+6. Document in spec if intentional omission (e.g., query optimizer hints)
+
 #### 2. Error Count Mismatch
 ```
 TestYourFeature.sql: number of errors after parsing is different from expected.
@@ -800,6 +819,38 @@ new ParserTest160("RegressionBugFix12345Tests160.sql", nErrors150: 1), // Bug e
 new ParserTest170("RegressionBugFix12345Tests170.sql", nErrors160: 1), // Bug existed in SQL 2022
 ```
 
+## Round-Trip Fidelity Validation Checklist
+
+**CRITICAL: After generating baseline files, ALWAYS verify:**
+
+**Input Preservation Check**:
+1. Open test script side-by-side with baseline file
+2. For each SQL statement, verify baseline preserves all syntax from input
+3. Check optional clauses: `WITH`, `WHERE`, `HAVING`, `ORDER BY`, etc.
+4. Check hints: Table hints, query hints, join hints
+5. Check keywords: All keywords from input should appear in baseline (unless documented normalization)
+
+**Missing Syntax Investigation**:
+If baseline omits syntax from input:
+- [ ] Is this intentional keyword normalization? (e.g., APPROX → APPROXIMATE)
+- [ ] Is this a query optimizer hint that doesn't need preservation?
+- [ ] Is this a BUG where AST doesn't store the value?
+
+**Bug Indicators**:
+- ❌ Input: `FUNCTION() WITH (HINT)` → Baseline: `FUNCTION()` = **LIKELY BUG**
+- ❌ Input: `SELECT ... ORDER BY col` → Baseline: `SELECT ...` = **BUG**
+- ✅ Input: `FETCH APPROX` → Baseline: `FETCH APPROXIMATE` = Acceptable normalization
+- ✅ Input: `SELECT /*+ HINT */` → Baseline: `SELECT` = Query hint (document in spec)
+
+**Resolution Steps**:
+1. Check AST definition in `Ast.xml` for member to store value
+2. Verify grammar assigns value: `vResult.Property = vValue;`
+3. Check script generator outputs value: `if (node.Property != null) { ... }`
+4. If missing: Add AST member, update grammar, update script generator, rebuild
+5. Document decision in spec if intentional omission
+
+---
+
 ## Summary
 
 The SqlScriptDOM testing framework provides comprehensive validation of parser functionality through:
@@ -808,7 +859,11 @@ The SqlScriptDOM testing framework provides comprehensive validation of parser f
 - **Cross-version validation** (Test syntax across SQL Server versions)
 - **Error condition testing** (Invalid syntax produces expected errors)
 - **Exact syntax verification** (Exact T-SQL from user requests is tested precisely)
+- **Round-trip fidelity validation** (Baseline preserves all input syntax unless documented)
 
 Following these guidelines ensures robust test coverage for parser functionality and prevents regressions when adding new features or fixing bugs.
 
-**Key Principle**: Always test the exact T-SQL syntax provided in user prompts or requests to verify that the specific syntax works as expected, rather than testing generalized or simplified versions of the syntax.
+**Key Principles**:
+1. Always test the exact T-SQL syntax provided in user prompts or requests
+2. Always verify baseline output preserves input syntax (missing syntax may indicate bugs)
+3. Document any intentional omissions (normalization, query hints) in spec
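
The input-preservation check described in the guidelines above can be partly automated. What follows is a minimal sketch, not part of SqlScriptDOM or its test framework: the helper names `strip_comments` and `missing_keywords` are hypothetical, the tokenizer is a crude word matcher, and string literals are not handled. It reports words that appear in a test script but never in its baseline, with an optional map for documented normalizations.

```python
import re

# Hypothetical helpers for illustration only; not part of SqlScriptDOM.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def strip_comments(sql):
    """Drop /* */ block comments and -- line comments (string literals are ignored)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)
    return re.sub(r"--[^\n]*", " ", sql)


def missing_keywords(input_sql, baseline_sql, normalized=None):
    """Return upper-cased words found in input_sql but absent from baseline_sql.

    `normalized` maps an input spelling to its documented baseline spelling,
    for example {"APPROX": "APPROXIMATE"}.
    """
    normalized = {k.upper(): v.upper() for k, v in (normalized or {}).items()}
    baseline_words = {w.upper() for w in WORD.findall(strip_comments(baseline_sql))}
    missing = []
    for word in WORD.findall(strip_comments(input_sql)):
        key = word.upper()
        expected = normalized.get(key, key)
        if expected not in baseline_words and key not in missing:
            missing.append(key)
    return missing


input_sql = "SELECT * FROM FUNC() WITH (HINT);"
baseline_sql = "SELECT *\nFROM FUNC();"
print(missing_keywords(input_sql, baseline_sql))  # ['WITH', 'HINT']
```

A non-empty result is only a prompt to investigate; whether the omission is a bug or a documented normalization still has to be decided by hand.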

.gitignore

Lines changed: 12 additions & 1 deletion
@@ -359,4 +359,15 @@ out/
 .packages/
 
 # Temporary build artifacts
-tmp/
+tmp/
+
+# Speckit files
+speckit.files
+.github/.specify/
+.github/agents/speckit.*.agent.md
+.github/prompts/speckit.*
+
+# Specs directory - ignore all files except spec.md
+specs/**/*
+!specs/**/
+!specs/**/spec.md

SqlScriptDom/Parser/TSql/Ast.xml

Lines changed: 1 addition & 0 deletions
@@ -4808,5 +4808,6 @@
         <Member Name="SimilarTo" Type="ScalarExpression" Summary="The vector used for search." />
         <Member Name="Metric" Type="StringLiteral" Summary="The distance metric to use for the search." />
         <Member Name="TopN" Type="ScalarExpression" Summary="The maximum number of similar vectors that must be returned." />
+        <Member Name="ForceAnnOnly" Type="bool" Summary="Whether the WITH (FORCE_ANN_ONLY) hint is specified." />
     </Class>
 </Types>

SqlScriptDom/Parser/TSql/CodeGenerationSupporter.cs

Lines changed: 1 addition & 0 deletions
@@ -75,6 +75,7 @@ internal static class CodeGenerationSupporter
         internal const string AnsiWarnings = "ANSI_WARNINGS";
         internal const string ForcePlan = "FORCEPLAN";
         internal const string ForAppend = "FOR_APPEND";
+        internal const string ForceAnnOnly = "FORCE_ANN_ONLY";
         internal const string ShowPlanAll = "SHOWPLAN_ALL";
         internal const string ShowPlanText = "SHOWPLAN_TEXT";
         internal const string IO = "IO";

SqlScriptDom/Parser/TSql/TSql170.g

Lines changed: 26 additions & 10 deletions
@@ -19321,19 +19321,35 @@ vectorSearchTableReference returns [VectorSearchTableReference vResult = Fragmen
             MatchString(vMetric, CodeGenerationSupporter.Cosine, CodeGenerationSupporter.Dot, CodeGenerationSupporter.Euclidean);
             vResult.Metric = vMetric;
         }
-        Comma tTopN:Identifier EqualsSign vTopN = signedIntegerOrVariableOrColumnReference
-        {
-            Match(tTopN, CodeGenerationSupporter.TopN);
-
-            // Validate that TOP_N is not a negative number
-            if (vTopN is UnaryExpression unaryExpr && unaryExpr.UnaryExpressionType == UnaryExpressionType.Negative)
+        // TOP_N is optional per SQL Server 2025 (commit 12d3e8fc)
+        (
+            Comma tTopN:Identifier EqualsSign vTopN = signedIntegerOrVariableOrColumnReference
             {
-                ThrowParseErrorException("SQL46010", unaryExpr, TSqlParserResource.SQL46010Message, "-");
+                Match(tTopN, CodeGenerationSupporter.TopN);
+
+                // Validate that TOP_N is not a negative number
+                if (vTopN is UnaryExpression unaryExpr && unaryExpr.UnaryExpressionType == UnaryExpressionType.Negative)
+                {
+                    ThrowParseErrorException("SQL46010", unaryExpr, TSqlParserResource.SQL46010Message, "-");
+                }
+
+                vResult.TopN = vTopN;
             }
-
-            vResult.TopN = vTopN;
+        )?
+        tRParen:RightParenthesis
+        {
+            UpdateTokenInfo(vResult, tRParen);
         }
-        RightParenthesis simpleTableReferenceAliasOpt[vResult]
+        // WITH clause per SQL Server 2025 (commit 12d3e8fc)
+        (
+            With LeftParenthesis tForceAnnOnly:Identifier tRParen2:RightParenthesis
+            {
+                Match(tForceAnnOnly, CodeGenerationSupporter.ForceAnnOnly);
+                UpdateTokenInfo(vResult, tRParen2);
+                vResult.ForceAnnOnly = true;
+            }
+        )?
+        simpleTableReferenceAliasOpt[vResult]
     ;
 
 predictTableReference[SubDmlFlags subDmlFlags] returns [PredictTableReference vResult]
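
The two optional subrules added to the grammar (`( ... )?` around `TOP_N` and around the `WITH` clause) can be illustrated with a small hand-written parser. This is only a sketch under simplifying assumptions: it is not the ANTLR grammar, the names `tokenize` and `parse_tail` are hypothetical, and it only handles the text that follows `METRIC = '...'`.

```python
import re

# One token per match: identifiers/variables/numbers, quoted strings, or punctuation.
TOKEN = re.compile(r"\s*(@?\w+|'[^']*'|[(),=-])")


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_tail(text):
    """Parse `[, TOP_N = k] ) [WITH (FORCE_ANN_ONLY)]` and return (top_n, force_ann_only)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def expect(value):
        nonlocal pos
        if peek() != value:
            raise SyntaxError(f"expected {value!r}, found {peek()!r}")
        pos += 1

    top_n, force_ann_only = None, False

    if peek() == ",":  # optional subrule: ( Comma TOP_N = k )?
        expect(",")
        expect("TOP_N")
        expect("=")
        if peek() == "-":
            raise SyntaxError("TOP_N must not be negative")
        if peek() in (None, "(", ")", ",", "="):
            raise SyntaxError("TOP_N value expected")
        top_n = tokens[pos]
        pos += 1

    expect(")")  # closing parenthesis of VECTOR_SEARCH(...)

    if peek() == "WITH":  # optional subrule: ( WITH ( FORCE_ANN_ONLY ) )?
        expect("WITH")
        expect("(")
        expect("FORCE_ANN_ONLY")
        expect(")")
        force_ann_only = True

    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return top_n, force_ann_only


print(parse_tail(")"))                                    # (None, False)
print(parse_tail(", TOP_N = 20) WITH (FORCE_ANN_ONLY)"))  # ('20', True)
```

As in the grammar change, an absent `TOP_N` leaves the value unset rather than raising an error, and the hint is recorded as a boolean.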

SqlScriptDom/ScriptDom/SqlServer/ScriptGenerator/SqlScriptGeneratorVisitor.VectorSearchTableReference.cs

Lines changed: 22 additions & 7 deletions
@@ -15,9 +15,9 @@ partial class SqlScriptGeneratorVisitor
         /// TABLE = object[AS source_table_alias],
         /// COLUMN = vector_column,
         /// SIMILAR_TO = query_vector,
-        /// METRIC = { 'cosine' | 'dot' | 'euclidean' },
-        /// TOP_N = k
-        /// ) [AS result_table_alias]
+        /// METRIC = { 'cosine' | 'dot' | 'euclidean' }
+        /// [, TOP_N = k]
+        /// ) [WITH (FORCE_ANN_ONLY)] [AS result_table_alias]
         /// </summary>
         public override void ExplicitVisit(VectorSearchTableReference node)
         {
@@ -41,13 +41,28 @@ public override void ExplicitVisit(VectorSearchTableReference node)
 
             NewLineAndIndent();
             GenerateNameEqualsValue(CodeGenerationSupporter.Metric, node.Metric);
-            GenerateSymbol(TSqlTokenType.Comma);
-
-            NewLineAndIndent();
-            GenerateNameEqualsValue(CodeGenerationSupporter.TopN, node.TopN);
+
+            // TOP_N is optional per SQL Server 2025 (commit 12d3e8fc)
+            if (node.TopN != null)
+            {
+                GenerateSymbol(TSqlTokenType.Comma);
+                NewLineAndIndent();
+                GenerateNameEqualsValue(CodeGenerationSupporter.TopN, node.TopN);
+            }
 
             NewLine();
             GenerateSymbol(TSqlTokenType.RightParenthesis);
+
+            // WITH (FORCE_ANN_ONLY) hint per SQL Server 2025 (commit 12d3e8fc)
+            if (node.ForceAnnOnly)
+            {
+                GenerateSpaceAndKeyword(TSqlTokenType.With);
+                GenerateSpace();
+                GenerateSymbol(TSqlTokenType.LeftParenthesis);
+                GenerateIdentifier(CodeGenerationSupporter.ForceAnnOnly);
+                GenerateSymbol(TSqlTokenType.RightParenthesis);
+            }
+
             GenerateSpaceAndAlias(node.Alias);
 
             PopAlignmentPoint();
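
The generator change above emits the comma and `TOP_N` only when a value is present, and the `WITH` clause only when the flag is set. A minimal Python sketch of the same conditional emission follows. The `VectorSearch` dataclass and `generate` function are simplified, hypothetical stand-ins for `VectorSearchTableReference` and the visitor, and the four-space indentation is an assumption rather than the generator's real alignment logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VectorSearch:
    """Simplified stand-in for VectorSearchTableReference (illustrative only)."""
    table: str
    column: str
    similar_to: str
    metric: str
    top_n: Optional[str] = None    # None models an omitted TOP_N
    force_ann_only: bool = False   # True models WITH (FORCE_ANN_ONLY)
    alias: Optional[str] = None


def generate(node):
    args = [
        f"TABLE = {node.table}",
        f"COLUMN = {node.column}",
        f"SIMILAR_TO = {node.similar_to}",
        f"METRIC = '{node.metric}'",
    ]
    # The comma before TOP_N is emitted only together with TOP_N itself.
    if node.top_n is not None:
        args.append(f"TOP_N = {node.top_n}")
    text = "VECTOR_SEARCH(\n    " + ",\n    ".join(args) + "\n)"
    # The hint follows the closing parenthesis and precedes the alias.
    if node.force_ann_only:
        text += " WITH (FORCE_ANN_ONLY)"
    if node.alias:
        text += f" AS {node.alias}"
    return text


print(generate(VectorSearch("graphnode", "embedding", "@qembedding", "euclidean",
                            force_ann_only=True, alias="ann")))
```

Tying the comma to the presence of `TOP_N` avoids a dangling `,` after `METRIC` when the parameter is omitted.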
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+SELECT *
+FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean'
+);
+
+SELECT *
+FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean'
+) WITH (FORCE_ANN_ONLY);
+
+SELECT *
+FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean',
+    TOP_N = 20
+);
+
+SELECT *
+FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean',
+    TOP_N = 20
+) WITH (FORCE_ANN_ONLY);

Test/SqlDom/Only170SyntaxTests.cs

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ public partial class SqlDomTests
             new ParserTest170("RegexpTVFTests170.sql", nErrors80: 1, nErrors90: 1, nErrors100: 0, nErrors110: 0, nErrors120: 0, nErrors130: 0, nErrors140: 0, nErrors150: 0, nErrors160: 0),
             new ParserTest170("JsonIndexTests170.sql", nErrors80: 2, nErrors90: 10, nErrors100: 10, nErrors110: 10, nErrors120: 10, nErrors130: 10, nErrors140: 10, nErrors150: 10, nErrors160: 10),
             new ParserTest170("VectorIndexTests170.sql", nErrors80: 2, nErrors90: 12, nErrors100: 12, nErrors110: 12, nErrors120: 12, nErrors130: 12, nErrors140: 12, nErrors150: 12, nErrors160: 12),
+            new ParserTest170("VectorSearchOptionalTopNTests170.sql"),
             new ParserTest170("AlterDatabaseManualCutoverTests170.sql", nErrors80: 4, nErrors90: 4, nErrors100: 4, nErrors110: 4, nErrors120: 4, nErrors130: 4, nErrors140: 4, nErrors150: 4, nErrors160: 4),
             new ParserTest170("CreateColumnStoreIndexTests170.sql", nErrors80: 3, nErrors90: 3, nErrors100: 3, nErrors110: 3, nErrors120: 3, nErrors130: 0, nErrors140: 0, nErrors150: 0, nErrors160: 0),
             new ParserTest170("RegexpTests170.sql", nErrors80: 0, nErrors90: 0, nErrors100: 0, nErrors110: 0, nErrors120: 0, nErrors130: 0, nErrors140: 0, nErrors150: 0, nErrors160: 0),

Test/SqlDom/ParserErrorsTests.cs

Lines changed: 20 additions & 9 deletions
@@ -7059,6 +7059,7 @@ RETURNS NVARCHAR(101)
     RETURN @first + ' ' + @last
 END;";
             ParserTestUtils.ErrorTestFabricDW(scalarFunctionSyntax2, new ParserErrorInfo(scalarFunctionSyntax2.IndexOf("INLINE"), "SQL46010", "INLINE"));
+
             string scalarFunctionSyntax3 = @"CREATE OR ALTER FUNCTION dbo.CountProducts
 (
     @ProductTable AS dbo.ProductType READONLY
@@ -7097,11 +7098,22 @@ FROM @SalesData
         [SqlStudioTestCategory(Category.UnitTest)]
         public void IdentityColumnNegativeTestsFabricDW()
         {
-            string identityColumnSyntax = @"CREATE TABLE TestTable1 (ID INT IDENTITY(1,1), Name VARCHAR(50));";
-            ParserTestUtils.ErrorTestFabricDW(identityColumnSyntax, new ParserErrorInfo(40, "SQL46010", "("));
-
-            string identityColumnSyntax2 = @"CREATE TABLE TestTable2 (RecordID BIGINT IDENTITY(100,5), Description NVARCHAR(200));";
-            ParserTestUtils.ErrorTestFabricDW(identityColumnSyntax2, new ParserErrorInfo(49, "SQL46010", "("));
+            string identityColumnSyntax = @"CREATE TABLE TestTable1 (
+                ID INT IDENTITY(1,1),
+                Name VARCHAR(50)
+                );
+                ";
+            string token = "IDENTITY";
+            int errorOffSet = identityColumnSyntax.IndexOf(token) + token.Length;
+            ParserTestUtils.ErrorTestFabricDW(identityColumnSyntax, new ParserErrorInfo(errorOffSet, "SQL46010", "("));
+
+            string identityColumnSyntax2 = @"CREATE TABLE TestTable2 (
+                RecordID BIGINT IDENTITY(100,5),
+                Description NVARCHAR(200)
+                );
+                ";
+            errorOffSet = identityColumnSyntax2.IndexOf(token) + token.Length;
+            ParserTestUtils.ErrorTestFabricDW(identityColumnSyntax2, new ParserErrorInfo(errorOffSet, "SQL46010", "("));
         }
 
         /// <summary>
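
The identity-column hunk above replaces the hard-coded error offsets (40 and 49) with offsets computed from the script text, so reformatting the SQL no longer invalidates the test. The idea is sketched here in Python rather than C#; the helper name `offset_after` is hypothetical.

```python
def offset_after(script, token):
    """Return the index of the character that immediately follows the first `token`."""
    return script.index(token) + len(token)


# Original single-line scripts: the computed offsets match the old hard-coded values.
one_line_1 = "CREATE TABLE TestTable1 (ID INT IDENTITY(1,1), Name VARCHAR(50));"
one_line_2 = "CREATE TABLE TestTable2 (RecordID BIGINT IDENTITY(100,5), Description NVARCHAR(200));"
print(offset_after(one_line_1, "IDENTITY"))  # 40
print(offset_after(one_line_2, "IDENTITY"))  # 49

# Reformatted multi-line script: the offset changes, but it still points at "(".
multi_line = """CREATE TABLE TestTable1 (
    ID INT IDENTITY(1,1),
    Name VARCHAR(50)
);
"""
offset = offset_after(multi_line, "IDENTITY")
print(multi_line[offset])  # (
```

Deriving the offset from the token keeps the expected error position tied to the syntax under test instead of to the exact whitespace of the script.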
@@ -7445,10 +7457,9 @@ public void VectorSearchErrorTest170()
                 "SELECT * FROM VECTOR_SEARCH(TABLE = tbl1, COLUMN = col1, SIMILAR_TO = query_vector)",
                 new ParserErrorInfo(82, "SQL46010", ")"));
 
-            // Missing required parameters: TOP_N
-            ParserTestUtils.ErrorTest170(
-                "SELECT * FROM VECTOR_SEARCH(TABLE = tbl1, COLUMN = col1, SIMILAR_TO = query_vector, METRIC = 'dot')",
-                new ParserErrorInfo(98, "SQL46010", ")"));
+            // TOP_N is now OPTIONAL per SQL Server 2025 (commit 12d3e8fc)
+            // The following test case has been removed as it tested for TOP_N being mandatory
+            // which is no longer the case. VECTOR_SEARCH now accepts queries without TOP_N.
 
             // Invalid order: COLUMN before TABLE
             ParserTestUtils.ErrorTest170(
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+-- Test 1: VECTOR_SEARCH without TOP_N (validates optional parameter)
+SELECT * FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean'
+);
+
+-- Test 2: VECTOR_SEARCH without TOP_N + WITH (FORCE_ANN_ONLY) (validates both changes)
+SELECT * FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean'
+) WITH (FORCE_ANN_ONLY);
+
+-- Test 3: VECTOR_SEARCH with TOP_N (validates backward compatibility)
+SELECT * FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean',
+    TOP_N = 20
+);
+
+-- Test 4: VECTOR_SEARCH with TOP_N + WITH (FORCE_ANN_ONLY) (validates both features together)
+SELECT * FROM VECTOR_SEARCH(
+    TABLE = graphnode,
+    COLUMN = embedding,
+    SIMILAR_TO = @qembedding,
+    METRIC = 'euclidean',
+    TOP_N = 20
+) WITH (FORCE_ANN_ONLY);
