Commit 0465d6e

timsaucer and claude committed
Address Copilot review feedback on AGENTS.md
- Wrap CASE/WHEN method-chain examples in parentheses and assign to a variable so they are valid Python as shown (Copilot #1, #2).
- Fix INTERSECT/EXCEPT mapping: the default distinct=False corresponds to INTERSECT ALL / EXCEPT ALL, not the distinct forms. Updated both the Set Operations section and the SQL reference table to show both the ALL and distinct variants (Copilot #4).
- Change write_parquet / write_csv / write_json examples to file-style paths (output.parquet, etc.) to match the convention used in existing tests and examples. Note that a directory path is also valid for partitioned output (Copilot #5).

Verified INTERSECT/EXCEPT semantics with a script:

  df1.intersect(df2) -> [1, 1, 2] (= INTERSECT ALL)
  df1.intersect(df2, distinct=True) -> [1, 2] (= INTERSECT)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
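
The ALL versus distinct distinction described in this message can be modeled without a query engine. The following is a minimal sketch in plain Python, assuming SQL-standard bag semantics, where each value survives `min(count_left, count_right)` times under INTERSECT ALL. The helper names and the input rows are hypothetical: the commit message does not show the inputs its verification script used, and this sketch does not call DataFusion.

```python
from collections import Counter


def intersect_all(left, right):
    """Bag intersection: keep each value min(count_left, count_right) times."""
    common = Counter(left) & Counter(right)
    return sorted(common.elements())


def intersect_distinct(left, right):
    """Set intersection: each common value appears exactly once."""
    return sorted(set(left) & set(right))


def except_all(left, right):
    """Bag difference: subtract right-hand counts from left-hand counts."""
    remaining = Counter(left) - Counter(right)
    return sorted(remaining.elements())


def except_distinct(left, right):
    """Set difference: values present on the left and absent on the right."""
    return sorted(set(left) - set(right))


# Hypothetical single-column inputs, chosen for illustration only.
left_rows = [1, 1, 2, 3]
right_rows = [1, 1, 2, 4]

all_result = intersect_all(left_rows, right_rows)
distinct_result = intersect_distinct(left_rows, right_rows)
```

With these assumed inputs, `all_result` is `[1, 1, 2]` and `distinct_result` is `[1, 2]`, which is consistent with the outputs quoted in the commit message.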
1 parent c620a80 commit 0465d6e

1 file changed

Lines changed: 18 additions & 7 deletions

python/datafusion/AGENTS.md
@@ -222,7 +222,9 @@ df1.union(df2) # UNION ALL (by position)
 df1.union(df2, distinct=True) # UNION DISTINCT
 df1.union_by_name(df2) # match columns by name, not position
 df1.intersect(df2) # INTERSECT ALL
+df1.intersect(df2, distinct=True) # INTERSECT (distinct)
 df1.except_all(df2) # EXCEPT ALL
+df1.except_all(df2, distinct=True) # EXCEPT (distinct)
 ```
 
 ### Limit and Offset
### Limit and Offset
@@ -267,11 +269,14 @@ for batch in stream:
 ### Writing Results
 
 ```python
-df.write_parquet("output/")
-df.write_csv("output/")
-df.write_json("output/")
+df.write_parquet("output.parquet")
+df.write_csv("output.csv")
+df.write_json("output.json")
 ```
 
+You can also pass a directory path (e.g., `"output/"`) to write a multi-file
+partitioned output.
+
 ## Expression Building
 
 ### Column References and Literals
@@ -364,15 +369,19 @@ F.nullif(col("a"), lit(0)) # return NULL if a == 0
 
 ```python
 # Simple CASE (matching on a single expression)
-F.case(col("status"))
+status_label = (
+    F.case(col("status"))
     .when(lit("A"), lit("Active"))
     .when(lit("I"), lit("Inactive"))
     .otherwise(lit("Unknown"))
+)
 
 # Searched CASE (each branch has its own predicate)
-F.when(col("value") > lit(100), lit("high"))
+severity = (
+    F.when(col("value") > lit(100), lit("high"))
     .when(col("value") > lit(50), lit("medium"))
     .otherwise(lit("low"))
+)
 ```
 
 ### Casting
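
The parentheses added in the CASE/WHEN hunk matter because Python does not continue a statement onto a following line that starts with `.` unless the expression sits inside brackets. A small sketch of that point using only the standard library's `ast` module: the snippets are parsed, never executed, so `F`, `col`, and `lit` do not need to be importable here. The `parses` helper and the 4-space indentation of the chained lines are assumptions made for illustration.

```python
import ast

# Chain split across lines with no enclosing parentheses (the old form).
UNWRAPPED = '''
F.case(col("status"))
    .when(lit("A"), lit("Active"))
    .otherwise(lit("Unknown"))
'''

# Same chain wrapped in parentheses and assigned (the new form).
WRAPPED = '''
status_label = (
    F.case(col("status"))
    .when(lit("A"), lit("Active"))
    .otherwise(lit("Unknown"))
)
'''


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


unwrapped_ok = parses(UNWRAPPED)
wrapped_ok = parses(WRAPPED)
```

Here `unwrapped_ok` is `False` and `wrapped_ok` is `True`: only the parenthesized form can be copied into a script as shown.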
@@ -426,8 +435,10 @@ col("array_col")[1:3] # array slice (0-indexed)
 | `WHERE NOT EXISTS (SELECT ...)` | `a.join(b, on="key", how="anti")` |
 | `UNION ALL` | `df1.union(df2)` |
 | `UNION` (distinct) | `df1.union(df2, distinct=True)` |
-| `INTERSECT` | `df1.intersect(df2)` |
-| `EXCEPT` | `df1.except_all(df2)` |
+| `INTERSECT ALL` | `df1.intersect(df2)` |
+| `INTERSECT` (distinct) | `df1.intersect(df2, distinct=True)` |
+| `EXCEPT ALL` | `df1.except_all(df2)` |
+| `EXCEPT` (distinct) | `df1.except_all(df2, distinct=True)` |
 | `CASE x WHEN 1 THEN 'a' END` | `F.case(col("x")).when(lit(1), lit("a")).end()` |
 | `CASE WHEN x > 1 THEN 'a' END` | `F.when(col("x") > lit(1), lit("a")).end()` |
 | `x IN (1, 2, 3)` | `F.in_list(col("x"), [lit(1), lit(2), lit(3)])` |
