Skip to content

Commit b245515

Browse files
timsaucerclaude
andcommitted
docs: add aggregate function literal detection to make-pythonic skill
Add Technique 1a to detect literal-only arguments in aggregate functions. Unlike scalar UDFs which enforce literals in invoke_with_args(), aggregate functions enforce them in accumulator() via get_scalar_value(), validate_percentile_expr(), or downcast_ref::<Literal>(). Without this technique, the skill would incorrectly classify arguments like approx_percentile_cont's percentile as Category A (Expr | float) when they should be Category B (float only). Updates the decision flow to branch on scalar vs aggregate before checking for literal enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 83ebb72 commit b245515

1 file changed

Lines changed: 65 additions & 7 deletions

File tree

.ai/skills/make-pythonic/SKILL.md

Lines changed: 65 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -58,7 +58,13 @@ Check the Rust binding in `crates/core/src/functions.rs` and the upstream DataFu
5858

5959
Check the DataFusion version in `crates/core/Cargo.toml` to find the right directory. Key subdirectories: `string/`, `datetime/`, `math/`, `regex/`.
6060

61-
There are three concrete techniques to check, in order of signal strength:
61+
For **aggregate functions**, the upstream source is in a separate crate:
62+
63+
```
64+
~/.cargo/registry/src/index.crates.io-*/datafusion-functions-aggregate-<VERSION>/src/
65+
```
66+
67+
There are four concrete techniques to check, in order of signal strength:
6268

6369
#### Technique 1: Check `invoke_with_args()` for literal-only enforcement (strongest signal)
6470

@@ -75,6 +81,48 @@ let granularity_str = if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) =
7581

7682
**If you find this pattern:** The argument is **Category B** — accept only the corresponding native Python type (e.g., `str`), not `Expr`. The function will error at runtime with a column expression anyway.
7783

84+
#### Technique 1a: Check `accumulator()` for literal-only enforcement (aggregate functions)
85+
86+
Technique 1 applies to scalar UDFs. Aggregate functions do not have `invoke_with_args()` — instead, they enforce literal-only arguments in their `accumulator()` (or `create_accumulator()`) method, which runs at planning time before any data is processed.
87+
88+
Look for these patterns inside `accumulator()`:
89+
90+
- `get_scalar_value(expr)` — evaluates the expression against an empty batch and errors if it's not a scalar
91+
- `validate_percentile_expr(expr)` — specific helper used by percentile functions
92+
- `downcast_ref::<Literal>()` — checks that the physical expression is a literal constant
93+
94+
Example from `approx_percentile_cont.rs`:
95+
```rust
96+
fn accumulator(&self, args: AccumulatorArgs) -> Result<ApproxPercentileAccumulator> {
97+
let percentile =
98+
validate_percentile_expr(&args.exprs[1], "APPROX_PERCENTILE_CONT")?;
99+
// ...
100+
}
101+
```
102+
103+
Where `validate_percentile_expr` calls `get_scalar_value` and errors with `"must be a literal"`.
104+
105+
Example from `string_agg.rs`:
106+
```rust
107+
fn accumulator(&self, acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> {
108+
let Some(lit) = acc_args.exprs[1].as_any().downcast_ref::<Literal>() else {
109+
return not_impl_err!(
110+
"The second argument of the string_agg function must be a string literal"
111+
);
112+
};
113+
// ...
114+
}
115+
```
116+
117+
**If you find this pattern:** The argument is **Category B** — accept only the corresponding native Python type, not `Expr`. The function will error at planning time with a non-literal expression.
118+
119+
Known aggregate functions with literal-only arguments:
120+
- `approx_percentile_cont``percentile` (float), `num_centroids` (int)
121+
- `approx_percentile_cont_with_weight``percentile` (float), `num_centroids` (int)
122+
- `percentile_cont``percentile` (float)
123+
- `string_agg``delimiter` (str)
124+
- `nth_value``n` (int)
125+
78126
#### Technique 2: Check the `Signature` for data type constraints
79127

80128
Each function defines a `Signature::coercible(...)` that specifies what data types each argument accepts, using `Coercion` entries. This tells you the expected **data type** even if it doesn't enforce literal-only.
@@ -125,12 +173,22 @@ let decimal_places: Option<i32> = match args.scalar_arguments.get(1) {
125173
#### Decision flow
126174

127175
```
128-
Is argument rejected at runtime if not a literal?
129-
(check invoke_with_args for ColumnarValue::Scalar-only match + exec_err!)
130-
→ YES: Category B — accept only native type, no Expr
131-
→ NO: Does the Signature constrain it to a specific data type?
132-
→ YES: Category A — accept Expr | <native type matching the constraint>
133-
→ NO: Leave as Expr only
176+
Is the function a scalar UDF or an aggregate?
177+
Scalar UDF:
178+
Is argument rejected at runtime if not a literal?
179+
(check invoke_with_args for ColumnarValue::Scalar-only match + exec_err!)
180+
→ YES: Category B — accept only native type, no Expr
181+
→ NO: continue below
182+
Aggregate:
183+
Is argument rejected at planning time if not a literal?
184+
(check accumulator() for get_scalar_value / validate_percentile_expr /
185+
downcast_ref::<Literal>() + error)
186+
→ YES: Category B — accept only native type, no Expr
187+
→ NO: continue below
188+
189+
Does the Signature constrain it to a specific data type?
190+
→ YES: Category A — accept Expr | <native type matching the constraint>
191+
→ NO: Leave as Expr only
134192
```
135193

136194
## Coercion Categories

0 commit comments

Comments
 (0)