Skip to content

Commit e0284c6

Browse files
timsaucerclaude
andauthored
feat: add AI skill to find and improve the Pythonic interface to functions (#1484)
* feat: accept native Python types in function arguments instead of requiring lit() Update 47 functions in functions.py to accept native Python types (int, float, str) for arguments that are contextually literals, eliminating verbose lit() wrapping. For example, users can now write split_part(col("a"), ",", 2) instead of split_part(col("a"), lit(","), lit(2)). All changes are backward compatible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update alias function signatures to match pythonic primary functions Update instr and position (aliases of strpos) to accept Expr | str for the substring parameter, matching the updated primary function signature. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update make-pythonic skill to require alias type hint updates Alias functions that delegate to a primary function must have their type hints updated to match, even though coercion logic is only added to the primary. Added a new Step 3 to the implementation workflow for this. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address review feedback on pythonic skill and function signatures Update SKILL.md to prevent three classes of issues: clarify that float already accepts int per PEP 484 (avoiding redundant int | float that fails ruff PYI041), add backward-compat rule for Category B so existing Expr params aren't removed, and add guidance for inline coercion with many optional nullable params instead of local helpers. Replace regexp_instr's _to_raw() helper with inline coercion matching the pattern used throughout the rest of the file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: add coerce_to_expr helpers and replace inline coercion patterns Introduce coerce_to_expr() and coerce_to_expr_or_none() in expr.py as the complement to ensure_expr() — where ensure_expr rejects non-Expr values, these helpers wrap them via Expr.literal(). Replaces ~60 inline isinstance checks in functions.py with single-line helper calls, and updates the make-pythonic skill to document the new pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add aggregate function literal detection to make-pythonic skill Add Technique 1a to detect literal-only arguments in aggregate functions. Unlike scalar UDFs which enforce literals in invoke_with_args(), aggregate functions enforce them in accumulator() via get_scalar_value(), validate_percentile_expr(), or downcast_ref::<Literal>(). Without this technique, the skill would incorrectly classify arguments like approx_percentile_cont's percentile as Category A (Expr | float) when they should be Category B (float only). Updates the decision flow to branch on scalar vs aggregate before checking for literal enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add window function literal detection to make-pythonic skill Add Technique 1b to detect literal-only arguments in window functions. Window functions enforce literals in partition_evaluator() via get_scalar_value_from_args() / downcast_ref::<Literal>(), not in invoke_with_args() (scalar) or accumulator() (aggregate). Updates the decision flow to branch on scalar vs aggregate vs window. Known window functions with literal-only arguments: ntile (n), lead/lag (offset, default_value), nth_value (n). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use explicit None checks, widen numeric type hints, and add tests Replace 7 fragile truthiness checks (x.expr if x else None) with explicit is not None checks to prevent silent None when zero-valued literals are passed. Widen log/power/pow type hints to Expr | int | float with noqa: PYI041 for clarity. Add unit tests for coerce_to_expr helpers and integration tests for pythonic calling conventions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: suppress FBT003 in tests and remove redundant noqa comments Add FBT003 (boolean positional value) to the per-file-ignores for python/tests/* in pyproject.toml, and remove the 6 now-redundant inline noqa: FBT003 comments across test_expr.py and test_context.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: replace static function lists with discovery instructions in skill Replace hardcoded "Known aggregate/window functions with literal-only arguments" lists with instructions to discover them dynamically by searching the upstream crate source. Keeps a few examples as validation anchors so the agent knows its search is working correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: make interrupt test reliable on Python 3.11 PyThreadState_SetAsyncExc only delivers exceptions when the thread is executing Python bytecode, not while in native (Rust/C) code. The previous test had two issues causing flakiness on Python 3.11: 1. The interrupt fired before df.collect() entered the UDF, while the thread was still in native code where async exceptions are ignored. 2. time.sleep(2.0) is a single C call where async exceptions are not checked — they're only checked between bytecode instructions. Fix by adding a threading.Event so the interrupt waits until the UDF is actually executing Python code, and by sleeping in small increments so the eval loop has opportunities to check for pending exceptions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0357716 commit e0284c6

8 files changed

Lines changed: 893 additions & 193 deletions

File tree

.ai/skills/make-pythonic/SKILL.md

Lines changed: 430 additions & 0 deletions
Large diffs are not rendered by default.

pyproject.toml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -111,6 +111,7 @@ extend-allowed-calls = ["datafusion.lit", "lit"]
111111
"ARG",
112112
"BLE001",
113113
"D",
114+
"FBT003",
114115
"PD",
115116
"PLC0415",
116117
"PLR0913",

python/datafusion/expr.py

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -243,6 +243,8 @@
243243
"WindowExpr",
244244
"WindowFrame",
245245
"WindowFrameBound",
246+
"coerce_to_expr",
247+
"coerce_to_expr_or_none",
246248
"ensure_expr",
247249
"ensure_expr_list",
248250
]
@@ -255,6 +257,10 @@ def ensure_expr(value: Expr | Any) -> expr_internal.Expr:
255257
higher level APIs consistently require explicit :func:`~datafusion.col` or
256258
:func:`~datafusion.lit` expressions.
257259
260+
See Also:
261+
:func:`coerce_to_expr` — the opposite behavior: *wraps* non-``Expr``
262+
values as literals instead of rejecting them.
263+
258264
Args:
259265
value: Candidate expression or other object.
260266
@@ -299,6 +305,41 @@ def _iter(
299305
return list(_iter(exprs))
300306

301307

308+
def coerce_to_expr(value: Any) -> Expr:
309+
"""Coerce a native Python value to an ``Expr`` literal, passing ``Expr`` through.
310+
311+
This is the complement of :func:`ensure_expr`: where ``ensure_expr``
312+
*rejects* non-``Expr`` values, ``coerce_to_expr`` *wraps* them via
313+
:meth:`Expr.literal` so that functions can accept native Python types
314+
(``int``, ``float``, ``str``, ``bool``, etc.) alongside ``Expr``.
315+
316+
Args:
317+
value: An ``Expr`` instance (returned as-is) or a Python literal to wrap.
318+
319+
Returns:
320+
An ``Expr`` representing the value.
321+
"""
322+
if isinstance(value, Expr):
323+
return value
324+
return Expr.literal(value)
325+
326+
327+
def coerce_to_expr_or_none(value: Any | None) -> Expr | None:
328+
"""Coerce a value to ``Expr`` or pass ``None`` through unchanged.
329+
330+
Same as :func:`coerce_to_expr` but accepts ``None`` for optional parameters.
331+
332+
Args:
333+
value: An ``Expr`` instance, a Python literal to wrap, or ``None``.
334+
335+
Returns:
336+
An ``Expr`` representing the value, or ``None``.
337+
"""
338+
if value is None:
339+
return None
340+
return coerce_to_expr(value)
341+
342+
302343
def _to_raw_expr(value: Expr | str) -> expr_internal.Expr:
303344
"""Convert a Python expression or column name to its raw variant.
304345

0 commit comments

Comments
 (0)