Commit d0b5d23

Merge branch 'main' into feat/add-missing-registration-methods
2 parents 0f96ea3 + 46f9ab8 commit d0b5d23

21 files changed

Lines changed: 3551 additions & 117 deletions

File tree

.ai/skills/check-upstream/SKILL.md

Lines changed: 383 additions & 0 deletions
Large diffs are not rendered by default.

.claude/skills

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+../.ai/skills

.github/workflows/build.yml

Lines changed: 14 additions & 1 deletion
@@ -159,6 +159,19 @@ jobs:
         with:
           enable-cache: true
 
+      - name: Add extra swap for release build
+        if: inputs.build_mode == 'release'
+        run: |
+          set -euxo pipefail
+          sudo swapoff -a || true
+          sudo rm -f /swapfile
+          sudo fallocate -l 8G /swapfile || sudo dd if=/dev/zero of=/swapfile bs=1M count=8192
+          sudo chmod 600 /swapfile
+          sudo mkswap /swapfile
+          sudo swapon /swapfile
+          free -h
+          swapon --show
+
       - name: Build (release mode)
         uses: PyO3/maturin-action@v1
         if: inputs.build_mode == 'release'
@@ -233,7 +246,7 @@ jobs:
           set -euxo pipefail
           sudo swapoff -a || true
           sudo rm -f /swapfile
-          sudo fallocate -l 16G /swapfile || sudo dd if=/dev/zero of=/swapfile bs=1M count=16384
+          sudo fallocate -l 8G /swapfile || sudo dd if=/dev/zero of=/swapfile bs=1M count=8192
           sudo chmod 600 /swapfile
           sudo mkswap /swapfile
           sudo swapon /swapfile
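
Note on the swap sizes in the hunks above: the `dd` fallback has to write the same number of bytes that `fallocate -l 8G` reserves, so `count` must equal the size in GiB times 1024 when `bs=1M`. The sketch below only checks that arithmetic; `dd_count` is a hypothetical helper, not something in this repository.

```python
MIB = 1024 ** 2  # bytes in one dd block when bs=1M
GIB = 1024 ** 3  # bytes per "G" as fallocate -l interprets it


def dd_count(size_gib: int, block_size_bytes: int = MIB) -> int:
    """Return the dd `count` needed to write `size_gib` GiB in blocks of the given size."""
    return size_gib * GIB // block_size_bytes


if __name__ == "__main__":
    # New swap size (8G) and the previous one (16G) from the second hunk.
    print(dd_count(8), dd_count(16))
```

With the default block size this reproduces the two `count` values that appear in the diff, 8192 for the new 8G swap file and 16384 for the old 16G one.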

AGENTS.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Agent Instructions
+
+This project uses AI agent skills stored in `.ai/skills/`. Each skill is a directory containing a `SKILL.md` file with instructions for performing a specific task.
+
+Skills follow the [Agent Skills](https://agentskills.io) open standard. Each skill directory contains:
+
+- `SKILL.md` — The skill definition with YAML frontmatter (name, description, argument-hint) and detailed instructions.
+- Additional supporting files as needed.
+
+## Python Function Docstrings
+
+Every Python function must include a docstring with usage examples.
+
+- **Examples are required**: Each function needs at least one doctest-style example
+  demonstrating basic usage.
+- **Optional parameters**: If a function has optional parameters, include separate
+  examples that show usage both without and with the optional arguments. Pass
+  optional arguments using their keyword name (e.g., `step=dfn.lit(3)`) so readers
+  can immediately see which parameter is being demonstrated.
+- **Reuse input data**: Use the same input data across examples wherever possible.
+  The examples should demonstrate how different optional arguments change the output
+  for the same input, making the effect of each option easy to understand.
+- **Alias functions**: Functions that are simple aliases (e.g., `list_sort` aliasing
+  `array_sort`) only need a one-line description and a `See Also` reference to the
+  primary function. They do not need their own examples.
+
+## Aggregate and Window Function Documentation
+
+When adding or updating an aggregate or window function, ensure the corresponding
+site documentation is kept in sync:
+
+- **Aggregations**: `docs/source/user-guide/common-operations/aggregations.rst`
+  add new aggregate functions to the "Aggregate Functions" list and include usage
+  examples if appropriate.
+- **Window functions**: `docs/source/user-guide/common-operations/windows.rst`
+  add new window functions to the "Available Functions" list and include usage
+  examples if appropriate.
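
The docstring rules added in `AGENTS.md` above (at least one doctest-style example, separate examples without and with each optional argument, optional arguments passed by keyword, and the same input reused across examples) might look roughly like this in practice. `count_up` is a hypothetical function invented for illustration; it is not part of this project or of DataFusion.

```python
def count_up(start: int, stop: int, step: int = 1) -> list:
    """Return the integers from ``start`` up to, but not including, ``stop``.

    Examples:
        Basic usage, without the optional argument:

        >>> count_up(0, 6)
        [0, 1, 2, 3, 4, 5]

        The same input with the optional argument passed by keyword, so the
        effect of ``step`` is easy to compare against the first example:

        >>> count_up(0, 6, step=3)
        [0, 3]
    """
    # range() already implements the half-open interval and the stride.
    return list(range(start, stop, step))
```

Saved to a file, the examples can be checked with Python's standard `python -m doctest <file>`; how this project wires doctests into its own test run is not shown in this commit.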

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+AGENTS.md

Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -64,8 +64,8 @@ pyo3-build-config = "0.28"
 datafusion-python-util = { path = "crates/util" }
 
 [profile.release]
-lto = true
-codegen-units = 1
+lto = "thin"
+codegen-units = 2
 
 # We cannot publish to crates.io with any patches in the below section. Developers
 # must remove any entries in this section before creating a release candidate.

README.md

Lines changed: 28 additions & 1 deletion
@@ -275,7 +275,7 @@ needing to activate the virtual environment:
 
 ```bash
 uv run --no-project maturin develop --uv
-uv run --no-project pytest .
+uv run --no-project pytest
 ```
 
 To run the FFI tests within the examples folder, after you have built
@@ -312,6 +312,33 @@ There are scripts in `ci/scripts` for running Rust and Python linters.
 ./ci/scripts/rust_toml_fmt.sh
 ```
 
+## Checking Upstream DataFusion Coverage
+
+This project includes an [AI agent skill](.ai/skills/check-upstream/SKILL.md) for auditing which
+features from the upstream Apache DataFusion Rust library are not yet exposed in these Python
+bindings. This is useful when adding missing functions, auditing API coverage, or ensuring parity
+with upstream.
+
+The skill accepts an optional area argument:
+
+```
+scalar functions
+aggregate functions
+window functions
+dataframe
+session context
+ffi types
+all
+```
+
+If no argument is provided, it defaults to checking all areas. The skill will fetch the upstream
+DataFusion documentation, compare it against the functions and methods exposed in this project, and
+produce a coverage report listing what is currently exposed and what is missing.
+
+The skill definition lives in `.ai/skills/check-upstream/SKILL.md` and follows the
+[Agent Skills](https://agentskills.io) open standard. It can be used by any AI coding agent that
+supports skill discovery, or followed manually.
+
 ## How to update dependencies
 
 To change test dependencies, change the `pyproject.toml` and run

crates/core/src/context.rs

Lines changed: 31 additions & 1 deletion
@@ -434,11 +434,25 @@ impl PySessionContext {
             &upstream_host
         };
         let url_string = format!("{scheme}{derived_host}");
-        let url = Url::parse(&url_string).unwrap();
+        let url = Url::parse(&url_string).map_err(|e| PyValueError::new_err(e.to_string()))?;
         self.ctx.runtime_env().register_object_store(&url, store);
         Ok(())
     }
 
+    /// Deregister an object store with the given url
+    #[pyo3(signature = (scheme, host=None))]
+    pub fn deregister_object_store(
+        &self,
+        scheme: &str,
+        host: Option<&str>,
+    ) -> PyDataFusionResult<()> {
+        let host = host.unwrap_or("");
+        let url_string = format!("{scheme}{host}");
+        let url = Url::parse(&url_string).map_err(|e| PyDataFusionError::Common(e.to_string()))?;
+        self.ctx.runtime_env().deregister_object_store(&url)?;
+        Ok(())
+    }
+
     #[allow(clippy::too_many_arguments)]
     #[pyo3(signature = (name, path, table_partition_cols=vec![],
                         file_extension=".parquet",
@@ -492,6 +506,10 @@ impl PySessionContext {
         self.ctx.register_udtf(&name, func);
     }
 
+    pub fn deregister_udtf(&self, name: &str) {
+        self.ctx.deregister_udtf(name);
+    }
+
     #[pyo3(signature = (query, options=None, param_values=HashMap::default(), param_strings=HashMap::default()))]
     pub fn sql_with_options(
         &self,
@@ -1008,16 +1026,28 @@ impl PySessionContext {
         Ok(())
     }
 
+    pub fn deregister_udf(&self, name: &str) {
+        self.ctx.deregister_udf(name);
+    }
+
     pub fn register_udaf(&self, udaf: PyAggregateUDF) -> PyResult<()> {
         self.ctx.register_udaf(udaf.function);
         Ok(())
     }
 
+    pub fn deregister_udaf(&self, name: &str) {
+        self.ctx.deregister_udaf(name);
+    }
+
     pub fn register_udwf(&self, udwf: PyWindowUDF) -> PyResult<()> {
         self.ctx.register_udwf(udwf.function);
         Ok(())
     }
 
+    pub fn deregister_udwf(&self, name: &str) {
+        self.ctx.deregister_udwf(name);
+    }
+
     #[pyo3(signature = (name="datafusion"))]
     pub fn catalog(&self, py: Python, name: &str) -> PyResult<Py<PyAny>> {
         let catalog = self.ctx.catalog(name).ok_or(PyKeyError::new_err(format!(
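
As a rough illustration of the URL handling in `deregister_object_store` above: the method concatenates `scheme` and `host` with no separator, so the `://` has to arrive as part of `scheme`, and a parse failure is now returned as an error instead of panicking. The Python sketch below mirrors only that logic. `build_store_url` is a hypothetical helper, and `urllib.parse.urlsplit` is a stand-in for Rust's `Url::parse`; the two parsers do not accept exactly the same inputs.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_store_url(scheme: str, host: Optional[str] = None) -> str:
    """Join scheme and host the way the Rust method does, then sanity-check the result."""
    # Mirrors `host.unwrap_or("")` followed by `format!("{scheme}{host}")`.
    url_string = f"{scheme}{host if host is not None else ''}"

    # Assumption: a missing scheme is treated as the failure case here, standing
    # in for whatever error Url::parse would report on the Rust side.
    if not urlsplit(url_string).scheme:
        raise ValueError(f"could not parse object store URL: {url_string!r}")
    return url_string


if __name__ == "__main__":
    print(build_store_url("s3://", "my-bucket"))
    print(build_store_url("file://"))
```

Passing `"s3"` without the trailing `://` yields the string `s3my-bucket`, which this sketch rejects with a `ValueError`. On the Rust side the equivalent failure is mapped to `PyDataFusionError::Common` in the new method and to `PyValueError` in `register_object_store`.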
