Allow pickling PyExpr #1517

Draft
ntjohnson1 wants to merge 2 commits into apache:main from rerun-io:nick/pickle_expr
Conversation

@ntjohnson1
Contributor

Which issue does this PR close?

Allows holding a `PyExpr` when spawning Python parallelism. I will file an issue; for now this is just exploratory.

Closes #.

Rationale for this change

Parallelism: an expression has to be picklable before it can be handed to parallel Python workers.

What changes are included in this PR?

This changes how the global context is managed: it is now treated more like a singleton, so we can assume it is the right context to resolve against when unpacking the expr.

Are there any user-facing changes?

Not really, beyond the new capability itself.

ntjohnson1 and others added 2 commits April 28, 2026 11:26
Promote the previously immutable global context slot in `datafusion-python-util`
from `OnceLock<Arc<SessionContext>>` to a `RwLock<Arc<SessionContext>>` and
expose `set_global_ctx` (Rust) / `SessionContext.set_as_global` (Python). Users
who register UDFs or otherwise customize a context can now make it the default
seen by `SessionContext.global_ctx()` and the module-level `read_*` helpers.

Existing snapshots returned by `get_global_ctx()` are unaffected — the swap only
changes what subsequent readers see.

Also fixes a pre-existing clippy `uninlined_format_args` nit in
`dataframe.rs` that was tripping the pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
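
The snapshot behavior described in this commit can be illustrated with a small pure-Python analogy. This is only a sketch of the swap semantics, not the Rust implementation: `GlobalSlot` and the dictionary "contexts" are hypothetical stand-ins for the `RwLock<Arc<SessionContext>>` slot, and a plain `threading.Lock` is used in place of a read-write lock.

```python
import threading


class GlobalSlot:
    """Hypothetical stand-in for the settable global-context slot."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._value = initial

    def get(self):
        # Hand out a reference to whatever is installed right now.
        with self._lock:
            return self._value

    def set(self, new_value):
        # Rebind the slot; objects already handed out are not mutated.
        with self._lock:
            self._value = new_value


# Plain dicts play the role of two differently configured contexts.
default_ctx = {"name": "default", "udfs": ()}
custom_ctx = {"name": "custom", "udfs": ("my_udf",)}

slot = GlobalSlot(default_ctx)
snapshot = slot.get()    # taken before the swap
slot.set(custom_ctx)     # analogous to installing a new global context
after_swap = slot.get()  # taken after the swap
```

Here `snapshot` still refers to the original object, while `after_swap` sees the newly installed one, which mirrors the "only subsequent readers see the change" wording above.
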
Add `to_bytes` / `from_bytes` on `Expr` (Python wrapper) and the underlying
`RawExpr` (Rust). Serialization uses `datafusion-proto`'s `Serializeable`
trait, encoding function references by name. The Python wrapper implements
`__getstate__` / `__setstate__` on top, so `pickle.dumps` / `dill.dumps` work
out of the box.

Reconstruction resolves function names against the process-wide global
`SessionContext` (introduced as settable in the previous commit). Built-in
functions always roundtrip; user-defined functions roundtrip when registered
on a context that has been installed via `SessionContext.set_as_global()`.

Adds `dill` to the dev dependency group and parametrized tests covering
both serializers across columns, literals, binary ops, casts, between,
aggregates, case/when, and a UDF with the global-ctx pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
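
To make the roundtrip mechanism in this commit concrete, here is a minimal pure-Python sketch of the same pattern: `__getstate__` / `__setstate__` layered on `to_bytes` / `from_bytes`, with function references encoded by name and resolved against a process-wide table on load. Everything in it is an assumption for illustration: `FuncExpr`, `_GLOBAL_FUNCTIONS`, and `set_global_functions` are hypothetical names, JSON stands in for the `datafusion-proto` encoding, and none of this is the actual `datafusion-python` code.

```python
import json
import pickle

# Hypothetical process-wide table standing in for the global SessionContext.
_GLOBAL_FUNCTIONS = {"abs": abs}


def set_global_functions(functions):
    """Install a new name -> callable table (stand-in for set_as_global)."""
    global _GLOBAL_FUNCTIONS
    _GLOBAL_FUNCTIONS = dict(functions)


class FuncExpr:
    """Toy expression: a named function applied to a literal argument."""

    def __init__(self, func_name, arg):
        self.func_name = func_name
        self.arg = arg
        # Resolve the name against whatever table is installed right now.
        self._func = _GLOBAL_FUNCTIONS[func_name]

    def to_bytes(self):
        # Only the function *name* is encoded, never the callable itself.
        payload = {"func": self.func_name, "arg": self.arg}
        return json.dumps(payload).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        payload = json.loads(data.decode("utf-8"))
        return cls(payload["func"], payload["arg"])

    def __getstate__(self):
        # pickle stores the compact byte form instead of __dict__.
        return self.to_bytes()

    def __setstate__(self, state):
        # Rebuild from bytes, then adopt the rebuilt object's attributes.
        restored = FuncExpr.from_bytes(state)
        self.__dict__.update(restored.__dict__)

    def evaluate(self):
        return self._func(self.arg)


def double(x):
    """Hypothetical user-defined function."""
    return x * 2


# A built-in resolves without any extra setup.
builtin_result = pickle.loads(pickle.dumps(FuncExpr("abs", -3))).evaluate()

# A user-defined function resolves once it is in the installed global table.
set_global_functions({"abs": abs, "double": double})
udf_result = pickle.loads(pickle.dumps(FuncExpr("double", 21))).evaluate()
```

In this sketch, unpickling in a process whose table lacks `double` would raise a `KeyError`, which is the toy counterpart of a UDF that was not registered on the installed global context before the expression was reconstructed.
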