Allow pickling PyExpr #1517
Draft
ntjohnson1 wants to merge 2 commits into apache:main from
Promote the previously immutable global context slot in `datafusion-python-util` from `OnceLock<Arc<SessionContext>>` to a `RwLock<Arc<SessionContext>>` and expose `set_global_ctx` (Rust) / `SessionContext.set_as_global` (Python). Users who register UDFs or otherwise customize a context can now make it the default seen by `SessionContext.global_ctx()` and the module-level `read_*` helpers. Existing snapshots returned by `get_global_ctx()` are unaffected; the swap only changes what subsequent readers see. Also fixes a pre-existing clippy `uninlined_format_args` nit in `dataframe.rs` that was tripping the pre-commit hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
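As a rough illustration of the snapshot semantics described above, here is a pure-Python model of the `RwLock<Arc<SessionContext>>` slot. It is a sketch under assumptions, not the Rust implementation: `SessionContextStub` is hypothetical, a plain `threading.Lock` stands in for Rust's `RwLock` (the Python standard library has no reader-writer lock), and the Python functions only mirror the names of the Rust `get_global_ctx` / `set_global_ctx`; they are not the real bindings.

```python
import threading


class SessionContextStub:
    """Hypothetical stand-in for SessionContext: a name plus a UDF registry."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.udfs: dict = {}


# Models RwLock<Arc<SessionContext>>: a lock guarding which context is current.
_global_lock = threading.Lock()
_global_ctx = SessionContextStub("default")


def get_global_ctx() -> SessionContextStub:
    # Hand back the current context. The caller keeps this reference (a
    # snapshot), so a later swap does not change what it points to.
    with _global_lock:
        return _global_ctx


def set_global_ctx(ctx: SessionContextStub) -> None:
    # Replace the slot; only calls to get_global_ctx() made afterwards see ctx.
    global _global_ctx
    with _global_lock:
        _global_ctx = ctx


# Usage: a snapshot taken before the swap still refers to the old context.
before = get_global_ctx()
custom = SessionContextStub("custom")
custom.udfs["my_udf"] = lambda x: x + 1
set_global_ctx(custom)
assert before.name == "default"
assert get_global_ctx() is custom
```

Because the slot holds a reference rather than a copy, swapping it is cheap, and code that already fetched the old context keeps working with it.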
Add `to_bytes` / `from_bytes` on `Expr` (Python wrapper) and the underlying `RawExpr` (Rust). Serialization uses `datafusion-proto`'s `Serializeable` trait, encoding function references by name. The Python wrapper implements `__getstate__` / `__setstate__` on top, so `pickle.dumps` / `dill.dumps` work out of the box. Reconstruction resolves function names against the process-wide global `SessionContext` (introduced as settable in the previous commit). Built-in functions always roundtrip; user-defined functions roundtrip when registered on a context that has been installed via `SessionContext.set_as_global()`. Adds `dill` to the dev dependency group and parametrized tests covering both serializers across columns, literals, binary ops, casts, between, aggregates, case/when, and a UDF with the global-ctx pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
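The by-name scheme can be modeled in plain Python to show why a UDF only roundtrips once it is installed on the global context. This is a sketch under stated assumptions, not the PR's implementation: `Registry`, `CallExpr`, and the JSON payload are hypothetical stand-ins for `SessionContext`, `Expr`, and the `datafusion-proto` protobuf encoding, and for brevity the lookup happens in the same process.

```python
import json
import pickle


class Registry:
    """Hypothetical stand-in for a SessionContext's function registry."""

    def __init__(self) -> None:
        # "Built-in" functions are always present, like DataFusion's built-ins.
        self.functions = {"abs": abs}

    def register_udf(self, name: str, fn) -> None:
        self.functions[name] = fn

    def set_as_global(self) -> None:
        # Mirrors SessionContext.set_as_global(): later lookups use this registry.
        global _GLOBAL
        _GLOBAL = self


_GLOBAL = Registry()


class CallExpr:
    """Hypothetical expression: one function call, serialized by function name."""

    def __init__(self, func_name: str, arg: float) -> None:
        # Resolve against the current global registry; unknown names raise KeyError.
        self.func = _GLOBAL.functions[func_name]
        self.func_name = func_name
        self.arg = arg

    def to_bytes(self) -> bytes:
        # Only the name is encoded, never the callable itself.
        return json.dumps({"func": self.func_name, "arg": self.arg}).encode()

    @classmethod
    def from_bytes(cls, data: bytes) -> "CallExpr":
        payload = json.loads(data)
        return cls(payload["func"], payload["arg"])

    def evaluate(self):
        return self.func(self.arg)

    # pickle calls these hooks, so pickling goes through to_bytes/from_bytes.
    def __getstate__(self) -> bytes:
        return self.to_bytes()

    def __setstate__(self, state: bytes) -> None:
        self.__dict__.update(CallExpr.from_bytes(state).__dict__)


# Usage: a built-in roundtrips with the default global registry.
expr = CallExpr("abs", -3.0)
clone = pickle.loads(pickle.dumps(expr))
assert clone.evaluate() == 3.0

# A lambda UDF cannot be pickled on its own, but it roundtrips here because
# only its name is serialized and unpickling looks it up on the global registry.
custom = Registry()
custom.register_udf("double", lambda x: x * 2)
custom.set_as_global()
udf_clone = pickle.loads(pickle.dumps(CallExpr("double", 4.0)))
assert udf_clone.evaluate() == 8.0
```

Encoding by name keeps the payload small and avoids pickling arbitrary Python callables, at the cost of requiring the receiving process to have the same functions registered on its global context.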
Which issue does this PR close?
Allows holding a `PyExpr` when spawning Python parallelism, i.e. passing expressions to worker processes, which requires them to be picklable. Will file an issue. Currently just exploring.
Closes #.
Rationale for this change
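Parallelism. Process-based parallelism in Python (e.g. `multiprocessing` or `concurrent.futures.ProcessPoolExecutor`) pickles the objects it sends to workers, so expressions must be picklable to be used there.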
What changes are included in this PR?
This changes how the global context is managed, treating it as a settable singleton, so that when an expression is unpickled we can assume the global context is the right one to resolve function names against.
Are there any user-facing changes?
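Not really, beyond new capabilities: `Expr` can now be pickled (`to_bytes` / `from_bytes`, plus `__getstate__` / `__setstate__`), and `SessionContext.set_as_global` makes a customized context the global default.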