Commit 8cc23f3
feat: auto-register ObjectStore and accept it in read/register methods
Closes #899. Users no longer need to call `register_object_store()`
before reading remote files. Two complementary mechanisms are provided:
**Auto-registration from URL scheme**
`try_register_url_store()` is called inside every `read_*` /
`register_*` method. It parses the path, detects the scheme (s3, gs,
az/abfss, http/https) and builds an appropriate `ObjectStore` from
environment variables. An existing registration is never overwritten, so
an explicit `register_object_store()` call still takes precedence.
Anonymous S3 access is enabled via `AWS_SKIP_SIGNATURE=true/1`
(avoids EC2 IMDS timeouts when not running on AWS).
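The lookup-then-register behaviour described above can be sketched in plain Python. This is an illustration only, not the Rust implementation from this commit: the `registry` dict, the `SCHEME_TO_BACKEND` table, and the dict-shaped "store" are hypothetical stand-ins, and treating `AWS_SKIP_SIGNATURE` case-insensitively is an assumption.

```python
import os
from urllib.parse import urlparse

# Hypothetical table: URL scheme -> backend label (schemes taken from the text above).
SCHEME_TO_BACKEND = {
    "s3": "s3",
    "gs": "gcs",
    "az": "azure",
    "abfss": "azure",
    "http": "http",
    "https": "http",
}


def skip_signature_from_env(env=None):
    """Return True when AWS_SKIP_SIGNATURE is 'true' or '1' (case-insensitivity is assumed)."""
    env = os.environ if env is None else env
    return env.get("AWS_SKIP_SIGNATURE", "").strip().lower() in {"true", "1"}


def try_register_url_store(registry, path, env=None):
    """Register a stand-in store for the path's scheme and host unless one already exists."""
    parsed = urlparse(path)
    backend = SCHEME_TO_BACKEND.get(parsed.scheme)
    if backend is None:
        return None  # local path or unsupported scheme: nothing to register
    key = f"{parsed.scheme}://{parsed.netloc}"
    if key in registry:
        return registry[key]  # an existing (explicit) registration is never overwritten
    store = {"backend": backend, "host": parsed.netloc}
    if backend == "s3":
        store["skip_signature"] = skip_signature_from_env(env)
    registry[key] = store
    return store
```

Calling the sketch twice for the same bucket returns the first registration unchanged, which mirrors the precedence rule for explicit `register_object_store()` calls.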
**`object_store` parameter on read/register methods**
All eight `read_*` / `register_*` methods (`read_parquet`,
`register_parquet`, `read_csv`, `register_csv`, `read_json`,
`register_json`, `read_avro`, `register_avro`) now accept an optional
`object_store` keyword argument. Passing a store instance registers it
for the URL immediately, with no separate call required:
```python
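from datafusion import SessionContext
from datafusion.object_store import S3Store

ctx = SessionContext()
# Anonymous access to a public bucket; the store is registered for this URL on read.
store = S3Store("my-bucket", region="us-east-1", skip_signature=True)
df = ctx.read_parquet("s3://my-bucket/data.parquet", object_store=store)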
```
**pyo3-object-store integration**
Replaced the hand-rolled `store.rs` Python classes with
`pyo3-object_store 0.9` (compatible with object_store 0.13), which
provides richer, actively maintained Python builders for every backend.
The `datafusion.object_store` module now exposes `S3Store`, `GCSStore`,
`AzureStore`, `HTTPStore`, `LocalStore`, `MemoryStore`, and `from_url`.
Legacy names (`AmazonS3`, `GoogleCloud`, `MicrosoftAzure`, `Http`,
`LocalFileSystem`) are kept as backward-compatible aliases.
`register_object_store(url, store)` now takes a full URL prefix and a
`PyObjectStore` instead of the old `(scheme, StorageContexts, host)`
triple, matching the pattern suggested in #899.
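For callers migrating from the old triple, the URL prefix can be derived from the former scheme and host arguments. The helper below is a hypothetical sketch, not part of this commit; it assumes the old scheme argument may or may not carry a trailing `://`.

```python
def legacy_args_to_url(scheme, host=None):
    """Build a URL prefix such as 's3://my-bucket' from old-style (scheme, host) arguments.

    Hypothetical migration helper: accepts 's3' as well as 's3://' for the scheme.
    """
    bare_scheme = scheme.rstrip(":/")  # drop a trailing '://' if present
    if not bare_scheme:
        raise ValueError("scheme must not be empty")
    return f"{bare_scheme}://{host}" if host else f"{bare_scheme}://"
```

The resulting string is what would be passed as the first argument of `register_object_store(url, store)`.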
**Tests**
Added integration tests (`@pytest.mark.integration`):
- `test_read_http_csv` - reads CSV from GitHub raw HTTPS
- `test_read_https_parquet` - reads Parquet from Apache parquet-testing
- `test_read_s3_parquet_explicit` - passes `S3Store` via `object_store=`
- `test_read_s3_parquet_auto` - uses `AWS_SKIP_SIGNATURE=true` env var
1 parent be8dd9d commit 8cc23f3
76 files changed
Lines changed: 543 additions & 148 deletions
File tree
- benchmarks
- db-benchmark
- crates/core
- src
- examples
- datafusion-ffi-example/python/tests
- tpch
- python
- datafusion
- tests