Commit e63d29d

Merge pull request #160 from MITLibraries/TIMX-533-rework-dataset-load
TIMX 533 - Load pyarrow dataset on TIMDEXDataset init
2 parents 0a80a24 + 6f75254 commit e63d29d
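
The change named in the commit title, loading the dataset when `TIMDEXDataset` is constructed instead of through a separate `load()` call, can be pictured with a small stand-alone sketch. The class below is hypothetical and is not the library's implementation: `EagerDataset`, `_discover`, and the `files` attribute are invented for illustration, and plain file discovery with `pathlib` stands in for building a pyarrow dataset.

```python
from pathlib import Path


class EagerDataset:
    """Hypothetical stand-in for a dataset that loads itself on init."""

    def __init__(self, location: str) -> None:
        self.location = Path(location)
        # Discovery runs during construction, so callers never need
        # (and cannot forget) a separate load() step.
        self.files = self._discover()

    def _discover(self) -> list[Path]:
        # Recursively collect parquet files; sorted for a stable order.
        return sorted(self.location.rglob("*.parquet"))
```

Under these assumptions, `EagerDataset("/path/to/dataset").files` is populated as soon as the object exists. The likely trade-off of this pattern is that construction does the listing work up front, in exchange for never exposing an object in an unloaded state.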

8 files changed

Lines changed: 346 additions & 598 deletions

README.md

Lines changed: 0 additions & 6 deletions
@@ -110,12 +110,6 @@ timdex_dataset = TIMDEXDataset("s3://my-bucket/path/to/dataset")
 
 # or, local dataset (e.g. testing or development)
 timdex_dataset = TIMDEXDataset("/path/to/dataset")
-
-# load the dataset, which discovers all parquet files
-timdex_dataset.load()
-
-# or, load the dataset but ensure that only current records are ever yielded
-timdex_dataset.load(current_records=True)
 ```
 
 All read methods for `TIMDEXDataset` allow for the same group of filters which are defined in `timdex_dataset_api.dataset.DatasetFilters`. Examples are shown below.

pyproject.toml

Lines changed: 3 additions & 1 deletion
@@ -54,7 +54,7 @@ line-length = 90
 [tool.mypy]
 disallow_untyped_calls = true
 disallow_untyped_defs = true
-exclude = ["tests/", "output/"]
+exclude = ["tests/", "output/", "migrations/"]
 
 [[tool.mypy.overrides]]
 module = []
@@ -95,6 +95,8 @@ ignore = [
     "PLR0915",
     "S321",
     "S608",
+    "TD002",
+    "TD003",
     "TRY003"
 ]
 

tests/conftest.py

Lines changed: 0 additions & 7 deletions
@@ -82,7 +82,6 @@ def timdex_dataset(tmp_path, timdex_dataset_config) -> TIMDEXDataset:
         ),
         write_append_deltas=False,
     )
-    dataset.load()
     return dataset
 
 
@@ -110,8 +109,6 @@ def timdex_dataset_multi_source(tmp_path) -> TIMDEXDataset:
         ),
         write_append_deltas=False,
     )
-
-    dataset.load()
     return dataset
 
 
@@ -165,8 +162,6 @@ def timdex_dataset_with_runs(tmp_path, timdex_dataset_config_small) -> TIMDEXDat
         ),
         write_append_deltas=False,
     )
-
-    dataset.load()
     return dataset
 
 
@@ -202,8 +197,6 @@ def timdex_dataset_same_day_runs(tmp_path) -> TIMDEXDataset:
         ),
         write_append_deltas=False,
     )
-
-    dataset.load()
     return dataset
 
 