Commit 37c9275

Property for ETL records data
Why these changes are being introduced:

This is a small change now that will lead to a larger change later. The TIMDEX dataset is gaining more structure, so a TIMDEXDataset instance will be initialized with the root of the dataset and will then be more opinionated internally about where files are read from and written to.

How this addresses that need:

A new property 'data_records_root' is added to TIMDEXDataset that mirrors similar properties in TIMDEXDatasetMetadata. It tells any operation that reads or writes ETL records precisely where those records live in the dataset. At this time only .write() uses it, but in a future ticket the load method will be heavily reworked (if not outright removed) and this property will be fully integrated. This is needed now to continue updates to TIMDEXDatasetMetadata for TIMX-530.

Side effects of this change:
* Initialization of TIMDEXDataset should provide the true dataset root, not point to /data/records. The pipeline lambda currently points to /data/records, but will be updated in TIMX-531.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-530
* https://mitlibraries.atlassian.net/browse/TIMX-531
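The side effect described above can be illustrated with a minimal sketch. `DatasetSketch` is a hypothetical stand-in, not the real TIMDEXDataset, and the `s3://example-bucket/...` paths are made up for illustration; only the property expression is taken from this commit.

```python
class DatasetSketch:
    """Hypothetical stand-in for TIMDEXDataset; only `location` is modeled."""

    def __init__(self, location: str) -> None:
        self.location = location

    @property
    def data_records_root(self) -> str:
        # Same expression as the property added in this commit.
        return f"{self.location.removesuffix('/')}/data/records"


# A true dataset root resolves to the same records path
# with or without a single trailing slash.
root = DatasetSketch("s3://example-bucket/dataset").data_records_root
root_with_slash = DatasetSketch("s3://example-bucket/dataset/").data_records_root

# A caller that still points at /data/records gets the suffix appended again.
legacy = DatasetSketch("s3://example-bucket/dataset/data/records").data_records_root

print(root)
print(root_with_slash)
print(legacy)
```

Under these assumptions, a caller that has not yet switched to the dataset root would write under a doubly nested `data/records/data/records` path, which is presumably why the pipeline lambda update is tracked in TIMX-531.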
1 parent ff2aff0 commit 37c9275

1 file changed

Lines changed: 5 additions & 1 deletion

timdex_dataset_api/dataset.py

@@ -126,6 +126,10 @@ def __init__(
         # writing
         self._written_files: list[ds.WrittenFile] = None  # type: ignore[assignment]
 
+    @property
+    def data_records_root(self) -> str:
+        return f"{self.location.removesuffix('/')}/data/records"  # type: ignore[union-attr]
+
     @property
     def row_count(self) -> int:
         """Get row count from loaded dataset."""
@@ -370,7 +374,7 @@ def write(
         start_time = time.perf_counter()
         self._written_files = []
 
-        dataset_filesystem, dataset_path = self.parse_location(self.location)
+        dataset_filesystem, dataset_path = self.parse_location(self.data_records_root)
         if isinstance(dataset_path, list):
             raise TypeError(
                 "Dataset location must be the root of a single dataset for writing"
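The `# type: ignore[union-attr]` on the new property suggests that `location` is not always a plain string; the `isinstance(dataset_path, list)` check in `write()` hints that it may also be a list of paths. Assuming that is the case, the sketch below uses a hypothetical stand-in class (not the real TIMDEXDataset) to show how the property itself would behave when given a list.

```python
class ListLocationSketch:
    """Hypothetical stand-in; `location` is assumed to be a str or a list of str."""

    def __init__(self, location) -> None:
        self.location = location

    @property
    def data_records_root(self) -> str:
        # Same expression as the property added in this commit.
        return f"{self.location.removesuffix('/')}/data/records"


def describe_root(location) -> str:
    """Return the resolved records root, or the error the property raises."""
    try:
        return ListLocationSketch(location).data_records_root
    except AttributeError as exc:
        return f"AttributeError: {exc}"


string_result = describe_root("/tmp/dataset")
list_result = describe_root(["/tmp/dataset/a.parquet", "/tmp/dataset/b.parquet"])

print(string_result)
print(list_result)
```

If the assumption holds, a list location would fail inside the property with an `AttributeError` before `write()` reaches its own `TypeError` check, so callers would see a less descriptive error than the one in the diff.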
