Commit 25d4430
current_records view relies on temp table
Why these changes are being introduced:
The current_records view has always been a performance and resource
bottleneck. Moving to metadata and SQL has helped, but one kink remained
in how that metadata is used for data retrieval.
We often materialize a query into a pandas dataframe and use it to
drive data retrieval. At that point there is no benefit to
current_records being a view, since we are going to materialize the
data anyhow.
How this addresses that need:
Utilizing a DuckDB temp table, we take a small performance hit on
TIMDEXDatasetMetadata load, but then have near instant current_records
queries thereafter.
Additionally, we remove ordering from the metadata query used for data
retrieval and sort in memory on the pandas dataframe instead. The
dataframe is often quite small, but even when it is large, sorting it
there is more efficient because the data is already in Python memory.
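
A small sketch of sorting in pandas rather than in SQL. The column names
(`filename`, `run_record_offset`, `record_id`) and the sort keys are
assumptions chosen for illustration, not necessarily those used in this
commit.

```python
import pandas as pd

# Hypothetical result of a metadata query issued without ORDER BY.
meta_df = pd.DataFrame({
    "filename": ["b.parquet", "a.parquet", "a.parquet"],
    "run_record_offset": [0, 5, 2],
    "record_id": ["r3", "r2", "r1"],
})

# Sort once in memory, after the rows have been materialized.
ordered = meta_df.sort_values(
    ["filename", "run_record_offset"]
).reset_index(drop=True)
print(ordered["record_id"].tolist())
```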
This will also set the stage for performing just-in-time metadata
queries as chunks before data retrieval, versus pulling all metadata
rows in one query and then chunking that in memory.
Side effects of this change:
* Quicker metadata queries, at the cost of a small performance hit on
load. Memory usage appears to be similar.
Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-543
1 parent b7c3350 commit 25d4430
3 files changed
Lines changed: 66 additions & 28 deletions