# Catalog service migration (DWH-317)

Spellbook's underlying Trino catalogs have moved from Dune's self-hosted Hive Metastore (HMS) to the Dune catalog service. The migration is transparent for most spells, but a few things changed that you should be aware of when reading or contributing to the repo.

## What changed in this repo

Every spell now emits an additional set of Dune table properties on each run. The shared post-hook macros (`mark_as_spell`, `expose_spells`, `hide_spells`, `expose_dataset`) have been updated to always set:

- `dune.created_by='dbt_spellbook'`: a provenance tag that lets the catalog service distinguish dbt-authored spells from rows written by other producers (sqlmesh, `dunectl migrate`, etc.).
- `dune.public`: explicitly `'true'` for spells created via `mark_as_spell`, `expose_spells`, or `expose_dataset`, and `'false'` for spells created via `hide_spells`. This used to be implicit and relied on HMS defaults.
- `dune.visible`: explicitly `'false'` by default and `'true'` for spells exposed via `expose_spells` or `expose_dataset`. In sandpit it is always `'false'`, regardless of the macro. This flag drives Data Explorer discoverability.

You do not need to change existing models. The baseline post-hook chain in every `dbt_project.yml` (`set_trino_session_property` → `optimize_spell` → `mark_as_spell`) now sets these properties on every spell automatically. Per-model `post_hook='{{ expose_spells(...) }}'` or `post_hook='{{ hide_spells() }}'` calls override the baseline, as before.
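
The combined effect can be modelled as a sequence of property writes in which the last write wins. The Python below is a minimal sketch of that layering, not the macro code: the values come from the property descriptions in this doc, `MACRO_PROPERTIES` and `effective_properties` are hypothetical names, and it assumes each macro overwrites the keys it sets, that per-model hooks run after the `dbt_project.yml` chain (dbt runs hooks from a model's config after those from `dbt_project.yml`), and that the sandpit rule is applied last.

```python
# Hypothetical model of how post-hooks layer Dune table properties.
# Values mirror the property descriptions in this doc; this is not the macro code.
MACRO_PROPERTIES = {
    "mark_as_spell": {"dune.public": "true", "dune.visible": "false"},
    "expose_spells": {"dune.public": "true", "dune.visible": "true"},
    "hide_spells": {"dune.public": "false", "dune.visible": "false"},
    "expose_dataset": {"dune.public": "true", "dune.visible": "true"},
}


def effective_properties(model_hooks, sandpit=False):
    """Return the properties a spell ends up with after all post-hooks run.

    Assumes the baseline chain ends in mark_as_spell and that per-model hooks
    run afterwards, so their writes overwrite the baseline's.
    """
    props = {}
    for macro in ["mark_as_spell", *model_hooks]:
        # Every macro tags provenance; later writes overwrite earlier ones.
        props["dune.created_by"] = "dbt_spellbook"
        props.update(MACRO_PROPERTIES[macro])
    if sandpit:
        # Sandpit never surfaces spells in Data Explorer.
        props["dune.visible"] = "false"
    return props


# The same expose_spells model built in prod and in sandpit.
print(effective_properties(["expose_spells"])["dune.visible"])  # true
print(effective_properties(["expose_spells"], sandpit=True)["dune.visible"])  # false
```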

## Per-model opt-in patterns (reminder)

| Macro | `dune.public` | `dune.visible` | Use when |
| --- | --- | --- | --- |
| `mark_as_spell` (default, automatic) | `true` | `false` | Intermediate and chain-specific tables |
| `expose_spells(...)` (per-model `post_hook`) | `true` | `true` | Sector spells surfaced in Data Explorer |
| `hide_spells()` (per-model `post_hook`) | `false` | `false` | Internal spells that must stay private |
| `expose_dataset(...)` (per-model `post_hook`) | `true` | `true` | Third-party datasets |

`dune.public='true'` means anyone can query the table; `dune.visible='true'` means it shows up in Data Explorer. The two flags are independent: a default `mark_as_spell` spell is queryable by anyone but not listed in Data Explorer.
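
Since the flags are stored as the strings `'true'` and `'false'`, code that reads them should parse them explicitly (in Python the non-empty string `'false'` is truthy) and treat each flag on its own. A minimal sketch with hypothetical helper names `parse_flag` and `access_summary`:

```python
def parse_flag(props, key):
    """Parse a 'true'/'false' string property into a bool, rejecting anything else."""
    value = props.get(key)
    if value not in ("true", "false"):
        raise ValueError(f"{key} must be 'true' or 'false', got {value!r}")
    return value == "true"


def access_summary(props):
    """Describe who can query a spell and whether Data Explorer lists it."""
    public = parse_flag(props, "dune.public")    # who can query the table
    visible = parse_flag(props, "dune.visible")  # whether Data Explorer lists it
    who = "anyone can query" if public else "private"
    where = "listed in Data Explorer" if visible else "not listed in Data Explorer"
    return f"{who}, {where}"


# Default mark_as_spell spell: queryable, but not discoverable.
print(access_summary({"dune.public": "true", "dune.visible": "false"}))
```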

## What changed outside this repo

Some configuration cannot be updated via this repo because it lives in dbt Cloud job definitions and CI runner secrets. The following items were changed out-of-band as part of the migration and are documented here for reproducibility.

### dbt Cloud profile (`dunesql`)

The `dunesql` profile is defined in the dbt Cloud account connection (not in any `profiles.yml` in this repo). Its Trino connection was updated to target the migrated clusters:

- `database`: `hive_catalog_svc` (was `hive` on the pre-migration cluster).
- `host`: the migrated spellbook cluster for the job (`trino-spellbook-cd.prod.internal.dunetech.io`, `trino-spellbook-daily.prod.internal.dunetech.io`, etc.).
- `http_headers.X-Trino-Client-Tags`: `routingGroup=spellbook-<cluster>`, matching the job's cluster.
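
The three values are coupled: `host` and the routing tag both name the job's cluster. A minimal Python sketch of just these keys, assuming every migrated cluster follows the `trino-spellbook-<cluster>.prod.internal.dunetech.io` pattern of the two examples above; `dunesql_output` is a hypothetical helper, and a real dbt-trino output needs further keys (such as `type`, `schema`, and credentials) that are left out here.

```python
def dunesql_output(cluster):
    """Return the migration-relevant keys of a dunesql profile output.

    Assumes the host pattern seen for the cd and daily clusters holds for every
    migrated spellbook cluster. Other required dbt-trino keys are omitted.
    """
    return {
        "database": "hive_catalog_svc",  # was "hive" before the migration
        "host": f"trino-spellbook-{cluster}.prod.internal.dunetech.io",
        "http_headers": {
            # The routing group must name the same cluster as the host.
            "X-Trino-Client-Tags": f"routingGroup=spellbook-{cluster}",
        },
    }


print(dunesql_output("daily")["host"])  # trino-spellbook-daily.prod.internal.dunetech.io
```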

On the migrated clusters, the Trino catalogs `hive` and `delta_prod` are aliases for the same catalog-service-backed catalog, so both names resolve to the same rows. Models can keep referencing sources and refs without changing their schema qualification.

### CI runner `~/.dbt/profiles.yml`

The `dunesql` profile used by GitHub Actions (`.github/workflows/dbt_run.yml` invokes `dbt ... --profile dunesql`) is provisioned into `$HOME/.dbt/profiles.yml` by the runner image or secret at job start. Its `database` and `host` values follow the same pattern as the dbt Cloud profile above. Update the two profiles together when cluster names change.
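
Because the CI profile is provisioned outside the repo, drift between it and the dbt Cloud profile is easy to miss. A hypothetical consistency check, assuming the profile output has already been loaded into a dict (YAML loading not shown) and the same host pattern as for the dbt Cloud profile; `profile_problems` is an illustrative name, and it checks the routing tag only when the output sets one.

```python
import re

# Assumed host pattern for migrated spellbook clusters (based on the cd/daily examples).
HOST_RE = re.compile(r"^trino-spellbook-([a-z0-9-]+)\.prod\.internal\.dunetech\.io$")


def profile_problems(output):
    """Return mismatches found in a loaded dunesql output dict (empty if consistent)."""
    problems = []
    if output.get("database") != "hive_catalog_svc":
        problems.append(f"database is {output.get('database')!r}, expected 'hive_catalog_svc'")
    match = HOST_RE.match(output.get("host") or "")
    if match is None:
        problems.append(f"host {output.get('host')!r} is not a migrated spellbook cluster")
        return problems
    cluster = match.group(1)
    # Only compare the routing tag if the profile defines one.
    tag = (output.get("http_headers") or {}).get("X-Trino-Client-Tags")
    if tag is not None and tag != f"routingGroup=spellbook-{cluster}":
        problems.append(f"routing tag {tag!r} does not match cluster {cluster!r}")
    return problems


# A profile that still points at the old catalog and has a mismatched tag.
stale = {
    "database": "hive",
    "host": "trino-spellbook-daily.prod.internal.dunetech.io",
    "http_headers": {"X-Trino-Client-Tags": "routingGroup=spellbook-cd"},
}
for problem in profile_problems(stale):
    print(problem)
```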

### Sandpit profile (in-repo)

The sandpit profile (`dbt_subprojects/*/profiles.yml`, key `spellbook-sandpit`) was already updated in-repo: `database: hive_catalog_svc` and `X-Trino-Client-Tags: routingGroup=spellbook-sandpit`. No further action needed.

## Verifying a new spell lands correctly

After a dbt run against prod or sandpit, the spell should appear in the catalog service with the expected properties. Query the catalog service directly from any migrated cluster:

```sql
SELECT extra_properties
FROM system.metadata.table_properties
WHERE catalog_name = 'hive_catalog_svc'
  AND schema_name = '<your_schema>'
  AND table_name = '<your_table>';
```

Expected keys on a `mark_as_spell`-only spell: `dune.created_by`, `dune.public`, `dune.visible`, `dune.data_explorer.category`, and (for tables) `dune.vacuum`.
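
The key check can be scripted once the row is fetched. A minimal sketch, assuming the `extra_properties` value reaches Python as a string-to-string mapping (the exact type depends on the client and is not confirmed here); `spell_property_problems` is a hypothetical helper, and the `dune.vacuum` rule follows the "(for tables)" note.

```python
# Keys every mark_as_spell-only spell should carry, per this doc.
REQUIRED_KEYS = {
    "dune.created_by",
    "dune.public",
    "dune.visible",
    "dune.data_explorer.category",
}


def spell_property_problems(extra_properties, is_table=True):
    """List problems with a spell's extra_properties mapping (empty list if fine)."""
    expected = set(REQUIRED_KEYS)
    if is_table:
        expected.add("dune.vacuum")  # per this doc, only tables carry dune.vacuum
    problems = [f"missing {key}" for key in sorted(expected - set(extra_properties))]
    # A present but wrong provenance tag means another producer wrote the row.
    created_by = extra_properties.get("dune.created_by")
    if created_by is not None and created_by != "dbt_spellbook":
        problems.append(f"dune.created_by is {created_by!r}, expected 'dbt_spellbook'")
    return problems


row = {"dune.created_by": "dbt_spellbook", "dune.public": "true", "dune.visible": "false"}
print(spell_property_problems(row))
# ['missing dune.data_explorer.category', 'missing dune.vacuum']
```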

## Context

The migration tracking issue is DWH-317 in Linear. See the arrakis-jobs and core repos for the cluster and plugin-side changes that back this work.