343 commits
8c0d1e4
reprocess by id
Dragos0000 May 22, 2025
cf65e6c
reprocess dags and label change for dags
Dragos0000 May 23, 2025
c54c423
adding examples to reprocess by status
Dragos0000 May 26, 2025
b8a187e
adding examples to reprocess by status
Dragos0000 May 26, 2025
e543554
change step for reprocessing
Dragos0000 May 27, 2025
b8692f3
fix: Fix tests after merge
duprijil May 27, 2025
726ac53
Merge pull request #583 from OP-TED/feature/SWS2-18
duprijil May 27, 2025
90cb7c2
fix: Fix problem after merger
duprijil May 27, 2025
34b08ef
Merge branch 'develop' into feature/SWS2-24
duprijil May 27, 2025
fa46d7e
fix: Fix problem with running unnecessary scheduled dag run on DAG un…
duprijil May 27, 2025
0cf76de
fix: Make notice daily fetcher being hard failing
duprijil May 27, 2025
dec7e7d
fix: Fix problem with running unnecessary scheduled dag run
duprijil May 27, 2025
50c5e49
feat!: Make MP processor be more hard fail
duprijil May 28, 2025
dee722d
fix: Solve failing ted api tests with response status
duprijil May 28, 2025
350dacd
feat: Implement retreiving success notice statuses from airflow or .e…
duprijil Jul 21, 2025
0aae224
feat: implement logic of failing dags based on noticess success statuses
duprijil Jul 23, 2025
c00b5d4
tests:Implement FAT for TEDSWS-184
duprijil Jul 24, 2025
456336e
Update notice_processing_pipeline.py
duprijil Jul 25, 2025
f0562d6
Merge pull request #2 from meaningfy-ws/feature/SWS2-24
Dragos0000 Jul 25, 2025
e585f63
new structure test
Dragos0000 Jul 30, 2025
961fa24
fix for testing
Dragos0000 Jul 30, 2025
5456639
fix for testing
Dragos0000 Jul 30, 2025
ab9329a
fix for testing dags path
Dragos0000 Jul 30, 2025
82c86da
sonar proprieties
Dragos0000 Jul 30, 2025
f7f40bb
sonar changes
Dragos0000 Aug 5, 2025
929eeb9
sonar changes
Dragos0000 Aug 5, 2025
d1fcbc1
make file changes
Dragos0000 Aug 5, 2025
80cff94
make file changes
Dragos0000 Aug 5, 2025
fc944a3
mapping resources
Dragos0000 Aug 5, 2025
bb88598
queries resources
Dragos0000 Aug 5, 2025
dfb8a74
test fix
Dragos0000 Aug 11, 2025
2a2dfd4
Merge pull request #5 from meaningfy-ws/feature/restructure
duprijil Aug 11, 2025
aac74f0
chore: add test reports to gitignore
duprijil Aug 11, 2025
066c7af
docs: Add FAT report for TED SWS pipeline version 2.3.0
duprijil Aug 11, 2025
c5fdb5f
build: Implement solution for TEDSWS-192
duprijil Aug 11, 2025
0fa96e4
fix: Fix problem with allure reports having too big size
duprijil Aug 11, 2025
7a41c88
fix: update .gitignore with new airflow infra link
duprijil Aug 11, 2025
5628bff
fix: switch opt to home for 3rd party libs
duprijil Aug 11, 2025
1fe4471
solve problem with failing on notices with success status
duprijil Aug 11, 2025
b6904bb
solve problem with failing on notices with success status by changing…
duprijil Aug 11, 2025
b3b1632
Merge pull request #6 from meaningfy-ws/feature/SWS2-17-restructured
Dragos0000 Aug 11, 2025
833e80c
alignment with bitbucket
Dragos0000 Dec 5, 2025
d10bda6
chore: Add src/infra/airflow/libraries to .gitignore
duprijil Dec 11, 2025
88472ab
fix: Delete unnecessary variable: BATCH_SIZE
duprijil Dec 12, 2025
510b269
fix: Fix problem with multiple triggering of materialised view DAG
duprijil Dec 12, 2025
0de36a3
test: Implement tests for new logic of triggering materialised view
duprijil Dec 12, 2025
ec13c17
fix: Add trigger materialised view toggle
duprijil Dec 12, 2025
d879541
test: Add tests for trigger materialised view toggle
duprijil Dec 12, 2025
7955a18
fix: Add materialised view dag run toggle in load mapping suite dag
duprijil Dec 14, 2025
74e4068
fix: Potentailly fix the problem with finishing fetch notices by date…
duprijil Dec 14, 2025
a7b3e5c
fix: Encreasing time of task queued timeout to potentially fix task k…
duprijil Dec 14, 2025
2dc86e4
fix: Decrease number of max active runs and tasks of notice processin…
duprijil Dec 14, 2025
b588829
fix: Make materialised view dag param in fetch notice by date True by…
duprijil Dec 15, 2025
3ae0431
chore: Update VERSION from 2.3.0-rc.3 -> 2.3.0-rc.4
duprijil Dec 17, 2025
bb33da8
chore: Change trigger rule to ALL_DONE of distillation step
duprijil Dec 17, 2025
54cbe25
fix: Make pipeline services soft failing
duprijil Dec 17, 2025
408842a
Merge pull request #9 from meaningfy-ws/feature/SWS2-43
Dragos0000 Dec 19, 2025
6fcb9b0
package loader
valexande Dec 10, 2025
32a6489
config loader and update makefile for mac user
valexande Dec 10, 2025
76912ea
updates for capturing changes in mssdk
valexande Dec 11, 2025
ba3b965
mssdk integration tests for loading packages
valexande Dec 11, 2025
575444e
mssdk integration tests for saving packages
valexande Dec 11, 2025
8137d65
make the mssdk tests self-contained (no user CLI params)
valexande Dec 12, 2025
20687d5
include mssdk test data, modify a test script
valexande Dec 12, 2025
5e05ee1
refactor the mssdk integration tests, move out of features
valexande Dec 16, 2025
d0906da
Merge pull request #7 from meaningfy-ws/feature/TED9-151/TED9-155
valexande Jan 6, 2026
11765d9
Merge pull request #8 from meaningfy-ws/feature/TED9-151/TED9-154
valexande Jan 6, 2026
15e5b3e
Merge pull request #17 from meaningfy-ws/feature/TED9-151/TED9-155
valexande Jan 6, 2026
dce5369
chore: update gitignore for arbitrary venv suffixes
valexande Dec 17, 2025
d6c569c
chore: move mssdk to main requirements file
valexande Dec 18, 2025
16b3941
refactor: rename MappingSuite to MappingPackage verbatim
valexande Dec 19, 2025
9ad66e1
refactor: rename MappingSuite*->MappingPackage* prefixed symbols
valexande Dec 19, 2025
adaa4c1
refactor: rename suite to package, strings and variables
valexande Dec 24, 2025
0eae9d6
refactor: rename more mapping_suite symbols to mapping_package
valexande Dec 25, 2025
ed57934
refactor: rename mapping_suite symbols to mapping_package, test code
valexande Dec 25, 2025
65b14cb
refactor(docs): update mention of MappingSuiteEvent
schivmeister Jan 14, 2026
f75f46b
refactor: mapping_suite->mapping_package symbol renames, feature tests
schivmeister Jan 14, 2026
54130d6
refactor: yet more suite->package symbol renames in test code
schivmeister Jan 14, 2026
0f072fe
refactor: more suite->package symbol renames in test data
schivmeister Jan 15, 2026
e605671
revert metabase export file to original state
schivmeister Jan 15, 2026
d435c11
refactor: suite->package renames among files/modules
schivmeister Jan 16, 2026
0f5ea81
feat: support for loading project (Mapping Suite) configuration
schivmeister Jan 16, 2026
1313df1
test: add unit test for optional MS config load and save
schivmeister Jan 19, 2026
c5bebe9
fix(test): unit test for super call should mock parent function
schivmeister Jan 19, 2026
0a21a51
refactor(test): remove unneeded use of MagicMock, prefer Mock
schivmeister Jan 19, 2026
6c3a988
Merge pull request #20 from meaningfy-ws/feature/TED9-167_extend-gith…
schivmeister Jan 25, 2026
17cdfd6
Merge pull request #10 from meaningfy-ws/feature/TED9-46_refactor-pip…
schivmeister Jan 25, 2026
5de5502
add consolidated test workflow and vault connectivity check
twicechild Jan 22, 2026
5632f9d
consolidate CI into single workflow, remove self-hosted runner
twicechild Jan 22, 2026
579d18b
fix branch pattern to match nested paths
twicechild Jan 22, 2026
128b16c
test: try self-hosted runner for comparison
twicechild Jan 22, 2026
5f6c09a
skip installs if tools already present on runner
twicechild Jan 22, 2026
dd9d7d2
split workflow: unit tests on push, full tests on PR
twicechild Jan 22, 2026
dfa2294
consolidate workflow with conditional runner and test commands
twicechild Jan 22, 2026
6ef96b0
update SonarCloud action to v6 for security patches
twicechild Jan 22, 2026
f30f932
use python -m tox to avoid system tox conflict on self-hosted
twicechild Jan 22, 2026
2bad73a
add develop to PR trigger branches
twicechild Jan 22, 2026
9c5e227
Rename unit-tests.yml to tests.yml
twicechild Jan 26, 2026
31df76a
Add TED-SWS deployment workflow (Phase 1 - Traefik)
twicechild Jan 25, 2026
ddbdbeb
Update deployment workflow to use a single runner
twicechild Jan 25, 2026
b8bce9d
Remove prerequisites install, now managed by Ansible
twicechild Jan 25, 2026
4a57697
Use /opt/tedsws as deployment directory with clean: false
twicechild Jan 25, 2026
2003538
Use default workspace with clean: false for persistence
twicechild Jan 25, 2026
77cc9d0
Add Phase 2: MongoDB + Fuseki deployment, pin AllegroGraph to v8.4.3
twicechild Jan 25, 2026
8d8a12e
Use Fuseki base image directly, remove Dockerfile build
twicechild Jan 25, 2026
ce929b1
Add Phase 3: Airflow deployment
twicechild Jan 25, 2026
3907e06
Use /opt/tedsws for Airflow, mount libraries at runtime
twicechild Jan 25, 2026
a12666f
Add Phase 4: MinIO, AllegroGraph, Digest API, Metabase
twicechild Jan 25, 2026
50cfd26
Fix digest_api env_file path
twicechild Jan 25, 2026
e715261
Add create-env-digest-api before start
twicechild Jan 25, 2026
60cfddc
Fix digest_api: use src/ted_sws path
twicechild Jan 25, 2026
f6ffa23
Fix deepdiff version: 8.5.0 does not exist
twicechild Jan 25, 2026
a34f4b5
Move all health checks to end of workflow
twicechild Jan 25, 2026
7378a3d
Disable Digest API (needs Python 3.10+ fix)
twicechild Jan 25, 2026
72b19c1
Upgrade Traefik v2.5 to v3.3 (Docker API compat)
twicechild Jan 25, 2026
0efcc55
Add restart policies, inline redirect middleware
twicechild Jan 25, 2026
02d6dae
Refactor into callable workflows: infrastructure, pipeline
twicechild Jan 25, 2026
4cdb78c
Use configurable TRAEFIK_DATA_PATH for letsencrypt
twicechild Jan 25, 2026
df82d81
Remove obsolete version attrs, fix AIRFLOW_INFRA_FOLDER path
twicechild Jan 25, 2026
861cfbb
Fix AIRFLOW_INFRA_FOLDER override in repo root .env
twicechild Jan 25, 2026
e2d9716
Fix .env symlink: copy after build-airflow
twicechild Jan 25, 2026
e5298c2
Replace make build-airflow with direct docker commands
twicechild Jan 25, 2026
de28de6
Consolidate env and directory setup into single step
twicechild Jan 25, 2026
a94b824
Add conditional job execution with paths-filter
twicechild Jan 25, 2026
e918fb6
Add cleanup step and Makefile to trigger paths
twicechild Jan 25, 2026
aae3f6c
Add concurrency control and job timeouts
twicechild Jan 25, 2026
726be3b
Add memory limits: Traefik 128M, Mongo Express 128M, MinIO 256M, Post…
twicechild Jan 25, 2026
5075cf4
Rename deploy-ted-sws.yml to deploy.yml
twicechild Jan 25, 2026
d5071ee
Fix paths-filter to compare against previous commit on branch
twicechild Jan 25, 2026
c694321
Fix paths filter to match deploy.yml
twicechild Jan 25, 2026
80aa006
Always run health checks
twicechild Jan 25, 2026
42bdd28
Test: verify stack filter triggers
twicechild Jan 25, 2026
d78d5b8
Remove test comment
twicechild Jan 25, 2026
5e7a32c
Use tedsws-staging subdomain
twicechild Jan 26, 2026
dc03530
Fix redirect@file to redirect@docker (dynamic.yaml removed)
twicechild Jan 26, 2026
10e1725
Fix libraries mount overwriting /home/airflow (breaks pip installs)
twicechild Jan 26, 2026
f785ec6
Remove workflow_dispatch trigger
twicechild Jan 26, 2026
dc24ec1
Revert deepdiff to 8.5.0
twicechild Jan 26, 2026
1f39a8f
Remove OPS-16-notes.md from gitignore
twicechild Jan 26, 2026
6666656
Fix Airflow logs permission issue by setting AIRFLOW_UID to runner UID
twicechild Jan 27, 2026
8eef013
Integrate tests into deploy workflow, run full tests before DAG deplo…
twicechild Jan 27, 2026
ef84c4f
Temporarily skip tests in deploy workflow (srv runner offline)
twicechild Jan 27, 2026
4e581cf
Use local MongoDB container URL instead of external DNS in staging
twicechild Jan 27, 2026
8ebe2f5
Remove unused AllegroGraph from deployment (project uses Fuseki)
twicechild Jan 27, 2026
b094fa7
Use tedsws-staging runner for full tests, re-enable tests in deploy
twicechild Jan 27, 2026
362c1fe
Fix MongoDB connection for tests: use localhost:27018 instead of cont…
twicechild Jan 27, 2026
5c3ee3c
Change deploy trigger from feature branch to develop
twicechild Jan 27, 2026
d04b1eb
fix(staging): use internal Docker DNS and add concurrency limits
twicechild Feb 3, 2026
c96cb31
feat(infra): add SFTP service to deployment stack
twicechild Feb 9, 2026
48aafc7
fix(infra): set SFTP port to 22 for internal Docker networking
twicechild Feb 9, 2026
a5a9742
Revert "fix(infra): set SFTP port to 22 for internal Docker networking"
twicechild Feb 9, 2026
27c933a
fix(infra): configure SFTP port for internal Docker networking
twicechild Feb 9, 2026
f36497f
fix(infra): set SFTP port before starting Airflow
twicechild Feb 9, 2026
ff43f15
fix(infra): hardcode SFTP external port to 2235
twicechild Feb 9, 2026
e517317
fix(ci): resolve Docker container hostnames via bridge IPs instead of…
twicechild Feb 18, 2026
a9eb317
fix(ci): override SFTP port and S3 SSL for direct container access, a…
twicechild Feb 18, 2026
e07d1f6
data: update test data for MS Config according to latest model
schivmeister Feb 16, 2026
2fae1ce
fix: MS Config loaded by config dir instead of project dir
schivmeister Feb 17, 2026
b1983f4
fix(test): update broken MSSDK integration test for MS Config
schivmeister Feb 17, 2026
251dd20
Merge pull request #28 from meaningfy-ws/feature/OPS-4/full-tests-flo…
schivmeister Feb 18, 2026
fedf48a
Merge pull request #27 from meaningfy-ws/feature/TED9-46/fix-msconfig…
schivmeister Feb 18, 2026
bed01e5
Merge pull request #26 from meaningfy-ws/feature/TED9-46/update-mssdk…
schivmeister Feb 18, 2026
4e497f0
feat(infra): add unified TED-SWS docker stack
twicechild Feb 9, 2026
579876e
feat(make): add unified stack targets and staging-unified-dotenv
twicechild Feb 9, 2026
8a80923
ci: update workflows for unified stack deployment
twicechild Feb 9, 2026
6d116ad
docs(infra): add README with architecture, services, and migration guide
twicechild Feb 9, 2026
c3ec25c
fix(infra): remove public SFTP port and Traefik labels from staging
twicechild Feb 9, 2026
ae1a26a
refactor(ci): align workflows with unified stack naming and env
twicechild Feb 18, 2026
8a715e4
fix(infra): use named volumes for staging data services
twicechild Feb 18, 2026
1306b93
feat(infra): add metabase-postgres to staging stack
twicechild Feb 18, 2026
df97393
fix(ci): pass .env.common to compose for variable interpolation
twicechild Feb 18, 2026
6c08aba
Merge pull request #25 from meaningfy-ws/feature/TED9-188/docker-stac…
twicechild Feb 18, 2026
4b8bb0a
fix(infra): pin mongodb and minio to staging-compatible versions
twicechild Feb 18, 2026
2d83499
fix(infra): correct TED API URL to api.ted.europa.eu
twicechild Feb 18, 2026
f3d86cb
ci(tests): add preflight checks for required endpoints
twicechild Feb 19, 2026
26a84f0
fix(ci): use nc for minio health check instead of curl
twicechild Feb 19, 2026
4c93e1c
fix(ci): check endpoint reachability not response codes
twicechild Feb 19, 2026
ad4ff36
Merge pull request #29 from meaningfy-ws/fix/TED9-188/pin-staging-dat…
twicechild Feb 19, 2026
984fa7c
fix: Fix problem with pipeline hard failing
duprijil Feb 20, 2026
36a2211
fix(infra): add shared webserver secret key, staging logs, and minio …
twicechild Feb 20, 2026
8a47c90
fix: Make notice pipeline to trigger automatically materialised view …
duprijil Feb 20, 2026
632780c
Merge pull request #31 from meaningfy-ws/feature/TED9-202/BDC47874-is…
schivmeister Feb 20, 2026
23cc54e
Merge pull request #32 from meaningfy-ws/feature/TED9-200/BDC47874-is…
schivmeister Feb 20, 2026
72e0a1f
fix(infra): correct Traefik host rule subdomain ordering for staging …
twicechild Feb 20, 2026
1917944
feat(infra): replace ecs-cli with CloudFormation for AWS ECS deployment
twicechild Feb 25, 2026
c7731a5
fix(infra): upgrade digest-api to Python 3.10 on bookworm
twicechild Feb 25, 2026
e2f1c37
chore(infra): add resource tags to all CFN templates
twicechild Feb 25, 2026
3318fa5
fix(infra): increase Airflow concurrency settings for staging environ…
twicechild Mar 2, 2026
d5a954a
Merge pull request #35 from meaningfy-ws/feature/refine-aiflow-concur…
schivmeister Mar 2, 2026
eb41148
fix: Implement mapping mechanism for reprocessing DAG
duprijil Mar 2, 2026
a031f3a
Merge branch 'develop' into feature/TED9-201/BDC47874-issue-3
duprijil Mar 2, 2026
2580635
fix: Delete except that does nothing
duprijil Mar 2, 2026
ad4a48d
Potential fix for pull request finding 'Statement has no effect'
duprijil Mar 2, 2026
9864fad
fix: Potential fix for pull request finding 'Statement has no effect'…
duprijil Mar 2, 2026
28f3dfd
fix: Fix failing test on exception that does nothing
duprijil Mar 2, 2026
54ab71a
fix: Solve comments from PR
duprijil Mar 3, 2026
0e6a20d
docs: Enhance reprocess DAG's code with docs and comments
duprijil Mar 3, 2026
bfaa7ef
Merge pull request #36 from meaningfy-ws/feature/TED9-201/BDC47874-is…
schivmeister Mar 4, 2026
ec22bd5
fix(ci): run full test workflow on all PRs and target branches
schivmeister Mar 4, 2026
f25e957
Merge pull request #39 from meaningfy-ws/feature/OPS-4/full-tests-all…
schivmeister Mar 6, 2026
4554f70
feat(infra): expose SFTP service on staging via port 2235
twicechild Mar 5, 2026
ea61fe2
feat(infra): bump staging Airflow concurrency for CCX43 upgrade
twicechild Mar 9, 2026
de984c8
Merge pull request #41 from meaningfy-ws/feat/staging-airflow-concurr…
schivmeister Mar 9, 2026
3f2f5a5
feat(infra): add testing/SRV environment to unified ted-sws-stack
twicechild Mar 9, 2026
a34a5fb
Merge pull request #42 from meaningfy-ws/feature/prepare-SRV-for-FAT-…
schivmeister Mar 9, 2026
8a23d07
chore(test): update MS config for MSSDK update of probing model
schivmeister Mar 10, 2026
2849331
Merge pull request #44 from meaningfy-ws/feature/TED9-47/msconfig-model
schivmeister Mar 10, 2026
7a4a7ee
feat(infra): add Fuseki seed data init container and healthcheck
twicechild Mar 9, 2026
62fcfda
docs(infra): document Fuseki seed data init in stack README
twicechild Mar 9, 2026
530e55a
Merge pull request #43 from meaningfy-ws/chore/fuseki-seed-data-init
schivmeister Mar 10, 2026
30e028c
fix(deps): pin development version of MSSDK to avoid stale dependencies
schivmeister Mar 11, 2026
1a3e707
Merge pull request #45 from meaningfy-ws/feature/pin-mssdk
schivmeister Mar 11, 2026
d924fc6
WIP: transition to MSSDK for package loading/saving, v2/eForms only
schivmeister Jan 23, 2026
078b08c
chore: remove unused and dupe imports
schivmeister Jan 23, 2026
afcfa8d
test: add mandatory fields, files for MSSDK model compliance
schivmeister Feb 18, 2026
d950637
feat: add post model validator to populate legacy fields and vice-versa
schivmeister Feb 19, 2026
49771c5
feat: update package repository to accept new native extended type
schivmeister Feb 19, 2026
3374147
feat!: eliminate identifier and version concatenation, DeepDiff test
schivmeister Feb 19, 2026
f0e5c06
chore: add GitHub download logging for visibility into corner cases
schivmeister Feb 20, 2026
de9927d
feat: populate important legacy data fields and unset MSSDK ones
schivmeister Feb 20, 2026
bbcfea6
fix: add an infinite recursion guard to the post model validator
schivmeister Feb 20, 2026
583e9ad
fix: original legacy field values should be retained during model sync
schivmeister Feb 20, 2026
fa21488
fix: add guard for mongo client closing by external library
schivmeister Feb 20, 2026
53a18fe
fix: post-validation sync in package model has always truthy fields
schivmeister Feb 20, 2026
4880384
fix: handle MSSDK and bytes serialization in legacy package writing
schivmeister Feb 20, 2026
6955a44
chore: remove irrelevant BROKEN comment
schivmeister Feb 20, 2026
37422d0
Revert "fix: add guard for mongo client closing by external library"
schivmeister Feb 24, 2026
7e3abd7
feat: support for all package versions through unified v3 model
schivmeister Feb 24, 2026
a73c43d
fix: add missing sync for sparql, shacl test suites
schivmeister Feb 25, 2026
fcb9368
fix(test): test data outside folders will not be loaded by MSSDK
schivmeister Feb 25, 2026
ab0b759
feat: Git shallow clone
schivmeister Feb 26, 2026
9259f3f
refactor: simplify package repository, remove package types
schivmeister Feb 27, 2026
d81b95e
chore: remove unneeded validation services and tests
schivmeister Feb 27, 2026
e78af4e
fix: guard against MSSDK package loading errors
schivmeister Mar 6, 2026
76dc968
chore: remove unneeded comments
schivmeister Mar 12, 2026
70dcca7
Merge pull request #23 from meaningfy-ws/feature/TED9-166_extend-gith…
schivmeister Mar 12, 2026
130a4ff
feat: migrate metadata resource files loading to read from MS Config
Mar 2, 2026
b09c067
chore: remove all metadata static files since migration to MS Config
Mar 2, 2026
f26fc1f
chore: add missing mongodb_client docstring
Mar 2, 2026
024f86a
fix: SF normalization with MS Config, MSSDK CSV-Pandas compatibility
Mar 3, 2026
6b6373f
fix(tests): ensure mapping suite/package is loaded for normalisation
schivmeister Mar 4, 2026
d08b0c6
fix: circular bug - decouple MappingFilesRegistry from notice, package
schivmeister Mar 4, 2026
50fa51d
fix(test): remove fake MS config fixture and logic in pipeline test
schivmeister Mar 4, 2026
fc647dd
fix(test): update mapping registry test data for probing model update
schivmeister Mar 10, 2026
8f6f564
fix: missing mongodb_client param in normalise_notice()
schivmeister Mar 10, 2026
ac98d7a
feat: make MS Config mandatory, add Airflow parameter
schivmeister Mar 11, 2026
f5dab09
fix: copy resource files/folders when msconfig branch provided
schivmeister Mar 12, 2026
65d8c5d
test(e2e): exercise config download and check for necessary files
schivmeister Mar 12, 2026
86603c2
test(unit): exercise the msconfig branch download
schivmeister Mar 12, 2026
2bfc323
Merge pull request #47 from meaningfy-ws/feature/TED9-46/parameterize…
schivmeister Mar 12, 2026
61 changes: 61 additions & 0 deletions .github/workflows/deploy-infrastructure.yml
@@ -0,0 +1,61 @@
name: Deploy Infrastructure

on:
  workflow_call:
    secrets:
      VAULT_TOKEN:
        required: true
      VAULT_ADDR:
        required: true

env:
  VAULT_TOKEN: ${{ secrets.VAULT_TOKEN }}
  VAULT_ADDR: ${{ secrets.VAULT_ADDR }}
  TRAEFIK_DATA_PATH: /opt/tedsws/traefik
  STACK_PATH: src/infra/ted-sws-stack

jobs:
  deploy:
    name: Deploy Infrastructure
    runs-on: tedsws-staging
    timeout-minutes: 30

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup environment
        run: |
          # Create .env.staging from Vault (passwords only, everything else hardcoded)
          make staging-unified-dotenv

          # Runtime-specific values (not in Vault)
          echo "AIRFLOW_UID=$(id -u)" >> ${STACK_PATH}/.env.staging
          echo "AIRFLOW_IMAGE=tedsws/airflow:staging" >> ${STACK_PATH}/.env.staging

          # Setup deployment directory (source/config only; data uses named volumes)
          sudo mkdir -p /opt/tedsws ${TRAEFIK_DATA_PATH}/letsencrypt
          sudo chown -R $USER:$USER /opt/tedsws
          mkdir -p /opt/tedsws/src /opt/tedsws/logs /opt/tedsws/plugins /opt/tedsws/test

          # Copy source files
          rsync -a --delete src/ /opt/tedsws/src/

      - name: Download libraries
        run: |
          make init-libraries
          rsync -a --delete libraries/ /opt/tedsws/libraries/

      - name: Build Airflow image
        run: |
          cp requirements.txt ${STACK_PATH}/airflow/
          docker build -t tedsws/airflow:staging ${STACK_PATH}/airflow/

      - name: Deploy stack
        run: |
          docker compose \
            -f ${STACK_PATH}/docker-compose.yml \
            -f ${STACK_PATH}/docker-compose.staging.yml \
            --env-file ${STACK_PATH}/.env.common \
            --env-file ${STACK_PATH}/.env.staging \
            up -d --force-recreate
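The "Deploy stack" step lists `.env.common` before `.env.staging`. Assuming Docker Compose's documented behavior that later `--env-file` values override earlier ones, staging values win when a key appears in both files. The sketch below imitates that ordering in plain shell by concatenating and sourcing two files, which is the approach the tests workflow takes when it builds `.env`. The file contents and the variable names `APP_MODE` and `LOG_LEVEL` are hypothetical and not part of this PR.

```shell
# Hypothetical demo: two env files where the second overrides the first.
workdir="$(mktemp -d)"
printf 'APP_MODE=common\nLOG_LEVEL=info\n' > "${workdir}/.env.common"
printf 'APP_MODE=staging\n' > "${workdir}/.env.staging"

# Concatenate in the same order the workflow lists the files. When the result
# is sourced, a later assignment to the same name replaces the earlier one.
cat "${workdir}/.env.common" "${workdir}/.env.staging" > "${workdir}/.env"

set -a                 # export every variable assigned while sourcing
. "${workdir}/.env"
set +a

echo "APP_MODE=${APP_MODE} LOG_LEVEL=${LOG_LEVEL}"
rm -rf "${workdir}"
```

Keys defined only in the common file (here `LOG_LEVEL`) keep their value, so the staging file only needs to carry the values that differ.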
43 changes: 43 additions & 0 deletions .github/workflows/deploy-pipeline.yml
@@ -0,0 +1,43 @@
name: Deploy Pipeline Code

on:
  workflow_call:
    inputs:
      restart_worker:
        description: 'Restart Airflow worker (needed if ted_sws code changed)'
        required: false
        default: false
        type: boolean

jobs:
  deploy:
    name: Deploy DAGs and Code
    runs-on: tedsws-staging
    timeout-minutes: 10

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Sync code to deployment directory
        run: |
          echo "Syncing DAGs and ted_sws to /opt/tedsws..."
          rsync -av --delete src/dags/ /opt/tedsws/src/dags/
          rsync -av --delete src/ted_sws/ /opt/tedsws/src/ted_sws/
          echo "Sync complete."

      - name: Restart Airflow worker
        if: ${{ inputs.restart_worker }}
        run: |
          echo "Restarting Airflow worker to pick up ted_sws changes..."
          docker restart airflow-worker
          echo "Worker restarted."

      - name: Verify deployment
        run: |
          echo "=== Deployed files ==="
          echo "DAGs:"
          ls -la /opt/tedsws/src/dags/ | head -10
          echo ""
          echo "ted_sws modules:"
          ls -la /opt/tedsws/src/ted_sws/ | head -10
119 changes: 119 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,119 @@
name: Deploy TED-SWS Stack

on:
  push:
    branches:
      - develop
    paths:
      - 'src/infra/**'
      - 'src/dags/**'
      - 'src/ted_sws/**'
      - 'requirements.txt'
      - 'Makefile'
      - '.github/workflows/deploy*.yml'

concurrency:
  group: deploy-${{ github.ref }}
  cancel-in-progress: false

jobs:
  changes:
    name: Detect Changes
    runs-on: tedsws-staging
    timeout-minutes: 5
    outputs:
      stack: ${{ steps.filter.outputs.stack }}
      pipeline: ${{ steps.filter.outputs.pipeline }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          base: ${{ github.ref }}
          filters: |
            stack:
              - 'src/infra/**'
              - 'requirements.txt'
              - 'Makefile'
              - '.github/workflows/deploy-infrastructure.yml'
            pipeline:
              - 'src/dags/**'
              - 'src/ted_sws/**'
              - '.github/workflows/deploy-pipeline.yml'

  stack:
    name: Deploy TED-SWS Stack
    needs: changes
    if: ${{ needs.changes.outputs.stack == 'true' }}
    uses: ./.github/workflows/deploy-infrastructure.yml
    secrets:
      VAULT_TOKEN: ${{ secrets.VAULT_TOKEN }}
      VAULT_ADDR: ${{ secrets.VAULT_ADDR }}

  tests:
    name: Run Tests
    needs: [changes, stack]
    if: ${{ always() && needs.changes.outputs.pipeline == 'true' && (needs.stack.result == 'success' || needs.stack.result == 'skipped') }}
    uses: ./.github/workflows/tests.yml
    with:
      full_tests: true
    secrets:
      VAULT_TOKEN: ${{ secrets.VAULT_TOKEN }}
      VAULT_ADDR: ${{ secrets.VAULT_ADDR }}
      SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

  pipeline:
    name: Deploy TED-SWS Pipeline
    needs: [changes, stack, tests]
    if: ${{ always() && needs.changes.outputs.pipeline == 'true' && needs.tests.result == 'success' }}
    uses: ./.github/workflows/deploy-pipeline.yml
    with:
      restart_worker: true

  health-check:
    name: Health Check
    needs: [stack, pipeline]
    if: always()
    runs-on: tedsws-staging
    timeout-minutes: 5

    steps:
      - name: Check all services
        run: |
          echo "Waiting for services to stabilize..."
          sleep 10

          failed=0

          echo "=== Checking Traefik ==="
          docker ps | grep -q "traefik" && echo "✓ Traefik" || { echo "✗ Traefik"; failed=1; }

          echo "=== Checking Data Stores ==="
          docker ps | grep -q "mongodb" && echo "✓ MongoDB" || { echo "✗ MongoDB"; failed=1; }
          docker ps | grep -q "fuseki" && echo "✓ Fuseki" || { echo "✗ Fuseki"; failed=1; }
          docker ps | grep -q "minio" && echo "✓ MinIO" || { echo "✗ MinIO"; failed=1; }
          docker ps | grep -q "sftp" && echo "✓ SFTP" || { echo "✗ SFTP"; failed=1; }

          echo "=== Checking Airflow ==="
          docker ps | grep -q "airflow-webserver" && echo "✓ Airflow webserver" || { echo "✗ Airflow webserver"; failed=1; }
          docker ps | grep -q "airflow-scheduler" && echo "✓ Airflow scheduler" || { echo "✗ Airflow scheduler"; failed=1; }
          docker ps | grep -q "airflow-worker" && echo "✓ Airflow worker" || { echo "✗ Airflow worker"; failed=1; }

          echo "=== Checking APIs ==="
          docker ps | grep -q "metabase" && echo "✓ Metabase" || { echo "✗ Metabase"; failed=1; }

          echo ""
          echo "=== All containers ==="
          docker ps --format "table {{.Names}}\t{{.Status}}"

          exit $failed

      - name: Cleanup old Docker resources
        if: always()
        run: |
          echo "=== Cleaning up old Docker resources ==="
          docker system prune -f --filter "until=24h"
          echo ""
          echo "=== Disk usage after cleanup ==="
          df -h /
          docker system df
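The `if:` conditions on the `tests` and `pipeline` jobs in deploy.yml encode a small decision table. Tests run when pipeline files changed and the stack job either succeeded or was skipped. The pipeline deploys only when tests succeeded. The `always()` call matters because a job whose `needs` include a skipped job is itself skipped by default. A minimal sketch of that logic in shell, using the hypothetical helper names `should_run_tests` and `should_deploy_pipeline` (not part of this PR):

```shell
# should_run_tests PIPELINE_CHANGED STACK_RESULT
# Mirrors: pipeline == 'true' && (stack == 'success' || stack == 'skipped')
should_run_tests() {
  [ "$1" = "true" ] && { [ "$2" = "success" ] || [ "$2" = "skipped" ]; }
}

# should_deploy_pipeline PIPELINE_CHANGED TESTS_RESULT
# Mirrors: pipeline == 'true' && tests == 'success'
should_deploy_pipeline() {
  [ "$1" = "true" ] && [ "$2" = "success" ]
}

# Walk through the possible stack outcomes for a pipeline change.
for stack_result in success skipped failure; do
  if should_run_tests true "$stack_result"; then
    echo "stack=${stack_result}: tests run"
  else
    echo "stack=${stack_result}: tests do not run"
  fi
done
```

A failed stack deployment therefore blocks both the tests and the pipeline deployment, while a change that touches only pipeline files still proceeds because the stack job reports `skipped`.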
153 changes: 153 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,153 @@
name: Tests

on:
  push:
    branches: [feature/**, hotfix/**, release/**]
  pull_request:
  workflow_call:
    inputs:
      full_tests:
        description: 'Run full test suite (including integration tests)'
        type: boolean
        default: true
    secrets:
      VAULT_TOKEN:
        required: true
      VAULT_ADDR:
        required: true
      SONAR_TOKEN:
        required: false

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  VAULT_TOKEN: ${{ secrets.VAULT_TOKEN }}
  VAULT_ADDR: ${{ secrets.VAULT_ADDR }}

jobs:
  test:
    name: ${{ (github.event_name == 'pull_request' || inputs.full_tests) && 'Full Tests' || 'Unit Tests' }}
    runs-on: ${{ (github.event_name == 'pull_request' || inputs.full_tests) && 'tedsws-staging' || 'ubuntu-latest' }}

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Set up Java
        if: github.event_name == 'pull_request' || inputs.full_tests
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '11'

      - name: Install Vault
        if: github.event_name == 'pull_request' || inputs.full_tests
        run: |
          if ! command -v vault &> /dev/null; then
            wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
            echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
            sudo apt-get update && sudo apt-get install -y vault
          else
            echo "Vault already installed: $(vault --version)"
          fi

      - name: Install system dependencies
        run: |
          if ! dpkg -s libssl-dev &> /dev/null; then
            sudo apt-get update && sudo apt-get install -y libssl-dev libcurl4-openssl-dev
          else
            echo "System dependencies already installed"
          fi

      - name: Install Python dependencies
        run: |
          python -m pip install --upgrade setuptools pip wheel
          make install
          make install-dev

      - name: Resolve Docker container hostnames
        if: github.event_name == 'pull_request' || inputs.full_tests
        run: |
          # Add /etc/hosts entries for all containers on the unified stack network
          for entry in $(docker network inspect tedsws-internal -f '{{range .Containers}}{{.Name}}:{{.IPv4Address}} {{end}}'); do
            name="${entry%%:*}"
            ip="${entry#*:}"
            ip="${ip%/*}"
            echo "$ip $name" | sudo tee -a /etc/hosts
          done
          echo "=== /etc/hosts entries ==="
          grep -vE "^#|^$|localhost" /etc/hosts | tail -20
          echo "=== Connectivity check (Docker services) ==="
          FAIL=0
          curl -sf --max-time 5 http://fuseki:3030 > /dev/null && echo "OK: fuseki:3030" || { echo "FAIL: fuseki:3030"; FAIL=1; }
          nc -z -w5 minio 9000 && echo "OK: minio:9000" || { echo "FAIL: minio:9000"; FAIL=1; }
          nc -z -w5 mongodb 27017 && echo "OK: mongodb:27017" || { echo "FAIL: mongodb:27017"; FAIL=1; }
          nc -z -w5 sftp 22 && echo "OK: sftp:22" || { echo "FAIL: sftp:22"; FAIL=1; }
          if [ "$FAIL" -eq 1 ]; then
            echo "::error::Required Docker services are not reachable"
            exit 1
          fi

      - name: Create env file
        run: |
          if [ "${{ github.event_name }}" == "pull_request" ] || [ "${{ inputs.full_tests }}" == "true" ]; then
            # Merge app config (.env.common) + Vault secrets (.env.staging)
            make staging-unified-dotenv
            cat src/infra/ted-sws-stack/.env.common src/infra/ted-sws-stack/.env.staging > .env
            # Use host-local libraries (downloaded by make init-libraries)
            echo "RML_MAPPER_PATH=$(pwd)/libraries/.rmlmapper/rmlmapper.jar" >> .env
            echo "XML_PROCESSOR_PATH=$(pwd)/libraries/.saxon/saxon-he-10.9.jar" >> .env
            echo "LIMES_ALIGNMENT_PATH=$(pwd)/libraries/.limes/limes.jar" >> .env
          else
            echo "VAULT_TOKEN=${{ secrets.VAULT_TOKEN }}" >> .env
            echo "VAULT_ADDR=${{ secrets.VAULT_ADDR }}" >> .env
          fi

      - name: Preflight check (external endpoints)
        if: github.event_name == 'pull_request' || inputs.full_tests
        run: |
          set -a && source .env && set +a
          FAIL=0
          curl -so /dev/null --max-time 5 "$TED_API_URL" && echo "OK: TED API ($TED_API_URL)" || { echo "FAIL: TED API ($TED_API_URL)"; FAIL=1; }
          curl -so /dev/null --max-time 5 "$ALLEGRO_HOST" && echo "OK: AllegroGraph ($ALLEGRO_HOST)" || { echo "FAIL: AllegroGraph ($ALLEGRO_HOST)"; FAIL=1; }
          curl -so /dev/null --max-time 5 "http://fuseki:3030/test_limes" && echo "OK: Fuseki test_limes" || { echo "FAIL: Fuseki test_limes dataset"; FAIL=1; }
          if [ "$FAIL" -eq 1 ]; then
            echo "::error::Required external endpoints are not reachable"
            exit 1
          fi

      - name: Get Java tools (Saxon, Limes, RML mapper)
        if: github.event_name == 'pull_request' || inputs.full_tests
        run: make init-libraries

      # Using python -m tox instead of make targets to avoid conflict with
      # outdated system tox on self-hosted runner (/home/lps/.local/bin/tox)
      - name: Run tests
        run: ${{ (github.event_name == 'pull_request' || inputs.full_tests) && 'python -m tox' || 'python -m tox -e unit' }}

      - name: SonarCloud Scan
        uses: SonarSource/sonarqube-scan-action@v6
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

      - name: Cleanup
        if: always()
        run: |
          echo "=== Cleaning up test artifacts ==="
          rm -rf .tox .pytest_cache __pycache__ .coverage coverage.xml
          find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
          find . -type f -name "*.pyc" -delete 2>/dev/null || true
          # Remove Docker container host entries added during this run
          docker network inspect tedsws-internal -f '{{range .Containers}}{{.Name}}{{"\n"}}{{end}}' 2>/dev/null | \
            xargs -I{} sudo sed -i '/\b{}\b/d' /etc/hosts || true
          echo "=== Disk usage ==="
          df -h /
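The "Resolve Docker container hostnames" step in tests.yml turns each `name:ip/prefix` token printed by `docker network inspect` into an `/etc/hosts` line using three parameter expansions. The sketch below isolates that parsing in a hypothetical helper, `parse_entry`, with a made-up sample address, so the expansions can be checked without Docker or `sudo`. It assumes IPv4 addresses, as the workflow's `.IPv4Address` template field does.

```shell
# parse_entry "name:ip/prefix" prints "ip name", the /etc/hosts line format.
parse_entry() {
  entry="$1"
  name="${entry%%:*}"   # keep the text before the first colon
  ip="${entry#*:}"      # keep the text after the first colon
  ip="${ip%/*}"         # drop the trailing /prefix-length
  printf '%s %s\n' "$ip" "$name"
}

# Sample token in the shape the inspect template produces (address is invented).
parse_entry "fuseki:172.18.0.5/16"   # prints "172.18.0.5 fuseki"
```

Because the workflow appends these lines with `tee -a`, repeated runs on the same self-hosted runner would accumulate duplicates if the Cleanup step did not remove them afterwards.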