Feature/ted9 47 enhance the e forms eligibility checking component #589
Open
kaleanych wants to merge 343 commits into OP-TED:main
Feature/sws2 18
Feature/sws2 24
This reverts commit b45f6f8.
Use the one-stop MSSDK service to detect, convert and load packages of any given version, normalizing to a unified "v3". This also yields support for v3L, aka lightweight, as v3 is a superset (the lightweight variant excludes all data except bare transformation necessities). Standard Forms and eForms are henceforth "v1" and "v2", respectively.

The pipeline native model is now an MSSDK v3-extended one, with the JSONLD being the canonical metadata model. Not only is there no equivalent in older models for this and the accompanying `context.jsonld`, the datetime datatype also needs special handling/conversion when used in legacy contexts.

A key distinguishing feature of the new unified package is the complete refactor of the constraints model, removing one level of nesting but also adding more structure and possibilities with one model (like a range of document schema versions as seen in v1 or a list of such as seen in v2). Repurposing these constraints for legacy contexts therefore needs extra care, if not a complete refactor.

Recap of model differences from v1/v2 to v3:

- `identifier` -> `id`
- `issue_date (str)` -> `created_at (datetime)`
- `ontology_version` -> `model_version`
- `metadata_constraints.constraints` -> `applicability_constraints`
- `eforms_subtype` -> `document_type_list`
- `start_date/end_date` -> `document_time_interval.start/end`
- `min/max_xsd_versions` -> `document_version_range.min/max`
- `eforms_sdk_versions` -> `document_schema_version_list`

Note that _applicability constraints_ is a package perspective -- the same constraints are to be interpreted by the pipeline as _eligibility constraints_ for a notice.

There is an additional transitional field `project_identifier`, which stands in for the `mapping_type`, but only barely. This interpretation may be deprecated at any point, but not before support is added for alternative detection mechanisms.
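The renames in the recap can be illustrated with a minimal sketch. This is not the MSSDK conversion code: the function name `legacy_to_v3` is hypothetical, the input is assumed to be a plain dict, `issue_date` is assumed to be an ISO 8601 string, and placing the renamed constraint fields under `applicability_constraints` is an assumption about the v3 layout. Only a subset of the listed fields is covered.

```python
from datetime import datetime

# Top-level renames from the recap (legacy name -> v3 name).
TOP_LEVEL = {"identifier": "id", "ontology_version": "model_version"}

# Constraint renames from the recap. Nesting them under
# applicability_constraints is an assumption of this sketch.
CONSTRAINTS = {
    "eforms_subtype": "document_type_list",
    "eforms_sdk_versions": "document_schema_version_list",
}


def legacy_to_v3(legacy: dict) -> dict:
    """Translate a legacy (v1/v2) metadata dict into a v3-shaped dict."""
    v3 = {new: legacy[old] for old, new in TOP_LEVEL.items() if old in legacy}

    # issue_date (str) -> created_at (datetime); ISO 8601 input is assumed.
    if "issue_date" in legacy:
        v3["created_at"] = datetime.fromisoformat(legacy["issue_date"])

    # metadata_constraints.constraints -> applicability_constraints
    # (one level of nesting removed).
    legacy_constraints = legacy.get("metadata_constraints", {}).get("constraints", {})
    v3_constraints = {
        CONSTRAINTS[key]: value
        for key, value in legacy_constraints.items()
        if key in CONSTRAINTS
    }

    # start_date/end_date collapse into a single interval object.
    if "start_date" in legacy_constraints or "end_date" in legacy_constraints:
        v3_constraints["document_time_interval"] = {
            "start": legacy_constraints.get("start_date"),
            "end": legacy_constraints.get("end_date"),
        }

    v3["applicability_constraints"] = v3_constraints
    return v3
```

Going the other way (v3 to a legacy context) would need the inverse of the datetime conversion, which is where the special handling mentioned above comes in.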
- sparql_test_suites <-> test_suites_sparql
- shacl_test_suites <-> test_suites_shacl

This also fixes some tests that rely on these prerequisite validation data.
Fix the test assumptions by reducing the expected number of test data files to match. In the case of the Standard Forms (v1) package `package_F03_test`, there are 105 test data files, of which 82 are unique. However, only 81 of them are to be found within folders under `test_data`, whereas the rest, including a unique one, `example.xml`, are not contained within a folder.
Pass the type along; it matters less now that we normalize to MSSDK v3 and the native pipeline model is an extension of it.
We now delegate to the MSSDK for validation, which is carried out during the package parsing/loading.
If an error occurs in the package loading, due to validation or other failures, simply forward the error and continue loading the other packages.
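The "forward the error and continue" behaviour could look roughly like the sketch below. The names `load_all` and `load_one` are hypothetical; `load_one` stands in for whatever MSSDK call parses and validates a single package, and is not the real API.

```python
from typing import Callable, Iterable, List, Tuple


def load_all(
    paths: Iterable[str], load_one: Callable[[str], dict]
) -> Tuple[List[dict], List[Tuple[str, Exception]]]:
    """Load every package, collecting failures instead of aborting."""
    loaded, errors = [], []
    for path in paths:
        try:
            loaded.append(load_one(path))
        except Exception as exc:  # validation or any other loading failure
            # Forward the error to the caller and move on to the next package.
            errors.append((path, exc))
    return loaded, errors
```

Returning the errors alongside the loaded packages lets the caller decide how to report them without losing the packages that did load.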
…ub-download-packages [TEDSWS-232] Breaking: Transition to MSSDK for package loading/saving
- pass MongoDB client to normalise_notice function
- reparse MSSDK CSV list object w/ Pandas to reinterpret numbers
- update tests
Tests were failing with ModelNotFoundError because:

- Notice fixtures didn't set mapping_package_identifier
- Mapping suite/package weren't loaded into test MongoDB instances
- normalise_notice() calls didn't pass mongodb_client parameter

Changes:

- Add load_mapping_suite_and_package fixture to features/conftest.py
- Update notice fixtures to set mapping_package_identifier
- Pass mongodb_client to normalise_notice() in test steps
- Add load_mapping_suite_and_package_fake for e2e tests using mongomock
- Update e2e fixtures to link GitHub-loaded packages to local mapping suite

Fixes 30+ e2e/feature tests that were failing after metadata resource refactoring with dynamic MS Config loading via MSSDK.
There was a hidden circular dependency in the metadata resource migration to MS Config via MSSDK. The previous design required a notice with `mapping_package_identifier` to load resources, but normalisation needs resources, while eligibility checking (which returns a package identifier but does not set one on the notice) needs normalised metadata.

Initial assumptions may have been anchored on the resources being project-specific. However, this is problematic as not all projects may be updated with the mapping suite configuration. Therefore, resource files (country.json, languages.json, etc.) can be interpreted to be global for now during the transition period. Once all currently known production projects are updated with the configuration, a more dynamic method to select the mapping suite can be implemented, e.g. via the `document_probing` conditions specified in the config, which define what XPaths must and must not be available for a notice to be compatible with the project.

Changes:

- MappingFilesRegistry now loads resources from any available MappingSuite
- Removed notice parameter from DefaultNoticeMetadataNormaliser and EformsNoticeMetadataNormaliser constructors
- Updated find_metadata_normaliser_based_on_xml_manifestation() and extract_and_normalise_notice_metadata() to not require notice
- Added MappingSuiteConfigError for when no MappingSuite is available
- Updated all test fixtures to use the new API
- Removed all traces of, and dependence on, a Notice mapping_package_identifier

TODO: The mapping suite must be made mandatory and be fetched from a default known project with the configuration if not given.
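As an illustration of the `document_probing` idea mentioned above, a compatibility check might be sketched as follows. The function name `matches_probing` and the shape of the conditions (two plain lists of paths) are assumptions, not the actual MS Config schema. The sketch uses the limited XPath subset of the standard library's `xml.etree.ElementTree` and ignores namespaces.

```python
import xml.etree.ElementTree as ET
from typing import List


def matches_probing(xml_text: str, must: List[str], must_not: List[str]) -> bool:
    """Return True if every `must` path matches and no `must_not` path does."""
    root = ET.fromstring(xml_text)

    def present(path: str) -> bool:
        # find() returns None when the path matches no element.
        return root.find(path) is not None

    return all(present(p) for p in must) and not any(present(p) for p in must_not)
```

A selector built on this could iterate over the known mapping suites and pick the first whose probing conditions match the notice XML.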
c04eab2 to 9e12e0c
The actual fetch of the GitHub repo would get no MS config, whereas the fake would add one. There appears to be an inconsistency, with this test passing locally but failing on the server, so let us remove the MS config part.
This is required for passing the mongodb client to the MappingFilesRegistry, which picks up mapping metadata resource files from the MS config. Without this there is a mismatch in the mongodb client in tests, whose first entrypoint usually gets a mock, but in this case, the normalisation would've defaulted to a real one retrieved from the environment.
9e12e0c to 8f6f564
If the mapping suite `config` folder is not found in the repository and branch specified, the package loading will fail. We now allow an additional, optional parameter to specify a second branch from which the config is available. This is useful for cases where a specific tag or release needs to be loaded but the config is in a later tag/release/commit.
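One plausible reading of this fallback is sketched below. The names `pick_config_branch`, `config_branch` and `has_config` are hypothetical; `has_config` stands in for whatever check determines that a branch contains the `config` folder, and the real Airflow parameter name may differ.

```python
from typing import Callable, Optional


def pick_config_branch(
    branch: str,
    config_branch: Optional[str],
    has_config: Callable[[str], bool],
) -> str:
    """Choose the branch to read the mapping suite config from."""
    # Prefer the branch the packages are loaded from.
    if has_config(branch):
        return branch
    # Otherwise fall back to the optional second branch, if one was given.
    if config_branch is not None and has_config(config_branch):
        return config_branch
    raise FileNotFoundError(
        f"no config folder found in {branch!r} or {config_branch!r}"
    )
```

For example, with a tag `1.0.0` that predates the config and a `main` branch that has it, `pick_config_branch("1.0.0", "main", check)` would return `"main"`.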
Without the needed resource files, the copied config is useless.
…-msconfig Make MS Config mandatory, add Airflow parameter
No description provided.