Commit fb02f67

committed
refactored and organised documentation
1 parent bb2c9db commit fb02f67

21 files changed

Lines changed: 19411 additions & 2347 deletions

docs/antora/modules/ROOT/attachments/FATs/2023-02-20-TED-SWS-FAT-complete.html

Lines changed: 17158 additions & 0 deletions
Large diffs are not rendered by default.

docs/antora/modules/ROOT/attachments/aws-infra-docs/TED-SWS Installation manual v2.0.2.pdf renamed to docs/antora/modules/ROOT/attachments/aws-infra-docs/TED-SWS-Installation-manual-v2.0.2.pdf

File renamed without changes.

docs/antora/modules/ROOT/attachments/aws-infra-docs/TED-SWS Installation manual v2.5.0.pdf renamed to docs/antora/modules/ROOT/attachments/aws-infra-docs/TED-SWS-Installation-manual-v2.5.0.pdf

File renamed without changes.

docs/antora/modules/ROOT/nav.adoc

Lines changed: 28 additions & 6 deletions
@@ -1,8 +1,30 @@
11

22
* xref:index.adoc[Home]
3-
* link:{attachmentsdir}/ted-sws-architecture/index.html[Preliminary Project Architecture^]
4-
* xref:mapping_suite_cli_toolchain.adoc[Mapping Suite CLI Toolchain]
5-
* xref:demo_installation.adoc[Instructions for Software Engineers]
6-
* xref:user_manual.adoc[User manual]
7-
* xref:system_arhitecture.adoc[System architecture overview]
8-
* xref:using_procurement_data.adoc[Using procurement data]
3+
4+
* [.separated]#**General References**#
5+
** xref:ted-sws-introduction.adoc[About TED-SWS]
6+
** xref:glossary.adoc[Glossary]
7+
8+
* [.separated]#**For TED-SWS Operators**#
9+
** xref:user_manual/getting_started_user_manual.adoc[Getting started]
10+
** xref:user_manual/system-overview.adoc[System overview]
11+
** xref:user_manual/access-security.adoc[Security and access]
12+
** xref:user_manual/workflow-management-airflow.adoc[Workflow management with Airflow]
13+
** xref:user_manual/system-monitoring-metabase.adoc[System monitoring with Metabase]
14+
15+
* [.separated]#**For DevOps**#
16+
17+
** link:{attachmentsdir}/aws-infra-docs/TED-SWS-Installation-manual-v2.5.0.pdf[AWS installation manual (v2.5.0)^]
18+
** link:{attachmentsdir}/aws-infra-docs/TED-SWS-AWS-Infrastructure-architecture-overview-v0.9.pdf[AWS infrastructure architecture (v0.9)^]
19+
20+
* [.separated]#**For End User Developers**#
21+
** xref:ted_data/using_procurement_data.adoc[Accessing data in Cellar]
22+
** link:https://docs.ted.europa.eu/EPO/latest/index.html[eProcurement ontology (latest)^]
23+
24+
* [.separated]#**For TED-SWS Developers**#
25+
** xref:technical/mapping_suite_cli_toolchain.adoc[Mapping suite toolchain]
26+
** xref:technical/demo_installation.adoc[Development installation instructions]
27+
** xref:technical/event_manager.adoc[Event manager description]
28+
** xref:architecture/arhitecture_choices.adoc[System architecture overview]
29+
** link:{attachmentsdir}/ted-sws-architecture/index.html[Enterprise architecture model^]
30+
** xref:architecture/arhitecture_choices.adoc[Architectural choices]

docs/antora/modules/ROOT/pages/architecture/arhitecture_choices.adoc

Lines changed: 351 additions & 0 deletions
Large diffs are not rendered by default.

docs/antora/modules/ROOT/pages/architecture/arhitecture_overview.adoc

Lines changed: 476 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
== Future work
2+
3+
In the future, another Master Data Registry type system will be used to
4+
deduplicate entities in the TED-SWS system; it will be implemented
5+
according to the requirements for deduplication of entities from
6+
notices.
7+
8+
The future Master Data Registry (MDR) system for entity deduplication
9+
should have the following architecture:
10+
11+
[arabic]
12+
. *Data Ingestion*: This component is responsible for extracting and
13+
collecting data from various sources, such as databases, files, and
14+
APIs. The data is then transformed, cleaned, and consolidated into a
15+
single format before it is loaded into the MDR.
16+
17+
. *Data Quality*: This component is responsible for enforcing data quality
18+
rules, such as format, completeness, and consistency, on the data before
19+
it is entered into the MDR. This can include tasks such as data
20+
validation, data standardization, and data cleansing.
21+
22+
. *Entity Dedup*: This component is responsible for identifying and
23+
removing duplicate entities in the MDR. This can be done using a
24+
combination of techniques such as string-based, machine learning-based,
25+
or knowledge-based methods.
26+
27+
. *Data Governance*: This component is responsible for ensuring that the
28+
data in the MDR is accurate, complete, and up-to-date. This can include
29+
processes for data validation, data reconciliation, and data
30+
maintenance.
31+
32+
. *Data Access and Integration*: This component provides access to the MDR
33+
data through a user interface and APIs, and integrates the MDR data
34+
with other systems and applications.
35+
36+
. *Data Security*: This component is responsible for ensuring that the
37+
data in the MDR is secure, and that only authorized users can access it.
38+
This can include tasks such as authentication, access control, and
39+
encryption.
40+
41+
. *Data Management*: This component is responsible for managing the data
42+
in the MDR, including tasks such as data archiving, data backup, and
43+
data recovery.
44+
45+
. *Monitoring and Analytics*: This component is responsible for monitoring
46+
and analysing the performance of the MDR system, and for providing
47+
insights into the data to help improve the system.
48+
49+
. *Services layer*: This component is responsible for providing services
50+
such as indexing, search, and query functionality over the data.
51+
52+
53+
All these components should be integrated and work together to provide a
54+
comprehensive and efficient MDR system for entity deduplication. The
55+
system should be scalable and flexible enough to handle large amounts of
56+
data and adapt to changing business requirements.
57+
58+
59+
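
The string-based technique named under *Entity Dedup* above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the planned MDR implementation: the function names `normalize`, `similarity`, and `deduplicate`, the sample organisation names, and the `0.9` threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] comparing the normalized forms of two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(names: list[str], threshold: float = 0.9) -> list[list[str]]:
    """Greedy clustering: a name joins the first cluster whose first
    member is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for name in names:
        for cluster in clusters:
            if similarity(cluster[0], name) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical organisation names as they might appear in different notices.
clusters = deduplicate(["ACME Corp.", "acme corp", "Acme  Corp", "Globex Ltd"])
```

Greedy clustering against the first member of each cluster is simple but order-dependent; a production system would more likely combine several signals (identifiers, addresses, learned models), as the component description suggests.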
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
== Glossary
2+
3+
*Airflow* - an open-source platform for developing, scheduling, and
4+
monitoring batch-oriented pipelines. The web interface helps manage the
5+
state and monitoring of your pipelines.
6+
7+
*Metabase* - a BI tool with a friendly UX and integrated tooling
8+
to let you explore data gathered by running the pipelines available in
9+
Airflow.
10+
11+
*Cellar* - is the central content and metadata repository of the
12+
Publications Office of the European Union.
13+
14+
*TED-SWS* - is a pipeline system that continuously converts the public
15+
procurement notices (in XML format) available on the TED Website into
16+
RDF format and publishes them into CELLAR.
17+
18+
*DAG* - (Directed Acyclic Graph) is the core concept of Airflow,
19+
collecting Tasks together, organized with dependencies and relationships
20+
to say how they should run. In this project, the DAGs are the pipelines
21+
that take the public procurement notices from XML to RDF and publish
22+
them into CELLAR.
23+
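
To make the *DAG* entry concrete, the sketch below orders a set of tasks by their dependencies. It is deliberately not Airflow code: it uses only `graphlib` from the Python standard library (Python 3.9+), and the task names are hypothetical stand-ins for the XML-to-RDF steps, not the actual task identifiers used in TED-SWS.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
# Task names are illustrative assumptions, not real TED-SWS DAG task ids.
dependencies = {
    "fetch_notice": set(),
    "transform_to_rdf": {"fetch_notice"},
    "validate_rdf": {"transform_to_rdf"},
    "publish_to_cellar": {"validate_rdf"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the tasks in an order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which is why a pipeline graph must be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

Because this example is a single chain, only one valid order exists; with independent branches, several orders would be valid and a scheduler could run those branches in parallel.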
Lines changed: 20 additions & 17 deletions
@@ -1,20 +1,6 @@
11
= TED-RDF Conversion Pipeline Documentation
22

3-
The TED-RDF Conversion Pipeline, which is part of the TED Semantic Web Services, aka TED-SWS system, provides tools an infrastructure to convert TED notices available in XML format into RDF. This conversion pipeline is designed to work with the https://docs.ted.europa.eu/rdf-mapping/index.html[TED-RDF Mappings].
4-
5-
== Quick references for users
6-
7-
* xref:mapping_suite_cli_toolchain.adoc[Installation and usage instructions for the Mapping Suite CLI toolchain]
8-
* link:{attachmentsdir}/ted-sws-architecture/index.html[Preliminary project architecture (in progress)^]
9-
10-
11-
== Developer pages
12-
13-
xref:demo_installation.adoc[Installation instructions for development and testing for software engineers]
14-
15-
xref:attachment$/aws-infra-docs/TED-SWS-AWS-Infrastructure-architecture-overview-v0.9.pdf[TED-SWS AWS Infrastructure architecture overview v0.9]
16-
17-
xref:attachment$/aws-infra-docs/TED-SWS Installation manual v2.5.0.pdf[TED-SWS AWS Installation manual v2.5.0]
3+
The TED-RDF Conversion Pipeline is part of the TED Semantic Web Services (TED-SWS) system and provides tools and infrastructure to convert TED notices available in XML format into RDF. This conversion pipeline is designed to work with the https://docs.ted.europa.eu/rdf-mapping/index.html[TED-SWS Mapping Suites], which are self-contained packages of transformation rules and resources.
184

195
== Project roadmap
206

@@ -23,12 +9,29 @@ xref:attachment$/aws-infra-docs/TED-SWS Installation manual v2.5.0.pdf[TED-SWS A
239

2410
| Phase 1 | The first phase places high priority on the deployment into the OP AWS Cloud environment.| August 2022 | xref:attachment$/FATs/2022-08-29-report/index.html[2022-08-29 report] | 29 August 2022 | link:https://github.com/OP-TED/ted-rdf-conversion-pipeline/releases/tag/0.0.9-beta[0.0.9-beta]
2511
| Phase 2 | Provided that the deployment in the acceptance environment is successful, the delivery of Phase 2 aims to provide the first production version of the TED SWS system. | Nov 2022 | xref:attachment$/FATs/2022-11-22-TED-SWS-FAT-complete.html[2022-11-22 report] | 20 Nov 2022 | https://github.com/OP-TED/ted-rdf-conversion-pipeline/releases/tag/1.0.0-beta[1.0.0-beta]
26-
| Phase 3 | This phase delivers the documentation and components and improvements that could not be covered in the previous phases. | Feb 2023 | --- | --- | ---
27-
12+
| Phase 3 | This phase delivers the documentation and components and improvements that could not be covered in the previous phases. | Feb 2023 | xref:attachment$/FATs/2023-02-20-TED-SWS-FAT-complete.html[2023-02-20 report] | 21 Feb 2023 | https://github.com/OP-TED/ted-rdf-conversion-pipeline/releases/tag/1.1.0-beta[1.1.0-beta]
2813
|===
2914

3015

3116

3217

3318

3419

20+
//
21+
// == Quick references for Developers
22+
//
23+
// == Quick references for DevOps
24+
//
25+
// == Quick references for TED-SWS Developers
26+
//
27+
// * xref:mapping_suite_cli_toolchain.adoc[Installation and usage instructions for the Mapping Suite CLI toolchain]
28+
// * link:{attachmentsdir}/ted-sws-architecture/index.html[Preliminary project architecture (in progress)^]
29+
//
30+
//
31+
// == Developer pages
32+
//
33+
// xref:demo_installation.adoc[Installation instructions for development and testing for software engineers]
34+
//
35+
// xref:attachment$/aws-infra-docs/TED-SWS-AWS-Infrastructure-architecture-overview-v0.9.pdf[TED-SWS AWS Infrastructure architecture overview v0.9]
36+
//
37+
// xref:attachment$/aws-infra-docs/TED-SWS Installation manual v2.5.0.pdf[TED-SWS AWS Installation manual v2.5.0]
