\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\lstset{flexiblecolumns=true}
\usepackage{todonotes}
\usepackage{array}
\usepackage{float}
\marginparwidth=4cm
\title{Data Origin in the VO}
% see ivoatexDoc for what group names to use here
\ivoagroup{DCP}
\author{G.Landais}
\author{A.Muench}
\author{M.Demleitner}
\author{R.Savalle}
\editor{G.Landais}
%\previousversion{NOTE-20230522}
\previousversion{NOTE-DataOrigin-1.1-20240126}
\previousversion{NOTE-DataOrigin-1.0-20230522}
%\previousversion{This is the first public release}
\begin{document}
\begin{abstract}
Data Origin in the VO identifies a set of metadata items that define basic
provenance information, as well as their representation in documents produced
by Virtual Observatory (VO) services. This will improve traceability for VO
users, help them to understand result sets and facilitate data reuse and citation.
\end{abstract}
\section*{Acknowledgments}
Alberto Accomazzi (ADS), Anne Catherine Raugh (University of Maryland), Rafaele d'Abrusco (CfA), Mihaela Buga (CDS), Mathieu Servillat (PObsParis), Nicolas Moreau (ObsParis)
\section*{Conformance-related definitions}
\section{Introduction}
Information on the origin of a piece of data is important for end users to understand data, for meaningful data citation and to improve reusability. It is a part of provenance, which in turn is a mandatory criterion in the GOFair\footnote{https://www.go-fair.org/} or RDA FAIR definition\footnote{https://doi.org/10.15497/rda00050}.
The Virtual Observatory (VO) provides an advanced framework to search for, query, and consume astronomical data. The specification of Data Origin proposed here for VOTable output includes both metadata originating at the data producer (e.g., author, space agency, observatory) and at the data centre (publisher) hosting the resource.
At this point, depending on the implementation, users can find the information conveyed in Data Origin in the data centre web pages (landing pages) or in the VO Registry. For citation, the ADS (NASA Astrophysics Data System) offers comprehensive bibliographic capabilities, including the production of BibTeX records for publications known to ADS. However, there are no VO standards to communicate this type of information yet.
A list of basic data origin metadata, reliably findable in a convenient location (i.e.,
the VOTable produced by a query) will help users to properly cite or
acknowledge the data resources contributing to new or derived works.
Tracing Data Origin, from the producer of the query to the production of the response, also allows an end user to determine the different agents that contribute to data curation (authors, data centre, space agencies, journal), which is particularly helpful when debugging. A typical scenario here is when mirrored data is subject to potentially differing curation actions in the different publication processes.
The list of metadata items proposed here is designed to meet the needs of basic provenance
tracking when using current VO protocols.\\
The remainder of this note first presents the use cases guiding this
effort, then briefly lists the kind of metadata Data Origin concerns.
The core of the specification is in
section~\ref{sec:data-origin-in-votable}, which
describes the VOTable serialisation.
Non-normatively, in appendix~\ref{sec:appendixB}, we give a mapping of Data Origin items to entities from the IVOA Registry schema, and a citation template in appendix \ref{sec:appendixC}
that illustrates how data origin information can be used in practice.
\section{Use cases}
\subsection{Data Origin information}
Scenario: Researchers have data in a VOTable that shows an odd feature. They would now like to investigate whether that feature is physical or an artefact.
Derived requirements:
\begin{itemize}
\item Contact information for producers must be present.
\item Researchers would like to clearly identify defined roles in well-known places regardless of the service which generated the result.
\item When data provided by the service is derived from external resources, those external resources are clearly identified. In that case, additional curation applied by the publisher can be detected.
\end{itemize}
For instance, a table published in a journal or by a space agency may also be hosted in multiple data centres. The details of the table schema may depend on the data centre, which can add associated data, enrich metadata, or make a sub-selection of columns.
\subsection{Reproducibility}
A researcher revisits work they did six months earlier in an ad-hoc fashion and would now like to reproduce it in a more structured way. To do that, they need to know, say, which queries against which services, or perhaps which programs, produced the files.
Derived requirement: The request parameters and a service identification
(perhaps an ivoid in a narrower VO context, an access URL beyond that) must be available.
\subsection{Citation}
\label{sec:req-citation}
While preparing a publication, a researcher would like to properly cite the software and data that went into their results. They now run a program to extract that information from the digital artefacts going into the publication -- perhaps even in separate parts of citations and acknowledgments.
The type of information that would go into such a
citation can be assessed from this acknowledgement following a
convention of the American Astronomical Society:
\begin{quotation}
We searched optical astrometric data of these sources from the Gaia (Gaia Collaboration et al. 2016) Early Data Release 3 (Gaia Collaboration et al. 2021) via the CDS archive.
\end{quotation}
Derived requirement: The Data Origin must indicate requests for citation
and/or acknowledgment in a machine-readable way, preferably in a way
that machines can generate BibTeX entries for whatever they specify.
\subsection{Workflow bibliography}
A researcher has used a workflow engine to solve a complex science
problem. They now want to create a bibliography of everything that was
used to obtain the end result of the computation.
Derived requirement: Essentially as for use case~\ref{sec:req-citation},
except that the metadata extracted needs a higher level of homogeneity
and that relationships declared between different parts of Data Origin
might be useful.
\section{State of the Art}
Neither VOTable \citep{2025ivoa.spec.0116O} nor IVOA data access protocols at this point provide standard facilities for conveying Data Origin information. While protocols such as TAP \citep{2019ivoa.spec.0927D} have standard interfaces to retrieve table metadata (e.g., unit, type and description of columns) or metadata on service endpoints (``capabilities'') by virtue of providing VOSI \citep{2017ivoa.spec.0524G} endpoints, for basic metadata like authors or publication dates, clients have to consult the VO Registry. Even that may be difficult, because the IVOA identifier of a service cannot in general be obtained from the service itself.
HiPS \citep{2017ivoa.spec.0519F} is a protocol which includes for each dataset a list of standardized metadata. HiPS metadata includes authors, publication year, data centre identifier or licenses.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{fig-ext-ids.pdf}
\caption{Usage of DOIs, ORCIDs, and Bibcodes in a sketch of a
VOResource instance document.}
\label{fig:extids}
\end{figure}
\subsection{Data Origin in IVOA Registry}
The IVOA Registry contains metadata relevant for Data Origin such as authors,
publication dates,
or relationships to other resources. Of particular relevance for Data
Origin are those latter relationships using external identifier schemes
more persistent than the IVOA's own: DOIs, ORCIDs, or Bibcodes (see
Fig.~\ref{fig:extids}). The metadata schema is mainly defined by
VOResource \citep{2018ivoa.spec.0625P} and VODataService
\citep{2021ivoa.spec.1102D}, which also define an XML serialisation for it.
The Registry makes this information available through several interfaces, partly
hosted by the data centres themselves, partly provided by a central
infrastructure.
Since the VO Registry is an open framework without any central curators,
there are no guarantees as to the continued availability of the resources or even their descriptions; the Registry is
specifically \emph{not} designed as a persistent metadata repository
that artefacts could reliably refer to as metadata sources.
The IVOA Registry uses a unique identifier, the IVOID
\citep{2016ivoa.spec.0523D}, as the primary key for its resource
collection. By the above considerations, this IVOID is not suitable as a means of citation.
Both the Registry's metadata schema and the DataCite
\citep{std:DataCite40} metadata schema have been
derived from Dublin Core \citep{std:DUBLINCORE}. While the extensions differ in detail, it is not
hard to write a (not unreasonably lossy) mapping between the two metadata schemas. This can enable
sustainable citation by enrolling VO resources in DataCite's persistence
mechanisms. Even so, the provision of extensive in-document metadata
has clear usability and practicality advantages, not to mention that many
pieces of data origin are unsuitable for a general metadata schema like
DataCite's.
\subsection{Data Origin and Provenance}
%The Provenance \citep{2020ivoa.spec.0411S} and Dataset Data Models can
%be used to express Data Origin.
%Data origins information is intended to be provided in the results of queries. This information can be used to populate steps in a Provenance workflow.
%Dataset Origin (see \ref{sec:dataset-origin}) can be serialized with Entities and Agent. The query information including information such as URL and parameters (see \ref{sec:query-information}) can be set with the configuration extension of the Provenance DM of the Virtual Observatory \citep{2020ivoa.spec.0411S}.
Data Origin information is intended to be provided in the results of queries.
DataOrigin records can be used to build a provenance graph describing how to
get the main entity from a resource that is the data of origin, typically in one step
(an entity generated by an activity that used another entity as origin).
This mapping is illustrated using the VO Provenance Model \citep{2020ivoa.spec.0411S} in appendix~\ref{sec:appendixD}.
%The Provenance Data Model \citep{2020ivoa.spec.0411S} is based on Entities, Agents and Activities as defined in the W3C Provenance model. The model's main focus is the detailed documentation of workflows.
%For the serialisation of ProvDM instances within VOTables, MIVOT \citep{2023ivoa.spec.0620M} is available. At this point, however, the relatively complex model and many free parameters are obstacles for a wide and direct adoption of ProvDM+MIVOT to represent Data Origin, in particular when compared to the very straightforward mechanisms proposed here.
%``Last-Step-Provenance'' is a Provenance extension currently under discussion which would define a list of metadata corresponding to Data Origin. Its output will not be recursive and could be easily serialized in a table.\todo{If we write this here, everyone will ask: Well, so why don't we wait for that? Perhaps we ought to just drop this?}
%Other initiatives, in working progress, such as ``DatasetDM'', or
%``Last-Step-Provenance'' show the growing interest of adding pieces of
%provenance to published datasets.
%The metadata listed in Data Origin can be a reference for current
%and future models interested by the information.
\subsection{DALI}
DALI \citep{2017ivoa.spec.0517D} defines common conventions for all
modern IVOA data access protocols.
%A part of it defines in-band signalling of error conditions or overflows -- an important part of Data Origin -- for VOTables.
It also defines bespoke names for \xmlel{INFO} elements used to convey
Data Origin-type metadata, in particular \emph{citation} and \emph{standardID}. In a sense,
this specification is an extension of the mechanism defined in DALI.
\section{Expected Data Origin}
% added 23-nov-2023
This document lists atomic metadata items that server software should
generate when producing query responses for VO clients.
It includes reproducibility metadata (see Table~\ref{tab:query-names})
that reflects the context in which a query was executed. The information
includes metadata allowing users to execute the query again as well as
metadata that will aid debugging in case a later execution of the
query does yield different results, such as versions of data and software
or the execution
date, which is particularly relevant when the resources' data content
can evolve.
The information is complemented by provenance metadata (see Table~\ref{tab:origin-names}) like authors, licence, references, or identifiers
for the resource or related resources (which could be IVOIDs, Bibcodes, or DOIs).
Much of this information is already provided by the VO Registry.
While giving relevant IVOID(s) would therefore in principle go a long way
towards supplying relevant metadata, Registry records are not guaranteed
to remain available, as discussed above. A serialisation
directly into the VO output is therefore desirable
(see section~\ref{sec:data-origin-in-votable}).
\section{Data Origin in VOTable}
\label{sec:data-origin-in-votable}
The metadata listed below combines terms from DALI \citep{2017ivoa.spec.0517D}, Dublin Core \citep{std:DUBLINCORE} and extensions in order to provide Data Origin information to end users.
\paragraph{Note about the following items.}
While it is desirable that publishers provide the full set of
metadata,
%we consider the following items the minimally viable set:
we consider the following items as having the highest impact (for reuse or citation):
data\_ivoid, publisher, service\_protocol, request, request\_date,
citation, resource\_version, rights\_uri, creator, publication\_date,
last\_update\_date.
\subsection{Query information}
\label{sec:query-information}
Table~\ref{tab:query-names} lists the metadata items defined here to
convey query-related information in Data Origin.
These pieces of information enable linking Registry records to the
current result and, to some extent, reproducing the query executed.
For queries against evolving datasets, the request\_date item clearly is
particularly important.
\begin{table}[h]
\hbox to\textwidth{\hss
\begin{tabular}{|l|>{\raggedright}p{7cm}|l|} \hline
\textbf{\vrule width0pt height 12pt depth 7pt Item} & \textbf{Description} & \textbf{Dublin Core}\\ \hline
% removed ivoid & IVOID of underlying data collection & R & \\ \hline
publisher & Data centre that produced the VOTable & publisher\\ \hline
%rename 23-nov-2023 version & Software version (*) & & \\ \hline
server\_software & Software version (*) & \\ \hline
service\_protocol & IVOID of the protocol through which the data was retrieved & \\ \hline
service\_ivoid & IVOID of the service through which the data was retrieved & \\ \hline
request & Full request URL including a query string (**)& \\ \hline
query & An input query in a formal language (e.g., ADQL) & \\ \hline
% removed in 23-nov-2023
%request\_post & (POST Request) POST arguments & & \\ \hline
% end
request\_date & Query execution date &\\ \hline
contact & Email or URL to contact publisher & \\ \hline
\multicolumn{3}{p{\textwidth}}{\vskip 2pt\footnotesize(*) Operators are
encouraged to follow \citet{note:opid} in this item.} \\
\multicolumn{3}{p{\textwidth}}{\footnotesize(**) For ``Simple''
protocols (regardless of the HTTP method), put the
application/x-www-form-urlencoded form of the query parameters in the
query part of the URL here.
More complex scenarios like UWS are not covered by this document.}
\end{tabular}\hss}
%\caption{\xmlel{INFO} names available for specifying the query that generated a VOTable}
\caption{\xmlel{INFO} names for specifying the query that generated a VOTable}
\label{tab:query-names}
\end{table}
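
As an illustration, the items of Table~\ref{tab:query-names} might be written as follows for a synchronous TAP query, using the \xmlel{INFO} serialisation described below. This is only a sketch: the host \texttt{example.org}, the service IVOID, the software identifier, the table name, and the contact address are hypothetical placeholders and are not taken from an existing service. Note that the ampersand separating the query parameters has to be escaped as \texttt{\&amp;} inside an XML attribute.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
<!-- hypothetical service: all example.org and ivo://example values
     are placeholders -->
<INFO name="publisher" value="Example Data Centre"/>
<INFO name="server_software" value="ExampleServer/1.0"/>
<INFO name="service_protocol" value="ivo://ivoa.net/std/TAP"/>
<INFO name="service_ivoid" value="ivo://example/tap"/>
<INFO name="request"
  value="https://example.org/tap/sync?LANG=ADQL&amp;QUERY=SELECT+TOP+5+ra+FROM+ex.obj"/>
<INFO name="query" value="SELECT TOP 5 ra FROM ex.obj"/>
<INFO name="request_date" value="2025-01-15T10:30:00"/>
<INFO name="contact" value="helpdesk@example.org"/>
\end{lstlisting}

Here, \emph{request} carries the URL-encoded form needed to run the query again, while \emph{query} repeats the ADQL text in a form that is easier for humans to read.
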
\subsection{Dataset Origin}
\label{sec:dataset-origin}
Dataset origin complements the query-related information to improve the
understandability of the underlying data. Clients should make
sure that end users can easily access and inspect this information.
If the resource is also described in the Registry, care
must be taken that the in-response metadata reflects the metadata
available there at the time the response is produced.
Table~\ref{tab:origin-names} lists the origin-related metadata items
defined here.
\begin{table}[!h]
\hbox to\textwidth{\hss
\begin{tabular}{|l|>{\raggedright}p{7cm}|l|} \hline
\textbf{\vrule width0pt height 12pt depth 7pt Item} & \textbf{Description} & \textbf{Dublin Core}\\ \hline
% removed 23-nov-2023 publication\_id & Dataset identifier that can be used for citation& M & identifier\\ \hline
data\_ivoid & IVOID of underlying data collection & \\ \hline
ivoid & (deprecated) use data\_ivoid & \\ \hline
citation & Dataset identifier that can be used for citation (e.g. dataset DOI) & identifier\\ \hline
reference\_url & Dataset landing page & \\ \hline % previously landing_page
% removed in 23-nov-2023
%curation\_level & Controled vocabulary
% (IVOA rdf, content\_level) & & \\ \hline
% end
% modifier 23-nov-2023 resource\_version & Dataset version or last release & R & \\ \hline
resource\_version & Dataset version & \\ \hline
%rename 23-nov-2023 rights & (*) Licence URI & R & rights\\ \hline
rights\_uri & Licence URI (*) & rights\\ \hline
rights & Licence or Copyright text & rights\\ \hline
creator & \raggedright The person(s) mainly involved in the
creation of the resource; generally, the author(s)
& creator\\ \hline
journal & Designation of the medium the originating scholarly publication was
published in. In general, that is a journal name. Common
abbreviations (ApJ, A\&A, \dots) are encouraged. & \\ \hline
% removed 15-dec-2023 to use cites or is_derived_from
%relation\_type & An identifier of a second resource and its relationship to the
% present resource.
% Controlled vocabulary (**)& & relation\\ \hline
%related\_resource & Information about a second resource from which the present resource
% is derived. The source is an identifier that can be prefixed with the identifier type: eg: bibcode:, doi:, ror: & & source\\ \hline
article & Bibcode or DOI of a publication relevant for the data & relation\\ \hline
cites & An Identifier (ivoid, DOI, bibcode) of a resource
being in a ``cites'' (**) relationship to the
originating resource & relation\\ \hline
is\_derived\_from & An Identifier (ivoid, DOI, bibcode) of a referenced resource
that was used to produce the current resource (**) & relation\\ \hline
% remove 23-nov-2023
%publication\_date & Date of publication (DALI timestamp) & R & \\ \hline
%resource\_date & Date of the original publication (DALI timestamp) & R & date\\ \hline
%
original\_date & Date of the original resource from which the present resource is derived (DALI timestamp) & \\ \hline
publication\_date & Date of first publication in the data centre (DALI timestamp) (***) & \\ \hline
last\_update\_date & Last data centre update (DALI timestamp) (****) & date\\ \hline
\multicolumn{3}{p{\textwidth}}{\vskip2pt\footnotesize(*) Following Registry
practice, this should come from
SPDX \url{https://spdx.org/licenses/}, though Creative Commons URLs
\url{https://creativecommons.org} are also admitted}\\
\multicolumn{3}{p{\textwidth}}{\footnotesize(**) \url{https://www.ivoa.net/rdf/voresource/relationship\_type/}}\\
\multicolumn{3}{p{\textwidth}}{\footnotesize(***) Equivalent to curation/date[@role='created'] in registry}\\
\multicolumn{3}{p{\textwidth}}{\footnotesize(****) Equivalent to curation/date[@role='updated'] in registry}
\end{tabular}\hss}
\caption{\xmlel{INFO} names available for specifying information
related to the origin of the data set(s) a VOTable was generated from}
\label{tab:origin-names}
\end{table}
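
A possible set of origin-related \xmlel{INFO} elements for a single data collection is sketched below. It includes items that the example in appendix~\ref{sec:appendixA} does not show (\emph{article}, \emph{is\_derived\_from}, \emph{rights}, \emph{resource\_version}). All identifiers are illustrative assumptions: the IVOID under \texttt{ivo://example}, the landing page, the creator name, and the DOIs are placeholders and do not resolve to actual resources.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
<!-- placeholder identifiers, for illustration only -->
<INFO name="data_ivoid" value="ivo://example/catalogue/1"/>
<INFO name="citation" value="doi:10.5555/example.catalogue.1"/>
<INFO name="reference_url" value="https://example.org/catalogue/1"/>
<INFO name="resource_version" value="2"/>
<INFO name="rights_uri" value="https://spdx.org/licenses/CC-BY-4.0.html"/>
<INFO name="rights" value="Creative Commons Attribution 4.0"/>
<INFO name="creator" value="Doe J."/>
<INFO name="article" value="doi:10.5555/example.article"/>
<INFO name="is_derived_from" value="doi:10.5555/example.source"/>
<INFO name="original_date" value="2021"/>
<INFO name="publication_date" value="2021-03-16"/>
<INFO name="last_update_date" value="2022-10-07"/>
\end{lstlisting}
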
\subsection{VOTable serialization}
In this document, we focus on a basic serialization that allows data
providers to describe individual tables.
This is particularly suitable for protocols like Simple Cone Search.
%The basic serialization uses INFO tags to populate Data Origin (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
\paragraph{The basic serialization} uses \xmlel{INFO} elements to convey Data Origin: the \texttt{name} attribute carries one of the items listed in Table~\ref{tab:origin-names} or Table~\ref{tab:query-names}, and the \texttt{value} attribute carries the corresponding value (see the example of a Cone Search result in appendix~\ref{sec:appendixA}).
%uses INFO tags to populate Data Origin items using the attribute 'name' (see the example of a ConeSearch result in appendix \ref{sec:appendixA}).
INFO tags are allowed in VOTable under \xmlel{VOTABLE} or in \xmlel{RESOURCE} elements.
It is expressly allowed to supply data origin in individual
\xmlel{TABLE} or \xmlel{RESOURCE} elements in more complex VOTables.
As a best practice, the global items listed in Table~\ref{tab:query-names} should be placed directly at the root of the VOTable document. If a VOTable document contains several resources or tables, the items listed in Table~\ref{tab:origin-names} can be placed in their respective resources or tables.
As a service to human readers, it is recommended to put descriptions, possibly derived from definitions provided in this document, into the bodies of the INFO elements.
This specification does not at this point constrain the multiplicities of individual INFO items, and clients should not fail hard if any given INFO item occurs multiple times.
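
The placement recommendations above can be sketched as follows for a VOTable holding two resources. The skeleton is an assumption made for illustration: resource names, IVOIDs, and creator names are placeholders, and elided content is marked with dots. Query-related items appear once at the root, origin-related items appear inside the resource they describe, and \emph{creator} occurs twice to show an item with multiple values.

\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
<VOTABLE version="1.5" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <!-- query-related items: once, at the root -->
  <INFO name="service_protocol" value="ivo://ivoa.net/std/TAP"/>
  <INFO name="request_date" value="2025-01-15T10:30:00"
    >Query execution date</INFO>
  <RESOURCE name="first">
    <!-- origin of the first data collection (placeholder values) -->
    <INFO name="data_ivoid" value="ivo://example/catalogue/1"
      >IVOID of underlying data collection</INFO>
    <INFO name="creator" value="Doe J.">Author</INFO>
    <INFO name="creator" value="Roe R.">Author</INFO>
    <TABLE> .... </TABLE>
  </RESOURCE>
  <RESOURCE name="second">
    <!-- origin of the second data collection (placeholder values) -->
    <INFO name="data_ivoid" value="ivo://example/catalogue/2"
      >IVOID of underlying data collection</INFO>
    <TABLE> .... </TABLE>
  </RESOURCE>
</VOTABLE>
\end{lstlisting}
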
\paragraph{Complex queries} (for instance, resulting from ADQL JOINs) need an advanced output serialization to gather the full metadata of all contributing resources.
Mechanisms to manage this requirement are being developed in the IVOA
(MIVOT).
The mechanisms defined here are generally still applicable in these
cases, but the authors acknowledge that they are certainly stretched to
their limits in such cases.
%\section{Data Origin in Registry} REMOVE 2025-11-03
%The VO registry schema, which contains most of the Data Origin information, is completed by metadata described in VOResource \citep{2018ivoa.spec.0625P}.
%This information (assuming it has been filled in by the issuer of the registration) can be requested via the ivo-id now available in VOTable.
%
%The table \ref{tab:voresourcemapping} (see appendix \ref{sec:appendixB}) establishes the mapping between VOResource and Data Origin items.
\appendix
\section{Appendix, Cone search serialization}\label{sec:appendixA}
A Simple Cone Search response in its VOTable serialization. Data Origin items are specified using \xmlel{INFO} elements.
\begin{lstlisting}[basicstyle=\footnotesize\ttfamily]
<VOTABLE version="1.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.3" xsi:schemaLocation=...>
<INFO name="service_protocol" value="ivo://ivoa.net/std/ConeSearch"/>
<INFO name="request_date" value="2022-10-30T12:08:00"
>Query execution date</INFO>
<INFO name="request"
value="https://vizier.cds.unistra.fr/viz-bin/conesearch/J/AJ/161/36/table8?
RA=28.4&amp;DEC=39.3&amp;SR=1"/>
<INFO name="contact" value="cds-question@unistra.fr"
>Publisher contact</INFO>
<INFO name="server_software" value="VizieR/7.5.3">Software version</INFO>
<RESOURCE ID="yCat_51610036" name="J/AJ/161/36">
<DESCRIPTION>117 exoplanets in habitable zone with Kepler DR25</DESCRIPTION>
<INFO name="data_ivoid" value="ivo://cds.vizier/j/aj/161/36">
ivoid identifier to link registry
</INFO>
<INFO name="publisher" value="CDS">data centre</INFO>
<INFO name="reference_url"
value="https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/161/36"/>
<!-- Extra information from Data Origin (basic info) -->
<INFO name="citation" value="doi:10.26093/cds/vizier.51610036">
Identifier that can be used for citation
</INFO>
<INFO name="last_update_date" value="2022-10-07">Last Data Centre update</INFO>
<INFO name="rights_uri"
value="https://cds.unistra.fr/vizier-org/licences_vizier.html"/>
<INFO name="creator" value="Bryson S.">Author</INFO><!-- ORCID ? -->
<INFO name="cites" value="2021AJ....161...36B">
Reference article
</INFO>
<INFO name="journal" value="AJ">
Journal of the reference article
</INFO>
<INFO name="publication_date" value="2021-03-16">
Date of first publication in the data centre
</INFO>
<INFO name="original_date" value="2021">Publication Date of the article</INFO>
....
<TABLE> .... </TABLE>
</RESOURCE>
</VOTABLE>
\end{lstlisting}
\section{Appendix, VOResource and Data origin}\label{sec:appendixB}
Expected metadata (VOResource) with their equivalents in the DataCite schema (version 4.4) to provide Data Origin in the Registry.\\
%\begin{table}
\label{tab:voresourcemapping}
The following table is a non-normative crosswalk for metadata
concepts between the VO Registry (VOResource and VODataService)\footnote{This
is not a full mapping of VOResource to DataCite. To the extent
this is possible (i.e., without having to mint DOIs), the XSLT at
https://github.com/ivoa/vor-doi gives such a mapping.},
data origin, and DataCite. It is
intended to simplify the implementation of data origin for
publishers that already have machine-readable resource descriptions
available.
\paragraph{}
% 2025-11-06 gilles/Markus remove data_origin empty cells
\begin{tabular}{|p{2.8cm}|p{3cm}|p{3cm}|p{4cm}|} \hline
\textbf{VO Registry} & \textbf{Data Origin} & \textbf{DataCite} & \textbf{Explanation} \\ \hline
identifier & data\_ivoid & Identifier & The IVOID of the resource(s) hosted by the service\\ \hline
%title & & Title& The resource title\\ \hline
%shortName &&& The short name of the resource\\ \hline
%altIdentifier & & AlternateIdentifier&
% Alternate identifier accepts bibcode, DOI or URL. DOI should be privileged to facilitate cross identification \\ \hline
\end{tabular}
\paragraph{}
\begin{tabular}{|p{2.8cm}|p{3cm}|p{3cm}|p{4cm}|} \hline
\multicolumn{4}{|l|}{\textbf{curation}} \\ \hline
publisher & publisher & Publisher & The publisher (*)\\ \hline
creator & creator & Creator & The author(s) (*)\\ \hline
%contributor & & Contributor & The contributor(s) (*)\\ \hline
date [Created]& publication\_date & Dates [created] & The creation date (in data centre)\\ \hline
date [Updated]& last\_update\_date & Dates [updated] & The date of the last modification\\ \hline
? & original\_date & PublicationYear & The year of publication in data centre\\ \hline
version & resource\_version & Version &\\ \hline
contact & contact &&\\ \hline
\multicolumn{4}{l}{\small \footnotesize(*) terms allowing name and ORCID (\xmlel{altIdentifier} in VOResource)} \\
\end{tabular}
\paragraph{}
\begin{tabular}{|p{2.8cm}|p{3cm}|p{3cm}|p{4cm}|} \hline
\multicolumn{4}{|l|}{\textbf{content} } \\ \hline
source & article & RelatedIdentifier (*) & A scholarly publication to cite when the data is used (bibcode or DOI)\\ \hline
referenceURL & reference\_url & & The landing page URL\\ \hline
%type & & ResourceType & The Resource type (catalog, etc)\\ \hline
%description & & Description & Usually, it is the abstract\\ \hline
%relationship & & RelatedIdentifiers & The Link to remote resource (it is recommended to link Original data centre) \\ \hline
relationshipType & cites, is\_derived\_from & relationType &\\ \hline
relatedResource & cites, is\_derived\_from & RelatedIdentifier & \\ \hline
\multicolumn{4}{l}{\small \footnotesize(*) DataCite sub-properties type=bibcode, relationType=IsSupplementTo} \\
\end{tabular}
\paragraph{}
\begin{tabular}{|p{2.8cm}|p{3cm}|p{3cm}|p{4cm}|} \hline
\multicolumn{4}{|l|}{\textbf{rights}} \\ \hline
rights & rights & Rights&
The rights element accepts free text. However, a machine-readable license (*) is preferable
\\ \hline
URI & rights\_uri& rightsURI & The License URL\\ \hline
\multicolumn{4}{p{\textwidth}}{\small \footnotesize(*) See SPDX list \url{https://spdx.org/licenses/} or Creative Commons licenses \url{https://creativecommons.org}}
\end{tabular}\\
%%\textbf{Examples}
%%
%%Examples of rights serialization:
%%
%%\begin{verbatim}
%%<rights rightsURI="https://spdx.org/licenses/ODbL-1.0.html">
%% ODbL-1.0
%%</rights>
%%\end{verbatim}
%%
%%\begin{verbatim}
%%<rights rightsURI="https://creativecommons.org/licenses/by/4.0/">
%% Creative Commons Attribution 4.0
%%</rights>
%%\end{verbatim}
%%
%%
%%Example or relation ship :
%%Cite the original dataset using "source" (to link a bibliographic reference) or "relatedIdentifier" (to link a dataset)
%%
%%e.g.:
%%\begin{verbatim}
%%<relationship>
%% <relationshipType>Cites</relationshipType>
%% <relatedResource>doi: 10.5270/esa-qa4lep3 : Gaia DR3 ESA</relatedResource>
%%</relationship>
%%\end{verbatim}
%%
%%Example of Creator:
%%\begin{verbatim}
%%<creator>
%% <name>Bryson S.</name>
%% <altIdentifier>orcid:0000-0003-0081-1797<altIdentifier>
%%</creator>
%%\end{verbatim}
\section{Appendix, Citation Template} \label{sec:appendixC}
Example of a citation template that could be included in authors' articles. The template exploits the metadata described in this document:\\
\textbf{Template}:\\
We extract data published in <article> (<creator>, <original\_date>),
via <publisher> services (ivoa resource=<ivoid>, <publication\_date>)
using <service\_protocol> (version <server\_software>, executed at <request\_date>)\\
\textbf{Example}:\\
We extract data published in bibcode:2021AJ....161...36B (Bryson S., 2021),
via CDS services (ivoa resource=ivo://cds.vizier/j/aj/161/36, 2021-03-16)
using Simple Cone Search 1.03 (version 7.294, executed at 2022-10-30)\\
\textbf{Example}: we can construct a citation (in APA style) from Data Origin. However, we emphasise that ADS remains the standard for citation.\\
APA: <creator> (<publication\_date>). <title> [Dataset]. <publisher>. <citation>
\section{Appendix, DataOrigin and ProvDM}\label{sec:appendixD}
This is an example of provenance extracted from a Simple Cone Search result containing Data Origin.
The figure is a graphical representation of provenance created using the VOPROV Python package\footnote{\url{https://github.com/sanguillon/voprov}}.
\begin{figure}[htbp]
\includegraphics[width=1.2\textwidth]{voprov_example.png}
\caption{Provenance serialization generated from DataOrigin}
\end{figure}
\section{Appendix, Changes from Previous Versions}
\subsection{Difference between versions 1.1 and 1.2}
\begin{itemize}
\item New item: \textit{service\_ivoid}
\item Rename ivoid to data\_ivoid (now in Table~\ref{tab:origin-names}, Dataset Origin)
\item Rename editor to journal
\end{itemize}
\subsection{Difference between versions 1.0 and 1.1}
\begin{itemize}
\item Changed vocabulary: \\
\textit{resource\_date} becomes \textit{last\_update\_date},
\textit{rights} and \textit{copyrights} become \textit{rights\_uri} and \textit{rights} (VOResource),
\textit{version} becomes \textit{server\_software} (https://ivoa.net/documents/Notes/softid/),
\textit{publication\_id} becomes \textit{citation} (DALI),
\textit{landing\_page} becomes \textit{reference\_url} (VOResource),
\textit{relation\_type} and \textit{related\_resource} become the specific relations \textit{cites} and \textit{is\_derived\_from} (VOResource)
\item Remove \textit{curation\_level},
\textit{request\_post},
\textit{rights\_type}
\item New item: \textit{original\_date}, \textit{article}
\item Clarify date items and multiple INFO in VOTable.
\item Propose added human-readable text for VOTable serialization.
\item Language smoothing
\end{itemize}
\bibliography{ivoatex/ivoabib,ivoatex/docrepo,local}
\end{document}