The W3C Data Catalog Vocabulary, Version 2: Rationale, Design Principles, and Uptake

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.


Introduction
Data has become the most important asset that enables addressing issues ranging from societal challenges, such as pandemics and climate change, to everyday business insights. Thus, data descriptions and data cataloging are fundamental for supporting these data-driven approaches. The last few years have seen a growing trend towards Open Data, originally related primarily to public sector information, and then with increasing emphasis on facilitating the sharing and re-use of research data (for example, through the Research Data Alliance (RDA) and funder policies), as well as an understanding of the importance of metadata (for example, with the uptake of the FAIR data principles [1] for Findable, Accessible, Interoperable and Reusable data). Besides enabling data discovery and re-use, metadata is now also considered crucial to providing all the information necessary to reproduce an experiment, not only in order to verify the research results in scientific studies, but also in cases where data are used in support of policy making and impact assessment in the public sector. In addition, the qualitative and quantitative costs of not providing FAIR data and metadata have been estimated to be high: an estimated impact of €10.2 bn for the European economy [2].
The Data Catalog Vocabulary, or DCAT, is a notable contribution to this picture. DCAT is a metadata vocabulary designed to facilitate interoperability between data catalogs published on the Web, irrespective of the domain, community, or platform. Consequently, by using DCAT, data published on the Web can be exchanged between systems in an unambiguous manner and with a shared meaning. DCAT was developed following the World Wide Web Consortium (W3C) standardization processes.
Originally developed and hosted at the Digital Enterprise Research Institute (DERI), DCAT was considered by the W3C e-Government Interest Group, and further refined by the Government Linked Data (GLD) Working Group, which published it as a W3C Recommendation in 2014 [3]. Since then, it has been adopted and adapted by different parties, a notable example being DCAT-AP [4], the profile of DCAT used across Europe as a metadata interchange format.
In this paper, we describe the revision of DCAT, referred to as DCAT 2, which was developed by the W3C Dataset Exchange Working Group (DXWG) in response to a new set of use cases and requirements gathered from implementation experiences with the original version (2014) of the W3C DCAT vocabulary, and new applications that were not considered at that time. These include the possibility of cataloging other resource types in addition to datasets, such as data services, and of describing relationships between datasets, as well as between datasets and other cataloged resources. Overall, DCAT 2 harmonizes approaches emerging from different communities of usage, extending the core on which profiles can ensure the uniformity of semantics required for lossless interoperability.
DCAT 2 was published as a W3C Recommendation in February 2020 [5]. This paper complements the formal recommendation, offering insights into the requirements and the process considered in the new version of DCAT.
The paper is organized as follows. Section 2 explains the methodology, detailing the design principles adopted for the development of DCAT 2. Section 3 gives a brief summary of the requirements that drove the revision. Section 4 presents the DCAT model and highlights the features and guidelines introduced in DCAT 2. Section 5 reviews and discusses contributions in relation to other well-known metadata vocabularies. Section 6 discusses the implementation evidence and the uptake of DCAT. Finally, Section 7 summarizes the contributions and outlines future activities.

Methodology and Design Principles
The revision of DCAT has been developed by the W3C Dataset Exchange Working Group (DXWG), which was chartered to maximize interoperability between services such as data catalogs, e-infrastructures, and virtual research environments. The revision of DCAT was one of the planned deliverables, together with two other specifications concerning guidelines for the publication of application profiles and profile-based content negotiation.
DXWG worked on DCAT version 2 between May 2017 and January 2020. The group discussions took place in circa 130 teleconferences and four face-to-face meetings, as well as via the DXWG mailing list, issue tracker and GitHub repository. Following the formal W3C process, all these resources are publicly available, including the agenda and minutes of each meeting. The efforts of DXWG have focused on fulfilling requirements expressed in a W3C Working Group Note, the Dataset Exchange Use Cases and Requirements [6], which documents 51 use cases collected by the working group, and from which the requirements for the revision were identified. Besides the use cases and requirements documented in [6], the working group took into account the feedback received in response to four intermediate versions of the specification, consisting of three public Working Drafts and a Candidate Recommendation, each publicized within relevant communities.
This paper explicitly refers to requirements and technical design issues to guide interested readers into interlinked working group resources, which deepen the discussion and elucidate the design choices made.
The paper refers to working group resources as follows.
Issues. All the DCAT issues are documented in the GitHub space of the DXWG. The paper cites them in the text by number, e.g., Issue 1009 for https://github.com/w3c/dxwg/issues/1009.
Requirements. Requirements are documented in [6] and replicated as separate GitHub issues to track discussion and changes triggered by the requirements. The paper refers to them by their handles, also pointing to the related issues when specific discussions need to be referenced. For example, the paper refers to "Dereferenceable identifiers [RDID]" by [RDID], and to its related issue, available at https://github.com/w3c/dxwg/issues/53, as Issue 53.
The working group adhered to the following guiding principles in designing DCAT 2.
Preservation of the backward compatibility with existing implementations. In designing DCAT 2, the working group strove to minimize the impact on existing implementations. Governmental agencies have already broadly deployed the DCAT standard, and the working group aimed to preserve current implementations by avoiding the need to enforce changes unless strictly necessary. DCAT 2 does not make obsolete any pre-existing terms, and introduces new practices by complementing those already in place. New implementations of, e.g., application profiles are expected to adopt DCAT 2, while the existing implementations will not need to be upgraded unless owners want to use the new features. In particular, current DCAT deployments that do not overlap with the DCAT 2 new features (e.g., data services, time and space properties, qualified relations, packaging) do not need to change anything to remain conformant with DCAT 2.
Reuse of terms from consolidated metadata vocabularies. DCAT 2 incorporates terms from pre-existing vocabularies where stable terms with appropriate semantics could be found. This is consistent with the Data on the Web Best Practice (DWBP) #15, "Use terms from shared vocabularies, preferably standardized ones, to encode data and metadata" [7]. DCAT reuses terms from Dublin Core [8], FOAF [9], and PROV-O [10], and defines a minimal set of classes and properties of its own. Informal summary definitions of the externally-defined terms are included in the DCAT vocabulary for convenience, while authoritative definitions are available from the normative references. Changes to definitions in the references, if any, are expected to take precedence over the summaries given in DCAT.
Minimization of the ontological commitment. The group strove to minimize the ontological commitment of DCAT 2. From a practical point of view, that implies avoiding over-axiomatization of DCAT, e.g., by introducing restrictions that might limit the re-usability of DCAT. Moreover, following the DWBP #16, "Choose the right formalization level" [7], DCAT 2 has removed or relaxed domain and range restrictions for properties (such as those concerning the specification of data themes, keywords, and landing pages). As a rule of thumb, DCAT delegates to application profiles the burden of setting restrictions or providing guidelines for specific applications and communities.
Balancing normative specification and Open-World Assumption. The specification of DCAT 2 is influenced by common assumptions made in contexts of the Semantic Web and linked data. In particular, DCAT is a metadata schema based upon the "Open-World Assumption" (OWA), and it is defined by using the Resource Description Framework (RDF) data model [11]. The OWA implies that the metadata schema is not closed, and it can be extended using types and relationships borrowed from other schemas. RDF promotes an inherently machine-actionable approach, where each term in a metadata schema has its own identifier, which can be used to retrieve the term's semantics, and terms from distinct vocabularies can be jointly used. These assumptions have proven to scale in uncoordinated open environments such as the Web, but the flexibility offered by the OWA must be taken into account when dealing with the notion of conformance. DCAT-compliant catalogs may include additional non-DCAT metadata fields and additional RDF data in the catalog's RDF description. The contents of all metadata fields that are held in the catalog (and that contain data about the catalog itself), as well as the corresponding cataloged resources and distributions, are included in this RDF description, and are expressed using the appropriate classes and properties from DCAT. All classes and properties defined in DCAT are used consistently with the semantics declared in the DCAT Recommendation. Constraints on instances can be provided using shape languages such as ShEx and SHACL [12,13,14].

Requirements for DCAT 2
Table 1 summarizes the requirements addressed by DCAT 2. The following sections present the modeling solutions introduced in DCAT 2, referring to the requirements in the table.

DCAT Metadata Schema
The backbone of DCAT 2 [5] consists of three main classes: dcat:Catalog, dcat:Resource, and dcat:Distribution. Figure 1 provides an overview of the DCAT 2 model, showing the classes of resources that can be members of a Catalog, and the relationships between them. The diagram uses UML-style class notation, but it should be interpreted following the usual RDF Open-World Assumption around the presence/absence of properties, relationships, and cardinalities. To assist in understanding the full scope of each class, the inherited properties are copied down from each super-class. Cardinalities are shown in a few places to reinforce expectations, but these are not axiomatized or enforced in any way by the normative recommendation.

Requirement [ID]: Description
Dataset access [RDSA]: Provide a way to specify access restrictions for both a dataset and a distribution.
Distribution schema [RDIS]: Define a way to include identification of the schema the described data conforms to.
Spatial coverage [RSC]: Provide means to specify spatial coverage with geometries.
[RDIP]: Define a way to specify the content of packaged files in a Distribution.
Distribution service [RDISV]: Provide a means to describe that a distribution is provided by a service.
Primary & alternative id [RIDALT]: Provide means to distinguish the primary and alternative (legacy) identifiers.
Quality-related info [RDQIF]: Define a way to associate quality-related information with Datasets.
Data quality model [RDQM]: Identify common modeling patterns for different aspects of data quality based on frequently referenced data quality attributes found in existing standards and practices.
Dataset citation [RDSC]: Provide a way to specify information required for data citation (e.g., dataset authors, title, publication year, publisher, persistent identifier).
Entailment of Schema.org [RES]: Define Schema.org equivalents for DCAT properties to support entailment of Schema.org-compliant profiles of DCAT records.
Table 1: Requirements addressed in DCAT 2, identified by their IDs. The loosely-structured catalog requirement (Issue 253) emerged from the community in the form of a GitHub issue.
dcat:Catalog represents a catalog, which can be seen as a kind of dataset in which each individual item is a metadata record describing a DCAT resource. dcat:Resource represents any resource that may be described by a metadata record in a catalog. It is the parent class of dcat:Dataset and dcat:DataService, the most typical resource types documented in a DCAT catalog. DCAT profiles or applications can define other kinds of resources to be cataloged as sub-classes of dcat:Dataset, dcat:DataService or dcat:Resource. It is worth noting that dcat:Resource and its subclasses can also be used for datasets and services which are not included in any catalog. dcat:Distribution represents a specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).
DCAT 2 borrows from the Dublin Core Metadata Terms (DCTERMS) vocabulary [8] a set of properties that are transversely applicable to different items, including datasets, data services, catalogs, and distributions. In particular, it uses dcterms:title and dcterms:description to title and describe items; dcterms:issued and dcterms:modified to indicate the date of formal issuance and the most recent modification date of an item; and dcterms:license and dcterms:rights to indicate a legal document under which the item is made available and its copyright statements.
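As an illustration of the backbone, the following minimal sketch writes a catalog, one dataset, and one distribution as plain triples and prints them in an N-Triples-like form. It uses only the Python standard library; the ex: namespace, the resource names, and the helper functions are hypothetical and chosen for the example only.

```python
# Prefix map; the ex: namespace is a hypothetical placeholder.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "ex": "https://example.org/",
}
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def expand(curie):
    """Expand a prefixed name such as 'dcat:Dataset' into a bracketed IRI."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


# Catalog -> dataset -> distribution, each with transversal DCTERMS metadata.
triples = [
    (expand("ex:catalog"), RDF_TYPE, expand("dcat:Catalog")),
    (expand("ex:catalog"), expand("dcterms:title"), literal("Example catalog")),
    (expand("ex:catalog"), expand("dcat:dataset"), expand("ex:dataset/1")),
    (expand("ex:dataset/1"), RDF_TYPE, expand("dcat:Dataset")),
    (expand("ex:dataset/1"), expand("dcterms:title"), literal("Example dataset")),
    (expand("ex:dataset/1"), expand("dcterms:description"),
     literal("A dataset used only for illustration.")),
    (expand("ex:dataset/1"), expand("dcat:distribution"), expand("ex:dataset/1/csv")),
    (expand("ex:dataset/1/csv"), RDF_TYPE, expand("dcat:Distribution")),
    (expand("ex:dataset/1/csv"), expand("dcterms:title"), literal("CSV export")),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) rows, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


if __name__ == "__main__":
    print(to_ntriples(triples))
```

Under the Open-World Assumption, nothing prevents adding further triples with terms from other vocabularies to the same description.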

DCAT 2 new features in the backbone and transversal properties.
DCAT 2 provides guidelines to express conformance. It recommends the property dcterms:conformsTo on a transversal set of items to express conformance to different types of standards. The use of such a property is a consolidated practice in different profiles and vocabularies (e.g., DCAT-AP [4] and DQV [15]). Besides formal standards issued by bodies like ISO and W3C, dcterms:conformsTo is adopted to indicate models, schemas, ontologies, and profiles that a cataloged resource or distribution conforms to (see Issue 55 and Issue 411).
DCAT 2 elaborates the guidelines to handle licenses and rights (see Issue 114). Different best practices recommend providing data license and rights information (e.g., DWBP [7]). However, multiple use cases fall under the umbrella of license and rights information. DCAT 2 provides guidelines distinguishing three main cases: the first, to associate a resource that represents a license; the second, to associate a resource denoting only access rights (e.g., whether data can be accessed by anyone or just by authorized parties; Req. RDSA, Issue 59); the third, to cover all the other cases, i.e., statements not concerning licensing conditions and/or access rights (e.g., copyright statements).
For the first case, DCAT 2 recommends the property dcterms:license to refer to canonical URIs of well-known licenses such as those defined by Creative Commons. For the second, it recommends the property dcterms:accessRights to express statements that specify access rights by referring to code lists or taxonomies, such as the access rights code list MDR-AR used in DCAT-AP [4] or the Eprints Access Rights Vocabulary Encoding Scheme.
For the third, covering all the other types of rights statements not addressed by dcterms:license and dcterms:accessRights, such as copyright statements, DCAT 2 recommends the property dcterms:rights. Finally, in the particular case when rights are expressed via Open Digital Rights Language (ODRL) policies, DCAT 2 recommends using the odrl:hasPolicy property as the link from the description of the cataloged resource or distribution to the ODRL policy, according to the W3C ODRL model [16] and vocabulary [17], in addition to the corresponding DCTERMS property that matches the same ODRL policy type.
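The three cases, plus the ODRL case, amount to a small decision rule. The sketch below encodes it as a lookup; the function name and the case labels ("license", "access_rights", "other") are hypothetical names introduced for illustration, while the property names are those recommended in the text.

```python
# Mapping from the kind of statement to the recommended property.
# The keys are illustrative labels, not DCAT terms.
RECOMMENDED_PROPERTY = {
    "license": "dcterms:license",             # canonical licence URIs
    "access_rights": "dcterms:accessRights",  # code lists / taxonomies
    "other": "dcterms:rights",                # e.g. copyright statements
}


def rights_properties(kind, has_odrl_policy=False):
    """Return the properties to use for a rights-related statement.

    When the statement is also expressed as an ODRL policy, odrl:hasPolicy
    is used in addition to the matching DCTERMS property.
    """
    if kind not in RECOMMENDED_PROPERTY:
        raise ValueError(f"unknown kind of rights statement: {kind!r}")
    properties = [RECOMMENDED_PROPERTY[kind]]
    if has_odrl_policy:
        properties.append("odrl:hasPolicy")
    return properties


if __name__ == "__main__":
    print(rights_properties("license"))
    print(rights_properties("access_rights", has_odrl_policy=True))
```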
The following subsections provide more detailed descriptions of the specific components of DCAT 2.

Resources
The class dcat:Resource represents a cataloged resource. In the previous version of DCAT, datasets were the only kind of entities in DCAT catalogs. DCAT 2 introduces the dcat:Resource class, which is an extension point for defining a catalog of any resource. The original dcat:Dataset is a sub-class of dcat:Resource. Besides the transversely applicable properties, the class dcat:Resource includes all the properties that were made available in the previous version of DCAT for datasets and might serve for other kinds of resources in DCAT 2. In particular, dcat:landingPage indicates a Web page that can be navigated in a Web browser to gain access to the resources (the catalog, a dataset, its distributions) and/or additional information. dcat:contactPoint, dcterms:creator and dcterms:publisher indicate, respectively, the contact information for the cataloged resource (expressed in vCard [18]), the entity responsible for creating the resource, and the entity responsible for making the resource available, the latter two expressed as foaf:Agent. dcterms:language refers to the natural language used for textual metadata (i.e., titles, descriptions, etc.) of a cataloged resource. dcat:keyword classifies the resources using free-text keywords, while dcat:theme classifies resources with concepts taken from Knowledge Organization Systems (KOS) and possibly available as Linked Data.
dcat:Dataset is a subclass of dcat:Resource which represents a collection of data, published or curated by a single agent, and available for access or download in one or more representations, schematic layouts and formats or serializations.The property dcat:distribution relates a dataset to its distributions (dcat:Distribution). dcat:DataService is a subclass of dcat:Resource which represents a Web API or service that provides access to data, specifically to download distributions of a dataset.
Other subclasses of dcat:Resource can be defined to support applications that catalog other kinds of resource, for example, "specimens".

DCAT 2 new features in Resource.
DCAT 2 provides flexible mechanisms to indicate the type of cataloged resources (Req. RDST and Issue 64). DCAT can be used to model a variety of resources, including documents, software, images and audio-visual content. To ensure the flexibility potentially required by catalogs serving different communities and application cases, DCAT 2 provides two mechanisms for typing resources. First, a cataloged resource description has an RDF type to denote a sub-class of dcat:Resource, initially dcat:Dataset and dcat:DataService. Second, the property dcterms:type may be used to indicate a sub-type. It is strongly recommended that the value of this property is taken from a well-governed and broadly recognized set of resource types (e.g., the DCMI Type vocabulary [8], the DataCite resource types [19], the ISO-19115-1 scope codes [20], the MARC intellectual resource types). Using dcterms:type is particularly appropriate for referring to classifications provided by other standards, and to enable interoperability with existing catalogs (see use cases ID8 and ID20). When describing a resource which is not a dcat:Dataset or dcat:DataService, it is recommended to create a suitable sub-class of dcat:Resource, or use dcat:Resource with the dcterms:type property to indicate the specific type.
DCAT 2 provides information required for data citation (see Req. RDSC and Issue 61). DCAT 2 provides equivalents to all the mandatory elements in DataCite [19]. The original DCAT already supported title, publisher, publication year, and resource type; DCAT 2 additionally adopts dcterms:creator to indicate the creator, and it provides guidelines for dealing with different types of identifiers (see section 4.5).
DCAT 2 provides a way to deal with a wide set of relations. Resources might be related in many different ways, and complex relations might characterize the context in which resources have been created, for example, to track their input data, the software used, and the agents and funders involved (e.g., see use cases ID9, ID12, ID31, ID32). The property dcterms:relation is recommended for use in the context of a cataloged resource to capture general relationships, including related datasets (Req. RRDS) and the case where the package of resources associated with a cataloged item includes a mixture of representations, parts, documentation and other elements which are not strictly 'distributions' of a dataset (see Issue 253 expressing the requirement on loosely-structured catalogs). The property dcterms:relation is a super-property of a number of more specific properties which express more precise relationships, such as dcat:distribution, dcterms:hasPart (and its sub-properties dcat:catalog, dcat:dataset, dcat:service), dcterms:isPartOf, dcterms:conformsTo, dcterms:isFormatOf, dcterms:hasFormat, dcterms:isVersionOf, dcterms:hasVersion, dcterms:replaces, dcterms:isReplacedBy, dcterms:references, dcterms:isReferencedBy, dcterms:requires, dcterms:isRequiredBy. The use of dcterms:relation is not inconsistent with a subsequent reclassification with more specific semantics, though the more specialized sub-properties should be used to link a dataset to component and supplementary resources if possible. For example, DCAT 2 uses the property dcterms:isReferencedBy to associate the resource described in the catalog with an external resource that references, cites, or points to the cataloged resource. By applying this property, DCAT 2 tracks publications that reuse or describe a specific dataset (see Req. RDSP and Issue 63). DCAT 2 also tracks the project that has generated a resource: prov:wasGeneratedBy links datasets to the projects that have generated them (Req. RPR and Issue 77).
DCAT 2 supports complex non-binary relations. It uses qualified relations to deal with relations not covered by the above or other known properties (e.g., PROV-O properties such as prov:wasDerivedFrom, prov:hadPrimarySource) and to overcome the limitation related to binary relations (see the requirement "qualified forms" [Req. RQF] discussed in Issue 79). Even when the relations are represented by known properties, there may be the need to provide additional information concerning, e.g., the temporal context of a relationship, which requires the use of a more sophisticated representation. Examples are the temporal dimension of a role (i.e., the time frame during which an individual or organization played a given role) and, possibly, other information such as the organization where the individual held a given position while playing that role (see use cases ID19 and ID13, and Issue 66). DCAT 2 models relationships between resources and agents with the property prov:qualifiedAttribution (for example, the funding source, Req. RFS) and relationships between resources with dcat:qualifiedRelation. The property prov:qualifiedAttribution links the resource to instances of the class prov:Attribution, which ascribes the resource to an agent indicated by the property prov:agent. The property dcat:qualifiedRelation links the resource to a dcat:Relationship involving another resource pointed to by the property dcterms:relation. The property dcat:hadRole is used with prov:qualifiedAttribution to indicate the role an agent plays, and with dcat:qualifiedRelation to denote the role of the related resource in the relationship.
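The two qualified patterns share the same shape: an intermediate node carries the target of the relation together with a role. The following sketch builds both as lists of triples using prefixed names; the function names, the blank-node labels, and the ex: identifiers are hypothetical and serve only to illustrate the pattern.

```python
def qualified_attribution(resource, agent, role, node="_:attribution"):
    """Triples linking a resource to an agent through a prov:Attribution node."""
    return [
        (resource, "prov:qualifiedAttribution", node),
        (node, "rdf:type", "prov:Attribution"),
        (node, "prov:agent", agent),
        (node, "dcat:hadRole", role),  # role qualifying this attribution
    ]


def qualified_relation(resource, other, role, node="_:relationship"):
    """Triples linking two resources through a dcat:Relationship node."""
    return [
        (resource, "dcat:qualifiedRelation", node),
        (node, "rdf:type", "dcat:Relationship"),
        (node, "dcterms:relation", other),
        (node, "dcat:hadRole", role),  # role qualifying this relationship
    ]


if __name__ == "__main__":
    # Hypothetical identifiers for a dataset, a funder, and a source dataset.
    statements = (
        qualified_attribution("ex:dataset", "ex:funding-agency", "ex:role/funder")
        + qualified_relation("ex:dataset", "ex:raw-dataset", "ex:role/source")
    )
    for s, p, o in statements:
        print(f"{s} {p} {o} .")
```

Because the intermediate node is a resource in its own right, further statements (for instance, a time frame) can be attached to it without changing the pattern.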
DCAT 2 supports a rich set of temporal and spatial properties to characterize datasets. The previous version of DCAT offered dcterms:issued, dcterms:modified and dcterms:accrualPeriodicity to indicate when a dataset is issued and modified, and its update schedule. DCAT 2 adopts new properties specifically dealing with the temporal coverage (Req. RTC). It introduces the property dcat:temporalResolution to specify the minimum temporal separation of items in a dataset, encoded as xsd:duration, and adopts dcterms:temporal to indicate the temporal extent of a dataset. The extent is expressed as instances of the class dcterms:PeriodOfTime, indicating the start and end of the interval by using the properties dcat:startDate or time:hasBeginning, and dcat:endDate or time:hasEnd, respectively. The interval can also be open, i.e., it can have just a start or just an end (see Issue 85 for further discussions). Similarly, DCAT 2 introduces new properties to express spatial coverage (Req. RSC, see Issue 83 for the detailed discussion). dcat:spatialResolutionInMeters specifies the minimum spatial separation of items in a dataset, expressing it as a decimal value in meters. dcterms:spatial expresses the spatial extent of a dataset. Its values are a spatial region or named place (dcterms:Location), for which the property locn:geometry specifies an extensive geometry (i.e., a set of coordinates denoting the vertices of the relevant geographic area), dcat:bbox specifies a geographic bounding box delimiting a spatial area, and dcat:centroid indicates a geographic center of a spatial area, or another characteristic point.
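A minimal sketch of the temporal properties follows, including the open-interval case. It is an illustration only: the function names and the blank-node label are hypothetical, dates are encoded as xsd:date for simplicity, and the resolution value is assumed to be a valid xsd:duration string supplied by the caller.

```python
from datetime import date


def temporal_extent(dataset, start=None, end=None, node="_:period"):
    """Triples for dcterms:temporal; the interval may be open on one side."""
    if start is None and end is None:
        raise ValueError("a period needs at least a start or an end")
    triples = [
        (dataset, "dcterms:temporal", node),
        (node, "rdf:type", "dcterms:PeriodOfTime"),
    ]
    if start is not None:
        triples.append((node, "dcat:startDate", f'"{start.isoformat()}"^^xsd:date'))
    if end is not None:
        triples.append((node, "dcat:endDate", f'"{end.isoformat()}"^^xsd:date'))
    return triples


def temporal_resolution(dataset, duration):
    """Triple for dcat:temporalResolution, e.g. duration='P1D' for daily items."""
    return [(dataset, "dcat:temporalResolution", f'"{duration}"^^xsd:duration')]


if __name__ == "__main__":
    # An ongoing daily time series: start known, end left open.
    rows = temporal_extent("ex:dataset", start=date(2019, 1, 1))
    rows += temporal_resolution("ex:dataset", "P1D")
    for s, p, o in rows:
        print(f"{s} {p} {o} .")
```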
DCAT 2 adds mechanisms for including data services. Data is often served via Web services. A service may provide access to more than one dataset, and it is necessary to know how to query the service API to get the data (see use cases ID18 and ID6). DCAT 2 specializes dcat:Resource with a new class dcat:DataService to model data services (see Issue 180). A data service is a collection of operations that provides access to one or more datasets or to data processing. The dcat:servesDataset property links a service to data that it can distribute. The kind of service can be indicated using the dcterms:type property; its value may be taken from a controlled vocabulary such as the INSPIRE spatial data service type code list. dcat:endpointURL provides the root location or primary endpoint of the service (a Web-resolvable IRI). The property dcat:endpointDescription provides a description of the services available via the endpoints, including their operations, parameters, etc. The endpoint description gives specific details of the actual endpoint instances, while dcterms:conformsTo indicates the general standard or specification that the endpoints implement. An endpoint description may be expressed in a machine-readable form, such as an OpenAPI [21] description, an OGC GetCapabilities response (WFS [22,23], WMS [24,25]), a SPARQL Service Description [26], an OpenSearch [27] or WSDL [28] document, or a Hydra API description [29].
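The properties of a data service can be gathered in a small record type, sketched below under stated assumptions: the DataService dataclass and its field names are hypothetical, the URLs are placeholders, and dcterms:conformsTo is attached to the service itself, which is one reading of the description above.

```python
from dataclasses import dataclass, field


@dataclass
class DataService:
    """Illustrative container for the description of a dcat:DataService."""
    iri: str
    endpoint_url: str
    endpoint_description: str = ""
    conforms_to: str = ""
    serves: list = field(default_factory=list)

    def triples(self):
        rows = [
            (self.iri, "rdf:type", "dcat:DataService"),
            (self.iri, "dcat:endpointURL", f"<{self.endpoint_url}>"),
        ]
        if self.endpoint_description:
            rows.append((self.iri, "dcat:endpointDescription",
                         f"<{self.endpoint_description}>"))
        if self.conforms_to:
            rows.append((self.iri, "dcterms:conformsTo", f"<{self.conforms_to}>"))
        # One service may give access to several datasets.
        rows.extend((self.iri, "dcat:servesDataset", d) for d in self.serves)
        return rows


if __name__ == "__main__":
    service = DataService(
        iri="ex:sparql-service",
        endpoint_url="https://example.org/sparql",
        endpoint_description="https://example.org/sparql/description",
        conforms_to="https://www.w3.org/TR/sparql11-protocol/",
        serves=["ex:dataset/1", "ex:dataset/2"],
    )
    for s, p, o in service.triples():
        print(f"{s} {p} {o} .")
```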

Distributions
dcat:Distribution is the class for a specific representation of a dataset. A dataset might be available in multiple serializations that may differ in various ways, including natural language, media-type or format, schematic organization, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above). Distributions represent a general availability of a dataset, whose access can include different access methods (e.g., direct download, API, or through a Web page). For distributions, dcat:downloadURL provides the URL for a downloadable file in a given format. The "format" of a distribution should be specified through the property dcat:mediaType when a corresponding IANA media type [30] exists, or dcterms:format otherwise. dcat:byteSize specifies the size of a distribution in bytes. When a direct link to the downloadable file is not available, dcat:accessURL indicates a URL of the resource that gives access to a distribution of the dataset. It should be used for the URL of a service or location that can provide access to this distribution, typically through a Web form, query or API call.
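The choice between dcat:mediaType and dcterms:format, and between dcat:downloadURL and dcat:accessURL, can be sketched as follows. The function and its parameters are hypothetical; the IANA base IRI used to build media type identifiers is an assumption of this example.

```python
# Assumed base IRI for identifying IANA media types.
IANA_BASE = "https://www.iana.org/assignments/media-types/"


def distribution_triples(dist, download_url=None, access_url=None,
                         media_type=None, format_iri=None, byte_size=None):
    """Build the core triples of a dcat:Distribution.

    media_type is an IANA media type name such as 'text/csv'; format_iri is
    used only when no IANA media type is given.
    """
    if download_url is None and access_url is None:
        raise ValueError("either a download URL or an access URL is needed")
    rows = [(dist, "rdf:type", "dcat:Distribution")]
    if download_url is not None:
        rows.append((dist, "dcat:downloadURL", f"<{download_url}>"))
    if access_url is not None:
        rows.append((dist, "dcat:accessURL", f"<{access_url}>"))
    if media_type is not None:
        rows.append((dist, "dcat:mediaType", f"<{IANA_BASE}{media_type}>"))
    elif format_iri is not None:
        rows.append((dist, "dcterms:format", format_iri))
    if byte_size is not None:
        rows.append((dist, "dcat:byteSize", f'"{int(byte_size)}"^^xsd:decimal'))
    return rows


if __name__ == "__main__":
    rows = distribution_triples(
        "ex:dataset/1/csv",
        download_url="https://example.org/files/data.csv",
        media_type="text/csv",
        byte_size=5120,
    )
    for s, p, o in rows:
        print(f"{s} {p} {o} .")
```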

DCAT 2 new features in Distributions.
DCAT 2 introduces distribution services to support use cases where the distribution of a dataset is made by Web services (ID6 and Req. RDISV). DCAT 2 adds the property dcat:accessService, which relates distributions to their dcat:DataService, giving detailed information about how users can interact with distribution services (Issue 267).

DCAT 2 revises and clarifies the definition of distribution (Req. RDIDF).
The previous definition of dcat:Distribution allowed a number of alternative interpretations. The definition has been rephrased to clarify that distributions are primarily representations of datasets. DCAT 2 clarifies that lossless transformations between representations are not always possible. In some cases, distributions of the same dataset might have different levels of fidelity to the underlying data (see discussion in Issue 52). Moreover, the question of whether different representations can be understood to be distributions of the same dataset, or distributions of different datasets, is application-specific. Judgment about how to describe them is the responsibility of the provider, taking into account their understanding of the expectations of users, and practices in the relevant community.
DCAT 2 supports packaged and compressed distributions (Req. RDIP, see Issue 54). Distributions can include multiple files made available in compressed archives. DCAT 2 introduces the properties dcat:packageFormat and dcat:compressFormat to indicate the package and compression formats of the distribution. Both formats should be expressed using a media type as defined by IANA [30], if available.
DCAT 2 recommends indicating the distribution schema. It uses the property dcterms:conformsTo to indicate the model or schema used for the representation of a dataset (Req. RDIS and Issue 55).

Catalog and Catalog Record
A dcat:Catalog is a curated collection of metadata about resources such as datasets and data services. dcat:Catalog is characterized by further properties besides those transversely applicable: foaf:homepage indicates the homepage of the catalog, which usually is a public Web document available in HTML; dcat:themeTaxonomy refers to the Knowledge Organization System (KOS) providing concepts to classify the cataloged resources; dcat:record links a catalog to a dcat:CatalogRecord describing the registration of a single cataloged resource that is part of the catalog. Using dcat:record and dcat:CatalogRecord, it is possible to distinguish between the metadata of a cataloged resource (i.e., instances of dcat:Resource) and the metadata of the metadata of the cataloged resource (i.e., instances of dcat:CatalogRecord). This is required in specific cases, for example, to express the date when a resource has been registered or modified in the catalog (dcterms:issued and dcterms:modified attributed to instances of dcat:CatalogRecord), which may differ from the publication or modification of the concrete resources (i.e., dcterms:issued or dcterms:modified attributed to instances of dcat:Resource). DCAT 2 enables catalogs to be composed of other catalogs: dcat:Catalog has been made a sub-class of dcat:Dataset, and the property dcat:catalog is provided to specify sub-catalogs (see Issue 182).
DCAT 2 extends the type of thematic resources which can be considered to classify datasets. It relaxes the global range of the property dcat:themeTaxonomy, allowing the linking to a KOS that is not formalized as a skos:ConceptScheme (see Issue 119). Besides SKOS concept schemes, SKOS collections [31,32] or OWL ontologies [33] can be used, with the advice that each member of the KOS be denoted by an IRI and published as linked data.
DCAT 2 includes specific mechanisms to state the conformance of metadata to standards. It adopts the property dcterms:conformsTo for dcat:CatalogRecord to represent the conformance of a record's metadata with a metadata standard (see Issue 502).

Guidelines
In addition to the features discussed above, DCAT 2 elaborates guidelines to meet specific requirements posed by the community. Guidelines systematize emerging solutions based on W3C vocabularies such as DQV [15] and ADMS [34], which are stable enough to be adopted even if they have not reached the status of W3C Recommendation.
DCAT 2 provides guidelines to deal with different kinds of identifiers. As pointed out in the use case ID11, a number of different (possibly persistent) identifiers are widely used in the scientific community, especially for publications, but now increasingly for authors and data. Different approaches are used for representing them, and best practices are needed to enable their effective use across platforms. More importantly, they need to be made actionable, irrespective of the platforms they are used in (see Req. RDID). Encoding identifiers as HTTP URIs seems to be the most effective way of making them actionable. Notably, quite a few identifier schemes can be encoded as dereferenceable HTTP URIs, and some of them also return machine-readable metadata (e.g., DOIs, ORCIDs). Moreover, they can still be encoded as literals, especially if there is a need to know the identifier "type" (Req. RIDT). In such a case, a common identifier type registry would ensure interoperability.
DCAT 2 reuses terms provided by DCTERMS [8] and VOCAB-ADMS [34]. Data providers can apply dcterms:identifier to any kind of resource, binding their HTTP dereferenceable proxy IDs with legacy identifiers, non-HTTP dereferenceable identifiers, and locally minted or third-party-provided identifiers (Issue 53). Another issue concerns the ability to specify primary and secondary identifiers. This may be a requirement when resources are associated with multiple identifiers (Req. RIDALT). The property adms:identifier can express other locally minted identifiers or external identifiers, like DOI, ELI, arXiv for creative works, and ORCID, VIAF, ISNI for actors such as authors and publishers, as long as the identifiers are globally unique and stable. The property adms:identifier ranges over instances of the class adms:Identifier, for which skos:notation indicates the identifier as a typed literal whose datatype IRI denotes the identifier scheme (e.g., "PA 1-060-815"^^ex:type), while adms:schemaAgency and dcterms:creator represent the authority that defines the identifier scheme (e.g., the ex:type in the example). adms:schemaAgency is used when the authority has no URI associated (see Issue 67). The type of identifiers can be provided as RDF datatypes [11] or custom OWL datatypes [35] if not already registered as URI type. Examples of common types for identifier schemes (arXiv, etc.) are defined in DataCite schema 14 and FAIRsharing Registry 15 (see Issue 68).
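The two encodings discussed above, an actionable HTTP URI and a typed literal attached through adms:identifier, can be sketched in a few lines of Python. This is only an illustration: the DOI 10.1234/example, the datatype IRI, the agency name, and the helper names are hypothetical, and the notation is assumed to contain no characters that need escaping in Turtle.

```python
# Prefix declarations so that the generated snippet is self-contained Turtle.
PREFIXES = (
    "@prefix adms: <http://www.w3.org/ns/adms#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)


def actionable_doi(doi):
    """Encode a DOI as a dereferenceable HTTP URI via the doi.org resolver."""
    return "https://doi.org/" + doi


def identifier_turtle(resource_iri, notation, type_iri, agency):
    """Render one adms:Identifier description as Turtle text.

    Assumes `notation` and `agency` contain no quotes or backslashes.
    """
    return (
        PREFIXES
        + f"<{resource_iri}> adms:identifier [\n"
        + "    a adms:Identifier ;\n"
        + f'    skos:notation "{notation}"^^<{type_iri}> ;\n'
        + f'    adms:schemaAgency "{agency}"\n'
        + "] .\n"
    )


# Hypothetical identifier, type IRI, and agency, invented for illustration.
doi = "10.1234/example"
turtle = identifier_turtle(
    actionable_doi(doi),
    doi,
    "https://example.org/id-type/doi",
    "Example Registration Agency",
)
print(turtle)
```

Here the HTTP form serves as the resource IRI, while the legacy form is preserved as a literal whose datatype records the identifier scheme.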
DCAT 2 provides guidelines for documenting the quality of resources and distributions. Consistent with the recommendations from the Data on the Web Best Practices (DWBP) [7], the use cases ID45 and ID14 stress the need for a uniform representation of data quality so that consumers understand the possibilities and risks of using and reusing the data. DCAT 2 reuses the Data Quality Vocabulary (DQV) [36] [15] to associate quality-related information with datasets (Req. RDQIF) and offers common modeling patterns for different aspects of data quality (see Req. RDQM, Issue 57, Issue 58). The property dqv:hasQualityAnnotation relates datasets and distributions to reviews, users' feedback and quality certificates (modeled as dqv:QualityAnnotation). The property dqv:hasQualityMeasurement relates resources and distributions to quality measurements (instances of dqv:QualityMeasurement) evaluated by community-defined domain-specific metrics (dqv:Metric), which provide quantitative or qualitative information about the dataset or distribution. dqv:QualityPolicy models policies or agreements that are chiefly governed by data quality concerns. As previously discussed, dcterms:conformsTo can state the compliance with standards and specifications. DCAT 2 includes examples of how DQV can express the degree of conformance to best practices (e.g., the DWBP [7] or the FAIR Principles [1]) and combines DQV with the Evaluation and Report Language (EARL) [37] and the PROV ontology [10] to express details about the results of conformance and quality tests.
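The quality measurement pattern can be illustrated with a small Python sketch that attaches one measurement to a dataset and reads it back by metric. The dataset, measurement and metric IRIs, the value 0.98, and the helper names are hypothetical; dqv:isMeasurementOf and dqv:value are DQV terms that the text above does not mention explicitly.

```python
DQV = "http://www.w3.org/ns/dqv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resources, invented for illustration.
DATASET = "https://example.org/dataset/001"
MEASUREMENT = "https://example.org/quality/measurement/1"
METRIC = "https://example.org/quality/metric/completeness"

triples = [
    (DATASET, DQV + "hasQualityMeasurement", MEASUREMENT),
    (MEASUREMENT, RDF_TYPE, DQV + "QualityMeasurement"),
    (MEASUREMENT, DQV + "isMeasurementOf", METRIC),
    (MEASUREMENT, DQV + "value", 0.98),  # invented value
]


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def measurements_for(resource):
    """Map each metric IRI to the value measured for `resource`."""
    result = {}
    for measurement in objects(resource, DQV + "hasQualityMeasurement"):
        metrics = objects(measurement, DQV + "isMeasurementOf")
        values = objects(measurement, DQV + "value")
        if metrics and values:
            result[metrics[0]] = values[0]
    return result


print(measurements_for(DATASET))
```

Since each measurement points to its metric, consumers can compare datasets on the same metric without agreeing in advance on a single notion of quality.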

Related Work
This section reviews metadata models that readers might perceive as overlapping with DCAT in terms of coverage or goals. The discussion points out the peculiarities of each metadata model and its mapping into DCAT. Overall, the discussion clarifies that DCAT is not redundant with the existing metadata models. Instead, combining the discussed metadata models with DCAT brings advantages in overall metadata expressivity and in cross-sector, cross-platform sharing and reuse.
CERIF. The Common European Research Information Format (CERIF) models the research environment, including research outputs, persons, organizations, projects, funding programs, and facilities as first-class citizens, and captures the semantic relationships of entities with each other as well as entity classifications (i.e., roles). The European Commission mandated euroCRIS to maintain, develop and promote CERIF as an EU recommendation to Member States. euroCRIS now has more than 100 institutional members in approximately 40 countries and there are hundreds of implementations of CERIF, including by several commercial ICT suppliers. CERIF is currently being used in numerous systems in production across Europe (e.g., national or institutional research information systems), as well as in European FP7 e-infrastructure projects, such as OpenAIREplus, EuroRIs-Net+ and ENGAGE [38]. CERIF and DCAT differ in terms of goals and specificity. CERIF specifically focuses on research environments, while DCAT focuses on data catalogs. A partial mapping of DCAT into CERIF exists [39]. For example, DCAT Datasets can be modeled as ResultProduct, but CERIF does not natively distinguish between catalogs, datasets, and distributions, nor does it provide other details such as access details.
DataCite. The DataCite metadata schema [19] is a list of core metadata properties chosen for accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions. It is managed by the DataCite consortium, founded in late 2009 with the goal of easing access to scientific research data on the Internet, increasing acceptance of research data as legitimate, citable contributions to the scientific record, and supporting data archiving that will permit results to be verified and re-purposed for future study. The DataCite infrastructure is responsible for issuing persistent identifiers (in particular, DOIs) for datasets, and for registering dataset metadata. Such metadata is to be provided according to the DataCite metadata schema. While DataCite's Metadata Schema has been expanded with each new version, it is, nevertheless, intended to be generic to the broadest range of research datasets, rather than customized to the needs of any particular discipline. DataCite metadata primarily supports citation and discovery of data; it does not include specific terms for Catalogs and Distributions, and it is not intended to supplant or replace community-specific metadata. DataCite can serve metadata in other schemas via DOI content negotiation. In particular, it supports JSON-LD [40] to serve metadata according to Schema.org. A mapping from DataCite to DCAT is defined in CiteDCAT-AP [41], a metadata profile used in Zenodo 16, the most popular European research data repository.
ISO 19115. ISO 19115-1:2014 [20] defines a metadata schema for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services. Mappings of ISO 19115 to DCAT have been developed; in particular, GeoDCAT-AP [42] is an extension to the "DCAT application profile for European data portals" (DCAT-AP) for the representation of geographic metadata. GeoDCAT-AP was designed to enable the cross-sector and cross-platform sharing and re-use of INSPIRE metadata and, more generally, of metadata following the ISO 19115/19119 standards and the corresponding XML-based implementation (ISO 19139).
Schema.org. In 2011, the major search engines Bing, Google, and Yahoo (later joined by Yandex) created Schema.org to provide a single schema across a wide range of topics that included people, places, events, products, offers, and so on [43]. Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on Web pages, in email messages, and beyond. Schema.org includes a number of types and properties based on the original DCAT work (see sdo:Dataset as a starting point), and the index for Google's Dataset Search service relies on structured descriptions in Web pages about datasets using both Schema.org and DCAT [44]. The sdo:Dataset class is modeled starting from the W3C DCAT work, and benefits from collaboration around the DCAT, ADMS and VoID vocabularies 17. In particular, Schema.org mimics the DCAT backbone: the (abstract) sdo:Dataset and (concrete) sdo:DataDownload match dcat:Dataset and dcat:Distribution, as does the relationship of Datasets to DataCatalogs. Contrary to DCAT, Schema.org is not a W3C standard, and the project is not governed by W3C, the W3C advisory group or the W3C Process; rather, it stems from an informal collaboration. In terms of workflow, the primary difference between Schema.org and W3C's recommendation track process is an emphasis on incremental publication of releases (several releases per year) approved by a small steering group, whose role is to evaluate and approve release candidates prepared by the project webmaster on the basis of wider discussion, which takes place in a dedicated W3C community group and related GitHub project. DCAT 2 [5] provides a mapping between DCAT and Schema.org to clarify the relation between the two and to promote discoverability by mainstream search engines (see Req. RES).
VoID. VoID [45] is an RDF vocabulary for expressing metadata about RDF datasets. It covers (i) general metadata following the Dublin Core model; (ii) access metadata describing how RDF data can be accessed using various protocols; (iii) structural metadata describing the structure and schema of datasets for tasks such as querying and data integration; (iv) description of links between datasets for understanding how multiple datasets are related and can be used together. VoID is quite popular in the context of Linked Data and is extended by other vocabularies such as DataID [46]. However, being specifically suited to RDF datasets and Linked Data practices, it does not cover all the types of data required by the open and research data community (e.g., CSV, JSON). Fruitful joint use of DCAT and VoID has been shown (e.g., by DataID [46]).
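The correspondence between the DCAT backbone and Schema.org can be sketched as a simple class mapping applied to rdf:type statements. This Python illustration covers only the three classes named above; the example.org IRIs and the function name are hypothetical, and a full mapping would also have to translate properties.

```python
DCAT = "http://www.w3.org/ns/dcat#"
SDO = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Class-level correspondences between the DCAT backbone and Schema.org.
DCAT_TO_SDO = {
    DCAT + "Catalog": SDO + "DataCatalog",
    DCAT + "Dataset": SDO + "Dataset",
    DCAT + "Distribution": SDO + "DataDownload",
}


def translate_types(triples):
    """Rewrite rdf:type objects using the mapping; leave other triples untouched."""
    translated = []
    for s, p, o in triples:
        if p == RDF_TYPE and o in DCAT_TO_SDO:
            o = DCAT_TO_SDO[o]
        translated.append((s, p, o))
    return translated


# Hypothetical description, invented for illustration.
description = [
    ("https://example.org/dataset/001", RDF_TYPE, DCAT + "Dataset"),
    ("https://example.org/dataset/001/csv", RDF_TYPE, DCAT + "Distribution"),
    ("https://example.org/dataset/001", DCAT + "distribution",
     "https://example.org/dataset/001/csv"),
]

for triple in translate_types(description):
    print(triple)
```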

DCAT implementations and uptake
The W3C recommendation process requires the collection of implementation experiences to show that a specification is sufficiently clear, complete, and relevant to market needs, and to ensure that independent, interoperable implementations of each feature of the specification are realized. In view of that, the editors of DCAT 2 prepared a DCAT 2 implementation report [47]. The report also shows preliminary evidence of DCAT 2 uptake. It focuses on two types of evidence: i) DCAT-based vocabularies; ii) data catalogs, data services, and datasets.
As for DCAT-based vocabularies, different profiles are based on DCAT 2 [5] or extend the original version of DCAT [3] with properties and classes included in DCAT 2, providing implementation evidence for the revisions included. Due to the large number of DCAT-based vocabularies and data catalogs supporting DCAT, this section includes only a representative subset, which nonetheless provides enough implementation evidence of the revisions proposed in DCAT 2.
In particular, DCAT-AP [4] is a profile of DCAT used across Europe since 2014 as a metadata interchange format, primarily for catalogs of government data, and, to some extent, for scientific data. As such, it has a broad geographic coverage, and it is supported in data catalogs (e.g., the European Data Portal 18) and catalog platforms (e.g., CKAN 19).
GeoDCAT-AP [42] and StatDCAT-AP [48] are domain-specific extensions of DCAT-AP for geospatial and statistical data, respectively, and they share the same geographic coverage as DCAT-AP.
CiteDCAT-AP [41] and DCAT-AP-JRC [49] are extensions of DCAT-AP specifically designed for multidisciplinary research data, and they are implemented in the corporate catalog of the European Commission's Joint Research Centre 20. Moreover, CiteDCAT-AP is supported in Zenodo 21, the research data catalog and repository most widely used in Europe.
DCAT-AP has also been used as a basis for the development of country-specific extensions (see [50]). Such extensions have not been included in this review, but they provide additional support to the implementation evidence for the revisions proposed in DCAT 2 already included in DCAT-AP.
DCAT-AP has been aligned with DCAT 2 since version 2.0, and such alignment will eventually be reflected in the DCAT-AP extensions. For example, GeoDCAT-AP 2.0 [42] (released in December 2020) is aligned with DCAT 2.
Moreover, in the context of scientific data, projects and initiatives such as EOSC-pillar [51], FAIRsFAIR [52] and ExPaNDS encourage data repository owners to publish their datasets by mapping their metadata to the DCAT standard when following the FAIR principles.
DCAT 2 is adopted in the FAIRification of Citizen Science platforms [53] and in open source platforms such as SEEK [54] to improve interoperability between digital assets on the Web and enable cross-domain markup. It is a core building block for developing REST APIs that aim at creating, storing, and serving FAIR metadata (see FAIR Data Point (FDP) [55]).
DCAT is recommended by the ExPaNDS project as part of its "Final Recommendations for FAIR Photon and Neutron Data Management" 22.

Conclusion and Future work
DCAT 2 is a metadata schema that facilitates data catalogs' interoperability on the Web. DCAT gives people and machines a specific and domain-independent approach to create catalogs that express the core elements of a dataset description in a standardized way that is suitable for publication on the Web, and enables cross-domain interoperability by being used either on its own or alongside, as a complement to, other data catalog standards. Thanks to this, DCAT facilitates effective search and retrieval and permits easy scaling up of the query process either through "frictionless" aggregation of dataset descriptions and catalog records from many different sources and domains, or by applying the same query across multiple catalogs and aggregating the results. These patterns can also be varied slightly so as to provide communities with tailored approaches to the dataset catalog that respect the specific nuances of a particular type of data.
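The two scaling patterns mentioned above, aggregating descriptions before querying versus querying each catalog and aggregating the results, can be illustrated with a small Python sketch over sets of triples. The two catalogs, their IRIs and titles, and the function dataset_titles are invented for this example.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Two hypothetical catalogs, each a set of (subject, predicate, object) triples.
catalog_a = {
    ("https://a.example.org/dataset/1", RDF_TYPE, DCAT + "Dataset"),
    ("https://a.example.org/dataset/1", DCT + "title", "Air quality 2019"),
}
catalog_b = {
    ("https://b.example.org/dataset/7", RDF_TYPE, DCAT + "Dataset"),
    ("https://b.example.org/dataset/7", DCT + "title", "River levels"),
}


def dataset_titles(graph):
    """Return the sorted titles of all resources typed as dcat:Dataset."""
    datasets = {s for s, p, o in graph if p == RDF_TYPE and o == DCAT + "Dataset"}
    return sorted(o for s, p, o in graph if s in datasets and p == DCT + "title")


# Pattern 1: aggregate the descriptions first, then query once.
merged_first = dataset_titles(catalog_a | catalog_b)

# Pattern 2: run the same query on each catalog, then aggregate the results.
queried_first = sorted(dataset_titles(catalog_a) + dataset_titles(catalog_b))

print(merged_first)
print(queried_first)
```

Both patterns return the same answer here because the catalogs share a vocabulary, which is the property that a common schema such as DCAT is meant to provide.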
DCAT 2, designed as a community effort by DXWG, adheres to design principles specifically suited to establish it as a lingua franca for exchanging data coming from different catalogs. In particular, backward compatibility with the previous version aims at preserving existing implementations; the reuse of terms from consolidated metadata vocabularies eases interoperability by promoting the adoption of cross-vocabulary modeling patterns; the minimization of ontological commitment opens it to reuse and specialization by the different domain communities; the Open-World Assumption allows DCAT to be complemented with other existing metadata vocabularies.
Version 2 builds on the initial work published in 2014 by providing, among other things, classes of descriptors that can be used for data services, and a wider set of relationships characterizing datasets and their temporal and spatial aspects. It also removes the constraints that were inherent in the prescribed use of some vocabulary terms for relationships (properties) that were present in its original version, thus making their usage pattern more flexible.
DCAT editors and DXWG support DCAT 2 adopters by addressing specific doubts and issues via the DXWG public mailing list 23 and related GitHub space 24. Further DCAT releases are planned; DXWG is discussing the inclusion of a more explicit notion of data series and versioning in DCAT. Going forward, the WG expects that the incorporation of classes to describe data services into the model will make DCAT an increasingly useful tool in data science and provide a well-trodden path for those implementing the FAIR principles to follow.

Figure 1: Overview of DCAT schema, showing the classes of resources that can be members of a Catalog, and the relationships between them. Classes and terms newly introduced by DCAT 2 are highlighted in the figure by the plus sign.

4.4.1. DCAT 2 new features in Catalog and Catalog Record
DCAT 2 clarifies the scope of DCAT catalogs. DCAT was originally conceived to model data catalogs. DCAT 2 opens to novel first-class cataloged resources, providing dcat:Resource as an extension point for community-specified cataloged resources (see Issue 172 and section 4.2). It adds dcat:DataService for representing data services and subsumes dcat:Dataset and dcat:DataService under dcat:Resource. It provides properties to deal with the new kinds of cataloged resources (see Issue 116): dcterms:hasPart, to specify a cataloged resource irrespective of its type; dcat:service, to specify a cataloged data service.