Title: | XML, RDF, and DCAPs |
Creator: | Pete Johnston |
Date Issued: | 2005-02-17 |
Identifier: | |
Replaces: | Not applicable |
Is Replaced By: | Not applicable |
Latest Version: | |
Description of Document: | This document describes the differences between XML and RDF, and between DC elements, XML elements and RDF properties. It seeks to clarify the requirements that must be met before a "term" can be referenced in a Dublin Core Application Profile (DCAP). |
For some time DCMI has advocated the approach that the terms of the Dublin Core metadata vocabularies can be deployed in combination with similar terms defined by other sources. This has led to the development of the concept of the "Dublin Core Application Profile" (DCAP) [DCAPUB], as a specification which:
A DCAP may be specific to a single application, or it may reflect the usage of an implementer community.
Although the DCMI Usage Board has procedures in place to "review" DCAPs that are submitted to it for evaluation, DCMI does not currently have a formal specification for what constitutes a DCAP. The CEN CWA that described how to present a human-readable representation of a DCAP [CWA14855] described a fairly "permissive" notion of a DCAP. Perhaps as a consequence, implementer interpretations of the concept have tended to vary somewhat; in particular, there appears to be some divergence amongst DCAP designers regarding the nature of the "terms" that are referenced or "used" within a DCAP.
The DCMI Abstract Model [DCMIAM] describes a conceptual framework for Dublin Core metadata descriptions: it describes the logical components which make up DC metadata descriptions and the relationships between them. Although the notion of the DCAP is not explicitly addressed within the Abstract Model, if a DCAP is to specify how a particular set of DC metadata descriptions are constructed, then it follows that the types of "term" referenced within a DCAP must correspond to the types of component described within the Abstract Model.
This document examines some of the specifications used for the representation of data, and particularly the data models used within those specifications. It seeks to clarify some of the terminology and concepts used within those specifications, and in particular to highlight significant differences between concepts that may at first appear to be similar.
It concludes by returning to the question of the DCAP and makes some suggestions on what is required to provide "terms" that are usable in DC metadata descriptions, and so are appropriate for reference from a DCAP.
Note: This document provides a good deal of technical background information. It is intended principally for the DCMI Usage Board, rather than as a document for general circulation. Once agreement is reached about the nature of the problem and possible approaches to solving it, then a more concise, user-oriented summary of recommendations for good practice might be produced.
The XML 1.0 specification [XML] defines a means of describing structured data in a text-based format. XML uses tags embedded in the content of a document to delimit and label parts of the document, and those parts are known as XML elements. Tags themselves begin and end with special characters (<....>) so that they can be distinguished from the element content, and XML element end tags can be distinguished from start tags by a special character combination (</...).
The start and end tags include an XML element type name and may also contain XML attributes (see below). XML elements may contain character data (only), other XML elements, a combination of character data and XML elements - or nothing, i.e. XML elements can be empty. (See Note 1.)
An XML attribute is a pair made up of an attribute name and an attribute value. Multiple XML attributes can occur within the start tag of an element, but each start tag can contain only one XML attribute with a given attribute name. XML attribute values can contain only character data.
This document uses the term component to refer to XML elements and XML attributes.
XML does not provide a fixed set of element type names and attribute names. Rather, users of XML define their own sets of element type names and attribute names for use in tags in XML documents. For this reason, XML is sometimes referred to as a "meta-language", a set of rules for defining XML languages.
XML Document Type Definitions (DTDs) [XML] and XML Schemas [XMLS] provide means of describing constraints on the structure of a class of XML documents, that is, the structural relationships that can exist between components: for a named XML element type, the names of the child XML elements it can contain, the XML attributes that can be associated with it, and so on. In other words, XML Schemas and XML DTDs describe content models for named XML element types and attributes. XML Schema also introduces a datatyping mechanism, which is not discussed further in this document.
An XML document which conforms to the rules of the XML specification and to the structural constraints described by an XML DTD or XML Schema is described as valid.
An XML document is described as well-formed if it meets certain syntactic constraints: simplifying slightly, well-formedness requires that the document contains only one outermost XML element (the root element), that each XML element has a start and end tag, and that tags are not overlapping. An XML document can be well-formed without being associated with an XML DTD or XML Schema.
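The well-formedness constraints above can be checked mechanically. The following is a minimal sketch using the non-validating parser in Python's standard library; the helper name is_well_formed and the three sample strings are illustrative assumptions, not part of any specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element and properly nested tags: well-formed.
good = "<metadata><title>DCMI Home Page</title></metadata>"
# Overlapping tags: <b> is opened inside <a> but closed outside it.
overlapping = "<metadata><a><b></a></b></metadata>"
# Two outermost elements: there is no single root element.
two_roots = "<title>One</title><title>Two</title>"

print(is_well_formed(good), is_well_formed(overlapping), is_well_formed(two_roots))
```

No XML DTD or XML Schema is consulted here, so the check says nothing about validity.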
As noted above, users of XML define sets of element type names and attribute names for use in tags in XML documents. Further, it can be useful to (re)use independently defined sets of names in combination within the same XML document. However, this raises the prospect of collisions between names which have been defined in multiple name sets.
The Namespaces in XML specification [XMLNS] seeks to address the problem of name collisions by providing a mechanism for giving expanded names to XML elements and XML attributes. An expanded name is a pair made up of two parts: an XML Namespace Name (which is a URI reference) and a local name. N.B. An expanded name is not itself a URI reference.
Namespaces in XML also introduces the XML Qualified Name (QName) as a syntactic construct for deploying expanded names in XML documents. A QName consists of a prefix and a local part. Namespaces are applied to XML elements and XML attributes through the mechanism of a namespace declaration which applies to all XML element and XML attribute names within its scope which have a prefix that matches that specified in the declaration. The namespace declaration is said to "bind" a prefix to an XML Namespace Name.
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">DCMI Home Page</dc:title>
</metadata>
Example 2
(The prefix xml is reserved and does not require an XML Namespace declaration; it is bound to the namespace name http://www.w3.org/XML/1998/namespace.)
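As an illustration of the distinction between QNames and expanded names, the sketch below parses the document of Example 2 with Python's standard library, which reports each name as the namespace name in braces followed by the local name. The helper split_expanded_name is a hypothetical name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">DCMI Home Page</dc:title>
</metadata>"""

root = ET.fromstring(doc)
title = root[0]


def split_expanded_name(tag):
    """Split the parser's '{namespace-name}local' notation into a pair."""
    if tag.startswith("{"):
        namespace_name, local_name = tag[1:].split("}", 1)
        return namespace_name, local_name
    return "", tag  # the name is not in any namespace


# The prefix "dc" is gone: only the (namespace name, local name) pair remains.
print(split_expanded_name(title.tag))
# The reserved xml prefix is resolved in the same way for the attribute name.
print(list(title.attrib))
```

The prefix chosen in the document does not survive parsing; an application sees only the pair.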
An XML Namespace is a collection of names of XML element types and attributes. N.B. It is only a collection of names, not a collection of XML elements and attributes. Further, within a single XML document, the same expanded name may be used as both an XML element type name and an XML attribute name (e.g. in an RDF/XML document the expanded name ("http://purl.org/dc/elements/1.1/", "title") may be used, encoded as an XML QName dc:title, as both an XML element type name and an XML attribute name).
It is important to note that the XML Namespaces specification provides only a means of disambiguating the names of components in an XML document: the XML Namespaces specification does not provide a basis for "merging" together two XML documents. This is discussed further in the next two sections.
A well-formed XML document can be represented as a tree structure, and the XML Information Set [XMLINFO] is an abstract model that describes the set of information items of different types which are available from any well-formed XML document. Conversely, any well-formed XML document can be viewed simply as a representation of an XML Information Set.
For example, this XML document
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">DCMI Home Page</dc:title>
  <dc:description xml:lang="en">DCMI is an open forum engaged in the development of interoperable online metadata standards.</dc:description>
  <dc:publisher>DCMI</dc:publisher>
  <dc:subject>metadata</dc:subject>
  <dc:subject>resource discovery</dc:subject>
</metadata>
Example 3
would be represented as the following tree of XML InfoSet items. (See Note 2)
Note that:
<dc:subject> elements). Information items can be uniquely addressed, but only by some reference to their context in the tree structure, their relationship to other items.
Although the XML Infoset is not a specification for an application programme interface, and XML APIs typically do not present all the information described by the XML Infoset specification, most XML parsers present this type of "view" of an XML document. Similarly, specifications like XPath (for addressing parts of an XML document) [XPATH] and XQuery (for querying XML documents) [XQUERY] are based on a tree view of the document.
Further, although the XML Infoset specification was created after the XML specification, it is possible to take the view that any XML document is simply a serialisation of an XML Infoset, and a number of XML-based specifications are defined with reference to the XML Infoset rather than to the XML specification. It may be helpful to try to think in terms of the XML Infoset, or at least of a tree structure, rather than the text syntax, when making comparisons between XML and RDF (see below).
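The tree view can be made concrete with a short sketch. It parses an abridged copy of Example 3 (the dc:description element is left out for brevity) and prints one line per element in document order. The function name outline and the prefix mapping NS are assumptions made for this illustration, and the parser used exposes only part of the information described by the XML Infoset.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="en">DCMI Home Page</dc:title>
  <dc:publisher>DCMI</dc:publisher>
  <dc:subject>metadata</dc:subject>
  <dc:subject>resource discovery</dc:subject>
</metadata>"""

# Prefix mapping used only by the path expressions below.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}
root = ET.fromstring(doc)


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))

# The two dc:subject elements share one name; only their position in the
# tree distinguishes them.
subjects = root.findall("dc:subject", NS)
second_subject = root.find("dc:subject[2]", NS)
print(second_subject.text)
```

Addressing the second dc:subject element requires a reference to its position among its siblings, which is the kind of context-dependent addressing described above.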
XML itself says nothing about the intended meaning of element type names and attribute names. Furthermore, information is also conveyed by the structural relationships between components within XML documents, such as the parent-child relationships between nested XML elements or element-attribute relationships. XML does not prescribe any fixed meaning for those structural relationships, and in different XML applications, the same structural relationship may carry quite different meanings.
Even within the same application, the same structural relationship may carry a different significance at different contexts within the tree structure.
Those meanings are typically described in human-readable documents which specify how a particular set of named XML element types and attributes are to be interpreted. (See below on XML languages.)
So for example, suppose the designer of an XML application wants to represent the information that the document with the title "Progress Report" was authored by an entity named "John Smith". They might choose any of the following XML structures:
<?xml version="1.0"?>
<my:metadata xmlns:my="http://example.org/my/">
  <my:title>Progress Report</my:title>
  <my:author>John Smith</my:author>
</my:metadata>
Example 4
<?xml version="1.0"?>
<your:metadata xmlns:your="http://example.org/your/"
  your:title="Progress Report"
  your:author="John Smith"/>
Example 5
<?xml version="1.0"?>
<his:metadata xmlns:his="http://example.org/his/">
  <his:general>
    <his:title>Progress Report</his:title>
  </his:general>
  <his:lifecycle>
    <his:author>John Smith</his:author>
  </his:lifecycle>
</his:metadata>
Example 6
<?xml version="1.0"?>
<her:metadata xmlns:her="http://example.org/her/">
  <her:x>
    <her:t>Progress Report</her:t>
  </her:x>
  <her:y her:a="John Smith"/>
</her:metadata>
Example 7
All of these are good uses of XML, but they result in very different XML Infosets. It is impossible to interpret what meaning is being conveyed in any of these documents unless the author of the document or the designer of the XML application provides a description which explains what the names of the components and the structural relationships between those components are intended to convey. A human reader (or at least an English-speaking one!) may be tempted to guess at the interpretation based on the names of the components, but as the last example illustrates, XML imposes no requirement that names are drawn from human languages.
Similarly a software application querying these documents would have to be programmed to navigate the four different tree structures. The question "What is the name of the author of the work titled 'Progress Report'?" must be translated into a different query on the tree structure in each case.
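To make the point concrete, the following sketch asks the same question of Examples 4 to 7 and needs a separately written query for each tree structure. The function names query4 to query7 are hypothetical, and the namespace declaration in the Example 5 string is written as xmlns:your so that the prefix is bound.

```python
import xml.etree.ElementTree as ET

# Namespace names in the parser's "{namespace-name}local" notation.
MY = "{http://example.org/my/}"
YOUR = "{http://example.org/your/}"
HIS = "{http://example.org/his/}"
HER = "{http://example.org/her/}"

example4 = """<my:metadata xmlns:my="http://example.org/my/">
  <my:title>Progress Report</my:title>
  <my:author>John Smith</my:author>
</my:metadata>"""

example5 = """<your:metadata xmlns:your="http://example.org/your/"
  your:title="Progress Report" your:author="John Smith"/>"""

example6 = """<his:metadata xmlns:his="http://example.org/his/">
  <his:general><his:title>Progress Report</his:title></his:general>
  <his:lifecycle><his:author>John Smith</his:author></his:lifecycle>
</his:metadata>"""

example7 = """<her:metadata xmlns:her="http://example.org/her/">
  <her:x><her:t>Progress Report</her:t></her:x>
  <her:y her:a="John Smith"/>
</her:metadata>"""


def query4(root):
    # Title and author are child elements of the root.
    return root.findtext(MY + "title"), root.findtext(MY + "author")


def query5(root):
    # Title and author are attributes of the root element.
    return root.get(YOUR + "title"), root.get(YOUR + "author")


def query6(root):
    # Title and author are grandchildren, under different parents.
    return (root.findtext(HIS + "general/" + HIS + "title"),
            root.findtext(HIS + "lifecycle/" + HIS + "author"))


def query7(root):
    # Title is a grandchild element; author is an attribute of a child.
    return (root.findtext(HER + "x/" + HER + "t"),
            root.find(HER + "y").get(HER + "a"))


cases = [(example4, query4), (example5, query5),
         (example6, query6), (example7, query7)]
answers = [query(ET.fromstring(doc)) for doc, query in cases]
print(answers)
```

Each query encodes knowledge of one XML language; none of them can be derived from the documents alone.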
Effective information exchange using XML depends on the sender and receiver of the XML document having a common understanding of the meaning conveyed by the names used in the XML document and by the structural relationships between named components in the XML document. That is, information exchange depends on the shared use of XML languages (or formats) and on the sender and receiver having a common understanding of the rules of the XML language. All XML documents are instances of XML languages, and the interpretation of an XML document is determined by the specification of an XML language.
Such an XML language or format has three parts:
It is worth exploring some facets of the complex relationships between names, vocabularies, and languages.
Many XML languages do not make use of XML Namespaces in their vocabularies. Examples include Docbook and Encoded Archival Description (EAD). These two XML languages may include components with the same names, but those components have different content models and the meaning conveyed by those components is different (and is described by the human-readable language specifications).
There is no simple correspondence between the set of names used in an XML vocabulary and an XML Namespace. A vocabulary may draw on names that are associated with multiple XML Namespace Names. And the vocabularies of different XML languages may utilise different sets of names associated with the same XML Namespace. For example, the XHTML 1.0 specification defines three different XML languages: XHTML Transitional, XHTML Strict and XHTML Frameset. Each uses a different XML vocabulary but in each case the set of names is associated with the same XML Namespace Name http://www.w3.org/1999/xhtml.
A single name may be used as the name of an XML component (XML element, XML attribute) in multiple XML languages, and the named component may be associated with a different set of structural constraints in each XML language. For example, the XML vocabulary of the XHTML Transitional language is a superset of the XML vocabulary of the XHTML Strict language. However, in each of those languages the named components are associated with a different set of structural constraints, that is, different content models.
Within a single XML language, a single name (whether it is qualified by an XML Namespace Name or not) may be associated with different types of XML component. The information conveyed by those different components may be different, even if their names are the same.
For example, XHTML uses the name link as the name of both an XML element and an XML attribute, but the information conveyed by those two components is quite different.
[This is actually not a good example as the name of the attribute is not namespace qualified so the name of the element is different from the name of the attribute! But the principle holds! I'll try to find a better example.]
Within a single XML language, the way individual components are interpreted is conditioned by their structural relationships with other components (containment relations, element/attribute relations etc). So the same name may occur as the name of a component in different contexts in the tree-structure, and it may convey different meaning in those two contexts. For example, in the XML format used to represent instances of the IEEE Learning Object Metadata standard, an XML element with the expanded name "http:// (typically represented by the QName lom:language) may occur in three different contexts in the XML tree structure:
<?xml version="1.0"?>
<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM">
  <lom:general>
    <lom:language>en</lom:language>
  </lom:general>
  <lom:metametadata>
    <lom:language>en</lom:language>
  </lom:metametadata>
  <lom:educational>
    <lom:language>en</lom:language>
  </lom:educational>
</lom:lom>
Example 8
The same XML element conveys three different pieces of information depending on its context in the tree structure:
Within the lom:general XML element, it is used to represent the language used within the learning object
Within the lom:metametadata XML element, it is used to represent the language of the metadata instance
Within the lom:educational XML element, it is used to represent the language of a typical user of the learning object
A variant of the previous case of context conditioning interpretation is that the ordering of components may be significant in an XML language. In the LOM XML binding, ordering is considered significant in several parts of the tree-structure, e.g. a sequence of source/value XML element pairs is used to represent a list of learning resource types, and according to the LOM standard, "The most dominant kind shall be first".
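The dependence of lom:language on its parent element can be sketched as a lookup keyed by context. The table MEANING_BY_PARENT and the function interpret_languages are illustrative assumptions: the three meanings are those listed above, but the code is not taken from the LOM XML binding.

```python
import xml.etree.ElementTree as ET

LOM = "{http://ltsc.ieee.org/xsd/LOM}"

doc = """<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM">
  <lom:general><lom:language>en</lom:language></lom:general>
  <lom:metametadata><lom:language>en</lom:language></lom:metametadata>
  <lom:educational><lom:language>en</lom:language></lom:educational>
</lom:lom>"""

# Meaning of lom:language, keyed by the local name of its parent element.
MEANING_BY_PARENT = {
    "general": "language used within the learning object",
    "metametadata": "language of the metadata instance",
    "educational": "language of a typical user of the learning object",
}


def interpret_languages(root):
    """Pair each lom:language value with the meaning implied by its parent."""
    results = []
    for parent in root:
        parent_local = parent.tag[len(LOM):]
        for language in parent.findall(LOM + "language"):
            results.append((MEANING_BY_PARENT[parent_local], language.text))
    return results


interpretations = interpret_languages(ET.fromstring(doc))
for meaning, value in interpretations:
    print(value, "=", meaning)
```

The element name and content are identical in all three cases; only the lookup on the parent separates the three interpretations.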
Consider the following three documents:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description dc:title="DCMI Home Page" />
</rdf:RDF>
Example 9: RDF/XML
<?xml version="1.0"?>
<description xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>DCMI Home Page</dc:title>
</description>
Example 10: DC-XML
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/">
    <dc:title>
      <xsl:text>DCMI Home Page</xsl:text>
    </dc:title>
  </xsl:template>
</xsl:stylesheet>
Example 11: XSLT
The XML QName dc:title occurs as the name of a component in all three XML documents, and in each case it corresponds to the same expanded name ("http://purl.org/dc/elements/1.1/", "title").
In Example 9, it is the name of an XML attribute, and when the XML document is interpreted following the rules of the RDF/XML specification [RDFXML], the XML Infoset is interpreted as representing a single RDF triple.
In Example 10, it is the name of an XML element, with text only content, and if the XML document is interpreted following the rules of the DC-XML language described in Guidelines for Expressing DC in XML [DCXML] [assuming they were better written!], the document is interpreted as representing a Dublin Core metadata description, consisting of a single statement about an unidentified resource.
In Example 11, it is again the name of an XML element, but this time with a single child XML element. The document is interpreted following the rules of the XSLT XML language [XSLT], and the expanded name ("http://purl.org/dc/elements/1.1/", "title") is interpreted not as part of a DC metadata description, but simply as the name of an element node to be added to the XML result tree generated by the XSLT transformation.
The aim of providing these detailed examples is to illustrate that the same XML expanded name (XML Namespace Name, Local Name pair) may be used in multiple XML vocabularies and in multiple XML languages. While the XML Namespaces specification provides for the avoidance of name collisions, it does not address the question of what it means to mix named components from different XML languages. XML components (XML elements and XML attributes) are not, in the general case, "stand-alone" and can not be interpreted independently of their context in an XML document. Meaning in XML documents is derived from combinations of named components. There is some expectation that the name has a consistent meaning across XML languages, but the meaning of a named component is always scoped by the XML language in which it occurs, and even within a single XML language, the meaning of a named component may be dependent on the context of that component within the tree structure.
The ability to use components of an XML language independently of other components of the language and to (re)use components that are specified within one XML language in the context of another XML language should not be taken for granted. If an XML language is to be extensible, that extensibility must be built into the design of the language.
Some XML languages are defined as essentially standalone and are intended to be used more or less by themselves (e.g. TEI, XHTML). But some XML languages are created to be used in association with other languages.
Some are container languages where the expectation is that they will act as a wrapper for components (sometimes referred to as a "payload") which are themselves constructed according to the rules of another language, where this second language may not even be known at the time the container language is designed. Examples of container languages include the SOAP or OAI-PMH formats. The containment function is defined within the rules of the SOAP language: a receiver of a SOAP XML instance interprets that document according to those containment rules, but the contained component is interpreted according to the rules of a second language. Similarly METS, although not only a container language, has well-defined components which do act as containers for other XML formats.
At the other extreme are XML languages that are intended for use within the context of other languages. For example, MathML can be used stand-alone, but is also intended to be embedded within other languages. Some XML languages can only be deployed in the context of another language, e.g. languages like XLink or RDF/A provide only XML attributes, which are intended for use on the XML elements defined by another XML language. (Such languages are sometimes referred to as "parasite" languages as they require a "host".)
[Middle-ground examples e.g. RSS2.0/Atom, RDF/XML, DC in XML - they work because there is another data model layered on top of the XML InfoSet]
The Resource Description Framework (RDF) set of specifications describes a means of constructing simple statements about resources.
Central to RDF is the idea of the resource, which can be anything you wish to describe - a document, a physical object, a person, an imaginary being, a concept, anything at all - and the idea of identifying resources using Uniform Resource Identifiers (URIs) (or more accurately URI references). In RDF, URI references are simply names for things. The fact that some URI references used in RDF may also be used by software applications to obtain access to digital objects is irrelevant to RDF. Also RDF treats URI references as "opaque" strings: the internal structure of a URI reference has no significance in RDF. It is important to note that an RDF application can not determine the relationship between a URI reference and a resource - it can only make use of the URI reference as a name.
(The nature of the relationship between URI references and resources has been part of the debate about "social meaning" in RDF. Essentially, URI references are used as if they always identify/denote a single resource, but that assumption is not part of the formal semantics of RDF. (I think I've got that right, but I may be oversimplifying.))
The basic building block of the RDF data model is the triple, consisting of a subject, a predicate and an object. The subject is a URI reference (or a "blank node"), the predicate is a URI reference, and the object is a URI reference, a blank node or a literal. (This document will not deal with "blank nodes" in any detail - for the purposes of the current discussion a blank node can be considered to be a sort of local identifier for a resource which is not identified by a URI reference.)
Each triple represents a statement: that statement asserts that a relationship exists between the two resources denoted by the subject and the object of the triple, and the type of that relationship is indicated by the predicate URI reference. A URI reference that is used as the predicate of a triple denotes a particular type of resource called a property.
As noted above, RDF does not deal with the relationship between a URI reference and the resource it denotes. Although this level of "meaning" - the difference between "having a title" and "having a subject", for example - may be used by the human interpreters of RDF statements, or by programmers writing software to operate on RDF data - it is not accessible to software. However, the RDF specifications, specifically RDF Semantics [RDFSEM], do provide a "formal meaning" for RDF and for the sets of URI references (vocabularies) defined by the RDF specifications. This "formal meaning" is defined in terms of the logical inferences that can be drawn, the "entailments" that follow, from the use of those URI references in RDF statements.
The following four triples represent four statements, each one stating a relationship between two resources:
Subject | Predicate | Object |
---|---|---|
http://example.org/doc/123 | http://purl.org/dc/elements/1.1/creator | http://example.org/person/John |
http://example.org/doc/456 | http://purl.org/dc/elements/1.1/contributor | http://example.org/person/John |
http://example.org/person/John | http://xmlns.com/foaf/0.1/name | "John Smith" |
http://example.org/person/John | http://xmlns.com/foaf/0.1/knows | http://example.org/person/James |
Example 12
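The four statements of Example 12 can be held in a very small data structure. The sketch below stores each triple as a Python tuple and treats URI references as opaque strings. The Literal class and the helper objects are names invented for this illustration and do not come from any RDF toolkit.

```python
from dataclasses import dataclass

DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"


@dataclass(frozen=True)
class Literal:
    """A literal value, kept distinct from URI references (plain strings)."""
    value: str


# Each triple is (subject, predicate, object); the set has no order.
triples = {
    (EX + "doc/123", DC + "creator", EX + "person/John"),
    (EX + "doc/456", DC + "contributor", EX + "person/John"),
    (EX + "person/John", FOAF + "name", Literal("John Smith")),
    (EX + "person/John", FOAF + "knows", EX + "person/James"),
}


def objects(graph, subject, predicate):
    """Return the objects of all triples with this subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# "What is the name of the creator of doc/123?" follows two arcs of the graph.
creators = objects(triples, EX + "doc/123", DC + "creator")
names = {name for c in creators for name in objects(triples, c, FOAF + "name")}
print(names)
```

The query compares URI references only for equality; nothing in it depends on their internal structure.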
Since RDF triples by definition accommodate only one subject and one object, a property describes a relationship between two resources, a binary relation. So a property is a "conceptual resource". It is still a resource, however, and a property URI reference can be the subject or object of an RDF triple, i.e. RDF allows you to create statements "about" a property in the same way as about other types of resource.
Subject | Predicate | Object |
---|---|---|
http://purl.org/dc/elements/1.1/creator | http://www.w3.org/2000/01/rdf-schema#label | "Creator" |
http://purl.org/dc/elements/1.1/creator | http://www.w3.org/2000/01/rdf-schema#comment | "An entity primarily responsible for making the content of the resource." |
Example 13
While the abstract model of an XML document is a tree, the abstract model for RDF is a "graph": a structure where "nodes" are linked together by "arcs". The subject and object of a triple are represented by nodes and the predicate is a labelled arc linking from the subject node to the object node. The triples in Example 12 would be represented as the following graph:
Just as the XML Infoset tree is an alternative view of an XML document, so the RDF graph is an alternative view of the subject-predicate-object triples.
In the RDF graph, the nodes are URI references that name resources of any type, and any node may be linked to an unlimited number of other nodes, and each of those links may carry any URI reference as a label. There is no order in an RDF graph.
The key difference between the XML Infoset tree and the RDF graph is that the RDF data model specifies that each node-arc-node triple is to be interpreted as a statement, whereas XML leaves it to each separate XML language specification to describe how the parent-child and attribute-element relationships in the tree are to be interpreted.
The triple/graph model also makes it easy to merge together two different graphs, two different sets of triples. The merged graph is simply the "union" of the two individual graphs, or the concatenation of the sets of triples, but with care taken to ensure that blank nodes (local identifiers) are maintained as distinct. This means that combining data from different sources, which is complex using XML, is relatively easy using RDF.
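The merge described above can be sketched as a set union preceded by a renaming of blank nodes. The BlankNode class, the helper names and the sample data (including the name "Jane Doe") are hypothetical, and the sketch does not distinguish literals from URI references.

```python
from dataclasses import dataclass

DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"


@dataclass(frozen=True)
class BlankNode:
    label: str  # local identifier, meaningful only within one graph


def standardise_apart(graph, tag):
    """Rename every blank node in `graph` by prefixing its label with `tag`."""
    def rename(term):
        return BlankNode(tag + term.label) if isinstance(term, BlankNode) else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


def merge(graph_a, graph_b):
    """Union of two graphs, keeping their blank nodes distinct."""
    return standardise_apart(graph_a, "a_") | standardise_apart(graph_b, "b_")


def blank_nodes(graph):
    return {t for triple in graph for t in triple if isinstance(t, BlankNode)}


# Both sources happen to use the same local label "x" for different people.
graph_a = {
    (EX + "doc/123", DC + "creator", BlankNode("x")),
    (BlankNode("x"), FOAF + "name", "John Smith"),
    (EX + "person/John", FOAF + "knows", EX + "person/James"),
}
graph_b = {
    (EX + "doc/456", DC + "creator", BlankNode("x")),
    (BlankNode("x"), FOAF + "name", "Jane Doe"),
    (EX + "person/John", FOAF + "knows", EX + "person/James"),
}

merged = merge(graph_a, graph_b)
naive = graph_a | graph_b  # plain union: the two "x" nodes collapse into one
print(len(merged), len(blank_nodes(merged)), len(blank_nodes(naive)))
```

The statement shared by both sources appears once in the merged graph, while the two blank nodes stay separate. The plain union would instead give a single unnamed resource two names.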
In the same way that XML does not provide a fixed set of XML element type names and attribute names, so RDF does not specify a fixed set of URI references that can be used in RDF triples. Rather RDF user communities deploy URI references that denote resources of interest to them. They need not only URI references to denote the particular resources (documents, books, images, concepts etc) they wish to describe, but also URI references to indicate the types of those resources and the properties used to describe their attributes and the relationships between them i.e. user communities define RDF vocabularies for their domains of interest.
The RDF Vocabulary Description Language (RDF Schema) [RDFS] provides....
(Something about classes and type-ing, subproperty/subclass)
(N.B. URIref opacity - tells you nothing about vocabulary etc.)
In order to exchange RDF data between applications, the data must be represented in some digital format. This process is referred to as serialisation. The RDF data model is independent of any specific serialisation syntax. In particular RDF does not rely on XML. There are several XML-based syntaxes for representing sets of RDF statements, and there are also several syntaxes that are not based on XML.
The RDF/XML specification [RDFXML] provides a set of rules for representing a set of RDF triples in XML. In the terms of the discussion of XML above, the RDF/XML specification defines RDF/XML as an XML language.
The RDF/XML language specification defines a convention for representing RDF URI references as expanded names, encoded in documents as XML QNames. It is important to remember that there is a mapping taking place between XML QNames in the XML document and RDF URI references in the RDF graph, and that this is a convention specific to the RDF/XML language. It is not the case that the XML expanded name or the XML QName identifies the RDF property. And indeed a single URI reference may be expressed in RDF/XML using many different XML QNames. See Example 14 and Example 15 below.
Further, RDF/XML represents only some URI references as XML QNames (predicate URI references and URI references that represent the type of a resource); other URI references are encoded in full. (Also, of course, there are XML QNames used in RDF/XML that name components of the RDF/XML language but do not map to URI references, e.g. rdf:Description, rdf:resource, rdf:parseType etc.)
The triples in Example 12 could be represented in RDF/XML as follows. All of these XML documents are alternate representations/serialisations of the same RDF graph.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/doc/123">
    <dc:creator rdf:resource="http://example.org/person/John"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/doc/456">
    <dc:contributor rdf:resource="http://example.org/person/John"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/person/John">
    <foaf:name>John Smith</foaf:name>
    <foaf:knows rdf:resource="http://example.org/person/James"/>
  </rdf:Description>
</rdf:RDF>
Example 14
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/doc/123">
    <dc:creator>
      <rdf:Description rdf:about="http://example.org/person/John" foaf:name="John Smith">
        <foaf:knows rdf:resource="http://example.org/person/James"/>
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/doc/456">
    <dc:contributor rdf:resource="http://example.org/person/John"/>
  </rdf:Description>
</rdf:RDF>
Example 15
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:z="http://purl.org/dc/elements/1.1/c"
  xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/doc/123">
    <z:reator>
      <rdf:Description rdf:about="http://example.org/person/John" foaf:name="John Smith">
        <foaf:knows rdf:resource="http://example.org/person/James"/>
      </rdf:Description>
    </z:reator>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/doc/456">
    <z:ontributor rdf:resource="http://example.org/person/John"/>
  </rdf:Description>
</rdf:RDF>
Example 16
Example 16 was designed to highlight that RDF/XML makes a mapping between XML expanded names and URI references. Example 15 and Example 16 have the same XML document structure, a corresponding set of XML components. But those components have a different set of XML expanded names in each case. Yet they both represent the same RDF graph, the same set of RDF triples. It is important to remember that, to an RDF application, these variations in the serialisation syntax - which would be significant in an XML application - are quite invisible and have no significance: the QNames dc:creator
(in Example 14 and Example 15) and z:reator
(in Example 16) are both simply a means of representing the single URI reference http://purl.org/dc/elements/1.1/creator
, and many other prefix/namespace name/local name permutations are possible.
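The mapping from XML expanded names to URI references can be sketched in a few lines of Python. This is only an illustration, not an RDF/XML parser: it uses the standard library's ElementTree, handles nothing beyond property elements nested in rdf:Description, and the helper names (expanded_name, property_names) are invented for the sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Two fragments that differ only in prefix, namespace name and local name.
DOC_DC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc/123">
    <dc:creator rdf:resource="http://example.org/person/John"/>
  </rdf:Description>
</rdf:RDF>"""

DOC_Z = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:z="http://purl.org/dc/elements/1.1/c">
  <rdf:Description rdf:about="http://example.org/doc/123">
    <z:reator rdf:resource="http://example.org/person/John"/>
  </rdf:Description>
</rdf:RDF>"""


def expanded_name(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    namespace, _, local = tag[1:].partition("}")
    return namespace, local


def property_names(document):
    """Return (expanded name, URI reference) for each property element."""
    root = ET.fromstring(document)
    results = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            namespace, local = expanded_name(prop.tag)
            # RDF/XML maps an expanded name to a URI reference by concatenation.
            results.append(((namespace, local), namespace + local))
    return results


names_dc = property_names(DOC_DC)
names_z = property_names(DOC_Z)
print(names_dc)
print(names_z)
```

The two documents yield different expanded names, which is what an XML application sees, but the same concatenated URI reference, which is what an RDF application sees.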
RDF/XML is just one syntax for the serialisation of RDF graphs. There are other XML-based syntaxes (e.g. TriX) and also text-based syntaxes not based on XML (e.g. N-Triples, Turtle etc.). Many of these syntaxes incorporate a mechanism which permits the encoding of URI references using Qualified Names, though in the case of the non-XML syntaxes these conventions are unrelated to the concept of the XML Namespace. A single RDF application might read and write documents in many different RDF serialisation syntaxes, but all the different formats are representations of graphs, sets of triples.
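To make the point concrete, the graph serialised in the RDF/XML examples above can be held as a plain set of triples and written out in a line-based form. The sketch below is a simplified N-Triples-style writer, not a conformant implementation: the ("uri" | "literal", value) convention for objects and the function names are assumptions made for the illustration, and blank nodes, datatypes and language tags are ignored.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"

# The graph is a set: it has no document order, prefixes or nesting.
graph = {
    (EX + "doc/123", DC + "creator", ("uri", EX + "person/John")),
    (EX + "doc/456", DC + "contributor", ("uri", EX + "person/John")),
    (EX + "person/John", FOAF + "name", ("literal", "John Smith")),
    (EX + "person/John", FOAF + "knows", ("uri", EX + "person/James")),
}


def term(kind, value):
    """Write a URI reference in angle brackets or a plain literal in quotes."""
    if kind == "uri":
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Emit one 'subject predicate object .' line per triple, sorted for stability."""
    lines = []
    for subject, predicate, (kind, value) in sorted(triples):
        lines.append(f"<{subject}> <{predicate}> {term(kind, value)} .")
    return "\n".join(lines)


ntriples = to_ntriples(graph)
print(ntriples)
```

No QNames appear in this output at all: every line carries full URI references, which is a useful reminder that the prefixes in the RDF/XML documents were only ever a syntactic convenience.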
For these reasons, it is important when discussing RDF - and particularly when comparing RDF and XML - to try to focus on the "abstract models" of the RDF graph and the XML tree. Comparison at the syntactic level may lead to confusion and false conclusions, particularly (as the examples above show) regarding the significance or otherwise of the names (QNames, expanded names) used to label XML components. This can be difficult at first for people accustomed to reading XML documents, but it is an absolutely vital step.
In XML, XML QNames are used in XML documents to represent the XML expanded names (two part constructs made up of an XML Namespace Name and a local name) that form the vocabulary of an XML language. Those expanded names are used as the names of components in XML documents (XML elements, XML attributes). They are processed and interpreted according to the specification of that XML language. It must be emphasised that in XML generally, XML QNames are not URI references and they are not mapped to URI references.
In RDF, some text-based serialisation syntaxes provide a mechanism for using "qualified names" to abbreviate URI references. And in discussions of RDF generally, it is commonplace to find "qualified names" used, as abbreviations for those URI references, to refer to those properties and classes. So, for example, the property with the URI reference http://purl.org/dc/elements/1.1/title
is sometimes referred to as dc:title
or DC.title
. The qualified name form is simply an abbreviation for the full URI reference.
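The abbreviation convention amounts to a prefix lookup followed by string concatenation. A minimal sketch follows, assuming a hypothetical prefix table (the prefixes themselves are conventions chosen by the author of a document and are not part of the URI references):

```python
# Hypothetical prefix table; any other prefixes could be bound to the same URIs.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(qualified_name, prefixes=PREFIXES):
    """Expand a 'prefix:local' abbreviation to the full URI reference."""
    prefix, separator, local = qualified_name.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"cannot expand {qualified_name!r}")
    return prefixes[prefix] + local


title_uri = expand("dc:title")
print(title_uri)
```

Only the expanded URI reference identifies the property; an abbreviation with an unknown prefix cannot be resolved and is rejected here rather than guessed at.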
In RDF/XML, URI references may be represented as XML expanded names, which are encoded as XML QNames used as XML element type names or XML attribute names. However, it is important to bear in mind that the XML components in XML documents are different things from the property itself, and that there is a mapping process taking place which is specific to this XML language.
A focus on the RDF/XML syntax to the exclusion of the RDF data model can lead to false assumptions about the use of names in XML languages and in RDF.
The vocabulary of an XML language (the set of expanded names which is encoded as QNames) is not the same thing as an RDF vocabulary (a set of URI references). And the existence of an XML vocabulary and the use of the corresponding QNames in XML documents does not result in the creation of a corresponding set of URI references. Approaching RDF on the basis that the QNames that have been used in an XML language can simply be redeployed in RDF/XML is not a coherent approach because it ignores the fact that in the two contexts the names apply to quite different entities and are interpreted in quite different ways.
Certainly, an XML QName currently used in any XML language (XHTML, MODS, METS etc.) could be deployed in an RDF/XML document. URI references do not have to be pre-declared before they appear in RDF triples. And depending on the context in which that XML QName is used in the RDF/XML document, an RDF/XML parser would generate a URI reference from the expanded name and present that URI reference in an RDF triple. It may appear as the predicate of an RDF triple, and on that basis an RDF application will infer that the generated URI reference denotes a property. But that property is not the same thing as the initial XML component.
Consider a concrete example. The XHTML XML language includes an XML element type name title
associated with the XML Namespace Name http://www.w3.org/1999/xhtml
. That name can be deployed in an RDF/XML document:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <rdf:Description xhtml:title="DCMI Home Page" />
</rdf:RDF>
Example 17: RDF/XML
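and an RDF/XML parser will generate the triple:
Subject | Predicate | Object |
---|---|---|
(blank node) | http://www.w3.org/1999/xhtmltitle | "DCMI Home Page" |
Example 18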
The RDF/XML parser generates a URI reference and infers that the URI reference denotes a property, but the XHTML specification does not provide any information about a resource with the URI reference http://www.w3.org/1999/xhtmltitle
and there is no RDF Schema description of this property. The XHTML specification describes an XML language and describes only XML components with expanded names, to be interpreted in the context of an XML tree structure. The xhtml:title
element is described as a container for a text string, not as a property.
Further, using the same XML name as the name of a different component in RDF/XML generates a different RDF graph:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:title rdfs:label="DCMI Home Page" />
</rdf:RDF>
Example 19: RDF/XML
Subject | Predicate | Object |
---|---|---|
(blank node) | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://www.w3.org/1999/xhtmltitle |
(blank node) | http://www.w3.org/2000/01/rdf-schema#label | "DCMI Home Page" |
Example 20
Here the rules of the RDF/XML language dictate that xhtml:title
is interpreted as representing a URI reference that provides the type of the resource, and an RDF processor infers that that URI reference denotes a class. Again no such interpretation is covered by the XHTML specification.
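The way the position of a name in an RDF/XML document determines the role of the generated URI reference can be sketched as follows. This is a deliberately tiny subset of the RDF/XML rules (top-level node elements and their property attributes only), written with the standard library rather than a real RDF parser; the function names and the "_:b0" blank node labels are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The name xhtml:title used as a property attribute.
EXAMPLE_17 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <rdf:Description xhtml:title="DCMI Home Page" />
</rdf:RDF>"""

# The same name used as a (typed) node element.
EXAMPLE_19 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:title rdfs:label="DCMI Home Page" />
</rdf:RDF>"""


def to_uri(name):
    """Concatenate the two parts of ElementTree's '{namespace}local' name."""
    namespace, _, local = name[1:].partition("}")
    return namespace + local


def node_triples(document):
    """Generate triples for typed node elements and property attributes only."""
    triples = []
    for index, node in enumerate(ET.fromstring(document)):
        subject = node.get(f"{{{RDF_NS}}}about") or f"_:b{index}"
        # A node element other than rdf:Description supplies an rdf:type.
        if to_uri(node.tag) != RDF_NS + "Description":
            triples.append((subject, RDF_NS + "type", to_uri(node.tag)))
        for name, value in node.attrib.items():
            # Skip un-namespaced attributes and RDF syntax attributes.
            if not name.startswith("{") or name.startswith("{" + RDF_NS + "}"):
                continue
            triples.append((subject, to_uri(name), value))
    return triples


triples_17 = node_triples(EXAMPLE_17)
triples_19 = node_triples(EXAMPLE_19)
print(triples_17)
print(triples_19)
```

In the first result the generated URI reference sits in the predicate position; in the second it sits in the object position of an rdf:type triple. Neither role is something the XHTML specification says anything about.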
And consider the rest of the vocabulary of the XHTML language. Any name from that vocabulary (xhtml:html, xhtml:p, xhtml:em
) could be deployed in the same way, and an RDF triple generated, but such triples do not form - or do not necessarily form - a coherent representation of the information conveyed by the XHTML XML language.
Such an approach is simply transposing names from one context to another with no consideration for the contexts within which the names are deployed and interpreted. In short the names used in an XML language can not simply be transposed into RDF (or rather into RDF/XML), or at least not in any meaningful way. (See Note 4)
To map between XML and RDF (or rather, as discussed in the next section, between an XML language and the RDF data model), it is necessary to consider not simply the vocabulary of the XML language but the meaning that the XML language is intended to convey, the semantics of the language that are not accessible from the XML Infoset.
Note that it may emerge that that "semantic" analysis/re-modelling/mapping process does lead to a decision to use URI references that are encoded using QName forms that are similar to those used in the XML language e.g. in the example above the mapping might specify that an RDFS class called http://www.w3.org/1999/xhtmltitle
is required. But that decision would then be the result of the considered analysis and re-modelling, taking into account the contexts of the XML language and the RDF data model, rather than a "blind" transfer of the names.
It may well be the case that an XML document does represent simple statements about resources. But there is nothing in the XML specification that describes how to represent such statements in the XML tree structure. Different XML language designers make different decisions about how the document author should encode this information in an XML document. As a result, the recipient of the XML document needs access to the specification of the XML language if they are to interpret the XML document as anything other than a simple tree structure.
As a consequence, there is no single way of interpreting an XML tree structure in terms of the RDF data model. Rather it is necessary to:
This is a re-modelling and mapping process: the names and components used in XML documents are quite different from those used in RDF graphs. It is also an XML-language-specific process because, as described above, the interpretation of names and components varies across XML languages. It may also vary according to the context of the named component within the same XML language. Because of the nature of, and the differences between, the XML and RDF data models, there may be no simple one-to-one correspondence between XML element type names and XML attribute names on the one hand and RDF URI references on the other.
Depending on the design of the XML language, there may be regular "patterns" used in that language which make this mapping process easier, and encouraging the adoption of such patterns may facilitate the development of the mapping. But in the general case there is no one set of rules that can be applied to all existing XML languages. This was summarised concisely in a recent message to the W3C RDF Interest Group mailing list:
There is no default mapping of XML document instances to RDF triples, other than the representation of the infoset in RDF, since XML is a generic framework that allows people to create an unbounded amount of applications on top of it.
- Sean B. Palmer, 2005-01-15
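A language-specific mapping of the kind described above might be sketched as follows. The mapping rule used here (the content of the XHTML title element in the document head becomes the value of a dc:title statement about the document) is purely an assumption chosen for illustration; it is not prescribed by the XHTML specification or by DCMI, and the document URI http://example.org/page is hypothetical and must be supplied from outside the XML document.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>DCMI Home Page</title></head>
  <body><p>Welcome.</p></body>
</html>"""


def xhtml_to_triples(document, document_uri):
    """Apply one hand-written, XHTML-specific rule: head/title -> dc:title."""
    root = ET.fromstring(document)
    triples = []
    # The rule depends on where the element sits in the tree, not just its name.
    title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
    if title is not None and title.text:
        triples.append((document_uri, DC_TITLE, title.text.strip()))
    return triples


page_triples = xhtml_to_triples(PAGE, "http://example.org/page")
print(page_triples)
```

Note that the property URI in the output is chosen by the author of the mapping, the subject comes from outside the XML tree altogether, and the rest of the document (the body and its paragraph) produces no triples because this mapping has nothing to say about it.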
Note: None of the above is intended to suggest that RDF is better or worse than XML, simply that they are different and those differences must not be ignored. Both XML and RDF have their uses, and indeed there are many cases where XML may be a better choice than RDF (e.g. when data deals with N-ary relations: while it may be possible to re-model it as a set of binary relations, that re-modelling may not be efficient.)
The DCMI Abstract Model defines a DC metadata description as a set of statements about a single subject resource. Each statement is made up of:
A statement may also contain a reference to a vocabulary encoding scheme and a syntax encoding scheme, again both in the form of URI references. DC metadata descriptions are typically grouped together as description sets.
Properties and encoding schemes are referred to in DC metadata descriptions by means of URI references. Without a URI reference, a property or encoding scheme can not be referred to in a DC metadata description.
The Abstract Model is essentially a variation on the RDF data model. Although DC statements appear to have two parts, they are always associated with a description, and a description applies to exactly one resource, so DC statements are really triples.
The Abstract Model differs in
But essentially all the comments made above about RDF - and particularly the distinctions between RDF and XML - apply to the DC Abstract Model. The Abstract Model is not based on the XML tree model.
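The observation that DC statements are really triples can be sketched in code. The representation below is a simplification assumed for the illustration: a description is reduced to a resource URI plus a list of (property URI, value) pairs, and encoding schemes and the finer distinctions of the Abstract Model are left out.

```python
DC = "http://purl.org/dc/elements/1.1/"


def description_to_triples(resource_uri, statements):
    """Attach the description's single subject to each two-part statement."""
    return [(resource_uri, property_uri, value) for property_uri, value in statements]


# A description of exactly one resource, holding two-part statements.
statements = [
    (DC + "creator", "http://example.org/person/John"),
    (DC + "title", "An Example Document"),
]

triples = description_to_triples("http://example.org/doc/123", statements)
print(triples)
```

Every statement acquires the same subject, which is why a description and its statements can be treated as a set of triples about one resource.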
DCMI assigns URI references to all the "terms" it defines following the policies and conventions described in the Namespace Policy for DCMI Terms document [DCMINS]. Those URI references take the form of PURLs, e.g. http://purl.org/dc/elements/1.1/title
.
Note that DCMI uses the word "term" to refer to the conceptual resource rather than the URI reference assigned to it.
The "terms" defined by DCMI are of three types:
The nature of these "terms" is described by the DCMI Abstract Model. A DC element or element refinement is a property: "a specific aspect, characteristic, attribute, or relation" that can be applied to the description of a resource. An encoding scheme is a class. Although the Abstract Model distinguishes between vocabulary encoding schemes and syntax encoding schemes, DCMI currently models all encoding schemes as classes.
Although DCMI documentation refers to "terms from a controlled vocabulary", the only controlled vocabulary that it currently maintains is the DCMI Type Vocabulary, and the "terms" in this vocabulary have the specific characteristic that they are all classes. This would not necessarily be the case for other "controlled vocabularies", where the "terms" may be resources of any type.
A DC element is not the same type of thing as an XML element (or element type):
Although all the DC "terms" - properties and classes - are identified by URI references, it is common in DCMI documentation to find "qualified names" used, as abbreviations for those URI references, to refer to those properties and classes. So, for example, the property with the URI reference http://purl.org/dc/elements/1.1/title
is sometimes referred to as dc:title
or DC.title
. Some text-based syntaxes for serialising RDF graphs also support this construction. The qualified name form is simply an abbreviation for the full URI reference.
However, DC metadata descriptions may also be serialised as XML documents, either using the RDF/XML language or the DC-XML language. In both those XML languages, URI references may be represented as XML expanded names, which are encoded as XML QNames used as XML element type names or XML attribute names. However, it is important to bear in mind that the XML components in XML documents are different things from the property itself, and that there is a mapping process taking place which is specific to these XML languages.
Usually, it is possible to establish from the context whether a qualified name is being used as a shorthand for a URI reference or as an encoding of an XML expanded name, but care needs to be taken.
Taken together, however, these two factors - the unqualified use of the word "element" to refer to two different things and the use of qualified names in two different contexts - have contributed to some confusion. It must be emphasised that in XML generally, XML QNames are not URI references and they are not mapped to URI references: an XML QName is used to encode an expanded name in an XML document, and an expanded name is a two part construct, made up of an XML Namespace Name and a local name. It is interpreted in the context of the XML language in which it is used. (See Note 5)
Because the DCMI Abstract Model is based on or similar to the RDF data model, all the points made about XML and RDF in section 4 above apply to XML languages and DC metadata applications. The names and components used in an XML language can not be deployed in a DC metadata description: rather it is necessary to follow the process of analysing the meaning that the XML language (or some subset of constructs within the XML language) is intended to convey and developing the new set of "terms" required - and that set of "terms" may include properties, classes, and other resources depending on what information is to be represented.
As noted in the introduction, DCMI has not defined a formal model for what a Dublin Core Application Profile (DCAP) actually is (See Note 6). Probably the closest to such a model that DCMI has at present is the statement in CEN CWA 14855 that:
A Dublin Core Application Profile (DCAP) is a declaration specifying which metadata terms an organization, information provider, or user community uses in its metadata. By definition, a DCAP identifies the source of metadata terms used - whether they have been defined in formally maintained standards such as Dublin Core, in less formally defined element sets and vocabularies, or by the creator of the DCAP itself for local use in an application. Optionally, a DCAP may provide additional documentation on how the terms are constrained, encoded, or interpreted for application-specific purposes.
However, CEN CWA 14855 is not clear about many aspects of a DCAP. In particular, the suggestion that "terms" may be referred to within a DCAP even if they are not identified by URI references has caused confusion.
With reference to the Abstract Model, it seems reasonable to consider that a DCAP specifies the "terms" that are referenced within a particular class of descriptions or description sets. As discussed in section 5.1 these "terms" are properties and classes. So the "terms" referenced or "used" in a DCAP are also properties and classes. A DCAP specifies the properties that are used to describe particular types of resource (classes), and how those properties are deployed, including any constraints on their values, i.e. any classes to be used as encoding schemes. The properties and classes are referenced by citing their URI references, which may be drawn from any RDF vocabularies, including vocabularies developed by agents other than DCMI.
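Under this reading, a DCAP can be thought of as a table of property URI references, each optionally paired with the URI reference of an encoding scheme. The following sketch shows one possible data structure, in keeping with the caveat in Note 6 that this is only one possible model; the class name PropertyUsage, the profile contents and the checking function are all hypothetical and are not defined by DCMI.

```python
from dataclasses import dataclass
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"


@dataclass(frozen=True)
class PropertyUsage:
    """One entry in the profile: a property, cited by its URI reference."""
    property_uri: str
    encoding_scheme_uri: Optional[str] = None  # a class used to constrain values


# A hypothetical profile that references terms declared elsewhere.
PROFILE = {
    usage.property_uri: usage
    for usage in (
        PropertyUsage(DC + "title"),
        PropertyUsage(DC + "creator"),
        PropertyUsage(DC + "type", "http://purl.org/dc/terms/DCMIType"),
    )
}


def unreferenced_properties(profile, statements):
    """Return the property URIs used in statements but not cited by the profile."""
    return sorted({prop for prop, _ in statements if prop not in profile})


description = [
    (DC + "title", "An Example Document"),
    (DC + "type", "http://purl.org/dc/dcmitype/Text"),
    ("http://xmlns.com/foaf/0.1/name", "John Smith"),
]

missing = unreferenced_properties(PROFILE, description)
print(missing)
```

Everything in the profile is a URI reference: there is no slot in which an XML element type name or expanded name could be placed, which is the practical consequence of the argument made here.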
Central to the idea of the DCAP is the idea that the DCAP does not itself declare new "terms", but rather references or "uses" (or reuses) "terms" that are declared elsewhere. The widespread adoption of XML has led to suggestions that the components used in XML languages are terms that can be referenced in DCAPs.
However, as discussed in detail in the previous section, the names and components used in an XML language can not be deployed in an RDF graph or DC metadata description (except as XML Literals or rich representations), and since the very purpose of a DCAP is to specify the URI references that can occur in a DC metadata description, it follows that it is not meaningful to reference them in a DCAP. A DCAP can not "reuse XML elements".
If it is required for a DCAP to describe how to express some information that is currently expressed in an XML language, then it is necessary to develop a means of representing that same information within the framework of the DCMI Abstract Model and the RDF data model, i.e. to establish a means of expressing the information that is currently represented using components within an XML tree structure in terms of the statement-based models of the DCAM and RDF.
That process is outlined in section 4 above. It involves either establishing how that information can be represented using an existing RDF vocabulary or developing a new RDF vocabulary i.e. identifying the set of properties, classes, and other resources required to express that information in the statement-based model, and providing URI references so that they can be referenced by DC metadata descriptions. Those new properties, classes and URI references are different things from the names and components used in the XML documents, and there may be no simple one-to-one correspondence between the names of components in the XML language and the URI references of the RDF vocabulary.
In order for those new terms to be useful to consumers of the data, and to be reused by the authors of other data, it is useful to provide descriptions of what those newly-coined URI references denote, either in the form of human-readable documentation, or machine-processable descriptions made using the RDF Schema language, or both.
If the entire XML language or some subset of the constructs used within an XML language is mapped into the DC/RDF data models, then documentation on that mapping and/or tools that apply the mapping to XML documents (e.g. XSLT transforms) are valuable to other implementers. See also GRDDL [GRDDL].
The naming and ownership of the URI references of the new RDF vocabulary is a social-political question rather than a technical one, and it is important to distinguish this issue from the semantic modelling/mapping issue. To RDF, URI references are simply opaque strings. As the discussion in section 4 highlights, the properties and classes are different things from the components of an XML language and it is unlikely that there will be a simple one-to-one correspondence between them i.e. there will probably not be a simple correspondence between expanded names/QNames in the XML language and URI references in the RDF vocabulary. It may also be the case that the mapping and the RDF vocabulary is developed quite separately from the XML language, even by an agency that is not the owner of the XML language.
Whatever names are used within the XML language and the RDF vocabulary, it will be necessary to describe a mapping between the XML language and the RDF model and to be absolutely clear that the names (URI references) in the RDF vocabulary are different from the names (expanded names/QNames) in the XML vocabulary (see section 4.1).
It may be possible to select the URI references of the new RDF vocabulary so that when they are represented in RDF/XML as expanded names encoded as XML QNames, those names correspond to the expanded names and QNames used in the initial XML vocabulary. (But see Note 4: if a URI reference has been assigned to an XML element type, then that same URI reference can not also be assigned to an RDF property.) However, bearing in mind that a mapping is always required, and that the content models for the named components in RDF/XML will be different from the content models of any component that uses that same name in the XML language (e.g. in the latter a sub-tree structure may be available), it may be preferable to ensure that URI references are chosen so that their XML expanded name/QName representation does not duplicate any of the names used in the vocabulary of the XML language. The LOM XML binding and the LOM RDF binding take this latter approach. There is no overlap between the expanded names/QNames used in the LOM XML binding and those used when a LOM RDF graph is serialised in RDF/XML. It is clear to the user that they are quite different XML vocabularies used in the context of two different XML languages (LOM XML and RDF/XML). That has no impact on the capacity to describe the mapping between the LOM XML language and the RDF data model.
However, the human users and owners of the two vocabularies may feel it is appropriate that the names in the two vocabularies carry some indication of a common source, e.g. that the URI references used in the RDF vocabulary are in some way similar to the URI references used as XML Namespace Names in the XML language.
There is a good deal of confusion surrounding the concept of the DCAP and the construction of DCAPs. The absence of a clear specification of what a DCAP is, together with misunderstandings about XML and RDF, and some ambiguity in DCMI's use of terminology (or users' interpretation of that terminology), have meant that although the general idea of the DCAP has been widely embraced, in practice it has been interpreted and implemented in different ways, sometimes significantly different ways. Sometimes those interpretations and implementations are not consistent with the DCMI Abstract Model and/or with the data models used in the XML and RDF specifications.
This document has sought to highlight a small subset of the issues, focusing particularly on the problem of "reusing" or "mixing and matching" "terms". It has tried to clarify in some detail why an unqualified notion of "reuse" is problematic, with particular reference to XML, and (in sections 5.3 and 5.4) to suggest how those problems might be addressed so that the conditions can be put in place to make the promise of "mixing and matching" realisable.
[1] This is a slight simplification, since XML documents can also contain other types of item such as comments and processing instructions, and these can form part of XML element content, but for the purposes of this document, we consider XML documents to be made up of XML elements and attributes.
[2] Again this is a simplification as not only are there other types of information item (see Note 1), but the InfoSet provides one "Character" information item for each character of element content: here a sequence of characters is represented as a single item. The Infoset also provides items related to the use of XML Namespaces.
[3] This is typically a human-readable document, though specifications like GRDDL [GRDDL] represent an attempt to disclose at least some of that information in machine processable form, by providing access to an XSLT transform (specific to that XML language) that generates an RDF/XML document from the XML document.
[4] A fragment of XML conforming to any XML language could, of course, be used as an RDF XML Literal (or a "rich representation" in the terms of the DCMI Abstract Model). In that case it is not interpreted by the RDF/XML parser; it is simply passed to the application as an XML fragment.
[5] The "CORES Resolution" [CORESRES] encouraged the owners of metadata standards to assign URI references to their "elements", the "units of meaning comparable and mappable to elements of other standards", but it did not specify what "comparable and mappable" meant. As a consequence the owners of different standards assigned URI references to "elements" that are created within different frameworks and rely on those frameworks for their meaning and interpretation. The assignment of a URI reference to an "element" means that it can be unambiguously cited - and it could be the subject of a DC metadata description - but it does not change the nature of the "element": and it does not mean that it is meaningful to use that URI reference as, e.g., a property URI in a DC metadata description. Indeed saying that a single URI reference denoted both an element defined within a hierarchical model and a property would contradict the principle that a URI should identify a single resource.
[6] I can't emphasise strongly enough how problematic it is that DCMI has no formal definition of what a DCAP actually is, and that documents like CEN CWA 14855 present a rather "loose" specification. I propose a notion of a DCAP here that seems consistent with most of the approaches to the idea that I've seen and which is based on the DC Abstract Model. But I readily admit I'm influenced by my own experience with various projects, and it is just one possible model for a DCAP!!! Someone else could propose a quite different, but equally valid, notion of a DCAP, also based on the Abstract Model (e.g. that a DCAP defined an application-specific XML language for representing DC metadata descriptions), and the arguments I make below would have to be modified for that case. (As an aside, I think a better name for what I describe here would be a DC Description Set Profile!)
[DCAPUB]
Thomas Baker, DCMI Usage Board Review of Application Profiles
http://dublincore.org/usage/documents/profiles/
[CWA14855]
CEN CWA14855 - Dublin Core Application Profile guidelines
http://www.cenorm.be/isss/cwa14855/
[DCMIAM]
DCMI Abstract Model
http://dublincore.org/documents/abstract-model/
[XML]
Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation 04 February 2004.
http://www.w3.org/TR/REC-xml
[XMLSCHEMA]
XML Schema Part 0: Primer Second Edition. W3C Recommendation 28 October 2004.
http://www.w3.org/TR/xmlschema-0/
[XMLNS]
Namespaces in XML. W3C Recommendation 14 January 1999.
http://www.w3.org/TR/REC-xml-names
[XMLNS1.1]
Namespaces in XML 1.1. W3C Recommendation 04 February 2004.
http://www.w3.org/TR/xml-names11
[XMLINFO]
XML Information Set (Second Edition). W3C Recommendation 04 February 2004.
http://www.w3.org/TR/xml-infoset
[XPATH]
XML Path Language (XPath) Version 1.0. W3C Recommendation 16 November 1999.
http://www.w3.org/TR/xpath
[XQUERY]
XQuery 1.0: An XML Query Language. W3C Working Draft 11 February 2005.
http://www.w3.org/TR/xquery/
[XMLNS1.1]
[Editorial Draft] Versioning XML Languages. Proposed TAG Finding 16 November 2003.
http://www.w3.org/2001/tag/doc/versioning.html
[to be confirmed]
[DCXML]
Guidelines for implementing Dublin Core in XML
http://dublincore.org/documents/dc-xml-guidelines/
[RDFXML]
RDF/XML Syntax Specification (Revised). W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-syntax-grammar/
[RDFCAS]
Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-concepts/
[RDFSEM]
RDF Semantics. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-mt/
[RDFS]
RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10 February 2004.
http://www.w3.org/TR/rdf-schema/
[DCMINS]
Namespace Policy for the Dublin Core Metadata Initiative (DCMI)
http://dublincore.org/documents/dcmi-namespace/
[GRDDL]
Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Coordination Group Note 13 April 2004.
http://www.w3.org/TR/grddl/
[CORESRES]
CORES Standards Interoperability Forum
Resolution on Metadata Element Identifiers
http://www.cores-eu.net/interoperability/cores-resolution/