W3C

Proposal for XML Fragment Identifier Syntax 0.9

W3C Working Group Note 12 September 2003

This version:
http://www.w3.org/TR/2003/NOTE-xml-fragid-20030912
Latest version:
http://www.w3.org/TR/xml-fragid/
Previous version:
[none]
Editor:
Paul Grosso, Arbortext <paul@arbortext.com>

Abstract

A URI reference may include an optional fragment identifier that consists of additional reference information to be interpreted by the user agent after the URI has been successfully retrieved. The format and interpretation of fragment identifiers is dependent on the media type of the retrieval result. The XML media type can therefore specify a fragment identifier syntax that takes advantage of the XML structure to define ways to point into an XML resource. This document recommends the adoption of a specific fragment identifier syntax for use with XML resources.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This proposal for an XML fragment-identifier syntax was prepared by the XML Linking Working Group as part of its work, before its charter expired and the Working Group went out of existence in December 2002. It is published now (on 12 September 2003) to encourage discussion; comments on the proposal are invited.

This document represents the majority, but not unanimous, recommendation of the XML Linking Working Group. A minority would have preferred to omit the element() scheme from this recommendation.

This is a W3C Working Group Note for review by W3C Members, IETF members, and other interested parties. It has been produced as part of the XML Activity. Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patent disclosures relevant to this specification may be found on the Working Group's public patent disclosure page.

This is work in progress and does not imply endorsement by the W3C membership. A list of current W3C Recommendations and other technical documents, including Working Drafts and Notes, can be found at http://www.w3.org/TR.

Comments and discussion on this document should be sent to www-xml-linking-comments@w3.org (public archive).

Table of Contents

1 Introduction
2 Some Considerations
3 Actual Proposal
4 Conformance

Appendix

A References


1 Introduction

RFC 2396 [RFC2396] defines the generic syntax for a Uniform Resource Identifier (URI). It defines a URI reference as a URI that may have additional information attached in the form of a fragment identifier. It describes fragment identifiers as follows:

When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. ...

The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. ... Individual media types may define additional restrictions or structure within the fragment for specifying different types of “partial views” that can be identified within that media type.

A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.

The XML media type ([RFC2376] and [RFC 3023]) can therefore specify a fragment identifier syntax that takes advantage of the XML structure to define ways to point into an XML resource. This document suggests a specific fragment identifier syntax and recommends that the XML media type be augmented to adopt this syntax as the fragment identifier syntax for use with XML resources.

2 Some Considerations

The W3C XLink Working Group is developing the XPointer family of specifications to support addressing into the internal structures of XML documents, and it seems this work should be the basis for any fragment identifier syntax for XML. However, given that all processes are generally required to support fragment identifiers, there is a strong case to be made for assuring that the core of any fragment identifier syntax is as simple as possible. In consideration of this need for simplicity, this document recommends something less than full XPointer be adopted as the fragment identifier syntax for XML resources.

It is the position of this Working Group that the minimum requirement for an XML fragment identifier is the ability to identify any element in any XML resource.

One of the key methods of identifying XML elements is the use of ID-typed element and attribute values. An ID is a robust way to identify a given element if the element that needs to be identified already has an ID or can have an ID added to it. But this requires that the element allow an ID attribute and, in general, that the document can be modified. However, there are many important user requirements for pointing into an XML resource where it is not practical or possible for the resource to be modified.

The element structure itself can be used to identify any specific element in a document without requiring any specific information (such as an ID). Taking advantage of the element structure allows for the unambiguous identification of any element in an XML resource without requiring any modification of the document.
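As an illustration of addressing by element structure alone, the following Python sketch walks a sequence of 1-based child positions starting at the document root. The function name resolve_child_sequence, the sample document, and the choice of xml.etree.ElementTree are assumptions made for this example; they are not part of the proposal.

```python
import xml.etree.ElementTree as ET


def resolve_child_sequence(root, steps):
    """Follow 1-based element positions, e.g. [1, 5, 4, 3], from the document root.

    The first step selects among the children of the document node, so it
    can only be 1 (the root element). Later steps count element children
    only. Returns None when the sequence does not identify an element.
    """
    if not steps or steps[0] != 1:
        return None
    node = root
    for step in steps[1:]:
        # The default ElementTree parser drops comments and processing
        # instructions, so the children listed here are elements only.
        children = list(node)
        if step < 1 or step > len(children):
            return None
        node = children[step - 1]
    return node


# Hypothetical sample document: the target has no ID and is not modified.
doc = ET.fromstring("<a><b/><c><d/><e/></c></a>")
target = resolve_child_sequence(doc, [1, 2, 2])
print(target.tag if target is not None else None)  # prints "e"
```

No attribute or ID is consulted at any point: the position of each element among its siblings is the only information used.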

Given the widespread use of fragment identifiers and the scope of XPointers in particular and URI references in general, it is not a user requirement to be able to write “query-like” statements in the fragment identifier. Furthermore, XML structures such as processing instructions, attributes, and comments are probably not in the 80/20 part of the solution, as the cost in terms of added complexity to the syntax and processing requirements of being able to access such structures is not worth the benefits.

Therefore, this document recommends the use of IDs and element structure to form the basis of the fragment identifier syntax for the XML MIME type.

3 Actual Proposal

This document recommends that the XML media type define its fragment identifier syntax to be that defined by the XPointer Framework [XPtrFrame] and XPointer Element() Scheme [XPtrElement].

The following are some fragment identifiers allowed by this definition:

Intro
SKU153976
element(/1/5/4/3)
element(Terminology/3/1/2)
xpointer(id('boy-blue')/horn[1])element(boy-blue/3)

Note that in the last example, a processor minimally conforming to the fragment identifier definition proposed in this document would not be required to recognize or handle the xpointer(id('boy-blue')/horn[1]) pointer part; it would merely be required (by conformance to the Framework specification) to properly skip over it to find and process the element(boy-blue/3) pointer part.
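The skipping behavior described above can be sketched in Python as follows. This is a simplified illustration, not a conforming implementation: the function names split_pointer and supported_parts are hypothetical, scheme names are not validated, and only balanced parentheses and circumflex escaping of "(", ")" and "^" are handled.

```python
def split_pointer(fragment):
    """Split a fragment identifier into (scheme, data) pairs.

    A shorthand pointer (a bare name with no parentheses) is returned as
    a single pair whose scheme is None.
    """
    if "(" not in fragment:
        return [(None, fragment)]
    parts = []
    i, n = 0, len(fragment)
    while i < n:
        if fragment[i].isspace():  # optional whitespace between parts
            i += 1
            continue
        open_idx = fragment.find("(", i)
        if open_idx <= i:  # no "(" left, or an empty scheme name
            raise ValueError("expected a scheme name followed by '('")
        scheme = fragment[i:open_idx]
        data, depth, j = [], 1, open_idx + 1
        while j < n:
            ch = fragment[j]
            if ch == "^":  # circumflex escapes "(", ")" and "^"
                if j + 1 < n and fragment[j + 1] in "()^":
                    data.append(fragment[j + 1])
                    j += 2
                    continue
                raise ValueError("'^' must escape '(', ')' or '^'")
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:  # closing parenthesis of this pointer part
                    break
            data.append(ch)
            j += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses in scheme data")
        parts.append((scheme, "".join(data)))
        i = j + 1
    return parts


def supported_parts(fragment, supported=("element",)):
    """Keep shorthand pointers and parts whose scheme is supported, in order."""
    return [(scheme, data) for scheme, data in split_pointer(fragment)
            if scheme is None or scheme in supported]


print(supported_parts("xpointer(id('boy-blue')/horn[1])element(boy-blue/3)"))
# prints [('element', 'boy-blue/3')]
```

The xpointer() part is parsed far enough to find its closing parenthesis and is then discarded, which is all that a processor supporting only the element() scheme needs to do with it.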

4 Conformance

Assuming the recommendation of this document is reflected in an update to the definition of the XML media type, a processor claiming to be able to handle URI references to XML resources must be able to handle any valid URI reference with a fragment identifier that conforms to the XPointer Framework and XPointer element() Scheme Recommendations.
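As a rough illustration of what handling element() scheme data involves, the Python sketch below checks the data against the shape used in the examples of section 3 (an optional name followed by a child sequence, or a child sequence alone) and returns its components. The function name parse_element_data is hypothetical, and the regular expression only approximates the NCName production; it is an assumption of this sketch, not a definition taken from the Recommendations.

```python
import re

# Optional name (approximating NCName: a letter or underscore, then word
# characters, dots or hyphens), followed by zero or more "/N" steps where
# N is a positive integer without leading zeros.
_ELEMENT_DATA = re.compile(
    r"(?P<name>[^\W\d][\w.\-]*)?(?P<seq>(?:/[1-9][0-9]*)*)"
)


def parse_element_data(data):
    """Return (name_or_None, [steps]) for element() scheme data."""
    match = _ELEMENT_DATA.fullmatch(data)
    if match is None or (match.group("name") is None and not match.group("seq")):
        raise ValueError("not valid element() scheme data: %r" % data)
    # "/3/1/2".split("/") gives ["", "3", "1", "2"]; drop the empty first item.
    steps = [int(step) for step in match.group("seq").split("/")[1:]]
    return match.group("name"), steps


for data in ("/1/5/4/3", "Terminology/3/1/2", "boy-blue/3"):
    print(parse_element_data(data))
# prints (None, [1, 5, 4, 3])
# prints ('Terminology', [3, 1, 2])
# prints ('boy-blue', [3])
```

Resolving the name component to an element still requires knowing which attributes are ID-typed, which this sketch leaves out.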

A References

RFC2046
IETF (Internet Engineering Task Force) RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, eds. N. Freed, N. Borenstein. Available at ftp://ftp.ietf.org/rfc/rfc2046.txt
RFC2376
IETF (Internet Engineering Task Force) RFC 2376: XML Media Types, eds. E. Whitehead, M. Murata. July 1998. Available at ftp://ftp.ietf.org/rfc/rfc2376.txt
RFC2396
IETF (Internet Engineering Task Force) RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, eds. T. Berners-Lee, R. Fielding, L. Masinter. August 1998. Available at ftp://ftp.ietf.org/rfc/rfc2396.txt
RFC 3023
IETF (Internet Engineering Task Force) RFC 3023: XML Media Types, eds. M. Murata, S. St.Laurent, D. Kohn. January 2001.
XPtrFrame
Paul Grosso, Eve Maler, Jonathan Marsh, and Norman Walsh, editors. XPointer Framework. World Wide Web Consortium, 2002.
XPtrElement
Paul Grosso, Eve Maler, Jonathan Marsh, and Norman Walsh, editors. XPointer element() Scheme. World Wide Web Consortium, 2002.