Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The document specifies requirements for Full-Text search for use in XQuery [XQuery] and XPath [XPath].
This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.
The Full-Text Requirements have been defined jointly by the XQuery Working Group and the XSL Working Group (both part of the XML Activity).
This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.
Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and on the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
1 Introduction
2 Terminology
2.1 MUST
2.2 SHOULD
2.3 MAY
2.4 SCORE
2.5 Full-Text Search
3 Language Design
3.1 The Data Model
3.2 Side-effects on the data
3.3 Score Function and Full-Text predicates
3.3.1 Predicate and Score Independence
3.3.2 Score language
3.4 Score algorithm
3.4.1 Return Score
3.4.2 Sort by Score
3.4.3 Type, Range of Score
3.4.4 Score Statistics
3.4.5 Semantics of Score
3.5 Combined score
3.5.1 Score Combination
3.5.2 Score algorithm vendor-provided
3.5.3 Score algorithm overridable
3.5.4 Score influence
3.6 Extensibility
3.6.1 Extensible by vendors
3.6.2 Extensible by users
3.7 First, Future Versions
3.8 End user language
3.9 Searchable query
3.10 Universality
4 Integration
4.1 XPath
4.2 Extensibility Mechanisms
4.2.1 Integration into XQuery/XPath
4.2.2 XQuery/XPath Full-Text Extensibility
4.3 Composability
4.4 Human-readable
4.5 XML syntax
5 Implementation
5.1 Declarativity
6 Functionality and Scope
6.1 Functionality
6.2 Search Scope
6.2.1 Search within arbitrary structure
6.2.2 Constructed Structures
6.2.3 Return Arbitrary Nodes
6.2.4 Parts of Search Tree
6.3 Attributes
6.3.1 Search within attributes
6.3.2 Search across attributes and content
6.4 Markup
6.5 Element Boundaries
6.5.1 Search across element boundaries
6.5.2 Element as a token boundary
6.6 Score
6.6.1 Score accessible
6.6.2 Implicit ordering
6.6.3 Score extendable
A References
A.1 Non-Normative
B Change Log
"Full-Text Search" (FTS) is a large field which covers a vast array of functionality. In addition, there are many different ways one could combine FTS capabilities with XQuery and XPath.
This paper describes a set of requirements for FTS in XQuery/XPath (XQuery/XPath Full-Text). At this stage in the life of the document, these requirements should be read as suggestions only: the issues associated with the requirements are to be discussed and resolved by the relevant Working Groups. This format provides a firm basis for the Working Groups to set the direction of the work on XQuery/XPath Full-Text, and to compare existing proposals. Once the issues are resolved and this Requirements document is finalized, it will be easier to define the functionality of XQuery/XPath Full-Text and its integration with XQuery and/or XPath.
Note that we will attempt to define requirements for the language without reference to any particular solution.
We use the terms MUST, SHOULD and MAY throughout the document to specify the extent to which an item is a requirement for the work of XQuery/XPath Full-Text. We use the same definitions of MUST, SHOULD and MAY as the XML Query Requirements [XML Query Requirements].
[Definition: MUST means that the item is an absolute requirement.]
[Definition: SHOULD means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.]
[Definition: MAY means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.]
When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Other terminology used in this document:
[Definition: SCORE reflects relevance of matched material.]
[Definition: Full-Text Search in this document is an extension to the XQuery/XPath language. It provides a way to query text which has been tokenized, i.e. broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming).]
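As an illustration of the definition above, the following Python sketch breaks a string into words, units of punctuation, and spaces, and then uses word positions for a simple proximity test. It is not part of these requirements. The function names (`tokenize`, `word_positions`, `near`), the regular expression, and the distance measure (difference in word positions) are all assumptions made for this example.

```python
import re

# Hypothetical token pattern: a run of word characters, a run of
# whitespace, or a single unit of punctuation.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<space>\s+)|(?P<punct>[^\w\s])")


def tokenize(text):
    """Return a list of (kind, value) pairs; kind is 'word', 'space' or 'punct'."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def word_positions(text):
    """Map each lowercased word to the list of its 0-based word positions."""
    positions = {}
    words = [value.lower() for kind, value in tokenize(text) if kind == "word"]
    for index, word in enumerate(words):
        positions.setdefault(word, []).append(index)
    return positions


def near(text, a, b, max_distance):
    """True if words a and b occur at most max_distance word positions apart."""
    positions = word_positions(text)
    return any(
        abs(i - j) <= max_distance
        for i in positions.get(a.lower(), [])
        for j in positions.get(b.lower(), [])
    )
```

Because positions are counted in words only, punctuation and spaces between two words do not increase their distance in this sketch; a real design could choose differently.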
This section covers requirements for XQuery/XPath Full-Text language design that are independent from, but related to, integration and scoping requirements.
XQuery/XPath Full-Text functions MUST operate on instances of the XQuery/XPath Data Model.
XQuery/XPath Full-Text MUST NOT introduce or rely on side-effects.
XQuery/XPath Full-Text MUST define the type and range of SCORE values. The SCORE SHOULD be a float, in the range 0-1.
XQuery/XPath Full-Text MUST be able to generate a SCORE for a combination of Full-Text predicates.
The algorithm to produce combined SCOREs MUST be vendor-provided.
The algorithm to produce combined SCOREs SHOULD be overridable by users.
Users MUST be able to influence individual components of complex score expressions.
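The SCORE requirements above can be pictured with a small Python sketch: component scores in the range 0 to 1 are combined by a default algorithm, a user may supply a different algorithm, and per-component weights let a user influence individual parts of the expression. The weighted mean used as the default, and the names `clamp01`, `default_combine` and `combined_score`, are assumptions for illustration; this document does not prescribe any particular algorithm.

```python
def clamp01(value):
    """Force a number into the range 0 to 1."""
    return max(0.0, min(1.0, float(value)))


def default_combine(scores, weights):
    """Assumed 'vendor-provided' default: weighted arithmetic mean."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(scores, weights)) / total


def combined_score(scores, weights=None, combine=default_combine):
    """Combine component scores into one SCORE in the range 0 to 1.

    weights -- optional per-component weights (user influence)
    combine -- optional replacement for the default algorithm (user override)
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight is required per component score")
    components = [clamp01(s) for s in scores]
    return clamp01(combine(components, weights))
```

For example, `combined_score([0.8, 0.4], weights=[3, 1])` favours the first predicate, while `combined_score([0.8, 0.4], combine=lambda s, w: max(s))` replaces the default algorithm entirely.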
XQuery/XPath Full-Text MUST be extensible by vendors.
XQuery/XPath Full-Text MAY be extensible by users.
The first version of XQuery/XPath Full-Text MUST provide a robust framework for future versions.
It is not a requirement that XQuery/XPath Full-Text be designed as an end-user UI language.
It SHOULD be possible to search XQuery/XPath Full-Text queries.
This section specifies requirements for the integration of XQuery/XPath Full-Text with XQuery and XPath.
Part, but not necessarily all, of XQuery/XPath Full-Text MUST be usable as part of an XPath expression.
XQuery/XPath Full-Text SHOULD use the extensibility mechanisms that exist in XQuery and XPath for integration into XQuery and XPath.
XQuery/XPath Full-Text MUST use the extensibility mechanisms that exist in XQuery and XPath for its own extensibility.
XQuery/XPath Full-Text MUST be composable with XQuery, and SHOULD be composable with itself.
XQuery/XPath Full-Text may have more than one syntax binding. One query language syntax must be convenient for humans to read and write. See [XML Query Requirements].
XQuery/XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See [XML Query Requirements].
This section defines requirements for the functionality in XQuery/XPath Full-Text, and the scope of XQuery/XPath Full-Text queries.
XQuery/XPath Full-Text MUST provide, in the first release, the minimum set of Full-Text functionality that is useful.
single-word search
phrase search
support for stopwords
single character suffix
0 or more character suffix
0 or more character prefix
0 or more character infix
proximity searching (unit: words)
specification of order in proximity searching
combination using AND
combination using OR
combination using NOT
word normalization, diacritics
ranking, relevance
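Several items in the list above can be sketched in a few lines of Python: word normalization with diacritics removed, stopwords, wildcard matching, phrase search, and Boolean combination. This is only an illustration under stated assumptions: the stopword list, the wildcard characters (`?` for exactly one character, `*` for zero or more, as implemented by the standard `fnmatch` module), and the helper names are hypothetical, and stopwords are simply dropped from both the text and the phrase.

```python
import fnmatch
import re
import unicodedata

STOPWORDS = {"the", "a", "of"}  # hypothetical stopword list


def normalize(text):
    """Lowercase the text and strip diacritics (for example, 'Café' becomes 'cafe')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def words(text):
    """Normalized words of the text, with stopwords removed."""
    return [w for w in re.findall(r"\w+", normalize(text)) if w not in STOPWORDS]


def has_word(text, pattern):
    """Single-word search; the pattern may contain the wildcards '?' and '*'."""
    wanted = normalize(pattern)
    return any(fnmatch.fnmatchcase(word, wanted) for word in words(text))


def has_phrase(text, phrase):
    """Phrase search: the words of the phrase occur consecutively, in order."""
    haystack, needle = words(text), words(phrase)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )
```

Combination using AND, OR and NOT then maps onto ordinary Boolean operators, for example `has_word(t, "run*") and not has_word(t, "cat")`.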
Additional functionality represented in the [XQuery and XPath Full-Text Use Cases] MUST be considered, but may be left to a future release.
Additional functionality from other Full-Text search contexts such as [SQL/MM Full-Text] MUST be considered, but SHOULD be left to a future release.
XQuery/XPath Full-Text MUST allow search within an arbitrary structure (an arbitrary XPath expression).
XQuery/XPath Full-Text MUST NOT preclude Full-Text search within structures constructed during a query.
XQuery/XPath Full-Text MUST allow a query to return arbitrary nodes.
XQuery/XPath Full-Text MUST allow the combination of predicates on different parts of the searched document 'tree'.
XQuery/XPath Full-Text MUST support Full-Text search within attributes.
XQuery/XPath Full-Text MAY support Full-Text search within attributes in conjunction with Full-Text search within element content.
If XQuery/XPath Full-Text supports search within names of elements and attributes, then it MUST distinguish between (a) element content and attribute values and (b) names of elements and attributes in any search.
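One way to picture this distinction is to keep names and values in separate collections and make every search state which one it targets. The Python sketch below does this with the standard `xml.etree.ElementTree` module. The functions `collect` and `search`, and the `scope` parameter, are hypothetical names chosen for this example and do not come from any proposal.

```python
import xml.etree.ElementTree as ET


def collect(xml_string):
    """Return (names, values): element/attribute names, and content/attribute values."""
    root = ET.fromstring(xml_string)
    names, values = [], []
    for element in root.iter():
        names.append(element.tag)
        names.extend(element.attrib.keys())
        values.extend(element.attrib.values())
        # Text inside the element and text following its end tag.
        for chunk in (element.text, element.tail):
            if chunk and chunk.strip():
                values.append(chunk.strip())
    return names, values


def search(xml_string, word, scope="values"):
    """Search for a whole word in either 'names' or 'values', never both at once."""
    if scope not in ("names", "values"):
        raise ValueError("scope must be 'names' or 'values'")
    names, values = collect(xml_string)
    pool = names if scope == "names" else values
    wanted = word.lower()
    return any(wanted in item.lower().split() for item in pool)
```

With the document `<title lang="en">title page</title>`, a search for `lang` succeeds only in the `names` scope, while a search for `en` succeeds only in the `values` scope.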
XQuery/XPath Full-Text MUST support search across element boundaries, at least for NEAR.
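To make the requirement concrete, the Python sketch below runs a NEAR-style test over the text of a whole subtree, so that two words can match even when they sit in different elements. It uses the standard `xml.etree.ElementTree` module. The names `element_words` and `near_across_elements` are hypothetical, and joining text nodes with a space (which treats every element boundary as a token boundary) is an assumption of this example, not a decision recorded in this document.

```python
import re
import xml.etree.ElementTree as ET


def element_words(element):
    """Lowercased words of all text in the element's subtree, in document order."""
    # Assumption: an element boundary always separates tokens.
    text = " ".join(element.itertext())
    return re.findall(r"\w+", text.lower())


def near_across_elements(xml_string, a, b, max_distance):
    """True if words a and b occur at most max_distance word positions apart,
    regardless of which descendant elements contain them."""
    words = element_words(ET.fromstring(xml_string))
    positions_a = [i for i, w in enumerate(words) if w == a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)
```

For the document `<book><title>XML <em>Query</em></title><para>Full-Text search</para></book>`, the words `query` and `full` are adjacent in this sketch even though one is in `title` and the other in `para`.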
Author | Date | Action | Description |
Stephen Buxton | 2003-03-19 | Added a Change Log | |
Stephen Buxton | 2003-03-19 | Terminology definition changes | Switched the definitions of SHOULD and MAY, to be consistent with [XML Query Requirements]. The rest of the document does not need to change, since the earlier versions of this document, on which the text of the spec is based, referred to the definitions in [XML Query Requirements]. |
Stephen Buxton | 2003-04-18 | Change XML Query Requirements link to external URI | Changed links in the document body to point to external latest copy of XML Query Requirements. |