W3C P3 Vocabulary Working Draft

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at: http://www.w3.org/TR/

This document represents a work in progress. It is not intended to be advanced toward W3C recommendation status, but rather it will be used, along with the P3P Architecture Working Draft, as a basis for developing the Protocols and Data Transport Working Group's deliverable of a specification, fully specifying the conversational framework for user-agent/service interaction. It is strongly recommended that only experimental software be implemented to this specification. The Platform for Privacy Preferences Project will not allow early implementations to affect their ability to make changes to the framework described in this document.

Comments on this working draft should be sent to the P3P Project Manager, Philip DesAutels

Purpose

The W3C Platform for Privacy Preferences Project Vocabulary Working Group presents this basic model for the P3P privacy conversation between a user agent and a service.

Definitions

Access	For P3P purposes, a clause that expresses the ability of users to obtain and correct information that an entity has collected about them. A vocabulary may define various degrees of access.
Agreement	A statement that a service and a user agent have agreed to abide by.
Clause	The "parts of speech" from which P3P statements are constructed.
Credentials	Signed statements of authorization, identification, or practice (e.g. certificates granting authority or identity, or signed metadata). These credentials may be presented or be requested by either the user agent or the service. Credentials need not be accompanied by digital signatures.
Data category	A quality of a data element or class that may be used by a trust engine to determine what type of element is under discussion (for example anonymous demographics or personal contact information).
Data class	A grouping of data elements such as mailing address (which includes, e.g., name, street address, city, state, and country).
Data element	A single data entity such as last name or phone number.
Grammar (P3P)	In P3P, the grammar defines the structure of P3P clauses used to make a valid P3P statement. The grammar is the set of rules for properly ordering clauses. The following example structures clauses (in parentheses) to make a simple privacy practice statement: for (these URIs) the following (practices) apply to this (set of data). Also see grammar at Princeton's Free On-line Dictionary of Computing.
P3P data repository	A mechanism for storing data under the control of P3P preferences over a period of time. These data might include personal data.
Permissions	Permissions are constraints which, when evaluated, allow or prevent access or modification of data classes or elements within the data repository of a user agent. Permissions are set by a service and evaluated along with a user's preferences. Thus permissions serve only to restrict actions permitted by a user's preferences; they never allow actions that the user's preferences would otherwise prohibit.
Persona	A persona is an image of a user that he or she presents in a particular situation. The user's preferences and P3P data may combine to form a persona. A user may have multiple personae. The choice of which persona to present to a service may be based upon the service's purpose (e.g., business, gaming, home, etc.), credentials (e.g., level of associated trust), consequences and practices (e.g., personalization, shipping, mailing list), or any user-defined rationale (e.g., time of day, phase of moon, etc.).
Policies	The collection of all user defined preferences, including, but not limited to, P3P preferences.
Practice	A P3P clause that describes what a service plans to do with data.
Preference	A rule, or set of rules, that determines what action(s) a user agent will take or allow when involved in a conversation or negotiation with a service. A preference might be expressed as a formally defined computable statement; e.g., a PICSRules rule. In this document, preferences govern the types of agreements that can be reached between a user agent and a service. Within this document, "preferences" are assumed to be P3P preferences.
Proposal	A series of statements. A proposal is used when a user agent and a service are negotiating to form an agreement.
Request	A message in which a service asks a user agent to transmit (read request) or store (write request) a data element or set of data elements.
Result set	The user's data sent to the service by the user agent.
Service	A program, for P3P purposes, requesting data from, or providing data to, a user agent. By this definition, a service may be a server, a local application, a piece of locally active code, such as an ActiveX control or Java applet, or even another user agent.
Statement	A description of what data a service will request, what the service will do with it, and the consequence to the user. P3P statements are composed of clauses, as specified by the P3P grammar.
Trust Engine	A mechanism for evaluating credentials and policies to make a decision. For P3P purposes, the trust engine evaluates P3P proposals and requests, user preferences, and possibly other credentials.
User	An individual or group of individuals acting as a single entity. For the purposes of this document, the user is further qualified as an entity for which personal data exists and/or can be collected.
User agent	A program that acts on a user's behalf. The agent may act on preferences (rules) for a broad range of purposes, such as content filtering, trust decisions, or privacy. For P3P purposes, a user agent acts on a user's privacy preferences. Users may use different user agents at different times.
Vocabulary (schema)	The defined set of words or statements that are allowable in a clause. For instance, a vocabulary may define the practice clause to be one of the two values: 'for system administration', 'for research'.

Data Design

In this section we present a data design model for expressing and referencing data elements, classes, and categories.

Data Elements and Data Classes

A data element is a single data entity such as a last name or phone number. A data class is a named set of data elements and/or other data classes. Data classes inherit the data elements from all classes they contain. While most data elements correspond to a specific piece of information that can be stored as a text string, some data elements represent streams of data such as a user's click stream.
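The inheritance rule above can be sketched in a few lines. This is only an illustration under assumptions: the DataClass type, its field names, and the element names (First_Name, Street, and so on) are hypothetical and are not defined by this draft.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataClass:
    """A named set of data elements and/or other data classes (hypothetical model)."""
    name: str
    elements: set[str] = field(default_factory=set)
    subclasses: list[DataClass] = field(default_factory=list)

    def all_elements(self) -> set[str]:
        """Return this class's own elements plus those inherited from contained classes."""
        collected = set(self.elements)
        for sub in self.subclasses:
            collected |= sub.all_elements()
        return collected


# Illustrative names only; the base data classes are left to a future working group.
name_class = DataClass("Name", {"First_Name", "Last_Name"})
mailing_address = DataClass(
    "Mailing_Address", {"Street", "City", "State", "Country"}, [name_class]
)
```

Here mailing_address.all_elements() would include First_Name and Last_Name even though they are declared only on the contained Name class.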

There is a base set of well-known P3P data classes (to be defined by a future working group). Neither user nor service can change a base data class. This restriction allows for a common understanding between user agent and service that can simplify the negotiation process. However, new classes may be introduced by services. A future P3P working group will recommend an RDF application for defining data elements and data classes.

In an attempt to minimize the problem of duplicate information and to allow for the advantages of standardization, the following naming conventions are proposed:

While these standards will aid in creating some consistency between services and user agents, the possibility for duplication of information will still arise when there are requests for the same information using unknown names (e.g., Size_Of_Shoe). In some cases, this data may already exist under a different heading (e.g., Shoe_Size). In other cases, there is no duplication, just a new request. The user agent implementation may provide the user with assistance in determining whether to use an already stored value, create a new data element, or create a name space replication containing a different value.

A result set is a set of data elements sent by a user agent to the service as a result of a request. Result sets contain traditional name/value pairs, in which one half of the pair names the value and the other is the value itself. P3P data repositories may store data in a similar manner; however, this will be implementation dependent.

User agents may also provide users with the ability to create their own groupings of data elements, for example groups of elements over which the user has similar preferences. However, this is an implementation detail left up to each implementor.

Data Categories

A data category is a quality of a data element or class. The data category is a hint to the agent regarding the type or sensitivity of a data element that is unrecognized by the agent. For instance, an agent encountering the previously unknown data element shoe_size can see that the data element has been categorized by its creator as demographic data. This can then simplify the user interface and the resulting user experience. User agents may allow users to express preferences about individual data elements, data classes and data categories. When a service proposes to request a data element, the agent can check whether the user has expressed a preference about that particular element. If so, the agent would follow that preference; otherwise, the agent can check the user's preferences over the category to which that element belongs. Thus categories can be used to reduce the number of choices users must make while browsing.
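The element-then-category lookup described above might be sketched as follows. The function name, the dictionary shapes, and the "allow"/"deny" preference values are assumptions made for illustration; the draft does not define a preference format. When an element carries several categories, this sketch simply takes the first category that has a preference, which is one possible policy among many.

```python
def resolve_preference(element, element_prefs, category_prefs, categories_of):
    """Return the user's preference for a data element, or None if none applies.

    element_prefs:  hypothetical map of element name -> preference
    category_prefs: hypothetical map of category name -> preference
    categories_of:  hypothetical map of element name -> list of category hints
    """
    # A preference on the element itself always wins.
    if element in element_prefs:
        return element_prefs[element]
    # Otherwise fall back to the categories supplied as hints for the element.
    for category in categories_of.get(element, []):
        if category in category_prefs:
            return category_prefs[category]
    # Nothing applies; the agent would need to ask the user.
    return None


element_prefs = {"Phone_Number": "deny"}
category_prefs = {"demographic": "allow"}
categories_of = {"shoe_size": ["demographic"], "Phone_Number": ["contact"]}
```

With these sample values, the previously unknown element shoe_size resolves through its demographic category, while Phone_Number is governed by the explicit element-level preference.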

Designers of new data class schemas should use data categories to give hints to the user or the user agent about the characteristics of the data. However, users should have the option to override or distrust the hints provided by the service.

A data class or element can be described by multiple categories. A data class should inherit at least the categories of the elements it contains. So for instance, if a data class contains demographic and contact information elements, the class should be described as such. The grouping of elements into a class may also deserve an additional categorization. For instance, a first name or last name alone is not identifiable information, but together they may be.
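As a rough sketch of this rule, a class's categories can be computed as the union of its members' categories plus any categories assigned to the grouping itself. The function name and the category labels ("demographic", "contact", "identifiable") are illustrative assumptions, not a vocabulary defined by this draft.

```python
def class_categories(member_categories, members, extra=frozenset()):
    """Return the categories describing a data class.

    member_categories: hypothetical map of element name -> set of categories
    members:           names of the elements contained in the class
    extra:             categories that apply to the grouping as a whole
    """
    categories = set(extra)
    for member in members:
        # The class inherits at least the categories of each contained element.
        categories |= member_categories.get(member, set())
    return categories


member_categories = {
    "First_Name": {"demographic"},
    "Last_Name": {"demographic"},
    "Phone_Number": {"contact"},
}
full_name = class_categories(
    member_categories, ["First_Name", "Last_Name"], extra={"identifiable"}
)
```

In this sample, neither name element is marked identifiable on its own, but the class that groups them carries the additional category.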

[The necessity of the data categories was considered to be questionable among some members of the working group. However, others felt strongly that data categories would be important for maintaining a seamless user experience.]

Grammatical Model

This section describes the grammatical model for proposals and requests. P3P will define the syntax for a proposal; other parties will describe specific vocabularies for use in P3P proposals. The Harmonized Vocabulary Working Group will document the issues related to the development of a uniform vocabulary and may recommend a single vocabulary if such a vocabulary can be agreed to by the working group. All statements made by the user agent or service as part of the negotiation are to be defined by the Protocols and Data Transport Working Group; we expect the results of this working group to closely follow the grammatical model proposed here.

Proposals

Services make proposals to user agents in which they specify one or more sets of terms under which they are willing to grant the user access. Each set of terms is expressed as a statement.

Proposals must include a schema, one or more statements, and optionally any of the following statement clauses applied globally to the entire proposal: experience space, contact, agreement with, access, consequence, qualifier, and signature. Each of these clauses is described in detail below.

General Form and Pseudocode Form of Grammatical Model

Statements are composed of the following mandatory clauses: experience space, practice, qualified data set. Statements might also include the following optional clauses: qualifier, access, consequence, agreement_with, contact. These clauses are further specified below. The proposed protocols and negotiation working group will also determine any additional syntax that might be needed for combining statements into proposals.
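A user agent could check a statement for the mandatory clauses listed above along these lines. This is a minimal sketch: representing a statement as a dictionary keyed by clause name, and the underscore-style clause identifiers, are assumptions, since the concrete syntax is left to a later working group.

```python
# Mandatory clauses as listed in the text; the identifier spellings are assumed.
MANDATORY_CLAUSES = {"experience_space", "practice", "qualified_data_set"}


def missing_mandatory_clauses(statement):
    """Return the sorted names of mandatory clauses absent from a statement.

    statement: hypothetical dict mapping clause name -> clause value
    """
    return sorted(MANDATORY_CLAUSES - set(statement))


complete = {
    "experience_space": ["http://www.w3.org"],
    "practice": "for research",
    "qualified_data_set": ["Zip_Code"],
    "contact": "http://www.w3.org/",  # optional clause
}
incomplete = {"practice": "for research"}
```

An empty result means every mandatory clause is present; optional clauses such as contact do not affect the check.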

This grammar specifies the ordering and structure of the clauses. However, some clauses may be further specified. The table below provides a brief description of each clause. The sections following the table specify each clause in more detail.

Grammar Clauses Described

The following table briefly describes the clauses specified by the grammar. The following text describes the column headings.

Clause	Description	Applies to	Required in Every Proposal	Default Type of Label	May be Defined in Vocabulary
Access	The ability of users to obtain and correct information that an entity has collected about them	proposal or statement	yes	URI or text string	yes
Agreement With	The entity who the user is entering into an agreement with	proposal or statement	yes	certificate, URI, or text string	yes
Consequence	The impact on the user of reaching an agreement	proposal or statement	yes	URI or text string	yes
Contact	Information for contacting a service with inquiries about the service’s privacy practices.	proposal or statement	yes	URI or text string	yes
Experience Space	The space where a particular statement is valid	proposal or statement	yes	set of URIs	no
Practice	What a service will do with data	statement	yes	none	yes
Qualified Data Set	The data a service proposes to collect or requests	statement	yes	see definition below	no, but qualifier and category components may be
Qualifier	Used by the creator of a vocabulary to provide extra functionality beyond that in the base P3 Proposal Grammar	proposal, statement, or clause	no	none	yes
Required	A binary value indicating whether a particular practice, qualified data set, or statement is required in an agreement	practice clause, qualified data set clause, or statement	no (defaults to 1 when not present)	0 or 1	no
Schema	A URI that identifies a particular P3 proposal vocabulary	proposal	yes	URI	no
Signature	Signature and attribution information as defined by the Digital Signature effort	proposal or statement	no	none	yes

Grammar Clauses Specified

The P3P clauses are described in more detail below. Please note that much of the specification is incomplete. Also note that some of the working group members expressed concern about the complexity of the grammar and suggested that the non-essential clauses be eliminated. In particular, questions were raised about the necessity of the qualifier and required clauses, and the categories and permissions within the qualified data sets. The majority of the group members felt that while these clauses are probably not essential, they are sufficiently useful that they should be retained in the grammar.

Access

An access clause expresses the ability of users to obtain and correct information that an entity has collected about them. A vocabulary may define various degrees of access, for example 'view' and 'correct'. Access clauses should include a label that contains or references specific information about how to obtain access. The label takes the form of either a URL, which can be dereferenced to provide the information, or a text string, which provides the information directly.

Agreement With

This is the entity with whom the user is entering into the agreement. It can be expressed as a certificate identifying the entity, a URI which when dereferenced identifies an entity, or text identifying an entity. There must be at least one agreement clause in a proposal. The resulting agreement is between the superset of all parties named in the agreement clause and the user.

When these clauses contain certificates they take the form "schema","data". The schema defines the data that will follow it. For example, an agreement clause indicating the party identified by the certificate "W3CCert" would look like:

where "http://www.w3.org/DSig/x509v3/" indicates the schema of the data which follows, in this case an X509v3 certificate as defined at http://www.w3.org/DSig/x509v3/ and "W3CCert" is the identity certificate for W3C.

The party specified in the agreement with clause should not be confused with any of the other parties who may be mentioned in a proposal, for example, parties associated with the experience space or parties associated with a signature.

Consequence

Consequence clauses are labels that provide information about the impact on the user of reaching an agreement. Consequence clauses may take the form of a URI which can be dereferenced to provide the information, a text string which provides the information directly, or labels enumerated by a vocabulary. An example consequence would be a value-added service such as customized information, a coupon, or a rebate.

Contact

Contact clauses are labels that provide information for contacting a service with inquiries about the service's privacy practices. Contact clauses take the form of either a URL, which can be dereferenced to provide the information, or a text string, which provides the information directly. A vocabulary may place further restrictions on a contact clause.

Experience Space

An experience space identifies the space where a particular statement is valid, not necessarily the party with whom the user is interacting. This experience space is identified by a set of URIs as defined in RFC 1738. This set is made up of included and excluded URIs. For example, the service w3.org may wish to make a statement about its entire experience space (http://www.w3.org) minus its user registration area (http://www.w3.org/registration) and minus its mailing list area (http://www.w3.org/maillist). In pseudo-RDF:

The experience space indicates to the user agent (and user) where the stated practice is in effect. The method for determining and informing the user about which practice is in effect as they browse the Web is very important but implementation dependent. For instance, a smart agent would be able to remember for which set of URIs a given agreement applies. A dumb agent with no memory may have to ask for a new agreement as it encounters each new URI. However, under no circumstances should the user be misled into believing that their privacy preferences are being acted upon when this is not the case. Consequently, framed content or included GIFs from outside an experience space may require their own practice statements.

A service can state a general practice for its entire experience space that excludes any pages within that space that have different practices. Upon encountering such a page, a new agreement must be reached.
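One way a user agent might test whether a URI falls inside an experience space built from included and excluded URIs is sketched below. Treating each listed URI as a path prefix is an assumption of this sketch; the draft does not specify how membership is computed, and the function name is hypothetical.

```python
def in_experience_space(uri, included, excluded):
    """Return True if uri lies under an included URI and under no excluded URI.

    Membership is tested by prefix on whole path segments (an assumed rule).
    """
    def under(prefix):
        # Match the prefix itself or anything beneath it, segment by segment.
        return uri == prefix or uri.startswith(prefix.rstrip("/") + "/")

    return any(under(p) for p in included) and not any(under(p) for p in excluded)


# The w3.org example from the text: the whole site minus two areas.
included = ["http://www.w3.org"]
excluded = ["http://www.w3.org/registration", "http://www.w3.org/maillist"]
```

Under this rule, a page in the registration area falls outside the space, so an agent reaching it would need a new agreement, as the preceding paragraph requires.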

Practice

A practice clause describes what a service will do with data. The practice clause, as defined in the vocabulary schema, is mandatory in every statement. It is applied to one or more qualified data sets.

Example: A vocabulary might define practices such as 'used to complete the transaction', 'used to customize content', or 'disclosed for marketing purposes'.

Required

The Required clause is a binary value that indicates whether a particular practice, qualified data set, or statement is required in an agreement. When practice clauses, qualified data set clauses, and statements are not modified by a required clause, they are assumed to be required by default (required = 1).

Note that the required clause is useful mostly as a short-hand or macro. Without this clause, a service that was willing to enter into one of many different agreements with a user would have to enumerate all the agreements. With this clause, such a service can mark all optional elements of each agreement with required = 0.
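The macro reading of the required clause can be made concrete by expanding a proposal into every agreement it stands for. In this sketch the item names and the (name, required) tuple layout are illustrative assumptions; a missing flag (None) is treated as required = 1, following the default described above.

```python
from itertools import combinations


def enumerate_agreements(items):
    """Expand items marked required = 0 into every agreement they stand for.

    items: list of (name, required) pairs; required is 0, 1, or None.
    None means the required clause is absent, which defaults to 1.
    """
    required = [name for name, flag in items if flag is None or flag == 1]
    optional = [name for name, flag in items if flag == 0]
    agreements = []
    # Every agreement holds all required items plus some subset of optional ones.
    for size in range(len(optional) + 1):
        for chosen in combinations(optional, size):
            agreements.append(required + list(chosen))
    return agreements


items = [
    ("zip code used to complete the transaction", None),
    ("email used to customize content", 0),
    ("phone number disclosed for marketing purposes", 0),
]
agreements = enumerate_agreements(items)
```

Two optional items stand in for four distinct agreements, which is the enumeration a service would otherwise have to write out by hand.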

Qualified Data Set

A qualified data set identifies the data that is being referenced in a statement. It consists of:

Permissions

Permissions are constraints which, when evaluated, allow or prevent access or modification of data classes or elements within the data repository of a user agent. Permissions are set by a service and evaluated along with a user's preferences. Thus permissions serve only to restrict actions permitted by a user's preferences; they never allow actions that the user's preferences would otherwise prohibit.

Permissions may only be set or changed as the result of an agreement between a user agent and a service, and changes may only be made when they do not conflict with previously set permissions.

The action specifies whether the rule is for read-only access or read/write access. (Read/write access is permission for a service to store data in a user's P3P data repository.)

The restriction is a statement in the P3P preference interchange language (to be specified by a future working group) that restricts access to the data element. For example, the restriction might prohibit services other than the one that wrote a data element from reading it.
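The rule that permissions can only narrow what the user's preferences allow might look like the following sketch. The Permission type, its fields, and the single "only this service" restriction are hypothetical stand-ins; the actual restriction language is left to a future working group.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Permission:
    """A service-set constraint on a data element (hypothetical model)."""
    action: str                      # "read" or "read/write"
    only_service: str | None = None  # assumed restriction: sole service allowed


def is_allowed(service, wants_write, preference_allows, permission):
    """Decide whether a service may access an element in the repository."""
    # Permissions never grant what the user's preferences prohibit.
    if not preference_allows:
        return False
    # With no permission set, the user's preferences alone decide.
    if permission is None:
        return True
    # A read-only permission rules out storing data.
    if wants_write and permission.action != "read/write":
        return False
    # The restriction may limit access to the service that wrote the element.
    if permission.only_service is not None and service != permission.only_service:
        return False
    return True


perm = Permission(action="read/write", only_service="shop.example")
```

The first check is the important one: evaluation starts from the user's preferences, and the permission is consulted only to restrict further.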

The working group did not reach an agreement on whether the restriction syntax is necessary, and is thus not recommending one syntax over the other at this time. Arguments in favor of the restriction syntax focussed on greater flexibility, and speculation that this syntax would facilitate and encourage services' storing unencrypted data in the user's data repository (something that was viewed as desirable from a privacy perspective). Arguments against the restriction syntax focussed on added complexity, and speculation that services would be unlikely to trust user agents to enforce a service's policy and would thus encrypt any proprietary data stored in a user's data repository anyway. It is unclear at this point how much complexity this syntax would add to a P3P implementation or whether it has other useful applications.

Qualifier

A qualifier clause is used by the creator of a vocabulary to provide extra functionality beyond that in the base P3P Proposal Grammar.

Example: Vocabulary creators might wish to include a qualifier clause that would allow assertions about the length of time a particular piece of data will be retained.

Schema

The schema clause consists of a URI that identifies a particular P3P proposal vocabulary. A single schema applies to an entire proposal. A schema must be identified for a proposal.

Signature

This is the signature and attribution information as defined by the W3C Digital Signature effort. A signature on a proposal indicates that the signer believes the statements in the proposal are true.

Requests

After an agreement is reached between a service and user agent, the service may request that the user agent transmit a data element. All requests must reference a particular agreement. Requests should only be granted by a user agent if they are consistent with the referenced agreement (for example, if there is an agreement that the user agent will provide only the user's zip code and the service requests the user's phone number, the request should be denied). A future working group is expected to determine the syntax for requests. We expect requests to include a reference to an agreement, a list of data elements requested, reference to a transport protocol, and reference to a mechanism for prompting the user for that information.
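The consistency check on requests can be sketched as a subset test against the referenced agreement. The dictionary layout, the agreement identifier "a1", and the element names are assumptions for illustration only, since the request syntax is left to a future working group.

```python
def grant_request(request, agreements):
    """Return True only if the request is consistent with its agreement.

    request:    hypothetical dict with "agreement_id" and "elements" keys
    agreements: hypothetical map of agreement id -> {"elements": set of names}
    """
    agreement = agreements.get(request["agreement_id"])
    # Every request must reference a known agreement.
    if agreement is None:
        return False
    # Every requested element must be covered by that agreement.
    return set(request["elements"]) <= agreement["elements"]


# The zip code example from the text: only the zip code was agreed to.
agreements = {"a1": {"elements": {"Zip_Code"}}}
zip_request = {"agreement_id": "a1", "elements": ["Zip_Code"]}
phone_request = {"agreement_id": "a1", "elements": ["Phone_Number"]}
```

The zip code request is granted and the phone number request is denied, matching the example in the paragraph above.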

Sample Vocabulary and Recommended Data Set

There are two vocabularies which can be defined within the P3P framework. The first is the data category vocabulary, and the second is the proposal vocabulary. A proposal vocabulary may define one or more of the clauses used in proposals (such as privacy practices, access, schema, qualifier, etc.). The data category vocabulary defines the list of data categories that describes data elements.

A future working group will recommend a detailed specification for the RDF syntax of these vocabularies.

Recommended Data Set

It is proposed that there should be a certain number of base data elements provided with every privacy user agent. These base data elements may closely follow those commonly in use today by such mechanisms as VCard. However, there may be data elements contained within the VCard schema that are considered "rarely used", such as home pager, and should probably not be required as part of the base set.

Further, the base set should probably contain data elements not found within the VCard schema, such as some anonymous demographics. These data elements represent frequently asked marketing questions that do not require personally identifiable information.

Note: When designing the base schema it should be kept in mind that either the user agent or the service may expand the data elements hosted on the user’s machine. This implies that the number of base elements should be kept to a reasonable minimum rather than attempting to "guess" what elements both user and service would want.

Other Recommendations

Preference specification

It is up to user agent implementors to determine the level of detail at which end users will be able to specify their preferences. However, we recommend that user agents allow users to express preferences over all types of clauses contained in the P3P grammar.

WD-P3P-grammar-971014

P3P Vocabulary Working Group

Grammatical Model and Data Design Model

W3C Working Draft 14-October-97

Status of This Document