Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
The need to share data with collaborators motivates custodians and users of relational databases (RDB) to expose relational data on the Web of Data. This document defines a direct mapping from relational data to RDF. This definition provides extension points for refinements within and outside of this document.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Public Working Draft of the "A Direct Mapping of Relational Data to RDF" for review by W3C members and other interested parties.
This document was developed by the W3C RDB2RDF Working Group. The Working Group expects to advance this Working Draft to Recommendation Status. A complete list of changes to this document is available.
Comments on this document should be sent to public-rdb2rdf-comments@w3.org, a mailing list with a public archive.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
2 Direct Mapping Description (Informative)
2.1 Direct Mapping Example
2.2 Preliminaries: Generating IRIs
2.2.1 IRIs generated for the initial example
2.3 Mapping Rules
2.3.1 Triples generated for the example in Section Direct Mapping Example
2.4 Additional Examples and Corner Cases
2.4.1 Foreign keys referencing candidate keys
2.4.2 Multi-column keys
2.4.3 Empty (non-existent) primary keys
2.4.4 Referencing tables with empty primary keys
2.5 Hierarchical Tables
3 Direct Mapping Definition
3.1 Notations
3.2 Relational Data Model
3.2.1 RDB Abstract Data Type (Normative)
3.2.2 RDB accessor functions (Normative)
3.3 RDF Data Model (Non-normative)
3.4 Denotational semantics (Normative)
4 Direct Mapping as Rules (Normative)
4.1 Generating Table Triples
4.1.1 Table has a primary key
4.1.2 Table does not have a primary key
4.2 Generating Literal Triples
4.2.1 Table has a primary key
4.2.2 Table does not have a primary key
4.3 Generating Reference Triples
4.3.1 Table r1 has a primary key and table r2 has a primary key
4.3.2 Table r1 has a primary key and table r2 does not have a primary key
4.3.3 Table r1 does not have primary key and table r2 has a primary key
4.3.4 Table r1 does not have primary key and table r2 does not have a primary key
5 References
Relational databases proliferate both because of their efficiency and their precise definitions, allowing for tools like SQL [SQLFN] to manipulate and examine the contents predictably and efficiently. Resource Description Framework (RDF) [RDF-concepts] is a data format based on a web-scalable architecture for identification and interpretation of terms. This document defines a mapping from relational representation to an RDF representation.
Strategies for mapping relational data to RDF abound. The direct mapping defines a simple transformation, providing a basis for defining and comparing more intricate transformations. This document includes an informal and a formal description of the transformation.
The Direct Mapping is intended to provide a default behavior for R2RML: RDB to RDF Mapping Language. It can be also used to materialize RDF graphs or define virtual graphs, which can be queried by SPARQL or traversed by an RDF graph API.
The direct mapping defines an RDF Graph [RDF-concepts] representation of the data in any relational database. The direct mapping takes as input a relational database (data and schema), and generates an RDF graph that is called the direct graph. This graph is composed of relative IRIs that may be resolved against a base IRI per [RFC3987]. Foreign keys in relational databases establish a named reference from any row in a table to exactly one row in a (potentially different) table. The direct graph conveys these references, as well as each value in the rows.
The concepts in direct mapping can be introduced with an example RDF graph produced by a relational database. Following is SQL (DDL) to create a simple example with two tables with single-column primary keys and one foreign key reference between them:
CREATE TABLE Addresses ( ID INT, city CHAR(10), state CHAR(2), PRIMARY KEY(ID) ) CREATE TABLE People ( ID INT, fname CHAR(10), addr INT, PRIMARY KEY(ID), FOREIGN KEY(addr) REFERENCES Addresses(ID) ) INSERT INTO Addresses (ID, city, state) VALUES (18, "Cambridge", "MA") INSERT INTO People (ID, fname, addr) VALUES (7, "Bob", 18) INSERT INTO People (ID, fname, addr) VALUES (8, "Sue", NULL)
HTML tables will be used in this document to convey SQL tables.
The primary key of these tables will be marked with the PK
class to convey an SQL primary key such as ID
in CREATE TABLE Addresses (ID INT, ... PRIMARY KEY(ID))
.
Foreign keys will be illustrated with a notation like "→ Address(ID)
" to convey an SQL foreign key such as CREATE TABLE People (... addr INT, FOREIGN KEY(addr) REFERENCES Addresses(ID))
.
PK | → Address(ID) | |
---|---|---|
ID | fname | addr |
7 | Bob | 18 |
8 | Sue | NULL |
PK | ||
---|---|---|
ID | city | state |
18 | Cambridge | MA |
Given a base IRI http://foo.example/DB/
, the direct
mapping of this database produces a direct graph:
@base <http://foo.example/DB/> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <People/ID=7> <rdf:type> <People> . <People/ID=7> <People#ID> 7 . <People/ID=7> <People#fname> "Bob" . <People/ID=7> <People#addr> <Addresses/ID=18> . <People/ID=8> <rdf:type> <People> . <People/ID=8> <People#ID> 8 . <People/ID=8> <People#fname> "Sue" . <Addresses/ID=18> <rdf:type> <Addresses> . <Addresses/ID=18> <Addresses#ID> 18 . <Addresses/ID=18> <Addresses#city> "Cambridge" . <Addresses/ID=18> <Addresses#state> "MA" .
In this expression, each row, e.g. (7, "Bob", 18)
,
produces a set of triples with a common subject.
The subject is an IRI formed from the concatenation of the base IRI, table
name (People
), primary key column name (ID
)
and primary key value (7
).
The predicate for each column is an IRI formed from the concatenation
of the base IRI, table name and the column name.
The values are either RDF literals formed from the lexical form of the
column value, or, in the case of foreign keys, row identifiers
(<Addresses/ID=18>
).
Note that these reference row identifiers must coincide with the
subject used for the triples generated from the referenced row.
In the process of translating relational data into RDF, the direct
mapping must create IRIs
for identifying tables, the columns in a table, and each row in a
table. In this section, we assume that
http://foo.example/DB
is the the base IRI. All the
examples in this section will contain relative IRIs which are to be
understood as relative to this base IRI.
The following are the IRIs that need to be generated:
base_IRI
is the base IRI and table_name
is
the table name, then base_IRI/table_name
is the Table IRI for
the table.
base_IRI
is the base IRI,
table_name
is the table name and column_name
is the column name, then base_IRI/table_name#column_name
is the Column IRI for the column.
base_IRI
is the base IRI,
table_name
is the table name and column_name_1
,
column_name_2
, ..., column_name_k
is a
sequence of k columns (k > 1), then
base_IRI/table_name#column_name_1,column_name_2,...,column_name_k
is the Column IRI for the columns.
base_IRI
is the base IRI, table_name
is the
table name, column_name
is the column name of the primary key and
value
is the value of the row in that column, then
base_IRI/table_name/column_name=value
is the Row RDF Node (or Row IRI) for the row.
base_IRI
is the
base IRI, table_name
is the table name,
column_name_1
, column_name_2
, ...,
column_name_k
is the sequence of k columns (k > 1) that
constitute the primary key, and value_1
,
value_2
, ..., value_k
is the sequence of
values of the columns that constitute the primary key of the row, then
base_IRI/table_name/column_name_1=value_1,column_name_2=value_2,...,column_name_k=value_k
is the Row RDF Node (or Row IRI) for the row.
GET
on a row identifier to retrieve a small resource (i.e. not all rows from the same table) and distinguish between the retrieved resource People/ID=7
and the row People/ID=7
. The "slash" alternative would offer a direct graph with identifiers like People/ID=7
but would demand the server respond to GET /People/ID=7
with a 303 redirect to some other resource.Resolution:
None recorded.
Given the base IRI http://foo.example/DB/
, the following are some of the IRIs that are used when translating into RDF the relational data given in the initial example:
For the table People, the following IRIs are considered in the translation process:
Table IRI:
<People>
Column IRIs:
<People#ID> <People#fname> <People#addr>
Row IRIs:
<People/ID=7> <People/ID=8>
For the table Addresses, the following IRIs are considered in the translation process:
Table IRI:
<Addresses>
Columns IRIs:
<Addresses#ID> <Addresses#city> <Addresses#state>
Row IRI:
<Addresses/ID=18>
Each row in the database produces a set of RDF triples with a subject, predicate, and object composed as follows:
Issue (primary-is-candidate-key):
Should the following exception be included in the definition of the direct mapping?
Primary-is-Candidate-Key Exception: If the primary key is also a candidate key K to table R:
Resolution:
None recorded.
Next we show how the 11 triples in the example of Section Direct Mapping Example are classified into the above categories:
Triples generated from table People:
<People/ID=7> <rdf:type> <People> . <People/ID=8> <rdf:type> <People> .
<People/ID=7> <People#ID> 7 . <People/ID=7> <People#fname> "Bob" . <People/ID=8> <People#ID> 8 . <People/ID=8> <People#fname> "Sue" .
<People/ID=7> <People#addr> <Addresses/ID=18> .
Triples generated from table Addresses:
<Addresses/ID=18> <rdf:type> <Addresses> .
<Addresses/ID=18> <Addresses#ID> 18 . <Addresses/ID=18> <Addresses#city> "Cambridge" . <Addresses/ID=18> <Addresses#state> "MA" .
More complex schemas include compound and composite primary keys. In this example, the columns deptName and deptCity in the People table reference name and city in the Department table. The following is the schema of the augmented database:
CREATE TABLE Addresses ( ID INT, city CHAR(10), state CHAR(2), PRIMARY KEY(ID) ) CREATE TABLE Deparment ( ID INT, name CHAR(10), city CHAR(10), manager INT, PRIMARY KEY(ID), UNIQUE (name, city), FOREIGN KEY(manager) REFERENCES People(ID) ) CREATE TABLE People ( ID INT, fname CHAR(10), addr INT, deptName CHAR(10), deptCity CHAR(10), PRIMARY KEY(ID), FOREIGN KEY(addr) REFERENCES Addresses(ID), FOREIGN KEY(deptName, deptCity) REFERENCES Department(name, city) )
The following is an instance of the augmented relational schema:
PK | → Addresses(ID) | → Department(name, city) | ||
---|---|---|---|---|
ID | fname | addr | deptName | deptCity |
7 | Bob | 18 | accounting | Cambridge |
8 | Sue | NULL | NULL | NULL |
PK | ||
---|---|---|
ID | city | state |
18 | Cambridge | MA |
PK | Unique Key | → People(ID) | |
---|---|---|---|
ID | name | city | manager |
23 | accounting | Cambridge | 8 |
Per the People tables's compound foreign key to Department:
deptName="accounting"
and deptCity="Cambridge"
references a row in Department with a primary key of ID=23
.
deptName,deptCity
", reflecting the order of the column names in the foreign key.
ID=23
".
In this example, the direct mapping generates the following triples:
@base <http://foo.example/DB/> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <People/ID=7> <rdf:type> <People> . <People/ID=7> <People#ID> 7 . <People/ID=7> <People#fname> "Bob" . <People/ID=7> <People#addr> <Addresses/ID=18> . <People/ID=7> <People#deptName> "accounting" . <People/ID=7> <People#deptCity> "Cambridge" . <People/ID=7> <People#deptName,deptCity> <Department/ID=23> . <People/ID=8> <rdf:type> <People> . <People/ID=8> <People#ID> 8 . <People/ID=8> <People#fname> "Sue" . <Addresses/ID=18> <rdf:type> <Addresses> . <Addresses/ID=18> <Addresses#ID> 18 . <Addresses/ID=18> <Addresses#city> "Cambridge" . <Addresses/ID=18> <Addresses#state> "MA" . <Department/ID=23> <rdf:type> <Department> . <Department/ID=23> <Department#ID> 23 . <Department/ID=23> <Department#name> "accounting" . <Department/ID=23> <Department#city> "Cambridge" . <Department/ID=23> <Department#manager> <People#ID=8> .
The green triples above are generated by considering the new elements in the augmented database. It should be noticed that:
Although deptName is an attribute of table People that is part of a foreign key, the Literal
Triple <People/ID=7> <People#deptName> "accounting"
is generated by the direct mapping because
deptName is not the sole column of a foreign key of table People.
The Reference Triple <People/ID=7> <People#deptName,deptCity> <Department/ID=23>
is generated by considering a foreign key
referencing a candidate key (instead of the primary key):
(deptName, deptCity) is a multi-column foreign key in the table
People which references the multi-column candidate key
(name, city) in the table Department.
We note that primary keys may also be
composite. For example, if the primary key for Department were
(name, city) instead of ID in the
example in Section Foreign keys referencing candidate keys, then the identifier for the only row in this table would be
<Department/name=accounting,city=Cambridge>
,
and the following triples would have been generated by the direct
mapping:
<Department/name=accounting,city=Cambridge> <rdf:type> <Department> . <Department/name=accounting,city=Cambridge> <Department#ID> 23 . <Department/name=accounting,city=Cambridge> <Department#name> "accounting" . <Department/name=accounting,city=Cambridge> <Department#city> "Cambridge" .
Even if there is no primary key, rows generate a set of triples with a shared subject, but that subject is a blank node. For instance, assume that the following table is added to the schema of the example in Section Foreign keys referencing candidate keys (for keeping track of tweets in Twitter):
CREATE TABLE Tweets ( tweeter INT, when TIMESTAMP, text CHAR(140), FOREIGN KEY(tweeter) REFERENCES People(ID) )
The following is an instance of table Tweets:
→ People(ID) | ||
---|---|---|
tweeter | when | text |
7 | 2010-08-30T01:33 | I really like lolcats. |
7 | 2010-08-30T09:01 | I take it back. |
Given that table Tweets does not have a primary key, each row in this table is identified by a Blank Node. In fact, when translating the above table the direct mapping generates the following triples:
@base <http://foo.example/DB/> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . _:a <rdf:type> <Tweets> . _:a <Tweets#tweeter> <People/ID=7> . _:a <Tweets#when> "2010-08-30T01:33"^^xsd:dateTime . _:a <Tweets#text> "I really like lolcats." . _:b <rdf:type> <Tweets> . _:b <Tweets#tweeter> <People/ID=7> . _:b <Tweets#when> "2010-08-30T09:01"^^xsd:dateTime . _:b <Tweets#text> "I take it back." .
It is not possible to dereference blank nodes ("_:a" and "_:b" above). Queries or updates may be made to these nodes via SPARQL queries.
Rows in tables with no primary key may still be referenced by foreign keys. (Relational database theory tells us that these rows must be unique as foreign keys reference candidate keys and candidate keys are unique across all the rows in a table.) References to rows in tables with no primary key are expressed as RDF triples with blank nodes for objects, where that blank node is the same node used for the subject in the referenced row.
This example includes several foreign keys with mutual column names. For clarity; here is the DDL to clarify these keys:
CREATE TABLE Projects ( lead INT, FOREIGN KEY (lead) REFERENCES People(ID), name VARCHAR(50), UNIQUE (lead, name), deptName VARCHAR(50), deptCity VARCHAR(50), UNIQUE (name, deptName, deptCity), FOREIGN KEY (deptName, deptCity) REFERENCES Department(name, city) ) CREATE TABLE TaskAssignments ( worker INT, FOREIGN KEY (worker) REFERENCES People(ID), project VARCHAR(50), PRIMARY KEY (worker, project), deptName VARCHAR(50), deptCity VARCHAR(50), FOREIGN KEY (worker) REFERENCES People(ID), FOREIGN KEY (project, deptName, deptCity) REFERENCES Projects(name, deptName, deptCity), FOREIGN KEY (deptName, deptCity) REFERENCES Department(name, city) )
The following is an instance of the preceding schema:
Unique key | |||
---|---|---|---|
Unique key | |||
→ People(ID) | → Department(name, city) | ||
lead | name | deptName | deptCity |
8 | pencil survey | accounting | Cambridge |
8 | eraser survey | accounting | Cambridge |
PK | |||
---|---|---|---|
→ Projects(name, deptName, deptCity) | |||
→ People(ID) | → Departments(name, city) | ||
worker | project | deptName | deptCity |
7 | pencil survey | accounting | Cambridge |
In this case, the direct mapping generates the following triples from the preceding tables:
@base <http://foo.example/DB/> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . @prefix pencil: <http://foo.example/DB/TaskAssignment/worker=7,project=pencil+survey> . _:c <rdf:type> <Projects> . _:c <Projects#lead> <People/ID=8> . _:c <Projects#name> "pencil survey" . _:c <Projects#deptName> "accounting" . _:c <Projects#deptCity> "Cambridge" . _:c <Projects#deptName,deptCity> <Department/ID=23> . _:d <rdf:type> <Projects> . _:d <Projects#lead> <People/ID=8> . _:d <Projects#name> "eraser survey" . _:d <Projects#deptName> "accounting" . _:d <Projects#deptCity> "Cambridge" . _:d <Projects#deptName,deptCity> <Department/ID=23> . pencil:_ <rdf:type> <TaskAssignments> . pencil:_ <TaskAssignments#worker> <People/ID=7> . pencil:_ <TaskAssignments#project> "pencil survey" . pencil:_ <TaskAssignments#deptName> "accounting" . pencil:_ <TaskAssignments#deptCity> "Cambridge" . pencil:_ <TaskAssignments#deptName,deptCity> <Department/ID=23> . pencil:_ <TaskAssignments#project,deptName,deptCity> _:c .
The absence of a primary key forces the generation of blank nodes, but does not change the structure of the direct graph or names of the predicates in that graph.
It is common to express specializations of some concept as multiple tables sharing a common primary key. In such cases, the primary keys of the inherited tables are in turn foreign keys to the table from which they derive.
PK | ||
---|---|---|
ID | city | state |
18 | Cambridge | MA |
PK | ||
---|---|---|
→ Addresses(ID) | ||
ID | building | ofcNumber |
18 | 32 | G528 |
PK | |
---|---|
→ Offices(ID) | |
ID | desk |
18 | oak |
In this example, Offices are a specialization of Addresses and ExecutiveOffices are a specialization of Offices. The subjects for the triples implied by rows in Offices or ExecutiveOffices are the same as those for the corresponding row in Addresses.
@base <http://foo.example/DB/> @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <Addresses/ID=18> <Addresses#ID> 18 . <Addresses/ID=18> <Addresses#city> "Cambridge" . <Addresses/ID=18> <Addresses#state> "MA" . <Addresses/ID=18> <Offices#ID> 18 . <Addresses/ID=18> <Offices#building> 32 . <Addresses/ID=18> <Offices#ofcNumber> "G528" . <Addresses/ID=18> <ExecutiveOffices#ID> 18 . <Addresses/ID=18> <ExecutiveOffices#desk> "oak" .
The Primary-is-foreign Key Exception allows the generation of a triple with an RDF literal for the ID column in the Offices and ExecutiveOffices table (Offices.ID=18
and ExecutiveOffices.ID=18
).
Resolution:
None recorded.
Issue (many-to-many-as-repeated-properties):
The direct graph is arguably more faithful to the conceptual model if it reflects e.g. a person with multiple addresses (some many-to-many Person2Address table) as repeated properties. It is difficult to detect which tables with exactly two foreign keys and no other attributes are many-to-many. As a counter example, a Wedding table may have exactly two spouses but it's still not a many-to-many relation in most places.Resolution:
None recorded.
Resolution:
None recorded.
The RDB and RDF data models make use of the commonly defined Abstract Data Types Set, List and MultiSet, used here as type constructors. For example, Set(A)
denotes the type for the sets of elements of type A
. We assume that they come with their common operations, such as the function size : Set → Int
.
The definitions follow a type-as-specification approach, thus the models are based on dependent types. For example, { s:Set(A) | size(s) ≤ 1 }
is a type denoting the sets for elements of type A, such that those sets have at most one element.
The denotational RDF semantics makes use of the set-builder notation for building the RDF sets.
[1] | Database | ::= | Set(Table) |
A relational database is a set of tables. | |||
[2] | Table | ::= | (TableName, Header, List(CandidateKey), Set(ForeignKey), Body) |
A relation has
| |||
[3] | Header | ::= | Set((ColumnName, Datatype)) |
A header is an associative array mapping each column to a SQL datatype. | |||
[4] | Body | ::= | MultiSet(Row) |
A body is a set of (potentially duplicate) rows. | |||
[5] | Row | ::= | Set((ColumnName, CellValue)) |
A row is a associative array mapping each column in a row to a value. | |||
[6] | CellValue | ::= | LexicalValue | NULL |
A cell value is either a lexical value or NULL, denoting the absence of value. | |||
[7] | ForeignKey | ::= | (List(ColumnName), Table, CandidateKey) |
A foreign key relies on a list of columns (their order matters) and points to a candidate key into another table. | |||
[8] | CandidateKey | ::= | List(ColumnName) |
A candidate key is made of a list of columns (their order matters). | |||
[9] | Datatype | ::= | Int | Float | Date | … |
A datatype is a common SQL datatype. | |||
[10] | TableName | ::= | String |
A table name is a string. | |||
[11] | ColumnName | ::= | String |
A column name is a string. |
Per RDF Concepts and Abstract Syntax, an RDF graph is a set of triples of a subject, predicate and object. The subject may be an IRI or a blank node, the predicate must be an IRI and the object may be an IRI, blank node, or an RDF literal.
This section recapitulates for convience the formal definition of RDF.
[24] | Graph | ::= | Set(Triple) |
An RDF graph is a set of RDF triples. | |||
[25] | Triple | ::= | (Subject, Predicate, Object) |
An RDF triple is composed of a subject, predicate and object. | |||
[26] | Subject | ::= | IRI | BlankNode |
A subject is either an IRI or a blank node. | |||
[27] | Predicate | ::= | IRI |
A predicate is always an IRI. | |||
[28] | Object | ::= | IRI | BlankNode | Literal |
An object is either an IRI, a blank node, or a literal. | |||
[29] | BlankNode | ::= | RDF blank node |
A blank node is an arbitrary term used only to establish graph connectivity. | |||
[30] | Literal | ::= | PlainLiteral | TypedLiteral |
A literal is either a plain literal or a typed literal. | |||
[31] | PlainLiteral | ::= | lexicalForm | (lexicalForm, langageTag) |
A plain literal has a lexical form and an optional language tag. | |||
[32] | TypedLiteral | ::= | (lexicalForm, IRI) |
An typed literal is composed of lexical form and a datatype IRI. | |||
[33] | IRI | ::= | RDF URI-reference as subsequently restricted by SPARQL |
An IRI is an RDF URI reference as subsequently restricted by SPARQL. | |||
[34] | lexicalForm | ::= | a Unicode String |
SQL string representing a value. |
In this model, Databases are inhabitants of RDB and they are denoted by mathematical objects living in the RDF domain. This denotational semantics is what we call the Direct Mapping.
Most of the functions defining the Direct Mapping are higher-order functions parameterized by a function φ : Row → Node
. This function maps any row to a unique node IRI or Blank Node. φ
is formally defined by the following axioms:
The Direct Mapping is defined by induction on the structure of RDB. Thus it is defined for any database. The entry point for the Direct Mapping is the function ⟦ ⟧φdatabase
.
[37] | ⟦ ⟧φdatabase | : | Database → Graph |
A mapping from a relational database to an RDF graph. | |||
[38] | ⟦db⟧φdatabase | = | { triple | triple ∈ ⟦t⟧φtable | t ∈ db } |
The union of the triples expressing each table in the database. | |||
[39] | ⟦ ⟧φtable | : | Table → Set(Triple) |
A mapping from a table to a set of RDF triples. | |||
[40] | ⟦t⟧φtable | = | { triple | triple ∈ ⟦r⟧φrow | r ∈ body(t) } |
The triples expressing each row in the table t | |||
[41] | ⟦ ⟧φrow | : | Row → Set(Triple) |
A mapping from a row to a set of RDF triples. | |||
[42] | ⟦r⟧φrow | = | let s = φ(r) in |
The union of the triples coming from
| |||
[43] | ⟦ , ⟧φref | : | (Row, ForeignKey) → (Predicate, Object) |
A mapping from a foreign key in a row to an RDF predicate and an RDF object. | |||
[44] | ⟦r, fk⟧φref | = | let p = ⟦table(r), fk⟧col in |
The predicate based on the column name and the object refered by the foreign key fk. | |||
[45] | ⟦ , ⟧lex | : | (Row, Column) → { s:Set((Predicate, Object)) | size(s) ≤ 1 } |
A mapping from a column in a row to an optional pair of RDF predicate and object. | |||
[46] | ⟦r, c⟧lex | = |
let p = ⟦table(r), fk⟧col in |
If the cell value for this column is NULL: nothing; Otherwise: a predicate based on the column name and a typed literal made of the value in the cell plus the corresponding RDF datatype. | |||
[47] | ⟦ , ⟧col | : | (Row, List[Column]) → IRI |
A mapping from a list of columns in a row to an IRI. | |||
[48] | ⟦r, c*⟧col | = | ue(tablename(table(r))) + '#' + ue(c0) + ',' + ⋯ + ',' + ue(cn-1) |
A concatenation, with punctuation as separators, of the url-encoded table name and the url-encoded column names. | |||
[49] | ⟦ ⟧datatype | : | Datatype → IRI |
A mapping from a SQL datatype to an XML Schema datatype IRI. | |||
[50] | ⟦d⟧datatype | = |
if d is Int then XSD:integer |
The XML Schema datatype for d as defined by IWD 9075 §9.5 | |||
[51] | ue | : | String → String |
An URL encoding per WSDL urlEncoded. |
In this section, we formally present the Direct Mapping as rules in Datalog syntax. The left hand side of each rule is the RDF Triple output. The right hand side of each rule consists of a sequence of predicates from the relational database and built-in predicates. The built-in predicates are divided into three groups. The first group contains some built-in predicates for dealing with repeated rows in a table without a primary key.
The second group contains a predicate to deal with null values.
Finally, the third group of built-in predicates is used to generate IRIs for identifying tables and the columns in a table, and to generate IRIs or blank nodes for identifying each row in a table.
Consider again the example from Section Direct Mapping Example. It should be noticed that in the rules presented in this section, a formula of the form Addresses(X, Y, Z) indicates that the variables X, Y and Z are used to store the values of a row in the three columns of the table Addresses (according to the order specified in the schema of the table, that is, X, Y and Z store the values of ID, city and state, respectively). In particular, uppercase letters like X, Y, Z, S, P and O are used to denote variables. Moreover, double quotes are used in the rules to refer to the string with the name of a table or a column. For example, a formula of the form generateRowIRI("Addresses", ["ID"], [X], S) is used to generate the Row RDF Node (or Row IRI) for the row of table "Addresses" whose value in the primary key "ID" is the value stored in the variable X. The value of this Row IRI is stored in the variable S.
Assume that r is a table with columns a1, ..., am and such that [a1, ..., an] is the primary key of r, where 1 ≤ n ≤ m. Then the following is the direct mapping rule to generate Table Triples from r:
Triple(S, "rdf:type", O) ← r(X1, ..., Xm), generateRowIRI("r", ["a1", ..., "an"], [X1, ..., Xn], S), generateTableIRI("r", O)
For example, table Addresses in the Direct Mapping Example has columns ID, city and state, and it has column ID as its primary key. Then the following is the direct mapping rule to generate Table Triples from Addresses:
Triple(S, "rdf:type", O) ← Addresses(X1, X2, X3), generateRowIRI("Addresses", ["ID"], [X1], S), generateTableIRI("Addresses", O)
As a second example, consider table Department from the example in Section Foreign keys referencing candidate keys, which has columns ID, name, city and manager, and assume that (name, city) is the multi-column primary key of this table (instead of ID). Then the following is the direct mapping rule to generate Table Triples from Department:
Triple(S, "rdf:type", O) ← Department(X3, X1, X2, X4), generateRowIRI("Department", ["name","city"], [X1, X2], S), generateTableIRI("Department", O)
Assume that r is a table with columns a1, ..., am and such that r does not have a primary key. Then the following is the direct mapping rule to generate Table Triples from r:
Triple(S, "rdf:type", O) ← r(X1, ..., Xm), card("r", [X1, ..., Xm], U), V ≤ U, generateRowBlankNode("r", [X1, ..., Xm], V, S), generateTableIRI("r", O)
For example, table Tweets from Section Empty (non-existent) primary keys has columns tweeter, when and text, and it does not have a primary key. Then the following is the direct mapping rule to generate Table Triples from Tweets:
Triple(S, "rdf:type", O) ← Tweets(X1, X2, X3), card("Tweets", [X1, X2, X3], U), V ≤ U, generateRowBlankNode("Tweets", [X1, X2, X3], V, S), generateTableIRI("Tweets", O)
Assume that r is a table with columns a1, ..., am and such that [a1, ..., an] is the primary key of r, where 1 ≤ n ≤ m. Then for every aj (1 ≤ j ≤ m) that is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and aj to generate Literal Triples:
Triple(S, P, Xj) ← r(X1, ..., Xm), nonNull(Xj), generateRowIRI("r", ["a1", ..., "an"], [X1, ..., Xn], S), generateColumnIRI("r", ["aj"], P)
For example, table Addresses in the Direct Mapping Example has columns ID, city and state, and it has column ID as its primary key. Then the following are the direct mapping rules to generate Literal Triples from Addresses:
Triple(S, P, X1) ← Addresses(X1, X2, X3), nonNull(X1), generateRowIRI("Addresses", ["ID"], [X1], S), generateColumnIRI("Addresses", ["ID"], P) Triple(S, P, X2) ← Addresses(X1, X2, X3), nonNull(X2), generateRowIRI("Addresses", ["ID"], [X1], S), generateColumnIRI("Addresses", ["city"], P) Triple(S, P, X3) ← Addresses(X1, X2, X3), nonNull(X3), generateRowIRI("Addresses", ["ID"], [X1], S), generateColumnIRI("Addresses", ["state"], P)
As a second example, consider again table Department from the example in Section Foreign keys referencing candidate keys, which has columns ID, name, city and manager, and assume that (name, city) is the multi-column primary key of this table (instead of ID). Then the following are the direct mapping rules to generate Literal Triples from Department:
Triple(S, P, X1) ← Department(X3, X1, X2, X4), nonNull(X1), generateRowIRI("Department", ["name", "city"], [X1, X2], S), generateColumnIRI("Department", ["name"], P) Triple(S, P, X2) ← Department(X3, X1, X2, X4), nonNull(X2), generateRowIRI("Department", ["name", "city"], [X1, X2], S), generateColumnIRI("Department", ["city"], P) Triple(S, P, X3) ← Department(X3, X1, X2, X4), nonNull(X3), generateRowIRI("Department", ["name", "city"], [X1, X2], S), generateColumnIRI("Department", ["ID"], P)
It is important to notice that no rule is generated from column manager, as this column is the only constituent of a foreign key that references a primary key: FOREIGN KEY(manager) REFERENCES People(ID).
Assume that r is a table with columns a1, ..., am and such that r does not have a primary key. Then for every aj (1 ≤ j ≤ m) that is not the only constituent of a foreign key of r or is the only constituent of a foreign key of r that references a candidate key, the direct mapping includes the following rule for r and aj to generate Literal Triples:
Triple(S, P, Xj) ← r(X1, ..., Xm), nonNull(Xj), card("r", [X1, ..., Xm], U), V ≤ U, generateRowBlankNode("r", [X1, ..., Xm], V, S), generateColumnIRI("r", ["aj"], P)
For example, table Tweets from Section Empty (non-existent) primary keys has columns tweeter, when and text, and it does not have a primary key. Then the following are the direct mapping rules to generate Literal Triples from Tweets:
Triple(S, P, X2) ← Tweets(X1, X2, X3), nonNull(X2), card("Tweets", [X1, X2, X3], U), V ≤ U, generateRowBlankNode("Tweets", [X1, X2, X3], V, S), generateColumnIRI("Tweets", ["when"], P) Triple(S, P, X3) ← Tweets(X1, X2, X3), nonNull(X3), card("Tweets", [X1, X2, X3], U), V ≤ U, generateRowBlankNode("Tweets", [X1, X2, X3], V, S), generateColumnIRI("Tweets", ["text"], P)It is important to notice that no rule is generated from column tweeter, as this column is the only constituent of a foreign key that references a primary key: FOREIGN KEY(tweeter) REFERENCES People(ID).
For each foreign key from a table r1 to a table r2, one of the following four cases is applied.
Assume that:
r1 is a table with columns a1, ..., ai and such that [a1, ..., aj] is the primary key of r1, where 1 ≤ j ≤ i
r2 is a table with columns c1, ..., ck and such that [c1, ..., cm] is the primary key of r2, where 1 ≤ m ≤ k
the foreign key indicates that the columns ap1, ..., apn of r1 reference the columns cq1, ..., cqn of r2, where (1) 1 ≤ p1, ..., pn ≤ i, (2) 1 ≤ q1, ..., qn ≤ k, and (3) n ≥ 1
Then the direct mapping includes the following rule for r1 and r2 to generate Reference Triples:
Triple(S, P, O) ← r1(X1, ..., Xi), generateRowIRI("r1", ["a1", ..., "aj"], [X1, ..., Xj], S), r2(U1, ..., Uk), generateRowIRI("r2", ["c1", ..., "cm"], [U1, ..., Um], O), nonNull(Xp1), ..., nonNull(Xpn), Xp1 = Uq1, ..., Xpn = Uqn, generateColumnIRI("r1", ["ap1", ..., "apn"], P)
For example, ... to-do ...
... to-do ...
Assume that:
r1 is a table with columns a1, ..., ai and such that [a1, ..., aj] is the primary key of r1, where 1 ≤ j ≤ i
r2 is a table with columns c1, ..., ck, and it does not have a primary key
the foreign key indicates that the columns ap1, ..., apn of r1 reference the columns cq1, ..., cqn of r2, where (1) 1 ≤ p1, ..., pn ≤ i, (2) 1 ≤ q1, ..., qn ≤ k, and (3) n ≥ 1
Then the direct mapping includes the following rule for r1 and r2 to generate Reference Triples:
Triple(S, P, O) ← r1(X1, ..., Xi), generateRowIRI("r1", ["a1", ..., "aj"], [X1, ..., Xj], S), r2(U1, ..., Uk), card("r2", [U1, ..., Uk], V1), V2 ≤ V1, generateRowBlankNode("r2", [U1, ..., Uk], V2, O), nonNull(Xp1), ..., nonNull(Xpn), Xp1 = Uq1, ..., Xpn = Uqn, generateColumnIRI("r1", ["ap1", ..., "apn"], P)
For example, ... to-do ...
... to-do ...
Assume that:
r1 is a table with columns a1, ..., ai, and it does not have a primary key
r2 is a table with columns c1, ..., ck and such that [c1, ..., cm] is the primary key of r2, where 1 ≤ m ≤ k
the foreign key indicates that the columns ap1, ..., apn of r1 reference the columns cq1, ..., cqn of r2, where (1) 1 ≤ p1, ..., pn ≤ i, (2) 1 ≤ q1, ..., qn ≤ k, and (3) n ≥ 1
Then the direct mapping includes the following rule for r1 and r2 to generate Reference Triples:
Triple(S, P, O) ← r1(X1, ..., Xi), card("r1", [X1, ..., Xi], V1), V2 ≤ V1, generateRowBlankNode("r1", [X1, ..., Xi], V2, S), r2(U1, ..., Uk), generateRowIRI("r2", ["c1", ..., "cm"], [U1, ..., Um], O), nonNull(Xp1), ..., nonNull(Xpn), Xp1 = Uq1, ..., Xpn = Uqn, generateColumnIRI("r1", ["ap1", ..., "apn"], P)
For example, ... to-do ...
... to-do ...
Assume that:
r1 is a table with columns a1, ..., ai, and it does not have a primary key
r2 is a table with columns c1, ..., ck, and it does not have a primary key
the foreign key indicates that the columns ap1, ..., apn of r1 reference the columns cq1, ..., cqn of r2, where (1) 1 ≤ p1, ..., pn ≤ i, (2) 1 ≤ q1, ..., qn ≤ k, and (3) n ≥ 1
Then the direct mapping includes the following rule for r1 and r2 to generate Reference Triples:
Triple(S, P, O) ← r1(X1, ..., Xi), card("r1", [X1, ..., Xi], V1), V2 ≤ V1, generateRowBlankNode("r1", [X1, ..., Xi], V2, S), r2(U1, ..., Uk), card("r2", [U1, ..., Uk], V3), V4 ≤ V3, generateRowBlankNode("r2", [U1, ..., Uk], V4, O), nonNull(Xp1), ..., nonNull(Xpn), Xp1 = Uq1, ..., Xpn = Uqn, generateColumnIRI("r1", ["ap1", ..., "apn"], P)
For example, ... to-do ...
... to-do ...
$Log: Overview.html,v $ Revision 1.3 2018/10/09 13:24:52 denis fix validation of xhtml documents Revision 1.2 2017/10/02 10:43:56 denis add fixup.js to old specs Revision 1.1 2011/03/23 22:17:52 bertails + snapshot of rdb-direct-mapping Revision 1.21 2011/03/23 20:53:12 bertails ~ cleaning before moving to TR space Revision 1.20 2011/03/17 23:16:34 eric - fragments on node IRIs Revision 1.19 2011/03/08 04:14:06 bertails ~ fix some typos Revision 1.18 2011/03/07 00:48:06 bertails + phi function mapping rows to RDF nodes Revision 1.17 2011/03/07 00:13:06 bertails + RDB accessor functions Revision 1.16 2011/03/06 21:56:29 bertails ~ migrating to cleaner denotational semantics Revision 1.15 2011/03/02 17:26:34 marenas Datalog rules in Section 4 were simplified Revision 1.14 2011/03/01 02:35:49 marenas Section 4 now includes all the Datalog rules that define the direct mapping Revision 1.13 2011/02/01 16:15:17 marenas Section 4 now includes an example for each type of Datalog rule used to define the direct mapping Revision 1.12 2011/01/27 01:16:42 marenas New version of Datalog rules to deal with repeated tuples in a table without a primary key Revision 1.11 2010/11/17 21:36:44 eric ~ validated HTML, CSS, links for publication Revision 1.10 2010/11/16 17:45:35 eric ~ xml well-formed Revision 1.9 2010/11/16 17:43:47 eric ~ 2010-11-16T17:34:38Z <ericP> mhausenblas: s/very simple direct mapping/direct mapping/ ~ re-title notation ~ addressed nunolopes's issue with rule 23 ~ text from #rdb2rdf 2010-11-16T17:40:18Z <juansequeda>... Revision 1.8 2010/11/16 17:30:18 eric ~ fixed Notation title Revision 1.7 2010/11/16 17:25:52 eric ~ re-oranized algebra section Revision 1.6 2010/11/16 17:22:39 eric ~ re-ordered authors Revision 1.5 2010/11/11 18:39:27 marenas rephrasing the definition of table tuples Revision 1.4 2010/11/11 17:58:25 marenas New Section 2.2: "Preliminaries: Generating IRIs" Examples are now grouped in Section 2.4: "Additional Examples and Corner Cases" Revision 1.3 2010/11/10 14:54:48 marenas Removing "(Editor)" from the list of authors Revision 1.2 2010/11/10 12:56:22 eric + formalism-model issue Revision 1.1 2010/11/10 02:51:03 eric moved from ../directMapping Revision 1.56 2010/11/10 02:47:08 eric ~ well-formedness error Revision 1.55 2010/11/10 02:45:37 eric ~ finished adopting the all-relative-IRI model in order to sync with the merged text from alt/ ~ adopted "direct" mapping per the resolution of the 2010-11-09 telecon ~ made Juan and Marcello editors instead of authors ~ fixed a couple typos : I believe this specification follows the intent of: RESOLUTION: http://www.w3.org/2001/sw/rdb2rdf/directMapping/ with Juan, Marcelo and Eric as editors based on Richard's proposal as of http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2010Nov/0052.html and try to work in J&M's IRI and Triple generations part; move hierarchical table and the M-M mappings into Ed note; datalog as a separate section; Eric perform merge with review/approval/consensus of Juan, Marcelo, & Eric Revision 1.54 2010/11/09 22:46:56 eric + hier-table-at-risk issue Revision 1.53 2010/11/09 22:41:02 eric ~ date Revision 1.52 2010/11/09 22:39:12 eric ~ s/stem/base/g + inclusion of collapsible sections from alt/ Revision 1.51 2010/11/09 15:39:46 eric ~ removed collapsible sections per request mid:AANLkTikvnrgXuu5fDAw+c2nUv5ENkmngPAJJ05c2gASk@mail.gmail.com Revision 1.50 2010/11/09 15:06:34 eric + exp sections Revision 1.49 2010/11/09 14:12:08 eric ~ addressed cygri's comments of 2010-11-09T06:46:12Z Revision 1.48 2010/11/09 04:11:35 eric ~ addressed cygri's comments of 2010-11-07T04:13Z + inclusion of some explanatory details from alt Revision 1.47 2010/11/04 12:42:21 eric ~ working on style for editorial choices Revision 1.46 2010/11/04 06:10:08 eric ~ hilit triples in query in Use of Direct Mapping Revision 1.45 2010/11/04 05:42:55 eric ~ incorporated Richard Cyganiak's comments targeted at alt Revision 1.44 2010/11/02 08:18:07 eric ~ updates per DanC's feedback Revision 1.43 2010/10/29 03:10:12 eric ~ s/relational terminology/SQL terminology/ Revision 1.42 2010/10/17 13:46:48 eric + SQL constraints Revision 1.41 2010/10/12 14:21:36 eric ~ renumbered Revision 1.40 2010/10/12 12:14:52 eric + SQL for example 1 Revision 1.39 2010/10/11 03:12:21 eric ~ prettied up mutual-hilights Revision 1.38 2010/10/10 22:09:55 eric + pfkexception Revision 1.37 2010/10/10 14:25:41 eric ~ re-worked front-loaded informative rules Revision 1.36 2010/10/10 11:59:01 eric ~ prettied-up pre@class=turtle ~ experimenting with new presentation of transformation rules ~ validated XSLT output Revision 1.35 2010/10/09 15:12:40 eric + crosslinks for hier-tabl Revision 1.34 2010/10/09 14:52:31 eric + crosslinks for ref-no-pk Revision 1.33 2010/10/09 13:45:17 eric ~ symmetric xrefs between tables and triples for emp-addr and multi-key Revision 1.32 2010/10/08 21:59:54 eric + hilights Revision 1.31 2010/09/29 19:53:37 eric ~ align with https://dvcs.w3.org/hg/stemGraph/ Revision 1.30 2010/09/29 15:13:18 eric ~ align with https://dvcs.w3.org/hg/stemGraph/rev/75cf39ef7d74 Revision 1.29 2010/09/29 03:34:55 eric + 2nd gen hierarchical example Revision 1.28 2010/09/28 03:10:53 eric validation Revision 1.27 2010/09/28 03:08:52 eric + hierarchical (untested) Revision 1.26 2010/09/27 21:49:18 eric ~ XML validation (per xsltproc) Revision 1.25 2010/09/27 21:46:42 eric ~ fixed reference table name Revision 1.24 2010/09/27 18:48:46 eric + noticed another key in ref-no-pk Revision 1.23 2010/09/27 18:13:03 eric + ref-no-pk Revision 1.22 2010/09/27 14:50:44 eric + nodemap + a rough pass on <scala>scala code</scala> Revision 1.21 2010/09/26 04:50:07 eric ~ fix load state for syntax display Revision 1.20 2010/09/25 18:40:39 eric + some tips Revision 1.19 2010/09/24 16:34:02 eric + some tips Revision 1.18 2010/09/24 16:00:53 eric + some tips Revision 1.17 2010/09/24 15:50:41 eric + buttons for different languages Revision 1.16 2010/09/07 12:14:44 eric ~ fixed pk invocation errors per mid:04C1B62C-42A5-424C-974B-6E894ED7B11A@cyganiak.de Revision 1.15 2010/08/30 18:37:19 eric + Empty (Non-existent) Primary Keys section Revision 1.14 2010/08/30 14:05:45 eric + fks Revision 1.13 2010/08/30 13:27:01 eric ~ content-free prettying-up Revision 1.12 2010/08/23 14:24:22 eric ~ simplified per Richard's suggestions Revision 1.11 2010/07/18 21:33:21 eric ~ proof-reading for an explanatory email Revision 1.10 2010/07/12 23:15:42 eric ~ typos revealed when geeking with Sandro Revision 1.9 2010/06/15 15:23:43 eric ~ validating Revision 1.8 2010/06/15 14:33:59 eric ~ s/Heading/Header/ per google fight Revision 1.7 2010/06/15 14:27:08 eric ~ s/∀/∣/g per cygri's comment Revision 1.6 2010/06/07 15:49:34 eric + Extending the Direct Mapping section Revision 1.5 2010/06/07 14:35:18 eric + finished mappings Revision 1.4 2010/06/03 23:02:58 eric ~ SNAPSHOT Revision 1.3 2010/06/03 13:06:00 eric + start on literal map Revision 1.2 2010/06/03 12:43:30 eric ~ made algebra all symboly Revision 1.1 2010/06/02 15:03:34 eric CREATED