Pretty Printing - Design Issues

"Pretty Printing" is the practice of formatting code in a computer language so that it is easy to read by humans, and aesthetically pleasing. There is a Wikipedia article which goes over it well in general. In the case of data in the Solid ecosystem there is value not only in legibility but also in making small changes to data evident as small changes to the text, and in tests as a simple more or less canonical form to easily compare expected and actual results. And specifically for RDF data in N3 or Turtle, those languages provide powerful features for representing a graph data in a form close to English language. Legibility and aesthetic properties affect not only reading code but also people writing it. If are expressing by hand instance data, from random facts through configuration files to ontologies and rules, being able to write it in a clear, English like way also is more pleasant, and easier to check for mistakes. When a group is collaborating the same same data document, such as on a whiteboard or in a chat, then legibility and writability are both important.

Introduction

Pretty Printing is the practice of formatting code in a computer language so that it is easy for humans to read and aesthetically pleasing. There is a Wikipedia article which covers it well in general.

Specifically, in the Solid ecosystem of read-write linked data, legibility of the data in a Solid pod is important. It is important because developers who are playing with existing programs need to be able to see how they work, and then, when they are making their own applications, to understand what is happening and what has gone wrong. The "view source effect", which allowed the web to spread through people copying and adapting each other's web pages, is important for linked data in Solid pods too.

It has been the long communal experience of the internet protocol design community that simple text-based protocols and formats (like FTP, SMTP, HTTP, HTML, XML) have been key to the adoption and understanding of new systems. So while you can ship a binary format and give everyone tools for translating it into something legible, it is actually better to ship legible formats directly across the net.

It turns out there are a few other benefits of legible serialization in the system.

Storage space, transmission time and bandwidth

When RDF triples are encoded using the syntax tools Turtle provides, they can take significantly less space. This may or may not be important, as the text may get compressed at another stage.
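As a rough illustration of the space saving, here is a small Python sketch which writes the same three triples twice, once with every URI spelled out in full and once using Turtle's prefixes, semicolons and commas, and then compares the byte counts. The people and URIs are made up for the example, and how much you save in practice depends entirely on the data.

```python
# Hypothetical example data: the same three triples written two ways.
N_TRIPLES = """\
<https://alice.example/profile#me> <http://xmlns.com/foaf/0.1/name> "Alice" .
<https://alice.example/profile#me> <http://xmlns.com/foaf/0.1/knows> <https://bob.example/profile#me> .
<https://alice.example/profile#me> <http://xmlns.com/foaf/0.1/knows> <https://carol.example/profile#me> .
"""

TURTLE = """\
@prefix : <https://alice.example/profile#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:me foaf:name "Alice" ;
    foaf:knows <https://bob.example/profile#me>,
        <https://carol.example/profile#me> .
"""


def size_in_bytes(text: str) -> int:
    """Size of the text as it would be stored or sent, encoded as UTF-8."""
    return len(text.encode("utf-8"))


long_size = size_in_bytes(N_TRIPLES)
short_size = size_in_bytes(TURTLE)
print(f"full URIs: {long_size} bytes, Turtle shorthand: {short_size} bytes")
```

The prefix declarations are a fixed cost, so the saving grows as more triples share the same namespaces and subjects.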

Consistency and continuity

When data is automatically serialized, it is useful for the same data to consistently give the same text.

It also should be the case that small changes in the data produce small changes in the text.
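One simple way to get both properties is to sort the statements before writing them out. The Python sketch below assumes, for brevity, that each term has already been written in Turtle syntax, and it emits one statement per line; the serialize function and the sample data are hypothetical, not taken from any particular library.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # terms are assumed to be in Turtle syntax already


def serialize(triples: Iterable[Triple]) -> str:
    """Emit one statement per line, in sorted order, ignoring duplicates."""
    lines = [f"{s} {p} {o} ." for s, p, o in sorted(set(triples))]
    return "\n".join(lines) + "\n"


data = [
    (":alice", "foaf:name", '"Alice"'),
    (":alice", "foaf:knows", ":bob"),
    (":bob", "foaf:name", '"Bob"'),
]

# The same data gives the same text, whatever order the store hands it over in.
same_text = serialize(data) == serialize(reversed(data))

# Adding one triple changes exactly one line of the text.
before = serialize(data).splitlines()
after = serialize(data + [(":alice", "foaf:knows", ":carol")]).splitlines()
changed_lines = set(before) ^ set(after)
print(same_text, changed_lines)
```

A serializer which instead writes triples in whatever order a hash table happens to yield them can reshuffle the whole file after a one-triple change.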

In tests

Tests of a system involve performing a function and then comparing the actual output with the expected output. When the output is RDF data, it is good to compare the serialized, pretty-printed versions rather than the raw triples. This allows one to understand easily what has happened when they don't match.
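A test helper along these lines might look like the following Python sketch, which uses the standard difflib module to report a mismatch between two serializations as a unified diff. The function name explain_mismatch and the sample statements are invented for the illustration, and the approach only works well if the serializer gives the same text for the same data.

```python
import difflib


def explain_mismatch(expected: str, actual: str) -> str:
    """Return a unified diff of two serializations, or '' if they match."""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    return "".join(diff)


# Hypothetical expected and actual output of some function under test.
expected = ':alice foaf:name "Alice" .\n:alice foaf:knows :bob .\n'
actual = ':alice foaf:name "Alice" .\n:alice foaf:knows :carol .\n'

report = explain_mismatch(expected, actual)
print(report)
```

The report marks the statement about :bob as missing and the one about :carol as unexpected, which is usually enough to see what went wrong without loading either graph into a tool.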

With Source Code Control

When the serialized data is checked into a source code control system, such as Mercurial (hg), git, SVN or CVS, then it is very important for small changes in the underlying data to map to small changes in the serialized text. This makes the record of changes smaller, and makes it possible for someone investigating a problem which happened at a certain point to compare the two versions of the code from before and after.

Legibility when writing

Computer languages which have features for being clear and legible also allow people writing code by hand to be clearer in what they are writing. It is easier to write code, such as configuration files or business rules, if you are writing in a legible, pretty language.

Collaborative editing of code

Even more important is the clarity and simplicity of expression when more than one person is brainstorming over something in RDF, such as on a whiteboard or in a chat. Being able to share and co-design a bit of data is key to the remote collaborative design process. (This is what N3 was originally made for.)

Legibility tools in Turtle

Data in a pod is RDF, typically in Turtle or JSON-LD. It may also be in a superset of Turtle called N3. Turtle, and the more powerful N3, have a set of syntax forms which make things more legible. Use them when you serialize a document.

Turtle/N3 is a free-format language, like C and JavaScript (but not Python), so the author can use whitespace such as spaces and newlines freely to make the information as legible as possible.
Full URIs can be replaced with short names with a namespace prefix.
Repeated predicates about the same subject can be separated using the semicolon shorthand.
Repeated objects for the same subject and predicate can just use the comma shorthand.
Blank nodes can be represented by a [ square bracket ] clause.
Collections (lists) are written with a ( ) parenthesis form.
Numbers do not need an explicit datatype; the syntax allows them to be written raw.
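To show how a serializer can put some of the forms listed above to work, here is a minimal Python sketch which groups triples by subject and predicate and then writes prefixed names, semicolons, commas and raw numbers. It is an illustration under simplifying assumptions, not a real Turtle writer: the namespaces are made up, every string is treated as a URI, only integers are handled as literals, and it does not check that the shortened local name is legal in Turtle.

```python
from collections import defaultdict

PREFIXES = {  # hypothetical namespaces for the example
    "": "https://alice.example/profile#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def shorten(uri: str) -> str:
    """Replace a full URI with a prefixed name when a namespace matches."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"


def term(value) -> str:
    """Write integers raw; treat every string as a URI (a simplification)."""
    if isinstance(value, int):
        return str(value)
    return shorten(value)


def serialize(triples) -> str:
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)

    out = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in sorted(PREFIXES.items())]
    out.append("")
    for s in sorted(grouped):
        predicate_lines = []
        for p in sorted(grouped[s]):
            # Comma shorthand: several objects for one subject and predicate.
            objects = ", ".join(sorted(term(o) for o in grouped[s][p]))
            predicate_lines.append(f"{shorten(p)} {objects}")
        # Semicolon shorthand: several predicates for one subject.
        out.append(f"{shorten(s)} " + " ;\n    ".join(predicate_lines) + " .")
    return "\n".join(out) + "\n"


ME = "https://alice.example/profile#me"
FOAF = "http://xmlns.com/foaf/0.1/"
triples = [
    (ME, FOAF + "knows", "https://alice.example/profile#bob"),
    (ME, FOAF + "age", 42),
    (ME, FOAF + "knows", "https://alice.example/profile#carol"),
]
text = serialize(triples)
print(text)
```

The three triples come out as a single statement about :me, with foaf:age 42 written raw and the two foaf:knows objects joined by a comma. Blank nodes and collections are left out to keep the sketch short.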

The following are only available in the full N3 language.

Raw dates and date-times do not need an explicit datatype.
A reverse arc can be given with an "is ... of" form so all arcs from a subject can be listed together. (This also means that a serializer can serialize any acyclic RDF graph using [] syntax for the blank nodes. In general the [] form is more legible than the form with generated _:aaa bnode identifiers.)
A lot of the clutter in a typical Turtle file is the leading ":" on symbols in the default namespace. The @keywords directive allows you to declare the specific words in the file which will be used as keywords (like "a", "is", "of"). After that, any other names with no leading ":" will be interpreted as qnames in the default namespace.
Nested graphs are represented with a {} curly bracket syntax. This takes the underlying data out of RDF into the N3 data model.

Specific formatting parameters

The specific values of these parameters are famous for being very much a question of personal choice. Different people's brains work differently when it comes to whitespace, indentation, line length, and the breaking of long comment strings in RDF. The "spaces or tabs" question is infamous for bike-shedding, and indent and whitespace amounts are a question of taste. But if you are a developer and users may in some cases end up seeing the formatted text you generate, then you have to guess their preferences, not just insert your own. I won't give you my favorite settings here, to keep the topic on the principles.
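One way to avoid baking in your own taste is to make these choices parameters of the serializer rather than constants in its code. The Python sketch below shows the idea. The FormatOptions class, its field names and the format_statement function are all hypothetical, and the default values are arbitrary placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FormatOptions:
    """Hypothetical formatting knobs; the defaults are placeholders only."""
    indent_width: int = 4
    use_tabs: bool = False

    def indent(self, depth: int = 1) -> str:
        unit = "\t" if self.use_tabs else " " * self.indent_width
        return unit * depth


def format_statement(subject: str, predicate_objects: List[str],
                     options: FormatOptions) -> str:
    """Write one subject with its predicate-object pairs, one pair per line."""
    separator = " ;\n" + options.indent()
    return f"{subject} " + separator.join(predicate_objects) + " ."


pairs = ['foaf:name "Alice"', "foaf:knows :bob"]
with_spaces = format_statement(":alice", pairs, FormatOptions(indent_width=2))
with_tabs = format_statement(":alice", pairs, FormatOptions(use_tabs=True))
print(with_spaces)
print(with_tabs)
```

The same data is written both ways without touching the serializer, so the choice can be left to whoever will actually read the text.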

References

Prettyprint, Wikipedia
Feross Aboukhadijeh, JavaScript Standard Style


Tim BL