Towards Quality Printing of Web Documents

A Position Paper for the W3C Quality Printing Workshop

Brad Chase, Bitstream Inc.

As active users of the web are finding out, authors are placing increasing importance on document appearance. Publishers know that a well-designed document is easy to understand. Also, as businesses continue to move onto the web, preserving corporate identities--through faithful rendition of corporate font styles--becomes more important.

Although the web may have been conceived as an electronic medium, users do print web documents. In many ways, the formatting of a web document may be more important for printing than for display: users will likely refer to the permanent printed version time and again. Nor will the reader be able to scroll the printed version to ensure that logically related parts are presented together. Fortunately, many of the advances coming to the display of web documents are equally useful for printing.

Over the past year, Bitstream has been working closely with industry leaders to resolve the problems of formatting, displaying, and printing web documents. This work, combined with Bitstream's experience in the font and printing industries, has led to a number of insights in the areas of fonts, style sheets, and media types. In the following paragraphs, we will share these insights.

Fonts

Fonts lend an important ambiance to textual material. Corporations demand the ability to preserve their look and feel in web documents. Corporate font portfolios include unusual, custom fonts, and it is unlikely that users will have the same fonts installed on their systems. Even in more traditional academic usage, it is quite likely that authors will want to use fonts (e.g. mathematical symbol fonts, or foreign language fonts) that viewers do not possess. What is needed is a reliable method for an author to ensure that the fonts required to render a document are available to (and usable by) the viewing system.

Requirements of Reliable Font Usage

Authors require the ability to use their fonts on their pages. They expect pages to render properly when displayed or printed, whether or not the viewing system has the fonts, and regardless of platform. Document viewers do not want to download bandwidth-intensive image files or complete fonts, nor do authors want to incur the expense or liability of distributing copyrighted fonts to end users.

In order to accomplish these objectives, mechanisms must be implemented to:

  1. Select fonts within an HTML document.
  2. Ensure that the font is available for both viewing and printing the document, while minimizing bandwidth and protecting intellectual property rights.
  3. Associate the font data with the document.

Font Selection

Existing standards provide no mechanism for an author to specify fonts, but the problem is well on the way to being solved. Cascading Style Sheets, level 1 (CSS1), a working draft of the W3C, proposes one mechanism for accomplishing this, while a non-standard HTML extension provides another.

Style sheets provide a very general mechanism for document formatting including the selection of fonts. As discussed below, style sheets also help solve the formatting problems associated with printing web documents. The CSS1 proposal has received wide support in the industry, and almost certainly points to the future of stylistic markup for web documents. Though the specification is only in the draft stage, a number of developers are already working to implement style sheets.

With Internet Explorer 2.0, Microsoft has implemented an extension, FACE=, to the Netscape <FONT> tag that allows selection of typeface. This extension allows the author to specify a preferred typeface by name, for example:

<FONT FACE="Zurich Blk BT">This text is in Zurich Black.</FONT>

Given the relative ease of implementation of this approach, and the fact that it is already supported by one of the more popular browsers, it seems likely that the FACE= extension will become a de facto standard for font selection.

Font Availability

As publishing moves from a paper-based "print then distribute" model to electronic "distribute then print (or display)" methods (of which the web is but one example), the issue of font availability becomes more important. While, as described above, there has been significant activity in the area of specifying fonts within web documents, only recently has any attention been given to ensuring that the fonts are at hand for rendering documents.

At first glance, the solution might seem as simple as embedding the original font files within the document files. There are, however, a number of drawbacks to this approach.

  • File size - Font files can be rather large - 70 K is not uncommon. Kanji font files are megabytes of data. Character sub-setting, which includes in the font resource only the glyphs actually required to render the document, can go a long way toward addressing this problem, but an efficient outline description format can reduce file size even more.
  • Format incompatibility - Like it or not, there are multiple formats for font data. And of the two most popular formats, many computers (and printers) are capable of understanding one or the other, but not both. Without some method of converting between formats, authors will be forced to forgo using fonts or exclude part of their potential audience.
  • Character set differences - Even when the authoring and the viewing systems use the same basic font format, there is still no guarantee that they will use matching character sets. The differences can be all-encompassing, as between Kanji and Cyrillic, or subtle, as between the Latin 2 and Latin 5 character sets. While Unicode fonts, such as Cyberbit, will be an important tool for overcoming this problem, they still cannot handle symbol fonts or reliably preserve the appearance of the original glyphs.
  • Extensibility to other platforms - Macintosh and Windows users have fairly advanced font services built into their systems, and most UNIX workstation users now have access to the X Window System. Other platforms (e.g. set-top boxes or DOS systems) have to implement their own font sub-systems. Many of these systems have limited memory and processing power. Font display software needs to be compact, efficient, and available in a highly portable format.
  • Property rights - The outline descriptions in the font file are the intellectual property of the creator. While displaying the resulting glyphs when rendering a document with the font is clearly legal (otherwise, the font is of little use), the legality of embedding a font within a document depends entirely on the license granted by the creator. Legal concerns can be eliminated by recording the shapes generated by the font on the authoring system, thus assuring that the font has been legally rendered.

Finally, because of the differing resolutions of displays and printers, the use of a hinted outline font format is essential - bitmaps will not render reliably. What is needed, then, is a font technology that can record the glyphs used by the author (and only the glyphs actually used) in a compact hinted outline format and make them available to viewing systems in their native format. For systems without native outline font capabilities, a compact, efficient, and highly portable rasterizer implementation is preferred.
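
The requirement to record "only the glyphs actually used" can be illustrated with a minimal sketch. It is not a description of how TrueDoc or any other product works; it only shows the bookkeeping step of deciding which characters each font must supply. The function name, the (font name, text) input format, and the second font name are assumptions made for illustration.

```python
from collections import defaultdict


def glyphs_needed(runs):
    """Map each font name to the sorted list of characters it must supply.

    `runs` is an iterable of (font_name, text) pairs, a hypothetical
    stand-in for the styled text runs of an already parsed document.
    """
    needed = defaultdict(set)
    for font_name, text in runs:
        # Simplification: whitespace is skipped because it has no outline.
        needed[font_name].update(ch for ch in text if not ch.isspace())
    return {font: sorted(chars) for font, chars in needed.items()}


runs = [
    ("Zurich Blk BT", "Quality Printing"),
    ("Zurich Blk BT", "Fonts"),
    ("Example Text Font", "Users do print web documents."),  # made-up name
]
subset = glyphs_needed(runs)
for font, chars in subset.items():
    print(font, len(chars), "".join(chars))
```

A font resource built from such a list carries a few dozen outlines rather than a full character set, which is where the size savings of sub-setting come from.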

To meet these requirements, Bitstream created TrueDoc®, a font portability and compression technology that has been shipping for over a year. For more information on TrueDoc and the proposals from other companies, please visit Bitstream's page about Font Technology for the Web.

Associating Font Data with Documents

Because it is not text, font data needs to be separate from the HTML source. Additionally, keeping it separate allows it to be transmitted only when wanted and needed. Thus, a mechanism for linking the font data to the document (HTML file) is needed.

If a style sheet is being used to specify fonts, then one might think of linking the font data to the style sheet. This is generally impractical because:

  • Style sheets can be attached to multiple documents. If the font data were attached to such a style sheet, character sub-setting could not be performed to reduce the size of the font data. This is a significant disadvantage, as in many instances sub-setting results in an 80% reduction in the size of the font data.
  • The CSS1 draft provides multiple methods for including style data that combine within the HTML document, and can only be resolved there. The actual font data needed won't be known until all the style references in the document are parsed.

To provide the greatest flexibility, font data must be referenced from the document itself. Within HTML 3.0, the already existing <LINK> tag appears to be appropriate. This tag creates a typed hyperlink between a document and a resource, and is the tag being proposed for linking a style sheet file to a document. An example of using it for fonts is:

<LINK REL=resource HREF="myfonts.pfr" TYPE="application/truedoc">

Bitstream proposes the use of the LINK tag for associating font data with HTML documents. We also propose a new relationship (REL) value of "resource", which indicates that the LINK references data that may be useful in rendering the document as the author intended.
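
As a rough illustration of how a client might discover such links, the sketch below scans a document for LINK tags whose REL is "resource", using Python's standard html.parser module. The class name, the function name, and the "house.css" file are hypothetical; the REL=resource convention is the one proposed here, not an established standard.

```python
from html.parser import HTMLParser


class ResourceLinkParser(HTMLParser):
    """Collect (href, type) pairs from <LINK REL=resource ...> tags."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if rel == "resource" and attributes.get("href"):
            self.resources.append((attributes["href"], attributes.get("type")))


def find_font_resources(html_text):
    """Return the resource links found in `html_text`, in document order."""
    parser = ResourceLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.resources


page = """<HTML><HEAD>
<LINK REL=stylesheet HREF="house.css">
<LINK REL=resource HREF="myfonts.pfr" TYPE="application/truedoc">
</HEAD><BODY>Body text</BODY></HTML>"""
print(find_font_resources(page))
```

Because the link lives in the document rather than in a shared style sheet, each document can point at a font resource subset to its own text.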

New Media Types

Given the model of a font resource linked into a document, the concept of "font" as a media type becomes very desirable. Though basic differentiation between data formats could be accomplished via the "application" MIME type, font data is of a fundamentally different nature than the existing media types. Bitstream therefore proposes the formation of a new MIME type, named "font", with initial sub-types of "truetype", "type1", and "pfr" (Portable Font Resource).

As with image data, this media type will help browsers determine which resources they are capable of processing. Non-graphical browsers can simply ignore font data (or request it only for printing), while other clients can negotiate the format best for them. Examples of the LINK tag proposed above, but using the new MIME type, are:

<LINK REL=resource HREF="myfont.ttf" TYPE="font/truetype">
<LINK REL=resource HREF="myfonts.pfr" TYPE="font/pfr">
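
One way a client could act on these types is sketched below: given the linked resources and its own ordered list of supported formats, it picks the first resource it can use. This is only a client-side selection sketch under assumed names (choose_font_resource, supported_types); the paper does not define a negotiation protocol, and the "font" MIME type is the proposal made here.

```python
def choose_font_resource(links, supported_types):
    """Return the href of the best usable font resource, or None.

    `links` is a list of (href, mime_type) pairs in document order.
    `supported_types` lists the client's formats, most preferred first.
    """
    for wanted in supported_types:
        for href, mime_type in links:
            if mime_type and mime_type.lower() == wanted:
                return href
    return None


links = [("myfont.ttf", "font/truetype"), ("myfonts.pfr", "font/pfr")]

# A client that prefers PFR but can also handle TrueType.
print(choose_font_resource(links, ["font/pfr", "font/truetype"]))
# A client that only understands Type 1 finds nothing it can use.
print(choose_font_resource(links, ["font/type1"]))
# A non-graphical browser supports no font formats and skips the download.
print(choose_font_resource(links, []))
```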

Style Sheets

As described above, style sheets are a mechanism for associating fonts with web documents, but they are far more useful than that. Style sheets provide authors with a great deal of control over the appearance of their documents. Unfortunately, as the author of the CSS1 draft himself admits, CSS1 as currently defined needs further development for printing.

While printers can use nearly all of the defined style sheet properties, certain characteristics, for example margins, should generally be treated differently for printed media than for video display. In order to handle these types of device-specific properties, Bitstream proposes the addition of printer-specific properties to control the following:

  • Recommended paper size, orientation, and margins
  • Page headers and footers
  • Widow and orphan control
  • Location of page breaks - both preventing a block of text from being split across a page break and forcing a page break to occur at a specific location
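
To make the widow and orphan item concrete, here is a minimal sketch of the decision a formatter faces when a paragraph reaches the bottom of a page. The function name, its parameters, and the default limits of two lines are assumptions for illustration; they are not part of CSS1 or of a specific Bitstream property proposal.

```python
def lines_before_break(paragraph_lines, lines_left, orphans=2, widows=2):
    """Return how many lines of a paragraph to place on the current page.

    A result of 0 means the whole paragraph should start on the next page.
    `orphans` is the minimum number of lines allowed at the bottom of a
    page; `widows` is the minimum number allowed at the top of the next.
    """
    if paragraph_lines <= lines_left:
        return paragraph_lines  # the paragraph fits; no break is needed
    # Hold back at least `widows` lines to carry over to the next page.
    first_part = min(lines_left, paragraph_lines - widows)
    # Refuse to strand fewer than `orphans` lines at the bottom of the page.
    if first_part < orphans:
        return 0
    return first_part


# Ten-line paragraph, nine lines left: keep eight so two carry over.
print(lines_before_break(10, 9))
# Ten-line paragraph, one line left: move the whole paragraph.
print(lines_before_break(10, 1))
```

A style sheet property would let the author set these limits per document instead of leaving them to each browser's defaults.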

Conclusion

The tools for bringing high-quality printing to the web are within our grasp. Style sheets will address formatting concerns, while font portability solves character set problems and provides authors with the expressiveness they desire. Bitstream is proud to be part of leading-edge web publishing solutions such as FutureTense Texture, Hexmac's Hexweb HTML authoring product for newspaper and magazine publishers, and Spyglass' WTK (Web Tool Kit), which provides browser support for style sheets and TrueDoc portable font resources. We will continue to partner with standards groups and industry leaders to make these capabilities a reality for all web users.


This document has been prepared using font specification, and can be best viewed using a browser that supports the <FONT FACE=...> tag.