Volume VII Number 1, August 2000

Digital Talking Book Standards Developed By NLS And Partners Under NISO Auspices

John Cookson
Michael Moodie
Lloyd Rasmussen
National Library Service for the Blind and Physically Handicapped (NLS)
The Library of Congress

ABSTRACT

The functionality, compatibility, and longevity planned for future digital talking books require clear, exact definitions of component format and content. NLS will achieve this by working with a diverse team of experts to establish an applicable standard. This article outlines the plan, describes progress, and indicates what further work is necessary to complete the standard.

OVERVIEW

Under the auspices of the National Information Standards Organization, a standards developer accredited by the American National Standards Institute, NLS is leading a committee of experts in the development of a digital talking book (DTB) standard. For a more detailed discussion of why and how the standards process was begun, please see our CSUN 97 paper, "Talking Books: Toward a Digital Model."

The committee has taken a very general approach to a DTB standard to accommodate a wide variety of books, users, producers, and playback devices. There is interest in compatibility with commercial electronic books as well as ease of interaction on an international basis. To this end there is common membership with groups of similar interest, specifically, the DAISY Consortium, the World Wide Web Consortium, and the Open eBook Forum. This perspective allows us to utilize rather than duplicate prior work and enhances the standard's prospects for longevity and support. The committee's membership may be found on the NISO Standards Committee site. A DTB is envisioned to be, in its fullest implementation, a group of digitally encoded files containing an audio portion recorded in human speech; the full text of the work in electronic form, marked with the tags of a descriptive markup language; and a linking file that synchronizes the text and audio portions.

As this document illustrates, such a structure will allow the DTB user a broad range of capabilities not possible with cassette talking books. The standard uses concepts and components found in other open web-based standards, specifically, Open eBook, Synchronized Multimedia Integration Language (SMIL), and Extensible Markup Language (XML). For more details on these please see:

The standard covers three different classes of players from very simple to very sophisticated and six different types of books, again, from simple to complex. All text files use the ASCII character set. For an overview of the larger NLS digital audio development project please see the companion article in this issue entitled "National Library Service for the Blind and Physically Handicapped: Digital Plans and Progress."

COMPONENTS

The provisions of the NISO DTB standard are expressed in two different kinds of documents: normative and informative. Normative documents define the characteristics of a product required for standard compliance. Informative documents provide general information about the standard and recommend ways to achieve compliance.

The informative documents presently consist of the following:

1. Prioritized List of Features for Digital Talking Book Playback Devices

As the name indicates, this document describes characteristics of DTB playback systems. It allows for three different types of players: very simple hand-helds, mainly for linear leisure reading; more complex portables, mainly for students and professionals; and user-supplied computer-based players capable of supporting the most sophisticated features. Features are prioritized as essential, highly desirable, or useful for each type of player, and include functions such as variable-speed reading, book-marking, and the ability to immediately access items listed in a table of contents. The full feature set may be found in the Digital Talking Book Standards Committee's document titled "Playback Device Guidelines."

2. Document Navigation Features List

Although the prioritized feature set mentioned above contains general navigation requirements, the topic is complex enough to motivate a separate explanatory document. The Navigation Features List describes mechanisms for immediate random access to selected areas of a book and other capabilities such as searching, highlighting, excerpting, and skipping user-selected elements.

3. Structuring Guidelines for Digital Talking Books

The Structuring Guidelines document tells DTB producers how to put XML tags into text so that the relationships between components are properly represented. It suggests where tags from the allowable set should be inserted into a document and indicates the proper syntax. For example, <p> marks the beginning of a paragraph and </p> marks the end. Consult the document titled "DAISY Structure Guidelines" for more information.

4. Open eBook Standard

We refer to this standard because it represents an industry effort to achieve compatibility among various playback devices and content from various producers. We recognize that for the widest and most enduring support it is advisable to converge on standards that dominate the consumer market. Moreover, eBook participants have a keen and authoritative interest in resolving difficult issues such as Digital Rights Management (DRM) methods and metadata requirements. One section of the eBook standard that is of particular interest is the package file. This file lists the components of a given product and indicates various relationships among them. For example, the spine area of the package file lists product files in a logical linear reading order. It appears that with minor modifications, the package file specification, which is embodied in an XML document type definition (DTD), would be suitable for use in the DTB standard. The modifications would expand the allowed file types to include various audio formats.

The normative documents that comprise the standard presently consist of the following:

1. Digital Talking Book (DTB) Document Type Definition (DTD)

This technical paper defines what XML tags are to be used to indicate the structure of a particular document and the proper syntax for their use. As with all DTDs, it is typically used by parser software to verify that target documents are "well formed" and "valid," i.e., are properly marked up. A DTD is typically read only by a computer. The application of this DTD is the subject of the Structuring Guidelines. For a view of the DTD and its history, please see the document titled "Document Type Definition (DTD) for Digital Talking Books."
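To make the distinction concrete, the following is a minimal sketch of the first of these checks, well-formedness, using the parser in Python's standard library. The function name is_well_formed and the two sample fragments are ours, chosen for illustration; they are not part of the standard. The standard-library parser does not validate against a DTD, so checking validity would require a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML.

    This checks syntax only (matched tags, quoted attributes);
    it does NOT check validity against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Illustrative fragments: the second one never closes its <p> element.
good = '<level1 id="lvl1_3"><h1 id="h1_3">Foreword</h1><p>Text</p></level1>'
bad = '<level1 id="lvl1_3"><h1 id="h1_3">Foreword</h1><p>Text</level1>'

print(is_well_formed(good), is_well_formed(bad))
```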

2. DTB Bookmark DTD

This technical paper defines the structure, syntax, and content of a bookmark file. Bookmark files are portable files to be composed and read by various playback devices. They are designed to allow a user to set a large number of bookmarks or highlight many sections and attach text or audio labels to them. To ensure compatibility, it is necessary to define a standard format. This DTD is nearing completion. In service it would be directly used only by player software, not by the patron.

3. DTB Navigation Control Center for XML Applications (NCX) DTD

This technical paper defines the structure, syntax, and content of a file called the Navigation Control Center that is used by a player to provide direct access to various areas of the book being read. The NCX is typically built by software and accessed directly only by player software. This DTD is nearing completion.

4. DTB Package File DTD

This technical paper defines the structure, syntax, and content of a file called the package file. The concept and most of the details are borrowed from the Open eBook Forum. The package file would be built by software with producer intervention and accessed directly only by playback software. This DTD is posted on the Open eBook site and suggested modifications will be the subject of discussions at the eBook Forum.

5. DTB File Specification

This technical paper defines the types of ASCII (text) and binary (audio and image) files that are allowed in a DTB. Most of the text files are of the XML type and consist of the following:

* Book text with tags added to indicate its structure, e.g. RevStd.XML

* Package file to identify the DTB, list contents, include metadata, etc., e.g. RevStd.OPF

* SMIL file for fast access and synchronization of text with audio, e.g. RevStd.SMIL

* NCX file to enable fast access to book components, e.g. RevStd.NCX

* Bookmark file containing points of interest marked by the user, e.g. RevStd.BKM

There is one other type of text file besides XML:

* CSS files that tell the player how to present the material to the user, e.g. RevStd.CSS

Binary files can be divided into two classes, audio and image. Audio files are as follows:

* PCM files represent audio with numeric samples like music CDs, e.g. RevStdFwd.WAV

* ADPCM files are similar to PCM but more compact, e.g. RevStdIntro.WAV

* MPEG files are very compact but have adequate fidelity, e.g. RevStdHist.MP3

Binary image files in formats such as JPEG are also allowed but the set of allowed formats has not yet been determined.

SAMPLE BOOK

On the basis of their relative audio and text content, the specification identifies six different classes of books listed below. The producer will select the class for a particular book on the basis of production cost, e.g. #6 (low) vs #4 (high); the book's topic and structure, e.g. novel vs cookbook; and patron need, e.g. textbook for classroom use vs leisure reading.

1. Audio with only the title in text; access is similar to audio cassette

2. Audio with NCX only; allows direct access to segments, e.g. parts, chapters, sections, and subsections, via a text table of contents

3. Audio with NCX and partial text; access is as in #2; partial text example might be an index

4. Audio with NCX and full text; access is as in #2, plus direct audio access at the paragraph or sentence level

5. Full Text with NCX and minimal audio (e.g., dictionary); access as in #4

6. Text only; synthesized audio only; access similar to a word processor is possible

To help clarify concepts found in this paper, we present an example of a DTB in terms of five components. Because it is the most comprehensive type, we choose an example from class #4, audio with full text.

1. Package File

The package file contains general information about the book such as the title, subject, ISBN, and brief description. It contains an inventory of all of the files in the product, a linear order of presentation, and other information to help the player quickly find selected points in the book. Excerpts from the package file follow:

... <dc:Title>Revised Standards and Guidelines of Service for the Library of Congress Network of Libraries for the Blind and Physically Handicapped 1995</dc:Title>

<dc:Subject> library information networks</dc:Subject> ...

<manifest>

<item id="text" href="RevStd.XML" media-type="text/xml"/>

<item id="text_style" href="dtbbase.css" media-type="text/css"/>

<item id="NCX" href="RevStd.NCX" media-type="text/xml"/>

<item id="NCX_style" href="ncx12.css" media-type="text/css"/>

<item id="NCX_dtd" href="ncx13.dtd" media-type="text/xml"/>

<item id="SMIL" href="RevStd.SMIL" media-type="text/sml"/>

<item id="forward" href="RevStdFwd.MP3" media-type="audio/mp3"/>

<item id="standards" href="RevStdIntro.MP3" media-type="audio/mp3"/>

<item id="audio_title" href="RevStdTitle.MP3" media-type="audio/mp3"/>

<item id="fig_01" href="fig1.gif" media-type="image/gif"/>

</manifest>

<spine>

<itemref idref="SMIL"/>

</spine> ...
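As an illustration of how a player might use the manifest and spine together, the sketch below resolves each spine entry to the file it names. It is a simplified reading of the excerpt, not the committee's software: the enclosing <package> element is assumed, only a few manifest items are reproduced, and the function name reading_order is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the excerpt; the <package> wrapper is an assumption.
PACKAGE = """
<package>
  <manifest>
    <item id="text" href="RevStd.XML" media-type="text/xml"/>
    <item id="NCX" href="RevStd.NCX" media-type="text/xml"/>
    <item id="SMIL" href="RevStd.SMIL" media-type="text/sml"/>
    <item id="forward" href="RevStdFwd.MP3" media-type="audio/mp3"/>
  </manifest>
  <spine>
    <itemref idref="SMIL"/>
  </spine>
</package>
"""


def reading_order(package_xml):
    """Return the hrefs of the spine entries, in linear reading order."""
    root = ET.fromstring(package_xml)
    # Map each manifest id to the file it refers to.
    hrefs = {item.get("id"): item.get("href") for item in root.iter("item")}
    # Each spine itemref points back at a manifest id.
    return [hrefs[ref.get("idref")] for ref in root.iter("itemref")]


print(reading_order(PACKAGE))
```

In this sample the spine lists only the SMIL file, so the player would begin presentation there and reach the text and audio through it.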

2. XML Text

The XML text is a file representing the source document as tagged to indicate structure and component relationships. The XML tags tell a playback device exactly what each block of text is and allow the device to present material appropriately. The DTB for a book such as a novel recorded by NLS would typically have no text component, except perhaps a table of contents. The DTB for a textbook recorded by Recording for the Blind and Dyslexic (RFB&D), or a reference work produced by NLS, however, might have the full text as a component. Excerpts taken from the text file of the sample book presented above follow:

... <title> Revised Standards and Guidelines of Service for the Library of Congress Network of

Libraries for the Blind and Physically Handicapped 1995</title> ...

<ul><li>Foreword 1

<ul>

<li>History 1</li>

<li>Development of Standards 2</li>

<li>Structure of the National Network 4</li>

<li>Acknowledgments 6</li>

</ul>

</li>

<li>Introduction 7 ...

</li> ...

<li>Standards 16 ...

</li>

</ul>

<level1 class="forward" id="lvl1_3">

<pagenum id="p1" page="normal">1</pagenum>

<h1 id="h1_3">Foreword</h1>

<level2 class="section" id="lvl2_1">

<h2 id="h2_1">History</h2> <p id="para_8"> Today's network is a confederation of 56 regional libraries, 86 subregional libraries, and 2 multistate centers serving eligible readers and is the result of more than one hundred years development and experience. Before the turn of the century, library service for blind people was initiated by several libraries throughout the United States. The Boston Public Library established a department for the blind in 1868 after receiving eight embossed volumes. Between 1882 and 1903 public libraries in Philadelphia, Chicago, New York City, and Detroit established circulating collections of embossed books for the blind. New York was the first state to create a department for the blind in a state library.</p> ...

3. NCX File

The XML text file excerpted above has tags within it at all of the points that a reader can access directly. In theory, a player could locate them on demand from the XML file; however, this would pose a significant computational burden on the player and might make response time uncomfortably long. To make entry points more readily available, the standard includes a Navigation Control Center (NCX) that lists key entry points and makes access faster. In the following excerpt, each <audio> tag indicates the audio clip for the heading of an entry point, e.g. the narration saying "Foreword"; the <text> tag shows what would appear on the screen of a PC-based player (useful for readers with visual impairment or reading disabilities); and the <content> tag holds the link to the synchronized text and audio of the given section of the document.

<?xml:stylesheet type="text/css" href="ncx12.css"?>

<!DOCTYPE ncx SYSTEM "ncx17.dtd">

<ncx>

<head>

<title>Revised Standards and Guidelines of Service for the Library of Congress

Network of Libraries for the Blind and Physically Handicapped 1995</title>

</head>

<doctitle>

<text>Revised Standards and Guidelines of Service for the Library of Congress

Network of Libraries for the Blind and Physically Handicapped 1995</text>

<audio src="RevStd.MP3"/>

</doctitle>

<NavStruct id="main" class="main">

<navLevel levelNumber="1">

<navObject class="level1" id="lvl1_3">

<text>Foreword</text>

<audio src="RevStdFwd.MP3" clipBegin="00:01.5" clipEnd="00:02.0"/>

<content src="RevStd.SMIL#h1_3"/>

</navObject>

<navLevel levelNumber="2">

<navObject class="level2" id="2_1">

<text>History</text>

<audio src="RevStdFwd.MP3" clipBegin="00:03.4" clipEnd="00:03.9"/>

<content src="RevStd.SMIL#h2_1"/>

</navObject> ...
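The following sketch suggests how a player might turn an NCX like the one excerpted above into a lookup table from heading text to entry point. The closing tags added to complete the excerpt, and the function name build_nav_index, are our assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# The excerpt above, closed off so that it parses; head and doctitle omitted.
NCX = """
<ncx>
  <NavStruct id="main" class="main">
    <navLevel levelNumber="1">
      <navObject class="level1" id="lvl1_3">
        <text>Foreword</text>
        <audio src="RevStdFwd.MP3" clipBegin="00:01.5" clipEnd="00:02.0"/>
        <content src="RevStd.SMIL#h1_3"/>
      </navObject>
      <navLevel levelNumber="2">
        <navObject class="level2" id="2_1">
          <text>History</text>
          <audio src="RevStdFwd.MP3" clipBegin="00:03.4" clipEnd="00:03.9"/>
          <content src="RevStd.SMIL#h2_1"/>
        </navObject>
      </navLevel>
    </navLevel>
  </NavStruct>
</ncx>
"""


def build_nav_index(ncx_xml):
    """Map each heading label to the SMIL location given by its <content> tag."""
    root = ET.fromstring(ncx_xml)
    index = {}
    for nav_object in root.iter("navObject"):
        label = nav_object.findtext("text")
        index[label] = nav_object.find("content").get("src")
    return index


nav_index = build_nav_index(NCX)
print(nav_index["History"])
```

Building the table once, when the book is opened, is what spares the player from scanning the full XML text on every navigation command.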

4. Audio

A typical DTB will have as its largest component a set of digital audio files that contain a narrated rendition of the book. The specification will allow a variety of audio formats such as PCM, ADPCM, and MPEG Audio Layers 2 and 3. The final set of supported formats has not yet been specified. Because audio files are usually quite large, we do not provide an example within this document.

5. SMIL

Except for a text-only DTB, every DTB will have one or more SMIL files. This entity relates text to the corresponding audio in terms of specific timing. It instructs the player, for example, to display a particular line of text while playing audio from a particular point in a designated audio file. Envision a user issuing a command to a player such as "go to Chapter 5". The player reacts by finding information in the NCX that points to an entry in a SMIL file that in turn will cause the player to begin displaying text and playing audio at Chapter 5. Response is rapid because all of the necessary information is at hand; the player need not scan the text file for "Chapter 5" and try to figure out where and from which file to begin playing audio. SMIL files are generated by computer software from the XML text file and timing information generated by authoring software or audio analysis software. Excerpts from a SMIL file follow:

<smil>

<head>

<meta name="title" content="Revised Standards and Guidelines of Service for

the Library of Congress Network of Libraries for the Blind and Physically Handicapped

1995"/>

<layout>...

<body>

<seq>

<par id="p1" uGroup="pagenum">

<text region="text" src="RevStd.XML#p1"/>

<audio src="RevStdFwd.MP3" clipBegin="0s" clipEnd="0.9s"/>

</par>...

<par id="p2" uGroup="pagenum">

<text region="text" src="RevStd.XML#p2"/>

<audio src="RevStdFwd.MP3" clipBegin="00:53.9" clipEnd="00:54.6"/>

</par>...
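To show how a player might act on such entries, the sketch below looks up one <par> by its id and returns the text target, the audio file, and the clip boundaries in seconds. It is a simplified illustration: the excerpt has been closed off so that it parses, only the two time notations that appear in the excerpt are handled, and the names to_seconds and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET

# The excerpt above, with head and layout omitted and closing tags added.
SMIL = """
<smil>
  <body>
    <seq>
      <par id="p1" uGroup="pagenum">
        <text region="text" src="RevStd.XML#p1"/>
        <audio src="RevStdFwd.MP3" clipBegin="0s" clipEnd="0.9s"/>
      </par>
      <par id="p2" uGroup="pagenum">
        <text region="text" src="RevStd.XML#p2"/>
        <audio src="RevStdFwd.MP3" clipBegin="00:53.9" clipEnd="00:54.6"/>
      </par>
    </seq>
  </body>
</smil>
"""


def to_seconds(clock):
    """Convert '0.9s' or 'mm:ss.f' (the two forms in the excerpt) to seconds."""
    if clock.endswith("s"):
        return float(clock[:-1])
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def resolve(smil_xml, par_id):
    """Return (text src, audio src, begin, end) for the <par> with par_id."""
    for par in ET.fromstring(smil_xml).iter("par"):
        if par.get("id") == par_id:
            audio = par.find("audio")
            return (
                par.find("text").get("src"),
                audio.get("src"),
                to_seconds(audio.get("clipBegin")),
                to_seconds(audio.get("clipEnd")),
            )
    raise KeyError(par_id)


print(resolve(SMIL, "p2"))
```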

VALIDATION AND TESTING

The committee will develop software systems and standard books to support validation and testing. Standard books will consist of one of each class of DTB. These books will be carefully built and rigorously checked for conformance with the specification. They will serve as benchmarks for testing players. They will help developers build compatible players and will enable evaluators to verify that player features are properly implemented.

In the process of building standard books it may be convenient to write software to help check their consistency and accuracy, for example, software to verify that all of the files mentioned in the package file are present and have the expected format. Such software could screen books for conformance to the specification.
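A check of this kind might look like the sketch below, which compares the manifest of a package file against a list of the files actually delivered. It is only an illustration under stated assumptions: the <package> wrapper, the EXPECTED_EXT table, the case-insensitive file name comparison, and the function name check_manifest are ours and do not come from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from media type to expected file extension.
EXPECTED_EXT = {
    "text/css": ".css",
    "audio/mp3": ".mp3",
    "image/gif": ".gif",
}

# Abbreviated manifest; the <package> wrapper is an assumption.
SAMPLE_PACKAGE = """
<package>
  <manifest>
    <item id="text" href="RevStd.XML" media-type="text/xml"/>
    <item id="text_style" href="dtbbase.css" media-type="text/css"/>
    <item id="forward" href="RevStdFwd.MP3" media-type="audio/mp3"/>
    <item id="fig_01" href="fig1.gif" media-type="image/gif"/>
  </manifest>
</package>
"""


def check_manifest(package_xml, available_files):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Assumption: file names are compared without regard to case.
    available = {name.lower() for name in available_files}
    for item in ET.fromstring(package_xml).iter("item"):
        href = item.get("href")
        media_type = item.get("media-type")
        if href.lower() not in available:
            problems.append("missing: " + href)
        expected = EXPECTED_EXT.get(media_type)
        if expected and not href.lower().endswith(expected):
            problems.append("unexpected extension for " + media_type + ": " + href)
    return problems


# The audio file is deliberately left out of the delivered set.
delivered = ["RevStd.XML", "dtbbase.css", "fig1.gif"]
print(check_manifest(SAMPLE_PACKAGE, delivered))
```

A production screening tool would presumably also open each file to confirm its format rather than rely on the extension alone.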

The largest software effort will be the writing of PC-based players; there may be several based on different program components. Players will serve several purposes. They will confirm that the specification can be implemented with a PC and test the conformance of books to the specification. They will indicate the relative difficulty and expense in implementing various player features and help with interface design.

Validation and testing software is considered necessary support for the specification but not an integral technical component.

CONCLUSION

Although there are additional technical components of the specification not mentioned in this paper, the ones described above constitute the core of the specification. They are presented to give the reader a concise view of the essential elements and an insight into the complexity of DTBs. The NISO DTB committee has targeted September 2000 for making a draft specification available for comment on the NISO web site. Observers can monitor our progress by occasionally checking news items on the NLS web site.

Cookson, J., Moodie, M., & Rasmussen, L. (2000). Digital talking book standards developed by NLS and partners under NISO auspices. Information Technology and Disabilities E-Journal, 7(1).