We argue the case for abstract document structure as a separate descriptive level in the analysis and generation of written texts. The purpose of this representation is to mediate between the message of a text (i.e., its discourse structure) and its physical presentation (i.e., its organization into graphical constituents like sections, paragraphs, sentences, bulleted lists, figures, and footnotes). Abstract document structure can be seen as an extension of Nunberg's “text-grammar” it is also closely related to “logical” markup in languages like HTML and LaTEX. We show that by using this intermediate representation, several subtasks in language generation and language understanding can be defined more cleanly.

This content is only available as a PDF.