GLOSS: Converting plain text to XML

Richard Kaye,
School of Mathematics,
University of Birmingham

Glosing is a full glorious thing certain,
For letter slayeth, as we clerkes sayn.

from The Summoner's Tale,
By Geoffrey Chaucer

"A discerning friend of mine," said Don Quixote, "was of opinion that no one ought to waste labour in glossing verses; and the reason he gave was that the gloss can never come up to the text, and that often or most frequently it wanders away from the meaning and purpose aimed at in the glossed lines; and besides, that the laws of the gloss were too strict, as they did not allow interrogations, nor 'said he,' nor 'I say,' nor turning verbs into nouns, or altering the construction, not to speak of other restrictions and limitations that fetter gloss-writers, as you no doubt know."

from Don Quixote,
by Miguel de Cervantes,
Translated by John Ormsby

Short manifesto

Well marked-up text in XML formats carries much more useful information than text in many traditional formats such as LaTeX, and can be used in a variety of ways. The main obstacles to using XML are that it is verbose, difficult and slow to author directly in a text editor, and can be difficult to read and maintain once it is written. Furthermore, many existing documents, despite perhaps having a certain amount of structure or mark-up, require considerable work to convert to well marked-up XML.

GLOSS (for Gloss Linguistic Or Semantic Structure) is a program written to convert plain text files to XML with mark-up added automatically. It is a general-purpose tool that extracts structural information from a text file and writes well-formed XML as output. Any well-formed XML can be written this way. The glossing follows rules that are themselves expressed in XML, in a form somewhat similar to an XSL stylesheet.
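To give a feel for what "extracting structure from plain text" means, here is a minimal sketch in Python. It is not GLOSS itself and does not use GLOSS's rule format: the function name text_to_xml and the element names document and para are invented for illustration, and the only "rule" it applies is that a blank line separates paragraphs.

```python
import re
import xml.etree.ElementTree as ET


def text_to_xml(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <para> elements.

    Illustrative only: the element names are hypothetical and this is
    not how GLOSS expresses or applies its rules.
    """
    root = ET.Element("document")
    # A run of one or more blank lines is taken as a paragraph break.
    for block in re.split(r"\n\s*\n", text.strip()):
        para = ET.SubElement(root, "para")
        # Join hard-wrapped lines; ElementTree escapes &, < and > on output.
        para.text = " ".join(line.strip() for line in block.splitlines())
    return ET.tostring(root, encoding="unicode")


sample = "Glossing adds structure.\n\nA & B are escaped,\nand lines are joined."
result = text_to_xml(sample)
print(result)
```

Because the output is built as an element tree rather than by pasting strings together, it is well-formed by construction, and special characters such as the ampersand in the sample are escaped automatically.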

GLOSS may be used as an input device for new documents, to remove much of the tedium of entering XML tags in an ordinary text editor, or as a tool to convert legacy documents to XML. It has potentially very many applications, but GLOSS is particularly well-developed for the authoring of (X)HTML web pages with embedded MathML. It is as easy to use as traditional LaTeX (or possibly easier!), and the quality of the output and its conformance to the standards are very high. Additionally, there is an XHTML-to-LaTeX converter bundled with GLOSS, so most pages produced this way can be exported as LaTeX documents anyway.

A more detailed manifesto of GLOSS's purpose and abilities is available in the documentation.

GLOSS is written in Java, and should run on any system where Java runs. It can be used from the command line or from the scripts provided for Unix or MS-Windows. A plugin for the jEdit editor is under development and will provide a graphical interface.

Further Documentation

This is a selection of the documentation available. For a full list, go to the contents page.

Examples

This page is copyright. Web page design and creation by GLOSS.