html head title [How (and why) I use GLOSS to write XHTML+MathML] author [Richard Kaye, School of Mathematics, University of Birmingham] contributor ~http://web.mat.bham.ac.uk/R.W.Kaye/ date [2006-05-24] keywords [MathML XML XHTML mathematics publishing GLOSS] body { p @style[text-align:center] [[i[[q[A discerning friend of mine,]] said Don Quixote, [q[was of opinion that no one ought to waste labour in glossing verses; and the reason he gave was that the gloss can never come up to the text, and that often or most frequently it wanders away from the meaning and purpose aimed at in the glossed lines; and besides, that the laws of the gloss were too strict, as they did not allow interrogations, nor [q[said he,]] nor [q[I say,]] nor turning verbs into nouns, or altering the construction, not to speak of other restrictions and limitations that fetter gloss-writers, as you no doubt know.]]]]] p @style[text-align:right] [from Don Quixote,[br] by Miguel de Cervantes,[br] Translated by John Ormsby] section { title[XML] para{ p [XML is a general format for exchange of information between computer systems. It was originally devised as a [q[light]] version of SGML, intended to present complex structured data together with information about the meaning, or the possible rendition, of each individual part. Thus the presentation-MathML (p-MathML) code for [math {mrow a + 3} = beta],] pre !CDATA[<mrow><mrow><mi>a</mi><mo>+</mo><mn>3</mn></mrow><mo>=</mo><mi>&beta;</mi></mrow>] p [indicates that [tt[a]] and [tt[β]] are identifiers or variables (and probably will be type-set in a font suitable for variables), [tt[3]] is a number (to be type-set in another font) and [tt[+]], [tt[=]] are operators (with some extra space around them). The [tt[mrow]] elements delimit the sub-expressions so that the whole thing can be unambiguously read.] 
} p [As XML is intended as a universal medium, a great many computer systems are equipped to read and use XML data, including systems in web browsers (for rendering mathematics to the visually impaired, for example) over which we, as authors, have little control. That means, for mathematics, that we must be much more precise in marking up the individual expressions and subexpressions than we are accustomed to. Typing around eighty characters for the ambiguous (but conventional) [tt[a+3=beta]] seems a lot. And it is in fact worse: I still haven't pointed out that I intended the objects [q[a]] and [q[beta]] to be elements of the field with two elements, that addition is addition modulo 2, that equality is congruence modulo 2, and that [q[3]] means the equivalence class modulo 2 of the number 3 (which, of course, is the same as the equivalence class of [q[1]]). We will have to say all of this (and we can, using other XML mark-up from OpenMath or content-MathML) if we are going to type our mathematics in a way that can unambiguously be copied and pasted into a computer algebra system. Being able to express and use expressions like this in a wide variety of systems is a major advantage, but one that comes with an apparent burden attached. Actually, even before we get into these details, just typing plain XML is awkward: the closing tags must be present and nested correctly, and any error may stop the application working.] p [We may set our sights lower, and not cater for such a wide range of systems. It is certainly true that MathML can be used in many ways, from quick but rather ambiguous mark-up of limited utility to painstakingly careful mark-up that would take a huge amount of time to write by hand. Whatever compromise is made here, though, it seems to me essential that there should be a way of entering the required data accurately enough that an automatic system can apply appropriate defaults and add the necessary XML code.] 
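p [To make the point about verbosity and strictness concrete, here is a minimal sketch in Python (it is not part of GLOSS) that builds the p-MathML for the example above with the standard-library module xml.etree.ElementTree, and then checks two strings for well-formedness. The function names build_equation and is_well_formed are hypothetical helpers introduced only for this illustration, and the beta is written as a numeric character reference rather than by name.]

```python
import xml.etree.ElementTree as ET


def build_equation():
    """Build presentation MathML for a + 3 = beta as an element tree."""
    outer = ET.Element("mrow")
    inner = ET.SubElement(outer, "mrow")  # groups the sub-expression a + 3
    for tag, text in (("mi", "a"), ("mo", "+"), ("mn", "3")):
        ET.SubElement(inner, tag).text = text
    ET.SubElement(outer, "mo").text = "="
    ET.SubElement(outer, "mi").text = "\u03b2"  # the Unicode character beta
    return outer


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The default serialisation is ASCII bytes, so beta becomes a character reference.
markup = ET.tostring(build_equation()).decode("ascii")
print(markup)
print(len(markup), "characters of XML for", len("a+3=beta"), "characters typed")
print(is_well_formed(markup))                # every tag is closed and nested
print(is_well_formed("<mrow><mi>a</mrow>"))  # the mi element is never closed
```

p [Building the tree with a library guarantees matching closing tags, which is the same burden that a text-to-XML tool takes off the author. The second check shows how a single missing closing tag makes the whole fragment unusable to an XML parser.]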
p [One possibility is to use a LaTeX-to-MathML converter such as TtM, or one of the other text-based syntaxes for MathML (many of which use a syntax similar to LaTeX). I have rejected these for my own personal use because: (a) LaTeX source code does not contain enough information for any system to infer the correct output; (b) any use of macros in LaTeX can obfuscate or break the translation process; and (c) such translators never seem to work on my own documents, possibly because of macros, different fonts, or something else. However, as they say, your mileage may vary.] p [In this article I will look at the case of using GLOSS to author p-MathML embedded in a web page or similar document.] };section section { title[GLOSS] para{ p [GLOSS is a general text-to-XML converter. It is intended mainly for authors with some basic knowledge of both XML in general and the target XML application they are writing for. In its basic form it enables you to write any XML (including MathML, XHTML, etc.) while saving considerably more than 50% of the time and effort. GLOSS uses a syntax based on indentation, like the computer language Python, but unlike TeX and LaTeX. Plain characters and text are delimited by square brackets. (This choice was made as square brackets rarely occur in text, and are easily accessible on most keyboards.) Everything else in the input is a [q[word]] or [q[token]] or [q[command]], usually producing an XML element with the same name. So the example above would be coded in GLOSS as] pre !CDATA[mrow
  mrow
    mi\[a\]
    mo\[+\]
    mn\[3\]
  mo\[=\]
  mi\[&beta;\]
] p [To get this to work, save it in a text file called [tt[example.xml.gloss]] and run the command [tt[gloss-xml example.xml.gloss]]; you should then have beautiful XML in a new file called [tt[example.xml]]. Note: [tt[&beta;]] is the standard MathML name for the Unicode character β. If you have a Unicode editor you can use the Unicode character itself instead.] 
} para{ p [Attributes are encoded in GLOSS with the construct [tt[@name\[text\]]], so the matrix equation] math mrow A = mfenced @open[\[] @close[\]] mtable mtr x y mtr z w p [could be encoded in GLOSS with] pre !CDATA[mrow
  mi\[A\]
  mo\[=\]
  mfenced @open\[\\\[\] @close\[\\\]\]
    mtable
      mtr
        mtd mi\[x\]
        mtd mi\[y\]
      mtr
        mtd mi\[z\]
        mtd mi\[w\]
] p [which gives] pre !CDATA[<mrow><mi>A</mi><mo>=</mo><mfenced open="\[" close="\]"><mtable><mtr><mtd><mi>x</mi></mtd><mtd><mi>y</mi></mtd></mtr><mtr><mtd><mi>z</mi></mtd><mtd><mi>w</mi></mtd></mtr></mtable></mfenced></mrow>] p [In p-MathML terminology a [q[fence]] is a pair of brackets that may change size according to context. Note the use of [tt[\\]] to [q[escape]] the [tt[\]]] character, which would otherwise be taken to be the end of an empty text block. The other characters that need to be escaped like this are [tt[\[]], [tt[\{]], [tt[\}]], and [tt[\\]]. This provides a useful check, built into the system, that you have remembered the closing [tt[\]]] character.] } para { p [Indentation is very nice most of the time (and many text editors are already set up to utilise it) but sometimes more control is needed. Braces [tt[\{...\}]] are used in GLOSS to override indentation. The rule is that an XML group cannot cross an open or close brace. So [tt[\}]] closes all elements that were opened after the corresponding [tt[\{]]. This means that the above example could be encoded as] pre !CDATA[mrow
  mi\[A\]
  mo\[=\]
  mfenced @open\[\\\[\] @close\[\\\]\] mtable \{
    mtr \{mtd mi\[x\]\} \{mtd mi\[y\]\}
    mtr \{mtd mi\[z\]\} \{mtd mi\[w\]\}
  \}
] p [Note that if it weren't for the [tt[\{]] immediately following [tt[mtable]] the [tt[mtr]] elements would be children of [tt[mfenced]], not [tt[mtable]].] } para { p [GLOSS also allows you to [q[push back]] into element mode when in text mode, as TeX does and unlike normal XML. So, using GLOSS to write XHTML this time, you can write] pre !CDATA[p \[This text is part of an HTML paragraph. Let's test HTML's \[i\[italics\]\] and \[b\[bold\]\] mark-up elements.\] ] p [giving] pre !CDATA[

<p>This text is part of an HTML paragraph. Let's test HTML's <i>italics</i> and <b>bold</b> mark-up elements.</p>

] p [This feature is really like a combination of [tt[\{...\}]] and [tt[\[...\]]]. The above is equivalent to] pre !CDATA[p \{\[This text is part of an HTML paragraph. Let's test HTML's \]\{i\[italics\]\}\[ and \]\{b\[bold\]\}\[ mark-up elements.\]\} ] p [but a little clearer and easier to type.] } p [GLOSS is a highly configurable and extensible system. That means you can write your own code (rather like [q[macros]]) to deal with constructions, such as matrices, that occur many times over a group of documents, and so save even more typing. The whole idea of GLOSS is that your plain text is parsed in many different [q[modes]]: GLOSS will be in a different mode depending on the local context, and [q[macros]] are context-dependent. So a new command in maths mode will not affect what happens in text mode. What's more, you can have as many modes as you like.] p [I'm not going to explain how to write new modes here; that would be rather too technical for this article. However, GLOSS's modes can be arranged into [q[modules]], and separate modules can be loaded as needed. As well as the base XML module, there is a base XHTML module (using the commands [tt[gloss-html]] or [tt[gloss-xhtml]] instead of [tt[gloss-xml]]) and several optional extension modules for XHTML, including ones supporting: sections, subsections, etc., and automatic section numbers; definitions, theorems, propositions, lemmas, and proofs; p-MathML; some convenient syntactic [q[extensions]] to p-MathML which GLOSS maps to standard p-MathML; automatic detection of whether maths should be [q[inline]] or [q[display]]; and several more.] para { p [For another example, consider] math mrow mfenced @open[\[] @close[\]] mtable mtr alpha beta mtr -1 nabla + A = mfenced @open[\[] @close[\]] mtable mtr x 45 mtr 3.14159E-2 w p [The p-MathML extension module knows all the standard MathML names for individual characters, such as [tt[beta]]. 
It also has names for all the single-letter alphabetical characters, can recognise numbers, and has a default way to wrap each of these in the appropriate tag from [tt[mi]], [tt[mo]], [tt[mn]]. So using XHTML and p-MathML, the paragraph you are reading right now is encoded as] pre [p \[For another example, consider\]
math
  mrow
    mfenced @open\[\\\[\] @close\[\\\]\]
      mtable
        mtr alpha beta
        mtr -1 nabla
    + A =
    mfenced @open\[\\\[\] @close\[\\\]\]
      mtable
        mtr x 45
        mtr 3.14159E-2 w
p \[The p-MathML extension module ... is encoded as\]
pre \[p \\\[For another example, consider\\\] math mrow ... \]
] } p [Note also the use of the [tt[math]] command to enter maths mode and insert the MathML [tt[math]] element.] p [I have discovered that, with careful use of the standard XHTML tags, the HTML [tt[class]] attribute, some of GLOSS's HTML extension modules and CSS style-sheets, standards-compliant XHTML can be used as an excellent format for shorter mathematics papers. That is how I have typed this paper, for example, as well as all of my first-year real analysis pages at [uri[http://web.mat.bham.ac.uk/R.W.Kaye/seqser/]]. (GLOSS sources for all these pages are available from the web-site.) For longer papers or books there are many other XML formats available to choose from. These include DocBook, TEI, and OMDoc, all with XSLT style-sheets to transform to HTML or paper-based formats. GLOSS can of course be used to write sources for any of these. Or you can devise your own format (based on HTML, for example), tailored to your particular application, as I did for a book I am currently working on, also written using GLOSS.] };section section { title [Serving the document] p [Once you have a beautiful XHTML+MathML document you should be able to view it locally (with Firefox, say). It is also a good idea to [i[validate]] your document. This involves running a standard XML tool that makes some basic checks against a document-type definition (DTD). 
The DTD contains basic structural details of the format, such as: your root [tt[html]] element should have only two children, [tt[head]] and [tt[body]]; you may not take the square root ([tt[msqrt]]) of an HTML anchor; and so on. GLOSS's HTML modules automatically include references to the correct DTDs, and the GLOSS distribution also contains a simple validator, though you may already have a better one on your system. When the document is fully checked and ready, it is time to put it on your web server for others to read.] p [This turned out to be slightly non-trivial on my system. You may need to check and change the way your web server is set up: either ask your web-master to make the changes or make them yourself in your [tt[.htaccess]] file. XHTML pages with embedded MathML should be served as MIME type [tt[application/xhtml+xml]] and ordinary HTML should be served as [tt[text/html]]. I use the file extension [q[xhtml]] for the former to distinguish them from the latter, though there doesn't seem to be any consensus on this. Also, to ensure that the maximum number of people (and search engines) can read your pages, a technique known as content negotiation is useful. This is rather easy to set up in Apache, but requires you to get out of the habit of including the suffix [tt[.html]] or [tt[.xhtml]] in your web links. See the references below for more on content negotiation, and the [q[installation]] notes in my Sequences and Series web pages for an example.] }; section section { title [Further topics and references] p [I have only touched on the basics of GLOSS for HTML and p-MathML in this article. In particular I haven't said anything about how to define the semantic content of maths expressions (content-MathML or OpenMath) or how to define other transformations, whether in GLOSS itself, in XSLT, or using some other program. These are important topics, sadly outside the scope of this article.] 
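p [One practical topic from the previous section, content negotiation, can at least be pictured with a small sketch. The Python below is only an illustration of the idea, not how Apache implements it: it reads the Accept header sent by a browser and decides which of the two MIME types to serve. The names parse_accept and choose_media_type are hypothetical, and the policy (serve XHTML only to clients that list it explicitly and rank it at least as highly as HTML) is an assumption made for the example.]

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # skip empty entries, e.g. from an empty header
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_media_type(accept_header):
    """Return XHTML for clients that explicitly accept it, otherwise HTML."""
    prefs = parse_accept(accept_header)
    xhtml_q = prefs.get(XHTML, 0.0)
    html_q = prefs.get(HTML, 0.0)
    if xhtml_q > 0.0 and xhtml_q >= html_q:
        return XHTML
    return HTML


print(choose_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_media_type("text/html, */*;q=0.5"))
```

p [Wildcard ranges are deliberately ignored here, so that a client which merely claims to accept anything still receives ordinary HTML. A real server configuration will have its own rules, so the references below remain the place to look for details.]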
p [One of the design decisions that influenced GLOSS is that it is intended for authors with some basic knowledge of both XML in general and the target XML application they are writing for. There is a somewhat steep learning curve at the beginning, and there are a number of pitfalls for the beginner, but once the system is well understood productivity should be as good as or better than with LaTeX. (It certainly has been for me!) There are many freely available web pages and other resources to help a beginner. Some of the ones I found helpful are listed here.] ul { li [[uri[http://web.mat.bham.ac.uk/R.W.Kaye/gloss/]], the main GLOSS web-pages, including all documentation, and downloadable sources and compiled files for any platform. (It is hoped that these pages will migrate to somewhere more memorable soon, to [uri[http://gloss.bham.ac.uk]] perhaps. If so, the first address will remain operational as long as possible, and will contain links to the [q[real]] home page.)] li [[uri[http://www.w3.org/Math]], the W3C's MathML pages. In particular the page [uri[http://www.w3.org/TR/MathML2/]] contains the specification for MathML 2.0. Chapters 1–3 make very good reading for the details of presentation MathML, which is what I have used. Other chapters cover content MathML, which may also be of interest.] li [[uri[http://www.w3.org/2003/entities/]], information on the standard names for characters used in XHTML and MathML (and GLOSS). Useful character tables are provided.] li [[uri[http://www.unicode.org]], the Unicode Consortium, with many more character tables and lists of characters that can be referred to by number rather than name (provided you have the appropriate fonts on your system, of course!).] li [[uri[http://www.w3.org/2003/01/xhtml-mimetype/content-negotiation]], some information and advice from the W3C on content negotiation.] li [[uri[http://www.mozilla.org/projects/mathml/]], the Mozilla MathML page.] 
li [[uri[http://hutchinson.belmont.ma.us/tth/mml/]], TtM, a TeX to MathML translator.] } };section p [ [b [Richard Kaye]][br] [i [School of Mathematics]][br] [i [University of Birmingham] ][br] [tt [[a @href[http://web.mat.bham.ac.uk/R.W.Kaye/]]]]] } ; body