html
head
title [How (and why) I use GLOSS to write XHTML+MathML]
author [Richard Kaye, School of Mathematics, University of Birmingham]
contributor ~http://web.mat.bham.ac.uk/R.W.Kaye/
date [2006-05-24]
keywords [MathML XML XHTML mathematics publishing GLOSS]
body {
p @style[text-align:center]
[[i[[q[A discerning friend of mine,]] said Don Quixote, [q[was of opinion that no
one ought to waste labour in glossing verses; and the reason he gave was
that the gloss can never come up to the text, and that often or most
frequently it wanders away from the meaning and purpose aimed at in the
glossed lines; and besides, that the laws of the gloss were too strict,
as they did not allow interrogations, nor [q[said he,]] nor [q[I say,]] nor
turning verbs into nouns, or altering the construction, not to speak of
other restrictions and limitations that fetter gloss-writers, as you no
doubt know.]]]]]
p @style[text-align:right]
[from Don Quixote,[br] by Miguel de Cervantes,[br] Translated by John Ormsby]
section {
title[XML]
para{
p [XML is a general format for the exchange of information
between computer systems. It was originally devised as
a [q[light]] version of SGML, intended to present complex
structured data together with the meaning of each individual
part, or other information suggesting how that part might be rendered.
Thus the presentation-MathML (p-MathML) code for
[math {mrow a + 3} = beta],]
pre !CDATA[
This text is part of an HTML paragraph. Let's test HTML's italics and bold mark-up elements.
] p [This feature is really like a combination of [tt[\{..\}]] and [tt[\[..\]]]. The above is equivalent to] pre !CDATA[p \{\[This text is part of an HTML paragraph. Let's test HTML's \]\{i\[italics\]\}\[ and \]\{b\[bold\]\}\[ mark-up elements.\]\} ] p [but a little clearer and easier to type.] } p [GLOSS is a highly configurable and extensible system. That means you can write your own code (rather like [q[macros]]) to deal with situations like matrices that occur many times over a group of documents to save even more typing. The whole idea of GLOSS is that your plain text is parsed by GLOSS in many different [q[modes]]; GLOSS will be in a different mode depending on the local context, and [q[macros]] will be context-dependent. So a new command in maths mode will not impact on what happens in text mode. What's more, you can have as many modes as you like.] p [I'm not going to explain how to write new modes here. That would be rather too technical for this article. However it is a feature that GLOSS's modes can be arranged into [q[modules]] and separate modules can be loaded according to needs. As well as the base XML module, there is a base XHTML module (using the commands [tt[gloss-html]] or [tt[gloss-xhtml]] instead of [tt[gloss-xml]]) and several optional extension modules for XHTML including ones supporting: sections, subsections, etc., and automatic section numbers; definitions, theorems, propositions, lemmas, and proofs; p-MathML; some convenient syntactic [q[extensions]] to p-MathML which GLOSS maps to standard p-MathML; automatic detection of whether maths should be [q[inline]] or [q[display]]; and several more.] para { p [For another example, consider] math mrow mfenced @open[\[] @close[\]] mtable mtr alpha beta mtr -1 nabla + A = mfenced @open[\[] @close[\]] mtable mtr x 45 mtr 3.14159E-2 w p [The p-MathML extension module knows all the standard MathML names for individual characters, such as [tt[beta]]. 
It also has names for all the single-letter alphabetical characters, can recognise numbers, and has a default way to wrap each of these with the appropriate tag from [tt[mi]], [tt[mo]], [tt[mn]]. So using XHTML and p-MathML, the paragraph you are reading right now is encoded as] pre [p \[For another example, consider\] math mrow mfenced @open\[\\\[\] @close\[\\\]\] mtable mtr alpha beta mtr -1 nabla + A = mfenced @open\[\\\[\] @close\[\\\]\] mtable mtr x 45 mtr 3.14159E-2 w p \[The p-MathML extension module ... is encoded as\] pre \[p \\\[For another example, consider\\\] math mrow ... \] ] } p [Note also the use of the [tt[math]] command to enter maths mode and insert the MathML [tt[math]] element.] p [I have discovered that, with careful use of the standard XHTML tags, the HTML [tt[class]] attribute, some of GLOSS's HTML extension modules and CSS style-sheets, standards-compliant XHTML can be used as an excellent format for shorter mathematics papers. That is how I have typed this paper, for example, as well as all of my first-year real analysis pages at [uri[http://web.mat.bham.ac.uk/R.W.Kaye/seqser/]]. (GLOSS sources for all these pages are available from the web-site.) For longer papers or books there are many other XML formats available to choose from. These include DocBook, TEI and OMDoc, all with XSLT style-sheets to transform to HTML or paper-based formats. GLOSS can of course be used to write sources for any of these. Or you can devise your own format (based on HTML, for example), tailored to your particular application, as I did for a book I am currently working on, also written using GLOSS.] };section section { title [Serving the document] p [Once you have a beautiful XHTML+MathML document you should be able to view it locally (with Firefox, say). It is also a good idea to [i[validate]] your document. This involves running a standard XML tool that makes some basic checks against a document-type definition (DTD). 
The DTD contains basic structural details of the format, such as: your root [tt[html]] element should have only two children, [tt[head]] and [tt[body]]; you are not taking the square root ([tt[msqrt]]) of an HTML anchor; and so on. GLOSS's HTML modules automatically include references to the correct DTDs, and the GLOSS distribution also contains a simple validator, though you may already have a better one on your system. When the document is fully checked and ready, it is time to put it on your web server for others to read.] p [This turned out to be slightly non-trivial on my system. You may need to check and change the way your web server is set up: ask your web-master to make changes, or make them yourself in your [tt[.htaccess]] file. XHTML pages with embedded MathML should be served as MIME type [tt[application/xhtml+xml]] and ordinary HTML should be served as [tt[text/html]]. I use the file-extension [q[xhtml]] for the former to distinguish them from the latter, though there doesn't seem to be any consensus on this. Also, to ensure that the maximum number of people (and search engines) can read your pages, a technique known as content-negotiation is useful. This is rather easy to set up in Apache, but requires you to get out of the habit of including the suffix [tt[.html]] or [tt[.xhtml]] in your web links. See the references below for more on content-negotiation, and the [q[installation]] notes in my Sequences and Series web pages for an example.] }; section section { title [Further topics and references] p [I have only touched on the basics of GLOSS for HTML and p-MathML in this article. In particular I haven't said anything about how to define the semantic content of maths expressions (content-MathML or OpenMath) or how to define other transformations, whether in GLOSS itself, in XSLT, or using some other program. These are important topics sadly outside the scope of this article.] 
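p [As a footnote to the server set-up described in the previous section, here is a minimal sketch of an [tt[.htaccess]] file. It is an illustration only, not a copy of the file used for these pages. It assumes an Apache server with its MIME-type and content-negotiation modules enabled, and a host that allows these two directives to be set in [tt[.htaccess]]; your web-master may need to confirm both.]
pre !CDATA[
# Assumption: the Apache MIME module is enabled.
# Serve files ending in .xhtml as XHTML rather than ordinary HTML.
AddType application/xhtml+xml .xhtml
# Assumption: the Apache content-negotiation module is enabled.
# A request for page can then be answered with page.xhtml or page.html.
Options +MultiViews
]
p [With a set-up along these lines the server chooses between the two files according to what the browser says it accepts, which is why links should be written without the suffix.]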
p [One of the design decisions that influenced GLOSS is that it is intended for authors with some basic knowledge of both XML in general and the target XML application they are writing for. There is a somewhat steep learning curve at the beginning, and there are a number of pitfalls for the beginner, but once the system is well understood productivity should be as good as or better than with LaTeX. (It certainly has been for me!) There are many freely available web pages and other resources to help a beginner. Some of the ones I found helpful are also listed here.] ul { li [[uri[http://web.mat.bham.ac.uk/R.W.Kaye/gloss/]], the main GLOSS web-pages, including all documentation, and downloadable sources and compiled files for any platform. (It is hoped that these pages will migrate to somewhere more memorable soon, to [uri[http://gloss.bham.ac.uk]] perhaps. If so, the first page will remain operational as long as possible, and will contain links to the [q[real]] home page.)] li [[uri[http://www.w3.org/Math]], the W3C's MathML pages. In particular the page [uri[http://www.w3.org/TR/MathML2/]] contains the specification for MathML 2.0. Chapters 1–3 make very good reading for the details of presentation MathML, which is what I have used. Other chapters cover content MathML, which may also be of interest.] li [[uri[http://www.w3.org/2003/entities/]], information on the standard names for characters used in XHTML and MathML (and GLOSS). Useful character tables are provided.] li [[uri[http://www.unicode.org]], the Unicode Consortium, with many more character tables and lists of characters that can be referred to by number rather than name (provided you have the appropriate fonts on your system, of course).] li [[uri[http://www.w3.org/2003/01/xhtml-mimetype/content-negotiation]], some information and advice from the W3C on content negotiation.] li [[uri[http://www.mozilla.org/projects/mathml/]], the Mozilla MathML page.] 
li [[uri[http://hutchinson.belmont.ma.us/tth/mml/]], TtM, a TeX to MathML translator.] } };section p [ [b [Richard Kaye]][br] [i [School of Mathematics]][br] [i [University of Birmingham] ][br] [tt [[a @href[http://web.mat.bham.ac.uk/R.W.Kaye/]]]]] } ; body