Creating Glossaries in DITA

Recently, I had to delve into how to create and display glossaries for DITA-based documents. One of the problems was that, while I could find information about how all of the individual pieces for a DITA glossary work, there was nothing I could find online describing how it was all put together. This article is what I was able to figure out. Don’t consider it comprehensive, as DITA (somewhat notoriously) often offers multiple ways to accomplish the same thing.

For this article, I needed a source that had a lot of glossary terms in it, and I found what I was looking for in the out-of-copyright title The Book of the Sailboat: How to Rig, Sail, and Handle Small Boats by A. Hyatt Verrill. It dates to 1916 and is available on Project Gutenberg. It contains many nautical terms and definitions, and I decided to convert it to DITA so that I would have a lot of glossentry topics to work with. I have posted a copy of my DITA conversion effort to GitHub so that you can look at and play around with the code yourself.

Who Needs a Glossary?

Not everybody needs a glossary for their technical publications. Glossaries are intended to provide a definition of a term or acronym. In my experience, technical documents aimed at end-users rarely need a separate glossary section and can make do by providing a brief summary or explanation immediately beside any unfamiliar term. But for long-form technical documentation where the audience is made up of engineers, medical professionals, or anyone whose job involves a lot of acronyms or obscure terms, the need for a glossary is more common. In the work I do for a semiconductor company, a lot of electrical, thermal, and board engineering terms come up, and having some form of standardized reference for terms is useful for these readers. I would argue that a glossary is handy for technical writers too, since it explains unfamiliar terms in context.

While it is possible to simply link to term definitions on an external website (a common practice), in some cases combining the glossary with the actual document makes more sense. This ensures that a reader can always look up a term, since a web-based reference can change and the reader may not be online while reading the document.

There are several distinct parts to making a glossary work in DITA:

the glossary topic type (glossentry)
referencing these glossary topics in a map (glossref)
providing a link or display within the content to a specific glossary topic (abbreviated-form and term)
determining whether the glossary topics should be visible at output

Looking at the Glossentry Topic Type

DITA has its own specific topic type for holding glossary information: glossentry.

The DITA specification does a good job explaining the basics of how this topic type works. At its core, each glossentry topic holds a term (glossterm) and a definition (glossdef). There is also the means for including the acronym form (glossAcronym) of the term if one exists, the means to combine the term with the acronym (glossSurfaceForm), and the ability to add related-links to related terms.

Here’s an example of an electrical engineering term that uses most of these elements:

<glossentry id="pcie">
  <glossterm>Peripheral Component Interconnect Express</glossterm>
  <glossdef>PCI Express is a high-speed serial computer bus standard, and is the common motherboard interface used for graphics cards and other components.</glossdef>
  <glossBody>
    <glossSurfaceForm>PCI Express (PCIe)</glossSurfaceForm>
    <glossAlt>
      <glossAcronym>PCIe</glossAcronym>
    </glossAlt>
  </glossBody>
</glossentry>

Here’s another glossentry topic, stripped down to just a nautical term and its definition:

<glossentry id="avast">
  <glossterm>Avast</glossterm>
  <glossdef>An order to stop or discontinue anything.</glossdef>
</glossentry>
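
Neither example above shows related-links in use. As a rough sketch, and assuming a hypothetical companion topic named belay.dita that sits alongside this file (it is not one of the topics shown in this article), a glossentry pointing to a related term might look something like this:

```xml
<glossentry id="avast">
  <glossterm>Avast</glossterm>
  <glossdef>An order to stop or discontinue anything.</glossdef>
  <!-- Hypothetical related term: belay.dita is an assumed file name -->
  <related-links>
    <link href="belay.dita"/>
  </related-links>
</glossentry>
```

How the link is rendered depends on the output format and any customizations in your toolchain.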

Placing Glossentry Topics in a Map

To add a glossentry topic to your document, use the glossref element to reference it within your bookmap or map. glossref works much like topicref: it points to the glossary topic. The keys attribute is required; it assigns a key to each glossentry topic, which becomes useful when referencing the topic within the body content of the document (more on this in the next section). Here’s an example glossref:

<glossref keys="avast" href="avast.dita"/>

If there are only a few glossentry topics in your document, you can add them directly to your bookmap, typically at the end of the document.

If you have a lot of glossentry topics and they are shared between separate documents, it makes sense to organize them within a map, which is referenced within the bookmap via a mapref. Here’s an example of a map holding several nautical glossentry topics:

<map id="glossary-map">
  <glossref keys="avast" href="avast.dita"/>
  <glossref keys="bight" href="bight.dita"/>
  <glossref keys="furl" href="furl.dita"/>
  <glossref keys="reef" href="reef.dita"/>
  <glossref keys="yaw" href="yaw.dita"/>
</map>

If this is left as-is, the glossary topics will not be visible in the output. There are circumstances where you might not want the glossary topics to be visible, such as when you only want the definition to appear or be available in the body text. In the case of The Book of the Sailboat, many more nautical terms are defined in the glossary than are used in the book, so the author clearly intended them to be visible at the end of the book as a reference for readers. To make them visible at output, the print attribute needs to be set to “yes”, as the following example shows:

<map id="glossary-map">
  <glossref keys="avast" print="yes" href="avast.dita"/>
  <glossref keys="bight" print="yes" href="bight.dita"/>
  <glossref keys="furl" print="yes" href="furl.dita"/>
  <glossref keys="reef" print="yes" href="reef.dita"/>
  <glossref keys="yaw" print="yes" href="yaw.dita"/>
</map>
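
To pull a glossary map like this one into a book, the bookmap needs a mapref pointing at it. The following is only a minimal sketch: it assumes the map is saved as glossary-map.ditamap and that the book has a single chapter file named chapter-1.dita (both file names are placeholders), and placing the mapref in backmatter is just one of several possible locations.

```xml
<bookmap id="sailboat-book">
  <booktitle>
    <mainbooktitle>The Book of the Sailboat</mainbooktitle>
  </booktitle>
  <!-- Placeholder chapter; a real book would list all of its chapters here -->
  <chapter href="chapter-1.dita"/>
  <backmatter>
    <!-- Assumed file name for the glossary map shown above -->
    <mapref href="glossary-map.ditamap"/>
  </backmatter>
</bookmap>
```

Keeping the glossary in its own map means the same set of glossref entries can be reused by any other bookmap that needs them.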

Referencing Glossentry Topics within the Body of the Document

While you could output all of the glossentry topics at the end of the document and leave it for the reader to find and skim through them, DITA includes tags enabling you to reference a glossentry topic within the body content of the document so that readers can look up the definition of a term in context. Two DITA tags enable this lookup function: abbreviated-form and term.

Using the abbreviated-form Element with Glossentry Topics

The abbreviated-form element uses a keyref to point to the glossentry topic. It is also designed to be an empty tag, so it replaces the term that you are referencing in the body content of your document. Here’s an example of how it is used:

Use the <abbreviated-form keyref="pcie"/> to attach the graphics card to the motherboard of the computer.

With the default DITA-OT, both PDF and HTML output render whatever was entered as the glossSurfaceForm (term and acronym) as an italicized link that points to the glossary topic entry.

If the user hovers over the link with their cursor in the HTML output, a popup is displayed with the glossary definition.

In PDF output, you get only the link; no popup appears.
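
The keyref only resolves if a matching key is defined in the map. As a sketch, assuming the PCI Express glossentry topic is saved as pcie.dita (a hypothetical file name), the two pieces would line up roughly like this:

```xml
<!-- In the map: define the key "pcie" and point it at the glossentry topic.
     pcie.dita is an assumed file name. -->
<glossref keys="pcie" print="yes" href="pcie.dita"/>

<!-- In a topic: the empty element is replaced at output by text pulled from the glossentry -->
<p>Use the <abbreviated-form keyref="pcie"/> to attach the graphics card to the motherboard of the computer.</p>
```

The value of keyref must match the value of keys exactly; the file name of the glossentry topic does not have to match either one.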

Using the term Element with Glossentry Topics

The term element provides access to additional information available in the glossentry topic that is referenced. It is designed to surround the term that you are referencing in the body content of your document. Here’s an example of how it is used:

Always give an <term keyref="anchor">anchor</term> plenty of line or <term keyref="scope">scope</term> as it is called.

With the default DITA-OT, both PDF and HTML output render the term as an italicized link that points to the glossary topic entry.

In the HTML output, a popup with the term definition also appears when a user hovers over the link.

Glossary Topics at Output

Setting print="yes" on the glossref elements ensures that the individual glossentry topics are visible in PDF output from the default DITA-OT.

They will not appear as part of the table of contents when output to HTML from the DITA-OT, though the individual glossentry topics will still be there and available from the body content when a user wants to click on them.

If you set print="no" on the glossref elements, the default DITA-OT behavior is different for the PDF output: not only do the individual glossentry topics not appear (which is to be expected), but the terms in the body content no longer link to their respective glossentry topics. If you are using abbreviated-form, the word or phrase being referenced will not appear, leaving the reader with a sentence without a subject.

Interestingly enough, setting print="no" on the glossref elements does not have the same effect on term elements when outputting to HTML using the DITA-OT, with the links and their targets appearing in the same way as when print="yes" is set.

As you have seen, there are a lot of interconnected parts in making a working glossary in DITA. To get a handle on how it all works, I recommend looking at the sample code I have put together on GitHub to see how all of the pieces work together.