Book Excerpt: DITA and Other Structured XML formats

Current Practices and Trends in Technical and Professional Communication was published earlier this week in the U.K. by ISTC. Edited by Professor Stephen Crabbe, the book surveys the current state of technical authoring. I was asked to contribute a chapter focused on DITA. My chapter, “The development of DITA XML and the need for effective content reuse”, covers what DITA is, how it was developed, who uses it and why, the various tools associated with it, and where it will likely develop in the near future. More information about the book and the subjects it covers can be found after the excerpt.

The following excerpt is derived from my chapter, and compares DITA to the other structured XML-based authoring formats that exist:

DITA is not the only option available when it comes to technical writing: DocBook, S1000D and XML formats developed in-house within a company are other viable options. Each of these has its particular niche in the market, but DITA is making an impact on all of them and has become the most popular of the XML formats used by technical documentation groups.

A long-standing direct competitor to DITA is the DocBook format. Like DITA, it is an open specification sanctioned by OASIS, with version 1.0 issued publicly back in November 2002, though it was originally developed by HaL Computer Systems and the publisher O’Reilly & Associates back in 1991. From there it was developed further by several computer-related companies, including Novell, DEC, Hewlett Packard and Sun Microsystems, for use in creating documentation. Like DITA, it strives to provide a single-source format for publishing content to multiple formats, including HTML, HTML Help, Unix man pages and PDF. DocBook is also a highly structured way to write content, but unlike DITA its focus is at the book level or article level rather than on individual topics. This means that content is written as a narrative rather than as discrete, standalone, reusable topics designed to be used anywhere within a given document. DocBook is also designed more for static, monolithic content and is not built around the concept of content reuse, which is central to DITA. At the block level DocBook has some structural elements that superficially resemble those in DITA, but it has no equivalents to mechanisms such as content references (conrefs) or keys, two of the fundamental reuse mechanisms used within DITA. Several of the firms that contributed to the original development effort for DocBook are now using DITA, including Hewlett Packard, Oracle (which bought Sun Microsystems) and Micro Focus (which acquired Novell). The DocBook specification continues to be developed, with the most recent version ratified in late November 2016. The latest version acknowledges the influence of DITA by including the concept of ‘assemblies’, which are topic-like constructs that can be used within DocBook, but it otherwise has no content reuse mechanisms aimed at a more granular level.
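
For readers who have not seen a content reference in practice, the idea can be sketched in a few lines of Python. This is only an illustration, not how any DITA processor is actually implemented: the file names, topic ids and the FILES dictionary are made up for the example, and a real processor also validates element types and merges attributes. The one detail taken from DITA itself is the shape of the conref value, which points to a file, a topic id and an element id.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical in-memory "files"; real DITA topics live on disk and carry a DOCTYPE.
FILES = {
    "shared.dita": """<topic id="shared">
  <title>Shared content</title>
  <body>
    <p id="warning">Disconnect the power before servicing.</p>
  </body>
</topic>""",
    "install.dita": """<topic id="install">
  <title>Installing the unit</title>
  <body>
    <p conref="shared.dita#shared/warning"/>
    <p>Mount the unit on the bracket.</p>
  </body>
</topic>""",
}


def find_by_id(root, element_id):
    """Return the first element (root included) whose id attribute matches."""
    for elem in root.iter():
        if elem.get("id") == element_id:
            return elem
    return None


def resolve_conrefs(filename, files=FILES):
    """Return the parsed topic with each conref filled in from its target."""
    root = ET.fromstring(files[filename])
    # Snapshot the element list first, because the loop modifies the tree.
    for elem in list(root.iter()):
        ref = elem.get("conref")
        if ref is None:
            continue
        # A conref value has the form file#topicid/elementid.
        target_file, _, fragment = ref.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        target_root = ET.fromstring(files[target_file])
        topic = find_by_id(target_root, topic_id)
        target = find_by_id(topic, element_id) if topic is not None else None
        if target is None:
            continue  # simplification: leave unresolved references untouched
        # Simplification: copy only the target's text and child elements.
        elem.text = target.text
        elem[:] = [copy.deepcopy(child) for child in target]
        del elem.attrib["conref"]
    return root


resolved = resolve_conrefs("install.dita")
for para in resolved.findall("./body/p"):
    print(para.text)
```

The warning sentence is written once in the shared topic and pulled into the installation topic at publishing time, so a change to the shared sentence shows up everywhere it is referenced.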

The S1000D technical documentation specification was originally developed within the aerospace sector over 20 years ago, and it is still widely used within that sector, as well as in the defense, shipbuilding and construction sectors. Like DocBook and DITA, S1000D is an open standard, in this case governed by the Technical Publications Specification Management Group. Unlike DocBook, S1000D does include a mechanism for the reuse of content, known as data modules. These data modules can contain text and/or graphic content, and can be ‘plugged in’ where needed within any S1000D document. There are a number of data module types, roughly analogous to the DITA topic types, covering information specific to checklists, service bulletins, front matter, parts data, wiring data, learning modules, procedures, faults, information for the crew/operator and more. As you can see from this short list, many of the data module types were originally tailored for specific purposes within the aerospace sector and would not apply in more general circumstances. Each data module comes with a unique identifier, called the Data Module Code, which is designed in part as a mechanism for ensuring that the same module does not appear more than once within a single document. This points to one of the key differences between DITA and S1000D, which is the granularity of reuse. While S1000D encourages reuse at the data module level (roughly equivalent to a topic within DITA), it does not have mechanisms for reuse within a data module. The specificity of some of its module types to the aerospace and related industries also limits its appeal outside of these sectors. Many aerospace firms are now using DITA along with S1000D, though for different documentation sets. There has been at least one concerted attempt to incorporate DITA content within S1000D data modules, but the proposal was not accepted by the Technical Publications Specification Management Group.
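
The role of the Data Module Code as a guard against repetition can be illustrated with a short Python sketch. It is a hypothetical check, not part of any S1000D toolchain, and the codes below are made-up placeholders that do not follow the real Data Module Code structure.

```python
from collections import Counter


def duplicate_codes(codes):
    """Return, in sorted order, every code that appears more than once."""
    counts = Counter(codes)
    return sorted(code for code, n in counts.items() if n > 1)


# Placeholder identifiers standing in for the data modules of one publication.
publication = ["DM-0001", "DM-0002", "DM-0003", "DM-0002"]
print(duplicate_codes(publication))
```

Because every data module carries its own identifier, a check like this only needs the list of codes and never has to compare the content of the modules themselves.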

There are also companies that have created their own proprietary XML formats for creating technical documentation. Information on these proprietary formats is hard to find, but it appears that many of them were started prior to the advent of DITA, and some share common roots with SGML. It should be remembered that what would one day become DITA originated as an internal documentation standard devised for use within IBM. Unlike proprietary documentation standards, DITA is an open standard, which has led to the development of supporting tools from both commercial developers and the open source community. As a result, there is broad tool support for DITA, whereas proprietary XML formats often require a significant and continual investment in internal tool development. It also means that as new publication formats become common (such as HTML5 and ePub), internal developers need to produce output formats to match. While going with a proprietary XML format may have made sense at the time, a company will need to assess whether the ongoing development effort needed to support it outweighs the costs and benefits of adopting an open documentation standard that comes with commercially available tools.

One of the chief differentiators of DITA when compared to the other documentation standards available is the ability to reuse content at both granular (i.e. word, phrase, sentence) and topic/chapter levels. From a practical perspective, it is these multiple levels of reuse that make DITA a popular standard, bringing the additional advantages of consistent messaging, lower localization costs, and greater efficiency as writers reuse existing content instead of having to recreate it.

In a relatively short amount of time DITA has become the most popular XML format for creating technical documentation. In a multi-year survey of technical writer jobs posted to Indeed.com, technical writer positions that asked for DITA experience have outpaced those seeking experience with the other XML documentation formats.

Technical Writer Job Postings and Specific XML Standards Referenced on Indeed.com (Updated to July 2017)

Technical writer job postings in the United States seeking experience with DITA far outstrip those for the other two XML-based documentation formats. And while technical writer job postings that seek S1000D experience continue to grow, at any given time over the past four years there have been roughly 3.5 to 4 times as many equivalent postings looking for DITA experience. The number of technical writer positions where DocBook experience is sought is essentially flat, with some months seeing no more such listings across the U.S. than you could count on one hand. My advice to any aspiring technical writer these days looking to work with structured content is to learn DITA over the other competing formats.

More Information About the Book
The book is now available from Amazon.com. (If you buy the book using this link, it will help defray the costs of running this website.) One of the other chapters in this book was written by my IXIASOFT colleague Nolwenn Kerzreho, who talks about her experience and rationale for teaching DITA to technical writing students in Europe.

For more information on the other chapters in the book and their authors, here is the book’s table of contents:

Part 1: Writing for Technical Communication

Global authoring: writing for a global audience, by Lorcan Ryan
The case for ASD-STE100 Simplified Technical English, by Mike Unwalla and Ciaran Dodd
Why do we take the people out of our writing?, by Kirstie Edwards
A change in tone, by Ellis Pratt
“No manuals” – writing user interface copy, by Andy Healey
Technical communication and accessibility, by Klaus Schubert and Franziska Heidrich

Part 2: Resources for Technical Communicators

Scalable video production for technical communicators, by Jody Byrne
Managing digital complexity in technical communication, by Marie Girard and Patricia Minacori
Writing good API documentation: an expert’s guide, by a complete beginner, by Neal Goldsmith
The development of DITA XML and the need for effective content reuse, by Keith Schengili-Roberts
Automatic documentation for software, by Andrew McFarland Campbell

Part 3: Roles of Technical Communicators

Trends in technical communication in Ireland, by Yvonne Cleary
The collaborative effects of cyberspace, by David Bird
Creating effective, timely, and valuable documentation reviews using a risk management framework, by Annette Wierstra and Joe Sellman
Training technical communication students in structured content using DITA, by Nolwenn Kerzreho
Are AI writers capable of work in the current workplace?, by Jason Lawrence and Chelsea Green
