Don Day and Michael Priestley on the Beginnings of DITA: Part 2

Don Day and Michael Priestley

This second part of the interview with Don Day and Michael Priestley picks up where the previous article in this series left off, delving further into how a proto-DITA was developed within IBM and the process that led to its release to the open standards community via OASIS.

DITAWriter: Could you tell me about the path that would eventually lead to the creation of DITA? I understand that IBM has been using structured authoring for some time, so what were you using and what were the reasons for wanting to move to the topic-based architecture that would eventually become DITA?

Don Day: With the advent of XML as a new markup standard in 1998, the Customer and Service Information (C&SI) group began adopting a Tools and Technology mantra under Dave Schell, who was the strategy lead. By 1999, Dave was aware of my participation as IBM’s primary representative with the XSLT and CSS standards activities at the World Wide Web Consortium, and I delivered a presentation at a formative meeting in California that forecast XML’s potential to solve IBM’s still-lingering problems with variant tools and markup usage.

As that year progressed, Dave involved me with planning a more principled document strategy and how XML could be used to relieve the pain points. We proposed an internal XML Workgroup to which we invited both a tools and UX person from each of IBM’s main sites and subsidiaries. This was our way of vetting the ideas in a more open way. We were all new to XML and its technologies, so the workgroup would first use XML to pilot a subset of our existing IBMIDDoc design in order to become familiar with the XML tools. It also would give us insights into the benefits and downsides of the only other similar effort at the time: DocBook’s very literal adaptation of its original SGML design into an XML equivalent.

It seems obvious today that a book-oriented approach could have been avoided if we were truly forward-looking, but ingrained attitudes needed proof that a major change was needed. Book-based markup tended to enshrine the design principles of the day, and was unable to take advantage of changes in industry and of new ideas about information modeling and user experience with content. Our Phase 1 report in January 2000 quantified the relative ease of converting an existing design in place from SGML to XML, but also proved to us that the resulting architecture could not gracefully support the more page-oriented nature of Web content. To make the best use of XML’s flexibility while working with page-based units, we needed a topic-based architecture.

For this next round, I split the workgroup into tiger teams who would work independently on their focus area. Michael led the “topic architecture” subgroup, which I participated in as the XML/XSL technology advisor (while keeping an eye on his subversive ideas!)

Michael Priestley: We started work on DITA in early 2000. At the time, IBM was already moving from a book model to a topic model. The three core types (concept, task, reference) were taken straight from IBM’s corporate standards. If you look at the IBM guide for developing technical content (Developing Quality Technical Information: A Handbook for Writers and Editors) from 1998, a lot of the components of that new model were already defined, including task orientation, separation of conceptual content from task content, minimalism, etc. If you read the edition right before DITA came out, you find sections that sound very DITA-like today, with descriptions of task orientation and chunking, separating out conceptual content, organization schemes for guidance vs reference and so on. The three core types were considered to be the bare minimum as to what we needed: the lowest common denominator for information typing. By 2003, when the next edition of DQTI came out, they were the backbone of our whole writing approach.

One more thing I’ll add: when DITA first got released internally, there was strong pushback that sought to limit its applicability to web content only, and preserve IBMIDDoc as the domain for book-oriented publications. That was never the intent of DITA, at least from my perspective: information typing and topic orientation are best practices for books, and they predate the web. So I don’t think it was a transition from book model to web model so much as from chapter model to topic model.

It took a few years to overcome internal resistance to the topic model. I remember an IBM Information Management team won an award at the STC one year for one of the first DITA-based publications—and it was a book. I referenced that a lot in the early days, along with examples like John Carroll’s case study of Minimalism applied to a printed user guide for IBM Smalltalk.

Don Day: By late 2000, the workgroup had concluded its work and delivered a working demo of the new architecture and processing system.

At this point we aggressively applied specialization to the myriad of internal content models until we derived a generic base that could be specialized back into whatever semantic structures were actually needed. We learned that we could apply bottom-up specialization for the content shared across top-down topic types; this became domain specialization. We also discovered that we could apply specialization to the maps that grouped topics for particular purposes, which we called “design patterns for information architecture.” Specialization therefore became a universal component throughout the DITA design, apart from a few elements required for CMS-like control from outside of the information architecture.

Then we had to come up with a name. This was by no means easy, but “DITA” fully represented a great deal of messaging in a compact and memorable acronym:

  1. Darwin: for specialization and how things could “evolve” from a base
  2. Information Typing: for representation of knowledge as typed units
  3. Architecture: a statement that this was not just a monolithic design but an extensible tool that could support many uses

The person who came up with the name was the late Gretchen Hargis, who was a co-author of Developing Quality Technical Information, mentioned earlier by Michael. On a humorous side note, we had also briefly considered immortalizing Gregor Mendel’s genetics concepts as the M in MITA, but since that name was then a popular copier brand, we quickly moved on.

Now we needed to test and iterate on the design so that we could make it public. John Hunt and I announced the architecture at the March 2001 WinWriters Conference (now the WritersUA Conference) in San Jose, while simultaneously publishing the seminal architecture papers on IBM’s developerWorks site.

Dave Schell worked with his team on a progress report published in August 2001 (Status and Direction of XML in Information development in IBM: DITA) that summarized many of its benefits for the company. This document is the key primary record of the team’s experience in developing DITA and understanding the importance of the lessons learned, which were all conveyed to the eventual OASIS effort.

We then signed up several brave internal organizations to test the design for actual customer deliverables. This internal testing enabled us to craft a well-tested reference design that was eventually submitted to OASIS for standardization in 2004.

DITAWriter: What was the reason for donating the standard to OASIS?

Michael Priestley: We made the case that we had more to gain from a shareable content model across organizations and companies than we had to gain from a proprietary one with more limited interchange scope.

We use DITA today to share content with business partners, both suppliers of components of IBM solutions and vice versa. We did a limited amount of that with IBMIDDoc back in the day, but it was a lot easier to convince other companies to invest in that interchange support once it became an open standard and not just an IBM thing.

I also remember Dave Schell saying that part of the rationale was, to put it bluntly, that it would make it easier to integrate content coming into IBM from acquisitions. And it is true that these days there’s a decent chance that a new IBM acquisition is already using DITA, and when they aren’t, there are a lot of competitive solutions for migrating their content.

Don Day: The internal demo showed that applying a new syntax to legacy designs would be a dead end. In particular, the old model of doing documentation—regardless of whether it was Sun, HP, or IBM—tended to enshrine each company’s own best practices, which meant that vendor solutions were always custom solutions that came at great cost for development and maintenance. By committing to offer DITA to the open standards process, the hope was to get out of the business of custom tools support and open the market so that vendors would be encouraged to build best-of-breed solutions based purely on the popularity of the standard. If the drive to standardize DITA had fizzled, all of the expected messaging and benefits would have been lost.

Keep in mind that the XML/Java milieu came at the same time as open/closed software wars that IBM and other companies were having with Microsoft. There was a predisposition at IBM to look for ways to capitalize on open systems, and IBM helped found the SGML Open standards organization in 1993 that would eventually become OASIS. So the timing and the business politics were already favorable for making an open standards play.

We sweated bullets positioning DITA as a royalty-free, open standard donation. Although it took a full year of tedious justifications and paperwork to get the IBM bean-counters and lawyers to agree to that plan, we secured permission and began working with OASIS. The first meeting of the OASIS DITA Technical Committee was held on May 4, 2004. One of the first public activities of this Committee was to hold a series of recorded DITA Deep Dive sessions to lay out the principles and uses of the architecture.

Meanwhile, we were cleaning up the internal XSLT-based processing tools to make them ready as a reference implementation in time for the expected initial OASIS approval of DITA 1.0. This also involved justifying the release of IBM-developed code to open source, and in early 2005 we were able to place this set of code, which we called the DITA Open Toolkit, into a project on SourceForge, just in time to support the new standard.

The remaining history of DITA is public record. One of the best legacy resources is at http://xml.coverpages.org/dita.html, which contains an archive of published resources representing the first public years of DITA’s availability.

I would say that IBM and many other companies now enjoy an ecosystem of tools and services that enables graceful DITA deployment across a range of competitive function and price points, with ample room for further adoption scenarios and for tools to serve them. And DITA has hardly peaked!

Thanks again to Don Day and Michael Priestley for agreeing to this interview.

About

"DITAWriter" is Keith Schengili-Roberts. I work for AMD as a Senior Manager for Technical Documentation, and have recently helped usher in a new company-wide DITA-based CCMS. And I like to write about DITA and the technical writing community. To get ahold of me you can email me at: keith@ditawriter.com.
