What Size Should a DITA Topic Be?

DITA Bird with Topics of Various Sizes
When I first started looking into how others were using DITA XML, at conferences in the mid-2000s, one question always came up at the end of a session: “How big should a DITA topic be?” Even at the time I thought this was kind of a silly question, since it was clear to me by then that a topic’s size would vary with the type of content it contained, whether it was designed to hold reusable content, and so on. And yet I knew why people were asking: they wanted some guidance on how long a topic ought to be.

Most of the technical writers I have met are professionals who take pride in their work, and they realize that writing DITA topics requires a different, non-narrative thought process in order to work properly. Even those writers who simply try to copy ‘n’ paste legacy content into one of the three* major topic types quickly run into problems the first time they encounter a poorly-structured procedure that cannot be easily shoehorned into a standard task topic. Another factor that is new to many technical writers starting out with DITA is the concept of writing for reuse, which also naturally brings up questions of size and context. So this is a natural question for writers new to DITA XML to ask, before further time and experience show that there is no “ideal” topic length to shoot for and that it all depends on context.

But that’s not to say that some sort of guideline wouldn’t be useful, and so even after using DITA for a couple of years the question still lingered in my head. Then I realized that I had the tools at hand to work out the average topic size: our topics were stored in a Content Management System (CMS) that let me sort by topic type and gave me file sizes that I could easily convert to equivalent word counts. So I set up searches in the CMS to return the minimum, maximum and average size for each topic type. Here’s what I got back:

DITA Topic Size Graph

In what I hope is a surprise to no one, on average reference topics were the largest of the topic types. They were typically twice the size of an average task, and about three times the size of an average concept. In my experience reference topics tend to have more table-based data than the other topic types, which adds up to a significant difference in size.

For those who can’t make out the numbers easily in the chart, here they are in a table:

Average Size (KB) | Maximum Size (KB) | Minimum Size (KB) | # of Topics

So what do these numbers in KB (kilobytes) mean? When I compared the average file sizes to a file of lorem ipsum text in MS Word, I found that an average reference topic ran to about 2.6 pages of text. By the same measure, an average task topic runs to about a single page of lorem ipsum text in MS Word, and a typical concept topic would be just over three-quarters of a page.
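As a rough check on that arithmetic, here is a minimal sketch in Python. It assumes the 2.6-page figure belongs to the average reference topic and applies the rule-of-thumb ratios quoted earlier (a reference being about twice the size of a task and about three times the size of a concept). The variable names are purely illustrative.

```python
# Rough page-equivalents derived from the ratios quoted in the article.
# Assumption: the 2.6-page figure is the average *reference* topic.
reference_pages = 2.6

# Rule-of-thumb ratios: a reference is about twice the size of a task
# and about three times the size of a concept.
task_pages = reference_pages / 2
concept_pages = reference_pages / 3

print(f"reference: {reference_pages:.2f} pages")
print(f"task:      {task_pages:.2f} pages")
print(f"concept:   {concept_pages:.2f} pages")
```

The ratios are approximate, so the task figure lands a little above the "about a single page" measured directly, while the concept figure comes out just over three-quarters of a page.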

Now you have an idea of the average size of the topics that were created in the CMS. Note that the range in sizes is large; the chart had to be plotted on a logarithmic scale because the largest reference topic was a massive 100+ pages of highly specific table information on voltage values for an ASIC (whose individual sections had next-to-zero reuse). It is also worth noting that the documents were a mix of semiconductor-related engineering and end-user product material, so the results may be skewed toward the reference end of things, but the averages are still of interest.

Your results will definitely vary from mine, but these results match what I have heard informally from other technical writers and managers who have worked with DITA extensively. In a nutshell: expect your references to be twice as long as your tasks and three times as long as your concepts, all things being equal.
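If your CMS does not report sizes by topic type, something similar can be approximated from a folder of topic files. The sketch below is not the CMS search I used; it is a stand-in that assumes each topic lives in its own `.dita` or `.xml` file, that the root element name (`concept`, `task`, `reference` or `glossentry`) identifies the topic type, and that file size on disk is an acceptable proxy for topic size. The function names `topic_type` and `size_stats` are made up for this example.

```python
import statistics
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

# Root element names treated as topic types (an assumption for this sketch).
TOPIC_TYPES = {"concept", "task", "reference", "glossentry"}


def topic_type(path):
    """Return the root element name if it is a known topic type, else None."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return None  # skip files that are not well-formed XML
    return root.tag if root.tag in TOPIC_TYPES else None


def size_stats(folder):
    """Collect count, min, max and average file size in KB per topic type."""
    sizes = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in {".dita", ".xml"}:
            continue
        kind = topic_type(path)
        if kind is not None:
            sizes[kind].append(path.stat().st_size / 1024)  # bytes to KB
    return {
        kind: {
            "count": len(values),
            "min_kb": min(values),
            "max_kb": max(values),
            "avg_kb": statistics.mean(values),
        }
        for kind, values in sizes.items()
    }
```

Calling `size_stats("path/to/your/topics")` returns one entry per topic type found. File size includes markup as well as text, so treat the numbers as relative measures between topic types rather than as word counts.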

(If you have your own numbers on the relative sizes of DITA topics, please add them to the comments below!)

* I’m still in denial about the fourth glossary topic type. Not for any technical reason; it’s just that I like the neat symmetry of the original three major topic types. In classes I tend to say “Concept, Reference, Task and sometimes Glossary”. 😉


"DITAWriter" is Keith Schengili-Roberts. I work for IXIASOFT as a DITA Specialist/Information Architect. And I like to write about DITA and the technical writing community. To get ahold of me you can email me at: keith@ditawriter.com.
