There has been a long, long, long discussion on LinkedIn about XHTML vs. XML (as if they are different). I am a proponent of coherent, managed, vocabulary controlled XHTML as the only sensible option for publishers of all types of content.
This small dissertation is a response to and commentary on the ideas introduced in the following statement from an XML consultant:
It’s an interesting trail – particularly when it culminates with the question “where does this leave us in terms of XML workflow and 'single source' publishing?”
At the end of the day – if you first create XML and make all of your edits to that XML, you have single source publishing – it’s then just a question of using different tools to develop different types of stylesheets to render that XML for different outputs. Can these be rationalised? Ultimately XHTML is still about form, not purpose – so I still think that for everything you may possibly want to do with that content, XML is your better choice of base format. The only question then remains whether CSS can be used for both print and digital rendering – and whether it’s suitable for all types of print.
I can see that it may work well for highly stylised relatively low extent texts, but a traditional typesetting application like InDesign or 3B2 is probably better for high volume high extent processing of the STM nature.
Is form bad?
Saying "XHTML is still about form..." is a significant over-simplification. Firstly, all content is absolutely about form (structure); the issue is how you get there. Form is required for humans to engage with content: print, digital, rich media or interactive. Content must ultimately be expressed with form as a presentation context.
XHTML by definition predefines the following grouping of DTD-controlled XML content structures:
- Metadata content
- Flow content
- Sectioning content
- Heading content
- Phrasing content
- Embedded content
- Interactive content
These content classes are the core, vital properties of any content, including rich media and advanced interactive content. Yes, there is the web element, but that is part of the incredible flexibility.
Rather than relegating this to form and not purpose, the correct interpretation of XHTML is as a highly analyzed, intellectual abstraction of content structure and foundation relationships. It accurately describes how content works at the foundation level. The core elements define NOT only the possible form/presentation, but the BEHAVIOUR limits of any content within a content structure context.
It is difficult to get past the fact that all content has a backbone of sections, headers, blocks, paragraphs, lists and tables. The question is how many grammars you need to describe this backbone.
A massive weakness of XML in contemporary, productive digital content scenarios is that the flow and phrasing properties can never be discovered until the content is processed with presentation styling - and that always has to happen, and it is inevitably to XHTML. The only thing a DTD can do is define the encapsulation rules of elements within each other.
XHTML gives form/presentation, structure, required behaviour constraints and most importantly defines the primary function of a content element within a context.
XML is about purpose?
Next we address the "...not purpose" statement. Purpose is the decoration of content with elements that describe what things are. It's a noun thing. The common wisdom is that XML elements are nouns (the equivalent of programming classes), and attributes are qualifying or metadata in nature.
This is the type of advice commonly given on the creation of excellent XML:
Use elements for data. Use attributes for information that is not relevant to the data. .... There are no rules about when to use attributes or when to use elements. Attributes are handy in HTML. In XML my advice is to avoid them. Use elements instead. http://www.w3schools.com/xml/xml_attributes.asp
Armed with this succinct advice we start creating a vocabulary for the elements in our new XML project. The element/attribute vocabulary is the controlled, defined and agreed grammar for a particular XML representation of content.
XML is basically a load of elements, and the definition of the element vocabulary is descriptive; dependent on documentation for maintenance; and requires correct interpretation and application to execute into workable outputs.
Core flow and sectioning content is relatively easy in XML. But even the lowly paragraph gets overworked between <p>, <par>, <para> and similar variants. A paragraph is structural, full stop. Without form paragraphs die, all content dies and becomes a string of words. If purpose is the thing, a paragraph element should be called something like <narrative-idea-segmentor>, because that is a paragraph's purpose... sometimes.
Look at something as simple as an extract block - a reasonably academic structure. It is a good example because it is inserted in the document flow, it has an element value applied that describes its purpose, which also describes its behaviour in the flow, and any visual presentation must be defined by formatting. The first block at the start of this document is an extract, indicated by the indent and empty line above and below. Abstracting our block, the XML is going to look something like this:
<extract>
<attribution>where it came from</attribution>
</extract>
The corresponding XHTML is going to look something like this:
<div class="block-rw extract-rw">
<p class="attribution">where it came from</p>
</div>
It's not substantially different, except that the XHTML example inherits the flow structures from the underpinning content structure abstraction and will instantly present accurately. The XML above MUST have an XSLT stylesheet, or be otherwise processed, to interpret the document for presentation. Not one of its elements expresses behaviour within a greater set of content.
The advantage of XHTML is it gives purpose, behaviour in the flow, and the presentation foundation.
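The processing step the XML demands can be sketched in a few lines. This is a hypothetical illustration (the <extract> wrapper, element and class names are assumed from the fragments above), not a production pipeline:

```python
import xml.etree.ElementTree as ET

# A hypothetical XML extract block, as sketched above.
xml_src = "<extract><attribution>where it came from</attribution></extract>"

def extract_to_xhtml(xml_text):
    """Interpret the XML fragment into its XHTML equivalent."""
    root = ET.fromstring(xml_text)
    div = ET.Element("div", {"class": "block-rw extract-rw"})
    for child in root:
        # Each child element becomes a classed paragraph in the flow.
        p = ET.SubElement(div, "p", {"class": child.tag})
        p.text = child.text
    return ET.tostring(div, encoding="unicode")

print(extract_to_xhtml(xml_src))
# <div class="block-rw extract-rw"><p class="attribution">where it came from</p></div>
```

The point is not that the transform is hard; it is that it is mandatory before the XML can present at all.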
"But what if I want the attribution to be flowed inline?" the cry goes up. The answer is that you are not allowed to think about that when designing XML; you are only allowed to think purpose. Let's leave that one until later.
A real world example
Here is a real world example from a customer's book. The XML is technically sound, but it is an example of how the content presentation is obscure unless processed.
This is a fragment with highly specific XML elements to create current and future value for the content. It is highly particular to this customer's books.
<block>The countable noun <pertinent>dress</pertinent>
means an article of women's clothing (it goes from the shoulders
to below the hips).
<example>This is the first time I've seen you wearing
<pertinent>a dress</pertinent></example></block>
<block>There is also an uncountable noun <pertinent>dress</pertinent> (not
used with the article
<pertinent>a/an</pertinent>). It means 'clothing',
'clothes'. It is not very common in modern English,
and is used
mostly to talk about special kinds of clothing (for example
<example>national dress</example>, <example>evening dress</example>)</block>
This was interpreted into XHTML for interactive processing. The XML was provided but no further information, so decisions had to be made on what the elements did, and whether items were flow or phrasing had to be worked out. Note that -rw is our extension for particularization of the class values. The layout presentation pattern has been kept more or less the same for both fragments.
<div class="body-rw GrammarEntry">
<p>The countable noun <span class="pertinent">dress</span>
means an article of women's clothing (it goes from the shoulders
to below the hips).
<span class="example">This is the first time I've seen you wearing
<span class="pertinent">a dress</span></span></p>
<p>There is also an uncountable noun <span class="pertinent">dress</span>
(not used with the article
<span class="pertinent">a/an</span>). It means 'clothing', 'clothes'.
It is not very common in modern English,
and is used mostly to talk about special kinds of
clothing (for example
<span class="example">national dress</span>,
<span class="example">evening dress</span>)</p>
</div>
The "purpose" grammar is identical, but the XHTML is underpinned by the default XHTML section, header, flow and phrasing (<span>) elements. The apparent advantages of XHTML are:
- Nothing is lost
- The application of the attributes is easy and interactive AND presents itself immediately as correct or incorrect.
- No processing is required to view the content in common tools.
- The "form" and "purpose" are both encapsulated in the XML.
- Grammatical maintenance is the same effort either way: one as a DTD that must enforce the building rules, the other as defined tagging patterns.
- Presentation can be easily defined to express function for correct tagging.
- The applied XHTML forms can be morphed into any presentation block type.
- It is trivial to process this back into the XML schema for any reason.
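The last point can be illustrated with a minimal sketch: round-tripping the classed XHTML spans back to the customer's XML elements. The mapping table here is an assumption for illustration:

```python
import xml.etree.ElementTree as ET

# Assumed mapping from XHTML class values back to the customer's XML elements.
CLASS_TO_ELEMENT = {"example": "example", "pertinent": "pertinent"}

def xhtml_span_to_xml(xhtml_text):
    """Walk the XHTML and rename classed <span> elements back to XML elements."""
    root = ET.fromstring(xhtml_text)
    for span in root.iter("span"):
        cls = span.get("class", "")
        if cls in CLASS_TO_ELEMENT:
            span.tag = CLASS_TO_ELEMENT[cls]
            del span.attrib["class"]
    return ET.tostring(root, encoding="unicode")

print(xhtml_span_to_xml('<p><span class="example">national dress</span></p>'))
# <p><example>national dress</example></p>
```

Because the XHTML carries the purpose vocabulary in its class values, nothing is lost in either direction.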
Single Source Publishing
Remarking on the statement "...you have single source publishing – it’s then just a question of using different tools to develop different types of stylesheets to render that XML for different outputs".
This is the nature of XML/XHTML. It needs stylesheets and some sort of processing. The real issue is the effort, timeframe and execution.
The "just a question..." statement is a bit of a throw-away - there is a lot of that with XML experts. This part of digital content ownership can become a massive undertaking in its own right and require revision of the XML and other things. The outputs have to be available from the application of the first XML element.
The O'Reilly "XML First" slogan seems to have become an industry mindset - "it will be all right on the night" stuff. This XML First thing may apply to computer books with simple outlines, header-generated indexes and an 18 month shelf life, but little else. Don't do it.
Well defined XHTML strategies mean stateful XML. It is instantly ready for all major output formats with minimum technology effort. Make a change, and the outputs are instantly ready. For major outputs it requires only the direct application of CSS. For additional reuse it requires additional processing, generally relatively simple XSLT.
Output formats each have their specific complexities, further compounded by commercial variances and the requirements of distributors in the market. Add to this the fact that devices and presentation options are scaling at a speed XML DTDs cannot keep up with, even at great cost, and the inevitable result is that XML repositories cannot exploit their business potential instantly and cost-effectively.
The only question remaining...
"The only question then remains whether CSS can be used for both print and digital rendering – and whether it’s suitable for all types of print"
This question is answered by the state of technology. There is no answer if "all" types of print includes the InDesign medieval equivalent of a monk hand-writing a manuscript, because that is the current publishing workflow. To date the print strategy seems to be the InDesign monk's pedestal. Here the issue is not XML/XHTML, it's X-Y.
Academic content is relatively easy to template typeset using CSS with tools like PrinceXML, Antenna House and others. It takes knowledge of the X-Y paper world to develop the processing options, but the output can be identical in quality.
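As a sketch of how thin that toolchain can be: a Prince render is a single command driven by an XHTML file and a CSS stylesheet. The file names here are hypothetical, and Prince must be installed and on the PATH:

```python
import subprocess

# Hypothetical file names for a CSS-typeset book:
#   prince book.xhtml -s print.css -o book.pdf
cmd = ["prince", "book.xhtml", "-s", "print.css", "-o", "book.pdf"]

def render_pdf(command):
    """Run the typesetting command; returns True on success."""
    return subprocess.run(command).returncode == 0

# render_pdf(cmd)  # uncomment where Prince is available
print(" ".join(cmd))
```

The X-Y knowledge lives in the print stylesheet, not in the content or in an operator's hands.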
XSL-FO is something to stay away from, from a Total Cost of Ownership perspective.
Why use new technology to imitate the old, organ-grinder crank-handle bad parts included? The job is to get the monks away from their InDesign pedestals and into the XHTML moveable-type printing shop. Yes, design is important, even vital, but one day soon the users won't want the illuminated manuscript, they will want the reflowing interactive thing - and the content is not ready.
There are points of difference, and change strategies are required to make XML-to-paper work well. There is a finite set of problems to be addressed for paper, and a finite set of problems to be addressed for online and offline content, but the problems are finite.
By working with XHTML rather than XML, the output problems are addressed as a priority, not a "we can do that later" issue.
Is Academic Content Complex?
Is academic content complex? To upset the apple-cart: not really! Academic content has some complexities, but they are largely descriptive (bibliographies) and linking (indexes, notes, footnotes, etc.). Then of course there is the large table, MathML or other equation and formula stuff to address, but there are only a few tool options from which something must be chosen. This applies to print and digital presentation.
There are other complexities that can be encountered, but those mentioned are the big ones. It should be noted that semantic elements such as metadata extractors are not "purpose elements", they are explicit, controlled vocabulary statements for the enclosed content for metadata manipulation and extraction.
Academic content makes heavy use of lists. A list is form over function every time. Purpose is an encapsulating property. Lists are incredibly powerful in XHTML and list items can effectively be mini-documents in their own right. Academic content has lists of content, lists of figures, maps, illustrations, tables, notes, glossaries, chronologies, bibliographies and indexes, even multiple indexes. XML strategies prevent the enormous power of XHTML lists being integrated into digital content strategies.
Take the bibliographic list item as an example. Great pain has gone into defining the 14 or so types of bibliographic list styles in various XML strategies. Nothing can change the fact that to make them useful in a book or online/e-book context, they must be presented with symmetry and in alphabetical order. They are list items with phrasing elements to describe which text is a first or last name, etc. XML makes a straightforward job hard.
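To make the point concrete, here is a minimal sketch of that "hard" alphabetical job against XHTML list items. The class names and sample entries are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# A hypothetical XHTML bibliography: list items with a surname phrasing span.
src = """<ul class="bibliography-rw">
<li><span class="surname">Zweig</span>, Stefan. The World of Yesterday.</li>
<li><span class="surname">Austen</span>, Jane. Persuasion.</li>
</ul>"""

root = ET.fromstring(src)
items = list(root.findall("li"))
# Alphabetical symmetry is a one-line sort on the surname phrasing element.
items.sort(key=lambda li: li.find("span[@class='surname']").text)
root[:] = items

for li in root:
    print(li.find("span[@class='surname']").text)
# Austen
# Zweig
```

The list items remain valid XHTML throughout, so they present correctly before, during and after the sort.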
This statement is from the person who designed DX-XML, used to tag the Taylor and Francis corpus of 20,000 books. It survives more or less to this day. I understand the limitations and expense required to maintain large-scale XML DTDs that must be used by 2,000 people and maintained and extended over time. It was good for the year 2000, but too much has changed to stay there.
What about this?
Moving the form/purpose argument up a notch, here is an example of our FoundationXHTML used for interactive Q&A content. In this case the tagging patterns must serve multiple purposes. These are stated after the elegant XHTML:
<div class="QAA-rw qa-association-rw">
<div class="block-rw abstract-rw">
<p>Match items in a list that are related. Can support more than one...</p>
</div>
<p class="question">Question content goes here. It can be any
valid XHTML including audio and video with fallback
images for print...</p>
<ol class="option-source shuffle">
<li>...</li>
</ol>
<p class="fbm correct">Correct</p>
<p class="fbm wrong">Wrong</p>
<p class="fbm wrong-reinforcement">Type your positive reinforcement message for wrong here</p>
<p class="fbm correct-reward">Type your positive reward message for correct here</p>
<button class="check" type="button">Check</button>
<button class="try-again" type="button">Try Again</button>
<button class="reset" type="button">Reset</button>
<button class="submit" type="button">Submit</button>
</div>
The attribute values that define "purpose" are expressed in a controlled vocabulary that is relatively self-explanatory.
Enormous design and testing effort has gone into this structure to ensure it works for print, online and offline, from children's books to academic tests. It directly exploits the standard content structures, and with incredibly simple CSS can present for all current delivery formats. The design guidelines are:
- The questions must be easy for a non-technical instructional designer to create and edit in an authoring/editing environment without seeing XML.
- They must be able to be presented highly graphically and work within other structures, and contain other structures.
- They must be an excellent target for processing for evaluation, feedback and other interaction mechanisms.
- They must be able to be broken apart so the question form is presented and evaluation can be carried out in a separate environment.
- They must be harvestable and recombined into new Q&A content.
- They must process to a database easily.
- The evaluation must be able to be persisted locally and remotely.
- They must be extractable, reusable and remixable.
- They must be able to make engaging user interfaces.
Every block of this structure expresses its purpose for a human, a processor and a presentation program... yet it is XHTML.
The real value of XHTML here is that the forms and their purpose statements can be distorted infinitely in presentation, without losing their processing values.
Yes, this can be done in XML, but the cost involved is very high. This structure can be used as a test section associated with, and inserted into, an academic book seamlessly because the form is correct, the purpose is clear, and the limits are defined.
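As an illustration of the "process to a database easily" and harvesting claims above, pulling a Q&A block apart is just a walk over class values. The fragment here is a simplified, hypothetical instance of the pattern shown earlier:

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical instance of the Q&A tagging pattern.
src = """<div class="QAA-rw qa-association-rw">
<p class="question">Question content goes here.</p>
<p class="fbm correct">Correct</p>
<p class="fbm wrong">Wrong</p>
</div>"""

def harvest(qa_xhtml):
    """Pull a Q&A block apart into a database-ready dict, keyed by class."""
    root = ET.fromstring(qa_xhtml)
    record = {"type": root.get("class")}
    for p in root.findall("p"):
        record[p.get("class")] = p.text
    return record

print(harvest(src))
```

The same record can be persisted, recombined into new Q&A content, or evaluated in a separate environment, because the class vocabulary travels with the content.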
Finally, what about "...high volume high extent processing of the STM nature"?
What is unique about STM (which is just a subset of academic content, which is a subset of all content)? This is about approach rather than any other issue, but it also indicates that the output process is separated from the XML. It shouldn't be. The XML must be ready to earn money instantly, always, without question, without a consultant. Processors should be on standby with their engines warm.
One of the bigger issues with high volume automated print PDF generation is how the XML is stored in the first instance, rather than the rendering technology.
Once the processors are defined for XHTML, they can do their relentless job.
We address this issue with the process/activity programmable IGP:Long Running Process Engine.
Need to extract 50,000 equations from lots of book XML with source references? Probably a few hours' work. If the grammar is expressed well and consistently, these tasks are straight-forward (of course content processing is never simple).
It can produce 500 print PDFs, Kindles and ePubs more or less simultaneously in an hour or so. Need 5,000 abstracts extracted for an online site? That will be amazingly fast AND delivered in the format they need - probably XHTML, but if they really must have DocBook, it is a trivial cross-walk.
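A sketch of that abstract-extraction job, under the same assumptions about class values used in the fragments in this document:

```python
import xml.etree.ElementTree as ET

# Two hypothetical book fragments, each carrying an abstract block.
books = [
    '<body><div class="block-rw abstract-rw"><p>Abstract one.</p></div></body>',
    '<body><div class="block-rw abstract-rw"><p>Abstract two.</p></div></body>',
]

def extract_abstracts(documents):
    """Harvest every abstract block; each result is already valid XHTML."""
    found = []
    for doc in documents:
        root = ET.fromstring(doc)
        for div in root.iter("div"):
            if "abstract-rw" in (div.get("class") or "").split():
                found.append(ET.tostring(div, encoding="unicode"))
    return found

for fragment in extract_abstracts(books):
    print(fragment)
```

Because the harvested blocks are XHTML already, they need no further processing before delivery to the online site.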
As far as processing speed is concerned, on a mid-level server PrinceXML can produce a 300 page book in 5-10 seconds from submitted XHTML/CSS depending on the count of floating objects and print-res images. Rendering speed is normally defined by quantity of generated content, section count and CPU power.
It is not enough for any XML strategy to define just XML tagging. It must define:
- How the XML will be processed for all and any required outputs and output scenarios.
- What granularity the XML content needs to be tagged at for commercial purposes.
- How the XML content can be reused and repurposed.
- Practical business outcomes in terms of cost and recovery.
If any of the above are deferred for later, the XML strategy will be either a failure or a money hole for the future.
The discussion on XML vs. XHTML invariably looks at XHTML as a "Web page" poor cousin of the "real thing" XML. It should of course also be remembered that XML is a "poor cousin" of SGML, according to some.
The fact that XHTML has a form/presentation component in its design is a foundation strength for digital content strategies. It is not a weakness. It allows specific digital content strategies to be built on a powerful and consistent foundation without doing the same work over and over again.
XHTML is also a highly abstracted and controlled vocabulary that elegantly walks the lines between simplicity, sophistication, extensibility and flexibility. It puts fences exactly where they need to be so coherent digital content strategies can be built quickly, easily and affordably.
XHTML is a strong long-lasting framework that can be built on to define any digital content strategy from trade to academic and encompassing new digital paradigms.
Any XML can be transformed to XHTML using the FoundationXHTML approach, and vice-versa. There is also content that should be neither XML nor XHTML until presentation or exchange time, and should be artfully stored in a database. But that is a different story.
XHTML is highly malleable with CSS, CSS3 in particular. This allows new presentation, layout, transforms and other properties to be manipulated at presentation time, letting content do new things. Even academic content can benefit from this - letting the note or footnote be presented as a roll-over rather than a distant link is an example. Letting related index terms be displayed and explored from within the content is another. Letting user annotations be applied and saved locally is yet another. It is difficult to work with these new paradigms when the content is created and stored as a single source so far from the point of action. It just wastes money; publisher's money.
XML has only a few basic rules. The simplicity is masterful. But it creates no foundation on which to build, so all the core content structures must be created synthetically for every content DTD. Custom XML for publisher content is consistently about reinventing the wheel, and seldom are the essential content foundation elements as elegantly achieved as they are by the content structures of XHTML. In fact there is usually an effort to design away from XHTML to avoid the web taint!
XML is extremely powerful for data exchange protocols (SOAP and ONIX are examples), and for consistent system interchange or submission. However it is always better for content to be processed to an interchange requirement.
Real world content is about tagging patterns that work from a controlled framework. For reuse they must be controlled blocks with a defined semantic granularity. DTD and Schema rules are great for interchange protocols, but bring very little to large bodies of valuable content except a larger Total Cost of Ownership bill.
No matter how clever, brilliant, structured, expensive, deep, wide or defined an XML is, it will always be transformed to XHTML to do business for the foreseeable future. No arguments.
Why not start with the best?