The main problem with any potential XML content strategy is deciding which XML DTD or schema to use. This
decision affects all publishers, including corporate, institutional and government publishers.
As part of the "which XML" decision, you no longer have to think about what you want to do with your
content in the future. If you do not want your content ready
for absolutely anything and everything, you shouldn't be thinking about XML right now.
Ready for everything means at least all of the following:
We see and repair a lot of e-book files, especially OEB and XML-first files created by various digitization and production houses around the world over the years. Somewhere along the road we appear to have become an e-book repair shop
of last resort, a sort of "Pimp My e-book" where hopeless cases get
rebuilt from the ground up. In 2008-9 we repaired and reprocessed around 2.7M pages for various publishers, mostly OEB with a fair amount of miscellaneous XML. It was a record year, but maybe also an indicator of the developing e-book market.
One of the most interesting findings from reprocessing so much of other people's XML work is the lack of future value in the core XHTML. There is usually a cloud of CSS style statements that are no more than presentation instructions, along with many weird and not-wonderful structures. There is never a need for <p class="bodytext"> or "btext" or whatever else your house style calls it. Why not let <p> just do its work, and use class attributes only when <p> really needs to be overloaded?
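
As a rough illustration of that kind of cleanup, here is a minimal Python sketch that strips body-text class names from <p> elements and leaves any other class alone. It is not our production tooling: the function name `strip_redundant_classes` and the `REDUNDANT_CLASSES` set are made up for this example, and it assumes the input is well-formed XHTML in the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical house-style names that only restate "this is body text".
REDUNDANT_CLASSES = {"bodytext", "btext"}


def strip_redundant_classes(xhtml: str) -> str:
    """Remove redundant class names from <p>, keeping meaningful ones."""
    # Serialize XHTML elements without a namespace prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    for p in root.iter(f"{{{XHTML_NS}}}p"):
        classes = p.get("class", "").split()
        kept = [c for c in classes if c not in REDUNDANT_CLASSES]
        if kept:
            # A real overload of <p> survives, e.g. class="epigraph".
            p.set("class", " ".join(kept))
        elif "class" in p.attrib:
            # Nothing meaningful left: let <p> just do its work.
            del p.attrib["class"]
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p class="bodytext">One</p>'
    '<p class="btext epigraph">Two</p>'
    "</body></html>"
)
cleaned = strip_redundant_classes(sample)
print(cleaned)
```

In the sample, the first paragraph ends up as a bare <p>, while the second keeps only its "epigraph" class, which is the one case where <p> is actually being overloaded.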