Things are picking up apace in 2010 and the blog is constantly nagging me for being negligent.
IGP:FLIP imports a wide range of formats and converts them to FoundationXHTML, including word-processor files, spreadsheets, presentations, text and other items. The text and structure aspects of the content are imported substantially intact. What goes missing is the original style information.
We constantly get asked "Why can't IGP:FLIP import XML?" Well of course it can, but not through the general office formats importer. XML is not a format. It is an encoding standard that lets anyone use agreed vocabularies to define machine-to-machine data interchange. A specific XML schema or DTD is getting close to being a format, but is usually a long way from being usable except in an opaque XML editing or processing environment.
Importing DocBook
We recently created a DocBook importer for IGP:FLIP to enable a customer to import several hundred back-list books that have been tagged in DocBook. Naturally they didn't want to pay for the processor and said the application should be able to import DocBook natively. We agreed. We told them the problem: there is no such thing as standard DocBook. The "XML consultant universe" always creates custom DTD extensions for every publisher, even if the books are simple trade books.
We cracked their files open and there they were - the custom extensions. In just six hundred files there were four versions, all with different strategies. Considerately, the DTD author had included love notes to future users stating what things had not been completed and had yet to be changed. The IGP:FLIP importer now had to handle native DocBook and four version variants. This is not particularly hard work, but it is tedious work and has to go through a lot of testing. Of course, after importing there is the inevitable wrongly tagged content that then has to be corrected.
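To give a feel for what handling variants involves, here is a minimal sketch. It assumes each custom extension element can simply be aliased back to a native DocBook element before conversion. The extension names (bodypara, chaptitle), the tiny mapping tables and the convert function are all invented for illustration; this is not the IGP:FLIP importer, and real variants are rarely this tidy.

```python
import xml.etree.ElementTree as ET

# Native DocBook element -> XHTML element (a tiny, illustrative subset).
DOCBOOK_TO_XHTML = {"chapter": "div", "title": "h1", "para": "p"}

# Hypothetical custom extension elements -> native DocBook equivalent.
# A real importer would need one such table per DTD variant.
VARIANT_ALIASES = {"bodypara": "para", "chaptitle": "title"}


def convert(elem: ET.Element) -> ET.Element:
    """Recursively convert a DocBook-like tree into generic XHTML elements."""
    # Step 1: normalise any custom extension name to its native name.
    native = VARIANT_ALIASES.get(elem.tag, elem.tag)
    # Step 2: map to XHTML, keeping the native name as a class value.
    # Unknown elements fall back to <div> so no content is dropped.
    out = ET.Element(DOCBOOK_TO_XHTML.get(native, "div"), {"class": native})
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(convert(child))
    return out


SOURCE = (
    "<chapter><chaptitle>Kittens</chaptitle>"
    "<bodypara>Feed them often.</bodypara></chapter>"
)
xhtml = ET.tostring(convert(ET.fromstring(SOURCE)), encoding="unicode")
print(xhtml)
```

Keeping the original element name as a class value means nothing is lost in the conversion, and wrongly tagged content can still be found and corrected afterwards.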
Now I don't think for a minute that XML expert consultants do this on purpose. I have had discussions (along the lines of heated debates) with many and believe they are passionate about what they do. The problem is they have their blinkers on and think that XML elements defined in a DTD somehow deliver better XML content for publishers and that DTDs can keep on changing and still deliver future value. It is not difficult to see, even from the simple example above, that this is not true. Incomplete XML strategies will never deliver future value, and worse, publishers won't know they have a collection of unusable XML content because the process is so opaque.
The future of publisher content is instant reuse and delivery through a wide range of technologies, formats and devices. Very little publisher XML content is ready for the task ahead.
DTD Rigor Mortis
In the emerging world of reusable and remixable content, it must be possible to more or less instantly create a new work from any existing XML files. This should be achievable at any scale, from a simple paragraph mix-up through to major sections, and yet nothing (other than IGP:FLIP) can do this or even demonstrate it.
(XML databases don't figure in this discussion. They are not about valuable XML content tagging, but searching on XML and text properties for certain outcomes which may or may not have value. The cost of ownership is very high and they have to be programmed to do anything.)
I remember reading once in some database work that any finite set of data has an infinite set of interpretations. That is a very loose paraphrase from memory but the idea is clear: the future of content is infinite. So any XML strategy needs to be infinite-ready from the start! What if I need to be able to take a poetry extract, mix it with some STM equations and present it as drama? Now!
XHTML to the rescue
This is why XHTML and tagging patterns are the answer. The base elements of XHTML are generic and boil down to <div>, <h1>-<h6>, <p>, <ul>, <ol>, <table>. Of course there are more elements than this, but these are the big ones for content and form the core kitset. The rules on nesting are sensible and generic. It doesn't get more generic. The neutrality of these elements is perfect as a foundation for really flexible content strategies.
Tagging Patterns NOT DTDs
Every XML schema has significant controlled grammars and tagging patterns are no different. Tagging patterns make things easier and give the same control if used from a controlling environment. A tagging pattern is not an example of how content should be tagged; it defines how content must be tagged, the element/attribute/value patterns that must be used, and how those combinations can be extended or reduced for a specific requirement. In this respect it is like Microformats, except it is extended to include a much wider set of content beyond social contracts. IGP:FoundationXHTML naturally supports Microformats.
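As a rough illustration of the element/attribute/value idea, the sketch below treats a pattern as a table saying which class values a given XHTML element must carry, and reports every element that breaks it. The class names (section, note-box, para, caption) and the check_pattern function are assumptions made up for this example; they are not the IGP:FoundationXHTML vocabulary or any IGP:FLIP API.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern: element name -> class values, one of which is required.
# These values are invented for illustration only.
PATTERN = {
    "div": {"section", "note-box"},
    "p": {"para", "caption"},
}


def check_pattern(xhtml: str) -> list[str]:
    """Return one message per element that does not follow the pattern."""
    problems = []
    root = ET.fromstring(xhtml)
    for elem in root.iter():
        allowed = PATTERN.get(elem.tag)
        if allowed is None:
            continue  # element is not governed by this pattern
        classes = set(elem.get("class", "").split())
        if not classes & allowed:
            problems.append(f"<{elem.tag}> needs one of {sorted(allowed)}")
    return problems


SAMPLE = """<div class="section">
  <p class="para">Tagged correctly.</p>
  <p>No class at all.</p>
</div>"""

problems = check_pattern(SAMPLE)
print(problems)
```

Extending the pattern for a specific requirement then amounts to adding allowed values to the table, not changing the element set, which is the point of the contrast with DTD extensions.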
The difference is putting XML tags onto content, vs. putting content into XML structures. This perspective shift delivers very different results for the long term ownership of content, primarily because a more extensible grammar can be defined.
IGP:FLIP is an XML production environment that ensures content always goes into a valid and well-formed XML container, so the content is always ready for instant output and use at any time. From import to detailed tagging, it can never NOT be ready for instant use.
Tagging patterns allow and encourage incremental tagging value addition. We have a client who is using the environment in this way. They are processing very complex magazine and trade book content and a large part of their business is selling content on to other organizations.
In the standard product process we XML tag to IGP:FoundationXHTML and instantly output to text files, ePub, Mobipocket and Online XHTML with print and online images. Recently they wanted to extract a speciality book from a general pet ownership and care book. The book was sitting in the production environment and just needed value-add custom properties to semantically classify the various structures - sections, sub-sections, note-boxes, images, etc. The value-add tagging took just a few hours, and they instantly had the "For Cats Only" edition.
This is the type of flexibility that is needed in publishing today. The XML output matches today's business for today's revenue streams, and can be incrementally tagged at any time to exploit new business revenue opportunities.
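The extraction step can be pictured with a small sketch. It assumes the value-add properties are expressed as extra class values on section-level <div> elements; the species-cat and species-dog values, the sample book and the extract_edition function are hypothetical and only stand in for whatever properties a real project would define.

```python
import copy
import xml.etree.ElementTree as ET

# A toy "general pet care" book; the species-* class values are invented.
BOOK = """<div class="book">
  <div class="section species-cat"><h2>Feeding your cat</h2></div>
  <div class="section species-dog"><h2>Walking your dog</h2></div>
  <div class="section species-cat species-dog"><h2>Vet visits</h2></div>
</div>"""


def extract_edition(xhtml: str, wanted: str) -> ET.Element:
    """Build a new book holding only sections tagged with the wanted value."""
    source = ET.fromstring(xhtml)
    edition = ET.Element("div", {"class": "book"})
    for section in source.findall("div"):
        if wanted in section.get("class", "").split():
            # Copy so the source book is left untouched.
            edition.append(copy.deepcopy(section))
    return edition


cats = extract_edition(BOOK, "species-cat")
titles = [h.text for h in cats.iter("h2")]
print(titles)
```

The source book is never modified, so the same content can be cut a different way tomorrow by adding another property and asking for it.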
More on tagging patterns and how they reduce the risk and cost of long-term content ownership soon.