We see and repair a lot of e-book files, especially OEB and XML-first files that have been created by various digitization/production houses around the world and over the years. Somewhere along the road we appear to have become an e-book repair shop of last resort - sort of a "Pimp My e-book" where hopeless cases get rebuilt from the ground up. In 2008-9 we repaired and reprocessed around 2.7M pages for various publishers - mostly OEB, and a fair amount of miscellaneous XML. A record year, but maybe an indicator of the developing e-book market.
One of the most interesting features of reprocessing so much of other people's XML work is the lack of future value implicit in the core XHTML. There is usually a cloud of CSS style statements that are no more than presentation instructions, along with many weird and not-wonderful structures. There is never a need for <p class="bodytext"> or "btext" or whatever else your house style is. Why not let <p> just do its work, and only use class attributes when <p> really needs to be overloaded?
Book Structures
Probably the most important elements in an XML representation of a book, especially for reuse, restyling and multi-device application are those that define the core structure. While this may not be important to the enthusiast creating their first ePub, it is of serious consequence for any publisher. Publisher quality XML should always be ready to deliver all of the following content ownership strategies with minimum processing:
- It has to be dumb enough for the simplest devices, or be easy to dumb down for element- and CSS-deprived formats such as Mobipocket/Kindle.
- It has to have all the elements, attributes and values for complex print production.
- Custom XML extensions for all types of documents should be easy to add.
- Styling should be simple to do, but offer a lot of layout and presentation options.
- It has to be excellent for Online rendering and should be subscription ready.
- It must be ready for future mobile formats.
- It must support current and future variable content strategies, i.e. it should be reducible to semantic content objects.
- It should be inexpensive to produce and process.
It is not really enough to produce XHTML just for a single format.
To allow book sections to be freely merged and separated, we include all book content in a "galley" <div> that defines the extent of the presentable book. The galley is an inheritance device that makes advanced CSS techniques possible. Of course, these can only be used for Print, Online and ePub content, and still have to be dumbed down for Mobipocket and other devices.
<div class="galley">
<div class="frontmatter TitlePage" id="fm-1"> ... </div>
<div class="body Chapter" id="ch-1"> ... </div>
<div class="body Chapter" id="ch-2"> ... </div>
<div class="backmatter Index" id="bm-1"> ... </div>
</div>
The objective is to always use XHTML Strict elements and keep the XML nesting as flat as possible. For this reason, multiple class values are used rather than XML nesting. The frontmatter, body and backmatter class values are just as accessible to processors and CSS selectors using this strategy, but the content is far easier to assemble, disassemble and reuse. Multiple class values reduce XML construction and maintenance costs, although few tools make them easy to apply.
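As a rough illustration of how a processor can reach those class values without any extra nesting, here is a minimal Python sketch using only the standard library. The sample markup mirrors the galley above with placeholder paragraph content, and the helper name `sections_with_class` is hypothetical, not part of any production tool.

```python
import xml.etree.ElementTree as ET

# Placeholder galley; the <p> content is invented for the example.
GALLEY = """<div class="galley">
  <div class="frontmatter TitlePage" id="fm-1"><p>Title page</p></div>
  <div class="body Chapter" id="ch-1"><p>Chapter one</p></div>
  <div class="body Chapter" id="ch-2"><p>Chapter two</p></div>
  <div class="backmatter Index" id="bm-1"><p>Index</p></div>
</div>"""


def sections_with_class(galley, token):
    """Return the direct child sections whose class list contains `token`."""
    return [
        section
        for section in galley  # iterating an Element yields its direct children
        if token in section.get("class", "").split()
    ]


galley = ET.fromstring(GALLEY)
body_ids = [s.get("id") for s in sections_with_class(galley, "body")]
index_ids = [s.get("id") for s in sections_with_class(galley, "Index")]
print(body_ids)   # ['ch-1', 'ch-2']
print(index_ids)  # ['bm-1']
```

Because every section is a direct child of the galley, pulling out all body sections, or just the index, is a single pass over one level of the tree.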
Titles
The rule is: stick with standard XHTML headers wherever possible. We reserve <h1> for titles and <h2>-<h6> for publishing A-Head to E-Head, or Heading-1 to Heading-5 in word processor parlance.
Of course, for simple chapter titles you can use <h1> by itself, but a title-block outer structure gives better control and more flexibility. Title block styling can then inherit from its XML structural containers, and the vocabulary doesn't have to be increased. It looks like this:
<div class="title-block">
<p class="title-num"><span class="title-num-label">Chapter </span>1</p>
<h1>Title</h1>
</div>
The span on the number label provides a processing hook for generating various TOC numbering strategies for different devices and formats. The number can also be generated or regenerated as required in a variable document.
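To make the processing hook concrete, here is a small Python sketch that builds two different TOC entry styles from the same title block. The two output styles and the function name `toc_entry` are assumptions made for the example; the real numbering strategies depend on the target format.

```python
import xml.etree.ElementTree as ET

TITLE_BLOCK = """<div class="title-block">
<p class="title-num"><span class="title-num-label">Chapter </span>1</p>
<h1>Title</h1>
</div>"""


def toc_entry(title_block, with_label=True):
    """Build a TOC line, optionally keeping the label held in the span."""
    num_p = title_block.find("p[@class='title-num']")
    label_span = num_p.find("span[@class='title-num-label']")
    label = (label_span.text or "").strip()
    # The number is the text that follows the span inside the paragraph.
    number = (label_span.tail or "").strip()
    title = (title_block.find("h1").text or "").strip()
    if with_label:
        return f"{label} {number}: {title}"
    return f"{number}. {title}"


block = ET.fromstring(TITLE_BLOCK)
print(toc_entry(block))                    # Chapter 1: Title
print(toc_entry(block, with_label=False))  # 1. Title
```

Since the label and the number are separate nodes, either can be dropped, replaced or regenerated without touching the title itself.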
This title block is a trade book approach, but for academic or educational content it can be extended as far as you like. Note that we consistently use a hierarchical notation - title | num | label. This ensures consistency and helps with maintenance of the core Foundation XHTML selector vocabulary.
<div class="title-block">
<p class="title-num"><span class="title-num-label">Chapter </span> 1</p>
<h1>Title</h1>
<p class="title-sub">Document sub-title</p>
<p class="title-author">Author's name</p>
<p class="title-other">anything else you want to put here</p>
<div class="epigraph">
<p>Epigraph content</p>
<p class="source">epigraph source</p>
</div>
</div>
This title block can go in any section, and styles and presentation can inherit from the parent class attribute set or the ID. We use exactly the same title structure inside poems and other special content. It is also really easy to dumb down for Mobipocket (aka Kindle) and other feature deprived eBook devices.
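One way to picture the dumbing down is the Python sketch below, which flattens a title block into bare elements with no containers and no class attributes. This is only an assumed interpretation of "dumbing down" for the sake of illustration; the function name `flatten_title_block` is hypothetical and this is not the production processor's actual behaviour.

```python
import xml.etree.ElementTree as ET

TITLE_BLOCK = """<div class="title-block">
<p class="title-num"><span class="title-num-label">Chapter </span>1</p>
<h1>Title</h1>
</div>"""


def flatten_title_block(block):
    """Return plain <p>/<h1> elements: no wrapper div, no spans, no classes."""
    flat = []
    for element in block.iter():
        if element.tag in ("p", "h1"):
            simple = ET.Element(element.tag)
            # Merge all nested text and normalise whitespace.
            simple.text = " ".join("".join(element.itertext()).split())
            flat.append(simple)
    return flat


block = ET.fromstring(TITLE_BLOCK)
flat_markup = [ET.tostring(e, encoding="unicode") for e in flatten_title_block(block)]
print(flat_markup)  # ['<p>Chapter 1</p>', '<h1>Title</h1>']
```

The consistent structure is what makes this kind of reduction mechanical: the processor always knows where the label, number and title live.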
Plus Headers
So finally, a really useful XML for any generic book is standard XHTML with just a few <div> containers and some smart CSS class attribute values.
<div class="galley">
<div class="frontmatter TitlePage" id="fm-1"> ... </div>
<div class="body Chapter" id="ch-1"> ...
<div class="title-block" id="tb-1">
<p class="title-num" id="tn-1"><span class="title-num-label">Chapter </span> 1</p>
<h1 id="tit-1">Title</h1>
</div>
<h2 id="ah-1">A-Head</h2>
<p>This paragraph isn't indented...</p>
<p>This paragraph is indented by the CSS...</p>
<h3 id="bh-1">B-Head</h3>
<p>This paragraph isn't indented...</p>
<p>This paragraph is indented by the CSS...</p>
<h4 id="ch-1">C-Head</h4>
<p>This paragraph isn't indented...</p>
<p>This paragraph is indented by the CSS...</p>
<h5 id="dh-1">D-Head</h5>
<p>This paragraph isn't indented...</p>
<p>This paragraph is indented by the CSS...</p>
<h6 id="eh-1">E-Head</h6>
<p>This paragraph isn't indented...</p>
<p>This paragraph is indented by the CSS...</p>
</div>
</div>
We don't nest header sections. Nesting is only vaguely useful if you are extracting section content, which is not often a requirement, and it is easier to use some light XSL when it is. Meanwhile, header IDs can be used for linking and for extended TOC generation based on their element value.
Elegant and future-ready XML tagging really is this easy, even for textbooks and academic books (with other structures, of course). You would never think so when you see the convoluted XML structures and verbose style statements in so many XHTML ePub files. If it were any simpler, value would probably be lost.
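As a hedged sketch of extended TOC generation from flat headers, the Python below walks a chapter in document order and derives the nesting depth from the header element alone. The sample chapter, the `LEVELS` mapping and the function name `header_toc` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Sample chapter with flat (un-nested) headers; content is invented.
CHAPTER = """<div class="body Chapter" id="ch-1">
<h2 id="ah-1">A-Head</h2>
<p>Text</p>
<h3 id="bh-1">B-Head</h3>
<p>Text</p>
<h2 id="ah-2">Second A-Head</h2>
</div>"""

# A-Head to E-Head mapped to TOC depth 1-5.
LEVELS = {"h2": 1, "h3": 2, "h4": 3, "h5": 4, "h6": 5}


def header_toc(section):
    """Return (depth, id, text) for every header, in document order."""
    entries = []
    for element in section.iter():
        depth = LEVELS.get(element.tag)
        if depth is not None:
            text = "".join(element.itertext()).strip()
            entries.append((depth, element.get("id"), text))
    return entries


toc = header_toc(ET.fromstring(CHAPTER))
for depth, target, text in toc:
    print("  " * (depth - 1) + f"{text} -> #{target}")
```

The hierarchy is recovered from the element names, so the source markup never needs nested section containers to produce a nested TOC.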
Styling
The simplicity of the XHTML is matched with more complex CSS. However, it is very easy to style for Online and ePub with either global or section-specific selectors. There are enough block structural components to do some very interesting things. The following example CSS gives a modern text look, with the device's default font for body text and sans-serif titles and headers. (This is taken from the stylesheet used in the eScape Default format.) You can of course style the elements how you like.
/* Get the bodytext indent working */
p {margin: 0; text-align: justify;}
p + p {text-indent: 1.3em;}
.galley {font-size: 0.9em; padding-right: 1em;}
/* Handle the headers general fonts */
.galley h2, .galley h3, .galley h4, .galley h5, .galley h6 {
font-family: sans-serif; font-size: 1em;
text-align: left; margin: 0; padding: 0;
}
/* Handle the specific header styling */
.galley h2 {
text-transform: uppercase; text-align: center;
padding: 1.5em 0 0.5em 0;
}
.galley h3 {
font-weight: bold;
padding: 1em 2em 0.5em 0;
}
.galley h4 {
font-style: italic; font-weight: bold;
padding: 1em 2em 0.25em 0;
}
.galley h5 {
font-style: normal; font-weight: bold;
padding: 1em 2em 0.25em 0; color: #000000;
}
.galley h6 {
font-style: italic; text-align: left;
padding: 1em 2em 0.25em 0; color: #000000;
}
/* The most work is in the title block, where it should be */
.title-block {
margin: 1em 0 1em 0;
padding: 2em 0 2em 0;
border-top: 1px solid #000000;
border-bottom: 1px solid #000000;
}
/* This sets the font styling for all the title block elements */
.title-block h1, .title-block .title-sub, .title-block .title-num,
.title-block .title-author, .title-block .title-contributor,
.title-block .title-other {
font-family: sans-serif;
font-weight: normal;
text-align: left;
text-indent: 0;
margin: 0;
}
/* Next the specific title block element styling */
.title-block h1 {
font-size: 1.5em; line-height: 1.6em;
font-weight: bold;
padding: 0 0 0.5em 0; color: #000080;
}
.title-block .title-sub {
font-size: 1.25em; padding: 0 0 1em 0;
}
.title-block .title-num {
font-size:1.5em; font-weight: bold; font-style: normal;
padding: 0 0 0.5em 0; color: #808080;
}
.title-block .title-author, div.title-block .title-contributor {
font-size: 1.25em; font-weight: normal;
font-style: italic; padding: 0 0 0.5em 0;
}
.title-block .title-other {
font-size: 0.9em;
padding: 1em 0 0.5em 0; color: #000080;
}
And Devices That Are Not ePub, Print or Online
All this is too complex for the geriatric devices and formats such as MS Reader, Mobipocket/Kindle and Palm. In our production environment the FoundationXHTML passes through the IGP:Formats on Demand (FOD) processor attached to IGP:FLIP. The FOD processor dumbs the XHTML and CSS down differently and appropriately for each format and produces the appropriate package. The same processor does other groovy things such as style optimization and stylesheet reduction, so the large default stylesheets are reduced to only the styles applied in a specific document.
Because the XHTML and CSS are so consistent, this results in a format that is optimally styled to the limits of the target device. Converting ePub to Mobipocket for the Kindle using the Mobipocket or other conversion tools does not work well. In fact it usually produces a significantly sub-standard product. But that is another story.
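The idea behind stylesheet reduction can be sketched in a few lines of Python. This is a deliberately crude approximation and not the FOD processor's method: it keeps a selector only if every class and element name it mentions occurs somewhere in the document, and it ignores combinators, IDs, pseudo-classes, comments and at-rules. All function names here are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<div class="galley">
<div class="body Chapter" id="ch-1">
<div class="title-block"><h1>Title</h1></div>
<h2 id="ah-1">A-Head</h2>
<p>Text</p>
</div>
</div>"""

CSS = """
p {margin: 0;}
.galley h2 {font-family: sans-serif;}
.galley h6 {font-style: italic;}
.title-block h1, .title-block .title-sub {font-weight: normal;}
"""


def used_names(root):
    """Collect every element name and class value used in the document."""
    tags, classes = set(), set()
    for element in root.iter():
        tags.add(element.tag)
        classes.update(element.get("class", "").split())
    return tags, classes


def selector_is_used(selector, tags, classes):
    """Crude check: all classes and element names in the selector must occur."""
    wanted_classes = re.findall(r"\.([\w-]+)", selector)
    without_classes = re.sub(r"\.[\w-]+", " ", selector)
    wanted_tags = re.findall(r"[A-Za-z][\w-]*", without_classes)
    return all(c in classes for c in wanted_classes) and all(
        t in tags for t in wanted_tags
    )


def reduce_stylesheet(css, root):
    """Drop selectors (and then empty rules) that cannot apply to the document."""
    tags, classes = used_names(root)
    kept = []
    for selectors, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        live = [
            s.strip()
            for s in selectors.split(",")
            if selector_is_used(s, tags, classes)
        ]
        if live:
            kept.append(f"{', '.join(live)} {{{body.strip()}}}")
    return "\n".join(kept)


reduced = reduce_stylesheet(CSS, ET.fromstring(DOC))
print(reduced)
```

With this sample the `.galley h6` rule and the `.title-block .title-sub` selector are dropped, because the document contains no `<h6>` and no `title-sub` class. A consistent selector vocabulary is what makes even a simple filter like this workable.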