There have been numerous requests on a number of forums for something that works from Open Office directly to ePub, like a plug-in. Well, it's here now. It's called Infogrid Pacific eScape and, sorry, it's not a plug-in. We see it as the tag-team partner for AZARDI as we move ahead (slowly).
eScape is our "free for non-commercial use" converter, designed to make creating a respectable ePub from Open Office as easy as it can be, or should be. Of course you have to follow some rules.
Background
There is a functional disjunct between word processors and XML formats. Open Office is style driven at the paragraph and character level, with mechanisms for generating page layouts, numbering and other components. It also has hard-to-use frames to complicate the situation.
The XML required for e-book and other content reuse needs to represent document structure using structural elements, attributes and values. This is then decorated by the CSS for the final presentation. To address this "mismatch" we have created an OTT (Open Office Writer Template) with a number of styles representing structure, rather than layout. We cleverly call that Structure-Styling. (A term also used in the hair-care industry. The two are unrelated.) That means you are using Open Office styles to define the structure NOT the look of the document. You have to think of the styles as instructions to the eScape ePub processor. But all this is explained exhaustively in the documentation.
The Template
The basic OTT file defines a 6in x 9in book with 10pt font and 12pt line height so the amount of content is close to a real book page. However, there the similarity ends. There are no headers and footers, just highly coloured section structure indicators, and styled paragraphs representing blocks. There are about 30 custom structural styles defined. But you probably have to see it to get it. It's available on our website here.
Before downloading and using eScape, download the annotated tutorial document to see if you can work with this type of ePub production system. This presents the case that to create consistent ePubs, you need consistent XHTML. If you think "I can do that!", then download eScape. eScape gives you the ability to go wild with the CSS styles as part of the production process.
What happens inside?
Quite a lot.
Your ODT is converted to XHTML using the Open Office format generators. This is pretty messy XHTML, but so far it is reasonably consistent across a few versions of Open Office, with minor variations. eScape then processes that very messy, not quite valid OOo XHTML and removes all the gazillions of generated metric styles that are not required, until finally it is left with the svelte and useful *-igp style set.
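As a rough illustration of that clean-up step, here is a minimal Python sketch. It is not eScape's actual code. It assumes that "*-igp" means class names ending in "-igp", and the class names in the sample (P12, T3, para-igp) are made up for the example.

```python
import re

# Matches a class attribute together with its leading whitespace.
CLASS_ATTR = re.compile(r'\sclass="([^"]*)"')


def keep_igp_classes(xhtml, suffix="-igp"):
    """Drop every class name that does not end in `suffix` (an assumed convention)."""

    def rewrite(match):
        kept = [name for name in match.group(1).split() if name.endswith(suffix)]
        if not kept:
            # Nothing structural left, so remove the attribute altogether.
            return ""
        return ' class="' + " ".join(kept) + '"'

    return CLASS_ATTR.sub(rewrite, xhtml)


# Hypothetical sample: P12 and T3 stand in for generated layout styles.
sample = '<p class="P12 para-igp">Text</p><span class="T3">x</span>'
cleaned = keep_igp_classes(sample)
print(cleaned)
```

A real processor would work on a parsed document rather than on raw text, but the filtering rule is the same: structural styles stay, generated layout styles go.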
Next, paragraph styles are turned into XML block styles, generated content is inserted, everything is cleaned up and IDs are generated. This creates an intermediary IGP:FoundationXHTML (FX) file. The FX is then passed to the ePub format generator, which zooms up and down the file splitting it into sections and creating the *.opf, manifest, *.ncx, cover page, and anything else required. Finally it zips it all together and drops an ePub into your output location. As certain software likes to say at such moments...ta-da. This version does not self-validate. Send the file to Bookworm to make sure it's a conforming ePub.
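The final zipping step has one rule worth knowing if you ever repackage an ePub by hand: the mimetype file must be the first entry in the archive and must be stored uncompressed. The sketch below shows that with Python's standard zipfile module. It is only an illustration, not how eScape does it. The OEBPS/ paths and file names are assumptions, and the file contents are empty placeholders rather than a valid package.

```python
import io
import zipfile

# Standard container file that tells a reader where the *.opf lives.
# The OEBPS/content.opf path is an assumption for this example.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def zip_epub(target, files):
    """Write an ePub container to `target` (a path or a file-like object).

    `files` maps archive paths to their content.
    """
    with zipfile.ZipFile(target, "w") as zf:
        # mimetype goes in first and uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


# Placeholder content only; a real book needs a proper OPF, NCX and XHTML.
buffer = io.BytesIO()
zip_epub(buffer, {
    "OEBPS/content.opf": "<package/>",
    "OEBPS/toc.ncx": "<ncx/>",
    "OEBPS/section-001.xhtml": "<html/>",
})

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
print(names)
```

Pass a file name instead of the in-memory buffer to write the archive to disk. The result still needs checking with a validator.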
The CSS is made available in a directory. We give two standard designs at this stage, and may add to them later, or even put up a location where people can share designs. You can use the supplied CSS files as templates to create your own layouts. The igp-escape-default.css is annotated to enable you to get creative quickly.
And in closing
That's about it for this first post on eScape. Just a few notes on limitations:
- This version doesn't support images other than the cover.
- Basic table support is there, but sophisticated styling takes a bit of work.
- Section numbering is simultaneously crude and sophisticated. Use it to see what I mean.
- A few other small niggles in the "it would be nice if" category. Send us your comments for consideration. Right here on this blog is the place.
Note: eScape is only licensed free for individual use and non-commercial work.
I've been looking for an EPUB authoring software for some months. Thank you for eScape. Do you plan a release for Linux?
Posted by: Dmitri Minaev | Tuesday, February 24, 2009 at 02:54 PM
Dmitri.
Linux version coming up very shortly. It will be a Deb if that is OK. We just got a gentle walloping at Mobileread for the same reason, and we are a Linux development house. Go figure! Please give us your feedback and complaints. It's a different system to most, and we need a bit of feedback.
Posted by: Richard Pipe | Tuesday, February 24, 2009 at 06:36 PM
Only one snag. Export as XHTML (.html) in Open Office. File exported. Open eScape. Get input file. Selected exported html file. eScape replies on the lines 'select xhtml generated by Open Office ...'
So what's happening?
Posted by: CNH | Monday, March 02, 2009 at 12:30 AM
Thanks for the feedback.
That's our omission in the instructions. Sorry about that. The system is hard coded to look for an *.xhtml extension at present. The reason for this is a bit of history on browser and parsing behaviour.
The correct filename extension for XHTML is *.html. There is no official *.xhtml mimetype. When files are sent from a server, the browser will interpret .html using the SGML parser instead of XML parsing unless the server passes the mimetype application/xhtml+xml.
Mozilla in their wisdom wanted the browser to understand which parser to use from the file name for local file opening so they made up the *.xhtml convention, which fortunately Safari and Opera have adopted. Of course IE doesn't know a thing about this xhtml stuff and just acts dumb. If you double click the file, you can see the raw exported XHTML in Firefox, Safari/Chrome or Opera.
In eScape this is not really an issue, just a habit we have here of always saving XHTML as *.xhtml and we haven't communicated this clearly anywhere in the documentation. So please rename your file to *.xhtml and all should be OK, and use the *.xhtml extension option next time when exporting.
I will upgrade the documentation accordingly, and also in the next version we will fix this up so either *.html or *.xhtml can be loaded, and we will do our Open Office check internally. For interest, the reason this check exists is to prevent arbitrary XHTML/HTML files being loaded that make the application behave erratically!
Posted by: Richard Pipe | Monday, March 02, 2009 at 09:28 AM
My OpenOffice Writer 3.0 doesn't even have an option to export as XHTML - if I use Save As instead of Export, it has "HTML (OpenOffice.org Writer Document)" (which claims to be HTML 4.0 Transitional, but actually isn't quite), and it uses upper-case tag/attribute names so isn't parsable as (case-sensitive) XHTML. Easily fixed with a pass through HTML Tidy, of course, but... ???
Posted by: Anonymous Coward | Tuesday, March 10, 2009 at 12:35 PM
I have added a page to the tutorial on how to save as XHTML. You can open the tutorial document here http://www.publisherdams.com/reader/content/c-0002184/?a=lc . Go to the last page titled originally "Saving the File as XHTML". Hope that clears up the problem.
Posted by: Richard Pipe | Wednesday, March 11, 2009 at 08:25 PM
'fraid not - that says it's under Export, but it isn't. My Export filetype selector has:
PDF - Portable Document Format (.pdf)
-------------------------------
BibTeX (.bib)
LaTeX 2e (.tex)
MediaWiki (.txt)
that is all (I didn't count the dashes in the second line). Maybe you have some addon installed?
[Ah...after a bit of poking around, I found the answer: it's in the xsltfilter sub-package, which I don't have installed!]
Posted by: Anonymous Coward | Thursday, March 12, 2009 at 03:40 PM
Thanks for that info. I will put a note in the tutorial. "If you can't see the XHTML option, you probably don't have the xsltfilter sub-package installed". I found the option bit in the OO help file so will include that.
I am one of those close the eyes and install everything types. I use OO on Ubuntu and Vista, and around the office we have every version from 2.2 to 3.0. (about 40 workstations and 20 laptops all running OO on most Windows, and Linux versions available). We thought we had it reasonably covered with various tests. Guess we are all "Install everything" artists.
Again, thanks for the feedback.
Posted by: Richard Pipe | Thursday, March 12, 2009 at 05:52 PM
sorry
Posted by: saravanan | Saturday, March 14, 2009 at 02:36 PM
I've been playing with eScape and it's really great, but there are a couple of things that bug me:
1) Localization. The terms "Title page" and "Content" in the TOC seem hardcoded. Is there any way to translate that?
2) Support for images inside the text. This could be a deal breaker. Are you planning to add this in the near future, even in a simple way?
Also, while we are at it, in the tutorial (section "Text file Preparation", points 8 through 13) it says to merge paragraphs together. But I need paragraphs in my text. What happens if I omit this step?
Posted by: Jordi Mustieles | Wednesday, April 22, 2009 at 09:40 PM
Jordi, Thanks for the feedback.
Last point first. That set of instructions is conditional for Gutenberg text. It's definitely ambiguous so I have corrected that. This little Regular Expression trick joins adjacent lines which are broken as individual paragraphs, and creates the paragraph break where there is an empty line. That is the only time it should be used. If your paragraphs are fine - jump these steps.
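For anyone who wants to see the idea outside Open Office, here is the same transformation sketched in Python. It is not the Find & Replace expression from the tutorial, just an illustration: lines separated by a single line break are joined, and an empty line marks a paragraph break. The sample text is made up.

```python
import re


def rejoin_paragraphs(text):
    """Join hard-wrapped lines; start a new paragraph at each empty line."""
    # Normalise Windows line endings first.
    text = text.replace("\r\n", "\n")
    # One or more empty (or whitespace-only) lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, each remaining line break becomes a single space.
    return [re.sub(r"\s*\n\s*", " ", block).strip()
            for block in blocks if block.strip()]


# Made-up sample in the hard-wrapped style of a plain-text e-book.
sample = "It was a dark\nand stormy night.\n\nThe rain fell\nin torrents.\n"
paragraphs = rejoin_paragraphs(sample)
print(paragraphs)
```

Text whose paragraphs are already intact has no line breaks inside a paragraph, which is why the step can be skipped for it.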
Localization - hard coded Title Page/Contents. We have had this issue pointed out by a kindly German user as well. The sin of language myopia is ours. We will get this fixed very shortly. The terms will be taken from the book content instead of an internal structure. You will be able to create your full TOC in any language.
Images. Various versions of the Open Office XSL converters treat images differently (3.0 turns them into Base64) and we were looking for a "middle path". Our idea is to forget trying to get images into the ODT and just create references. I am creating a new blog post on the proposed approach and would like your and anyone else's feedback. It makes the tool less elegant, but when using third party elements we have to work with them. The fact that they change frequently doesn't make it easier. We will also be introducing font embedding into the same version. This is a little easier.
Posted by: Richard Pipe | Thursday, April 23, 2009 at 08:00 AM
thanks for this article...
I really need something like this.
Posted by: Owl City | Saturday, November 07, 2009 at 05:47 PM
Does not work in OSX 10.6.2
It says that it's not compatible with this computer when I try to install it with the manager.
Posted by: Bryan | Monday, March 22, 2010 at 04:12 PM
Nice post. Lots of information is there. Thanks for your innovative idea.
Posted by: XML Training | Tuesday, March 29, 2011 at 02:09 PM