Just to keep the saga on typography in iPad moving, I created a few simple test cases to see what happened. The results speak for themselves (except I cannot resist some comments of course).
So this is typically long-winded and a bit specialist. If you are interested in typography in ePub devices... read on!
The soft hyphen
In these test cases the soft hyphens have been inserted as literal UTF-8 characters (U+00AD) rather than as entities. If you open the files in a UTF-8 environment you will not see them, because they are invisible except where the renderer breaks a line at one. So you have to be able to rely on your hyphenation process.
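For anyone who wants to see what is actually in the file, here is a minimal Python sketch of a word carrying one soft hyphen. The sample word and the variable names are my own illustration, not taken from the test ePub.

```python
# U+00AD SOFT HYPHEN is the character that the &shy; entity stands for.
SOFT_HYPHEN = "\u00ad"

# Illustrative word with one permitted break point: typo|graphy
word = "typo" + SOFT_HYPHEN + "graphy"

# The soft hyphen counts as a character even though nothing is drawn for it.
encoded = word.encode("utf-8")
soft_hyphen_bytes = SOFT_HYPHEN.encode("utf-8")

# Removing the soft hyphens gives back the text a reader sees mid-line.
visible = word.replace(SOFT_HYPHEN, "")

print(len(word), len(encoded), soft_hyphen_bytes, visible)
```

The character is there in the string and in the bytes, but an editor or browser draws nothing for it unless a line happens to break at that point.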
Short Test. Headers and Bodytext
This first short test was an attempt at formalization. I constructed a paragraph with as many long words as I could use that still made some kind of reading sense. It went awry because the (¬) character, used in the second example to show where the hyphens were placed, didn't display in either iPad or ADE.
The bottom of the centre and the last image are where the soft-hyphen action is. There is a noticeable improvement in the tracking of the text and in overall readability.
All of these soft-hyphens were inserted by hand at great pain. That obviously was not going to be a real solution for production work.
Long Test
I took a thousand of the longest words from our proofing dictionary and hyphenated the text using the standard English Open Office hyphenation algorithm. This was a slightly cruel test, because the words are long, and the page is unfairly only one paragraph. That made both ADE and iPad grunt a little.
This first picture shows the results without soft hyphens. It is a fairly unrealistic test case from the word composition point of view, but I wanted to make sure that nearly every line had an end-of-line hyphenation point.
The second image shows the same text with the soft-hyphens applied. The third image is the continued page. Very well done Safari (I don't think Apple really gets the credit for this)! The long word grammar results in some ugly tracks in the text, but this text is long-word extreme.
There are a few little niggles on line two in the second image, and in a few other places, where the best-fit algorithm obviously had a bit of a calculation problem at the end of the line. This effect is not seen in the short test case, where there is a more natural combination of word lengths.
Other devices
And for the curious: does ADE handle this?
Yes and no!
It breaks the words at the soft-hyphen points, but neglects to bother to put in the hyphen. Now I don't think I did anything wrong. The text displayed with hyphens in Firefox, Safari and iPad, so I guess this is "the device". Strange considering Adobe are the typesetting people.
DO NOT TRY THIS AT HOME!
I loaded it onto the Sony PRS 600. Disaster. The soft-hyphens were unrecognised characters showing the missing glyph "?".
On the long word no-hyphen page, it took a minute to load and, characteristically, arbitrarily broke the paragraph into two (it does that on long text). With its dinky little processor, the main soft-hyphen screen still had not loaded after 2-3 minutes, and it kicked me back to the menu. Too many "naughty" characters I guess.
Conclusions
(Hopefully this doesn't sound too much like a school lab experiment) There is definitely value in including soft-hyphens if you are targeting your ePubs for iPad.
You have a presentation disaster in waiting if your beautifully "&shy;ed" ePub gets into the ADE ecosystem. You have been warned.
Soft-hyphens have to be pre-processed into the file before shipping. There is no auto hyphenation in the devices today. (But we knew that or we wouldn't be doing these tests.)
There are certain words that would benefit from hyphenation. An example is the title "Acknowledgements", which at larger font sizes (2em) often disappears off the right of the screen in ADE.
Probably only words longer than a certain length should have soft-hyphens included, for example words longer than 8 characters.
Another option could be only words above a certain length in titles and heads.
In UTF-8 the soft hyphen is a two-byte character (0xC2 0xAD), so every soft-hyphen increases the size of the file. Lots of them could lift file size 20% or more.
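The length threshold and the file-size cost from the conclusions above can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the function names, the regular expression and the default of 9 visible characters (that is, "longer than 8") are mine, a "word" is simply a run of letters joined by soft hyphens, and the text is assumed to have been hyphenated already by some other tool.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, two bytes in UTF-8

# Assumption: a hyphenated word is letters, then one or more
# (soft hyphen + letters) groups. Digits and underscores are excluded.
HYPHENATED_WORD = re.compile(r"[^\W\d_]+(?:\u00ad[^\W\d_]+)+")


def drop_hyphens_in_short_words(text, min_length=9):
    """Keep soft hyphens only in words with at least min_length visible letters."""
    def fix(match):
        word = match.group(0)
        visible = word.replace(SOFT_HYPHEN, "")
        return word if len(visible) >= min_length else visible
    return HYPHENATED_WORD.sub(fix, text)


def soft_hyphen_overhead(text):
    """Return the UTF-8 size increase caused by soft hyphens, as a fraction."""
    total = len(text.encode("utf-8"))
    without = len(text.replace(SOFT_HYPHEN, "").encode("utf-8"))
    return (total - without) / without if without else 0.0


sample = "ac\u00adknowl\u00adedge\u00adments and let\u00adter"
trimmed = drop_hyphens_in_short_words(sample)
print(soft_hyphen_overhead(sample), soft_hyphen_overhead(trimmed))
```

Running the overhead function before and after trimming gives a rough idea of how much of the size increase comes from short words that would rarely need to break anyway.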
Typography in ePub Starts
As a result of this test and a few others, we will be offering soft hyphenation as a production option to our Publisher customers and building it in as an option in the IGP:FLIP Formats on Demand ePub generation engine in the near future.
For those interested you can access the ePub test case here. Download IGPN-AAJ886.
Thanks for this posting. How can you add a soft hyphen when working in xhtml directly?
Posted by: Wiebe | 19 May 2010 at 12:05 PM
We are currently using this amazing bit of JavaScript by Mathiasn at http://code.google.com/p/hyphenator/. We have used it to create a little desktop environment. When it is working and you view source, you get the shy's. These can then be cut and pasted into anything. In our case that means into IGP:FLIP, where the entities are automatically converted into UTF-8 characters.
That's how this was done. It's crude, but since ePub hyphenation is just at the experimental level, it does the job for now. The default hyphenator may not be the one of your choice but that can be configured.
Hope that helps. Happy hyphenating.
Posted by: Richard Pipe | 19 May 2010 at 12:37 PM
Richard, this is very helpful, thank you. And I love that paragraph with the long words... "Categories are ampliative!" Hear, hear!
Posted by: Liz Castro | 19 May 2010 at 04:59 PM
the sad thing is hyphenated words won't be searchable (without space in between) on most devices, because they don't normalize them.
Posted by: Steini | 19 May 2010 at 06:04 PM
Steini, that is another excellent point. Your perspicacity is exceptional. Interestingly, Firefox (the browser) Find handles it, except where the hyphen is active; ADE and Safari don't.
So now the "option mix" is getting even more interesting. You can have mock typography, but not find, or find and no typography. (I don't grace it with the term search!) I don't know about you but generally the find tools in readers are very, very sad.
We can now add this to the Safari bug list and wait.
Posted by: Richard Pipe | 19 May 2010 at 06:40 PM
And the same problem occurs when using the non-breaking space or narrow non-breaking space. ADE won't find a word combination separated with these characters when you search with a normal space. Most browsers seem to handle that, but they also fail to find words with a non-breaking hyphen in them. It really is searchability vs. typography. And yes, the find tools are very weak.
Posted by: Steini | 19 May 2010 at 10:24 PM
I will take a look at those other space typography issues. The find issue varies by genre: for novels it is largely irrelevant (bookmarks are more useful), while for academic and other reference materials it is essential (and don't drop indexes). It is something I guess we have to wait for. The reason I did these tests was because publishers ask about typography control and presentation. I think we now have a pretty clear statement: you can have pseudo typography, but at present you lose find functions. Take your pick.
Meanwhile this journey is about the new kid on the block, iPad, and where we can take it, and for what purpose. The idea is to catalogue the limitations and unlock the potentials to bring relevance to e-publishing. Our next few test cases are looking exciting.
Posted by: Richard Pipe | 20 May 2010 at 06:54 AM
rePublish (my JS/HTML5 ePub reader) uses the soft hyphenation lib that Richard mentions. It works wonderfully, quickly, and seems to be pretty accurate in my experience.
Rather than embedding soft hyphenation in ePubs, I'd rather have a style hint so that ePubs can signal to the reader application where hyphenation is OK, and where it's not.
Ideally, CSS3 should have this support directly, but failing that it seems fairly sensible to put it in the ePub metadata.
Posted by: Blaine Cook | 20 May 2010 at 11:55 AM