At the risk of being shot down by huge intellectual linguistic experts, I have to say paragraphs are pretty much the most important identifiable unit of taggable content that brings sense to text (except I really do like lists too). Everything else hangs off or is inserted between the paragraphs. We really like paragraphs as XML content containers and work them very, very hard in FX.
Note that the class value grammar terms are fully expressed; there are no abbreviations (except that we let ourselves use para for paragraph).
This is an FX characteristic. Abbreviations dilute the future value of any XML strategy as they are not always naturally self-explaining. Being consistent in this is important, as it is so easy to get clever and cut down longer style-value phrases. Just don't do it. Be verbose for your children's children.
Bodytext
Bodytext is the backbone of all document styled content. FX uses the <p> element (obviously) to identify the bodytext element.
There is a set of class values that allow bodytext to be modified for presentation and layout requirements, especially where the layout has semantic purpose.
There is a further set of paragraph styles to allow paragraphs to describe their purpose to a processing environment. These are rather obviously termed descriptive bodytext styles.
An applied stylesheet or processor can use these definitions for specific editing, processing, format generation and presentation requirements.
Layout classes
Layout classes are very important in all documents to allow reproduction of sophisticated presentation formatting where there is no structural tagging warranted, required or possible. That happens a lot!
An arguable example: in a novel it is probably inappropriate to tag a lovers' letter using the formal letter block tagging. It is not an extractable letter of value but part of a fictional narrative, which benefits from appropriate formatting but not from the overhead of semantic tagging. Applying formal letter tags is possibly a devaluation or mis-valuation of the content, and if applied they are only being used to achieve presentation effects. Of course, the fact that it is a fictional letter is implied by the metadata description of the work, so there is an individual publisher strategy that can be put into place here. (I just tagged Vanity Fair while testing IGP:Typography in the Browser, and to structure-style the many letters would have been seriously inappropriate XML.)
Classic literature uses a lot of presentation devices such as "He read the poster in the window"... followed by highly styled text showing the content of the poster, nicely center aligned in the galley in a bold decorative font. There is no semantic tagging involved with this, only presentation tagging. Sure, we could get clever and define a "poster" block, or even worse a "fiction-poster" block. But center aligning the text does the job without getting lost in XML la-la-land.
Default bodytext uses only the <p> element without any styles. Generally the global bodytext font and size are defined in galley-rw. Where processors can handle pseudo selectors, issues such as first paras after blocks are carried through from the CSS or sequence processing for specific format requirements.
Bodytext standard presentation
In business, online and other manual contexts, paragraphs are separated by some space below. In most formal publishing, paragraphs are continuous and the paragraph start is indicated by an indent.
The general rule is to use only one way to show a paragraph start. This rule introduces the "first paragraph" concept: where there is white space above a paragraph, the paragraph should not be indented (the white space already shows the start of a new paragraph). This is easily controlled with CSS selectors, i.e. it is a presentation issue.
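As a rough sketch of the space-below convention used in manuals and online content (the 0.8em gap is an assumption for illustration, not an FX default):

```css
/* Space-below convention: paragraphs are separated by white space
   and are not indented. The 0.8em gap is an illustrative assumption. */
p {
  margin: 0 0 0.8em 0;
  text-indent: 0;
}
```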
p {margin: 0;}
p + p {text-indent: 1.3em;}
If paragraphs are going into a context which cannot control first paragraph issues using CSS pseudo selectors, then a class statement can be processed in for that context during format generation. This is done for Kindle/Mobipocket. In this case the p + p is processed out, and in each first paragraph position a bodytext-first-para style is applied. But notice the presentation is a style issue, not an XHTML one.
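A minimal sketch of what the generated stylesheet might look like for such a context. It assumes the processor has already written the first-paragraph class onto the XHTML; the class name bodytext-first-para is taken from the description above, and the actual output of the FX processor may differ.

```css
/* No sibling selectors available: every paragraph is indented by default... */
p {
  margin: 0;
  text-indent: 1.3em;
}
/* ...and the class added during format generation removes the indent
   in each first-paragraph position. */
p.bodytext-first-para {
  text-indent: 0;
}
```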
The layout paragraphs
There are a number of layout issues which need to be addressed for paragraph presentation. The core FX set is:
- <p class="middle-rw"> ... </p>
Center align para. Indicates to a formatter that this paragraph should be aligned in the center of the galley. It's middle, not center, for reasons that will be explained later.
- <p class="right-rw"> ... </p>
Right align para. Indicates to a formatter that this paragraph should be aligned right.
- <p class="first-para-rw"> ... </p>
First paragraph. Indicates to a formatter that this paragraph should be presented with no first line indent.
- <p class="indent-rw"> ... </p>
Indented first line. Indicates to a formatter that this paragraph should be indented even if it is in a position that would automatically generate a non-indented line.
- <p class="indent-para-rw"> ... </p>
Indented paragraph. The whole paragraph is indented from the left margin.
- <p class="indent-leading-rw"> ... </p>
Indented paragraph with leading line after. The whole paragraph is indented from the left margin and there is a clear line margin below the paragraph.
- <p class="hanging-rw"> ... </p>
Hanging para. Indicates to a formatter that this paragraph should be displayed as a hanging paragraph.
- <p class="hanging-leading-rw"> ... </p>
Hanging paragraph with leading line after. Indicates to a formatter that this paragraph should be displayed as a hanging paragraph and there is a clear line margin below the paragraph.
- <p class="alt-rw"> ... </p>
Alternative text. Indicates to a formatter that this should be displayed in a defined alternative text font face. In FX this is extended using additional font statements for documents that contain a lot of paragraph styles.
- <p class="stand-alone-rw"> ... </p>
Stand-alone. A full leading space before and after the paragraph. An alternative is to use leading lines.
- <p class="numbered-rw"><span class="para-number-rw">1.</span> ... </p>
Numbered paragraph. This enables a hanging paragraph to be formatted with a floating number. The number must be tagged and floated left.
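The sample stylesheets later in this section do not cover every class in this list. As a hedged sketch only, the remaining classes could be styled along the following lines. The 1.3em indent matches the sample reader CSS; the other values (the 1em clear line, the 2em number column) are assumptions, not the IGP:FLIP defaults.

```css
/* First paragraph: cancel the indent that p + p would apply. */
p.first-para-rw {
  text-indent: 0;
}
/* Whole paragraph indented from the left margin. */
p.indent-para-rw {
  margin-left: 1.3em;
}
/* Indented paragraph with a clear line below (1em assumed as one line). */
p.indent-leading-rw {
  margin: 0 0 1em 1.3em;
}
/* Hanging paragraph with a clear line below. */
p.hanging-leading-rw {
  text-indent: -1.3em;
  margin: 0 0 1em 1.3em;
}
/* Stand-alone: a full leading space before and after. */
p.stand-alone-rw {
  margin: 1em 0;
  text-indent: 0;
}
/* Numbered paragraph: the text is set in from the left and the tagged
   number is floated left into the space that opens up. */
p.numbered-rw {
  margin-left: 2em;
  text-indent: 0;
}
p.numbered-rw span.para-number-rw {
  float: left;
  width: 2em;
  margin-left: -2em;
}
```

The negative left margin on the number cancels its width, so the paragraph text keeps a straight left edge on every line.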
Lines
Two "pseudo" paragraphs are lines which use <div> rather than <p>. This is to resolve the abstraction difference between a line and a paragraph, and also allows the paragraph following a line to start without an indent.
- <div class="leading-rw"> ... </div>
A blank line. Indicates to a formatter that one or more empty lines should occur.
- <div class="decoration-rw"> ... </div>
A line containing an ornamental element. Indicates to a formatter that this should be displayed as blank lines containing some defined decoration.
In print CSS the height of these is set to the line-height of the document. This may be reduced in e-book and online formats to reduce vertical travel.
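A sketch of how this might look, assuming a document line-height of 14pt in print (the value and the media split are assumptions for illustration). Because these are <div> elements, a following <p> does not match p + p and so starts without an indent.

```css
@media print {
  /* One line: the height equals the document line-height (14pt assumed). */
  div.leading-rw,
  div.decoration-rw {
    height: 14pt;
    line-height: 14pt;
  }
}
@media screen {
  /* Reduced height for e-book and online formats to limit vertical travel. */
  div.leading-rw {
    height: 0.5em;
  }
}
```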
Code examples
All CSS examples are taken from the IGP:FLIP Default Template.
Sample XHTML
<div class="galley-rw">
<p> ... </p>
<p class="middle-rw"> ... </p>
<p> ... </p>
<p class="right-rw"> ... </p>
<p> ... </p>
<div class="leading-rw"> ... </div>
<p class="indent-rw"> ... </p>
<p> ... </p>
<p class="hanging-rw"> ... </p>
<p> ... </p>
<p> ... </p>
<div class="decoration-rw"> ... </div>
<p> ... </p>
<p> ... </p>
</div>
Sample Reader CSS
/* ================================ */
/* F2: BODYTEXT */
p {
margin: 0;
}
p + p {
text-indent: 1.3em;
}
.alt-rw {
font-family: sans-serif;
font-size: 0.9em;
}
.decoration-rw {
text-align: center;
padding: 1em 0 1em 0;
}
.leading-rw {
height: 1em;
}
p.indent-rw {
text-indent: 1.3em;
}
p.hanging-rw, p + p.hanging-rw {
text-indent: -1.3em;
margin-left: 1.3em;
}
.middle-rw {
text-indent: 0;
text-align: center;
}
.right-rw {
text-align: right;
}
In these examples the basic paragraph styles are defined for an online context using relative dimensions.
Sample Print CSS
/* ================================ */
/* F2: BODYTEXT */
p {
margin: 0;
}
p + p {
text-indent: 18pt;
}
p.indent-rw {
text-indent: 18pt;
}
p.indent-para-rw {
margin: 0 0 0 18pt;
}
p.hanging-rw, p + p.hanging-rw {
text-indent: -18pt;
margin-left: 18pt;
}
p.alt-rw {
font-family: "MyriadPro LT", sans-serif;
}
As can be seen from this example, the print stylesheet uses different units where required, and also has specifically addressed additional presentation rendering issues.
The descriptive bodytext paragraphs
The standard FX descriptive paragraph set is as follows. This can be extended for special content requirements.
Caption
Use for all numbered captions (table, figure, illustration, etc.). You will also have to tag the number separately. For processing, the number type generally inherits from the type of block it is in. E.g. this will become Table number 12.2, or Figure number 6.3m, depending on whether it is in a table or figure block. The caption inherits its styling from its parent structure.
<p class="caption-rw"><span class="num-ref-rw">XX</span> The caption text</p>
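One way a stylesheet could supply the block-dependent label is with generated content. This is only a sketch: the parent class names table-rw and figure-rw are hypothetical stand-ins for whatever block classes the document actually uses, and a processor may equally insert the label as real text.

```css
/* Hypothetical parent block classes; substitute the real FX block names. */
.table-rw p.caption-rw span.num-ref-rw::before {
  content: "Table ";
}
.figure-rw p.caption-rw span.num-ref-rw::before {
  content: "Figure ";
}
/* The caption takes its font and alignment from the parent block. */
p.caption-rw {
  font: inherit;
  text-align: inherit;
  text-indent: 0;
}
```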
Attribution
An attribution normally occurs at the end of an extract or epigraph and is frequently set right (but that is presentation, not part of the attribution tagging).
<p class="attribution-rw">The attribution text</p>
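If the attribution is to be set right, that can be left entirely to the stylesheet. A minimal sketch, assuming no other attribution rules are in play:

```css
/* Set the attribution right and suppress any first-line indent. */
p.attribution-rw {
  text-align: right;
  text-indent: 0;
}
```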
Contributor
IGP Production Note: A list of contributors will always have the word contributors before it.
<p>Contributors</p>
<p class="contributor-rw">Professor Chandwick Esthelwaith</p>
<p class="contributor-rw">Sir Pettigrew Sloan-Dipsthick</p>
Author
IGP Production Note: Author normally occurs with the term Author. Otherwise use attribution or source. It may be aligned left or right.
And of course it is important to know who wrote this amazing story.
<p class="author-rw">Anonymous Author</p>
Authornote
Authornote is applied where the term Authornote exists at the beginning of the line or group of paragraphs, otherwise use paragraph styling.
<p class="authornote-rw"><b>Authornote:</b> I would just like to record my thanks to my mother, my father and everyone else in the world.</p>
Speech
Speech paragraphs are used in interview style reports and similar documents where a conversation is being recorded. Do not use speech for drama tagging. (Drama tagging is a separate genre.) In this example it is set as hanging paragraphs.
<p class="speech-rw"><span class="speaker-rw">John</span>. This is a short speech</p>
<p class="speech-rw"><span class="speaker-rw">Susan</span>. This would be too, but I have to keep talking at least until I get a line wrap otherwise it is difficult to see how speech is formatted.</p>
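A sketch of CSS that would set these speech paragraphs as hanging paragraphs. The 1.3em hang reuses the indent from the sample reader CSS, and the bold speaker name is an assumption for illustration.

```css
/* Speech set as a hanging paragraph: first line out, turnover lines in. */
p.speech-rw {
  text-indent: -1.3em;
  margin-left: 1.3em;
}
/* Make the speaker name stand out (assumed styling). */
span.speaker-rw {
  font-weight: bold;
}
```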
Source
The source statement is used at the bottom of the first page of academic journal articles only. In eText it should be moved to just before the abstract to prevent it occurring in the middle of a page paragraph.
<p class="source-rw">Source: The International Journal of Superior Book Tagging. Infogrid Pacific. 2004-8.</p>
Question
Use Question tagging where there are a number of questions in a sequence. If a question has multiple paragraphs, use style tagging for all paragraphs except the first one. Use the Q&A genre for complex Q&A tagging when justified.
<p class="question-rw">Q 1. Which question follows question forty-eight and comes just before question 50? Expand your answer with a 500 word explanatory essay.</p>
Answer
Use Answer tagging where there are a lot of answers in a sequence. Note that options in multi-choice questions are NOT answers, only one of them is.
<p class="answer-rw">49. This is the answer to question forty-nine which follows question forty-eight and comes just before question 50</p>
If questions and answers have to be correlated, use Q&A tagging.
Headword
Headwords are found in glossaries and conceptually in dictionaries. Use Headword if you are not using Definition List tagging. Do not use it if you are tagging as Definition Lists.
<p class="headword-rw">Antidisestablishmentarianism</p>
This is a very long word and makes a very prominent headword. It would look nice in an encyclopedia of Irish Political History.
Summary
Paragraph class values are extremely important in high value XML.
There is a general message from XML experts that presentation styling should not be carried in XML element names.
This strategy may apply to certain mechanically reproduced content such as academic content, but it is a very wrong approach for worlds of other content. As a guideline, under-tagging layout can lead to casual misinterpretation of content, and can result in dilution of content value or even misrepresentation of content... and always increases long-term ownership costs. I hereby declare that I stand at right-angles to generally stated XML consultant wisdom on the separation of structure and styles. The real world must always over-rule the rules.
There is a significant amount of content where presentation carries semantic meaning and content value that cannot be neglected or represented by semantic tags. This is especially true of center and left styles, but can apply to various indent and hanging patterns also.
Whether paragraph style values are applied must be decided for each specific content instance. Our general guideline is that applying them in free-form text is usually better than leaving them off. They can be ignored by a processor, but adding them later, at format presentation time, will cost dearly. We have more, for example the paragraph offset group!
Descriptive paragraphs are self-defining. The list given here is a reasonable standard set of descriptive paragraphs that would be found in academic and non-fiction trade books. Other document genres, such as legal and corporate documents, all have their own specialist paragraph class values.