2 How it works
rtftohtml begins by reading html-trans and the character translation
files. The rest of the processing is a loop of reading your RTF file and
writing HTML. A high-level overview of this loop looks like this:
- Read the next character. In doing so, the filter also reads all of the RTF
markup that specifies the destination, paragraph style and text styles of that character.
- Process the destination information
Normally, text is destined for the "body" of the document. Sometimes, the text
belongs in a header, footnote or footer. The filter discards any text destined
for headers and footers. For a footnote, the filter writes the text at the end
of the document and generates a link to it.
- Process any SPECIAL text styles
The filter compares the text style information to see if it matches any entries
in the .TMatch table (in html-trans; see …). If there is a match and the entry
is for "_Discard", "_Literal", "_Hot", "_HRef", "_Name" or "_Footnote", the text
is processed accordingly. For example, "_Discard" text is discarded and "_Name"
text generates an anchor using the text as a name.
- If the text was not SPECIAL, process the paragraph style
The filter takes the name of the paragraph style and looks it up in the list of
paragraph styles in html-trans (in the .PMatch table; see …). If the paragraph
style is not found in the table, the filter uses the first entry, "Normal". The
matching entry gives a nesting level and the name of the HTML "paragraph" markup
to use. Using that markup name, the filter looks in the .PTag table to determine
what tags to generate for the paragraph.
- If the text was not SPECIAL, process the text styles again
The filter compares the text style information to see if it matches any
entries in the .TMatch table (in html-trans; see …). In this step, more than
one entry can match. For each matched entry in the .TMatch table, the filter
takes the HTML "text" markup name and looks it up in the .TTag table (see …)
to determine what tags to generate for the text.
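The destination and SPECIAL-style steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not rtftohtml's actual code: it works on runs of text rather than single characters, treats each footnote run as one complete footnote, and uses a hypothetical `TMATCH` dictionary with made-up style names ("hidden", "anchor") in place of the real .TMatch table. Only "_Discard" and "_Name" are handled, and the anchor form emitted for "_Name" is an assumption.

```python
import html

# Hypothetical stand-in for .TMatch: text style name -> SPECIAL markup name.
TMATCH = {
    "hidden": "_Discard",
    "anchor": "_Name",
}


def process_runs(runs):
    """Apply the destination and SPECIAL-style rules to (destination, style, text) runs."""
    body, footnotes = [], []
    for destination, style, text in runs:
        # Destination step: header and footer text is discarded.
        if destination in ("header", "footer"):
            continue
        # Footnote text is saved for the end; the body gets a link to it.
        if destination == "footnote":
            footnotes.append(text)
            n = len(footnotes)
            body.append(f'<a href="#fn{n}">[{n}]</a>')
            continue
        # SPECIAL step (only two of the six SPECIAL markups are sketched).
        special = TMATCH.get(style)
        if special == "_Discard":
            continue
        if special == "_Name":
            # Assumption: emit an empty named anchor using the text as its name.
            body.append(f'<a name="{html.escape(text)}"></a>')
            continue
        # Not SPECIAL: the real filter would go on to the paragraph and text
        # style steps; this sketch only escapes the text.
        body.append(html.escape(text))
    # Footnotes are written at the end of the document, each with a target anchor.
    for n, note in enumerate(footnotes, start=1):
        body.append(f'<p><a name="fn{n}">[{n}]</a> {html.escape(note)}</p>')
    return "".join(body)


runs = [
    ("header", "plain", "Page 1"),
    ("body", "anchor", "intro"),
    ("body", "plain", "Hello"),
    ("footnote", "plain", "A note"),
    ("body", "hidden", "secret"),
]
print(process_runs(runs))
# prints <a name="intro"></a>Hello<a href="#fn1">[1]</a><p><a name="fn1">[1]</a> A note</p>
```

The header text and the "hidden" run never reach the output, while the footnote is replaced in the body by a link and its text is appended at the end.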
Using this process, the filter can generate any HTML markup for any combination
of paragraph style and text style.
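The two-level lookup that makes this possible (paragraph style to .PMatch to .PTag, and text style to .TMatch to .TTag) can be sketched as follows. The tables `PMATCH`, `PTAG`, `TMATCH` and `TTAG`, their entries, and the `render` function are hypothetical stand-ins, not the html-trans file format. The sketch wraps a single run of text per paragraph and carries the nesting level without using it.

```python
import html

# Hypothetical .PMatch: (paragraph style name, nesting level, paragraph markup).
# Order matters: the first entry ("Normal") is the fallback.
PMATCH = [
    ("Normal", 0, "para"),
    ("heading 1", 0, "h1"),
    ("Preformatted", 0, "pre"),
]

# Hypothetical .PTag: paragraph markup name -> (open tag, close tag).
PTAG = {
    "para": ("<p>", "</p>"),
    "h1": ("<h1>", "</h1>"),
    "pre": ("<pre>", "</pre>"),
}

# Hypothetical .TMatch: (text attribute, text markup name).
TMATCH = [
    ("bold", "strong"),
    ("italic", "emph"),
]

# Hypothetical .TTag: text markup name -> (open tag, close tag).
TTAG = {
    "strong": ("<b>", "</b>"),
    "emph": ("<i>", "</i>"),
}


def lookup_paragraph(style):
    """Return (nesting level, markup name), falling back to the first entry."""
    for name, level, markup in PMATCH:
        if name == style:
            return level, markup
    _, level, markup = PMATCH[0]
    return level, markup


def render(paragraph_style, text_attrs, text):
    """Wrap one run of text in its paragraph tags and every matching text tag."""
    _level, p_markup = lookup_paragraph(paragraph_style)  # level unused here
    p_open, p_close = PTAG[p_markup]
    opens, closes = [], []
    for attr, t_markup in TMATCH:
        if attr in text_attrs:  # more than one entry may match
            t_open, t_close = TTAG[t_markup]
            opens.append(t_open)
            closes.insert(0, t_close)  # close in reverse order so tags nest
    return p_open + "".join(opens) + html.escape(text) + "".join(closes) + p_close


print(render("heading 1", {"bold"}, "Title"))          # prints <h1><b>Title</b></h1>
print(render("Body Text", {"bold", "italic"}, "a < b"))  # prints <p><b><i>a &lt; b</i></b></p>
```

Because paragraph markup and text markup are looked up independently, any paragraph style can be combined with any set of text styles without a dedicated table entry for each combination; an unknown style such as "Body Text" simply falls back to "Normal".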