
2 How it works

rtftohtml begins by reading html-trans and the character translation files. The rest of the processing is a loop of reading your RTF file and writing HTML. A high-level overview of this loop looks like this:

  1. Read the next character. In doing so, the filter also reads all of the RTF markup that specifies the destination, paragraph and text styles of the next character.
  2. Process the destination information. Normally, text is destined for the "body" of the document. Sometimes, the text belongs in a header, footnote or footer. The filter discards any text for headers and footers. For a footnote, the filter writes the text at the end of the document and generates a link to it.
  3. Process any SPECIAL text styles. The filter compares the text style information to see if it matches any entries in the .TMatch table (in html-trans, see 5.1.4). If there is a match and the entry is for "_Discard", "_Literal", "_Hot", "_HRef", "_Name" or "_Footnote", then the text is processed accordingly. For example, "_Discard" text is discarded and "_Name" text generates an anchor using the text as a name.
  4. If the text was not SPECIAL, process the paragraph style. The filter takes the name of the paragraph style and looks it up in the list of paragraph styles in html-trans (in the .PMatch table, see 5.1.3). If the paragraph style is not found in the table, it uses the first entry: "Normal". The selected entry gives a nesting level and the name of the HTML "paragraph"[1] markup to use. Using the HTML "paragraph" markup name, the filter (using the .PTag table) knows what tags to generate for the text.
  5. If the text was not SPECIAL, process the text styles again. The filter compares the text style information to see if it matches any entries in the .TMatch table (in html-trans, see 5.1.4). In this step, it is possible to match more than one entry. For each matched entry in the .TMatch table, the filter takes the HTML "text" markup name and (using the .TTag table, see 5.1.2) determines what tags to generate for the text.

Using this process, the filter can generate any HTML markup for any combination of paragraph style and text style.
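
The five steps above can be sketched in Python. This is only an illustration under stated assumptions, not the filter's actual code: the table contents, the Run class, the convert function and the footnote link format are all hypothetical, and the sketch works on whole runs of text (each wrapped in its own paragraph tags) rather than character by character. Only "_Discard" and "_Name" are modelled among the SPECIAL entries.

```python
from dataclasses import dataclass

SPECIAL = {"_Discard", "_Literal", "_Hot", "_HRef", "_Name", "_Footnote"}

# Hypothetical stand-ins for the tables read from html-trans.
PMATCH = {"Normal": (0, "p"), "heading 1": (0, "h1")}  # style -> (nesting, markup name)
PTAG = {"p": ("<p>", "</p>"), "h1": ("<h1>", "</h1>")}
TMATCH = [({"bold"}, "b"), ({"italic"}, "i"), ({"hidden"}, "_Discard")]
TTAG = {"b": ("<b>", "</b>"), "i": ("<i>", "</i>")}


@dataclass
class Run:
    """Step 1 (assumed shape): text plus the styles read along with it."""
    text: str
    destination: str = "body"
    para_style: str = "Normal"
    text_style: frozenset = frozenset()


def convert(runs):
    body, footnotes = [], []
    for run in runs:
        # Step 2: destination. Headers and footers are dropped;
        # footnotes are deferred to the end and linked from the body.
        if run.destination in ("header", "footer"):
            continue
        if run.destination == "footnote":
            footnotes.append(run.text)
            n = len(footnotes)
            body.append(f'<a href="#fn{n}">[{n}]</a>')
            continue

        # Every .TMatch entry whose attributes are all present matches.
        matches = [name for attrs, name in TMATCH if attrs <= run.text_style]

        # Step 3: SPECIAL text styles take priority.
        special = next((m for m in matches if m in SPECIAL), None)
        if special == "_Discard":
            continue
        if special == "_Name":
            body.append(f'<a name="{run.text}"></a>')
            continue
        if special is not None:
            body.append(run.text)  # other SPECIAL entries are not modelled
            continue

        # Step 4: paragraph style, falling back to the first entry ("Normal").
        first_entry = next(iter(PMATCH.values()))
        _nesting, para_markup = PMATCH.get(run.para_style, first_entry)
        p_open, p_close = PTAG[para_markup]

        # Step 5: wrap the text in the tags for each matched text markup.
        text = run.text
        for markup in matches:
            t_open, t_close = TTAG[markup]
            text = f"{t_open}{text}{t_close}"
        body.append(f"{p_open}{text}{p_close}")

    # Footnote text is written at the end of the document.
    for n, note in enumerate(footnotes, 1):
        body.append(f'<a name="fn{n}"></a>{note}')
    return "".join(body)
```

With these assumed tables, a run in an unknown paragraph style falls back to the "Normal" entry, and a run that is both bold and italic is wrapped by both text markups.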

