Folio Flat File Utilities Requirement Specification

Summary

The Folio Flat File Language (FFFL), like most word processor representation languages, is so rich, flexible and forgiving that it is possible to create incoherent infobases. Folio provides users with so much rope that they inevitably hang themselves, or at least get tripped up.

Folio addresses this problem by imposing discipline through validation routines run on the infobase. The Folio Flat File Validator is a useful tool, and any file we wish to process should already have been validated. However, this process does not go far enough.

We need to go one step further. It would be particularly advantageous to be able to use the consistency checking inherent in an SGML parser.

Two issues need to be borne in mind during the design of such validators:

The ordering of tags in the infobase does not conform to SGML: specifically, fields may overlap, and paragraphs may end before character styles begun within them have been closed.
Formatting tags may occur in a level, paragraph style or as raw tags in the paragraph text.
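
The first issue can be illustrated with a minimal sketch of a nesting check. It assumes the flat file has already been tokenised into ("open" or "close", name) pairs; that tokenised form, the tag names and the function name find_overlaps are hypothetical and are not part of FFF or of any Folio tool.

```python
def find_overlaps(events):
    """Report tags that are closed while a later-opened tag is still open.

    `events` is a list of ("open" | "close", name) tuples. This tokenised
    form is a hypothetical stand-in for real FFF tags.
    Returns a list of (still_open_tag, closed_tag) pairs; a close with no
    matching open is reported as (None, closed_tag).
    """
    stack = []      # names of currently open tags, outermost first
    overlaps = []
    for kind, name in events:
        if kind == "open":
            stack.append(name)
        elif name in stack:
            # Locate the innermost open tag with this name.
            idx = len(stack) - 1 - stack[::-1].index(name)
            # Anything opened after it and still open overlaps it.
            for inner in stack[idx + 1:]:
                overlaps.append((inner, name))
            del stack[idx]
        else:
            overlaps.append((None, name))
    return overlaps


# A paragraph that ends before the character style begun within it.
events = [("open", "para"), ("open", "italic"),
          ("close", "para"), ("close", "italic")]
print(find_overlaps(events))
```

An empty result means the tags nest as an SGML parser would require. Any reported pair marks a place where the infobase would need restructuring before SGML export.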

The need for our own set of validation processes arises from failings in a number of Folio-supplied facilities. Firstly, as mentioned, the language allows illogical or counterintuitive constructs. Secondly, these inconsistencies are exploited by the Word6 filter. Finally, the Webserver uses such a bizarre mapping from FFF to HTML that it is necessary either to write a webserver-specific infobase or to convert existing infobases to the required format before mounting them on the web.

Folio's Validator

The Folio Flat File Validator is a useful tool, and getting a clean pass through the validator should be the start and end point for any further validation.

Conflation Script

The flexibility of the Folio Flat File Language allows Paragraph and Character level formatting codes (PLF and CLF) to occur at the Level, Paragraph and Text levels of the document. We need a much stricter discipline, partly for ease of programmatic manipulation, but also as part of our Quality Assurance endeavours.

The Ideal Infobase Structure

Level Definition
Containing no PLF or CLF.
- Paragraph Styles
  Containing PLF and all CLF which apply to the whole paragraph.
  - Character Styles
    Containing all CLF.

Disadvantages

The problems which arise from this formalism are:

Editors need to apply both a Level and the appropriate Paragraph Style (PS).
Editors need to apply both fields and Character Styles (CS).

These can be overcome by allowing PLF/CLF on Levels and CLF on fields, but separating the two programmatically once the Infobase has been authored. Note, however, that subsequent editing will be less intuitive. Wherever possible, the Infobase should not be manually edited after it has been processed.

Conflation

The script will read the definition file (DEF file), establishing the PLF and CLF for existing levels, paragraph styles and character styles.

The script will then create new PS representing the PLF applied to Levels.

The script then runs through the Flat File:

Accumulating, taking into account the override precedence, the PLF and CLF applied to each record.
Tracking any raw CLF which apply to the whole record.
Matching the PLF/CLF against existing Paragraph styles and inserting the appropriate one or creating a new PS and applying it.
Converting any raw CLF to appropriate CS.
Converting Fields to Field and CS codes.
Writing out the reformatted record.

Finally the script writes a new DEF file.

The new FFF and DEF files should still pass through the Folio Flat File Validator without error.
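
The accumulate-and-match steps might be sketched as follows. This is an illustration only: formatting is modelled as plain dictionaries, the precedence order (Level, then Paragraph Style, then raw tags, with later sources overriding earlier ones) is an assumption not stated in this specification, and the names effective_format, match_or_create_style and the "Auto" prefix are hypothetical.

```python
def effective_format(level_fmt, style_fmt, raw_fmt):
    """Accumulate the formatting applied to one record.

    Assumed precedence (not confirmed by the specification):
    Level < Paragraph Style < raw tags, later layers overriding earlier.
    """
    merged = {}
    for layer in (level_fmt, style_fmt, raw_fmt):
        merged.update(layer)
    return merged


def match_or_create_style(fmt, styles, prefix="Auto"):
    """Return the name of a style equal to `fmt`, creating one if none exists.

    `styles` maps style name to formatting dict and is updated in place.
    """
    for name, existing in styles.items():
        if existing == fmt:
            return name
    n = 1
    while f"{prefix}{n}" in styles:
        n += 1
    name = f"{prefix}{n}"
    styles[name] = dict(fmt)
    return name


# Hypothetical formatting keys, for illustration only.
styles = {"Body": {"indent": 0, "justify": "left"}}
fmt = effective_format({"indent": 2},
                       {"indent": 0, "justify": "left"},
                       {"justify": "centre"})
print(match_or_create_style(fmt, styles))   # a newly created style name
print(match_or_create_style(fmt, styles))   # the same name, now matched
```

Comparing whole formatting dictionaries means two records receive the same Paragraph Style only when their accumulated formatting is identical, which keeps the number of generated styles to the minimum the data requires.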

Webserver Preparation Script

The script needs to perform the following functions:

Convert Tab Settings to Netscape Table commands.
Convert Font Size commands to Netscape Font Size commands.
Convert Foreground Colour commands to Netscape Font Color commands.

The script requires that the FFF has already been processed by the Conflation script.
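
As a rough sketch of the target output for these three conversions, the fragment below emits Netscape-style FONT and table markup from values already extracted from the flat file. The parsing of the FFF tags themselves is omitted. The point-size breakpoints, the treatment of each tab-separated field as one table cell, and all function names are assumptions made for illustration.

```python
import html

# Assumed mapping from point size to HTML font size 1..7.
# Each pair is (largest point size, HTML size); adjust to house style.
SIZE_BREAKPOINTS = [(8, 1), (10, 2), (12, 3), (14, 4), (18, 5), (24, 6)]


def font_size_tag(points):
    """Return an opening FONT tag for the nearest HTML size at or above `points`."""
    for limit, size in SIZE_BREAKPOINTS:
        if points <= limit:
            return f'<FONT SIZE="{size}">'
    return '<FONT SIZE="7">'


def font_color_tag(red, green, blue):
    """Return an opening FONT tag for an RGB colour with components 0..255."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("colour components must be in the range 0..255")
    return f'<FONT COLOR="#{red:02X}{green:02X}{blue:02X}">'


def tabbed_line_to_row(line):
    """Turn one tab-separated line into a table row, escaping cell text."""
    cells = "".join(f"<TD>{html.escape(cell)}</TD>" for cell in line.split("\t"))
    return f"<TR>{cells}</TR>"


print(font_size_tag(12))
print(font_color_tag(255, 0, 128))
print(tabbed_line_to_row("Item\tPrice"))
```

Each FONT tag opened this way needs a matching </FONT> at the point where the Folio formatting ends, which is one reason the script depends on the Conflation script having first gathered formatting into well-delimited styles.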

Rainbow DTD SGML Export Filter

A further step in quality assurance would be to take advantage of the consistency checking inherent in an SGML parser.

We have the capacity, using the SDK, to write export filters. It should be possible to write an export filter whose output conforms to a particular DTD. For this purpose I would recommend the Rainbow DTD from EBT.