Architecture document for MJCS/EBMH

Creating the Converter

The converter is built on the SGMLSPM package, which processes the output that NSGMLS produces when it parses a valid SGML document (hence the effort to ensure that the input documents are valid).
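To make the data flow concrete, here is a minimal sketch, written in Python rather than the Perl that SGMLSPM uses, of how the line-oriented output of NSGMLS can be read as a stream of events. The sample output, the element names (ARTICLE, PARA), the ID attribute and the function names are all invented for illustration. Only the most common line types are handled, and escapes other than \n and \\ are left untouched.

```python
import re

# Illustration only: the project itself relies on the Perl SGMLSPM package.
# Only two data escapes are decoded here: "\\" (backslash) and "\n" (record end).
_ESCAPES = {"\\\\": "\\", "\\n": "\n"}


def unescape(data):
    """Decode the two escapes handled by this sketch."""
    return re.sub(r"\\\\|\\n", lambda m: _ESCAPES[m.group(0)], data)


def parse_esis(lines):
    """Yield (event, payload) tuples from NSGMLS output lines."""
    attributes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":
            # Attribute lines precede the start of the element they belong to.
            parts = rest.split(" ", 2)
            name, kind = parts[0], parts[1]
            if kind != "IMPLIED":
                attributes[name] = parts[2] if len(parts) > 2 else None
        elif code == "(":
            yield ("start", (rest, attributes))
            attributes = {}
        elif code == ")":
            yield ("end", rest)
        elif code == "-":
            yield ("cdata", unescape(rest))
        elif code == "C":
            # Emitted last, and only when the document was valid.
            yield ("conforming", None)


# Hypothetical NSGMLS output for a tiny document.
SAMPLE = [
    "AID CDATA a1",
    "(ARTICLE",
    "(PARA",
    "-First line\\nsecond line",
    ")PARA",
    ")ARTICLE",
    "C",
]

for event, payload in parse_esis(SAMPLE):
    print(event, payload)
```

A processing spec plays the role of the loop at the end: it attaches an action to each start tag, end tag and run of character data.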

SGMLSPM takes the NSGMLS output for an example document that contains every tag that will be encountered in the whole data set (exemplar.sgml) and generates a processing spec (ebmh_html_transform_spec.pl) from a processing spec that it has itself been given (ebmh_skel.pl). It is therefore strongly recommended (read: do it) to edit ebmh_skel.pl, not the generated file (ebmh_html_transform_spec.pl).

NSGMLS needs to know where to find entities; this information is held in the catalog file, which should be in the converter directory. The DTD is declared with a SYSTEM identifier, so the DTD and the DCL (SGML declaration) should be in the same directory as the exemplar.

Creating the converter


>nsgmls exemplar.sgml | perl d:\sgmlspm\sgmlspl.pl ebmh_skel.pl > ebmh_html_transform_spec.pl
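The same pipeline can be driven from a script when the converter has to be rebuilt often. The following is a sketch only, in Python: the function names are hypothetical, the file names and the sgmlspl.pl path are simply the ones used in the command above, and treating any non-zero exit status as a failure is an assumption rather than documented behaviour of either tool.

```python
import subprocess


def converter_commands(exemplar="exemplar.sgml", skel="ebmh_skel.pl",
                       sgmlspl=r"d:\sgmlspm\sgmlspl.pl"):
    """Return the argument lists for the two halves of the pipeline."""
    return ["nsgmls", exemplar], ["perl", sgmlspl, skel]


def build_converter(output="ebmh_html_transform_spec.pl"):
    """Run: nsgmls exemplar.sgml | perl sgmlspl.pl ebmh_skel.pl > output."""
    parse_cmd, spec_cmd = converter_commands()
    # Like the shell redirect, opening the file truncates any previous spec.
    with open(output, "w") as out:
        parser = subprocess.Popen(parse_cmd, stdout=subprocess.PIPE)
        generator = subprocess.Popen(spec_cmd, stdin=parser.stdout, stdout=out)
        # Close our copy of the pipe so the parser is not kept alive by us.
        parser.stdout.close()
        generator_status = generator.wait()
        parser_status = parser.wait()
    if parser_status != 0 or generator_status != 0:
        raise RuntimeError(
            "converter build failed: nsgmls=%s, perl=%s"
            % (parser_status, generator_status)
        )
```

Calling build_converter() from the converter directory would then regenerate the spec in one step.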

Clean the incoming HTML

Assumptions

The data for an issue is held in a single directory.
The input files have an extension of html.

Example where the data directory is 9908


>perl html2sql.pl 9908

Convert Clean HTML (sgml) to SQL

Assumptions

The data for an issue is held in a single directory.
The cleaned files have an extension of sgml.

Example where the data directory is 9908


>perl sgml2sql.pl 9908
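Both steps above rest on the same two assumptions: one directory per issue, and a fixed file extension. A minimal Python sketch of that file selection follows. The function name is hypothetical, and matching the extension case-insensitively is an assumption made for this sketch; it says nothing about how the Perl scripts themselves choose their files.

```python
from pathlib import Path


def issue_files(issue_dir, extension):
    """Return the files in issue_dir with the given extension, sorted by name."""
    # Accept "html" or ".html"; compare case-insensitively (an assumption).
    suffix = "." + extension.lstrip(".").lower()
    return sorted(
        path for path in Path(issue_dir).iterdir()
        if path.is_file() and path.suffix.lower() == suffix
    )
```

For the example issue, issue_files("9908", "html") would list the input to the cleaning step and issue_files("9908", "sgml") the input to the SQL step.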
   

Document Dated: Wed Dec 15 0:52:24 1999 by timp