1 Developer Info

This document is intended for project members, for developers who wish to use or modify the source code, and for interested users.

The project is managed with Maven 1. It is written in Java 1.5 and depends on the following packages:

  • ant-1.6.5
  • commons-cli-1.0
  • commons-collections-3.1
  • commons-io-1.1
  • commons-lang-2.1
  • commons-logging-1.0.4
  • junit-3.8.1
  • log4j-1.2.13
  • velocity-1.4

1.1 Building the project

Building as Maven-Plugin

To build and install the project as a Maven plugin, just execute the Maven-goal plugin:install.

Building as commandline-application

Because the commandline version is run from a start script, there are two Maven-goals: dist.app.windows builds a commandline version with a Windows batch script, and dist.app.unix builds a version with a Unix shell script.

Building as Ant-task

There is currently no Maven-goal to create an Ant-task. Building it manually is straightforward, though: build the jar with the Maven-goal jar:jar and use it as described in the usage instructions.

Building the documentation

The documentation can simply be built by executing the Maven-goal site.

1.2 Technical overview

The conversion consists of two steps: first, the LaTeX document is converted into an object model; second, the object model is passed to a specific renderer class which creates the final output. This separation makes it easier to add new renderers (new output formats) and keeps them independent of TeX syntax. Another benefit of the intermediate object model is that parsers for other input formats could be added as well.



Figure 1: Overview of the conversion
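
The two steps can be sketched in Java roughly as follows. This is only an illustration of the idea, not the project's code: the names ConversionSketch, Document, Reader, Renderer, LineReader and HtmlRenderer are hypothetical, and the toy reader does no real LaTeX parsing.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the two-step conversion; all names here are hypothetical. */
public class ConversionSketch {

    /** Minimal stand-in for the object model: a document is a list of paragraphs. */
    static final class Document {
        final List<String> paragraphs = new ArrayList<String>();
    }

    /** Step 1: turns some input format into the object model. */
    interface Reader {
        Document read(String input);
    }

    /** Step 2: turns the object model into some output format. */
    interface Renderer {
        String render(Document document);
    }

    /** Toy reader: every non-blank line becomes one paragraph (no real LaTeX parsing). */
    static final class LineReader implements Reader {
        @Override
        public Document read(String input) {
            Document document = new Document();
            for (String line : input.split("\n")) {
                if (line.trim().length() > 0) {
                    document.paragraphs.add(line.trim());
                }
            }
            return document;
        }
    }

    /** Toy renderer: wraps each paragraph in an HTML paragraph element (no escaping). */
    static final class HtmlRenderer implements Renderer {
        @Override
        public String render(Document document) {
            StringBuilder out = new StringBuilder();
            for (String paragraph : document.paragraphs) {
                out.append("<p>").append(paragraph).append("</p>\n");
            }
            return out.toString();
        }
    }

    /** The reader and the renderer only meet through the object model. */
    static String convert(Reader reader, Renderer renderer, String input) {
        return renderer.render(reader.read(input));
    }

    public static void main(String[] args) {
        System.out.print(convert(new LineReader(), new HtmlRenderer(), "Hello\n\nWorld\n"));
    }
}
```

Because convert only depends on the two interfaces, a reader for another input format or a renderer for another output format can be swapped in without touching the other side.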

The object model is designed to represent the content and logical structure of the document, not to store layout information. The layout of the output is the responsibility of the renderers, and thus can be tailored to match the specific output formats. For example, there could be two kinds of HTML renderers: one that creates a single long web page, and another that spreads the document over several pages (e.g. one page per section) and links them together via a table of contents page.
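
The HTML example could look roughly like the following sketch, in which two renderers produce different layouts from the same layout-free model. All names (LayoutSketch, Section, HtmlRenderer, SinglePageRenderer, MultiPageRenderer) and the generated file names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: one layout-free model, two HTML layouts. All names are hypothetical. */
public class LayoutSketch {

    /** A section holds content and structure only, no layout information. */
    static final class Section {
        final String title;
        final String text;

        Section(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** A renderer maps the model to output files (file name to file content). */
    interface HtmlRenderer {
        Map<String, String> render(List<Section> sections);
    }

    private static String sectionHtml(Section section) {
        return "<h1>" + section.title + "</h1>\n<p>" + section.text + "</p>\n";
    }

    /** Layout 1: everything on a single long page. */
    static final class SinglePageRenderer implements HtmlRenderer {
        @Override
        public Map<String, String> render(List<Section> sections) {
            StringBuilder page = new StringBuilder();
            for (Section section : sections) {
                page.append(sectionHtml(section));
            }
            Map<String, String> files = new LinkedHashMap<String, String>();
            files.put("index.html", page.toString());
            return files;
        }
    }

    /** Layout 2: one page per section, linked from a table of contents page. */
    static final class MultiPageRenderer implements HtmlRenderer {
        @Override
        public Map<String, String> render(List<Section> sections) {
            Map<String, String> files = new LinkedHashMap<String, String>();
            StringBuilder toc = new StringBuilder("<ul>\n");
            for (int i = 0; i < sections.size(); i++) {
                String fileName = "section" + (i + 1) + ".html";
                toc.append("<li><a href=\"").append(fileName).append("\">")
                   .append(sections.get(i).title).append("</a></li>\n");
                files.put(fileName, sectionHtml(sections.get(i)));
            }
            files.put("index.html", toc.append("</ul>\n").toString());
            return files;
        }
    }

    public static void main(String[] args) {
        List<Section> sections = new ArrayList<Section>();
        sections.add(new Section("Introduction", "Some text."));
        sections.add(new Section("Usage", "More text."));
        System.out.println(new SinglePageRenderer().render(sections).keySet());
        System.out.println(new MultiPageRenderer().render(sections).keySet());
    }
}
```

Neither renderer needs anything from the model beyond titles and text, which is the point of keeping layout decisions out of the object model.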

Below is an example of how a simple LaTeX document would be represented in the object model. The connecting lines do not represent inheritance; they show parent-child relations, since many object model classes can have child nodes (similar to the W3C DOM).



Figure 2: UML object model example
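
As a rough textual counterpart to the figure, the following sketch builds such a tree by hand for a one-line LaTeX input and prints it with indentation. The generic Node class and the node kinds (Document, Section, Paragraph, Text, Bold) are hypothetical stand-ins, not the actual classes in org.texconverter.dom.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a DOM-like object tree; node kinds and names are hypothetical. */
public class ObjectTreeSketch {

    /** A node has a kind, optional text, and any number of child nodes. */
    static final class Node {
        final String kind;
        final String text;
        final List<Node> children = new ArrayList<Node>();

        Node(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        /** Adds a child and returns this node so calls can be chained. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Writes one line per node, indented by its depth in the tree. */
    static void dump(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.kind);
        if (node.text != null) {
            out.append(": ").append(node.text);
        }
        out.append('\n');
        for (Node child : node.children) {
            dump(child, depth + 1, out);
        }
    }

    /** Hand-built tree for the input: \section{Intro} Hello \textbf{world} */
    static Node example() {
        Node bold = new Node("Bold", null).add(new Node("Text", "world"));
        Node paragraph = new Node("Paragraph", null)
                .add(new Node("Text", "Hello "))
                .add(bold);
        Node section = new Node("Section", "Intro").add(paragraph);
        return new Node("Document", null).add(section);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        dump(example(), 0, out);
        System.out.print(out);
    }
}
```

The nesting in the printed tree is containment (a parent holding children), which is the relation the connecting lines in the figure stand for.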

1.3 Architecture

There are two main modules, the reader (LaTeX document to object tree; package org.texconverter.reader.tex) and the renderer (object tree to final output; org.texconverter.renderer). The classes of the object model reside in the package org.texconverter.dom.

Reader

The reader module itself can be divided into four main parts:

  • The builder module (org.texconverter.reader.tex.builder), which is responsible for constructing the object tree and storing context information.
  • The tokenizer (org.texconverter.reader.tex.parser.Tokenizer), which segments the LaTeX input into distinct tokens for easier processing.
  • The parser module (org.texconverter.reader.tex.parser), which processes the TeX tokens. Necessary changes to the context or the object tree are communicated to the builder module.
  • The commandhandler module (org.texconverter.reader.handler), which processes LaTeX or TeX commands. Necessary changes to the context or the object tree are communicated to the builder module.
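
How these four parts might cooperate is sketched below. This is a heavily simplified illustration under stated assumptions, not the real implementation: every name (ReaderSketch, tokenize, Builder, CommandHandler, parse, defaultHandlers) is hypothetical, a command is assumed to take exactly one plain-text argument in braces, and the object tree is flattened to a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the reader parts; every name and behaviour here is hypothetical. */
public class ReaderSketch {

    /** Tokenizer: splits the input into command tokens, braces, and plain text. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (c == '{' || c == '}') {
                i++;
            } else if (c == '\\') {
                i++; // a backslash followed by letters forms one command token
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
            } else {
                while (i < input.length() && "\\{}".indexOf(input.charAt(i)) < 0) {
                    i++;
                }
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    /** Builder: owns the structure under construction (flattened to a list here). */
    static final class Builder {
        final List<String> nodes = new ArrayList<String>();

        void addNode(String kind, String content) {
            nodes.add(kind + "(" + content + ")");
        }
    }

    /** Command handler: reacts to one command and reports to the builder. */
    interface CommandHandler {
        void handle(String argument, Builder builder);
    }

    /** Parser: walks the tokens and delegates commands to their handlers. */
    static Builder parse(List<String> tokens, Map<String, CommandHandler> handlers) {
        Builder builder = new Builder();
        int i = 0;
        while (i < tokens.size()) {
            String token = tokens.get(i++);
            if (token.startsWith("\\")) {
                // Simplification: a command takes exactly one {argument} of plain text.
                String argument = "";
                if (i + 2 < tokens.size() && tokens.get(i).equals("{")
                        && tokens.get(i + 2).equals("}")) {
                    argument = tokens.get(i + 1);
                    i += 3;
                }
                CommandHandler handler = handlers.get(token);
                if (handler != null) {
                    handler.handle(argument, builder);
                } // commands without a handler are skipped in this sketch
            } else if (!token.equals("{") && !token.equals("}") && token.trim().length() > 0) {
                builder.addNode("Text", token.trim());
            }
        }
        return builder;
    }

    /** A single example handler, registered under its command name. */
    static Map<String, CommandHandler> defaultHandlers() {
        Map<String, CommandHandler> handlers = new HashMap<String, CommandHandler>();
        handlers.put("\\section", new CommandHandler() {
            @Override
            public void handle(String argument, Builder builder) {
                builder.addNode("Section", argument);
            }
        });
        return handlers;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("\\section{Intro} Hello world");
        System.out.println(parse(tokens, defaultHandlers()).nodes);
    }
}
```

Note that neither the parser nor the handler modifies the result directly; both go through the builder, mirroring the division of responsibilities described in the list above.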

Also of importance is the file cmdDefs.xml, which resides in the folder plugin-resources. It specifies all handled (or ignored) commands, including, among other things, the kind and number of their arguments and the commandhandler class responsible for handling them.
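
To illustrate the kind of information such a command definition carries, here is a minimal Java sketch. It does not reflect the actual format of cmdDefs.xml or the project's classes: CommandDefSketch, CommandDef, its three fields, and the use of reflection to instantiate the handler are all assumptions, and the definitions are hard-coded instead of being read from XML.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of command definitions; fields and names are hypothetical. */
public class CommandDefSketch {

    /** Hypothetical handler contract: turn the arguments into some result. */
    public interface CommandHandler {
        String handle(String[] arguments);
    }

    /** Example handler that a definition can refer to by class name. */
    public static final class SectionHandler implements CommandHandler {
        @Override
        public String handle(String[] arguments) {
            return "Section(" + arguments[0] + ")";
        }
    }

    /** One definition entry, reduced to three assumed properties. */
    static final class CommandDef {
        final String name;
        final int argumentCount;
        final String handlerClassName; // null means: recognised but ignored

        CommandDef(String name, int argumentCount, String handlerClassName) {
            this.name = name;
            this.argumentCount = argumentCount;
            this.handlerClassName = handlerClassName;
        }
    }

    /** Stand-in for the parsed definitions file, keyed by command name. */
    static Map<String, CommandDef> definitions() {
        Map<String, CommandDef> defs = new HashMap<String, CommandDef>();
        defs.put("section", new CommandDef("section", 1, SectionHandler.class.getName()));
        defs.put("newpage", new CommandDef("newpage", 0, null));
        return defs;
    }

    /** Looks up a command and runs its handler; returns null if ignored or unknown. */
    static String dispatch(String command, String[] arguments) {
        CommandDef def = definitions().get(command);
        if (def == null || def.handlerClassName == null) {
            return null;
        }
        if (arguments.length != def.argumentCount) {
            throw new IllegalArgumentException(
                    command + " expects " + def.argumentCount + " argument(s)");
        }
        try {
            // The handler is created from the class name stored in the definition.
            Object handler = Class.forName(def.handlerClassName)
                    .getDeclaredConstructor().newInstance();
            return ((CommandHandler) handler).handle(arguments);
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create handler for " + command, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch("section", new String[] {"Intro"}));
        System.out.println(dispatch("newpage", new String[0]));
    }
}
```

Keeping the command name, argument count and handler class in data rather than in code means that supporting or ignoring another command would only require a new definition entry and, where needed, a new handler class.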

Renderer

There is currently only one renderer (well, one and a half), which is versatile enough to cover all currently supported output formats. It uses Jakarta Velocity to render the object model objects. The different Velocity template sets can be found in the folder plugin-resources/vm/<output format name>.

1.4 Documentation

The documentation you are currently reading resides in several LaTeX files in the folder "texdocs". When the Maven-goal site is executed, these files are converted to XDoc (by the TeXConverter-JUnit-tests) and finally to HTML (by the Maven-xdoc-plugin).

A few documentation files use XDoc-specific features (like navigation.xml); these can be found in the folder site/xdocs/.