Implementation method

The SOL package is written as CGI scripts that support forms. Typically, these scripts produce forms as their output. The scripts are written in Perl [WS90]. Once the scripts are stable, they will be available at ftp://ftp.cs.uky.edu/cs/software/suda.tgz so others can pursue similar projects based on our code.

These forms often need to display Greek. Displaying Greek in Beta code is straightforward but not particularly friendly to the reader. Java-enabled browsers can run an applet that presents Beta code as Greek letters. Unicode-aware browsers with a classical Greek font can display Greek more directly [Con96]. Finally, GIF images containing pictures of Greek can be placed in the forms. The choice of display methods is made at the SOL server. The display preferences of a participant are elicited at login time and used throughout the session.
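To make the Unicode route concrete, the following is a minimal sketch of converting Beta code to Unicode Greek. It is not the SOL applet or server code: the function name `beta_to_unicode` is hypothetical, and the sketch covers only the basic letters, the common diacritics, and the `*` capital marker, ignoring punctuation conventions and other Beta code escapes.

```python
import re
import unicodedata

# Standard Beta code letter assignments (lowercase ASCII form).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta code diacritics mapped to Unicode combining characters.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta):
    """Convert a Beta code string to precomposed Unicode Greek (sketch)."""
    out = []
    upper = False   # True between '*' and the letter it capitalizes
    pending = []    # diacritics written between '*' and that letter
    for ch in beta.lower():
        if ch == "*":
            upper = True
        elif ch in LETTERS:
            letter = LETTERS[ch].upper() if upper else LETTERS[ch]
            out.append(letter)
            out.extend(pending)
            upper, pending = False, []
        elif ch in MARKS:
            # After '*', marks precede their letter; otherwise they follow it.
            (pending if upper else out).append(MARKS[ch])
        else:
            out.append(ch)  # spaces and anything unrecognized pass through
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma that is not followed by another letter becomes final sigma.
    return re.sub(r"σ(?!\w)", "ς", text)
```

For example, `beta_to_unicode("mh=nin")` yields μῆνιν. Relying on NFC normalization keeps the table small, because the precomposed polytonic characters never have to be listed individually.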

All databases are currently flat ASCII files. This format makes the data machine-independent and allows attribute values of arbitrary length. In this regard, the databases are like those of Qddb [HF96]. Individual attributes are separated either by newlines or by the | character.
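As a minimal sketch of how such a record might be read and written, assume one record per line with attributes separated by `|`. The field names in `FIELDS` are hypothetical, chosen to echo the participant attributes described later in this section; the actual SOL file layout is not specified here.

```python
# Hypothetical attribute order for one record; not taken from the SOL code.
FIELDS = ("id", "name", "phone", "affiliation", "email", "level")


def parse_record(line):
    """Split one '|'-separated line into a dict keyed by field name."""
    values = line.rstrip("\n").split("|")
    return dict(zip(FIELDS, values))


def format_record(record):
    """Join a record's attributes back into one '|'-separated line."""
    return "|".join(record[field] for field in FIELDS)
```

Because the separator is a single character and values are plain text, attribute values can be of any length, but a value containing `|` would need some escaping convention that this sketch does not attempt.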

The central database holds translations. As the translation grows, the production translation database will be subdivided into multiple files. When a translator or editor modifies a translation, the software removes the entire entry from the translation database and places a new copy at the end. The cost of these operations is linear in the size of the database file involved. For this reason, we will limit individual database files to about 100KB. Search will require indices when the data become large; we will likely use Qddb for our database engine at that time.
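The remove-and-append update can be sketched as follows, under the assumption (not stated in the source) that each entry occupies one line and that its key is the first `|`-separated field. The function name `replace_entry` is hypothetical.

```python
def replace_entry(path, key, new_record):
    """Drop every record whose first field is `key`, then append `new_record`.

    Both passes touch the whole file, so the cost is linear in its size.
    """
    with open(path, encoding="ascii") as f:
        lines = f.read().splitlines()
    # Keep every record except the one being replaced.
    kept = [ln for ln in lines if ln.split("|", 1)[0] != key]
    kept.append(new_record)
    with open(path, "w", encoding="ascii") as f:
        f.write("\n".join(kept) + "\n")
```

Since every update rewrites the whole file, keeping each file small bounds the cost of a single modification, which is the motivation for the size limit mentioned above.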

The participant database is used to record each registered participant (a category that does not include guests), linking the translator identifier to the participant's full name, phone number, affiliation, e-mail address, and level of engagement. This database is used to produce the output format, which does not reveal translator identifiers.

The assignment database has an entry for each translator listing the Suda entries that translator has been assigned. It is parallel to the completion database, which has an entry for each translator showing what has been completed. Both these databases use entry lists, which are comma-separated lists of ranges. A range is in the form gamma,14-18, indicating the letter and a contiguous set of numbered Suda entries within that letter's Suda entries. These databases are used to guide managing editors as they give new assignments and translators as they choose a Suda entry to translate. They are also used to create the graphics depicting the current status of the SOL effort.
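A minimal sketch of expanding an entry list follows. The exact grammar is an assumption: here a non-numeric token names a letter, and each following numeric token (`14-18` or a single number) is a range within that letter. The function name `parse_entry_list` is hypothetical.

```python
def parse_entry_list(text):
    """Expand an entry list such as 'gamma,14-18' into (letter, number) pairs."""
    entries = []
    letter = None
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        if token[0].isdigit():
            if letter is None:
                raise ValueError("range given before any letter")
            low, _, high = token.partition("-")
            first = int(low)
            last = int(high) if high else first  # a bare number is a 1-entry range
            entries.extend((letter, n) for n in range(first, last + 1))
        else:
            letter = token  # subsequent ranges belong to this letter
    return entries
```

With both the assignment and completion lists expanded this way, the entries still outstanding for a translator are simply the assigned pairs that do not appear among the completed ones.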

To prevent translators and editors from massively changing a translation, the software compares the modified translation with its previous version using a differencing program. If the fraction of words changed exceeds a threshold (20%, for instance), the modification is flagged as "excessive".
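The check can be sketched as below. This substitutes Python's `difflib` for the external differencing program, and the way the changed fraction is computed (relative to the longer of the two versions) is an assumption; the function names are hypothetical.

```python
import difflib


def changed_fraction(old, new):
    """Fraction of words that differ between two versions of a translation."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    # Words that appear, in order, in both versions.
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    longest = max(len(old_words), len(new_words))
    return (longest - unchanged) / max(longest, 1)


def is_excessive(old, new, threshold=0.20):
    """Flag a modification whose changed fraction exceeds the threshold."""
    return changed_fraction(old, new) > threshold
```

Comparing word sequences rather than lines means that re-wrapping a paragraph does not count as a change, while rewording does.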

We do not currently implement a high level of security. Passwords are sent through the network and stored at the server in cleartext. The fact that a participant has logged in under some identifier is passed from form to form as a hidden value of the form. An intruder who knows the URL of the forms used by participants at engagement levels above guest and who can guess a participant identifier can likely fool SOL into taking actions appropriate to those levels of engagement. We do not think that such intrusion is likely, but we will move to using time-limited cookies as an alternative to passing hidden variables.
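One way a time-limited cookie value could be built is sketched below. Signing the value with an HMAC is an assumption that goes beyond what is described here, and `SECRET`, `make_token`, and `check_token` are hypothetical names. This is an illustration of the idea, not the planned SOL implementation.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; it must never be sent to the browser.
SECRET = b"replace-with-a-server-side-secret"


def _sign(payload):
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(participant_id, lifetime=3600, now=None):
    """Return 'id|expiry|signature', valid for `lifetime` seconds."""
    start = time.time() if now is None else now
    payload = f"{participant_id}|{int(start + lifetime)}"
    return f"{payload}|{_sign(payload)}"


def check_token(token, now=None):
    """Return the participant id if the token is authentic and unexpired."""
    try:
        participant_id, expires, signature = token.rsplit("|", 2)
        expiry = int(expires)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{participant_id}|{expires}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None  # identifier or expiry was altered
    current = time.time() if now is None else now
    return participant_id if current < expiry else None
```

Unlike a hidden form value holding only the identifier, such a token cannot be produced by guessing a participant identifier, and it stops working once the expiry time passes.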

Raphael Finkel
6/2/1998