Scout: a Web extraction tool
- Student: Anthony L. Borchers 11/1998, now at Lexmark International
- Purpose: Apply data extraction techniques to World Wide Web documents
- Method: A general-purpose WWW robot engine with an extension mechanism for
attaching data-extraction procedures at runtime. These procedures may then
apply domain- or format-specific extraction methods.
- What the student learned:
  - Managing a large software project over a long period of time.
  - Multithreaded programming in Java, including synchronization methods.
    (This part took considerable cleverness.)
  - Details of the Hypertext Transfer Protocol (HTTP).
  - Technical writing skills, from preparing the write-up and packaging the resulting tools.
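
The extension mechanism described under Method can be pictured with a minimal Java sketch. This is not Scout's actual code: the names `Extractor`, `HtmlTitleExtractor`, and `RobotEngine` are hypothetical, page fetching is left out, and the thread-safe list is only one plausible way to let procedures be attached while robot threads are running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extension point: one data-extraction procedure.
interface Extractor {
    // True if this procedure knows how to handle the document.
    boolean accepts(String url, String contentType);

    // Returns the items extracted from the document body.
    List<String> extract(String url, String body);
}

// Hypothetical format-specific procedure: pulls the <title> text out of HTML.
class HtmlTitleExtractor implements Extractor {
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public boolean accepts(String url, String contentType) {
        return contentType != null && contentType.startsWith("text/html");
    }

    public List<String> extract(String url, String body) {
        List<String> items = new ArrayList<>();
        Matcher m = TITLE.matcher(body);
        if (m.find()) {
            items.add(m.group(1).trim());
        }
        return items;
    }
}

// Hypothetical engine core. Fetching is omitted: the caller hands in a
// document that has already been retrieved.
class RobotEngine {
    // CopyOnWriteArrayList lets attach() run safely while other threads
    // are iterating over the list inside process().
    private final List<Extractor> extractors = new CopyOnWriteArrayList<>();

    // Attach an extraction procedure to the running engine.
    void attach(Extractor extractor) {
        extractors.add(extractor);
    }

    // Offer the document to every attached procedure that accepts it.
    List<String> process(String url, String contentType, String body) {
        List<String> results = new ArrayList<>();
        for (Extractor e : extractors) {
            if (e.accepts(url, contentType)) {
                results.addAll(e.extract(url, body));
            }
        }
        return results;
    }
}
```

In this sketch a caller would create a `RobotEngine`, call `attach(new HtmlTitleExtractor())`, and pass each fetched page to `process`. Documents that no attached procedure accepts simply yield an empty result list.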