UpCast, from infinity-loop, converts Microsoft Word documents into XML documents. It is Java-based, so it works on any computer platform that supports Java, and comes as an interactive application or as an API that can be used for batch processing of documents. It is especially good at extracting images (replacing them with links to separate image files in the XML document), retaining tabular structures, and converting heading levels into section structures.
Neil Bradley has developed batch processing applications that include and configure the UpCast API. He has experience of the output formats of UpCast, and has written XSLT stylesheets to convert this output into structures that conform to client-specific DTDs.
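To illustrate what converting heading levels into section structures involves, here is a minimal Python sketch of the general technique. It is not UpCast's implementation and does not use the UpCast API. The function name `nest_headings`, the input format, and the element names `document`, `section`, `title` and `para` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def nest_headings(blocks):
    """Turn a flat list of (kind, text) blocks into nested <section> elements.

    kind is an integer heading level (1, 2, ...) or the string "para".
    Element names are hypothetical, chosen only for this illustration.
    """
    root = ET.Element("document")
    # Stack of (level, element) pairs; the root acts as level 0.
    stack = [(0, root)]
    for kind, text in blocks:
        if kind == "para":
            # Body text belongs to the innermost open section.
            ET.SubElement(stack[-1][1], "para").text = text
            continue
        # A new heading closes every open section at the same or a deeper level.
        while stack[-1][0] >= kind:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        ET.SubElement(section, "title").text = text
        stack.append((kind, section))
    return root


# A flat sequence of the kind a word-processor document contains.
flat = [
    (1, "Introduction"),
    ("para", "Opening text."),
    (2, "Background"),
    ("para", "More detail."),
    (1, "Method"),
]
print(ET.tostring(nest_headings(flat), encoding="unicode"))
```

In this sketch the "Background" section ends up inside "Introduction", while "Method" becomes a sibling of "Introduction". A client-specific DTD would normally dictate different element names, which is where a follow-on XSLT stylesheet would come in.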