Parsing Appleworks and Clarisworks file formats

Over the past few years, when I have downtime, I sometimes like to reverse engineer abandoned file formats. It is kind of like working on a crossword puzzle with the bonus that any progress you make helps people out there who are trying to archive, index, or convert their old files.

I’ve spent a lot of time trying to figure out file format for Appleworks and Clarisworks. My latest efforts have been to take a file, make a small change, then use Hex Fiend to compare what has changed in the binary format.

After years of off and on tinkering and documenting I finally wrote a basic parser for Appleworks and Clarisworks word processor files. I ‘believe’ this is the first free and open parser for this file format, even if it is ten years too late. I figured out a lot about the format, but it still has a long way to go. You can view my current documenting status here and download source for the parser on GitHub.

The parser so far can read:

document version
page size
margins
document content

From what I have seen, most people trying to read Appleworks documents only really care about the document content, but I am very close to figuring out how to parse:

styles – (bold, italic, underline)
footnotes

I may not touch it again for another year, but who knows.

Link to file format research

Download source code at GitHub