How to decode Apple Binary Property List files

Andreas Pehnack

Binary Property Lists everywhere…

Inspired by @mikeymikey’s blog post “Apple’s BookmarkData – exposed!”, I took some extra minutes to build a grammar for binary property list files. A “little” search revealed that more than 605,000 binary plist files reside on my system:

find / -type f -name "*" -exec file '{}' \; | grep "Apple binary property list" | wc -l

These files come with a variety of file extensions; apparently developers find the format convenient to use. Here is a small list of extensions of files which actually are binary plist files: plist, nib, sfl, qtz, strings, stringsdict, xcplugindata, xcuserstate, xcspec, xcplugincache, xcrequiredplugins, webloc, webhistory, tracetemplate, etypes, scnp, scn, sks, colortable, classdescriptions, ibsearchdata, defaults, pbfilespec, loopdata and others…

Some of them, like .strings, can also appear as XML files, which is another representation of property lists.

Understanding the File Format

In order to start creating a Synalyze It! / Hexinator grammar, I took one of the .sfl files in ~/Library/Application Support/com.apple.sharedfilelist (actually com.apple.LSSharedFileList.RecentDocuments.sfl).

As described on Wikipedia there’s some documentation about the file format in Apple's source code (freely available open source). The header file ForFoundationOnly.h has some useful structure definitions for header and trailer.

Header

After reading the simple 8-byte header, the trailer structure at the end of the file has to be parsed next.

typedef struct {
    uint8_t _magic[6];
    uint8_t _version[2];
} CFBinaryPlistHeader;

_magic always contains the string bplist, and _version holds two ASCII zero characters: 00.
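The header check can be sketched in a few lines of Python. This is only an illustration and not part of the grammar: the function name read_header is made up for this example, and the sample bytes are produced with the standard library's plistlib module rather than taken from a real .sfl file.

```python
import plistlib


def read_header(data: bytes):
    """Split the 8-byte header into (_magic, _version)."""
    if len(data) < 8:
        raise ValueError("too short for a binary plist header")
    magic, version = data[:6], data[6:8]
    if magic != b"bplist":
        raise ValueError("not a binary property list")
    return magic, version


# Sample input: a tiny binary plist generated in memory.
sample = plistlib.dumps({"name": "demo"}, fmt=plistlib.FMT_BINARY)
print(read_header(sample))
```

An XML property list fails this check because it starts with an XML declaration instead of the bplist magic.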

Trailer

The trailer is more interesting:

typedef struct {
    uint8_t _unused[5];
    uint8_t _sortVersion;
    uint8_t _offsetIntSize;
    uint8_t _objectRefSize;
    uint64_t _numObjects;
    uint64_t _topObject;
    uint64_t _offsetTableOffset;
} CFBinaryPlistTrailer;

The last variable _offsetTableOffset holds the file offset of the so-called offset table. The offset table consists of _numObjects entries, each of size _offsetIntSize. This means that larger binary plist files, which require larger offsets, have larger _offsetIntSize values.
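To make the trailer layout concrete, here is a minimal Python sketch that unpacks the last 32 bytes of a file with the struct module. The Trailer tuple and the read_trailer function are names invented for this example; the format string simply mirrors the C structure above (5 unused bytes, three single bytes, three big-endian 64-bit integers). The sample data again comes from plistlib, not from a file on disk.

```python
import plistlib
import struct
from typing import NamedTuple


class Trailer(NamedTuple):
    sort_version: int
    offset_int_size: int
    object_ref_size: int
    num_objects: int
    top_object: int
    offset_table_offset: int


# ">" = big-endian without padding, "5x" = skip the 5 unused bytes.
TRAILER_FORMAT = ">5xBBBQQQ"
TRAILER_SIZE = struct.calcsize(TRAILER_FORMAT)  # 32 bytes


def read_trailer(data: bytes) -> Trailer:
    """Unpack the trailer found at the very end of a binary plist."""
    if len(data) < 8 + TRAILER_SIZE:
        raise ValueError("too short to hold header and trailer")
    return Trailer(*struct.unpack(TRAILER_FORMAT, data[-TRAILER_SIZE:]))


sample = plistlib.dumps(["a", "b"], fmt=plistlib.FMT_BINARY)
trailer = read_trailer(sample)
print(trailer)
```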

Offset Table

Usually you find the offset table immediately before the trailer structure since it can only be built after writing the objects. The index of the root object of the property list is indicated by _topObject. _objectRefSize is used inside the objects for references to other objects. It determines the length of object numbers, each of which is an index into the offset table.
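Reading the offset table could look roughly like the following Python sketch. The function name read_offset_table is hypothetical, and the example assumes a well-formed file (no bounds checking). It skips the unused bytes and _sortVersion, reads the remaining trailer fields, and then slices one big-endian integer of _offsetIntSize bytes per object.

```python
import plistlib
import struct


def read_offset_table(data: bytes):
    """Return (list of object file offsets, index of the root object)."""
    # Skip 5 unused bytes plus _sortVersion, then read the remaining fields.
    offset_int_size, _ref_size, num_objects, top_object, table_offset = \
        struct.unpack(">6xBBQQQ", data[-32:])
    offsets = []
    for i in range(num_objects):
        start = table_offset + i * offset_int_size
        entry = data[start:start + offset_int_size]
        offsets.append(int.from_bytes(entry, "big"))
    return offsets, top_object


sample = plistlib.dumps(["a", "b"], fmt=plistlib.FMT_BINARY)
offsets, top = read_offset_table(sample)
print(offsets, "root object starts at file offset", offsets[top])
```

In this sample the root array is the first object written, so its offset is 8, directly after the header.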

Objects

Now if you follow the file offsets in the offset table, one of 14 object types will be found. Each object can be identified by its first four or eight bits:

null    0000 0000
bool    0000 1000                          // false
bool    0000 1001                          // true
fill    0000 1111                          // fill byte
int     0001 nnnn ...                      // # of bytes is 2^nnnn, big-endian bytes
real    0010 nnnn ...                      // # of bytes is 2^nnnn, big-endian bytes
date    0011 0011 ...                      // 8 byte float follows, big-endian bytes
data    0100 nnnn [int] ...                // nnnn is number of bytes unless 1111 then int count follows, followed by bytes
string  0101 nnnn [int] ...                // ASCII string, nnnn is # of chars, else 1111 then int count, then bytes
string  0110 nnnn [int] ...                // Unicode string, nnnn is # of chars, else 1111 then int count, then big-endian 2-byte uint16_t
        0111 xxxx                          // unused
uid     1000 nnnn ...                      // nnnn+1 is # of bytes
        1001 xxxx                          // unused
array   1010 nnnn [int] objref*            // nnnn is count, unless '1111', then int count follows
        1011 xxxx                          // unused
set     1100 nnnn [int] objref*            // nnnn is count, unless '1111', then int count follows
dict    1101 nnnn [int] keyref* objref*    // nnnn is count, unless '1111', then int count follows
        1110 xxxx                          // unused
        1111 xxxx                          // unused

The Apple engineers used a little trick to keep the file format as compact as possible: for data, string, array, set and dictionary objects, the count (bytes, characters or object references, depending on the type) is stored in the low four bits of the object identifier byte if it’s < 15 (0x0F). For all larger counts, as mentioned above, the low four bits are set to 1111 and an integer object holding the count follows.
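The count trick can be shown with a short Python sketch. The function read_marker and the COUNTED_TYPES set are names made up for this illustration; the code assumes that the follow-up count is a regular int object (marker 0001 nnnn) and does no bounds checking.

```python
import plistlib

# High nibbles of the types whose low nibble is a count:
# data, ASCII string, Unicode string, array, set, dict.
COUNTED_TYPES = {0x4, 0x5, 0x6, 0xA, 0xC, 0xD}


def read_marker(data: bytes, pos: int):
    """Return (type nibble, low nibble or count, position after marker/count)."""
    marker = data[pos]
    kind, low = marker >> 4, marker & 0x0F
    pos += 1
    if kind in COUNTED_TYPES and low == 0x0F:
        # Count did not fit into four bits: an int object follows.
        int_marker = data[pos]
        if int_marker >> 4 != 0x1:
            raise ValueError("expected an int object holding the count")
        width = 1 << (int_marker & 0x0F)  # 2^nnnn bytes
        low = int.from_bytes(data[pos + 1:pos + 1 + width], "big")
        pos += 1 + width
    return kind, low, pos


short = plistlib.dumps("abc", fmt=plistlib.FMT_BINARY)
long_ = plistlib.dumps("x" * 20, fmt=plistlib.FMT_BINARY)
# In both samples the only object starts right after the 8-byte header.
print(read_marker(short, 8))
print(read_marker(long_, 8))
```

The short string needs a single marker byte, while the 20-character string spends two extra bytes on the int object that carries its length.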

The collections (arrays, sets, dictionaries) reference objects only by their number in the offset table. This makes sense because one object could be contained in multiple collections.
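How collections resolve their members through the offset table can be sketched with a deliberately small Python decoder. It is an illustration under stated assumptions, not a complete parser: parse_bplist and its helpers are hypothetical names, only bool, int, string, array and dict objects are handled, and input is assumed to be well-formed. The sample is generated with plistlib, which stores the repeated string "a" only once, so the array references the same object number twice.

```python
import plistlib
import struct


def parse_bplist(data: bytes):
    """Decode a subset of binary plist objects, starting at the root."""
    offset_size, ref_size, num_objects, top, table = \
        struct.unpack(">6xBBQQQ", data[-32:])
    offsets = [
        int.from_bytes(data[table + i * offset_size:
                            table + (i + 1) * offset_size], "big")
        for i in range(num_objects)
    ]

    def read_count(pos, low):
        # Low nibble 1111 means an int object with the real count follows.
        if low != 0x0F:
            return low, pos
        width = 1 << (data[pos] & 0x0F)
        return int.from_bytes(data[pos + 1:pos + 1 + width], "big"), pos + 1 + width

    def read_refs(pos, n):
        # Each reference is an object number, i.e. an index into offsets.
        return [int.from_bytes(data[pos + i * ref_size:
                                    pos + (i + 1) * ref_size], "big")
                for i in range(n)]

    def load(index):
        pos = offsets[index]
        kind, low = data[pos] >> 4, data[pos] & 0x0F
        pos += 1
        if kind == 0x0 and low in (0x8, 0x9):   # bool
            return low == 0x9
        if kind == 0x1:                          # int, 2^nnnn bytes
            width = 1 << low
            return int.from_bytes(data[pos:pos + width], "big", signed=width >= 8)
        if kind not in (0x5, 0x6, 0xA, 0xD):
            raise NotImplementedError(f"object type {kind:#x} not handled here")
        count, pos = read_count(pos, low)
        if kind == 0x5:                          # ASCII string
            return data[pos:pos + count].decode("ascii")
        if kind == 0x6:                          # UTF-16 big-endian string
            return data[pos:pos + 2 * count].decode("utf-16-be")
        if kind == 0xA:                          # array: count object refs
            return [load(r) for r in read_refs(pos, count)]
        # dict: count key refs followed by count value refs
        keys = read_refs(pos, count)
        values = read_refs(pos + count * ref_size, count)
        return {load(k): load(v) for k, v in zip(keys, values)}

    return load(top)


value = {"name": "demo", "tags": ["a", "b", "a"], "count": 3, "ok": True}
decoded = parse_bplist(plistlib.dumps(value, fmt=plistlib.FMT_BINARY))
print(decoded)
```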

Summary

This is basically all you need to know to fully understand binary property list files. With this knowledge it was pretty straightforward to create the grammar which produces this nice colored hex view:

Synalyze It! Screen Shot with opened binary property list file

The Binary Property List Grammar

The grammar makes use of the structure inheritance concept, so all the different objects are derived from one Object parent structure.

Like in the ZIP grammar, a small Lua script is used to continue parsing at the end of the file with the trailer structure:

byteView = currentMapper:getCurrentByteView()
fileLength = byteView:getLength()
currentGrammar = currentMapper:getCurrentGrammar()
-- get the structure we want to apply
structure = currentGrammar:getStructureByName("Trailer")
-- the trailer occupies the last 32 bytes of the file
bytesProcessed = currentMapper:mapStructureAtPosition(structure, fileLength-32, 32)

Screen Shot of Grammar Editor with binary property list grammar

The grammar can be downloaded manually for free from the Synalyze It! grammar page, but it should also be suggested automatically once a binary plist file is opened.