Aymeric on Software

Because we needed another of these blogs...

A History of the DMG File Format

Disk images have always been popular on the Mac. They play a similar role to “ISO” images on other platforms, in the sense that they can be used to clone media or can be mounted and unmounted as virtual disks. However, DMG images offer a lot more functionality than traditional ISO images:

  • They can hold a partition table, and therefore contain more than one volume. (Typically a partition table with a single partition).
  • They may contain any kind of filesystem. (Typically HFS+).
  • Images can be compressed (via bzip2, zlib, or Apple’s own format for the old ones).
  • Images can be read-only or read-write. (Installer images are typically read-only).
  • Read-write images can be sparse. That means the image file only grows as more content is put inside, rather than being pre-allocated to the full capacity of the virtual disk.
  • Images can have checksums to verify their integrity (options: several CRCs, MD5, several SHAs).
  • Images can embed Mac-specific behaviors: contain a EULA, have a custom icon, automatically mount when downloaded by Safari, etc…
  • And I am probably forgetting a lot of other features (RTFM: man hdiutil).

One of the initial motivations for using disk images, during the MacOS classic era, was that application files could not be reliably transmitted over non-Mac-specific networks like the Internet. This limitation stems from the fact that files were divided into two logical parts, called forks: the “data fork” and the “resource fork”. The data fork is what is traditionally considered to be a file. The structure of its data is entirely determined by the developer. The resource fork, on the other hand, contains a collection of data organized in a standard way and accessible through a dedicated API. Although it was possible to define custom resources, the OS also provided lots of default types to hold common kinds of information: user interface elements, rich text, pictures, etc…

On MacOS X, resource forks are still supported for backward compatibility reasons, but all applications now use bundles instead. A bundle appears as a single icon to the end user, but it is in reality a directory containing several files. The role that the resource fork used to play in organizing data is now played by the file system: each resource appears as an individual file within a hierarchy of directories. Hence, a typical MacOS X application is composed of several files, and disk images remain useful to solve the problem of their distribution across networks.

In order to safely transport data and installers, Apple chose to favor the usage of disk images rather than self-extracting installers. Since the early 90s, all Apple software installation disks have been distributed as either disk images or physical media. Early Mac users will probably remember using either Apple’s Disk Copy or Aladdin’s ShrinkWrap for creating or mounting disk images. The current DMG file format is a direct descendant of the New Disk Image Format (aka NDIF) that Apple introduced with Disk Copy 6 in 1996. This file format used the resource fork to store all the metadata: file version, partition table, type of compression, checksums, etc… and more importantly where to find the data for the virtual disk sectors. The data fork only contained the raw data (compressed or not) of the virtual disk.

The astute reader may have noticed that, since NDIF images used forks, they were not suitable for safe redistribution across the Internet. Disk images typically had to be encoded with either BinHex or UUencode. When Apple introduced MacOS X in 2001, they decided to solve the problem once and for all. They designed the DMG file format, which they called Universal Disk Image Format (aka UDIF). DMG is a flat file format in the sense that it does not contain any forks, but it is basically a wrapper around the traditional NDIF format. A typical DMG file is a concatenation of the data fork, followed by the resource fork, of an NDIF disk image.
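
To make that layout a little more concrete, here is a minimal Python sketch that reads the trailer stored at the very end of a UDIF file. The field layout it relies on (a 512-byte big-endian block starting with the signature "koly", followed by the offsets and lengths of the two fork sections) comes from reverse-engineered descriptions of the format, not from Apple, so treat it as an assumption. The function names are my own and purely illustrative.

```python
import os
import struct

# ASSUMPTION: reverse-engineered layout of the first fields of the UDIF
# trailer. Big-endian: signature, version, header size, flags, running data
# fork offset, data fork offset/length, resource fork offset/length.
KOLY_PREFIX = struct.Struct(">4sIIIQQQQQ")
TRAILER_SIZE = 512


def parse_udif_trailer(trailer: bytes) -> dict:
    """Decode the fork offsets and lengths from a 512-byte UDIF trailer."""
    if len(trailer) != TRAILER_SIZE:
        raise ValueError("expected a 512-byte trailer")
    (signature, version, header_size, flags, _running_offset,
     data_offset, data_length,
     rsrc_offset, rsrc_length) = KOLY_PREFIX.unpack_from(trailer)
    if signature != b"koly":
        raise ValueError("missing 'koly' signature, probably not a UDIF image")
    return {
        "version": version,
        "header_size": header_size,
        "flags": flags,
        "data_fork_offset": data_offset,
        "data_fork_length": data_length,
        "rsrc_fork_offset": rsrc_offset,
        "rsrc_fork_length": rsrc_length,
    }


def read_udif_trailer(path: str) -> dict:
    """Read the last 512 bytes of the file at `path` and decode them."""
    with open(path, "rb") as f:
        f.seek(-TRAILER_SIZE, os.SEEK_END)
        return parse_udif_trailer(f.read(TRAILER_SIZE))
```

On an image that follows the layout described above, I would expect the data fork to start at offset 0 and the resource fork to begin where the data fork ends. Images may also keep their metadata in other sections, which this sketch ignores.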

Apple has never published the specifications of either the NDIF or the UDIF file format. The private DiskImages framework plays a central role in the implementation of DMG parsing and the mounting of disk images in general. The framework apparently supports plugins (see hdiutil plugins) but there is no developer documentation available. The only known non-Apple plugin was made by Connectix for VirtualPC images, during the PowerPC era (see Unlocking FileVault). However, it does not appear that either VMware or Parallels has been able to use the private framework in recent times.

Although it is sad that the DiskImages framework is closed, the DMG file format itself is a bit of an open secret. It has been successfully reverse engineered many times, and there are several implementations, a few of them open source:

  • Several commercial products support it: MacDrive, PowerISO, MagicISO.
  • Vultur’s dmg2iso is probably the oldest open source implementation. It is a quick and dirty Perl script to convert a DMG file into an ISO. It is deprecated, and its replacement is a cleaner and more complete C implementation called dmg2img.
  • Erik Larsson provides a series of Java tools that can parse DMG files. There is the simple DMGExtractor, but also the more complete HFSExplorer which provides a graphical interface to browse HFS+ volumes and DMG files.
  • Another interesting source is libdmg from planetbeing. It is a C library with UDIF and HFS+ support. It can be used to convert DMG from/to ISOs; extract files; or create DMG images from scratch. The project was initially started as a way to manipulate Apple’s software restore packages for iPhone devices.
  • Finally, 7-zip on Windows is able to open and extract DMG files.