Aymeric on Software

Because we needed another of these blogs...

HFS+ Bit Rot

HFS+ is a terribly old filesystem with serious flaws. I sincerely hope that Apple comes with an update of the filesystem for WWDC 2015. After all they have been working on Swift for the past 4 years and we have just learned about it last week.

HFS+ is seriously old

HFS+ was released in 1998, in the era of Mac OS Classic. It predates the current Unix based version of OS X by at least three years. It was created in a period where Apple’s business was in such a dire situation that Michael Dell’s uttered this now infamous quote “What would I do? I’d shut it down and give the money back to the shareholders”.

Technically HFS+ is small evolution of its predecessor HFS which dates back to 1985. The major change from HFS to HFS+ is the transition of block addresses from 16 bits to 32 bits. This change was really needed, as hard drives capacity exploded in the late 90s. On a 16 bit addressing scheme, a file containing a single byte would use 16KB on a 1GB hard drive, and around 16MB on a 1TB hard drive. The other changes included the transition to longer filenames (from 31 to 255 characters) and a switch to Unicode encoding.

By and large the rest of the design of HFS+ has remained unchanged since 1985.

HFS+ has serious limitations and flaws

When it was first released, HFS+ did not support hard links, journaling, extended attributes, hot files, and online defragmentation. These features were gradually added with subsequent releases of Mac OS X. But they are basically hacked to death, which leads to a complicated, slow and not so reliable implementation.

In the early days, the system had a hard limit to the number of files that could be written and deleted over the lifetime of the volume. It was 2,147,483,648 (i.e. 2^31). After that, the volume would stop being able to add any more files or directories. On HFS+, every entry in the filesystem is associated to a CNID (Catalog Number ID). The early implementations used a simple a global counter nextCatalogID stored in a volume header, that could only be incremented by one until the maximum value was reached. More recent versions of Mac OS X can now recycle old unused CNIDs, but this gives you an idea of the types of considerations that went into the design of HFS+.

More recently Apple added support for full disk encryption with FileVault 2 and Fusion Drive. But these features are implemented in a layer underneath the file system, by Core Storage, a logical volume manager. Additional features like Snapshotting and Versioning would probably require much tighter integration to the file system… and they would also make TimeMachine extremely efficient and reliable. Currently TimeMachine is built on top of the file system and relies on capturing I/O events, which adds overhead and complexity. Another hack.

Finally there is Bit Rot. Over time data stored on spinning hard disks or SSDs degrade and become incorrect. Modern file systems like ZFS, which Apple considered but abandoned as a replacement, include checksums of all meta data structures content [1]. That means that when the file is accessed, the filesystem detects the corruption and throws an error. This prevents incorrect data from propagating to backups. With ZFS, you can also scrub your disk on a regular basis and verify if existing files have been corrupted preemptively.

A concrete example of Bit Rot

I have a large collection of photos, which starts around 2006. Most of these files have been kept on HFS+ volumes since their existence.

In addition to TimeMachine backups, I also use two other backup solutions that I described in a previous blog post. I keep a copy of the photos on a Linux microserver using ext3, which I checksum and verify regularly using snapraid. I also keep off-site backups using ARQ and Amazon Glaciers.

Before I acquired the Linux Microserver, I used to keep a copy of all my photos on a Dreamhost account. I recently compared these photos against their current versions on the iMac and was a bit shocked by the results.

The photos were taken between 2006 and 2011, most of them after 2008. There are 15264 files, which represent a total of 105 GiB. 70% of these photos are CR2 raw files from my old EOS 350D camera. The other photos are regular JPEGs which come from the cameras of friends and relatives.

HFS+ lost a total of 28 files over the course of 6 years.

Most of the corrupted files are completely unreadable. The JPEGs typically decode partially, up to the point of failure. So if you’re lucky, you may get most of the image except the bottom part. The raw .CR2 files usually turn out to be totally unreadable: either completely black or having a large color overlay on significant portions of the photo. Most of these shots are not so important, but a handful of them are. One of the CR2 files in particular, is a very good picture of my son when he was a baby. I printed and framed that photo, so I am glad that I did not lose the original.

If you’re keeping all your files and backups on HFS+ volumes, you’re doing it wrong.

How to check for file corruptions

I used the following technique to compare the photos from the Dreamhost backup against my main HFS+ volume.

I ran the shasum commmand line tool to compute SHA1 hashes of every single file in the backup folder, except .DS_Store files. Then, I ran shasum in verify mode to check the files on my main volume against the hashes. Differences either indicate voluntary modifications (which did not apply in my case), or corruptions courtesy of HFS+ (which was my case).

# Compute checksums
find . -type f -a ! -name ".DS_Store" -exec  shasum '{}' \; > shasums.txt
# Check against checksums
shasum -c < shasums.txt  > check.txt
# Filter out differences
cat check.txt | fgrep -v OK

You can use the same technique to check for corruptions on a single volume. You need to compute checksums and verify against them from time to time. If you use clone backups, it is probably a good idea to check for corruptions before doing the clone.

Addendum – June 11th, 2014

Thanks to everyone who spent the time to send feedback. There are a few things I would like to add:

  • [1] Erratum. ZFS uses checksums for everything, not just the meta-data.
  • I understand the corruptions were caused by hardware issues. My complain is that the lack of checksums in HFS+ makes it a silent error when a corrupted file is accessed.
  • This not an issue specific to HFS+. Most filesystems do not include checksums either. Sadly…
  • Other people have written articles on similar topics. Jim Salter and John Siracusa for Ars Technica in particular.