On Thursday, I was downloading some files and preparing a Spring course for work when suddenly my disk was remounted read-only, with some pretty scary messages in the syslog. What exactly went wrong is unknown, but the filesystem was pretty much gone.
First, I made a raw backup of the whole disk using:
dd if=/dev/sda
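The dd line above leaves out where the image went. Here is a minimal sketch of a complete invocation for a possibly failing disk. It assumes the image is written to a file on a separate, healthy disk. The function name raw_backup and the path /mnt/external/sda.img are hypothetical.

```shell
#!/bin/sh
# raw_backup SRC DEST: hypothetical helper that copies a whole device
# (or any file) byte for byte into an image file.
# conv=noerror keeps dd going after read errors; conv=sync pads each
# short or failed read with zeros to the full block size, so data after
# a bad spot stays at the same offset in the image.
raw_backup() {
    src="$1"
    dest="$2"
    dd if="$src" of="$dest" bs=65536 conv=noerror,sync
}

# Usage (as root; /mnt/external/sda.img is a hypothetical path on another disk):
# raw_backup /dev/sda /mnt/external/sda.img
```

A smaller bs (such as 4096) means less data gets replaced by zeros around each unreadable spot, at the cost of speed. Because conv=sync also pads the final block, the image can end up slightly larger than the disk when the disk size is not a multiple of bs. The extra bytes are zeros at the very end.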
Then I booted a rescue CD (I just used an Ubuntu 9.04 installation CD) and ran fsck -n /dev/sda3, just to see whether fsck could make it through without changing anything (-n opens the filesystem read-only and answers “no” to every question). It couldn’t. It failed with a message like
“error while iterating over blocks in inode …”
and then stopped. Of course, the filesystem wouldn’t mount either, and I didn’t have a recent backup, so I was somewhat screwed.
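To repeat the read-only check and pick out the offending inode without scrolling through the output, a filter like the following could help. This is a sketch under an assumption. The name failing_inode is hypothetical. The sed pattern assumes the inode number comes directly after the words “iterating over blocks in inode”, so it may need adjusting to match what your e2fsck version actually prints.

```shell
#!/bin/sh
# failing_inode: hypothetical filter that reads fsck output on stdin and
# prints the first inode number found after "iterating over blocks in inode".
# The message wording is an assumption; adjust the pattern if yours differs.
failing_inode() {
    sed -n 's/.*iterating over blocks in inode \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (read-only check; -n answers "no" to every question):
# fsck -n /dev/sda3 2>&1 | failing_inode
```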
Google didn’t help much beyond the usual advice to format (or buy a new disk) and reinstall, which was not exactly an option. So I fired up debugfs and dug around a little. At this point I just wanted to salvage as much as possible, and I was quite aware that some data loss was inevitable.
The magic commands in debugfs turned out to be these three, each run against the inode fsck had failed on:
clri
freei
kill_file
Running these three commands every time fsck failed eventually got it all the way through the repair process. At some point fsck began complaining about corrupted directories, but the same three debugfs commands took care of those too.
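Here is a sketch of how that cycle could be scripted. zap_inode is a hypothetical name, and the sketch assumes the partition is not mounted. Each request goes through debugfs with -w (open the filesystem read-write) and -R (execute a single request, then exit). The inode is addressed as <N>, which is how debugfs accepts an inode number in place of a path. Clearing inodes destroys whatever they pointed to, so this only makes sense after the raw backup.

```shell
#!/bin/sh
# zap_inode DEVICE INODE: hypothetical helper that runs the three debugfs
# requests from the list above, in the same order, against one inode.
# -w opens the filesystem read-write; -R runs a single request and exits.
# Set DEBUGFS to another command (e.g. a function that just prints its
# arguments) for a dry run.
zap_inode() {
    dev="$1"   # e.g. /dev/sda3 (must not be mounted)
    ino="$2"   # inode number that fsck failed on
    for req in clri freei kill_file; do
        ${DEBUGFS:-debugfs} -w -R "$req <$ino>" "$dev" || return 1
    done
}

# Usage (as root), then rerun fsck and repeat with the next inode it chokes on:
# zap_inode /dev/sda3 123456
# fsck /dev/sda3
```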
I was then able to mount the disk and copy all files to an external device. Of course, the most important dir, my home dir, was gone, and lost+found contained a couple of thousand files and dirs. I have since gone through them and found most of the important stuff I had on the disk. Now I just need to get a new disk, reinstall, and copy over as much as I can from the old one.
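With a couple of thousand entries in lost+found, one way to decide where to start is to list the recovered directories by how many files each still contains. A directory reattached to lost+found keeps its own entries, so the larger ones are often the quickest wins. This is a minimal sketch that assumes the old filesystem is mounted, ideally read-only. largest_lost_dirs is a hypothetical name, and the mount point in the usage line is hypothetical too.

```shell
#!/bin/sh
# largest_lost_dirs DIR: hypothetical helper that prints each subdirectory of
# DIR (e.g. a mounted filesystem's lost+found) with the number of regular
# files beneath it, biggest first, as "COUNT<TAB>PATH".
largest_lost_dirs() {
    for d in "$1"/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "${d%/}"
    done | sort -rn
}

# Usage (hypothetical mount point):
# largest_lost_dirs /mnt/old/lost+found | head -n 20
```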



