October Brings Horror
October 1st started off like a normal day. I went to move a 20-year-old Celeron-based iPaq near my Teletype in order to finish setting it up for the party I was having a few days later. After I got everything plugged in, it refused to boot, insisting that /sbin/init and many other files were missing. I booted into a rescue Linux via PXE and, sure enough, the files were missing. Since I had had so much trouble getting it set up, I didn't think much of it; I wiped it and started over with an older version of Debian.
Later, while working on a project, I tried to pull changes from a git bundle using a working copy on my server, and git kept insisting that the folder I was working in wasn't a git repo. That did seem a bit odd, but I had another copy in another folder and I just needed to get something done, so I didn't really dig into it.
In the afternoon when I was ready to take a break from my project I remembered that it was the first day of October and it was time to put up all the Halloween content on my Plex server. I opened up the Plex interface in my browser and added the three folders I have full of Halloween content, waiting for everything to show up on the home page under "Recently Added". The spinner stopped but nothing showed up. What the heck? I looked to make sure that it had understood what I was trying to do and it was telling me the folders were empty. Why is Plex suddenly refusing to add my content?
Thinking there was a problem with Plex, I googled for solutions but couldn't find anything relevant. Then I went to look in the actual folders on my server, and that's when I discovered the problem. All the files had been deleted. All of them. And I didn't have any backups.
Haven't You Heard of Backups?
It's not that I don't do any kind of backups or redundancy. I've been burned many times by drive failures and I have a pretty extreme backup routine:
- Main server root partition and user home directories are on a RAID1 across FOUR drives
- Every night the root and home directories are copied to a dedicated backup server
- Every night the backup server writes changes to a tape backup in a 7-tape rotation
- Once a month a complete backup of the root and home partitions is done to tape
- Monthly tapes are in a 4-tape rotation
But the video content for Plex is just too big (many terabytes), and since everything in there is replaceable, I figured that in a catastrophe I could get everything back from the originals that were used to put it there. Plus, the couple of times that I had a drive failure on the Plex partition, I was able to recover most of it anyway.
It wasn't just the Halloween content that had been deleted, though. It was every single file in the video folder. But only files had been deleted; oddly, all of the folders were still there. There were no problems with the disk drives, and the RAID status was fine. Thinking maybe I had some kind of NFS mount and bind mount combo problem messing things up from when I had been trying to build a Linux kernel on the iPaq, I rebooted my server. No good, the files were still gone.
Worst Thing to Do Is Panic
In a panic I started searching the interwebs to see if there was any hope of undeleting the files. I found a program called extundelete, and it was already included with the distro. Unfortunately all it did was segfault. More searching, and it seemed like extundelete was the only chance I might have. Thinking that maybe the ext2fs libraries in my distro were too old, I spun up a Docker container with the latest Debian and gave it permission to access the raw partition. No good, still segfaulting.
Got the source and compiled it, still no good. Got the latest source for the ext2 libs and compiled, still no good.
Dug into the source to try to figure out where it was crashing. Of course it was written in C++ so it was terrible spaghetti. Disabled optimization so that gdb would stop telling me everything was optimized out. Discovered that something was corrupting the blocks returned from malloc. Linked it against Electric Fence and finally narrowed it down to the problem area.
After quite a bit of work trying to understand what the program was doing and having to learn about the ext3 journaling system and structure, I patched extundelete and it stopped crashing and was finally able to load the journal.
And extundelete said the files were not in the journal.
Did You Really Need Any of Those Files Anyway?
By this point I had had enough time to think about what had been deleted and which files were the most valuable. There was really only one thing: a two-hour-long holiday special that I had spent a couple of months editing down into a 25-minute show. I hadn't even had a chance yet to watch it with my family during Thanksgiving. Everything else I could restore. But the loss of this one file was devastating.
I thought that if I could at least recover the Adobe Premiere project file then I could recreate it. All of the footage I used came from other sources that I should still be able to get again.
The only way I could think of to get the file back was going to be to scan the entire 18 terabyte partition byte by byte. In order to do that I was going to need something to search for. Is there anything in a Premiere project file that makes it identifiable? Looking at a few .prproj files I have elsewhere, they all seemed to begin with the same byte pattern.
How to scan the drive? It turns out there's a way to make grep look for a binary pattern and print the byte offset.
grep -abo $'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
No identifiable strings inside the prproj though, how will I know I found the right file? Turns out that prproj files are gzipped, and after unzipping they are XML. And inside the XML was more than enough information to figure out what the file originally was and even when it had been saved.
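For anyone who would rather do the scan without grep, here is a minimal Python sketch of the same search. The ten bytes are the ones from the grep command above; they match a standard gzip header (magic 1f 8b, deflate, no flags, zero timestamp, no extra flags, OS byte 03 for Unix). The function name find_signature_offsets and the chunk size are made up for illustration, and this is not the script used in the recovery.

```python
# Same 10-byte pattern as the grep command: a gzip header with no flags
# and a zeroed timestamp, as seen at the start of the .prproj files.
GZIP_SIGNATURE = bytes.fromhex("1f8b0800000000000003")


def find_signature_offsets(stream, signature=GZIP_SIGNATURE, chunk_size=1 << 20):
    """Yield the byte offset of every occurrence of signature in stream.

    stream is any binary file object; it is read sequentially in
    chunk_size pieces so an arbitrarily large input never has to fit in memory.
    """
    overlap = len(signature) - 1
    tail = b""      # bytes carried over from the end of the previous window
    position = 0    # offset within the stream of the first byte of the window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = 0
        while True:
            hit = window.find(signature, start)
            if hit < 0:
                break
            yield position + hit
            start = hit + 1
        # Carry over just enough bytes to catch a signature that straddles
        # two reads. The tail is shorter than the signature, so no match
        # can be reported twice.
        tail = window[-overlap:] if overlap else b""
        position += len(window) - len(tail)
```

It would be pointed at the partition by opening the device node in binary mode and printing each offset the generator yields. Expect it to be slower than grep; the point is only to show what the search is doing.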
Help Me Python, You're My Only Hope
I hacked together a Python script that would read in the offsets as output by grep and check to see if what it had found was actually a Premiere project file. The script reads in 128k from the specified offset and passes it through gunzip. Fortunately gunzip was happy to ignore any extra garbage at the end, so I didn't have to do anything tricky to figure out where the end of the compressed file was. I could just feed it a chunk that was bigger than I needed.
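The read-and-decompress step can be sketched roughly like this. Note the swap: instead of calling gunzip, this version uses Python's zlib module in gzip mode, which likewise tolerates trailing garbage and also returns partial output if the chunk cuts the stream short. The names read_chunk and gunzip_prefix are hypothetical, not taken from the attached script.

```python
import zlib

CHUNK_SIZE = 128 * 1024  # the 128k read size described above


def read_chunk(stream, offset, size=CHUNK_SIZE):
    """Read up to size bytes from a seekable binary stream, starting at offset."""
    stream.seek(offset)
    return stream.read(size)


def gunzip_prefix(chunk):
    """Decompress the gzip stream at the start of chunk.

    Returns the decompressed bytes, or None if chunk does not hold valid
    gzip data. Bytes after the end of the gzip stream are ignored, and a
    stream that runs past the end of chunk yields whatever decompressed
    before the data ran out.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        return decompressor.decompress(chunk)
    except zlib.error:
        return None
```

Most grep hits on a multi-terabyte partition will be something other than a project file, so returning None for anything that fails to decompress keeps the calling loop simple.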
After decompressing the 128k chunk the Python script checks to see if it's XML with the right beginning tags. If it is then it parses the XML to look for the filename and saves the XML out with the original filename as a prefix, along with a unique identifier in case I needed to run the script more than once.
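A rough sketch of that check-and-save step follows. Several details here are assumptions rather than facts from the attached script: that the root element is named PremiereData, that the project name can be found by looking for something ending in .prproj inside the XML, and that the byte offset is a good enough unique identifier. All of the function names are hypothetical.

```python
import re
from pathlib import Path

XML_DECLARATION = b"<?xml"
ROOT_TAG = b"<PremiereData"  # assumed name of the root element
# Hypothetical heuristic: treat the first thing ending in ".prproj" as the name.
PRPROJ_NAME = re.compile(rb"([\w .-]+)\.prproj")


def looks_like_premiere_xml(data):
    """Return True if data starts like an XML document with the assumed root tag."""
    head = data[:1024].lstrip()
    return head.startswith(XML_DECLARATION) and ROOT_TAG in head


def guess_project_name(data, default="unknown"):
    """Return the first .prproj base name mentioned in data, or default."""
    match = PRPROJ_NAME.search(data)
    if match is None:
        return default
    return match.group(1).decode("utf-8", errors="replace").strip()


def save_candidate(data, offset, out_dir):
    """Write data to out_dir as <project name>-<offset>.xml.

    Returns the path written, or None if data does not look like a
    Premiere project. The offset is zero-padded so the files sort in
    on-disk order.
    """
    if not looks_like_premiere_xml(data):
        return None
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{guess_project_name(data)}-{offset:014d}.xml"
    target.write_bytes(data)
    return target
```

Because the offset is part of the filename, several saved versions of the same project land next to each other instead of overwriting one another.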
It took two days for grep to finish scanning the entire partition. I anxiously fed the list of offsets into my Python script with my fingers crossed that it would find the file I was looking for. It found lots and lots of other Premiere files that I had completely forgotten about. Several minutes later it started printing what looked like the correct filename for what I was looking for. After it was done I started loading the files into Premiere to see what I had recovered. The first file was from my project, but it was a very early one that had been saved probably not long after I had started the project. The second file I tried was pretty much the same. But a couple of files after that, I found the one I wanted!
Now the only thing left is to get the original video file back!
I don't know that this hack to look for Premiere projects will be useful to anyone else, but I'm attaching it anyway. I'm also including the patch I made to extundelete.