PDFs are tricky critters, aren’t they? Not precisely e-books in the truest sense of the word, they are more like “dehydrated” printed documents: simply a printed form with the paper removed. But sometimes they’re all you’ve got, so you have to convert them into something else.
A few years ago, Paul Biba wrote a column about a PDF-to-Word conversion tool he found on the web, PDFtoWord. More recently, this inspired a representative from another company, FormSwift, to contact us. He told us that his company had tried to create a better-looking, more efficient PDF converter, and he thought they had succeeded. He pointed us at the FormSwift conversion tool and recommended we try it out. So I did.
I wish I could say I was more impressed. I tried uploading several PDF documents to the tool, including Cat Valente’s Six-Gun Snow White, which I got in the 2014 Hugo nomination voter packet. It barfed on every document that was longer than a few pages (though it assured me that it was informing its administrators).
I was finally able to get it to take the PDF of a short story that I wrote in Scrivener and exported in multiple formats, including PDF. When it did, it turned out that this tool’s idea of “conversion” was converting the PDF into an image of each page, and letting me put text boxes on it to type new words into. I could export the result—into a Word document that came up as a page-sized series of images. And this is for a PDF that was created as nothing but text from the outset.
I tried out PDFtoWord, just for comparison’s sake. And while it wouldn’t let me edit the document online (it converted it and emailed me the conversion), it did convert it into an actual Word document, with actual text in it. I could highlight it, change the font, and so on. It didn’t convert italics properly, but still, that’s better than FormSwift sending me a whole “document” full of page-sized pictures, which is not exactly a great improvement over keeping it a PDF.
Of course, a better conversion method for changing PDF into something else might well be Calibre. Calibre isn’t very user-friendly, but it was made for converting one format into another, with about a zillion customizable options on how the conversion happens.
The problem with Calibre, though, is that it’s not smart enough by itself to get rid of those obnoxious headers or footers that have the page number, author, or title in them, and those can drive you spare if you have to read a whole document with them interlaced throughout. But never fear, Dear Author to the rescue! Jane Litte just made a very useful post about using regular expressions in Calibre’s conversion mode to get rid of those annoying headers, and on how to tweak the settings to unwrap text properly. This is extremely useful, and I wish I’d had it last year when Six-Gun Snow White made me extremely grumpy for being PDF-only while I was doing my pre-Hugo-voting reading.
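To give a rough idea of what that kind of regular-expression cleanup looks like, here is a minimal sketch in plain Python (Calibre's search-and-replace uses Python-style regular expressions, so the patterns carry over in spirit). These are not Jane Litte's actual expressions. The sample text, the header and footer lines, and the function names `strip_headers` and `unwrap` are all made up for illustration, and a real PDF will almost certainly need its own patterns.

```python
import re

# Made-up sample of text as it might come out of a PDF: the running
# header/footer lines (title, author, page number) interrupt the prose.
SAMPLE = (
    "The snow fell on the\n"
    "Six-Gun Snow White\n"
    "12\n"
    "mountain all night, and\n"
    "Cat Valente\n"
    "13\n"
    "no one came to the door.\n"
)

# Assumption: a header or footer is a line consisting only of the title,
# the author name, or a bare page number of up to four digits.
HEADER_FOOTER = re.compile(
    r"^(?:Six-Gun Snow White|Cat Valente|\d{1,4})[ \t]*\n",
    re.MULTILINE,
)

# Assumption: a line break that does not follow sentence-ending
# punctuation is a hard wrap from the page layout, not a paragraph break.
HARD_WRAP = re.compile(r"(?<![.!?])\n(?=\S)")


def strip_headers(text: str) -> str:
    """Delete whole lines that look like running headers or footers."""
    return HEADER_FOOTER.sub("", text)


def unwrap(text: str) -> str:
    """Join hard-wrapped lines back into one flowing line."""
    return HARD_WRAP.sub(" ", text)


if __name__ == "__main__":
    print(unwrap(strip_headers(SAMPLE)))
```

The order matters here: the header lines are removed first, so that the unwrapping step only ever sees actual prose. A bare-number pattern like the one above would also eat any legitimate line that is just a number, so it is worth checking the result against the original before trusting it.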