27 Mar 2008

Gabriel Endorses Textbook Piracy, DjVu >? PDF

Well, basically he does. With the way textbook prices are these days, it is hard to disagree. However, I have a few comments on his recommendation for DjVu over CHM over PDF.

Let’s discuss CHM for a moment. All it is is a compilation of HTML data (images and HTML text) in a proprietary, hard-to-replicate, compressed archive. That doesn’t sound so bad, but due to the nature of this compression, you cannot search the text in the file; you can only search the index built when the file was created. You cannot touch this index. Want to find where a certain phrase was uttered? If it wasn’t indexed as that phrase, too bad. You won’t find it. There goes one of the major advantages of your electronic text. Just avoid CHM.
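
A toy sketch of the difference, assuming the limitation works as I described it above. The page names, text, and index entries are invented for illustration; this does not read the real CHM format or use any CHM library.

```python
# Toy model only: NOT the real CHM format or any CHM library.
# Hypothetical pages stored in the archive.
pages = {
    "ch1.html": "Compression trades file size for decoding time.",
    "ch2.html": "A scanned page is a raster image, not text.",
}

# Hypothetical index, fixed when the archive was built: keyword -> pages.
prebuilt_index = {
    "compression": ["ch1.html"],
    "raster": ["ch2.html"],
}


def index_search(term):
    """Look the term up in the fixed index; unindexed terms find nothing."""
    return prebuilt_index.get(term.lower(), [])


def full_text_search(phrase):
    """Scan the text of every page for the phrase, case-insensitively."""
    needle = phrase.lower()
    return [name for name, text in pages.items() if needle in text.lower()]


print(index_search("compression"))        # an indexed keyword is found
print(index_search("decoding time"))      # a phrase that was never indexed
print(full_text_search("decoding time"))  # the same phrase, full-text search
```

The phrase "decoding time" is sitting right there in the first page, but the index-only lookup cannot see it, while the full-text scan can.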

Now, PDF and DjVu are good at different things. Assuming you are grabbing pirated content, you are probably downloading scans (are there publisher insiders who dump electronic versions of these things on the pirating cyberworld?). In this case, Gabriel is right. DjVu is probably the better format. It has a very smart compression algorithm that can make a scanned document substantially smaller than what is possible with PDF and its naive use of compression. Both support an OCR text overlay, which is important because it makes a scan searchable.
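
To put rough numbers on why compression matters so much for scans, here is a back-of-the-envelope sketch. The page size (US Letter), the 300 dpi resolution, the 24-bit color depth, and the 500-page length are all my assumptions for illustration, not measurements of any particular file.

```python
def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed: US Letter page, 300 dpi, 24-bit color (3 bytes per pixel).
page_bytes = raw_scan_bytes(8.5, 11, 300, 3)

# Assumed: a 500-page textbook.
book_bytes = page_bytes * 500

print(f"one page: {page_bytes / 1e6:.1f} MB uncompressed")
print(f"500 pages: {book_bytes / 1e9:.1f} GB uncompressed")
```

Under those assumptions a single page is about 25 MB before compression, so how well a format squeezes raster images decides whether a scanned book is practical to pass around at all.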

However, I am not aware of any software that allows you to mark up your DjVu files. Unfortunately, I no longer have access to Acrobat Professional. But when I did, I would take the PDF of the article I was reading and mark it as user-editable or something along those lines (for the life of me, I couldn’t figure out what was being changed in the underlying file format; otherwise I would have scripted this). Then I could highlight and take notes all within my plain old Acrobat Reader, even on Linux. These notes and highlights could be exported or printed, with or without their context. This was all very convenient, though integration with research software would have been better.

Now, for legitimate content, DjVu may clearly fall short, because its advantage rests almost solely on its performance with raster graphics. If you have access to vector graphics and plain text for the content, you can put these into a PDF in a lossless manner, that is, with effectively infinite resolution. And you can take notes on it. And yes, using high compression on a PDF you want to read is a terrible idea. You have the hard drive space, I promise.

If you have another effective method for electronic note-taking, especially if it links back to the source material in a relatively automatic way, definitely send me a note. EndNote is not supported on Linux. Try again.

As for torrent clients, if you are a slave to Windows, I suppose being a slave to μTorrent is the way to be. But OS X and Linux users should use rTorrent (inside screen, of course!).

As for the legitimacy of ‘rational crime,’ I’m not going to touch that.

Try Rational Crime!

Textbook prices induced depression? Then it might be time for some rational crime. Remember, the fault is not with the one who asks but with the one who accepts. It’s like this with prices as with anything else.

First, you want to find out about DjVu. Regarding formats, DjVu is preferable to CHM which is preferable to PDF (as a general rule of thumb). Why people insist on maximum compression for PDFs, thus ruining their scan, is beyond my powers of understanding.

Second, you need a good BitTorrent client and basic knowledge of what torrents are. Try googling “utorrent tutorial” or something like that. Google can help you find torrent trackers too. As a starter, a search for “Great Science Textbooks DVD” should be productive.

Finally, you might also want to try DC++. It’s more likely to find older stuff on DC and newer releases on torrents, due to the nature of the systems.

There, you can potentially save up to thousands of dollars over your schooling years. More money for books you don’t find online. Or food and rent.

One Response to “Gabriel Endorses Textbook Piracy, DjVu >? PDF”

  1. Gabriel says:

    Thanks! I was not aware of that particular limitation of CHM.

    My recommendations were based on what’s out there, rather than what the formats could do in different circumstances.
