We use ImageMagick to convert PDFs to JPEGs so they can be displayed in the browser. Originally, the conversion was done with the RMagick gem, which provides a nice Ruby API for ImageMagick. This worked fine until we started testing with PDF files containing hundreds of pages. The problem that went unnoticed until then was the amount of memory ImageMagick actually required to convert files: it was quickly gobbling up gigabytes of RAM and taking over the entire swap space. Fortunately, we noticed this and managed to kill the process in time before our computers crashed.
After doing some research, I found a blog post at http://www.salas.com/2009/06/19/geeky-rmagick-and-memory-leaks/. It discusses the problems with using RMagick, namely the memory leaks it creates because Ruby’s garbage collector doesn’t play nice with it. Upon further investigation, I found that this leak made MarkUs use 3 times as much memory as the conversion actually needed. The post also describes a function designed to free that memory manually. However, due to the nature of RMagick, even with manual freeing we would still be using at least twice the memory ImageMagick needs to convert the files: RMagick keeps an object for the current image and creates a new one for every manipulation performed on it, so there is always an unavoidable window after a modification and before the original image object can be destroyed to free its memory.
As a result, I opted to issue a direct call to ImageMagick’s convert command from Ruby. This is a much better alternative, as it performs all the modifications in a single invocation. That still didn’t solve the memory use issue, though. After more digging, I found this formula on ImageMagick’s site, http://www.imagemagick.org/script/advanced-unix-installation.php:
Memory use = (5 * Quantum Depth * Rows * Columns) / 8
Here, Rows and Columns are the pixel dimensions of the rasterized page, and the quantum depth is a setting fixed when ImageMagick is compiled. It was impractical to reduce the quantum depth for several reasons. Firstly, having to recompile ImageMagick would not be very fun, especially since this would have to be done on every server MarkUs is hosted on. Secondly, picture quality would be compromised, as the quantum depth controls the number of colours in the image (setting it to 8 would limit each colour channel to 256 levels, and we live in 2010 now…). Lastly, the quantum depth was currently set at 16, and its minimum value is 8, so all that extra headache would only cut memory use in half, and still wouldn’t prevent malicious users from crashing the server simply by uploading a PDF twice as big.
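To get a feel for the numbers, here is a quick sanity check of the formula. ImageMagick rasterizes PDFs at 72 DPI by default, so a US letter page (8.5″ × 11″) comes out at roughly 612 × 792 pixels; the DPI and page size here are assumptions for illustration only.

```ruby
# Memory use = (5 * Quantum Depth * Rows * Columns) / 8, in bytes.
def convert_memory_bytes(rows, columns, quantum_depth)
  (5 * quantum_depth * rows * columns) / 8
end

rows, columns = 792, 612          # one US letter page at 72 DPI
per_page_q16 = convert_memory_bytes(rows, columns, 16)
per_page_q8  = convert_memory_bytes(rows, columns, 8)

puts "Q16: #{(per_page_q16 / 1e6).round(1)} MB per page"
puts "Q8:  #{(per_page_q8 / 1e6).round(1)} MB per page"
```

At quantum depth 16 this works out to roughly 4.8 MB per page, and halving the quantum depth halves it, which is exactly why recompiling only buys a factor of two.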
After more headaches (I blame myself for them, as the solution should have been obvious), I took a close look at all the optional arguments the convert command takes. The -limit option was exactly what I was looking for: being responsible developers, the ImageMagick team knew of convert’s heavy memory requirements and created a solution for cases just like ours. The documentation can be found at http://www.imagemagick.org/script/command-line-options.php#limit. Essentially, this option lets me cap the amount of memory the convert command is allowed to use. Now, instead of eating up all the server’s RAM and swap, convert gets any memory beyond the cap from the hard disk, where space is no longer an issue.
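To make this concrete, here is a sketch of how the convert invocation with -limit might be built from Ruby. The file names and the 100 MiB cap are illustrative, and the exact flags MarkUs passes may differ:

```ruby
require 'shellwords'

# Build a convert command that caps ImageMagick's RAM use. Once the
# 'memory' limit is exceeded, pixels are cached in memory-mapped files
# ('map'), and beyond that, on plain disk.
def convert_command(pdf_path, jpg_path, memory_limit_mb)
  ['convert',
   '-limit', 'memory', "#{memory_limit_mb}MiB",
   '-limit', 'map',    "#{memory_limit_mb}MiB",
   pdf_path, jpg_path].shelljoin
end

cmd = convert_command('submission.pdf', 'submission.jpg', 100)
# system(cmd) would run the conversion; here we just print the command.
puts cmd
```

Shelling out with escaped arguments (via Shellwords) also avoids command-injection problems with user-supplied file names.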
The downside, of course, is significantly slower conversion when the hard disk is used. To counteract this, I have added an option to the MarkUs configuration that lets sysadmins set the amount of RAM ImageMagick is allowed to use. We found that 100 megabytes takes care of most conversion needs (it should be enough to convert a 10 page US letter sized PDF), so the hard disk only needs to kick in for those extra big files.
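For reference, the setting is just a constant in the MarkUs environment configuration. The name below is hypothetical, shown only to illustrate the shape of the option; check your own config file for the actual key:

```ruby
# Hypothetical setting name, for illustration only.
# RAM (in MB) that ImageMagick's convert may use before
# spilling its pixel cache to disk.
PDF_CONVERSION_MEMORY_ALLOWANCE = 100
```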
This post is meant to be a guide for anyone who decides to tinker with the PDF conversion system in the future, as well as an aid for admins in the (highly unlikely… hopefully) case something goes wrong.