MarkUs Blog

MarkUs Developers Blog About Their Project

Annotating Images and PDF Files


As some of you probably already know, I’ve recently finished implementing the image annotation feature for MarkUs and have committed the work. MarkUs now supports the standard web image formats: JPG, GIF and PNG. I am currently adding the ability to annotate PDFs.

My approach is to convert the PDF files to JPG format using ImageMagick and RMagick, its Ruby gem, and then display the resulting images in the browser. This seemed like a reasonable approach, as we already have the ability to annotate JPGs. So far I have managed to integrate the PDFs: they are converted and displayed in the browser, and the user is able to annotate them.

Great! So why not commit and ship this feature right away, you ask? Right now, the main issue is that PDF-to-JPG conversion is a very expensive operation, on the order of 2 to 3 seconds per page, and we obviously need to accommodate multi-page submissions. For example, it takes a good 25 seconds to convert a 900 KB, 11-page PDF file. Class sizes can easily run into the hundreds, so a 300-person class with submissions of that size would require 7500 seconds, or just over 2 hours, to convert all the PDFs.
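To make the arithmetic above concrete, here is a small sketch of the estimate. It uses only the figures quoted in this post; the assumption that every student submits one PDF of roughly the same size is mine, and the method names are purely illustrative.

```ruby
# Rough estimate of total conversion time for a class, using the figures
# quoted in the post (25 seconds for one 11-page PDF, 300 students).
SECONDS_PER_SUBMISSION = 25   # measured for a 900 KB, 11-page PDF
CLASS_SIZE = 300              # assumption: one such PDF per student

def total_conversion_seconds(submissions, seconds_each)
  submissions * seconds_each
end

# Formats a number of seconds as hours, minutes and seconds.
def format_duration(seconds)
  hours, remainder = seconds.divmod(3600)
  minutes, secs = remainder.divmod(60)
  format("%dh %02dm %02ds", hours, minutes, secs)
end

total = total_conversion_seconds(CLASS_SIZE, SECONDS_PER_SUBMISSION)
puts "#{total} seconds = #{format_duration(total)}"
# prints "7500 seconds = 2h 05m 00s"
```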

The conversion takes this long because the quality of the file content has to be preserved. On its default settings, an ImageMagick conversion gives poor image quality: standard-sized text becomes unreadable, which makes the submission effectively useless. To combat this, I use supersampling, that is, rendering the PDF at a higher resolution and then scaling the resulting image down. This significantly lengthens the conversion process.
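As a rough illustration of supersampling, the sketch below builds (but does not run) an ImageMagick command line that rasterizes a PDF at a high density and then shrinks the result. The `-density` and `-resize` options are standard ImageMagick options; the specific values (288 DPI, 25%), the file names and the helper names are assumptions for illustration, not the settings MarkUs actually uses. Newer ImageMagick releases use `magick` rather than `convert` as the command name.

```ruby
# Builds an argument list for ImageMagick's convert tool that supersamples
# a PDF: rasterize at a high density, then scale the oversized image down.
# The default values are illustrative assumptions, not MarkUs settings.
def supersample_command(pdf_path, jpg_path, density: 288, shrink_percent: 25)
  [
    "convert",
    "-density", density.to_s,         # rasterize the PDF at this many DPI
    pdf_path,
    "-resize", "#{shrink_percent}%",  # shrink the high-resolution render
    jpg_path
  ]
end

# Resolution of the final image relative to the printed page.
def effective_dpi(density, shrink_percent)
  density * shrink_percent / 100.0
end

cmd = supersample_command("submission.pdf", "submission.jpg")
puts cmd.join(" ")
puts effective_dpi(288, 25)
```

Passing the array to `system(*cmd)` would run the conversion on a machine with ImageMagick installed. The higher the density, the more pixels have to be rendered before the downscale, which is where the extra conversion time goes.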

In light of this problem, we clearly want as few conversions as possible, and one conversion per file is the best we can do. I have therefore decided to store the binary data of the converted images in the database for quick access. The alternative would be to store the converted files in the student repositories, but this is very undesirable for multiple reasons.
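The convert-once idea can be sketched as follows. This is not the MarkUs implementation: the class name is hypothetical, a plain Hash stands in for the database table, and the converter is a fake that returns placeholder bytes instead of calling RMagick.

```ruby
# Convert-once storage: the expensive conversion runs at most once per
# file, and every later request is served from the stored binary data.
class ConvertedImageStore
  attr_reader :conversions_run

  # converter: any callable that takes a file key and returns image bytes.
  def initialize(converter)
    @converter = converter
    @blobs = {}            # stand-in for a database table of binary data
    @conversions_run = 0
  end

  # Returns the stored image, converting first only if it is missing.
  def fetch(file_key)
    @blobs.fetch(file_key) do
      @conversions_run += 1
      @blobs[file_key] = @converter.call(file_key)
    end
  end
end

# Fake converter (assumption): a real one would invoke ImageMagick here.
fake_converter = ->(key) { "JPEG bytes for #{key}".b }

store = ConvertedImageStore.new(fake_converter)
first  = store.fetch("group_0001/a1.pdf")
second = store.fetch("group_0001/a1.pdf")
puts store.conversions_run   # prints 1
```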

This leaves us with several new questions. When do we convert, and how many files do we convert at a time? After speaking with Karen, we decided to kick off a process that collects all submissions and converts them as soon as the first user logs in after an assignment's submission date and grace periods have passed. I will create a queue-like object to handle this process. It will also be responsible for accommodating any grader who tries to mark a submission containing a PDF before it has been converted, by bumping that job ahead of schedule. Waits for the conversion of individual submissions may therefore still occur, but they will happen less often. Colour coding the list of submissions by conversion status may also help graders avoid waiting on the conversion process.
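A minimal sketch of such a queue-like object is shown below, assuming submissions are identified by simple ids. The class and method names are hypothetical, and the real object would also have to perform the conversion and persist its state.

```ruby
# Queue of submissions waiting for conversion. A grader opening an
# unconverted submission can bump it to the front of the queue.
class ConversionQueue
  def initialize
    @pending = []   # submission ids; the front of the array is next
    @done = {}      # ids that have already been handed out for conversion
  end

  # Adds a submission unless it is already queued or converted.
  def enqueue(id)
    @pending << id unless @pending.include?(id) || @done[id]
  end

  # Moves a submission to the front of the queue.
  def bump(id)
    return if @done[id]
    @pending.delete(id)
    @pending.unshift(id)
  end

  # Removes and returns the next submission to convert, or nil if empty.
  def next_job
    id = @pending.shift
    @done[id] = true if id
    id
  end

  def converted?(id)
    @done.key?(id)
  end
end

queue = ConversionQueue.new
%w[s1 s2 s3].each { |id| queue.enqueue(id) }
queue.bump("s3")   # a grader opened s3 before it was converted

order = []
while (id = queue.next_job)
  order << id
end
puts order.inspect   # prints ["s3", "s1", "s2"]
```

The `converted?` check is also the kind of information a colour-coded submission list could display.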

Written by c8braver

June 23rd, 2010 at 2:16 pm

