MarkUs Blog

MarkUs Developers Blog About Their Project

Archive for October, 2014

Status Report for October 3rd

Chris

Last Week:

  • Researched PDF.js and wrote a blog post outlining its feasibility.
  • Made some prototypes to prove PDF.js could work for annotations.

Roadblocks

  • There is little documentation of the actual PDF.js API, so much of the code had to be reverse engineered.

This Week

  • Start implementing the PDF.js annotation system.
  • Continue work on the CSV upload problem.

Yusi

Last Week:

  • Submitted a pull request for the Mark model RSpec test.
  • Researched and read documentation about upgrading from Rails 3.2 to Rails 4, and made a short to-do list.

This Week

  • Start working on the upgrade by following the checklist in the upgrade guide.
  • Find a new issue to fix.

Written by Chris Kellendonk

October 3rd, 2014 at 2:03 pm

PDF Viewing and Annotation Improvements

Current Problems with PDF Viewing and Annotations

  • Collecting all submissions is slow because every PDF must first be converted into images, which is an expensive and fragile operation.
  • Images are not resized and display at their native resolution. This can cause a poor user experience on smaller screens, forcing the user to scroll around the page in order to view the entire image.
  • Annotations don’t always track the page smoothly as the screen is scrolled.

After researching the state of online PDF viewers, there do not appear to be many free options that would allow us to do everything we need to do with a PDF. One library stands out: PDF.js. It solves many, but not all, of these issues for us.

How PDF.js Can Help

  • There is no longer a need to do any type of pre-computation or conversion in order to view the PDFs. This significantly cuts down the collection time for assignments that are mainly composed of PDF documents, and it completely removes the fragile conversion process.
  • The library can control the zoom level and rotation of the document, allowing large documents to be viewed easily on most devices.
  • It can print a document if need be.
  • It supports the native navigation options in PDF files, such as paging, page previews, and the document outline (i.e. navigating through a table of contents).

All of the above features are supported natively. However, the one major feature the library is missing for us is support for adding annotations to an existing document, which is a core feature of the current system. This would need to be implemented by hand, as there are no existing solutions for this library or any other free online system.

Annotating PDFs

PDF.js does not support creating annotations on a document natively. It can display annotations that are already embedded in the PDF but cannot create new ones. I did investigate using the native annotation system, but there is not much documentation about how it is actually used, so the code would need to be partially reverse engineered to determine exactly what it is doing. Using the native annotation system is one option. The other is to build on the framework that the PDF.js library provides during its normal rendering process.

Technical Details

  1. A PDF could be annotated by creating an annotation layer that sits directly above the canvas and text layers that PDF.js renders. The annotation elements would be rendered in the DOM to allow easy manipulation through JavaScript, such as “onHover” events that show the text when the user hovers their mouse over the annotation. Since each canvas sits inside a “page” container class with absolute positioning, any DOM elements can also be positioned so that they line up perfectly with the canvas beneath. Annotations could be positioned within this layer.
  2. The position and size of an annotation would initially be based on the current scale of the document when the annotation was added. All annotation sizes and positions would then be normalized so that the coordinates stored in the database are relative to the document at 100% scale.
  3. The main hurdle with this system is keeping the position of the annotations in line with the document as it is zoomed and rotated. However, this is not impossible, and there are hooks into the API that would allow it to be managed. Using `PDFView.currentScale` we can get the current scale of the document and adjust the annotation scaling and position to match the document as the zoom level changes. The same technique could then be applied for rotation.
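
As a rough illustration of steps 2 and 3, the sketch below shows how an annotation rectangle might be normalized to 100% scale before it is stored, and rescaled when it is displayed at a different zoom level. The function names `normalizeRect` and `scaleRect` and the `{x, y, width, height}` rectangle shape are assumptions made for this example and are not part of PDF.js. The `scale` argument stands in for whatever value `PDFView.currentScale` reports. Rotation is not handled here.

```javascript
// Hypothetical helpers for illustration only; none of these names come from PDF.js.

// Reject scales that would produce meaningless coordinates (zero, negative, NaN).
function checkScale(scale) {
  if (typeof scale !== 'number' || !isFinite(scale) || scale <= 0) {
    throw new RangeError('scale must be a positive, finite number');
  }
}

// Convert a rectangle drawn at the current zoom level into coordinates
// relative to the document at 100% scale (scale === 1), ready for storage.
function normalizeRect(rect, scale) {
  checkScale(scale);
  return {
    x: rect.x / scale,
    y: rect.y / scale,
    width: rect.width / scale,
    height: rect.height / scale
  };
}

// Convert a stored 100%-scale rectangle into on-screen coordinates
// for the zoom level the document is currently displayed at.
function scaleRect(rect, scale) {
  checkScale(scale);
  return {
    x: rect.x * scale,
    y: rect.y * scale,
    width: rect.width * scale,
    height: rect.height * scale
  };
}
```

In a viewer, `normalizeRect` would run once when the grader creates an annotation, and `scaleRect` would be re-run for each annotation whenever the zoom level changes, with the result applied to the absolutely positioned element in the annotation layer.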

I believe using PDF.js is a viable option that could provide a significant benefit to graders marking PDF files. It provides a much more natural experience for viewing and annotating PDF files, and it has the added benefit of removing the slow and fragile conversion process.

Written by Chris Kellendonk

October 3rd, 2014 at 10:29 am