Testing PDFs with Cucumber and Rails

On a recent Rails project we have been working a lot with PDF documents. The application generates PDF invoices and account statements, places markers and comments into existing documents, and merges multiple single-page PDFs into one. We do all this with a few PDF tools:

  • the PDF::Writer library for Ruby, to generate PDFs
  • pdftk, to merge multiple PDFs into one or overlay them
  • pdftotext, part of xpdf, to extract text from PDFs

On OS X all of these can be installed via MacPorts. Debian has packages as well.

While PDF::Writer is a Ruby library, pdftk is a command line tool. We simply call it using Kernel.system and check the return code via the $? variable.

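# system returns false when pdftk exits with a non-zero status
unless system("pdftk #{source_pdf} … output #{target_pdf}")
  # $? holds the Process::Status of the last child process
  raise PdfError, "pdftk returned error code #{$?.exitstatus}"
end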
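
The same check can be wrapped in a small helper. The following is only a sketch under a few assumptions: PdfError is taken to be a plain StandardError subclass, and run_pdf_tool is a hypothetical name that does not appear in our code base. Passing the command as separate arguments bypasses the shell, so file names containing spaces need no quoting.

```ruby
# Assumption: PdfError is an ordinary exception class.
class PdfError < StandardError; end

# Hypothetical helper: runs a command given as separate arguments
# (no shell involved) and raises PdfError unless it exits with status 0.
def run_pdf_tool(*command)
  succeeded = system(*command)
  return true if succeeded

  # system returns nil when the command could not be started at all
  raise PdfError, "#{command.first} could not be executed" if succeeded.nil?

  raise PdfError, "#{command.first} returned error code #{$?.exitstatus}"
end

# Illustrative call only (file names are made up):
# run_pdf_tool('pdftk', 'a.pdf', 'b.pdf', 'cat', 'output', 'merged.pdf')
```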

With all this PDF processing, the need to test the contents of the generated documents arose. We have factored all the PDF processing out into a handful of separate classes, which we simply unit test with RSpec: we make sure that the parameters are passed to the command line correctly, that the right exception is raised for each return code, and so on.
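
To illustrate what such a unit test checks, here is a minimal sketch written in plain Ruby instead of RSpec so that it has no dependencies. PdfMerger and its injectable runner are hypothetical names invented for this example, not our actual classes. The runner is swapped for a stub, so pdftk itself is never called.

```ruby
# Assumption: PdfError is an ordinary exception class.
class PdfError < StandardError; end

# Hypothetical wrapper around pdftk. The runner is injectable so that
# tests can record the arguments and fake the return code.
class PdfMerger
  DEFAULT_RUNNER = lambda do |*args|
    system(*args)
    $?.exitstatus
  end

  def initialize(runner: DEFAULT_RUNNER)
    @runner = runner
  end

  # Concatenates the source PDFs into target and returns the target path.
  def merge(sources, target)
    status = @runner.call('pdftk', *sources, 'cat', 'output', target)
    raise PdfError, "pdftk returned error code #{status}" unless status == 0

    target
  end
end

# A stubbed runner records the command line and pretends pdftk succeeded.
calls = []
merger = PdfMerger.new(runner: ->(*args) { calls << args; 0 })
merger.merge(%w[a.pdf b.pdf], 'out.pdf')
# calls now holds the argument list that would have gone to pdftk
```

In RSpec the same two checks would become one example asserting on the recorded arguments and one expecting PdfError for a non-zero status.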

In addition to unit testing we also write customer-driven acceptance tests with Cucumber, in which we assert the outcome of certain actions at a high level. With HTML pages we can simply use the built-in steps, which in turn use Webrat to parse the HTML, like this:

Given a purchase over 200 EUR
And an invoice
When I go to the start page
And I follow "Invoice"
Then I should see "200 EUR"

In our case, however, the invoice link points to a PDF, and we still want to know what’s inside the document. The solution we came up with looks like this:

Given a purchase over 200 EUR
And an invoice
When I go to the start page
And I follow the PDF link "Invoice"
Then I should see "200 EUR"

What this does in the background is follow the link as usual, write the response into a temporary file, convert that to text using pdftotext and write the result back into the response. This way we can make assertions about the contents of the PDF almost as if it were an HTML page (except for tags of course). Here is the implementation:

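When 'I follow the PDF link "$label"' do |label|
  click_link(label)
  # Write the PDF response to a temporary file
  temp_pdf = Tempfile.new('pdf')
  temp_pdf << response.body
  temp_pdf.close
  # Reserve a second temporary file for the extracted text
  temp_txt = Tempfile.new('txt')
  temp_txt.close
  # Convert the PDF to text and put the text back into the response
  `pdftotext -q #{temp_pdf.path} #{temp_txt.path}`
  response.body = File.read(temp_txt.path)
end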
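
The conversion could also be pulled out of the step into a helper that checks whether pdftotext succeeded. This is only a sketch: pdf_to_text and its converter parameter are hypothetical names introduced for illustration, and it assumes a Ruby version that provides Tempfile.create. It discards stderr because pdftotext may print warnings there that clutter the test output.

```ruby
require 'tempfile'

# Hypothetical helper, not part of the step definition above.
# converter is a parameter so that another executable can be swapped in.
def pdf_to_text(pdf_bytes, converter: 'pdftotext')
  Tempfile.create(['source', '.pdf']) do |pdf|
    pdf.binmode          # PDFs are binary data
    pdf.write(pdf_bytes)
    pdf.flush            # make sure the converter sees the full file
    Tempfile.create(['target', '.txt']) do |txt|
      # Separate arguments bypass the shell; err: File::NULL drops stderr
      ok = system(converter, '-q', pdf.path, txt.path, err: File::NULL)
      raise "#{converter} failed with status #{$?.exitstatus}" unless ok

      File.read(txt.path)
    end
  end
end
```

Inside the step, the last lines would then shrink to response.body = pdf_to_text(response.body).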

6 Responses to “Testing PDFs with Cucumber and Rails”

  4. Chris Says:

    Do you get the message below with the 'pdftotext -q …' call? It seems to parse the PDF fine, but still outputs the message in the console, which clutters my tests:

    Error: No paper information available - using defaults

  5. Alexander Lang Says:

    We also get this message, but as far as I know it’s printed to stderr and hence doesn’t affect our tests. You could suppress it by redirecting stderr to /dev/null if you want to.
