Testing PDFs with Cucumber and Rails
On a recent Rails project we have been working a lot with PDF documents. The application generates PDF invoices and account statements, places markers and comments into existing documents, and merges multiple single-page PDFs into one. We do all this with a few PDF tools:
- the PDF::Writer library for Ruby to generate PDFs
- pdftk to merge multiple PDFs into one or overlay them
- pdftotext, part of xpdf, to extract text from PDFs
On OS X all of these can be installed via MacPorts. Debian has packages as well.
While PDF::Writer is a Ruby library, pdftk is a command line tool. We simply call it using Kernel.system and check the exit status via the $? variable:
unless system("pdftk #{source_pdf} … output #{target_pdf}")
  raise PdfError, "pdftk returned error code #{$?.exitstatus}"
end
With all this PDF processing the need for testing the contents of the generated documents arose. We have factored all the PDF processing out into a few separate classes, which we unit test with RSpec: we make sure the parameters are passed to the command line correctly, that the right exception is raised for each return code, etc.
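To illustrate the kind of class this makes easy to test, here is a minimal sketch. The names PdfMerger, command and merge, as well as the injectable runner, are assumptions made for this example and not our actual code. The idea is that a spec can pass in a fake runner, inspect the arguments that would go to pdftk and trigger the error path without ever shelling out.

```ruby
# Hypothetical wrapper around pdftk; all names here are illustrative.
class PdfError < StandardError; end

class PdfMerger
  # runner is anything that responds to call(*args) and returns a truthy
  # value on success. Kernel.method(:system) is the real thing; a lambda
  # can stand in for it in specs.
  def initialize(runner = Kernel.method(:system))
    @runner = runner
  end

  # Argument list for: pdftk a.pdf b.pdf cat output merged.pdf
  # Passing separate arguments to system bypasses the shell, so file
  # names with spaces need no quoting.
  def command(sources, target)
    ['pdftk', *sources, 'cat', 'output', target]
  end

  # Runs the merge and raises PdfError if the runner reports failure.
  def merge(sources, target)
    ok = @runner.call(*command(sources, target))
    raise PdfError, "pdftk failed while writing #{target}" unless ok
    target
  end
end
```

In a spec, PdfMerger.new(lambda { |*args| false }) would let us assert that merge raises PdfError, while a lambda that records its arguments lets us check the exact command line.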
In addition to unit testing we also write customer-driven acceptance tests with Cucumber, where we assert the outcome of certain actions at a high level. With HTML pages we can simply use the built-in steps, which in turn use Webrat to parse the HTML, like this:
Given a purchase over 200 EUR
  And an invoice
When I go to the start page
  And I follow "Invoice"
Then I should see "200 EUR"
Now in our case the invoice link links to a PDF but we still want to know what’s inside the document. The solution we came up with looks like this:
Given a purchase over 200 EUR
  And an invoice
When I go to the start page
  And I follow the PDF link "Invoice"
Then I should see "200 EUR"
What this does in the background is follow the link as usual, write the response into a temporary file, convert that to text using pdftotext and write the result back into the response. This way we can make assertions about the contents of the PDF almost as if it were an HTML page (except for tags of course). Here is the implementation:
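require 'tempfile'

When 'I follow the PDF link "$label"' do |label|
  click_link(label)
  # Write the PDF response to a temporary file (binary mode keeps the bytes intact)
  temp_pdf = Tempfile.new('pdf')
  temp_pdf.binmode
  temp_pdf << response.body
  temp_pdf.close
  # Reserve a second temporary file for the extracted text
  temp_txt = Tempfile.new('txt')
  temp_txt.close
  # Convert the PDF to plain text; -q asks pdftotext not to print messages
  `pdftotext -q #{temp_pdf.path} #{temp_txt.path}`
  # Replace the response body so the usual "I should see" steps run against the text
  response.body = File.read(temp_txt.path)
end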
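The conversion step can also be pulled out of the step definition into a small helper. The following is only a sketch: the method name pdf_to_text and the optional converter argument are assumptions made for illustration and testability, not part of the step above. It passes "-" as the output file so that pdftotext writes the text to stdout, and it uses Open3.capture3 so that anything pdftotext prints to stderr is captured instead of ending up in the console.

```ruby
require 'open3'
require 'tempfile'

# Hypothetical helper: turns the bytes of a PDF into plain text.
# converter is an assumption made for testability; by default it shells
# out to pdftotext, which must be installed and on the PATH.
def pdf_to_text(pdf_bytes, converter = nil)
  converter ||= lambda do |pdf_path|
    # "-" as the output file makes pdftotext write the text to stdout.
    stdout, stderr, status = Open3.capture3('pdftotext', '-q', pdf_path, '-')
    raise "pdftotext failed: #{stderr}" unless status.success?
    stdout
  end

  # Tempfile.create removes the file once the block has finished and
  # returns the value of the block.
  Tempfile.create(['response', '.pdf']) do |pdf|
    pdf.binmode
    pdf.write(pdf_bytes)
    pdf.flush
    converter.call(pdf.path)
  end
end
```

With a helper like this, the step body would shrink to click_link(label) followed by response.body = pdf_to_text(response.body).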




September 27th, 2009 at 08:53
Do you get the message below with the 'pdftotext -q …' call? It seems to parse the PDF fine, but it still prints the message to the console, which clutters my tests:
Error: No paper information available - using defaults
September 28th, 2009 at 03:52
We also get this message, but as far as I know it's printed to stderr and hence doesn't affect our tests. You could suppress it by redirecting stderr to /dev/null if you want to.