Recently, I had a spec to produce PDF downloads/exports of a reporting suite. On the face of it, this doesn’t sound like a particularly complicated task, but after some research it turned out to be much more complicated than it needed to be.

The programming language was PHP, with some parts being converted to C++ using Facebook's HipHop. I'll therefore be talking about PHP in this post.

Googling for ways of producing PDFs from HTML using PHP brings up a number of solutions:

Most of these have some basic requirements, and some common limitations:

  • They need valid XHTML (unfortunately, I was dealing with invalid XHTML)
  • None of them can replicate what JS would have done to a page
  • None of them can replicate what Flash would look like on the page
  • They are slow to run
  • Their XHTML & CSS support is limited (especially HTML5 & CSS3)
  • They are slow to implement (I was going to have to write a lot of code)

To this end, I kept looking; I knew I needed a better solution than any of these offered. Eventually I stumbled upon WKHTMLTOPDF. This is essentially a server-side install of WebKit, wrapped in a command-line tool that focuses and extends some of its functionality. Its key function is rendering web pages (including JavaScript execution) and saving them as PDFs.

This makes generating PDFs remarkably easy. All you have to do is write the HTML, CSS & JS that would have gone to the browser to a file, pass this file into WKHTMLTOPDF, and serve the generated PDF to the browser!

Here are a few lines of PHP to show just how easy it is to do!

```php
<?php
// Get the full HTML that would be sent to the browser, with full server-side
// paths to images, CSS and JavaScript, e.g. /var/www/css/global.css
$html = 'All the HTML of the page';

// Unique temporary file names in /tmp
$tempFilename = '/tmp/' . uniqid('export_', true);
$htmlFile = $tempFilename . '.html';
$pdfFile  = $tempFilename . '.pdf';

// Write the HTML
file_put_contents($htmlFile, $html);

// Download wkhtmltopdf and put it anywhere you like on your server.
// This is Linux; it must be executable by Apache (or whatever web server you're using).
$path = '/usr/local/bin/wkhtmltopdf';

// --javascript-delay 800 waits 800 milliseconds to allow JS execution to finish.
// Every argument is shell-escaped before being passed to exec().
$cmd = escapeshellarg($path)
     . ' --enable-plugins --javascript-delay 800 '
     . escapeshellarg($htmlFile) . ' ' . escapeshellarg($pdfFile);
exec($cmd);

// Serve the PDF if one was generated
if (is_file($pdfFile)) {
    header('Content-Type: application/pdf');
    header('Content-Disposition: attachment; filename="export.pdf"');
    header('Content-Length: ' . filesize($pdfFile));
    readfile($pdfFile);
} else {
    header('HTTP/1.1 500 Internal Server Error');
}

// Clean up the temporary files
@unlink($htmlFile);
@unlink($pdfFile);
```

And there you have it! Top quality PDFs, with minimal code!
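The comment at the top of that script hides one detail worth expanding: because wkhtmltopdf reads the HTML from a local file, root-relative links such as `/css/global.css` resolve against the filesystem root rather than your web root. One way to handle that is to prefix those attributes with the document root before writing the file. This is a minimal sketch: `absolutiseAssetPaths` and the `/var/www` document root are hypothetical, and a regex is a rough tool for HTML, so treat it as a starting point.

```php
<?php
// Hypothetical helper: rewrite root-relative src/href attributes
// (e.g. "/css/global.css") into absolute filesystem paths
// (e.g. "/var/www/css/global.css") so wkhtmltopdf can find the assets
// when it loads the HTML from a local file.
// Protocol-relative ("//cdn...") and fully qualified URLs are left untouched.
function absolutiseAssetPaths(string $html, string $docRoot): string
{
    $root = rtrim($docRoot, '/');
    return preg_replace_callback(
        // src=" or href=' followed by a single slash (not two)
        '/\b(src|href)=(["\'])\/(?!\/)/i',
        function (array $m) use ($root): string {
            // Keep the attribute name and quote style, insert the doc root
            return $m[1] . '=' . $m[2] . $root . '/';
        },
        $html
    );
}
```

You would call it on `$html` before writing the temporary file, e.g. `$html = absolutiseAssetPaths($html, '/var/www');`.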


A good friend of mine has just given a short, five-minute talk on writing effective user stories. It was filmed at PHP London yesterday.

Definitely worth a watch!

http://blog.mikepearce.net/2010/07/02/writing-effective-user-stories/


OK, so the headline is a bit of an overstatement, but for the vast majority of users on the net, it will be the case.

The theory is simple – use flash cookies for user tracking (in addition to normal cookies).

  • Flash cookies are not controlled by the browser
  • Flash cookies are cross-browser
    • If I set a cookie in Internet Explorer, the value will instantly be available in Firefox, Chrome, Safari etc.
  • Flash cookies can hold a LOT of information (100 KB by default, up to unlimited)
  • Flash cookies can have an unlimited life span
How many people do you know that have ever deleted their Flash Cookies?
(And how many of them are non-technical?)

For user tracking, this has immense potential. Provided you can create a UUID in JavaScript (you can), you can track an unlimited number of people, and the vast majority of them will not have the knowledge to stop you tracking them.
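The post suggests generating the UUID in JavaScript; as a swapped-in alternative, here is a minimal PHP sketch that mints a random (version 4) UUID on the server and reuses an existing one from an ordinary cookie. It assumes PHP 7+ for `random_bytes()`, and the cookie name `visitor_id` is hypothetical. Handing the ID to the Flash movie so it can be stored in a Flash cookie is not shown.

```php
<?php
// Generate a random (version 4) UUID on the server.
// Assumes PHP 7+ for random_bytes().
function uuidV4(): string
{
    $bytes = random_bytes(16);
    // Set the version nibble to 4 (binary 0100)
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant bits to binary 10xx
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    $hex = bin2hex($bytes);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

// Reuse the ID from an ordinary cookie if it looks valid, otherwise mint a new one.
// 'visitor_id' is a hypothetical cookie name.
function visitorId(array $cookies): string
{
    $id = $cookies['visitor_id'] ?? '';
    $pattern = '/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/';
    if (is_string($id) && preg_match($pattern, $id) === 1) {
        return $id;
    }
    return uuidV4();
}
```

On a page request you might then call `setcookie('visitor_id', visitorId($_COOKIE), time() + 10 * 365 * 24 * 3600, '/');` before any output is sent.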

This is nothing new, but it is not something that is commonly employed.

Here are a few links for reading a bit more around the subject too:


A little bit of history

When I left University, I was offered an opportunity at Jellyfish Online Marketing; and what an opportunity it was. I honestly do not believe I could have kicked off my career in IT at any better organisation. Learning, working, having fun – that company had the whole package for a university leaver.

I stayed for 4 years, working my way up from Developer to Senior Developer and eventually to Lead Developer, acting as Scrum Master for a team of very skilled developers (whom I still miss like mad, both professionally and socially).

At the beginning of March (only 3 months ago), I accepted and started a role at Bloomberg as a Senior Developer. More specifically, I was to work in a team recently integrated into Bloomberg, called BNEF (Bloomberg New Energy Finance). This was a small open-source web team, working on this new site and bringing it in line with Bloomberg's standards and way of life.

For those of you who do not know, Bloomberg are the world's largest independent news organisation, owned by the Mayor of New York, Michael Bloomberg. They focus on providing news and investment information to stock and share traders, as well as investors. Their core revenue stream comes from a single product, called The Terminal.

This is an organisation known worldwide for having an excellent IT infrastructure, based around their core product, The Terminal. This is a news system designed for stock/share traders, and it has so far cost hundreds of millions of dollars to build (over the past 20-odd years). In the UK alone, there are over 300 developers working on it every single day.

The future

A few weeks ago I was approached by a recruiter regarding a Senior Web Technologist/Software Engineer role for a large, well-known company. Upon seeing the job spec, I entered a rather difficult interview process! Over the course of a few days, I had not 1, not 2, not 3, not 4, but 5 interviews with various members of staff at AOL.

AOL recently broke away from Time Warner, the media company they merged with a few years ago. Since then, they have repositioned themselves, changing their core business and creating more revenue streams.

It's therefore with great excitement that I will be starting at AOL in just over two days' time, in a hugely important team for the business, working on some very large-scale products. I am truly excited!

Here are a few links about AOL in the news recently:

Please note that I do not represent any of the organisations mentioned in this post; this is all my own opinion.


Just another very quick blog post, this time with a few tips and gotchas about loading JavaScript files in web pages.
This is by no means a definitive list, but if you can do most of these things, then you’re certainly on the right track!

  • Do not put 'document.write' into JS: the browser has to stop while it works out where to put the output in the DOM and how to render it, which slows down your page.
    To make things worse, if you put the JS at the bottom of the page and use document.write, you make it even slower, by making it the last thing the browser does.
  • Load JS files from a CDN or a different domain. The general limit is 4 domains in total (for XHTML, CSS, JS, images etc.), though most people find that 2 gives the best trade-off between DNS lookups and extra concurrency.
  • Make JavaScript external, allowing the browser to cache it separately rather than including it in the XHTML and increasing the size of every page request
  • Relatedly, make sure those external domains serve no cookies
  • Minify the JS
  • Put JS files at the bottom of the page
    or
  • Use the DEFER attribute on SCRIPT tags (although remember this is not supported by older versions of Firefox)
  • Use tools to help you find performance problems and bottlenecks:
    Google PageSpeed – http://code.google.com/speed/page-speed/docs/using.html
    Yahoo YSlow – http://developer.yahoo.com/yslow/
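Several of these tips (external files, a separate cookie-free domain, scripts at the bottom, DEFER) come together at the point where the page emits its script tags. Here is a minimal PHP sketch, assuming a hypothetical helper `renderScripts()`, a hypothetical static domain, and files that have already been minified:

```php
<?php
// Hypothetical helper: render external <script> tags pointing at a
// (hypothetical) cookie-free static domain, optionally with defer.
// Assumes the listed files are already minified.
function renderScripts(array $files, string $cdnBase, bool $defer = false): string
{
    $tags = [];
    foreach ($files as $file) {
        // Join base and path with exactly one slash
        $src = rtrim($cdnBase, '/') . '/' . ltrim($file, '/');
        $tags[] = sprintf(
            '<script type="text/javascript" src="%s"%s></script>',
            htmlspecialchars($src, ENT_QUOTES),
            $defer ? ' defer="defer"' : ''
        );
    }
    return implode("\n", $tags);
}
```

You would echo its output just before `</body>`, or pass `true` for `$defer` if you would rather keep the tags in the `<head>`.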

Some links, and related articles:


About this blog

Blog of Jon Reed. I am a Senior Software Engineer at AOL UK. I believe in working hard & playing hard. I love gadgets and technology.
