I wanted to add image and text previews for trending PDFs on nuuz.io. I had already built my own component for extracting information from URLs, but it could not handle PDF URLs yet. PDFs show up often on Hacker News, which is an important source for nuuz.io/tech. So I built a PDF preview that uses pdftotext to extract the text (keeping only the first 500 characters) and ImageMagick to render an image of the first page. Together they do exactly what I wanted.
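
To make this concrete, here is a rough sketch of how the two tools could be driven from Python. It is not the actual nuuz.io code: the function names, the 100 DPI density, and the PNG output format are all assumptions I picked for illustration. It also assumes the ImageMagick 6 `convert` command (on ImageMagick 7 the command is `magick`) and a PDF that has already been downloaded to a local file.

```python
import subprocess

PREVIEW_CHARS = 500  # the post keeps only the first 500 characters


def text_command(pdf_path):
    """Build the pdftotext call (hypothetical helper)."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def image_command(pdf_path, png_path):
    """Build the ImageMagick call (hypothetical helper)."""
    # "[0]" selects the first page of the PDF.
    # The density of 100 DPI is an assumed value, not one from the post.
    return ["convert", "-density", "100", pdf_path + "[0]", png_path]


def truncate(text, limit=PREVIEW_CHARS):
    """Keep only the first `limit` characters of the extracted text."""
    return text[:limit]


def preview(pdf_path, png_path):
    """Write a first-page image to png_path and return the text preview."""
    result = subprocess.run(text_command(pdf_path), check=True, capture_output=True)
    subprocess.run(image_command(pdf_path, png_path), check=True)
    return truncate(result.stdout.decode("utf-8", errors="replace"))
```

Calling `preview("paper.pdf", "paper.png")` would then return the text snippet and leave the image on disk. Both tools run as external processes, so they need to be installed on the machine, and ImageMagick relies on Ghostscript to read PDFs.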