[liberationtech] [open-science] Removing watermarks from pdfs (pdfparanoia)

Peter Murray-Rust pm286 at
Tue Feb 5 13:47:34 PST 2013

On Tue, Feb 5, 2013 at 9:15 PM, Bryan Bishop <kanzure at> wrote:

> On Tue, Feb 5, 2013 at 3:09 PM, Peter Murray-Rust <pm286 at> wrote:
>> PDF2SVG should be able to do this (
>> It should also remove the side annotations about which library the PDF was
>> downloaded from. Send me one and I'll see.
> Is there an svg2pdf? The problem with using pdfquery is that it can only
> generate an XML format. At first it looks like pdfxml, except that Adobe
> came up with a "standard" called pdfxml that looks completely different. So
> getting things back into PDF seems to be difficult.
I use Apache FOP.  We should be able to:
* read the PDF into SVG
* remove the rubbish
* write the primitives back into PDF. We might get font problems, so you may
have to make do with the 14 standard PDF fonts. That might occasionally
screw up some of the microkerning. If you want to reformat running text and
lose the publisher's layout (e.g. 2-col => 1-col), then we will use SVGPlus.
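
A minimal sketch of the middle step (removing the rubbish), assuming the page
has already been converted to SVG and that the watermark shows up as <text>
elements containing a known phrase. The function name strip_watermarks, the
marker string and the sample SVG are illustrative assumptions, not part of
PDF2SVG or pdfparanoia; it uses only Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def strip_watermarks(svg_text, markers):
    """Return the SVG with every <text> element containing a marker removed.

    `markers` is a list of substrings assumed to identify watermark text.
    """
    # Serialize with SVG as the default namespace (no ns0: prefixes).
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    text_tag = f"{{{SVG_NS}}}text"

    # ElementTree has no parent pointers, so visit each element as a
    # potential parent and inspect its direct children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != text_tag:
                continue
            content = "".join(child.itertext())
            if any(marker in content for marker in markers):
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")


# Hypothetical one-page SVG: one line of body text, one library stamp.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="10" y="20">Results and discussion</text>'
    '<g><text x="5" y="700">Downloaded from Example Library</text></g>'
    "</svg>"
)

cleaned = strip_watermarks(sample, ["Downloaded from"])
print(cleaned)
```

Matching on the text content is the simplest heuristic. Watermarks drawn as
paths or images would need a different test (position, colour or repetition
across pages), which this sketch does not attempt.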

Some of this is alpha, not production.

> - Bryan
> 1 512 203 0507

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge