[liberationtech] [open-science] Removing watermarks from pdfs (pdfparanoia)

jvoisin julien.voisin at
Sun Feb 10 18:07:49 PST 2013

I am the developer behind the previously cited MAT. I just want to add my
2 cents, based on what I learned while developing metadata-anonymisation
processes.

Visible metadata such as lines of text or pictures can be detected
visually and removed with the help of some pdfminer-fu, so I would rather
talk about hidden metadata/watermarks.

Since PDF is a pretty complex format to process, I render it on a
cairo[1] surface and then save this surface to a PDF file.
Since this produces a completely new PDF, it strips a large part of
(if not all) hidden watermarks/metadata, without transforming the text
into pictures. The whole process is implemented in MAT [2].
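
To make the rendering idea above concrete, here is a minimal sketch in
Python. It is not MAT's actual code: it assumes pycairo and the Poppler
GObject-introspection bindings are installed, and the names to_file_uri
and rerender_pdf are hypothetical helpers chosen for illustration.

```python
from pathlib import Path


def to_file_uri(path):
    """Poppler's GLib loader expects a file:// URI, not a plain path."""
    return Path(path).absolute().as_uri()


def rerender_pdf(src_path, dst_path):
    """Draw every page of src_path onto a fresh cairo PDF surface.

    Assumption: pycairo and PyGObject with the Poppler typelib are
    available. They are imported here so the module loads without them.
    """
    import cairo
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import Poppler

    document = Poppler.Document.new_from_file(to_file_uri(src_path), None)

    # The initial size is a placeholder; it is reset for each page below.
    surface = cairo.PDFSurface(dst_path, 1, 1)
    context = cairo.Context(surface)

    for index in range(document.get_n_pages()):
        page = document.get_page(index)
        width, height = page.get_size()
        surface.set_size(width, height)
        # Vector rendering: text is drawn as glyphs, not rasterised.
        page.render_for_printing(context)
        context.show_page()

    # Flush and close the newly generated PDF.
    surface.finish()
```

Calling rerender_pdf("in.pdf", "out.pdf") would write a new file built
only from what Poppler actually draws. Anything a given technique hides
outside the drawn content is dropped, but a mark that is part of the
rendered page itself would survive this step.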

This could be added to pdfparanoia to counter hidden threats.

