4 years ago samcarter

It is often interesting to know if the output of two different documents (or the same document using different package/kernel versions) is the same.

Which techniques can be used to test this?


Just as an example document for testing:

Top Answer
4 years ago samcarter

One possible approach is to make a visual diff of the resulting documents.

The following script is rather hacky and has a lot of external dependencies, but maybe it can be used as a starting point.

The main idea is that each page of the two documents gets converted into a raster image and then the pixel values are compared. Pixels that differ are highlighted in red.
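
To make the comparison step concrete, here is a minimal sketch of the principle in plain Python. It is not the script from this answer: the function name `diff_image` and the representation of a page as rows of `(r, g, b)` tuples are assumptions made for illustration, and a real page would first have to be rasterised by an external tool.

```python
RED = (255, 0, 0)
WHITE, BLACK = (255, 255, 255), (0, 0, 0)


def diff_image(a, b):
    """Compare two same-sized images given as rows of (r, g, b) tuples.

    Returns (marked, changed): a copy of `a` in which every pixel that
    differs from `b` is painted red, and the number of such pixels.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    marked, changed = [], 0
    for row_a, row_b in zip(a, b):
        out = []
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a == pixel_b:
                out.append(pixel_a)   # unchanged pixel is kept as it is
            else:
                out.append(RED)       # differing pixel is highlighted
                changed += 1
        marked.append(out)
    return marked, changed


# Two tiny 3x2 "pages": one black pixel has moved one column to the right.
page_old = [[WHITE, BLACK, WHITE], [WHITE, WHITE, WHITE]]
page_new = [[WHITE, WHITE, BLACK], [WHITE, WHITE, WHITE]]
marked, changed = diff_image(page_old, page_new)
print(changed)  # 2
```

A page counts as changed as soon as `changed` is non-zero, so only those pages need a diff image.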

Usage

It can be executed with

and will produce an image diff-<page number-1>.png for every page that has differences


Dependencies:

  • convert and compare from the ImageMagick library
  • pdfinfo
  • grep
  • awk
  • whatever I forgot in the list above
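
A rough sketch of how the tools listed above could be chained, written in Python rather than as a shell script, and not the script from this answer. It assumes that `pdfinfo`, `convert` and `compare` are on the PATH and that ImageMagick is allowed to read PDF files; a regular expression stands in for `grep` and `awk`. The intermediate file names `a-N.png` and `b-N.png`, the function names and the 150 dpi density are hypothetical choices.

```python
import os
import re
import subprocess
import sys


def page_count(pdfinfo_output):
    """Read the page count from the text that `pdfinfo` prints."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def page_commands(pdf_a, pdf_b, index, density=150):
    """Build the convert/compare calls for one zero-based page index."""
    a_png, b_png = f"a-{index}.png", f"b-{index}.png"  # hypothetical names
    diff_png = f"diff-{index}.png"
    rasterise = [
        ["convert", "-density", str(density), f"{pdf_a}[{index}]", a_png],
        ["convert", "-density", str(density), f"{pdf_b}[{index}]", b_png],
    ]
    # -metric AE counts differing pixels; compare highlights them in red.
    compare = ["compare", "-metric", "AE", a_png, b_png, diff_png]
    return rasterise, compare, diff_png


def diff_pdfs(pdf_a, pdf_b):
    """Return the zero-based indices of the pages that differ.

    Assumes both PDFs have the same page count and page sizes.
    """
    info = subprocess.run(["pdfinfo", pdf_a], capture_output=True,
                          text=True, check=True)
    changed = []
    for index in range(page_count(info.stdout)):
        rasterise, compare, diff_png = page_commands(pdf_a, pdf_b, index)
        for command in rasterise:
            subprocess.run(command, check=True)
        status = subprocess.run(compare, capture_output=True).returncode
        if status == 1:        # pages differ: keep the diff image
            changed.append(index)
        elif status == 0:      # pages identical: discard the diff image
            if os.path.exists(diff_png):
                os.remove(diff_png)
        else:                  # compare exits with 2 on errors
            raise RuntimeError(f"compare failed on page index {index}")
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    print(diff_pdfs(sys.argv[1], sys.argv[2]))
```

Because the page index is zero-based, the image for page 3 is called `diff-2.png`, which matches the `diff-<page number-1>.png` naming described under Usage.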

Disadvantages:

  • slow (I really mean it, don’t try with a very long document)

Advantages:

  • will not only show that something has changed, but provide visual context of what has changed. This way the user can for example judge if they mind that a word is 1 mm further down or not
  • will sort out which pages actually do have changes, so no need to check all of them

Output from the test document:

diff.png

Answer #2
4 years ago samcarter

Ulrike Fischer made me aware of the pdfpagediff package

This will produce a new document with the two previous ones overlaid. If you have Adobe Reader, you can use the layers menu on the left-hand side to toggle between the documents to see if there are any significant changes.

Caveats:

  • If you have a document with an opaque background (like beamer), you will only be able to see one layer at a time instead of both overlaid (but you will still be able to toggle between them)
  • only works with pdfTeX
Tejas Shetty replying to topnush — Wednesday, 19th May 2021 06:23

Thanks @topnush Diffpdf works quite well in Linux. Would have saved me a lot of time had I known about it earlier.

topnush — Tuesday, 11th May 2021 16:13

from the command line

topnush — Tuesday, 11th May 2021 16:13

it is literally “diffpdf doc1.pdf doc2.pdf”

samcarter replying to topnush — Tuesday, 11th May 2021 16:07

But maybe you could write up an answer that shows how to use this on linux?

samcarter replying to topnush — Tuesday, 11th May 2021 16:06

I tried that before, on my mac it crashed every time I tried to open a pdf

topnush — Tuesday, 11th May 2021 16:02

I use it quite a lot

topnush — Tuesday, 11th May 2021 16:02

diffpdf works in linux
