[Therion] Creating PDF/A map files

Bill Gee bgee at campercaver.net
Thu Oct 22 15:17:25 CEST 2020


Hello everyone -

I propose a new feature for Therion.  This will probably take some work, and I am sure there will be discussion about how to implement it.

It seems to me that the maps we produce with Therion are likely to be stored for a very long time, perhaps several decades.  As we all know, computer technology changes drastically over that span.  Just think about the contrast in both hardware and software over the last 25 years - from Windows 95 running on 486DX processors to Linux and Windows 10 running on i7 and i9 processors.

I think we have some obligation to make sure the cave maps we generate are still usable many years from now.  Saving them in PDF format is a large - but incomplete - step in that direction.

The new feature I propose is to modify the PDF creation code so that it produces files that comply with PDF/A-1b (or possibly PDF/A-2).

https://en.wikipedia.org/wiki/PDF/A 

I have checked all of the PDF files I created in Therion, and none of them are flagged as PDF/A compliant.  It is possible that they are, in fact, compliant and simply do not have the necessary flag.  The experts can check that against the PDF/A specifications.
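As a rough illustration of what "flagged" means here: a PDF/A file announces itself through the pdfaid:part and pdfaid:conformance properties in its XMP metadata. The Python sketch below only looks for that claim in the raw bytes of a file. The function names are my own invention, it assumes the XMP packet is stored uncompressed, and finding the flag says nothing about whether the file really complies.

```python
import re
from pathlib import Path

# Matches pdfaid:part / pdfaid:conformance written either as an XML
# attribute (pdfaid:part="1") or as an element (<pdfaid:part>1</pdfaid:part>).
_PDFAID = re.compile(
    rb'pdfaid:(part|conformance)\s*(?:=\s*["\']|>)\s*([0-9A-Za-z]+)'
)


def pdfa_claim(data: bytes):
    """Return the claimed PDF/A level (e.g. '1B'), or None if no claim is found.

    This inspects the XMP identification only; it does not validate the file.
    """
    found = {key.decode(): value.decode() for key, value in _PDFAID.findall(data)}
    if "part" not in found:
        return None
    return found["part"] + found.get("conformance", "")


def pdfa_claim_for_file(path):
    """Convenience wrapper (hypothetical helper): read a file and check it."""
    return pdfa_claim(Path(path).read_bytes())
```

Calling pdfa_claim_for_file("cave-map.pdf") on a Therion map (the file name is only an example) would show whether any claim is present at all. A real compliance check still needs a validator.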

Existing PDF documents can be checked for PDF/A compliance with a command-line tool called "verapdf".  The web site for that tool is

https://openpreservation.org/products/verapdf/ 
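For anyone who wants to check a whole folder of maps, here is a small Python sketch that wraps the tool. It assumes verapdf is on the PATH and that the --flavour and --format options behave as I understand them from the veraPDF command-line documentation; please confirm with "verapdf --help" for your version. The function names are hypothetical.

```python
import subprocess


def verapdf_command(pdf_path, flavour="1b"):
    """Build the argument list for one validation run.

    Assumption: --flavour selects the PDF/A profile and --format text
    asks for a short plain-text report.
    """
    return ["verapdf", "--flavour", flavour, "--format", "text", str(pdf_path)]


def run_verapdf(pdf_path, flavour="1b"):
    """Run verapdf and return (exit code, report text) without interpreting them."""
    result = subprocess.run(
        verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

The sketch deliberately returns the report unparsed, since I am not certain of the exact output layout.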

It is possible to use Ghostscript to transform an existing PDF into a PDF/A file, although the command line is daunting.

https://www.mcbsys.com/blog/2018/10/batch-convert-pdf-to-pdf-a-2018-edition/ 
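To give an idea of that command line, the Python sketch below assembles a typical Ghostscript PDF/A invocation as an argument list. The options are the ones described in the Ghostscript pdfwrite documentation, but the function name, the "gs" executable name (gswin64c on Windows) and the location of PDFA_def.ps are assumptions. PDFA_def.ps ships with Ghostscript and must first be edited to point at an ICC colour profile; recent versions may also need permission to read that profile.

```python
def ghostscript_pdfa_command(src, dst, pdfa_def="PDFA_def.ps", part=1):
    """Build (but do not run) a Ghostscript command for PDF-to-PDF/A conversion.

    pdfa_def is the path to an edited copy of Ghostscript's PDFA_def.ps
    (assumed to be in the current directory here).
    """
    return [
        "gs",
        f"-dPDFA={part}",                 # requested PDF/A part (1, 2 or 3)
        "-dBATCH",                        # exit when the input is processed
        "-dNOPAUSE",                      # do not pause between pages
        "-sDEVICE=pdfwrite",              # write PDF output
        "-sColorConversionStrategy=RGB",  # convert all colours to one space
        "-dPDFACompatibilityPolicy=1",    # drop offending features, keep going
        f"-sOutputFile={dst}",
        str(pdfa_def),
        str(src),
    ]
```

The list can be passed to subprocess.run() as it stands. As I read the documentation, policy 1 drops features that PDF/A does not allow, while the default policy 0 keeps them and falls back to ordinary PDF output.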

I tried the Ghostscript conversion on one of my Therion maps.  Immediately at startup it printed this message three times:

"GPL Ghostscript 9.53.3: UTF16BE text string detected in DOCINFO cannot be represented in XMP for PDF/A1, reverting to normal PDF output"

The process continued running and took about 10 minutes.  The resulting file failed verapdf analysis.  The conversion also increased the file size from 4.3 megabytes to over 52 megabytes!  The output file did display correctly in Okular.

I do not have any idea how Therion produces PDF files.  It probably uses some combination of TeX and Ghostscript to get it done.  If so, the new feature may be as simple as adding some parameters to the command lines that call the external programs.

Let the discussion begin!  :-)

-- 
Bill Gee
