Academic theses (ETDs)
Scripts for automated editing of fonts in PDF files
Gmitter, Jakub ; Zeman, Kryštof (referee) ; Hanák, Pavel (advisor)
This master's thesis deals with font encoding in PDF documents. Proper font encoding is necessary for searching and copying text from such documents. The thesis describes the internal structure of PDF documents, page representation, font types and their encodings, and the objects needed for correct font representation. An understanding of these areas was necessary to develop scripts that can repair incorrect font encoding. Two Python scripts were developed as part of the thesis. The first verifies the integrity of repaired PDF files using SHA-256 hashes computed from their contents. The second repairs corrupted font encodings in the documents. The information both scripts need is stored in corresponding JSON structures. The repair script targets PostScript Type 1 fonts. Its core function is to generate a ToUnicode object that correctly maps the font's glyphs to Unicode code points. The script was tested on approximately 200 electronic issues of a Czech magazine provided as sample data. Of these sample files, only those with completely corrupted font encodings were chosen for further work. The remaining issues had corrupted encoding only for characters with diacritical marks; they were analyzed, but the script is unable to repair them. Commented Python source code, compiled Windows executables, and a user guide are available in the electronic attachment and in the author's GitHub repository.
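
To illustrate the ToUnicode generation mentioned in the abstract, the following is a minimal sketch of how the text of a ToUnicode CMap could be produced for a single-byte font such as Type 1. It is not the thesis's actual code: the function name `build_tounicode_cmap` and the shape of the input dictionary are assumptions made for this example, and embedding the resulting stream into a PDF file is left out.

```python
def build_tounicode_cmap(code_to_unicode):
    """Return ToUnicode CMap text for a simple (single-byte) font.

    code_to_unicode is a hypothetical input format: a dict mapping
    character codes (0-255) to the Unicode string each code represents.
    """
    def utf16_hex(text):
        # ToUnicode destinations are written as UTF-16BE hex strings.
        return text.encode("utf-16-be").hex().upper()

    entries = []
    for code, text in sorted(code_to_unicode.items()):
        if not 0 <= code <= 0xFF:
            raise ValueError(f"character code out of single-byte range: {code}")
        entries.append(f"<{code:02X}> <{utf16_hex(text)}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]

    # A single bfchar block may hold at most 100 entries, so split into chunks.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")

    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: code 0x41 is "A", code 0x8A is "Š" (U+0160).
    print(build_tounicode_cmap({0x41: "A", 0x8A: "\u0160"}))
```

In a real repair tool the mapping would presumably come from the stored JSON data, and the generated text would be written as a stream object referenced by the font dictionary's `/ToUnicode` entry.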
