AI Meets Astrology

Creating a Digital Edition of Anton Brelochs’s 1529 Practica with Transkribus and ChatGPT

Nathan R. Ericson, Library Director, Wisconsin Lutheran Seminary

Abstract: This paper explores the integration of AI tools like Transkribus and ChatGPT in creating a digital edition of Anton Brelochs’s 1529 Practica Teutsch. The research uses the rare intact copy at the Wisconsin Lutheran Seminary Library to assess the efficacy of AI in transcription, translation, and tagging processes. Key questions include AI’s role in identifying and correcting transcription errors, speeding up translations to enhance accessibility, and automating TEI XML tagging. The study highlights both the strengths and limitations of these AI tools, concluding that while they offer valuable assistance, human oversight remains crucial for accuracy and scholarly reliability. Additionally, the paper examines AI’s potential in interpreting historical illustrations, such as the astrological woodcut in Brelochs’s work, demonstrating its capabilities in suggesting plausible identifications of depicted figures.

Introduction

The impetus for this paper was the discovery of a unique volume in the rare books collection of the Wisconsin Lutheran Seminary (WLS) library: the only known intact, extant copy of the Practica Teutsch for 1529 by Anton Brelochs, who served as the city doctor of Schwäbisch Hall from 1517 to 1559 (Institut Deutsche Presseforschung 2024). (Another fragmentary copy is in the collection of the University of Kiel and was recently restored with the help of images from the WLS library copy.) At the WLS library we are planning projects that would create digital editions of the systematic theology works of Abraham Calov and Johannes Andreas Quenstedt. The discovery of the Practica by Brelochs presented the opportunity for a pilot project to create a digital edition of a much shorter work. Our hope was that in the process we would be able to explore the available tools, methods, and platforms to better inform our digital edition project plans.

Digital editions bring together facsimiles (images), transcriptions, and often translations of one or more manifestations of a work—usually of non-digital materials such as manuscripts, letters, and printed books—in an online interface that is suitable for scholarship. (For an example, see the Taylor Editions platform from the University of Oxford at https://editions.mml.ox.ac.uk/.) Digital editions are commonly expressed in TEI XML format, which is a text file that abides by the XML specification and makes use of the Text Encoding Initiative (TEI) standard for the representation of texts in digital form. The goal of the TEI standard is to produce files for digital editions that are both human readable as text files (and thereby suitable for long-term preservation) and machine readable through markup language (and therefore suitable for encoding meaning in a way that can be processed by computers). One strength of digital edition platforms is that multiple versions of facsimiles, transcriptions, and translations can be provided by creators and individually selected for comparison by users.

The recent advent of AI-based tools such as ChatGPT led us to ask how AI-based tools might assist in the creation of a digital edition, whether that would be through lightening the workload, improving the process, or advancing what is possible in creating a digital edition. Specifically, this paper explores how AI-based tools can help with the transcription, translation, and tagging that is necessary in the creation of a digital edition: Can AI identify and correct errors introduced in the transcription process? Can AI speed up the process of making digital editions more accessible through translation? Can AI automate the process of adding TEI XML tags to a transcription and accurately linking those tags to named entity authorities? As a bonus, in the case of the 1529 Practica, this paper also explores the usefulness of ChatGPT in interpreting the astrological woodcut featured on the pamphlet’s opening page.

The Practica Teutsch for 1529 by Anton Brelochs

The WLS library’s rare book collection contains about 4000 items across 2200 titles. Of these titles, two are from the 1400s, 400 are from the 1500s, 600 are from the 1600s, and 1200 are from the 1700s or later. Beginning in 2001, we sought to at least begin to answer the question of which of our rare books were truly rare and which were old but perhaps not so rare. A study of the 30 oldest books in the collection discovered several that had not yet been digitized, several for which there were less than ten known copies, and one—the 1529 Practica Teutsch by Anton Brelochs—for which the only other catalog record found, at the University of Kiel, noted that the bottom half of all pages was missing. A subsequent email conversation indicated that the catalog description was a bit generous regarding the current condition of the item. The WLS library copy is intact, though with slightly wrinkled pages and other wear; photographs of the item can be viewed in the Internet Archive at https://archive.org/details/brelochs-practica-1529/.

The production of German, Italian, and Dutch practicas, annual booklets of astrological calendar-based prognostication not dissimilar to the modern Old Farmer’s Almanac, began soon after the invention of the printing press and flourished throughout the sixteenth century. Their literary design is that of the overhearing of a “privileged conversation” (Green 2012, 82) between an astrologer and the rulers of a territory, where the buyer gains a glimpse of this otherwise-obscured knowledge. Most practicas follow a prototypical form, including elements such as the name practica, a title page containing a woodcut with representations of featured planets for the year, an identification of the author, a dedicatory epistle with biblical justification for astrological prognostication, and finally several chapters containing predictions for various spheres of earthly life for the coming year (Green 2012, 111–15).

The WLS library copy consists of 16 pages in two separately stitched quartos, four leaves each, held together by a tape binding. Pages are 15 cm wide by 20 cm tall with horizontal chain lines. The work and its instances at Kiel and Mequon (where Wisconsin Lutheran Seminary is located) are now listed in VD16 (Bibliography of Books Printed in the German Speaking Countries of the Sixteenth Century); in addition to the Internet Archive, a digitization of our copy is housed in the WLS institutional archive and digital library.

AI and Transcription

Can AI-based tools help with the transcription of written or printed documents? One type of tool is already in regular use for transcription. Handwritten Text Recognition applications such as Transkribus (https://www.transkribus.org/) and OCR4all (https://www.ocr4all.org/) use AI technology to determine likely bounding boxes and lines of text within both printed and handwritten documents and allow users to train models for the recognition of characters and words. Transkribus is hosted online and operates on a subscription model; OCR4all is a locally installable Docker image. (Potential users should note that as of this writing, OCR4all has not been ported to Apple silicon.)

I used Transkribus to perform initial transcription of the Practica using the Transkribus Print M1 model that recognizes, among other typefaces, the Schwabacher typeface used in the Practica. I sampled both color and grayscale images of the pages. Although the color images were easier for my human eye to read when checking the transcription, it seemed that Transkribus produced equal results with either type of image. The sixteen pages of the Practica (the first and last of which contained substantially less text than the others) took a total of 10 minutes, 10 seconds for Transkribus to process. I then proceeded to correct Transkribus’s transcription in the interface that Transkribus provides for this purpose, comparing the transcription with the original document and its images, but only after saving the raw transcription to a text file to experiment on it using another AI-based tool, ChatGPT. The edited transcription is currently viewable on Transkribus at https://app.transkribus.org/sites/brelochs1529/doc/2694510.

My first attempt to see how well ChatGPT could identify and fix transcription errors involved creating my own custom GPT with ChatGPT 4o, to which I provided instructions that it should take any text it received, identify and correct errors in the raw transcription, and highlight the changes in the response it produced. The interactive custom GPT design process produced the following set of instructions:

This GPT assists users in detecting possible errors in OCR transcriptions of historical German texts, specifically from a 1529 German pamphlet or practica teutsch. The GPT identifies and corrects transcription errors, highlighting the changes made in bold type while ignoring any text enclosed within angle brackets. It does not make orthographical corrections unless they are necessary to resolve a transcription error.

Role and Goal: The primary goal is to ensure accuracy in OCR transcriptions by identifying and correcting errors in the text. The role involves carefully reviewing the text, making corrections where needed, and highlighting these changes for easy identification.

Constraints: Avoid making unnecessary orthographical corrections. Ignore any text within angle brackets as these are locators for human readers. Focus only on identifying and correcting transcription errors that affect the text’s meaning or coherence.

Guidelines: Always highlight corrections in bold type. Review the text for inconsistencies, nonsensical parts, or obvious errors resulting from OCR transcription. Use historical knowledge of the text to make informed corrections where possible. Include corrections with the main body of the response as opposed to afterward.

Clarification: If the text is ambiguous or unclear, make an educated guess to correct the transcription errors. If unsure, err on the side of providing a coherent reading (rather than maintaining the original transcription unless it is clearly incorrect).

Personalization: Maintain a professional and helpful tone, ensuring clarity and precision in responses. Be patient and thorough when reviewing the text for errors. Automatically process pasted text according to these instructions, even if no explicit prompt is given.

The results using the custom GPT were not good. It wasn’t all bad; the GPT made several good corrections in the section that I examined. However, it also made unrealistic and unnecessary corrections, missed obvious corrections, and occasionally allowed nonsensical words to pass through unfiltered. Overall, it proposed very few corrections compared to the number that I identified. Ben and Sara Brumfield observed similar results when exploring the use of AI-assisted transcription for their crowdsourcing platform, FromThePage. They concluded that it was better to ask ChatGPT to highlight obvious errors (creatively using the “hmm” or “thinking face” emoji to highlight these locations for crowdsourcing participants) than to ask ChatGPT to supply corrections (Bastida 2024).

For my second attempt, I used a different approach. Recalling the late-2023 trend of ChatGPT sometimes refusing to follow explicit instructions (often referred to as the AI being “lazy”), I attempted a bit of misdirection: asking ChatGPT to perform a secondary task that would, in the process, accomplish the primary task I had in mind, that of transcription correction. Instead of using a custom GPT with complex instructions, I simply directed ChatGPT 4o, “The following is 1529 German. Please convert to modern orthography,” and then provided the raw text under consideration.

The results now were much better. ChatGPT’s fixes generally agreed with my fixes, and what ChatGPT produced generally made sense where the original lacked sense—and did so accurately. There were exceptions, of course, in which ChatGPT mistook a word and altered the overall meaning. Context may determine whether this is a critical problem. On the one hand, it may not detract from a casual reader’s ability to get an overview of the content of the work. On the other hand, it will not serve as an accurate or dependable basis for scholarly research. In addition, an obvious drawback to this approach is that once ChatGPT provided modern orthography for the 1529 German, the transcription was no longer a diplomatic transcription that accurately represented what was on the page. Therefore, I conclude that the best method for transcription improvement is not the use of ChatGPT, but rather the training of the recognition model used in Transkribus (or other HTR applications) so that it is more attuned to the nuances of the document being transcribed. This will be our library’s approach going forward as we seek to transcribe longer works, namely training a custom model on the first n pages of the work so that the remaining pages may be transcribed with greater accuracy.

Links to the transcriptions produced may be found on the project website at https://nericson.github.io/brelochs/.

AI and Translation

Given that the 1529 Practica is written in German and mein Deutsch ist nicht so gut, I next experimented with asking ChatGPT 4o to translate both the raw transcription and my edited transcription into English. The difference between the two results was difficult to measure because of ChatGPT’s “temperature” setting, which provides some randomness in the responses given when the same prompt is given multiple times. (Future experiments could make use of the ChatGPT API, where the temperature may be set “lower” by the user.) Accounting as best I could for “artistic license” in translation, the translations of the raw and edited transcriptions appeared overall to be similar, with several improvements in the translation of the edited transcription and an occasional better reading in the translation of the raw transcription.

My conclusion from this experiment is that AI-based tools may be very useful for improving accessibility to the content of works written in languages not familiar to the user, especially when an overview of the content is more important than specific detail.

As a side point, in my experimentation I also observed that it’s important to work against ChatGPT’s tendency to paraphrase. If I asked ChatGPT to translate “very idiomatically,” it would often substantially shorten the text under consideration, perhaps still conveying the overall meaning, but also eliminating much of the detail. I found a much more workable translation by asking ChatGPT to translate “rather idiomatically”—or even “rather literally” if I wanted a more word-for-word representation of the original language.

Links to the translations produced may be found on the project website at https://nericson.github.io/brelochs/.

AI and Tagging

I briefly explored using ChatGPT 4o to create the tags necessary for producing a TEI XML document, using instructions such as “Please encode this text in TEI XML, providing tagging for entities such as dates, people, places, and heavenly bodies. Use terms from LCSH and LCNAF.” While ChatGPT provided the shell of a TEI XML document and identified many named entities within the text, it didn’t always connect that named entity correctly with authority files. For example, it connected the term Astronomie with LCSH identifier sh85009152, Athletic shoes, rather than with identifier sh85009003, Astronomy. (Numerically close, but not quite.) It connected the person Ptholomei Pheludiani not with Wikidata identifier Q34943, Ptolemy, but with identifier Q303, Elvis Presley. Since many others are pursuing the use of AI in named entity recognition, I decided not to pursue my investigation of AI-based tagging further at this time.

AI and Image Analysis

Finally, I tested whether ChatGPT would be able to identify the figures pictured in the woodcut on the first page of the Practica (see figure 1). Using descriptions in Green (2012, esp. 112–13 and 125) of the woodcut images usually found in Practica, I am confident that the three figures are gods represented by planets due to their situation above the earthly sphere and due to each figure being equal in size to the others. Furthermore, I am confident that the figures are those of the gods Mars, Venus, and Saturn, not only due to Mars’s sword, Venus’s feminine form, and Saturn’s scythe, beard, and wool-clad feet, but also due to their prominence (Saturn) or specific mention (Venus, Mars) in the Practica’s text.

Image of the first page of the 1529 Practica Teutsch by Anton Brelochs, with a ruler showing its height of approximately 8 inches. The title page includes an approximately square woodcut featuring three celestial figures representing Mars, Venus, and Saturn. Venus holds a circle representing the earth, depicted by a small German city near the mountains, surrounded by stars and planets.

Figure 1: First page of the 1529 Practica Teutsch by Anton Brelochs

Interestingly, in November 2023, ChatGPT 4 correctly identified only the central figure of Venus, and with some uncertainty. The prompt I used at that time was rather exploratory: “There’s [a] picture in this document. The document is a scan of a practica teutsch from 1529 by Anton Brelochs. I have no idea what the picture is about. What can you tell me about who is in the picture or what they represent?” ChatGPT 4 responded:

The left figure, a man, is dressed in what looks like armor or attire befitting a soldier or knight, suggesting he may represent martial prowess or chivalry. He holds what appears to be a sundial, which could symbolize the importance of time or astronomy.

The central figure, which appears to be a partially unclothed woman, is surrounded by a circle filled with stars and a landscape, likely representing the celestial sphere or the universe. She might symbolize nature, virtue, or even a specific celestial body like Venus, often associated with femininity.

The right figure, another man, is holding a staff and book, wearing robes that suggest scholarly or clerical status, possibly representing wisdom or learning. He could be a scholar, a cleric, or an astrologer, given the context of the image.

That sounds plausible enough. However, it’s not quite the full picture—and, notably, ChatGPT improved its interpretation in its next version. In June 2024, ChatGPT 4o clearly identified all three figures in response to the prompt “Here is the opening woodcut from a 1529 practica teutsch, or astrological prognostication pamphlet. What is being explicitly or implicitly depicted or represented? Who are the figures depicted?”:

Left Figure: A soldier or warrior, depicted with armor and a sword. This figure represents Mars, the Roman god of war, symbolizing strength, conflict, and aggression.

Middle Figure: A naked woman, holding a book and with a peaceful expression. This figure likely represents Venus, the Roman goddess of love, beauty, and harmony.

Right Figure: An elderly man with a long beard, holding a scythe. This figure represents Saturn, the Roman god associated with time, aging, and harvest, often depicted as the grim reaper.

Based on this example, ChatGPT appears to hold great potential for helping to interpret artwork—at least providing suggestions for research, if not solid conclusions.

Conclusions and future research

Based on this exploration with the AI-based tools Transkribus and ChatGPT, I conclude that current AI-based tools have some usefulness in the preparation of digital editions and in making early printed works more accessible to modern readers. Transkribus quickly provides a base transcription that can be edited by a human editor; the application also allows for transcription models to be trained to improve future transcription. ChatGPT allows the end user to quickly assess the content of an early printed work, whether in the original language or in translation. For scholarly use, the oversight of a human editor remains crucial. There is potential for the use of AI in the creation of TEI XML documents and tags, and for its use in interpreting artwork of the era.

This paper has explored little about the content of the Practica itself. Future work might compare, for example, the defense of Christian use of astrology provided by Brelochs in the Practica’s preface with Martin Luther’s introduction to the 1527 Wittenberg edition of Johannes Lichtenberger’s 1488 Prognosticatio—the predecessor of and prototype for the practicas that followed—in which Luther softly pans the usefulness of astrology for knowing anything specific about the will of God. Indeed, in such an investigation, AI might play a different type of role, namely helping to identify the explicit and implicit arguments made by each author and aligning points of agreement or disagreement between them.

References

Bastida, Ana. 2024. “AI-Assist in FromThePage: Using HTR in People-Centered Transcription.” FromThePage. Accessed July 14, 2024. https://content.fromthepage.com/ai-assist-in-fromthepage-using-htr-in-people-centered-transcription/.

Green, Jonathan. 2012. Printing and Prophecy: Prognostication and Media Change 1450–1550. Ann Arbor: University of Michigan Press.

Institut Deutsche Presseforschung. 2024. “Brelochs, Anton.” Biobibliographisches Handbuch der Kalendermacher von 1550 bis 1750. [Biobibliographical handbook of calendar makers from 1550 to 1750.] Accessed July 14, 2024. https://www.presseforschung.uni-bremen.de/dokuwiki/doku.php?id=brelochs_anton.