May 25, 2022

As if a scholar in ancient Greek isn’t necessarily diligent enough, the primary texts they rely on are often thousands of years old, often causing irreparable damage. Historians may have a powerful new tool in Ithaca, a machine learning model built by DeepMind that predicts the location and date of missing words and text with amazing accuracy. This is an unusual application of AI, but it shows how useful it can be outside the world of technology.

The problem of incomplete ancient texts extends to many disciplines in which specialists work with substandard material. The original document may be made of stone, clay, or papyrus, written in Akkadian, Ancient Greek, or Linear A, and may describe everything from grocery bills to the hero’s journey. What unites them all, however, is the damage done over thousands of years.

Spaces where the text is frayed or torn are often referred to as gaps and can be as short as a missing letter, a chapter, or even an entire story. It may be trivial or impossible, but you need to start somewhere – and Ithaca will help you with this.

Ithaca (named after Odysseus’ home island) is trained in a vast library of ancient Greek texts and can not only tell what a missing word or phrase might be, but also try to find how old it is and where it was written. ? This will not fill the whole epic circle by itself – it should be a resource for those who work with these texts, not a solution.

An article published in the journal Nature demonstrates its effectiveness in some of the decrees of Pericles in Athens. Ithaca, thought to have been written around 445 BC. e., based on their textual analysis, suggested that they were actually written in 420 BC. e., which is consistent with later evidence. It may not seem like much, but imagine that the Bill of Rights was written 20 years later!

An illustration of Ithaca's derivation: word, place, and date.

image credit: deep mind

As for the text itself, the experts in the study guessed it by about 25 percent the first time, which is not bad at all, although, of course, searching for the text should not mean lunch, but a long-term project. However, when combined with Ithaca, they quickly reached an accuracy of 72 percent. This is often the case in other situations where people are more accurate at the end but can speed up their process by quickly clearing dead ends or suggesting a starting point. It can be easy to overlook an anomaly in medical data that AI can quickly detect, but in the end it is the human experience that sees the details and finds the right answer.

You can try a different version of Ithaca here if you have ancient Greek text with spaces, or use one of the samples they provided to see how it fills in the requested spaces. For longer parts or more than 10 missing letters try this in this Colab notepad. The code is available on this GitHub page.

While Ancient Greek is a clear and fertile area for Ithaca, the team is already hard at work on other languages. Akkadian, Demotic, Hebrew and Maya are all on the list and hopefully more will be added over time.

“Ithaca illustrates the potential contribution of natural language processing and machine learning to the humanities,” said Professor Ion Androtsopoulos of the University of Athens, who worked on the project. “We need more projects like Ithaca to demonstrate this potential, as well as relevant curricula and learning materials to train future researchers who have a better understanding of both the humanities and AI techniques.”

Leave a Reply

Your email address will not be published.