History Graduate Student's Submission to 'Humanity's Last Exam' Among Top Contributions
Earth Anderson, a Ph.D. candidate in the Department of History, has been granted co-authorship on the Center for AI Safety's project, "Humanity's Last Exam." The project's goal was to thoroughly test the capabilities of artificial intelligence and large language models against the most complex array of questions possible, pushing the frontier of the field.
Co-authorship was extended to the contributors of the top questions. Anderson's name appears in the paper's list of top contributors and in the dataset on the project's website.
“I felt determined to have history and historiographical analysis represented in this project,” Anderson explained. “So, I chose a question that required both a nuanced understanding of medieval history and references as well as the ability to navigate precise language. Even with the additional benefit of providing the AI with five multiple choice options, all major learning models failed to properly engage with the question.”
Contributors were asked to keep their questions undisclosed to preserve the integrity of the dataset, but Anderson was able to elaborate on his concerns about the AI's capacity to understand the questions.
“With my questions, I tried to really dig at the source analysis and semantic capabilities of the AI. In the case of my historiographical question, the AI failed to recognize that the question inquired after the source of the epithet ‘Auguste’ in French royal tradition, not merely who gave Augustus this epithet originally. The source of the epithet is Caesar, as in CAESAR AUGUSTUS. Suetonius's biography of Caesar set the genre for other kingly literature for centuries to come.
“Thus, without revealing too much of the contents of the question, the AI was unable to distinguish between historical writers such as Rigord and Suetonius in regard to Philippe Auguste and royal chronicles. The distinction is very important, and the AI failed to recognize that. This was not merely some grammatical trick, but a legitimate semantic failure on the part of AI models, which remain unable to analyze a source in its context independently (that is, without regurgitating extant historiographical writing).”
For now, Anderson feels, historians can rest assured that AI will not be taking over their profession anytime soon.
More than 1,400 researchers and contributors from top universities and AI companies competed for a total prize pool of $500,000 by submitting questions designed to stump AI models.
Contacts
Earth Anderson, doctoral candidate
Department of History
501-293-4551
mea015@uark.edu