Hehe Information Uses AI to Preserve Ancient Yi Script, Releases Industry's First Encoding Database
According to Yicai News, Hehe Information, in collaboration with Shanghai University and South China University of Technology, recently released the industry's first basic encoding database for ancient Yi script. The database uses artificial intelligence to digitally encode the ancient Yi characters used in the Yunnan and Guizhou regions and compiles them into a dictionary-like reference, making it easier for researchers and enthusiasts to look up the pronunciation and meaning of each character.
Ancient Yi script is the indigenous script used among the Yi people. It comprises 87,046 characters, far exceeding the number of Chinese characters. The Annals of Southwest Yi is the longest and most extensive ancient Yi text discovered to date. However, ancient Yi script has an abundance of variant forms, and a single character can have dozens of different written forms, which poses significant challenges for digitizing these texts.
To address these challenges, the project team applied AI technologies such as intelligent image processing and text recognition, trained on more than 76,000 samples, to establish a unified digital encoding system for ancient Yi script. With the database available, users can enter an encoding string to retrieve a character's pronunciation, definitions, and other information, which significantly lowers the barrier to reading ancient texts.
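The report does not describe the database's actual schema or query interface, so the following is only a minimal sketch of the dictionary-like lookup it mentions. It assumes that each character has one unified encoding string and that several written variants map to it. The class names, the `YI-0001` code format, the variant identifiers, and the placeholder pronunciation and definition are all hypothetical and do not come from the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    code: str            # unified encoding string (format is hypothetical)
    pronunciation: str   # placeholder text, not real Yi data
    definition: str      # placeholder text, not real Yi data


class VariantDictionary:
    """Toy lookup table: many written variants resolve to one encoded entry."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}
        self._variant_to_code: dict[str, str] = {}

    def add(self, entry: Entry, variant_ids: list[str]) -> None:
        # Store the entry under its code and register every variant form.
        self._entries[entry.code] = entry
        for variant_id in variant_ids:
            self._variant_to_code[variant_id] = entry.code

    def lookup(self, code: str) -> Optional[Entry]:
        # Retrieve an entry directly by its encoding string.
        return self._entries.get(code.strip())

    def lookup_variant(self, variant_id: str) -> Optional[Entry]:
        # Resolve a variant form to its unified code, then to the entry.
        code = self._variant_to_code.get(variant_id)
        return self._entries.get(code) if code is not None else None


# Hypothetical data for illustration only.
db = VariantDictionary()
db.add(Entry("YI-0001", "<pronunciation>", "<definition>"), ["img-017", "img-342"])

entry = db.lookup("YI-0001")
if entry is not None:
    print(entry.code, entry.pronunciation, entry.definition)
```

Keeping the variant-to-code mapping separate from the entries means that dozens of written forms of one character can share a single record, which is the practical benefit of a unified encoding.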
Hehe Information stated that the release of this database is a foundational effort that will help more people understand and study ancient Yi script and offers a new approach to preserving linguistic heritage. Digitization has become a crucial method of cultural preservation, and this project demonstrates that AI can play a vital role in the digital transformation of traditional culture.