Harvard University
Kwok-leong Tang
Digital China Initiative, Harvard University
Digitizing Việt Nam: Vietnamese Studies in the Age of Digital Humanities and Artificial Intelligence
Columbia University · April 2026
Our Mission
A conference series designed to introduce humanities scholars to emerging digital tools and methodologies for research in China Studies.
In April 2023, the Digital China Initiative (DCI) and the China Biographical Database (CBDB) co-organized Harvard's first generative AI workshop for the humanities.
Research
Augmenting Retrieval for Eurasian Languages
A multi-institutional team led by Peter K. Bol received a $600,000 Schmidt Sciences grant to develop multilingual AI tools that help scholars detect patterns across Eurasian historical documents written in eight underserved languages, tracing how textual traditions spread, change, and compete with one another.
Over the past eighteen months, optical character recognition (OCR) and automatic speech recognition (ASR) have improved dramatically, driven by the rapid development of multimodal models, which can process text alongside images or audio.
Many of these models can now run on local machines, making high-quality digitization accessible without requiring cloud infrastructure or costly API subscriptions.
Humanities researchers often face the prospect of investing weeks or months to learn a technical skill (GIS, web scraping, data visualization) that they may use only once. Vibe coding, in which a researcher describes the desired program in plain language and an AI model writes the code, removes much of this barrier.
Instead of mastering tools, scholars can spend their time asking better questions. The time saved on technical training can be redirected toward interpretation, analysis, and discovery.
Design intuitive interfaces, clear navigation, and rich visual presentations so that scholars and the public can engage meaningfully with digitized materials.
Structure data with clean metadata, open APIs, Model Context Protocol (MCP) servers, agent skills, and machine-readable formats so that AI systems can retrieve, process, and reason over our collections, extending their reach and longevity.
As creators of digitization projects, we have a responsibility to design for both audiences.