An initiative of Harvard University to promote digital tools and methods for Chinese Studies.
PIs: Prof. Peter Bol and Prof. Michael Szonyi
Provides support to Harvard faculty, students, and researchers in Chinese Studies.
The GenAI Turn
In 2022, the release of ChatGPT (and other large language models, LLMs) brought a generative AI turn to the use of digital tools and methods in Chinese Studies.
In April 2023, CBDB and DCI held the first GenAI workshop for Chinese Studies and the humanities at Harvard University.
We decided that our future projects should lean toward the GenAI approach.
DCI has hosted a series of GenAI workshops for Chinese Studies since 2023.
Projects Involving GenAI
Nehru Papers
This is an NEH project led by Tansen Sen (NYU), Gal Gvili (McGill), and Arunabh Ghosh (Harvard).
It is supported by the Digital China Initiative.
The project aims to digitize documents collected from Indian archives that relate to Sino-Indian relations during the 1960s and 1970s.
At this stage, we are focusing on creating a catalog for the documents (~50K+?).
Most of the documents are in English; a few are in Indian languages and Chinese.
We decided to build the project as a RAG-based chatbot. Besides our original directory, we also include data from the China Studies Digital Mapping Project.
n8n for the API and interface (integrates with Slack and other tools)
NocoDB for data storage and collaboration
Scraping mechanism
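The retrieval step behind a RAG-based chatbot can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual n8n/NocoDB pipeline: the sample records, the field names (`id`, `text`), and the helper names (`retrieve`, `build_prompt`) are all hypothetical, and simple word-count cosine similarity stands in for whatever embedding model a real deployment would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, records, k=2):
    """Return up to k records that share terms with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r["text"]))), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for score, r in scored[:k] if score > 0]


def build_prompt(query, hits):
    """Combine the retrieved records and the question into one LLM prompt."""
    context = "\n".join(f"[{r['id']}] {r['text']}" for r in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical sample records, invented for illustration only.
RECORDS = [
    {"id": "rec-1", "text": "Center for Chinese history research and teaching"},
    {"id": "rec-2", "text": "Library holding Chinese rare books and maps"},
    {"id": "rec-3", "text": "Institute for marine biology"},
]

if __name__ == "__main__":
    question = "Where can I find Chinese maps?"
    print(build_prompt(question, retrieve(question, RECORDS)))
```

The prompt returned by `build_prompt` is what would be sent to the LLM; records with no overlap with the question are left out, so the model only sees context that was actually retrieved.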
Monitoring GenAI Applications
LiteLLM: https://litellm.ai/
Monitoring the usage of LLM APIs.
Creating API keys for different projects, people, and apps.
Langfuse: https://langfuse.com/
Monitoring GenAI applications.
Recording token usage, model cost, and other metrics.
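The kind of bookkeeping these tools perform can be illustrated with a small ledger that records tokens and cost per API key. This is a sketch of the idea only and does not use the LiteLLM or Langfuse APIs: the class names, the key name `nehru-papers`, the model name `example-model`, and the prices are hypothetical placeholders, not real rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    """Running totals for one API key."""
    calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0


class UsageLedger:
    """Accumulates token usage and cost per API key name."""

    def __init__(self, prices_per_million):
        # Maps model name -> (input price, output price) in USD per
        # one million tokens. The values passed in are assumptions.
        self.prices = prices_per_million
        self.by_key = defaultdict(Usage)

    def record(self, key_name, model, input_tokens, output_tokens):
        """Log one LLM call and return its cost in USD."""
        price_in, price_out = self.prices[model]
        cost = (input_tokens * price_in + output_tokens * price_out) / 1_000_000
        usage = self.by_key[key_name]
        usage.calls += 1
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens
        usage.cost_usd += cost
        return cost


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    ledger = UsageLedger({"example-model": (1.0, 2.0)})
    ledger.record("nehru-papers", "example-model", 1000, 500)
    ledger.record("nehru-papers", "example-model", 3000, 1000)
    for name, usage in ledger.by_key.items():
        print(name, usage)
```

Keeping one key per project, person, or app is what makes the totals attributable: each call is charged to the key that made it.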
Challenges
The cost of building a GenAI app differs from that of “traditional” databases and platforms. Every action (e.g., query, update, delete) costs tokens/money.
Approaches to building and maintenance. For example, fine-tuning models vs. RAG or agentic approaches?
Rapid development in all aspects of GenAI. How to build in a flexible way?