Generative AI for East Asian Studies


Why GenAI for Humanities Studies:
Essential Concepts about LLMs

Kwok-leong Tang

Harvard University

A.I. in East Asian Studies Workshop · Ohio State University · April 2026

Today’s Agenda

9:30–10:30 Session 1 — Why GenAI for Humanities Studies
10:45–12:00 Session 2 — Hands-on Practice with LM Studio
1:00–2:30 Session 3 — Introduction to Vibe Coding
3:00–4:30 Session 4 — Agentic Approaches to Humanities Research

Disclaimer

Kwok-leong Tang and the Digital China Initiative (Harvard University) have no financial interest in, nor receive compensation from, any of the tools, models, or software used in this workshop.

References to specific products (LM Studio, Ollama, Antigravity, Claude Code) are for research and educational purposes only and do not constitute endorsement by Harvard University.

A Historian, Not a Computer Scientist

Three degrees in history and Asian Studies

Research focus: late imperial China, Confucian temples

Did not apply many digital methods in PhD dissertation

How I Got Here

2019 Digital China Fellow, Fairbank Center for Chinese Studies
2021 Lecturer, East Asian Languages & Civilizations
2023 Managing Director, Digital China Initiative

I make my living by teaching and helping others use digital tools and methods in their research — I am biased!

Why Did DCI Shift to LLMs?

2023 Tools of Trade Conference in Boston — saw the potential of LLMs for humanities research
Apr ’23 First GenAI workshop at Harvard
Since then Workshops at different institutions, supporting students and faculty in adopting GenAI in their research

LLMs, Generative AI, and AI

These terms are often used interchangeably — but they are not the same.

Artificial Intelligence

Machines that perform tasks requiring human-like intelligence

Machine Learning

Systems that learn from data rather than explicit rules

Large Language Models

Trained on massive text to predict the next token

Generative AI

AI that generates new content: text, images, code, audio
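
To make "predict the next token" concrete, here is a toy sketch in Python. It is an illustration under loud assumptions, not how production LLMs work: real models use neural networks over subword tokens, while this one simply counts which word follows which in a tiny invented corpus. The names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]


# Tiny invented corpus, for illustration only.
corpus = "the scholar reads the text and the scholar writes the paper"
model = train_bigrams(corpus)
print(predict_next(model, "the"))      # most frequent word after "the"
print(predict_next(model, "emperor"))  # word never seen in training
```

The same idea, scaled up to billions of parameters and trillions of tokens, is what lets an LLM continue a prompt.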

For Decades, AI Meant This…

Spielberg’s A.I. Artificial Intelligence (2001) — a robot child who wants to become “real.”

HAL 9000. Skynet. Ex Machina.
Popular culture imagined AI as sentient machines.

The reality of AI in 2026 is very different:
not sentient, but remarkably useful.

ImageNet and the GPU Revolution

In 2012, AlexNet wins the ImageNet competition by a wide margin: deep neural networks, powered by GPUs, suddenly outperform all traditional methods at image classification.

The lesson: scale + compute = breakthrough

Machine Learning Takes Off

Image recognition, speech recognition, recommendation systems, self-driving cars

For humanities scholars, the barrier remained: you needed labeled data and programming skills — both scarce in our fields.

The Transformer

“Attention Is All You Need”

Google Brain, 2017

A new architecture that processes entire sequences in parallel — the foundation of every modern LLM: GPT, Claude, Gemini, DeepSeek, Qwen.
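
"Processes entire sequences in parallel" refers to attention: every position looks at every other position in a single step. Below is a minimal sketch of the paper's scaled dot-product attention, written in plain Python with tiny invented vectors. Real models add learned query, key, and value projections, multiple heads, and GPU matrix libraries, all omitted here as simplifying assumptions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    # Each query is handled independently, which is why hardware can
    # compute all positions at once instead of one after another.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Three token vectors invented for illustration (self-attention: Q = K = V).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

Each output row is a weighted blend of all three input vectors, with the weights decided by how similar the tokens are to one another.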

ChatGPT

100M

users in two months, widely reported at the time as the fastest-growing consumer application on record

For the first time, anyone could interact with AI through natural language.

From Chatbot to AI Agent

2022

Chatbot

ChatGPT — ask questions, get answers in a chat window

2023–24

Copilot

GitHub Copilot, Cursor — AI assists you inside an editor

2025–26

AI Agent

Claude Code, OpenClaw, Antigravity — acts autonomously, uses tools

From conversation → collaboration → delegation

Why AI Matters for Humanities Studies

Adopting digital tools in East Asian Studies has faced persistent barriers

No Rice, No Meal

巧婦難爲無米之炊

“Even the cleverest housewife cannot cook without rice”

OCR for woodblock prints and manuscripts was unreliable for decades. Without digitized text, computational analysis was impossible — no matter how good the method.

Not Everyone Does Quantitative Research

Scholars who practice close reading have long viewed digital tools as irrelevant to their interpretive work.

When “digital humanities” meant counting words and drawing network graphs, it was hard to imagine how it applied to close reading.

But what if the tool could do more than count?

Platforms Built for Someone Else’s Question

Most DH platforms in Chinese Studies, such as CBDB, CHGIS, MARKUS, and DocuSky, grew out of their creators’ own research interests and were only gradually generalized to cover other problems.

Each platform has its own interface, its own logic, and its own assumptions about what you’re trying to do.

What if you could build your own?

Cutting the Foot to Fit the Shoe

削足適履

“Cutting one’s foot to fit the shoe”

Every research project has unique workflows, unique data, unique questions. Scholars need custom tools — but developer hours have always been too expensive.

The Opportunity Cost

Time spent labeling data & training models

Time spent learning tools & debugging code

vs.

Time spent reading sources & writing papers

Digital tools have never been essential to humanities research, so the rational choice was clear: most scholars chose to read more and write more instead.

The Jagged Frontier of AI

AI has a jagged frontier — it is good at some things that seem very hard and bad at some things that seem really easy for humans.

— Ethan Mollick, Wharton

Surprisingly Good At

  • Writing code from natural language
  • Translating classical Chinese
  • Synthesizing 100-page documents
  • Complex reasoning & analysis
  • Drafting grant proposals
vs.

Surprisingly Bad At

  • Counting words or letters
  • Simple spatial reasoning
  • Basic arithmetic with large numbers
  • Following rigid formatting rules
  • Knowing what it doesn’t know

The frontier is invisible — you must always verify.

Dell’Acqua et al., “Navigating the Jagged Technological Frontier,” Harvard Business School, 2023
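
Because counting sits on the "surprisingly bad" side of the frontier, a count reported by a chatbot is easy to check deterministically. Here is a small sketch in Python, assuming the passage is available as plain text. The sample sentence and the function name `count_chars` are illustrative only.

```python
from collections import Counter


def count_chars(text, ignore="，。、；：「」 \n"):
    """Count each character exactly, skipping punctuation and whitespace."""
    return Counter(ch for ch in text if ch not in ignore)


# Opening of the Analects, used here only as a short sample passage.
passage = "學而時習之，不亦說乎。有朋自遠方來，不亦樂乎。"
counts = count_chars(passage)
print(counts["不"], counts["乎"], sum(counts.values()))
```

If the model's answer disagrees with this output, trust the script.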

Improvement in OCR

New OCR models in 2025–2026 have dramatically improved recognition of historical CJK texts:

  • Dots OCR (Rednote)
  • PaddleOCR-VL (Baidu)
  • DeepSeek OCR (DeepSeek)
  • Chandra (Datalab)

Vision models (e.g., Gemini 2.5 Pro) can also perform OCR when you upload an image.

Think of Tony Stark and Jarvis

Tony Stark using natural language to command Jarvis to transform holographic data

“Jarvis, scan the model”

Data retrieval

“Color code every causeway, footpath…”

Cleaning & categorization

“Lose the landscaping, the shrubbery…”

Filtering & analysis

“Structure the protons and neutrons”

Visualization & discovery


The full cycle of knowledge production — through natural language alone.
That’s not science fiction anymore.

Vibe Coding

“There’s a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”

— Andrej Karpathy, February 2025

Collins Dictionary Word of the Year 2025

Describe what you need → AI generates the code → you verify with your expertise

Example: a student-built China Map Quiz App — created entirely through vibe coding

We will explore this in depth in Session 3 this afternoon.

Not Replacing Humans —
Restructuring Work

The Common Fear

“AI will take away jobs”

What’s Actually Happening

GenAI is breaking down the division of labor and the establishments built to serve the pre-2022 means of production.

A single scholar can now do what used to require a developer, a designer, and a data analyst.

The question is not “who loses their job?” but “what can each person now accomplish?”

What Data Centers Actually Power

Total data center electricity: ~415 TWh (2024) — comparable to the United Kingdom. But AI is only a slice:

Streaming & Media — ~125 TWh

Netflix, YouTube, TikTok, Spotify — comparable to Belgium

Cloud & Enterprise — ~145 TWh

AWS, Azure, Google Cloud, email, file storage — comparable to the Netherlands

Social, Email, etc. — ~85 TWh

Facebook, Instagram, Gmail, Slack — comparable to Portugal

AI / ML — ~40–60 TWh

Training, inference, chatbots, AI search — comparable to New Zealand

AI today uses about as much electricity as New Zealand — not the UK.

Sources: IEA (2025); Patterson et al. (2022), Google & UC Berkeley
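
The breakdown can be checked with a few lines of arithmetic. This sketch in Python uses only the figures on the slide. The four categories sum to roughly the ~415 TWh total, and AI comes out at about a tenth to a seventh of it.

```python
# Figures (TWh, 2024) copied from the slide; AI / ML is given as a range.
categories = {
    "Streaming & Media": (125, 125),
    "Cloud & Enterprise": (145, 145),
    "Social, Email, etc.": (85, 85),
    "AI / ML": (40, 60),
}

total_low = sum(low for low, _ in categories.values())
total_high = sum(high for _, high in categories.values())
print(f"Sum of categories: {total_low}-{total_high} TWh")

ai_low, ai_high = categories["AI / ML"]
# The low AI estimate is paired with the low total, and the high with the high.
print(f"AI share: {ai_low / total_low:.0%} to {ai_high / total_high:.0%}")
```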

Cryptocurrency Mining: The Elephant in the Room

175 TWh

per year — comparable to Argentina

That’s roughly 3–4× the electricity used by all AI

67%

from fossil fuels

Coal alone provides 45%

12+ yrs

largely unchecked

Since 2013. Largely unquestioned.

95%+ of crypto transactions are speculation. No scientific advancement, no educational tools, no accessibility gains.

Sources: Cambridge CBNSI; IEA; United Nations University, 2023
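
The "3–4×" comparison follows from dividing the crypto figure by the AI range given earlier (~40–60 TWh). This is a quick check in Python using only the numbers from the slides.

```python
crypto_twh = 175          # crypto mining per year, from the slide
ai_twh_range = (40, 60)   # AI / ML estimate from the data center slide

ratio_high = crypto_twh / ai_twh_range[0]  # against the low AI estimate
ratio_low = crypto_twh / ai_twh_range[1]   # against the high AI estimate
print(f"Crypto uses {ratio_low:.1f}x to {ratio_high:.1f}x the electricity of AI")
```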

Redirect Energy & Think Local

Repurpose Crypto Infrastructure for AI

CoreWeave: mining → GPU cloud = 60× revenue growth
Core Scientific: 100 MW mining → AI = $8B contract
10 MW of AI GPUs = revenue of 100 MW of Bitcoin mining

Run Local Models on Your Own Machine

Tools like LM Studio and Ollama let you run models on a laptop — less energy than your desk lamp, with full data privacy.
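
As a rough idea of what "running a model locally" looks like in practice, here is a minimal sketch in Python that sends a prompt to Ollama's local HTTP endpoint. It assumes Ollama is installed and running on its default port and that a model has already been downloaded. The model name `llama3.2` and the helper names are illustrative assumptions. LM Studio offers a comparable local server with its own address and request format.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2"  # assumption: replace with whichever model you have pulled


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Package the prompt as a JSON POST request for the local server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt):
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    try:
        print(ask_local_model("Translate into English: 學而時習之，不亦說乎"))
    except OSError as err:
        # Raised when no local server is listening on the port.
        print(f"Could not reach the local server: {err}")
```

Session 2 covers the graphical route through LM Studio. A script like this is only needed once you want to process many texts in a batch.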

The real question is not how much energy — but what does humanity get back?

Questions?


Ask about anything from Session 1 — concepts, concerns, or what’s coming this afternoon.