Why GenAI for Humanities Studies:
Essential Concepts about LLMs
Kwok-leong Tang
Harvard University
A.I. in East Asian Studies Workshop · Ohio State University · April 2026
Kwok-leong Tang and the Digital China Initiative (Harvard University) have no financial interest in, nor receive compensation from, any of the tools, models, or software used in this workshop.
References to specific products (LM Studio, Ollama, Antigravity, Claude Code) are for research and educational purposes only and do not constitute endorsement by Harvard University.
About Me
Three degrees in history and Asian Studies
Research focus: late imperial China, Confucian temples
Did not apply many digital methods in PhD dissertation
I make my living by teaching and helping others use digital tools and methods in their research — I am biased!
These terms are often used interchangeably — but they are not the same.
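Artificial intelligence (AI): machines that perform tasks requiring human-like intelligence
Machine learning (ML): systems that learn from data rather than explicit rules
Large language models (LLMs): models trained on massive amounts of text to predict the next token
Generative AI (GenAI): AI that generates new content, such as text, images, code, and audio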
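To make "predict the next token" concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not how an LLM is actually built: real models use neural networks trained on vastly more text. The training sentence and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, token):
    """Return the most frequent follower of `token`, or None if it was never seen with one."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]

# Toy training text, invented for this example.
corpus = "the master said the way is far the master said learn".split()
model = train_bigram(corpus)

print(predict_next(model, "master"))  # prints: said
```

The model has no idea what a "master" is. It has only seen that "said" followed "master" more often than anything else, which is the same basic objective an LLM is trained on at far larger scale.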
A Brief History
Spielberg’s A.I. Artificial Intelligence (2001) — a robot child who wants to become “real.”
HAL 9000. Skynet. Ex Machina.
Popular culture imagined AI as sentient machines.
The reality of AI in 2026 is very different: not sentient, but remarkably useful.
2012
AlexNet wins the ImageNet competition by a wide margin — deep neural networks, powered by GPUs, suddenly outperform all traditional methods.
The lesson: scale + compute = breakthrough
Mid-2010s
Image recognition, speech recognition, recommendation systems, self-driving cars
For humanities scholars, the barrier remained: you needed labeled data and programming skills — both scarce in our fields.
2017
“Attention Is All You Need”
Google Brain, 2017
A new architecture that processes entire sequences in parallel — the foundation of every modern LLM: GPT, Claude, Gemini, DeepSeek, Qwen.
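For readers curious about what "processes entire sequences in parallel" means, the sketch below shows the core operation of that architecture, scaled dot-product attention, in plain Python. It is a simplified illustration under stated assumptions: one attention head, no learned weights, and tiny hand-written vectors. The function names are chosen here for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query takes a weighted average of the values."""
    d_k = len(keys[0])
    outputs = []
    # Each query's result depends only on the shared keys and values,
    # not on the other queries, so all positions can be computed at once on a GPU.
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Two toy token vectors attending to each other (self-attention).
x = [[1.0, 0.0], [0.0, 1.0]]
result = attention(x, x, x)
print(result)
```

Each token ends up as a blend of every token in the sequence, weighted by similarity, and each token weights itself most heavily in this toy input.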
November 2022
100M
users within about two months of ChatGPT's launch, at the time the fastest-growing consumer application on record
For the first time, anyone could interact with AI through natural language.
2022
ChatGPT — ask questions, get answers in a chat window
2023–24
GitHub Copilot, Cursor — AI assists you inside an editor
2025–26
Claude Code, OpenClaw, Antigravity — acts autonomously, uses tools
From conversation → collaboration → delegation
Adopting digital tools in East Asian Studies has faced persistent barriers
Challenge 1
巧婦難爲無米之炊
“Even the cleverest housewife cannot cook without rice”
OCR for woodblock prints and manuscripts was unreliable for decades. Without digitized text, computational analysis was impossible — no matter how good the method.
Challenge 2
Scholars of close reading have long viewed digital tools as irrelevant to their interpretive work.
When “digital humanities” meant counting words and drawing network graphs, it was hard to imagine how it applied to close reading.
But what if the tool could do more than count?
Challenge 3
Most DH platforms in Chinese Studies (CBDB, CHGIS, MARKUS, DocuSky) grew from the research interests of their creators and were only gradually generalized to cover other problems.
Each platform has its own interface, its own logic, and its own assumptions about what you're trying to do.
What if you could build your own?
Challenge 4
削足適履
“Cutting one’s foot to fit the shoe”
Every research project has unique workflows, unique data, unique questions. Scholars need custom tools — but developer hours have always been too expensive.
Challenge 5
Time spent labeling data & training models
Time spent learning tools & debugging code
vs.
Time spent reading sources & writing papers
Digital tools are not essential to humanities research. The rational choice was clear — most scholars chose to read more and write more instead.
A Critical Concept
AI has a jagged frontier — it is good at some things that seem very hard and bad at some things that seem really easy for humans.
— Ethan Mollick, Wharton
The frontier is invisible — you must always verify.
Dell’Acqua et al., “Navigating the Jagged Technological Frontier,” Harvard Business School, 2023
Breakthrough
New OCR models in 2025–2026 have dramatically improved recognition of historical CJK texts:
Rednote
Baidu
DeepSeek
Datalab
Vision models (e.g., Gemini 2.5 Pro) can also perform OCR when you upload an image.
The Vision
“Jarvis, scan the model”
Data retrieval
“Color code every causeway, footpath…”
Cleaning & categorization
“Lose the landscaping, the shrubbery…”
Filtering & analysis
“Structure the protons and neutrons”
Visualization & discovery
The full cycle of knowledge production — through natural language alone.
That’s not science fiction anymore.
Preview: Session 3
“There’s a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”
— Andrej Karpathy, February 2025
Collins Dictionary Word of the Year 2025
Describe what you need → AI generates the code → you verify with your expertise
Example: a student-built China Map Quiz App — created entirely through vibe coding
We will explore this in depth in Session 3 this afternoon.
A Different Perspective
“AI will take away jobs”
→
GenAI is breaking down the division of labor and the establishments built to serve the pre-2022 means of production
A single scholar can now do what used to require a developer, a designer, and a data analyst.
The question is not “who loses their job?” but “what can each person now accomplish?”
Energy & AI
Total data center electricity: ~415 TWh (2024) — comparable to the United Kingdom. But AI is only a slice:
Netflix, YouTube, TikTok, Spotify — comparable to Belgium
AWS, Azure, Google Cloud, email, file storage — comparable to the Netherlands
Facebook, Instagram, Gmail, Slack — comparable to Portugal
Training, inference, chatbots, AI search — comparable to New Zealand
AI today uses about as much electricity as New Zealand — not the UK.
Sources: IEA (2025); Patterson et al. (2022), Google & UC Berkeley
For Comparison
175 TWh
per year for cryptocurrency mining, comparable to Argentina
That's 3–4× as much as all AI
67%
from fossil fuels
Coal alone provides 45%
12+ yrs
largely unchecked and unquestioned, since 2013
95%+ of crypto transactions are speculation. No scientific advancement, no educational tools, no accessibility gains.
Sources: Cambridge CBNSI; IEA; United Nations University, 2023
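As a quick check on how these figures fit together, the sketch below works out what the slides' own numbers imply for AI's electricity use. It assumes the "3–4×" comparison means the 175 TWh figure is three to four times AI's consumption. The result is only what these rounded numbers imply, not an independent estimate.

```python
DATA_CENTERS_TWH = 415  # total data center electricity in 2024, from the slides
MINING_TWH = 175        # the 175 TWh per year figure, from the slides

def implied_ai_range(mining_twh, ratio_low, ratio_high):
    """Return the (low, high) AI consumption implied by a 'ratio times as much' claim."""
    return mining_twh / ratio_high, mining_twh / ratio_low

ai_low, ai_high = implied_ai_range(MINING_TWH, 3, 4)
share_low = ai_low / DATA_CENTERS_TWH
share_high = ai_high / DATA_CENTERS_TWH

print(f"Implied AI electricity: {ai_low:.0f}-{ai_high:.0f} TWh per year")
print(f"Implied AI share of data centers: {share_low:.0%}-{share_high:.0%}")
# prints:
# Implied AI electricity: 44-58 TWh per year
# Implied AI share of data centers: 11%-14%
```

On these numbers, AI accounts for roughly a tenth to a seventh of data center electricity, which is consistent with the earlier point that AI is only a slice of the total.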
A Way Forward
CoreWeave: mining → GPU cloud = 60× revenue growth
Core Scientific: 100 MW mining → AI = $8B contract
10 MW of AI GPUs = revenue of 100 MW of Bitcoin mining
Tools like LM Studio and Ollama let you run models on a laptop — less energy than your desk lamp, with full data privacy.
The real question is not how much energy AI uses, but what humanity gets back.
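For those who want to try a local model, here is a minimal sketch of sending a prompt to Ollama from Python. It assumes Ollama is installed and running on its default local address, and that a model has already been downloaded. The model name `llama3.2` is only a placeholder for whichever model you have pulled, and the helper names are chosen here for illustration.

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="llama3.2"):
    """Build the HTTP request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )

def ask_local_model(prompt, model="llama3.2"):
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]

# Example use (requires a running Ollama server):
# print(ask_local_model("Translate into English: 巧婦難爲無米之炊"))
```

Because the request goes to `localhost`, unpublished sources and research notes stay on your own laptop.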
Ask about anything from Session 1 — concepts, concerns, or what’s coming this afternoon.