Digital China Worldwide

Exploring RAG for Resource Recommendation

Kwok-leong Tang

Harvard University

2025-03-13

The Origin of Digital China Worldwide

  • The 2023 Tools of the Trade conference at Harvard University
  • Featured over 130 innovative projects
  • Many intriguing projects remained unknown to the Chinese Studies community
  • Prof. Peter Bol initiated the building of a directory of these projects

Another Directory Project?

Luce/ACLS China Studies Digital Mapping Project

Turn Digital China Worldwide into a Chatbot

Retrieval-Augmented Generation

flowchart TB
    subgraph Direct["Direct LLM Query"]
        direction LR
        U1[User Query] --> L1[LLM] --> R1[Response]
        style L1 fill:#f9f,stroke:#333
    end

flowchart TB

    subgraph RAG["RAG-Enhanced Query"]
        direction LR
        U2[User Query] --> RC[Retrieval Component]
        RC --> |Search| DB[(Knowledge Base)]
        DB --> |Relevant Documents| RC
        RC --> CP[Context Processor]
        CP --> |Enhanced Prompt| L2[LLM]
        L2 --> R2[Response]
        style L2 fill:#f9f,stroke:#333
        style DB fill:#bef,stroke:#333
        style RC fill:#fbf,stroke:#333
        style CP fill:#fbf,stroke:#333
    end

    classDef default fill:#fff,stroke:#333,stroke-width:2px;
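
The RAG-enhanced flow in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three knowledge-base entries are made up, the retriever uses a toy word-count cosine similarity instead of real embeddings in a vector database, and the function names (`retrieve`, `build_prompt`) are hypothetical. It stops at the enhanced prompt and does not call an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base entries (not the real DCW records).
KNOWLEDGE_BASE = [
    {"name": "Project A", "description": "A database of historical maps of Chinese cities"},
    {"name": "Project B", "description": "A text corpus of classical Chinese poetry"},
    {"name": "Project C", "description": "A tool for biographical data of historical figures"},
]


def vectorize(text):
    """Turn text into a bag-of-words vector (a toy stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, kb, top_k=2):
    """Retrieval component: return the top_k entries most similar to the query."""
    qv = vectorize(query)
    scored = [(cosine(qv, vectorize(doc["description"])), doc) for doc in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, docs):
    """Context processor: combine the retrieved entries and the query into one prompt."""
    context = "\n".join(f"- {d['name']}: {d['description']}" for d in docs)
    return (
        "Answer using only the resources below.\n\n"
        f"Resources:\n{context}\n\n"
        f"Question: {query}"
    )


query = "historical maps of cities"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE))
```

The resulting `prompt` is what would be sent to the LLM in place of the bare user query, which is the only difference from the direct query flow.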

Knowledge Base

Tech Stack

  • GenAI Workflow: Flowise (Self-hosted)
  • GenAI Model: OpenAI GPT-4o-mini through Harvard endpoints
  • Resource Data Collaboration: NocoDB (Self-hosted)
  • Vector Database: Qdrant (Self-hosted)

Demo of the DCW Chatbot

https://dcw.fairbank.fas.harvard.edu

Problems

  • “Not smart enough”, “Doesn’t respond well”
  • Scraping and upsert mechanism
  • Data quality (multilingual problems)
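
One common way to make a scrape-and-upsert step repeatable is sketched below. It is an assumption about a possible approach, not a description of the DCW pipeline: a plain dictionary stands in for the vector database, the record ID is derived from the page URL so that a re-scraped page overwrites its old entry, and a content hash lets unchanged pages be skipped. The names `point_id`, `content_hash`, and `upsert` are hypothetical.

```python
import hashlib
import uuid


def point_id(url):
    """Deterministic ID from the URL: the same page always maps to the same record."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))


def content_hash(text):
    """Fingerprint of the scraped text, used to detect whether a page changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert(store, url, text):
    """Insert or update one scraped page.

    `store` is a plain dict standing in for the vector database (an assumption).
    Returns 'inserted', 'updated', or 'unchanged'.
    """
    pid = point_id(url)
    digest = content_hash(text)
    existing = store.get(pid)
    if existing is not None and existing["hash"] == digest:
        return "unchanged"  # nothing to re-embed or rewrite
    store[pid] = {"url": url, "text": text, "hash": digest}
    return "inserted" if existing is None else "updated"


store = {}
first = upsert(store, "https://example.org/project", "version 1")
second = upsert(store, "https://example.org/project", "version 1")
third = upsert(store, "https://example.org/project", "version 2")
```

In a real setup the "inserted" and "updated" branches would also recompute the embedding before writing, while "unchanged" pages would be skipped entirely.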

Next Steps

  • Expand the knowledge base
  • Please suggest more resources, especially projects from your institutions
  • Please give us feedback
  • Exploration of collaborations

Thank You Very Much!