Students move beyond simple "chat with a PDF" scripts to build a production-ready Knowledge Engine. The focus is on handling high-volume, unstructured document sets (10,000+ pages) while maintaining high retrieval accuracy.
Data engineers and AI practitioners aged 16+ building scalable enterprise knowledge solutions.
Vector Databases (Pinecone, Milvus), Data Ingestion (Unstructured.io), Orchestration (LlamaIndex), and Evaluation (Ragas/TruLens).
JSON & Dicts, API Fundamentals, Basic Database Logic.
Session 1: July 6 - July 17, 2026
Session 2: August 3 - August 14, 2026 (EN)
This course is offered in English and Chinese. Check the session dates for your preferred language.
Tuition: NT$XX,000
Deposit: NT$2,000
Early Bird Deal: Save 15% (Book by March 1st)
Pinecone, Milvus, or Qdrant for scalable, high-speed similarity search.
Unstructured.io or Docling for parsing complex PDFs, tables, and nested headers.
LlamaIndex (specialized for data-heavy RAG) or LangChain.
Ragas or TruLens to mathematically score retrieval accuracy and faithfulness.
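
For orientation, the similarity search that a vector database performs at scale can be sketched as a brute-force scan in plain Python. This is only an illustration of the idea and does not use the Pinecone, Milvus, or Qdrant APIs; the function names, document IDs, and toy three-dimensional vectors are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Score every stored vector against the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: real embeddings have hundreds of dimensions.
index = {
    "invoice_policy": [0.9, 0.1, 0.0],
    "travel_policy": [0.1, 0.9, 0.0],
    "security_policy": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.2, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A scan like this costs one comparison per stored vector for every query, which is the bottleneck that dedicated vector databases are designed to avoid on large document sets.
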
| Day | Topic | Hands-On Activity |
|---|---|---|
| 01 | Data Ingestion | The Parser Lab: Using Unstructured.io to extract clean text. |
| 02 | Semantic Chunking | Context-Aware Splitting: Recursive Character Splitting. |
| 03 | Metadata Enrichment | The Tagging Engine: Tagging chunks with source data. |
| 04 | Table & Image Parsing | Vision-RAG: Converting diagrams/tables to text summaries. |
| 05 | The Vector Infrastructure | Scaling the Index: Bulk upserts of 10k+ documents. |
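
As a rough preview of the Day 02 activity, recursive character splitting can be sketched in plain Python: try the coarsest separator first and fall back to finer ones only for pieces that are still too long. This is a simplified illustration, not the implementation of any particular library; the function name `recursive_split`, the separator order, and the sample text are assumptions made for the example.

```python
def recursive_split(text, max_chars=60, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars, preferring coarse separators."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, max_chars, finer)

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


sample = (
    "Refunds are issued within 30 days.\n\n"
    "Travel must be booked through the portal. "
    "Receipts are required for all expenses over NT$500."
)
chunks = recursive_split(sample, max_chars=60)
for chunk in chunks:
    print(repr(chunk))
```

Note that this sketch drops the separator wherever a chunk boundary falls, so a sentence that ends a chunk loses its trailing period. Production splitters usually also support overlap between neighbouring chunks, which is omitted here.
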

| Day | Topic | Hands-On Activity |
|---|---|---|
| 06 | Hybrid Search | Keyword + Semantic: Combining Vector Search with BM25. |
| 07 | Hierarchical Indexing | Parent-Child Retrieval: Searching small chunks, feeding larger context. |
| 08 | Query Transformation | HyDE & Multi-Query: Expanding user questions. |
| 09 | Production Optimization | Reranking & Caching: Cross-Encoders and Redis/GPTCache. |
| 10 | Evaluation & Grounding | The Accuracy Audit: Using Ragas or TruLens. |
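
To give a flavour of the Day 06 topic, one common way to combine a BM25 keyword ranking with a vector-search ranking is Reciprocal Rank Fusion (RRF). The syllabus does not say which fusion method the course uses, so treat this as an assumption; the document IDs and both input rankings below are invented for the example, and `k=60` is simply a conventional default.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from two retrievers for the same query.
bm25_ranking = ["doc_b", "doc_a", "doc_d"]
vector_ranking = ["doc_a", "doc_c", "doc_b"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

RRF works on ranks only, so it sidesteps the problem that BM25 scores and cosine similarities are on different scales and cannot be added directly.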