Enterprise RAG Architect

What Is This Program?

Students move beyond simple "chat with a PDF" scripts to build a production-ready Knowledge Engine. The program focuses on handling high-volume, unstructured document sets (10,000+ pages) while maintaining high accuracy.

Who Is It For?

Data engineers and AI practitioners aged 16+ building scalable enterprise knowledge solutions.

What Will I Learn?

Vector Databases (Pinecone, Milvus), Data Ingestion (Unstructured.io), Orchestration (LlamaIndex), and Evaluation (Ragas/TruLens).

Pre-Requisites

JSON & Dicts, API Fundamentals, Basic Database Logic.

Dates

Session 1: July 6 - July 17, 2026
Session 2: August 3 - August 14, 2026 (EN)

Language

This course is offered in both English and Chinese. Check the session dates for your preferred language.

Tuition & Fees

Tuition: NT$XX,000
Deposit: NT$2,000
Early Bird Deal: Save 15% (Book by March 1st)

Core Tech Stack

Vector Databases

Pinecone, Milvus, or Qdrant for scalable, high-speed similarity search.
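
To give a feel for what "similarity search" means, here is a minimal sketch in plain Python. It is a brute-force illustration only and does not use the Pinecone, Milvus, or Qdrant APIs; the function names, document ids, and the toy 3-dimensional vectors are all assumptions made for the example. Production vector databases typically rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real embeddings have hundreds of dimensions.
toy_index = {
    "hr_policy": [0.9, 0.1, 0.0],
    "api_guide": [0.1, 0.9, 0.2],
    "sales_deck": [0.0, 0.2, 0.9],
}

results = top_k([0.8, 0.2, 0.1], toy_index, k=2)
print(results)
```

The query vector points mostly along the first dimension, so the hypothetical "hr_policy" entry ranks first.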

Data Ingestion

Unstructured.io or Docling for parsing complex PDFs, tables, and nested headers.

Orchestration

LlamaIndex (specialized for data-heavy RAG) or LangChain.

Evaluation

Ragas or TruLens to mathematically score retrieval accuracy and faithfulness.
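
As a rough illustration of what scoring retrieval accuracy can look like, the sketch below computes two common retrieval metrics, hit rate at k and mean reciprocal rank, in plain Python. This is not the Ragas or TruLens API, and it does not cover faithfulness, which those tools assess by other means. The function names and the sample document ids are hypothetical.

```python
def hit_rate_at_k(retrieved, relevant, k):
    """Fraction of queries with at least one relevant doc in the top k results."""
    hits = sum(
        1
        for docs, gold in zip(retrieved, relevant)
        if any(doc in gold for doc in docs[:k])
    )
    return hits / len(retrieved)

def mean_reciprocal_rank(retrieved, relevant):
    """Average of 1/rank of the first relevant doc (0 when none is retrieved)."""
    total = 0.0
    for docs, gold in zip(retrieved, relevant):
        for rank, doc in enumerate(docs, start=1):
            if doc in gold:
                total += 1.0 / rank
                break
    return total / len(retrieved)

# Hypothetical results for three queries, with the gold answers for each.
retrieved = [["d1", "d7", "d3"], ["d4", "d2", "d9"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d9"}, {"d0"}]

print(hit_rate_at_k(retrieved, relevant, k=3))
print(mean_reciprocal_rank(retrieved, relevant))
```

In this toy data the first query finds its answer at rank 1, the second at rank 3, and the third misses entirely.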

Curriculum

Week 1: Data Engineering & Indexing

Day	Topic	Hands-On Activity
01	Data Ingestion	The Parser Lab: Using Unstructured.io to extract clean text.
02	Semantic Chunking	Context-Aware Splitting: Recursive Character Splitting.
03	Metadata Enrichment	The Tagging Engine: Tagging chunks with source data.
04	Table & Image Parsing	Vision-RAG: Converting diagrams/tables to text summaries.
05	The Vector Infrastructure	Scaling the Index: Bulk upserts of 10k+ documents.
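
The recursive character splitting named on Day 02 can be sketched in a few lines of plain Python. This is a simplified stand-in written for illustration, not the implementation of any particular library: the function name, the separator list, and the sample text are assumptions, and chunk overlap is not handled.

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most max_len characters.

    Tries the coarsest separator first and recurses with finer ones
    only for pieces that are still too long.
    """
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # The piece alone is too long, so split it on a finer separator.
            chunks.extend(recursive_split(piece, max_len, finer))
    if current:
        chunks.append(current)
    return chunks

# Hypothetical sample document with a heading and two paragraphs.
sample = (
    "Refund policy\n\n"
    "Customers may request a refund within 30 days. "
    "Refunds are issued to the original payment method.\n\n"
    "Contact support for exceptions."
)

chunks = recursive_split(sample, max_len=60)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

Paragraph breaks are preferred as split points, so the heading and the short last paragraph stay intact, while the long middle paragraph is broken on spaces.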

Week 2: Advanced Retrieval & Production

Day	Topic	Hands-On Activity
06	Hybrid Search	Keyword + Semantic: Combining Vector Search with BM25.
07	Hierarchical Indexing	Parent-Child Retrieval: Searching small chunks, feeding larger context.
08	Query Transformation	HyDE & Multi-Query: Expanding user questions.
09	Production Optimization	Reranking & Caching: Cross-Encoders and Redis/GPTCache.
10	Evaluation & Grounding	The Accuracy Audit: Using Ragas or TruLens.
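
For the Day 06 hybrid search topic, one common way to combine a keyword (BM25) ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how such a combination might look, not necessarily the method taught in the course; the function name and document ids are hypothetical, and k=60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into a single ranked list.

    Each doc scores the sum of 1 / (k + rank) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical top-3 results from a keyword retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF uses only rank positions, which avoids having to normalize BM25 scores and cosine similarities onto a shared scale.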