Customer Support AI Agent with RAG & LLMOps
Production-style RAG agent with hybrid retrieval, tracing, and automated evaluation.
A support question arrives over a FastAPI endpoint, containerised with Docker for consistent deployment.
Support teams waste hours answering repetitive questions from scattered documentation, and naive LLM answers hallucinate.
A grounded, observable agent that answers from a knowledge base with citations and measurable accuracy.
Built a RAG agent on the Cohere API and ChromaDB with semantic chunking and hybrid retrieval. Added response tracing and observability to surface hallucination patterns, then applied metadata filtering and prompt versioning to improve grounding. Deployed via FastAPI in a containerised (Docker) architecture with automated evaluation and cost monitoring.
Diagnosing where hallucinations entered the chain, then tuning chunking and metadata filters to fix grounding without hurting latency.
80%+ answer accuracy on a 200-query evaluation set, with traceable, cost-monitored responses.
Multi-agent routing for specialised domains and continuous evaluation in CI.