Customer Support AI Agent with RAG & LLMOps

Production-style RAG agent with hybrid retrieval, tracing, and automated evaluation.

How it works

Result

80%+ accuracy

Data flows left to right · click a stage

01 User Query

FastAPI

A support question arrives over a FastAPI endpoint, containerised with Docker for consistent deployment.

80%+

Answer accuracy

200 queries

Eval set

Dockerised

Deploy

Stack

Cohere APIChromaDBFastAPIDockerLangChainLLMOps

The Problem

Support teams waste hours answering repetitive questions from scattered documentation, and naive LLM answers hallucinate.

Objective

A grounded, observable agent that answers from a knowledge base with citations and measurable accuracy.

Approach

Built a RAG agent on the Cohere API and ChromaDB with semantic chunking and hybrid retrieval. Added response tracing and observability to surface hallucination patterns, then applied metadata filtering and prompt versioning to improve grounding. Deployed via FastAPI in a containerised (Docker) architecture with automated evaluation and cost monitoring.

Challenges

Diagnosing where hallucinations entered the chain, then tuning chunking and metadata filters to fix grounding without hurting latency.

Results

80%+ answer accuracy on a 200-query evaluation set, with traceable, cost-monitored responses.

What's Next

Multi-agent routing for specialised domains and continuous evaluation in CI.

Want something similar built?

Start a conversation