RAG and agents | Production | 2025 | AI Engineer

BearcatGPT

Campus-scale AI assistant and agent platform

Azure AI Foundry | Azure OpenAI | Azure AI Search | RAG | MCP | Multi-agent

Users: 46,000+
Latency reduction: 30%

One line summary

BearcatGPT is the production generative AI assistant for the University of Cincinnati, serving more than 46,000 students and faculty through multiple specialized agents grounded in enterprise knowledge.

Problem

Students and faculty needed a single, trusted way to ask academic, administrative, and program-specific questions and get answers backed by real university documents, not generic LLM output. A general-purpose chatbot would have been worse than useless: it would have hallucinated policy, missed the Socratic intent of tutoring, and offered no path to integrate with the actual systems that hold the answers.

Users

Undergraduate and graduate students looking for academic help and program guidance.
Faculty members who deploy course-specific agents for their classes.
Operations staff who use BearcatGPT to query internal data through tool calls.
Leadership stakeholders who depend on grounded answers with citations.

My role

AI Engineer on the Digital Technology Solutions team. I work on the agent platform: building and tuning RAG pipelines, designing the multi-agent layer and routing, shipping specialized Socratic tutors, and wiring MCP servers and API integrations so agents can reach the systems that matter (Advent TAMALE, the meeting intelligence SharePoint list, and Microsoft Graph).

Architecture

The system is organized as four layers: a user layer, an agent layer on Azure AI Foundry, a retrieval layer on Azure AI Search, and a systems layer reached through MCP servers and APIs. Hybrid retrieval combines BM25 keyword search with vector similarity over text-embedding-3-large embeddings, followed by a re-ranking step. Specialized agents share the retrieval layer but apply their own behavior contracts.
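One common way to merge a keyword ranking and a vector ranking is Reciprocal Rank Fusion (RRF), which Azure AI Search documents for hybrid queries. The sketch below shows the idea in plain Python. The document ids, the function name, and the constant `k=60` are illustrative assumptions, not details of the BearcatGPT deployment.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; doc id breaks ties deterministically.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers, best hit first.
bm25_hits = ["policy-12", "syllabus-3", "faq-7"]
vector_hits = ["syllabus-3", "faq-7", "handbook-1"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that appear in both lists (`syllabus-3`, `faq-7`) outrank documents that only one retriever found, which is the behavior a re-ranking step can then refine.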

Sanitized architecture

AI method

Grounded retrieval first. Every answer is tied back to retrieved chunks, and agents are configured to refuse rather than fabricate when grounding is weak. Three Socratic tutors (Bearcat Study Pal, Bearcat Genius, Bearcat Test Prepper) each enforce their own behavioral guardrails, calculation verification protocols where relevant, and source-bound responses. A router agent handles intent classification and controlled handoffs.
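The "refuse rather than fabricate" rule can be pictured as a gate in front of generation. This is a minimal sketch under stated assumptions: the `Chunk` type, the score range, the `min_score` threshold, the refusal wording, and the `generate` callback are all hypothetical and stand in for whatever the platform actually uses.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str
    text: str
    score: float  # hypothetical relevance score in [0, 1]


REFUSAL = "I could not find this in the university sources I have access to."


def select_grounding(chunks, min_score=0.5, min_chunks=1):
    """Keep only chunks strong enough to ground an answer."""
    strong = [c for c in chunks if c.score >= min_score]
    return strong if len(strong) >= min_chunks else []


def answer_or_refuse(question, chunks, generate):
    """Call the generator only when grounding exists; otherwise refuse."""
    grounding = select_grounding(chunks)
    if not grounding:
        return {"answer": REFUSAL, "citations": []}
    answer = generate(question, grounding)
    return {"answer": answer, "citations": [c.source for c in grounding]}


def fake_generate(question, grounding):
    # Stand-in for an LLM call so the sketch runs without a model.
    return f"Based on {len(grounding)} source(s): {grounding[0].text}"


weak = [Chunk("faq-7", "Parking permits renew in August.", 0.21)]
strong = [Chunk("policy-12", "Withdrawals close in week 10.", 0.83)]

print(answer_or_refuse("When can I withdraw?", weak, fake_generate))
print(answer_or_refuse("When can I withdraw?", strong, fake_generate))
```

Because citations are built from the same chunks that passed the gate, an answer can never cite a source the generator did not see.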

Tool use is structured. Agents call MCP servers and tool functions with explicit schemas, never raw text dumps. This is what lets the assistant talk to systems like Advent TAMALE, the meeting intelligence SharePoint list, and Microsoft Graph in a way the platform can serialize and audit.
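To illustrate what "explicit schemas" buys, the sketch below validates a model-produced tool call before anything touches a backend. The tool name `search_meeting_notes`, its parameters, and the schema layout are hypothetical and are not the real MCP tool definitions used in production.

```python
import json

# Hypothetical tool description: every argument has a declared type.
SEARCH_NOTES_TOOL = {
    "name": "search_meeting_notes",
    "parameters": {
        "query": {"type": str, "required": True},
        "max_results": {"type": int, "required": False},
    },
}


def validate_tool_call(raw_call, tool):
    """Parse a JSON tool call and reject anything outside the schema.

    Raises ValueError for malformed JSON (json.JSONDecodeError is a
    ValueError subclass), a wrong tool name, unknown arguments,
    missing required arguments, or wrongly typed arguments.
    """
    call = json.loads(raw_call)
    if not isinstance(call, dict) or call.get("name") != tool["name"]:
        raise ValueError("unexpected tool name")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be an object")
    spec = tool["parameters"]
    unknown = set(args) - set(spec)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    for field, rule in spec.items():
        if field not in args:
            if rule["required"]:
                raise ValueError(f"missing required argument: {field}")
            continue
        if not isinstance(args[field], rule["type"]):
            raise ValueError(f"wrong type for argument: {field}")
    return args


raw = '{"name": "search_meeting_notes", "arguments": {"query": "budget review", "max_results": 3}}'
print(validate_tool_call(raw, SEARCH_NOTES_TOOL))
```

A validated call is a small, typed record, which is what makes it practical to log and audit. A free-text reply from the model fails at `json.loads` and never reaches the integration.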

Hardest technical challenge

The hardest part was not the model. It was making retrieval precise enough that the agents could safely commit to grounded answers without slowing the experience.

Latency vs precision tradeoff

Hybrid retrieval, larger contexts, and re-ranking all improve precision, but each step adds latency. The win came from tightening the chunking strategy, narrowing the candidate set before re-ranking, and tuning the prompt router so easy queries skip expensive paths. Net result: a 30 percent reduction in response latency without losing grounding.
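Two of those levers, capping the candidate set before re-ranking and letting easy queries bypass the re-ranker, can be sketched as follows. The routing heuristic, the cutoffs (`candidates`, `final`), and the stub retriever and scorer are assumptions made up for illustration; the production router and re-ranker are not shown in this write-up.

```python
def retrieve(query, first_stage, rerank_score, route, candidates=20, final=5):
    """Return the top `final` hits, paying for re-ranking only when needed."""
    hits = first_stage(query)  # cheap retrieval, assumed sorted best first
    if route(query) == "fast":
        return hits[:final]  # easy query: skip the expensive re-ranker
    shortlist = hits[:candidates]  # cap how many hits the re-ranker scores
    ranked = sorted(shortlist, key=lambda hit: rerank_score(query, hit), reverse=True)
    return ranked[:final]


# Stubs so the sketch runs without a search service or a model.
def demo_first_stage(query):
    return [f"doc-{i}" for i in range(50)]


def demo_route(query):
    # Hypothetical heuristic: very short queries take the fast path.
    return "fast" if len(query.split()) <= 4 else "full"


scored = []  # records every hit the re-ranker was asked to score


def demo_rerank_score(query, hit):
    scored.append(hit)
    return int(hit.split("-")[1]) % 7


print(retrieve("library hours", demo_first_stage, demo_rerank_score, demo_route))
print(retrieve("how do late withdrawals affect my financial aid status",
               demo_first_stage, demo_rerank_score, demo_route,
               candidates=10, final=3))
print(len(scored))
```

Re-ranking cost grows with the shortlist size rather than with the size of the first-stage result set, so `candidates` becomes a direct latency knob that can be tuned against grounding quality.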

Results

More than 46,000 students and faculty have access through the campus AI assistant.
30 percent reduction in response latency after retrieval and prompt router tuning.
Multiple Socratic tutoring agents deployed for live courses, including an ACCT3073 Exam Practice Coach.
MCP servers connect agents to enterprise systems that previously required manual lookup.

What I learned

Production RAG is mostly retrieval and evaluation work. Prompt tuning is the last 10 percent.
Behavior contracts beat clever prompting. A Socratic tutor that will not give the final answer holds up better under adversarial use than one that politely tries.
Structured tool calling is the part that makes agents safe to expose to real enterprise systems. Free-form responses to integrations age badly.
Cost and latency are part of the product. A 4-second answer that cites three documents is more valuable than a 12-second answer that cites ten.

Screenshots

Sanitized only

Real screenshots show enterprise data and are not published here. Diagrams are sanitized reconstructions of the production architecture.

Future improvements

Expand MCP coverage so more systems are reachable directly by tool call.
Add task-scoped memory for multi-turn workflows like advising and academic planning.
Continue benchmarking against external study tools as new model deployments come online.
Tighten the evaluation harness for grounding quality and refusal calibration.
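As one possible starting point for refusal calibration, the sketch below computes two rates from a labeled evaluation set: how often the assistant refuses questions it could have answered, and how often it answers questions it should have refused. The record format and metric names are assumptions for illustration, not the existing harness.

```python
def refusal_calibration(records):
    """Summarize refusal behavior on a labeled evaluation set.

    Each record is a dict with two booleans:
      answerable: the knowledge base contains the answer
      refused:    the assistant declined to answer
    A rate is None when its group has no examples.
    """
    records = list(records)
    answerable = [r for r in records if r["answerable"]]
    unanswerable = [r for r in records if not r["answerable"]]

    def rate(group, predicate):
        if not group:
            return None
        return sum(1 for r in group if predicate(r)) / len(group)

    return {
        # Refused even though the sources held the answer.
        "over_refusal_rate": rate(answerable, lambda r: r["refused"]),
        # Answered even though nothing grounded the answer.
        "missed_refusal_rate": rate(unanswerable, lambda r: not r["refused"]),
    }


# Hypothetical evaluation results.
eval_records = [
    {"answerable": True, "refused": False},
    {"answerable": True, "refused": True},
    {"answerable": False, "refused": True},
    {"answerable": False, "refused": False},
]

print(refusal_calibration(eval_records))
```

Tracking both rates together matters because tightening a grounding threshold usually lowers one while raising the other.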