Sapio — data-driven AI

How to deploy an AI agent for customer support

By Vlad Tudor | Last updated: June 2026

An AI customer-support agent handles repetitive questions that have a documented answer, using RAG on your knowledge base, and escalates anything ambiguous, sensitive, or high-stakes to a human. Design it with escalation and human-in-the-loop from day one, then measure it by true deflection rate, not by how many messages it sent.

  • In short: an AI support agent is good at repetitive questions that have a documented answer, and escalates anything ambiguous, sensitive, or high-stakes to a human.
  • The technical foundation is RAG on your knowledge base, so the agent answers from your documentation, not from what the model "thinks".
  • Design escalation and the human-in-the-loop step from day one; an agent with no exit to a human is a risk, not a saving.
  • The right measure is true deflection rate (cases fully resolved without a human), not the number of messages the bot replied to.

What can and cannot an AI customer-support agent do?

A well-built AI support agent answers frequent questions quickly and consistently, searches your documentation and returns the right answer with its source, walks the customer through standard steps, and works around the clock, in several languages. What it cannot do, and it is only fair to say so: it does not make decisions where no clear rule exists, it does not handle emotional situations or delicate complaints well, it should not be allowed to make commercial promises or change accounts without validation, and it cannot invent information it does not have. A vendor selling you "the magic bot that solves everything 24/7" has either never built one or is not telling you the truth. The real value comes from making it excel at the part it can actually do.

How does RAG on your knowledge base work?

RAG (retrieval-augmented generation) is the mechanism by which the agent answers from your documents, not from the model's general memory. Instead of "guessing", the agent first searches your knowledge base (manuals, policies, FAQs, support articles), finds the relevant passages, and only then forms the answer from them, ideally with a link to the source. This solves two big problems: answers stay current, because you update the documents, not the model, and the risk of hallucination drops sharply, because the agent is anchored in real text. Answer quality depends directly on knowledge-base quality: tidy, correct documentation means a good agent; chaotic documentation means a weak agent, whatever the model.
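The retrieve-then-answer flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `Passage` type, the `kb/...` source ids, and the word-overlap scoring are all assumptions made for the example (a real deployment would typically use a search index or embeddings), and the call to the language model itself is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passage:
    source: str  # hypothetical document id or URL in the knowledge base
    text: str


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, passages: list, top_k: int = 2) -> list:
    """Rank passages by word overlap with the question (toy scoring)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in passages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, hits: list) -> Optional[str]:
    """Return a grounded prompt, or None when nothing relevant was found."""
    if not hits:
        return None  # the caller should escalate rather than let the model guess
    context = "\n".join(f"[{p.source}] {p.text}" for p in hits)
    return (
        "Answer only from the passages below and cite the source in brackets.\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical two-article knowledge base
KB = [
    Passage("kb/returns", "You can return a product within 14 days of delivery."),
    Passage("kb/shipping", "Standard shipping takes 2 to 4 working days."),
]
hits = retrieve("How many days do I have to return a product?", KB, top_k=1)
prompt = build_prompt("How many days do I have to return a product?", hits)
```

The point of returning `None` when retrieval comes back empty is that "no passage found" becomes an explicit signal for escalation instead of an invitation for the model to improvise.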

We have been building RAG on large corpora since we launched ai-aflat.ro, our AI assistant for Romanian law, on 500,000+ indexed legislative texts. That is where we learned in practice what it takes to answer correctly, in Romanian, from a huge volume of documents; see the ai-aflat.ro case study.

How do I design escalation and human-in-the-loop?

Escalation is not a fallback feature but part of the design. The simple rule: the agent resolves what it knows for sure and hands off quickly to a human for everything else, with the conversation history attached so the customer does not repeat themselves. The table below shows the case types and who should handle them.

Case type | Who resolves it | Why
Frequent question, documented answer | AI agent, fully | Fast, consistent answer, from the source
Standard steps (reset, order status) | AI agent, with guidance | Clear procedure, no subjective decision
Ambiguous or incomplete request | Agent clarifies, then escalates if still unclear | Avoids wrong answers with false confidence
Complaint, emotional situation | Human, immediately | Requires empathy and judgement
High-stakes decision (refund, contract, account) | Human, with validation (human-in-the-loop) | Financial/legal risk too high to automate
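The routing in the table above can be written down as a small decision function. The sketch below is one possible encoding in Python; the enum names, the `max_clarifications` limit, and the shape of the hand-off payload are assumptions for illustration, and classifying a message into a case type is taken as already done upstream.

```python
from enum import Enum


class CaseType(Enum):
    FAQ = "frequent question, documented answer"
    STANDARD_STEPS = "standard steps"
    AMBIGUOUS = "ambiguous or incomplete request"
    COMPLAINT = "complaint or emotional situation"
    HIGH_STAKES = "high-stakes decision"


class Handler(Enum):
    AGENT = "agent resolves"
    AGENT_CLARIFY = "agent asks a clarifying question"
    HUMAN = "human takes over"
    HUMAN_APPROVAL = "human validates before anything is sent"


def route(case: CaseType, clarification_attempts: int = 0,
          max_clarifications: int = 1) -> Handler:
    """Map a case type to whoever should handle it, following the table."""
    if case is CaseType.COMPLAINT:
        return Handler.HUMAN
    if case is CaseType.HIGH_STAKES:
        return Handler.HUMAN_APPROVAL
    if case is CaseType.AMBIGUOUS:
        # Clarify a limited number of times, then stop guessing and escalate.
        if clarification_attempts < max_clarifications:
            return Handler.AGENT_CLARIFY
        return Handler.HUMAN
    return Handler.AGENT


def handoff(history: list, reason: str) -> dict:
    """Bundle the conversation so the customer does not repeat themselves."""
    return {"reason": reason, "history": list(history)}
```

For example, `route(CaseType.AMBIGUOUS)` asks for clarification on the first attempt, while `route(CaseType.AMBIGUOUS, clarification_attempts=1)` hands the case to a human.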

Human-in-the-loop means, in practice, that for sensitive cases the agent can prepare a reply or an action, but a human approves it before it reaches the customer. It is the safety net that lets you automate more without shifting the risk onto the customer.
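One way to enforce this approval step in code is to make sending a sensitive draft impossible until a reviewer has signed it off. The following Python sketch assumes a hypothetical `Draft` type and an illustrative `SENSITIVE_ACTIONS` set; which actions count as sensitive is a business decision, not something the code can decide.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed list of actions that always need a human sign-off
SENSITIVE_ACTIONS = {"refund", "account_change"}


@dataclass
class Draft:
    action: str                        # what the agent proposes to do
    reply: str                         # the message prepared for the customer
    approved_by: Optional[str] = None  # reviewer name once approved

    @property
    def needs_approval(self) -> bool:
        return self.action in SENSITIVE_ACTIONS


def approve(draft: Draft, reviewer: str) -> Draft:
    """Record which human approved the draft."""
    draft.approved_by = reviewer
    return draft


def send(draft: Draft) -> str:
    """Release the reply, refusing sensitive drafts that lack approval."""
    if draft.needs_approval and draft.approved_by is None:
        raise PermissionError("sensitive draft needs human approval before sending")
    return draft.reply
```

Putting the check inside `send` rather than in the agent's prompt means the guarantee holds even if the model misjudges a case.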

How do I handle Romanian correctly?

Romanian brings quirks that break agents built for English: diacritics written inconsistently by customers (or dropped entirely), formal and informal forms of address, regionalisms, and a more indirect way of asking questions. An agent that works well in Romanian must be tested on real messages, with and without diacritics, and tuned to search the knowledge base even when the question is phrased naturally and colloquially. From our experience building ai-aflat.ro on Romanian legal language, Romanian-language retrieval decides quality more than the choice of model does.
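For the diacritics part of the problem, a common approach is to fold both the indexed text and the customer's query to the same diacritic-free form before matching. The Python sketch below uses only the standard `unicodedata` module; the function name `fold_diacritics` is ours, and it addresses diacritics only, not forms of address, regionalisms, or indirect phrasing.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lowercase the text and strip combining marks (ă→a, î→i, ș→s, ț→t)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" covers the combining marks, including both the comma-below
    # and the legacy cedilla variants of ș and ț.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.lower()


with_diacritics = fold_diacritics("Vreau să îmi schimb adresa")
without_diacritics = fold_diacritics("vreau sa imi schimb adresa")
# Both fold to "vreau sa imi schimb adresa", so they match in a keyword index.
```

Folding is applied at index time and at query time alike; applying it on only one side would make matching worse, not better.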

How do I measure whether the agent actually works (deflection)?

The central metric is true deflection rate: the share of conversations fully resolved by the agent, without a human having to step in and without the customer returning with the same problem. Beware the vanity metric: "the bot answered 80% of messages" means nothing if half those customers then called the contact centre. Track, alongside deflection, satisfaction (CSAT) on agent-handled conversations and the repeat-contact rate for the same issue. Set a threshold from the start, exactly like a pilot: if true deflection misses the target, you tune the knowledge base and the escalation rules before you expand.
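The difference between the vanity metric and true deflection is easy to see in a small calculation. In this Python sketch the `Conversation` fields and the sample numbers are invented for illustration (CSAT is left out for brevity); the definitions follow the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    agent_replied: bool      # the bot sent at least one answer
    resolved_by_agent: bool  # closed without a human stepping in
    repeat_contact: bool     # customer came back with the same problem


def answered_rate(convs: list) -> float:
    """Vanity metric: share of conversations the bot replied to."""
    return sum(c.agent_replied for c in convs) / len(convs) if convs else 0.0


def true_deflection_rate(convs: list) -> float:
    """Share fully resolved by the agent with no repeat contact."""
    if not convs:
        return 0.0
    deflected = sum(c.resolved_by_agent and not c.repeat_contact for c in convs)
    return deflected / len(convs)


def repeat_contact_rate(convs: list) -> float:
    """Among agent-resolved conversations, share that came back."""
    resolved = [c for c in convs if c.resolved_by_agent]
    return sum(c.repeat_contact for c in resolved) / len(resolved) if resolved else 0.0


# Invented sample of 10 conversations
sample = (
    [Conversation(True, True, False)] * 4     # resolved, customer did not return
    + [Conversation(True, True, True)] * 1    # "resolved", but customer returned
    + [Conversation(True, False, False)] * 3  # bot replied, human had to step in
    + [Conversation(False, False, False)] * 2 # went straight to a human
)
```

On this sample the bot "answered" 80% of conversations, yet true deflection is 40% and one in five agent-resolved cases came back, which is the gap the vanity metric hides.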

What is the next step?

If you want a support agent that genuinely reduces the volume reaching your team without damaging your customer relationships, start from a tidy knowledge base and a clear map of what escalates. See how we approach AI automation and agents through our AI services, then book a free initial call. In that call we define the agent's scope, what it can realistically take on, and how we measure deflection on your case.

On ai-aflat.ro we index 500,000+ legislative texts and answer from them in Romanian. That is our direct experience with RAG on a large corpus and with correct Romanian-language retrieval.

Frequently asked questions

Can an AI agent fully replace the support team?

No, and it should not. A good agent takes on repetitive questions with a documented answer and frees the human team for cases that need judgement, empathy, or high-stakes decisions. Complaints, emotional situations, and financial decisions stay with people. The realistic goal is to reduce routine volume, not to eliminate human support.

What is RAG and why does it matter for a support agent?

RAG (retrieval-augmented generation) makes the agent answer from your documentation, not from the model's general memory. It first searches the knowledge base, finds the relevant passage, and forms the answer from it. This keeps answers current and sharply reduces the risk of hallucination, because the agent is anchored in real text.

How do I measure whether the support agent actually helps?

The central metric is true deflection rate: conversations fully resolved by the agent, with no human intervention and no customer returning with the same problem. Also track CSAT on agent conversations and the repeat-contact rate. "The bot answered 80% of messages" is a vanity metric if those customers then call the contact centre.

Does the AI agent work well in Romanian?

Yes, if it is built and tested for Romanian. The real challenges are inconsistently written diacritics, formal and informal address, and colloquially phrased questions. Romanian-language retrieval decides quality more than the model does. From the ai-aflat.ro experience, on Romanian legal language, we learned exactly what it takes to answer correctly in Romanian from a large volume of documents.

What does human-in-the-loop mean for a support agent?

It means that, for sensitive cases, the agent can prepare a reply or an action, but a human approves it before it reaches the customer. It is the safety net for decisions with financial or legal risk, for example a refund or an account change. It lets you automate more without shifting the risk onto the customer.
