A RAG (retrieval-augmented generation) chatbot answers from your company documents, not from the model's general memory: it first retrieves the relevant chunks, then formulates the answer with sources. You build it by preparing documents, splitting them into chunks, generating embeddings, storing them in a vector database, then retrieving and generating.
- RAG (retrieval-augmented generation) means the system first searches your documents, then the model formulates an answer that cites its sources.
- The steps: prepare documents, split them into chunks, turn them into embeddings, store them in a vector database, then retrieve and generate the answer.
- For data that changes often, RAG beats fine-tuning: you update the corpus without retraining the model.
- In Romanian, diacritics, inflected forms and domain jargon matter — they are real causes of wrong answers if you do not handle them.
What exactly is a RAG chatbot?
A RAG chatbot is an assistant that answers not from what the model "knows" in general, but from your documents. RAG stands for retrieval-augmented generation: before generating an answer, the system searches your knowledge base for the relevant chunks ("retrieval") and hands them to the model as context, and the model formulates the answer from them ("generation"). The practical difference is huge. A generic chatbot can confidently invent a law article or a contract clause that does not exist: the hallucination problem. A properly built RAG chatbot is constrained to answer from what it found and can cite the source, and when it finds nothing relevant it says it does not know rather than improvising. At Sapio we build such systems, and our clearest example is ai-aflat.ro, the AI assistant for Romanian legislation, where we indexed over 500,000 legislative texts. At that scale, the difference between "search then answer" and "answer from memory" is not a nuance, it is the difference between a usable tool and a dangerous one.
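The "answer only from what was found, otherwise say you do not know" behaviour can be sketched as a prompt-assembly step. This is a minimal illustration, not the ai-aflat.ro implementation: the `Chunk` type, the `build_prompt` function, the `min_score` threshold of 0.3, and the wording of the instructions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

NO_ANSWER = "I could not find this in the indexed documents."


@dataclass
class Chunk:
    source: str   # hypothetical document identifier, used for citation
    text: str     # the retrieved passage
    score: float  # retrieval similarity, assumed to be computed upstream


def build_prompt(question: str, chunks: List[Chunk],
                 min_score: float = 0.3) -> Optional[str]:
    """Return a grounded prompt, or None when nothing relevant was retrieved."""
    # Keep only passages above the (assumed) relevance threshold.
    relevant = [c for c in chunks if c.score >= min_score]
    if not relevant:
        # The caller should reply with NO_ANSWER instead of calling the model.
        return None
    # Number the passages so the model can cite them as [1], [2], ...
    context = "\n".join(
        f"[{i}] ({c.source}) {c.text}" for i, c in enumerate(relevant, start=1)
    )
    return (
        "Answer only from the numbered passages below and cite them like [1]. "
        f"If they do not contain the answer, reply: {NO_ANSWER}\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Returning `None` before the model is ever called is one possible design choice: with no relevant passages there is nothing to ground an answer in, so the application can reply with `NO_ANSWER` directly.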
What are the steps to build a chatbot on your company data?
A RAG system has a clear backbone. The steps below are the same whether you have 500 documents or 500,000, but each gets harder as the corpus grows, and that is where a team's experience building at scale shows.
- Prepare the sources: collect the documents, clean them, remove duplicates, and extract text from PDFs and scans (this is where OCR comes in for non-editable documents).
- Split the text into chunks: small enough to be precise, large enough to keep their meaning. Bad chunking is one of the most common causes of bad answers.
- Generate embeddings: turn each chunk into a numeric vector that encodes its meaning, so it can be searched by sense, not only by keyword.
- Store the vectors in a vector database, over which semantic search runs on every question.
- On each question, retrieve the relevant chunks, hand them to the model as context, and ask for an answer that cites its sources.
- Evaluate and tune: test on real questions, measure how many answers are correct, and adjust chunking, retrieval and prompts until you are satisfied.
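The chunk, embed, store and retrieve steps above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not a production design: `chunk_words`, `embed` and `VectorStore` are hypothetical names, the "embedding" is a plain bag-of-words count (so it matches shared words, not meaning), and the "vector database" is an in-memory list. A real system would swap in an embedding model and an actual vector database.

```python
import math
import re
from collections import Counter


def chunk_words(text, size=50, overlap=10):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (source, chunk_text, vector)

    def add(self, source, text):
        for chunk in chunk_words(text):
            self.items.append((source, chunk, embed(chunk)))

    def search(self, question, k=3):
        """Return the k best (score, source, chunk) triples for a question."""
        q = embed(question)
        scored = [(cosine(q, vec), source, chunk)
                  for source, chunk, vec in self.items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


store = VectorStore()
store.add("hr.txt", "Employees receive twenty one days of paid leave per year")
store.add("it.txt", "Reset your password from the internal portal")
top = store.search("how many days of paid leave", k=1)
```

In this example `top` holds the chunk from the hypothetical `hr.txt`, because it shares words with the question. The overlap between chunks is there so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.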
RAG or fine-tuning — which do you choose?
There are two different ways to make a model work with your knowledge, and people often confuse them. RAG adds the knowledge at question time, from outside the model. Fine-tuning rewrites the model's behaviour by retraining it on examples. In most "chatbot on company documents" projects, RAG is the right choice, because your data changes and you want citable sources. Fine-tuning has its place when you need a fixed tone or format, not new knowledge.
| Criterion | RAG | Fine-tuning |
|---|---|---|
| What it solves | Brings in knowledge from your documents | Changes the tone, format, style of answers |
| Frequently changing data | Ideal — update the corpus, not the model | Poor — every change needs retraining |
| Citable sources in the answer | Yes, by design | No |
| Cost and time to production | Lower, weeks | Higher, plus training-data effort |
| When you choose it | Document assistant, internal FAQ, support | Fixed tone/format in a narrow domain |
In practice, the two are not mutually exclusive: you can use RAG for knowledge and fine-tuning for tone in the same system. But if you start with a single bet, for a chatbot that knows your company data, RAG is almost always the starting point.
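What "update the corpus, not the model" means in practice can be sketched as an upsert: when a document changes, only its chunks are replaced, and no model is retrained. The `Corpus` class, its method names and the paragraph-based chunking below are assumptions made for this illustration; a real system would also recompute embeddings for the new chunks.

```python
import hashlib


class Corpus:
    """Tracks chunks per document and replaces them when the content changes."""

    def __init__(self):
        self.chunks = {}  # doc_id -> list of chunk strings
        self.hashes = {}  # doc_id -> SHA-256 of the last indexed content

    def upsert(self, doc_id, text):
        """Index or re-index a document; return False if it is unchanged."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.hashes.get(doc_id) == digest:
            return False  # same content as last time, nothing to do
        # Naive chunking for the sketch: one chunk per non-empty paragraph.
        self.chunks[doc_id] = [p.strip() for p in text.split("\n\n") if p.strip()]
        self.hashes[doc_id] = digest
        return True

    def delete(self, doc_id):
        """Remove a withdrawn document from the corpus."""
        self.chunks.pop(doc_id, None)
        self.hashes.pop(doc_id, None)


corpus = Corpus()
corpus.upsert("policy", "Leave is 20 days.\n\nNotice is 30 days.")
corpus.upsert("policy", "Leave is 21 days.")  # the old chunks are replaced
```

After the second call the assistant can only retrieve the new text, which is the property the table row on frequently changing data refers to.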
What is special about a RAG chatbot in Romanian?
Many RAG systems work well in English and stumble in Romanian, and the problem shows up precisely at the retrieval step. Diacritics are the first trap: if the user types without them and the documents have them (or vice versa), the search misses relevant chunks. The second is Romanian's rich inflection — one word has many forms (contract, contractului, contractelor), and a naive search treats them as different terms. The third is domain jargon: Romanian legal, medical or technical language has terms that a model trained mostly on English does not represent well. From our experience on ai-aflat.ro, where the corpus is entirely Romanian legal text, these three things make the difference between an assistant that finds the right text and one that answers beside the point. We handle them through diacritic normalisation, embeddings that understand Romanian, and a testing stage on real questions phrased the way people actually write them. See what it looks like at scale in the ai-aflat.ro case study.
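Diacritic normalisation, the first of the three fixes, can be sketched with Python's standard `unicodedata` module: decompose each character, drop the combining marks, and lowercase. The function names below are hypothetical, and `crude_stem` is a deliberately naive suffix stripper covering only the three forms mentioned above; it illustrates the inflection problem and is not a real Romanian stemmer or a substitute for embeddings that handle inflection.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Lowercase text and strip diacritics (ă, â, î, ș, ț and cedilla variants)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining breve, circumflex,
    # comma below and cedilla used in Romanian text.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped).casefold()


# Illustrative subset of suffixes only; real inflection is far richer.
SUFFIXES = ("ului", "elor", "ilor")


def crude_stem(word: str) -> str:
    """Strip one known suffix if at least three letters of stem remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(text: str) -> list:
    """Fold diacritics, then reduce each word to its crude stem."""
    return [crude_stem(word) for word in fold_diacritics(text).split()]


with_marks = normalise("Legislație privind contractului")
without_marks = normalise("legislatie privind contract")
```

Here `with_marks` and `without_marks` come out identical, so a query typed without diacritics matches a document written with them. The same normalisation has to be applied to both the indexed chunks and the incoming question, otherwise the mismatch simply moves to the other side.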
How do I get a RAG chatbot started for my company?
Start with a small corpus and a clear question: what you want the assistant to answer and for whom. A pilot on a few hundred well-chosen documents tells you more than a large project built on assumptions. Then measure accuracy on real questions before you hand it to users. If you want to see how we would do this on your data, read about our AI services, where a document assistant becomes part of a workflow, then book a free initial call with the Sapio team. In that call we work out which documents go into the corpus, which questions it must cover, and whether a pilot makes sense before a full system.
On ai-aflat.ro, the AI assistant for Romanian legislation, we indexed over 500,000 legislative texts: Sapio's first-hand evidence that RAG works on a large Romanian-language corpus.
Frequently asked questions
What does RAG mean in a chatbot?
RAG stands for retrieval-augmented generation. Before answering, the chatbot searches your documents for the relevant chunks and hands them to the model as context, and the model formulates the answer from them. That way it answers from your data, can cite the source, and avoids inventing information that does not exist in the corpus.
RAG or fine-tuning for a chatbot on company data?
For knowledge that changes and answers with sources, RAG is the right choice: you update the corpus without retraining the model. Fine-tuning solves fixed tone and format, not new knowledge, and costs more. The two can be combined in one system, but RAG is usually the starting point.
Does RAG work well in Romanian?
It does, if you handle three things: diacritics, Romanian's rich inflected forms, and domain jargon. Ignored, they make the retrieval step miss relevant chunks. We handle them with diacritic normalisation, embeddings that understand Romanian, and testing on real questions — from our experience on ai-aflat.ro, a corpus entirely in Romanian legal text.
How many documents do I need to start a RAG chatbot?
Fewer than you think for a pilot. A few hundred well-chosen documents, on a clear question, show you whether the system answers correctly before a large investment. On ai-aflat.ro we run over 500,000 texts, so large scale is feasible, but it is not required to validate the idea.
Can a RAG chatbot make up answers?
A properly built RAG system greatly reduces this risk, because it answers only from the retrieved chunks and can cite the source. When it finds nothing relevant, it should say it does not know rather than improvise. Hallucinations remain possible if chunking and retrieval are badly tuned, which is why the evaluation stage matters.
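One simple way to start the evaluation stage is to measure retrieval on a set of real questions: for each question, check whether the document that should answer it appears among the top results. The sketch below is an assumption-laden illustration. `hit_rate` and the `retrieve` callback are hypothetical names, the test questions are invented, and the metric only checks retrieval, not whether the final generated answer is correct.

```python
from typing import Callable, List, Tuple


def hit_rate(test_set: List[Tuple[str, str]],
             retrieve: Callable[[str], List[str]],
             k: int = 3) -> float:
    """Fraction of questions whose expected source is in the top-k results.

    test_set: (question, expected_source) pairs written by people who know
              the documents.
    retrieve: any function mapping a question to a ranked list of sources.
    """
    if not test_set:
        return 0.0
    hits = 0
    for question, expected_source in test_set:
        if expected_source in retrieve(question)[:k]:
            hits += 1
    return hits / len(test_set)


def fake_retrieve(question: str) -> List[str]:
    """Stand-in retriever so the sketch runs without a real index."""
    return ["hr.txt", "it.txt"] if "leave" in question else ["it.txt"]


tests = [
    ("how much leave do I get", "hr.txt"),
    ("how do I reset my password", "it.txt"),
    ("what is the travel policy", "finance.txt"),  # not in the index: a miss
]
score = hit_rate(tests, fake_retrieve)
```

Rerunning the same test set after each change to chunking or retrieval shows whether the change helped or hurt, which is harder to judge from a handful of manual checks.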
Want to discuss a project?
Book a free discovery call with the Sapio team.