Sapio — data-driven AI

How to choose between a custom model and an API

By Vlad Tudor | Last updated: June 2026

To choose between an API, RAG, fine-tuning and a custom model, climb the ladder only as far as needed: start from a pre-trained API, add RAG when answers must draw on your data, add fine-tuning when you need a fixed tone or format, and train a custom model only when a niche need, proprietary data, or a requirement for full control justifies it.

  • Start from an API (a pre-trained model). It covers most cases and reaches production in days, not months.
  • Add RAG when answers must draw on your own data, especially data that changes often.
  • Choose fine-tuning for a fixed tone, format, or behaviour that prompting cannot hold reliably.
  • Train a custom model only when you have a niche need, proprietary data, and a control or cost reason that genuinely justifies the effort.

What do API, RAG, fine-tuning and a custom model actually mean?

The four options are not mutually exclusive; they are layers you add one at a time, in order of cost and complexity. An API means calling a pre-trained model (from OpenAI, Anthropic, Google or another provider) through an endpoint, without owning the model. RAG (retrieval-augmented generation) puts a search layer over your data: for each question, the system pulls the relevant fragments from your documents and hands them to the model as context, so that it answers from them. Fine-tuning means continuing to train an existing model on your examples, to lock in its tone, format, or a repeatable behaviour. A custom model (trained from scratch, or built on an open-source model that you host and run yourself) gives you full control over the weights, the data, and where it runs. In our projects, the practical rule is simple: you move up the ladder only when the layer below does not solve the problem.
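To make the RAG step concrete, here is a minimal sketch of "pull the relevant fragments and hand them to the model as context". It is an illustration under loud assumptions, not how any production system is built: the word-overlap scoring stands in for a real search or embedding index, and `DOCS`, `retrieve` and `build_prompt` are hypothetical names invented for this example. The final prompt would then be sent to whichever API you use.

```python
import re
from collections import Counter

# Hypothetical document fragments; in practice these come from your own index.
DOCS = [
    "Invoices are payable within 30 days of the issue date.",
    "Support is available on weekdays from 9:00 to 17:00.",
    "Refunds are processed within 14 days of the return request.",
]

def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(question: str, fragment: str) -> int:
    """Toy relevance: how often the question's words occur in the fragment."""
    question_words = set(tokenize(question))
    counts = Counter(tokenize(fragment))
    return sum(counts[word] for word in question_words)

def retrieve(question: str, fragments: list, k: int = 2) -> list:
    """Return up to k fragments with a non-zero score, best first."""
    ranked = sorted(fragments, key=lambda f: score(question, f), reverse=True)
    return [f for f in ranked[:k] if score(question, f) > 0]

def build_prompt(question: str, context: list) -> str:
    """Assemble the text that would be sent to the model."""
    joined = "\n".join(f"- {fragment}" for fragment in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )

question = "When are invoices payable?"
context = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The model never changes here; only the context does. Updating a document in `DOCS` changes the next answer without any retraining, which is the property that separates RAG from fine-tuning.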

API vs RAG vs fine-tuning vs custom model — how do they differ on cost, control and data?

The table below puts the four approaches side by side on the criteria that matter when you decide: cost, control, what you do with your data, latency, and the right moment. Read it left to right: each step towards more control comes with extra cost and complexity.

| Criterion | API (pre-trained model) | RAG | Fine-tuning | Custom model |
| --- | --- | --- | --- | --- |
| Setup cost | Low (pay per call) | Low–medium (indexing + storage) | Medium (labelled data + training run) | High (data, infrastructure, team) |
| Control over behaviour | Low (prompt only) | Medium (you control the sources) | High (tone and format locked in) | Maximum (you own the weights) |
| How it uses your data | Does not (prompt only) | Reads them at runtime, always current | Bakes them into the model at training | Fully owned, on-premise if you want |
| Latency | Low–medium | Medium (one extra retrieval step) | Low–medium | Depends on your hardware |
| When to choose it | General case, fast launch | Answers on frequently changing data | Fixed tone/format, narrow domain | Niche, proprietary data, control needs |

On ai-aflat.ro, our legal-assistant product, we index over 500,000 legislative texts, so we have built RAG on a large corpus in production and we know from practice where it breaks: in the quality of the retrieved fragments, not in the model. That is why we recommend RAG before fine-tuning whenever the real problem is "the model does not know my data", not "the model does not speak in my style".

When do you need RAG, and when fine-tuning?

The most common confusion is between RAG and fine-tuning, because they seem to solve the same problem. They do not. RAG changes what the model knows at answer time: you give it the right context, pulled from your data. Fine-tuning changes how the model behaves: you teach it a response pattern. If your data updates weekly (prices, stock, contracts, legislation), fine-tuning ages fast and would need retraining at every change, whereas RAG always reads the current version. If, instead, you want the model to always answer in a strict format (for example, to classify a ticket into fixed categories), fine-tuning is more stable than a long prompt. In many real projects we combine them: RAG for knowledge, fine-tuning for behaviour.

When is a custom model actually worth it?

A custom model, trained or hosted by you, is the right answer less often than it seems. It makes sense when the data cannot leave your infrastructure for confidentiality or regulatory reasons, when the call volume is so high that the per-call API cost becomes the main expense, when you need a small specialised model that runs cheaply on your own hardware, or when the model itself is a strategic differentiator, not just a component. At Sapio we do model training, so we will not talk you out of it; we will talk you out of starting with it. A custom model needs clean, labelled data, training infrastructure, and ongoing maintenance. Our practical recommendation: validate the case's value with an API plus RAG, then move to a custom model only if the cost, control, or confidentiality numbers clearly demand it.
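The volume argument above can be checked with simple arithmetic. The sketch below computes the monthly call volume at which self-hosting starts to cost less than paying per call. Every number in it is a made-up placeholder, not a real price, and the function name `break_even_calls` is hypothetical; replace the inputs with your own quotes before drawing conclusions.

```python
from typing import Optional

def break_even_calls(
    fixed_monthly_cost: float,
    api_cost_per_call: float,
    self_hosted_cost_per_call: float,
) -> Optional[float]:
    """Monthly calls above which self-hosting is cheaper than the API.

    fixed_monthly_cost covers what you pay regardless of volume
    (hardware, maintenance, team time). Returns None when self-hosting
    never pays off because its per-call cost is not lower than the API's.
    """
    saving_per_call = api_cost_per_call - self_hosted_cost_per_call
    if saving_per_call <= 0:
        return None
    return fixed_monthly_cost / saving_per_call

# Placeholder figures for illustration only (any currency).
threshold = break_even_calls(
    fixed_monthly_cost=4000.0,
    api_cost_per_call=0.01,
    self_hosted_cost_per_call=0.002,
)
print(f"Break-even at roughly {threshold:,.0f} calls per month")
```

With these placeholder inputs the threshold is about 500,000 calls per month; below that volume, the API remains the cheaper option even though its per-call price is higher.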

What does the decision flow look like, step by step?

  1. Does a plain API solve the case with a good prompt? If yes, stop here — it is the cheapest, fastest option.
  2. Does the model need to answer from your own data, especially data that changes often? Add RAG.
  3. Is there still a tone, format, or repeatable-behaviour problem that prompting cannot hold? Add fine-tuning on top.
  4. Do you have a clear confidentiality, volume, or cost constraint that demands full control? Only then evaluate a custom model.
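The four questions above can be written down as a small function. This is only a sketch of the ladder as described here, with hypothetical field names; the real answers come from looking at your data and volume, not from four booleans.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    """Hypothetical yes/no answers to the four questions in the flow."""
    prompt_alone_is_enough: bool
    needs_own_changing_data: bool
    needs_fixed_tone_or_format: bool
    needs_full_control: bool

def recommend(case: UseCase) -> list:
    """Return the layers to use, climbing the ladder only as far as needed."""
    layers = ["API"]
    # Step 1: a plain API with a good prompt solves it, so stop here.
    if case.prompt_alone_is_enough:
        return layers
    # Step 2: answers must come from your own, changing data.
    if case.needs_own_changing_data:
        layers.append("RAG")
    # Step 3: tone, format or behaviour that prompting cannot hold.
    if case.needs_fixed_tone_or_format:
        layers.append("fine-tuning")
    # Step 4: a confidentiality, volume or cost constraint demands control.
    if case.needs_full_control:
        layers.append("evaluate custom model")
    return layers

print(recommend(UseCase(False, True, True, False)))
```

The last step returns "evaluate custom model" rather than "custom model" on purpose: the flow says only to evaluate it at that point, not to commit to it.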

If you want us to work out together which layer fits your case, you can learn more about our AI services and then book a free discovery call. In that call we look at your data, your volume, and your control requirements, then tell you honestly where you should stop on the ladder. The initial call is free; if the project calls for a deeper assessment, the next step is the AI Technical Audit, our paid service.

Frequently asked questions

What is the difference between RAG and fine-tuning?

RAG changes what the model knows at answer time: it pulls fragments from your data and gives them as context, so it always stays current. Fine-tuning changes how the model behaves: you teach it a fixed tone or format, baked into the model at training. For frequently changing data, choose RAG; for repeatable behaviour, fine-tuning.

Do I need a custom model, or is an API enough?

For most cases, an API plus, where needed, RAG is enough and reaches production in days. A custom model is only worth it when the data cannot leave your infrastructure, when call volume makes the per-call cost the main expense, or when the model itself is a strategic differentiator. Validate the value with an API first.

When does it make sense to combine RAG with fine-tuning?

When you have both a knowledge problem and a behaviour problem. You use RAG so the model answers from your current data, and fine-tuning so it always responds in the required format or tone. In real projects the combination is common: RAG for knowledge, fine-tuning for behaviour, each solving a different half of the problem.

How long does it take to reach production with each approach?

An API with a good prompt can reach production in days. RAG adds time for indexing and structuring the data, usually a few weeks. Fine-tuning takes longer still, because it needs labelled data and a training run. A custom model takes the longest: months, plus infrastructure and maintenance. That is why we recommend climbing the ladder only as far as necessary.

Does fine-tuning go stale if my data changes?

Yes. A fine-tuned model "knows" the data from training time; if prices, contracts, or legislation change, it would need retraining. That is why, when content updates often, RAG is the right choice: it always reads the current version of the documents, with no retraining. Fine-tuning stays suited to behaviour, which is stable over time.

Want to discuss a project?

Book a free discovery call with the Sapio team.