Version: v2

Training & Fine-Tuning

This page answers the most common question about AI assistants: "Can we train this to know more / behave better?"

Short answer: you don't train GPT-4.1 itself. Instead, there are four progressively more powerful ways to make the assistant smarter for CCG, and the two cheapest require no model changes at all.


Option 1: Improve the Index (Free, Highest ROI)

What it does: Feeds the model better, more complete documentation to reason from.

The assistant retrieves documentation from Azure AI Search and passes it as context to GPT-4.1. If the retrieved content is incomplete, missing details, or stale, the model has nothing to work with, no matter how capable it is.
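
To make that concrete, the retrieval step boils down to assembling search hits into a context string; a minimal sketch (the `formatContext` helper and the `title`/`content` field names are illustrative assumptions, not the actual service code):

```javascript
// Assemble retrieved search hits into a single context string for the model.
// Field names (title, content) are illustrative, not the real index schema.
function formatContext(hits) {
  return hits
    .map((hit, i) => `[Doc ${i + 1}: ${hit.title}]\n${hit.content}`)
    .join('\n\n');
}

const hits = [
  { title: 'Checkout Sessions', content: 'POST /v2/sessions creates a session.' },
  { title: 'Webhooks', content: 'Register a webhook to receive events.' },
];

const context = formatContext(hits);
// The model only sees what retrieval returns: an empty hits array
// means an empty context, no matter how capable the model is.
```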

What to improve

| Problem symptom | Root cause | Fix |
| --- | --- | --- |
| "I don't have enough information…" for a documented feature | Page is missing from the index | Run `yarn build` then `node scripts/uploadToAzureSearch.js` after adding the doc |
| Answer omits important fields | Doc doesn't mention them | Add the fields to the relevant `.md` file or OpenAPI spec |
| Gives v1 answers for v2 questions | v1 entries outrank v2 in search | Two-pass search already prioritises v2; if still wrong, improve the v2 operation summary in the spec |
| Wrong answer for a niche integration | No doc exists for it | Write the doc under `docs/03-developers/` and re-index |

Re-index after doc changes

```shell
node scripts/ai-search/localAiAssistant.js index
```

Or run the full upload directly via curl:

```shell
curl --request POST \
  --url http://localhost:8000/api/index \
  --header 'Content-Type: application/json' \
  --data '{
    "docs_dir": ["/path/to/wallet-ConvenientCheckout/docs"],
    "yaml_dir": ["/path/to/wallet-ConvenientCheckout/build/redocusaurus"],
    "index_name": "wallet-docs-index-v2",
    "overwrite": true
  }'
```

Option 2: Improve the System Prompt (Free, Fast Iteration)

What it does: Changes how the model interprets the retrieved context and formats responses.

See ai-assistant-prompts.md for full details. In short:

  • Add a new audience section for a new consumer group
  • Tighten hallucination controls if the model invents fields
  • Change the response format (sections, order, tone)
  • Add example responses to steer output quality

Prompt changes take effect immediately on server restart; no re-indexing needed.
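
Adding an audience section is typically just string composition at startup; a hedged sketch (the section names and the `buildSystemPrompt` helper are hypothetical, not the actual layout of ai-assistant-prompts.md):

```javascript
// Compose the system prompt from named sections so audience blocks can be
// added, reordered, or tightened without touching the rest of the prompt.
// Section names here are illustrative, not the real prompt file structure.
const promptSections = {
  role: 'You are the CCG documentation assistant.',
  hallucination:
    'Only answer from the provided context. If a field is not in the context, say so.',
  merchants:
    'For merchant questions, lead with the REST endpoint and a curl example.',
};

function buildSystemPrompt(sections) {
  return Object.values(sections).join('\n\n');
}

const systemPrompt = buildSystemPrompt(promptSections);
// Takes effect on the next server restart; no re-index required.
```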


Option 3: Fine-Tuning GPT-4.1 (Paid, Complex, Rarely Needed)

What it is: Fine-tuning trains a copy of a base model on a dataset of your own example conversations. The result is a new deployment with baked-in CCG knowledge: it no longer needs the full RAG context for common questions.

Can we do it?

Yes. Azure OpenAI supports fine-tuning for GPT-4.1 as of 2025. Steps:

  1. Prepare a dataset: a minimum of ~50–200 conversation examples in JSONL format (one JSON object per line in the actual file; wrapped here for readability):

     ```json
     {"messages": [
       {"role": "system", "content": "<your system prompt>"},
       {"role": "user", "content": "How do I create a checkout session?"},
       {"role": "assistant", "content": "**Summary**\nTo create a checkout session call POST /v2/sessions..."}
     ]}
     ```
  2. Upload the dataset via Azure OpenAI Studio or the REST API

  3. Trigger a fine-tuning job; expect 30 minutes to several hours depending on dataset size

  4. Deploy the fine-tuned model under a new deployment name (e.g. `gpt-4-1-ccg`)

  5. Update `AZURE_OPENAI_DEPLOYMENT` in `walletAIService.js` or the explicit server config
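
Before uploading (step 2), it is worth validating the dataset shape locally; a minimal sketch of a per-line check that mirrors the example format above (the helper itself is not part of the repo):

```javascript
// Validate one JSONL line of fine-tuning data: it must parse and contain
// a system -> user -> assistant message sequence with non-empty content.
function validateExample(line) {
  const { messages } = JSON.parse(line);
  const roles = messages.map((m) => m.role);
  const expected = ['system', 'user', 'assistant'];
  return (
    roles.length === expected.length &&
    roles.every((r, i) => r === expected[i]) &&
    messages.every((m) => typeof m.content === 'string' && m.content.length > 0)
  );
}

const good = JSON.stringify({
  messages: [
    { role: 'system', content: '<your system prompt>' },
    { role: 'user', content: 'How do I create a checkout session?' },
    { role: 'assistant', content: 'Call POST /v2/sessions...' },
  ],
});
```

Run it over every line of the `.jsonl` file before spending money on a training job; malformed examples are the most common cause of rejected uploads.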

Should we?

Probably not yet. Fine-tuning is the right tool when:

  • The index-and-prompt approach has been exhausted and quality is still poor
  • You need the model to respond in a highly specific, rigid format that the system prompt can't enforce
  • You are seeing the same hallucinations repeatedly despite prompt tightening
  • You want to reduce input tokens by removing the need to pass context (the model "knows" the docs)

For CCG right now, improving documentation coverage and keeping the system prompt up to date will outperform any fine-tuning effort at a fraction of the cost.

Cost warning

Fine-tuning GPT-4.1 costs:

  • Training: approximately $25 per 1M training tokens
  • Hosting: fine-tuned deployments have a dedicated capacity charge (~$3–6/hour in addition to per-token usage)

A fine-tuned deployment is only cost-effective if request volume is high enough (>10 000 requests/day) to justify the always-on hosting fee.
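
That threshold can be sanity-checked with simple arithmetic; a sketch using the hosting figure above (the per-request token saving and the input-token price are placeholder assumptions, so check current Azure pricing before relying on the result):

```javascript
// Rough break-even: the daily hosting fee must be covered by the input
// tokens saved per request. All numbers are placeholders for illustration.
const hostingPerDay = 4 * 24;           // ~$4/hour dedicated capacity charge
const tokensSavedPerRequest = 3000;     // RAG context no longer sent per request
const inputPricePerToken = 2 / 1e6;     // assumed $2 per 1M input tokens

const savingPerRequest = tokensSavedPerRequest * inputPricePerToken;
const breakEvenRequestsPerDay = Math.round(hostingPerDay / savingPerRequest);
// With these placeholder numbers, break-even lands around 16,000 requests/day,
// in the same ballpark as the ">10 000 requests/day" rule of thumb above.
```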


Option 4: Retrieval Augmentation (Expanding the Knowledge Base)

What it does: Adds new content types to the index (not just markdown and OpenAPI, but also videos, PDFs, external pages, or GitHub issues) so the model can draw from a wider source of truth.

The upload script (`scripts/uploadToAzureSearch.js`) is designed to be extended. To add a new content source:

  1. Write a new `load*Entries()` function in the upload script
  2. Produce an array of objects with the same schema as `loadDocEntries()` returns
  3. Append to the `allDocs` array before the upload loop

Example: adding internal Confluence pages

```javascript
async function loadConfluenceEntries() {
  // Fetch from the Confluence REST API, parse HTML -> plain text, return doc objects
}

const allDocs = [...docEntries, ...specEntries, ...(await loadConfluenceEntries())];
```
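
The authoritative entry schema is whatever `loadDocEntries()` actually returns; purely as a hedged illustration of the kind of object a new loader should produce (every field name below is an assumption, not the real index schema):

```javascript
// Illustrative entry shape for a new content source. Check loadDocEntries()
// in scripts/uploadToAzureSearch.js for the real field names before using this.
function makeEntry({ id, title, content, sourceUrl }) {
  return {
    id,             // stable unique key, needed for overwrite re-runs
    title,
    content,        // plain text; strip HTML before indexing
    sourceUrl,      // lets answers cite where a fact came from
    version: 'v2',  // so two-pass search can prefer v2 entries
  };
}

const entry = makeEntry({
  id: 'confluence-123',
  title: 'Internal Settlement Notes',
  content: 'Example placeholder content for an indexed Confluence page.',
  sourceUrl: 'https://confluence.example.com/pages/123',
});
```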

Summary: Which Option for Which Problem?

| Goal | Best option | Effort |
| --- | --- | --- |
| Model doesn't know about a feature | Add/improve the doc, then re-index | Low |
| Model invents fields or endpoints | Tighten the system prompt | Low |
| Model uses wrong tone or format | Update the system prompt format section | Low |
| Model gives outdated answers | Re-index after updating docs | Low |
| Model needs to understand a completely new domain | Add new content source or fine-tune | Medium–High |
| Reduce latency for common questions | Fine-tune (baked-in knowledge, shorter context) | High |
| Make the assistant understand audio/images | Multimodal model upgrade | High |