# Training & Fine-Tuning
This page answers the most common question about AI assistants: "Can we train this to know more / behave better?"
Short answer: you don't train GPT-4.1 itself. But there are several progressively more powerful ways to make the assistant smarter for CCG, and the two cheapest require no model changes at all.
## Option 1: Improve the Index (Free, Highest ROI)
What it does: Feeds the model better, more complete documentation to reason from.
The assistant retrieves documentation from Azure AI Search and passes it as context to GPT-4.1. If the retrieved content is missing, incomplete, or stale, the model has nothing to work with, no matter how capable it is.
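Conceptually the flow is: search the index, collect the top chunks, and prepend them to the chat messages. A minimal sketch of the context-assembly step follows; the function name and chunk field names are illustrative assumptions, not the actual `walletAIService.js` code:

```javascript
// Illustrative only: assembles retrieved search hits into chat messages.
// Field names (title, content) are assumed; match them to the real index schema.
function buildChatMessages(systemPrompt, retrievedChunks, userQuestion) {
  // Delimit each retrieved doc clearly so the model can tell sources apart.
  const context = retrievedChunks
    .map((c, i) => `[Doc ${i + 1}: ${c.title}]\n${c.content}`)
    .join("\n\n---\n\n");

  return [
    { role: "system", content: systemPrompt },
    {
      role: "user",
      content: `Context:\n${context}\n\nQuestion: ${userQuestion}`,
    },
  ];
}

// Example usage with a fake search result:
const messages = buildChatMessages(
  "You are the CCG docs assistant. Answer only from the provided context.",
  [{ title: "Sessions API", content: "POST /v2/sessions creates a checkout session." }],
  "How do I create a checkout session?"
);
```

The key property: if the index returns nothing useful, `context` is empty and the model has nothing to answer from, which is why index quality dominates answer quality.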
### What to improve
| Problem symptom | Root cause | Fix |
|---|---|---|
| "I don't have enough information…" for a documented feature | Page is missing from the index | Run `yarn build`, then `node scripts/uploadToAzureSearch.js` after adding the doc |
| Answer omits important fields | Doc doesn't mention them | Add the fields to the relevant `.md` file or OpenAPI spec |
| Gives v1 answers for v2 questions | v1 entries outrank v2 in search | Two-pass search already prioritises v2; if still wrong, improve the v2 operation summary in the spec |
| Wrong answer for a niche integration | No doc exists for it | Write the doc under `docs/03-developers/` and re-index |
### Re-index after doc changes

```bash
node scripts/ai-search/localAiAssistant.js index
```
Or run the full upload directly via curl:
```bash
curl --request POST \
  --url http://localhost:8000/api/index \
  --header 'Content-Type: application/json' \
  --data '{
    "docs_dir": ["/path/to/wallet-ConvenientCheckout/docs"],
    "yaml_dir": ["/path/to/wallet-ConvenientCheckout/build/redocusaurus"],
    "index_name": "wallet-docs-index-v2",
    "overwrite": true
  }'
```
## Option 2: Improve the System Prompt (Free, Fast Iteration)
What it does: Changes how the model interprets the retrieved context and formats responses.
See `ai-assistant-prompts.md` for full details. In short:
- Add a new audience section for a new consumer group
- Tighten hallucination controls if the model invents fields
- Change the response format (sections, order, tone)
- Add example responses to steer output quality
Prompt changes take effect immediately on server restart; no re-indexing is needed.
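One way to keep these edits low-risk is to compose the system prompt from named sections, so an audience section or a hallucination rule can be added without touching the rest. A hedged sketch (the section names and wording below are invented for illustration; see `ai-assistant-prompts.md` for the real structure):

```javascript
// Illustrative sketch: build the system prompt from ordered, named sections so
// each one (audience, hallucination controls, format) can be edited independently.
// All section contents here are invented examples.
const promptSections = {
  role: "You are the CCG developer documentation assistant.",
  audiences: [
    "For merchants: focus on configuration and dashboard steps.",
    "For integrators: include endpoint paths and field names.",
  ],
  hallucinationControls:
    "Only state fields and endpoints that appear in the provided context. " +
    "If the context is insufficient, say so instead of guessing.",
  format: "Structure answers as: Summary, then Steps, then References.",
};

function buildSystemPrompt(sections) {
  return [
    sections.role,
    ...sections.audiences,
    sections.hallucinationControls,
    sections.format,
  ].join("\n\n");
}

const systemPrompt = buildSystemPrompt(promptSections);
```

Adding a new consumer group then becomes a one-line append to `audiences` rather than a rewrite of the whole prompt.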
## Option 3: Fine-Tuning GPT-4.1 (Paid, Complex, Rarely Needed)
What it is: Fine-tuning trains a copy of a base model on a dataset of your own example conversations. The result is a new deployment that has baked-in CCG knowledge: it no longer needs the full RAG context for common questions.
### Can we do it?
Yes. Azure OpenAI supports fine-tuning for GPT-4.1 as of 2025. Steps:
1. Prepare a dataset: a minimum of ~50–200 conversation examples in JSONL format:

   ```json
   {"messages": [
     {"role": "system", "content": "<your system prompt>"},
     {"role": "user", "content": "How do I create a checkout session?"},
     {"role": "assistant", "content": "**Summary**\nTo create a checkout session call POST /v2/sessions..."}
   ]}
   ```

2. Upload the dataset via Azure OpenAI Studio or the REST API
3. Trigger a fine-tuning job: takes 30 minutes to several hours depending on dataset size
4. Deploy the fine-tuned model under a new deployment name (e.g. `gpt-4-1-ccg`)
5. Update `AZURE_OPENAI_DEPLOYMENT` in `walletAIService.js` or the explicit server config
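Before uploading, it's worth validating the JSONL locally: every line must be a standalone JSON object whose `messages` array contains at least a system, a user, and an assistant message. A small sketch of such a check (this is a homegrown helper, not an official Azure tool):

```javascript
// Illustrative validator for a fine-tuning JSONL dataset: each non-empty line
// must parse as {"messages": [...]} with system, user, and assistant roles.
function validateJsonl(jsonlText) {
  const errors = [];
  const lines = jsonlText.split("\n").filter((l) => l.trim() !== "");
  lines.forEach((line, i) => {
    let obj;
    try {
      obj = JSON.parse(line);
    } catch {
      errors.push(`line ${i + 1}: not valid JSON`);
      return;
    }
    const roles = (obj.messages || []).map((m) => m.role);
    for (const required of ["system", "user", "assistant"]) {
      if (!roles.includes(required)) {
        errors.push(`line ${i + 1}: missing ${required} message`);
      }
    }
  });
  return { count: lines.length, errors };
}

// One well-formed example line:
const sample =
  '{"messages": [' +
  '{"role": "system", "content": "<your system prompt>"},' +
  '{"role": "user", "content": "How do I create a checkout session?"},' +
  '{"role": "assistant", "content": "Call POST /v2/sessions..."}]}';

const report = validateJsonl(sample);
```

Catching a malformed line locally is much cheaper than having a multi-hour training job fail on it.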
### Should we?
Probably not yet. Fine-tuning is the right tool when:
- The index-and-prompt approach has been exhausted and quality is still poor
- You need the model to respond in a highly specific, rigid format that the system prompt can't enforce
- You are seeing the same hallucinations repeatedly despite prompt tightening
- You want to reduce input tokens by removing the need to pass context (the model "knows" the docs)
For CCG right now, improving documentation coverage and keeping the system prompt up to date will outperform any fine-tuning effort at a fraction of the cost.
### Cost warning
Fine-tuning GPT-4.1 costs:
- Training: approximately $25 per 1M training tokens
- Hosting: fine-tuned deployments carry a dedicated capacity charge (~$3–6/hour in addition to per-token usage)
A fine-tuned deployment is only cost-effective if request volume is high enough (>10 000 requests/day) to justify the always-on hosting fee.
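The break-even point can be sanity-checked with a quick calculation. The sketch below uses the approximate hosting figure from this page plus placeholder per-token prices (the $2/$8 per million input/output tokens and the 4,000-token RAG context are assumptions; check current Azure pricing before deciding):

```javascript
// Rough monthly-cost comparison. hostingPerHour and the per-M-token prices
// are placeholders, not quoted Azure rates.
function monthlyCost({ requestsPerDay, inputTokensPerRequest, outputTokensPerRequest,
                       hostingPerHour, pricePerMTokenIn, pricePerMTokenOut }) {
  const hosting = hostingPerHour * 24 * 30; // always-on capacity charge
  const tokens =
    (requestsPerDay * 30 *
      (inputTokensPerRequest * pricePerMTokenIn +
       outputTokensPerRequest * pricePerMTokenOut)) / 1e6;
  return hosting + tokens;
}

// Fine-tuned deployment: hosting fee, but short inputs (no RAG context needed).
const fineTuned = monthlyCost({
  requestsPerDay: 10000, inputTokensPerRequest: 500, outputTokensPerRequest: 400,
  hostingPerHour: 4.5, pricePerMTokenIn: 2, pricePerMTokenOut: 8,
});

// Base model with RAG: no hosting fee, but ~4,000 extra context tokens per request.
const ragBased = monthlyCost({
  requestsPerDay: 10000, inputTokensPerRequest: 4500, outputTokensPerRequest: 400,
  hostingPerHour: 0, pricePerMTokenIn: 2, pricePerMTokenOut: 8,
});
```

The comparison hinges almost entirely on how many context tokens fine-tuning lets you drop per request versus the fixed hosting fee, which is why low request volumes rarely justify it.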
## Option 4: Retrieval Augmentation – Expanding the Knowledge Base
What it does: Adds new content types to the index (not just markdown and OpenAPI, but also videos, PDFs, external pages, or GitHub issues) so the model can draw on a wider source of truth.
The upload script (`scripts/uploadToAzureSearch.js`) is designed to be extended. To add a new content source:
- Write a new `load*Entries()` function in the upload script
- Produce an array of objects with the same schema as `loadDocEntries()` returns
- Append to the `allDocs` array before the upload loop
Example: adding internal Confluence pages

```js
async function loadConfluenceEntries() {
  // Fetch from Confluence REST API, parse HTML → plain text, return doc objects
}

const allDocs = [...docEntries, ...specEntries, ...await loadConfluenceEntries()];
```
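The exact schema `loadDocEntries()` returns isn't shown on this page, so the fleshed-out sketch below assumes a typical shape (`id`, `title`, `content`, `url`, `source`); adjust the field names to whatever the upload script actually emits:

```javascript
// Hypothetical: the entry fields below are assumed, not copied from
// uploadToAzureSearch.js. Match them to what loadDocEntries() really returns.
async function loadConfluenceEntries() {
  // A real implementation would page through the Confluence REST API and
  // strip HTML properly; here we return canned pages in the assumed schema.
  const pages = [
    { id: "1234", title: "CCG Runbook", body: "<p>Escalation steps...</p>" },
  ];
  return pages.map((p) => ({
    id: `confluence-${p.id}`,
    title: p.title,
    content: p.body.replace(/<[^>]+>/g, ""), // crude HTML-to-text
    url: `https://confluence.example.com/pages/${p.id}`, // placeholder host
    source: "confluence",
  }));
}
```

Because the entries share the same shape as the markdown and OpenAPI entries, they can simply be spread into `allDocs` before the upload loop, and the rest of the pipeline needs no changes.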
## Summary: Which Option for Which Problem?
| Goal | Best option | Effort |
|---|---|---|
| Model doesn't know about a feature | Add/improve the doc → re-index | Low |
| Model invents fields or endpoints | Tighten the system prompt | Low |
| Model uses wrong tone or format | Update the system prompt format section | Low |
| Model gives outdated answers | Re-index after updating docs | Low |
| Model needs to understand a completely new domain | Add new content source or fine-tune | Medium–High |
| Reduce latency for common questions | Fine-tune (baked-in knowledge, shorter context) | High |
| Make the assistant understand audio/images | Multimodal model upgrade | High |