Version: v2

Training & Fine-Tuning

This page answers the most common question about AI assistants: "Can we train this to know more / behave better?"

Short answer: you don't train GPT-4.1 itself. Instead, there are four progressively more powerful ways to make the assistant smarter for CCG, and the two cheapest require no model changes at all.


Option 1: Improve the Index (Free, Highest ROI)

What it does: Feeds the model better, more complete documentation to reason from.

The assistant retrieves documentation from Azure AI Search and passes it as context to GPT-4.1. If the retrieved content is incomplete, missing details, or stale, the model has nothing to work with, no matter how capable it is.
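
To make that concrete, the retrieval step boils down to assembling search hits into a context string; a minimal sketch (the `formatContext` helper and the `title`/`content` field names are illustrative assumptions, not the actual service code):

```javascript
// Assemble retrieved search hits into a single context string for the model.
// Field names (title, content) are illustrative, not the real index schema.
function formatContext(hits) {
  return hits
    .map((hit, i) => `[Doc ${i + 1}: ${hit.title}]\n${hit.content}`)
    .join('\n\n');
}

const hits = [
  { title: 'Checkout Sessions', content: 'POST /v2/sessions creates a session.' },
  { title: 'Webhooks', content: 'Register a webhook to receive events.' },
];

const context = formatContext(hits);
// The model only sees what retrieval returns: an empty hits array
// means an empty context, no matter how capable the model is.
```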

What to improve

| Problem symptom | Root cause | Fix |
| --- | --- | --- |
| "I don't have enough information…" for a documented feature | Page is missing from the index | Run `yarn build` then `node scripts/uploadToAzureSearch.js` after adding the doc |
| Answer omits important fields | Doc doesn't mention them | Add the fields to the relevant `.md` file or OpenAPI spec |
| Gives v1 answers for v2 questions | v1 entries outrank v2 in search | Two-pass search already prioritises v2; if still wrong, improve the v2 operation summary in the spec |
| Wrong answer for a niche integration | No doc exists for it | Write the doc under `docs/03-developers/` and re-index |

Re-index after doc changes

```shell
node scripts/ai-search/localAiAssistant.js index
```

Or run the full upload directly via curl:

```shell
curl --request POST \
  --url http://localhost:8000/api/index \
  --header 'Content-Type: application/json' \
  --data '{
    "docs_dir": ["/path/to/wallet-ConvenientCheckout/docs"],
    "yaml_dir": ["/path/to/wallet-ConvenientCheckout/build/redocusaurus"],
    "index_name": "wallet-docs-index-v2",
    "overwrite": true
  }'
```

Option 2: Improve the System Prompt (Free, Fast Iteration)

What it does: Changes how the model interprets the retrieved context and formats responses.

See ai-assistant-prompts.md for full details. In short:

  • Add a new audience section for a new consumer group
  • Tighten hallucination controls if the model invents fields
  • Change the response format (sections, order, tone)
  • Add example responses to steer output quality

Prompt changes take effect immediately on server restart; no re-indexing needed.
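
Adding an audience section is typically just string composition at startup; a hedged sketch (the section names and the `buildSystemPrompt` helper are hypothetical, not the actual layout of ai-assistant-prompts.md):

```javascript
// Compose the system prompt from named sections so audience blocks can be
// added, reordered, or tightened without touching the rest of the prompt.
// Section names here are illustrative, not the real prompt file structure.
const promptSections = {
  role: 'You are the CCG documentation assistant.',
  hallucination:
    'Only answer from the provided context. If a field is not in the context, say so.',
  merchants:
    'For merchant questions, lead with the REST endpoint and a curl example.',
};

function buildSystemPrompt(sections) {
  return Object.values(sections).join('\n\n');
}

const systemPrompt = buildSystemPrompt(promptSections);
// Takes effect on the next server restart; no re-index required.
```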


Option 3: Fine-Tuning GPT-4.1 (Paid, Complex, Rarely Needed)

What it is: Fine-tuning trains a copy of a base model on a dataset of your own example conversations. The result is a new deployment with baked-in CCG knowledge: it no longer needs the full RAG context for common questions.

Can we do it?

Yes. Azure OpenAI supports fine-tuning for GPT-4.1 as of 2025. Steps:

  1. Prepare a dataset: a minimum of ~50–200 conversation examples in JSONL format (one JSON object per line in the actual file; wrapped here for readability):

     ```json
     {"messages": [
       {"role": "system", "content": "<your system prompt>"},
       {"role": "user", "content": "How do I create a checkout session?"},
       {"role": "assistant", "content": "**Summary**\nTo create a checkout session call POST /v2/sessions..."}
     ]}
     ```
  2. Upload the dataset via Azure OpenAI Studio or the REST API

  3. Trigger a fine-tuning job; expect 30 minutes to several hours depending on dataset size

  4. Deploy the fine-tuned model under a new deployment name (e.g. `gpt-4-1-ccg`)

  5. Update `AZURE_OPENAI_DEPLOYMENT` in `walletAIService.js` or the explicit server config
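
Before uploading (step 2), it is worth validating the dataset shape locally; a minimal sketch of a per-line check that mirrors the example format above (the helper itself is not part of the repo):

```javascript
// Validate one JSONL line of fine-tuning data: it must parse and contain
// a system -> user -> assistant message sequence with non-empty content.
function validateExample(line) {
  const { messages } = JSON.parse(line);
  const roles = messages.map((m) => m.role);
  const expected = ['system', 'user', 'assistant'];
  return (
    roles.length === expected.length &&
    roles.every((r, i) => r === expected[i]) &&
    messages.every((m) => typeof m.content === 'string' && m.content.length > 0)
  );
}

const good = JSON.stringify({
  messages: [
    { role: 'system', content: '<your system prompt>' },
    { role: 'user', content: 'How do I create a checkout session?' },
    { role: 'assistant', content: 'Call POST /v2/sessions...' },
  ],
});
```

Run it over every line of the `.jsonl` file before spending money on a training job; malformed examples are the most common cause of rejected uploads.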

Should we?

Probably not yet. Fine-tuning is the right tool when:

  • The index-and-prompt approach has been exhausted and quality is still poor
  • You need the model to respond in a highly specific, rigid format that the system prompt can't enforce
  • You are seeing the same hallucinations repeatedly despite prompt tightening
  • You want to reduce input tokens by removing the need to pass context (the model "knows" the docs)

For CCG right now, improving documentation coverage and keeping the system prompt up to date will outperform any fine-tuning effort at a fraction of the cost.

Cost warning

Fine-tuning GPT-4.1 costs:

  • Training: approximately $25 per 1M training tokens
  • Hosting: fine-tuned deployments have a dedicated capacity charge (~$3–6/hour in addition to per-token usage)

A fine-tuned deployment is only cost-effective if request volume is high enough (>10 000 requests/day) to justify the always-on hosting fee.
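
That threshold can be sanity-checked with simple arithmetic; a sketch using the hosting figure above (the per-request token saving and the input-token price are placeholder assumptions, so check current Azure pricing before relying on the result):

```javascript
// Rough break-even: the daily hosting fee must be covered by the input
// tokens saved per request. All numbers are placeholders for illustration.
const hostingPerDay = 4 * 24;           // ~$4/hour dedicated capacity charge
const tokensSavedPerRequest = 3000;     // RAG context no longer sent per request
const inputPricePerToken = 2 / 1e6;     // assumed $2 per 1M input tokens

const savingPerRequest = tokensSavedPerRequest * inputPricePerToken;
const breakEvenRequestsPerDay = Math.round(hostingPerDay / savingPerRequest);
// With these placeholder numbers, break-even lands around 16,000 requests/day,
// in the same ballpark as the ">10 000 requests/day" rule of thumb above.
```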


Option 4: Retrieval Augmentation (Expanding the Knowledge Base)

What it does: Adds new content types to the index (not just markdown and OpenAPI, but also videos, PDFs, external pages, or GitHub issues) so the model can draw from a wider source of truth.

The upload script (`scripts/uploadToAzureSearch.js`) is designed to be extended. To add a new content source:

  1. Write a new `load*Entries()` function in the upload script
  2. Produce an array of objects with the same schema as `loadDocEntries()` returns
  3. Append to the `allDocs` array before the upload loop

Example: adding internal Confluence pages

```javascript
async function loadConfluenceEntries() {
  // Fetch from the Confluence REST API, parse HTML -> plain text, return doc objects
}

const allDocs = [...docEntries, ...specEntries, ...(await loadConfluenceEntries())];
```
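
The authoritative entry schema is whatever `loadDocEntries()` actually returns; purely as a hedged illustration of the kind of object a new loader should produce (every field name below is an assumption, not the real index schema):

```javascript
// Illustrative entry shape for a new content source. Check loadDocEntries()
// in scripts/uploadToAzureSearch.js for the real field names before using this.
function makeEntry({ id, title, content, sourceUrl }) {
  return {
    id,             // stable unique key, needed for overwrite re-runs
    title,
    content,        // plain text; strip HTML before indexing
    sourceUrl,      // lets answers cite where a fact came from
    version: 'v2',  // so two-pass search can prefer v2 entries
  };
}

const entry = makeEntry({
  id: 'confluence-123',
  title: 'Internal Settlement Notes',
  content: 'Example placeholder content for an indexed Confluence page.',
  sourceUrl: 'https://confluence.example.com/pages/123',
});
```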

Summary: Which Option for Which Problem?

| Goal | Best option | Effort |
| --- | --- | --- |
| Model doesn't know about a feature | Add/improve the doc, then re-index | Low |
| Model invents fields or endpoints | Tighten the system prompt | Low |
| Model uses wrong tone or format | Update the system prompt format section | Low |
| Model gives outdated answers | Re-index after updating docs | Low |
| Model needs to understand a completely new domain | Add new content source or fine-tune | Medium–High |
| Reduce latency for common questions | Fine-tune (baked-in knowledge, shorter context) | High |
| Make the assistant understand audio/images | Multimodal model upgrade | High |