Usage and Token Budgeting
How Tokens Are Spent
Tokens are consumed based on input length, output length, and tool usage. Long prompts and repeated context increase usage quickly.
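A quick way to build intuition is the common rule of thumb that one token is roughly four characters of English text. This is only a ballpark (real tokenizers vary by model and content), but it is good enough for budgeting:

```python
# Rough token estimate using the ~4 characters per token heuristic.
# Real tokenizers vary by model; treat this as a ballpark, not a bill.

def estimate_tokens(text: str) -> int:
    """Approximate token count: about one token per 4 characters."""
    return max(1, len(text) // 4)

# A 2,200-character prompt costs roughly 550 input tokens,
# before any output or tool usage is counted.
prompt = "Rewrite this function to handle empty input." * 50
print(estimate_tokens(prompt))  # 550
```

Pasting a 40 KB file therefore costs on the order of 10,000 input tokens before the model has produced a single word of output.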
Simple Example
If you paste a large file and ask for a rewrite, you pay for:
- The pasted file (input tokens)
- The model reasoning about it
- The full rewritten output (output tokens)
If you do that several times in a row, you can quickly burn a large portion of your daily or monthly allowance.
Chat Context (Why New Chats Matter)
Each chat keeps a running history of the conversation. That history grows over time and is re-sent to the model with every new message, which costs more tokens and can slow responses.
Best Practice
- Start a new chat for each topic or task.
- Do not keep one chat open for weeks and switch subjects.
Why This Matters
- Bigger context = more tokens used per message.
- Larger contexts can slow response time.
- Old context can confuse the model and reduce answer quality.
Use Summaries To Reset Context
When a chat gets long, ask for a short summary and start a new chat using that summary. This keeps context small and saves tokens.
Example: Resetting A Long Chat
- In the long chat, ask: "Summarize the current state and decisions in 8 bullets."
- Copy the summary into a new chat.
- Continue from there with a smaller context.
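The savings compound, because a chat that re-sends its full history pays for every earlier message again on each turn. A rough sketch (the 500-tokens-per-message figure is an illustrative assumption, not a measured value):

```python
# Illustrative: when each turn re-sends the full chat history, total input
# tokens grow roughly quadratically with the number of turns.

def total_input_tokens(turns: int, tokens_per_message: int) -> int:
    # Turn k sends all k messages so far as input.
    return sum(k * tokens_per_message for k in range(1, turns + 1))

growing = total_input_tokens(10, 500)  # one long chat: 27,500 input tokens
flat = 10 * 500                        # ten fresh, summarized chats: 5,000

print(growing, flat)  # 27500 5000
```

Summarizing and restarting after a handful of turns keeps you near the flat cost instead of the growing one.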
Model Choice Matters
Think of each model as a different "speed and cost" setting. Some models are cheap and fast. Some are smarter but cost more for the same question. If you pick a higher-cost model, you can burn through your daily or monthly allowance much faster.
Models Available (From The Copilot Picker)
- Auto (10% discount)
- GPT-4.1 (0x)
- GPT-4o (0x)
- GPT-5 mini (0x)
- Grok Code Fast 1
- Claude Haiku 4.5 (0.33x)
- Claude Opus 4.5 (3x)
- Claude Opus 4.6 (3x)
- Claude Sonnet 4 (1x)
- Claude Sonnet 4.5 (1x)
- Gemini 2.5 Pro (1x)
- Gemini 3 Flash (Preview) (0.33x)
- Gemini 3 Pro (Preview) (1x)
- GPT-5 (1x)
- GPT-5-Codex (Preview) (1x)
- GPT-5.1 (1x)
- GPT-5.1-Codex (1x)
- GPT-5.1-Codex-Max (1x)
- GPT-5.1-Codex-Mini (Preview) (0.33x)
- GPT-5.2 (1x)
- GPT-5.2-Codex (1x)
Practical Guidance
- Use cheaper models for summaries, quick questions, and small edits.
- Use expensive models only when the task is truly complex or high-stakes.
- If you are unsure, start with Auto or a 0.33x or 1x option, then move up only if needed.
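The multiplier math is simple but worth seeing once. The 300-request monthly allowance below is a hypothetical number for illustration, not an official Copilot figure; the multipliers are the ones from the picker list above:

```python
# Illustrative only: MONTHLY_ALLOWANCE is a made-up budget, not an
# official Copilot number. Multipliers are from the model picker.

MONTHLY_ALLOWANCE = 300  # hypothetical premium-request budget

def requests_available(multiplier: float) -> float:
    """Each request costs `multiplier` units of the allowance."""
    return MONTHLY_ALLOWANCE / multiplier

for name, mult in [("Claude Haiku 4.5", 0.33),
                   ("GPT-5.1", 1.0),
                   ("Claude Opus 4.5", 3.0)]:
    print(f"{name}: ~{requests_available(mult):.0f} requests")
```

Whatever the real allowance is, the ratio holds: a 3x model gives you one third the requests of a 1x model, and a 0.33x model gives you about three times as many.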
Chat Modes And Cost
Copilot chat has four modes. The lighter the mode, the less work you ask it to do.
Ask: Questions, summaries, or explanations.
Example prompt:
Summarize this file in 5 bullets and list 2 risks.
Edit: Small, targeted changes with clear constraints.
Example prompt:
Update this function to return nil on empty input. Keep behavior the same otherwise.
Plan: Get steps before edits.
Example prompt:
Give me a 5-step plan to refactor this module. Wait for approval before edits.
Agent: Multi-step work across files or tools.
Example prompt:
Refactor these two files, update tests, run the test task, and summarize results.
Example: Choosing A Model
- Task: "Summarize this file in 5 bullets." Use a 0.33x or 1x model.
- Task: "Refactor three files and update tests." Start with a 1x model. Move to 3x only if the 1x model fails.
- Task: "Explain a confusing production issue with lots of context." Start with 1x, and only move up if needed.
Quick Glossary
- Model: The "brain" Copilot uses to answer your question.
- Multiplier: A cost factor applied to each request. A 3x model spends your allowance three times as fast as a 1x model.
- Tokens: The units that measure AI usage, roughly proportional to input plus output length.
Best Practices to Reduce Usage
- Use clear, bounded requests with specific goals.
- Prefer targeted edits over full rewrites.
- Reuse context by referencing earlier outputs instead of re-pasting.
- Ask for summaries before requesting changes.
Before And After Example
Bad: "Rewrite this entire module and update all tests."
Better: "Only refactor the validation functions in this module. Keep existing behavior. List tests to update."
Examples of Efficient Prompts
- "Summarize this file in 5 bullets. Then propose a refactor plan."
- "Update only the functions in this file that handle validation."
- "List risks in this change and suggest tests to add."
Daily and Monthly Budgeting Tips
- Batch related questions in a single prompt.
- Timebox explorations and stop when enough info is gathered.
- Avoid repeated retries without changing the prompt.
Example: Timeboxed Session
- Ask for a 5-step plan.
- Approve or adjust.
- Ask for just step 1 or 2.
- Stop and summarize before moving on.
Budgeting Routine
- Start with a plan-first request for large tasks.
- Limit each request to one output type.
- End sessions with a short summary for easy follow-up.
Example: One Output Type
- Instead of: "Refactor the file, explain it, and add tests."
- Use: "Refactor the file only. Do not explain or add tests."
- Follow up with a separate request if needed.
Red Flags That Burn Tokens Quickly
- Large file pastes with no clear ask.
- Multiple full rewrites in one session.
- Repeated "start over" requests.
How You Can Burn A Full Day Fast (Example Scenarios)
- You paste multiple large files and ask a 3x model to rewrite everything plus tests.
- You keep asking a high-cost model to "start over" with a new approach.
- You do a long debugging session on a big codebase using a 3x model for every step.
- You ask for full architecture diagrams and long explanations from a high-cost model in one session.
Realistic "New User" Scenario
You open a single chat and do this all day:
- Paste a large file and ask for a full rewrite.
- Ask for a different rewrite using another approach.
- Ask for full tests.
- Ask for a full explanation of the changes.
- Repeat with another file.
If each step is done with a 3x model and a growing chat context, your token use can spike quickly and slow down responses.
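To see why this scenario hurts twice, you can sketch the day with made-up token counts. Every number below is an illustrative assumption; the point is that the growing context and the 3x multiplier compound:

```python
# Hypothetical day from the scenario above. Token counts are invented
# for illustration; cost unit is tokens weighted by the model multiplier.

steps = [
    ("full rewrite of large file", 8000),
    ("rewrite with another approach", 8000),
    ("full tests", 6000),
    ("full explanation", 4000),
]

def weighted_cost(steps, multiplier, context_growth=2000):
    total, context = 0, 0
    for _, tokens in steps:
        total += (tokens + context) * multiplier
        context += context_growth  # chat history re-sent at each step
    return total

print(weighted_cost(steps, 3))  # one long chat on a 3x model: 114000
print(weighted_cost(steps, 1))  # same work at 1x: 38000
```

Splitting the work into fresh chats and using a 1x (or 0.33x) model attacks both factors at once.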
Team Habits That Help
- Capture reusable prompts in a shared doc.
- Standardize request templates.
- Agree on when to use agents vs chat.
Use The Right Chat Mode
Using the lightest chat mode for the job keeps outputs smaller and cheaper.
Example prompts:
Ask: Summarize this file in 5 bullets.
Plan: Give me a 5-step plan to refactor this module. Wait for approval.
Edit: Update this function to handle empty input. Keep behavior the same otherwise.
Agent: Refactor these two files, update tests, and summarize results.
Quick Checklist
- Is the request specific and scoped?
- Do I need the whole file or just a section?
- Can I ask for a plan first?