# Usage and Token Budgeting

## How Tokens Are Spent

Tokens are consumed based on input length, output length, and tool usage. Long prompts and repeated context increase usage quickly.

### Simple Example

If you paste a large file and ask for a rewrite, you pay for:

1. The pasted file (input tokens)
2. The model's reasoning about it
3. The full rewritten output (output tokens)

If you do that several times in a row, you can burn a large portion of your daily or monthly allowance quickly.
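A rough way to see why this adds up: a common rule of thumb is about four characters of English text per token. Real tokenizers vary, so treat the sketch below as a ballpark estimate, not an exact count.

```python
# Rough heuristic: ~4 characters per token for English text.
# Real tokenizers vary, so these numbers are ballpark estimates only.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

pasted_file = "x" * 8000             # a pasted file of ~8,000 characters
input_tokens = estimate_tokens(pasted_file)
output_tokens = input_tokens         # assume the rewrite is about the same size

# One rewrite pass costs input + output (any reasoning tokens are extra).
print(input_tokens + output_tokens)  # 4000
```

Repeating that rewrite five times in one day is on the order of 20,000 tokens before any reasoning overhead.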
## Chat Context (Why New Chats Matter)

Each chat keeps a running memory of what you said. That memory grows over time and is sent back to the model with every message, which costs more tokens and can slow responses.

### Best Practice

- Start a new chat for each topic or task.
- Do not keep one chat open for weeks and switch subjects.

### Why This Matters

- Bigger context = more tokens used per message.
- Larger contexts can slow response time.
- Old context can confuse the model and reduce answer quality.
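The growth described above compounds: because the full history is re-sent on every turn, input cost grows with the running total of prior messages. An illustrative sketch (the token counts are invented):

```python
# Each turn re-sends the whole conversation, so input cost grows
# with the running total of prior messages (illustrative numbers).
def cumulative_input_tokens(turn_sizes: list[int]) -> int:
    total, context = 0, 0
    for size in turn_sizes:
        context += size    # the new message joins the context
        total += context   # the entire context is sent with this turn
    return total

# Five turns of 500 tokens each: 500 + 1000 + 1500 + 2000 + 2500
print(cumulative_input_tokens([500] * 5))  # 7500
```

Five identical messages cost three times what they would in isolation (7,500 vs 2,500 tokens), which is why starting fresh chats matters.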
### Use Summaries To Reset Context

When a chat gets long, ask for a short summary and start a new chat using that summary. This keeps context small and saves tokens.

#### Example: Resetting A Long Chat

1. In the long chat, ask: "Summarize the current state and decisions in 8 bullets."
2. Copy the summary into a new chat.
3. Continue from there with a smaller context.
## Model Choice Matters (Plain Language)

Think of each model as a different "speed and cost" setting. Some models are cheap and fast. Some are smarter but cost more for the same question. If you pick a higher-cost model, you can burn through your daily or monthly allowance much faster.
### Models Available (From The Copilot Picker)

- Auto (10% discount)
- GPT-4.1 (0x)
- GPT-4o (0x)
- GPT-5 mini (0x)
- Grok Code Fast 1
- Claude Haiku 4.5 (0.33x)
- Claude Opus 4.5 (3x)
- Claude Opus 4.6 (3x)
- Claude Sonnet 4 (1x)
- Claude Sonnet 4.5 (1x)
- Gemini 2.5 Pro (1x)
- Gemini 3 Flash (Preview) (0.33x)
- Gemini 3 Pro (Preview) (1x)
- GPT-5 (1x)
- GPT-5-Codex (Preview) (1x)
- GPT-5.1 (1x)
- GPT-5.1-Codex (1x)
- GPT-5.1-Codex-Max (1x)
- GPT-5.1-Codex-Mini (Preview) (0.33x)
- GPT-5.2 (1x)
- GPT-5.2-Codex (1x)
### Practical Guidance (Plain Language)

- Use cheaper models for summaries, quick questions, and small edits.
- Use expensive models only when the task is truly complex or high-stakes.
- If you are unsure, start with Auto or a 0.33x or 1x option, then move up only if needed.
## Chat Modes And Cost (Plain Language)

Copilot chat has four modes. The lighter the mode, the less work you ask it to do.

Ask: Questions, summaries, or explanations. Example: "Summarize this file in 5 bullets and list 2 risks."

Edit: Small, targeted changes with clear constraints. Example: "Update this function to return nil on empty input. Keep behavior the same otherwise."

Plan: Get steps before edits. Example: "Give me a 5-step plan to refactor this module. Wait for approval before edits."

Agent: Multi-step work across files or tools. Example: "Refactor these two files, update tests, run the test task, and summarize results."

#### Example: Choosing A Model

- Task: "Summarize this file in 5 bullets." Use a 0.33x or 1x model.
- Task: "Refactor three files and update tests." Start with a 1x model. Move to 3x only if the 1x model fails.
- Task: "Explain a confusing production issue with lots of context." Start with 1x, and only move up if needed.
### Quick Glossary

- Model: The "brain" Copilot uses to answer your question.
- Multiplier: A cost factor applied to each request. A higher number spends your allowance faster.
- Tokens: The units that count your AI usage (roughly input + output size).
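Putting the multiplier idea into numbers: the sketch below just multiplies requests by the factor shown in the picker list above. It is an illustration, not an official billing formula, and the helper name is made up.

```python
# Sketch: allowance spent = requests * model multiplier.
# Multipliers copied from the picker list above; not an official formula.
MULTIPLIERS = {
    "Claude Haiku 4.5": 0.33,
    "GPT-5.1": 1.0,
    "Claude Opus 4.6": 3.0,
}

def allowance_spent(requests: int, model: str) -> float:
    return round(requests * MULTIPLIERS[model], 2)

print(allowance_spent(10, "Claude Opus 4.6"))   # 30.0
print(allowance_spent(10, "Claude Haiku 4.5"))  # 3.3
```

Ten requests to a 3x model spend as much as roughly ninety requests to a 0.33x model, which is the whole case for picking the cheapest model that can do the job.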
|
## Best Practices to Reduce Usage

- Use clear, bounded requests with specific goals.
- Prefer targeted edits over full rewrites.
- Reuse context by referencing earlier outputs instead of re-pasting.
- Ask for summaries before requesting changes.
### Before And After Example

Bad: "Rewrite this entire module and update all tests."

Better: "Only refactor the validation functions in this module. Keep existing behavior. List tests to update."
## Examples of Efficient Prompts

- "Summarize this file in 5 bullets. Then propose a refactor plan."
- "Update only the functions in this file that handle validation."
- "List risks in this change and suggest tests to add."
## Daily and Monthly Budgeting Tips

- Batch related questions in a single prompt.
- Timebox explorations and stop when you have enough information.
- Avoid repeated retries without changing the prompt.
### Example: Timeboxed Session

1. Ask for a 5-step plan.
2. Approve or adjust.
3. Ask for just step 1 or 2.
4. Stop and summarize before moving on.
## Budgeting Routine

- Start with a plan-first request for large tasks.
- Limit each request to one output type.
- End sessions with a short summary for easy follow-up.
### Example: One Output Type

Instead of: "Refactor the file, explain it, and add tests."

Use: "Refactor the file only. Do not explain or add tests."

Then follow up with a separate request if needed.
## Red Flags That Burn Tokens Quickly

- Large file pastes with no clear ask.
- Multiple full rewrites in one session.
- Repeated "start over" requests.
## How You Can Burn A Full Day Fast (Example Scenarios)

- You paste multiple large files and ask a 3x model to rewrite everything plus tests.
- You keep asking a high-cost model to "start over" with a new approach.
- You run a long debugging session on a big codebase using a 3x model for every step.
- You ask a high-cost model for full architecture diagrams and long explanations in one session.
### Realistic "New User" Scenario

You open a single chat and do this all day:

1. Paste a large file and ask for a full rewrite.
2. Ask for a different rewrite using another approach.
3. Ask for full tests.
4. Ask for a full explanation of the changes.
5. Repeat with another file.

If each step is done with a 3x model and a growing chat context, your token use can spike quickly and responses can slow down.
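Rough arithmetic for the scenario above (the per-step token count is invented for illustration; the point is that a growing context and a 3x multiplier compound each other):

```python
# Illustrative only: each step adds ~2,000 tokens, the whole chat
# history is re-sent every step, and the model carries a 3x multiplier.
STEP_TOKENS = 2000
MULTIPLIER = 3

total, context = 0, 0
for step in range(5):
    context += STEP_TOKENS         # the chat context keeps growing
    total += context * MULTIPLIER  # every step pays for the full context

print(total)  # 90000
```

The same five steps in fresh chats on a 1x model would cost roughly 10,000 effective tokens instead of 90,000, a factor of nine from two habits combined.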
## Team Habits That Help

- Capture reusable prompts in a shared doc.
- Standardize request templates.
- Agree on when to use agents vs chat.
### Use The Right Chat Mode

Using the lightest chat mode for the job keeps outputs smaller and cheaper.

Example:

Ask: "Summarize this file in 5 bullets."

Plan: "Give me a 5-step plan to refactor this module. Wait for approval."

Edit: "Update this function to handle empty input. Keep behavior the same otherwise."

Agent: "Refactor these two files, update tests, and summarize results."
## Quick Checklist

- Is the request specific and scoped?
- Do I need the whole file or just a section?
- Can I ask for a plan first?