Usage and Token Budgeting
How Tokens Are Spent
Tokens are consumed based on input length, output length, and tool usage. Long prompts and repeated context increase usage quickly.
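A quick way to build intuition is the common rule of thumb that one token is roughly four characters of English text. This is only a ballpark (real tokenizers vary by model and content), but it is good enough for budgeting:

```python
# Rough token estimate using the ~4 characters per token heuristic.
# Real tokenizers vary by model; treat this as a ballpark, not a bill.

def estimate_tokens(text: str) -> int:
    """Approximate token count: about one token per 4 characters."""
    return max(1, len(text) // 4)

# A 2,200-character prompt costs roughly 550 input tokens,
# before any output or tool usage is counted.
prompt = "Rewrite this function to handle empty input." * 50
print(estimate_tokens(prompt))  # 550
```

Pasting a 40 KB file therefore costs on the order of 10,000 input tokens before the model has produced a single word of output.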
Simple Example
If you paste a large file and ask for a rewrite, you pay for:
- The pasted file (input tokens)
- The model reasoning about it
- The full rewritten output (output tokens)
If you do that several times in a row, you can quickly burn a large portion of your daily or monthly allowance.
Chat Context (Why New Chats Matter)
Each chat keeps a running history of the conversation. That history grows over time and is re-sent to the model with every new message, which costs more tokens and can slow responses.
Best Practice
- Start a new chat for each topic or task.
- Do not keep one chat open for weeks and switch subjects.
Why This Matters
- Bigger context = more tokens used per message.
- Larger contexts can slow response time.
- Old context can confuse the model and reduce answer quality.
Use Summaries To Reset Context
When a chat gets long, ask for a short summary and start a new chat using that summary. This keeps context small and saves tokens.
Example: Resetting A Long Chat
- In the long chat, ask: "Summarize the current state and decisions in 8 bullets."
- Copy the summary into a new chat.
- Continue from there with a smaller context.
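The savings compound, because a chat that re-sends its full history pays for every earlier message again on each turn. A rough sketch (the 500-tokens-per-message figure is an illustrative assumption, not a measured value):

```python
# Illustrative: when each turn re-sends the full chat history, total input
# tokens grow roughly quadratically with the number of turns.

def total_input_tokens(turns: int, tokens_per_message: int) -> int:
    # Turn k sends all k messages so far as input.
    return sum(k * tokens_per_message for k in range(1, turns + 1))

growing = total_input_tokens(10, 500)  # one long chat: 27,500 input tokens
flat = 10 * 500                        # ten fresh, summarized chats: 5,000

print(growing, flat)  # 27500 5000
```

Summarizing and restarting after a handful of turns keeps you near the flat cost instead of the growing one.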
Model Choice Matters
Think of each model as a different "speed and cost" setting. Some models are cheap and fast. Some are smarter but cost more for the same question. If you pick a higher-cost model, you can burn through your daily or monthly allowance much faster.
Models Available (From The Copilot Picker)
- Auto (10% discount)
- GPT-4.1 (0x)
- GPT-4o (0x)
- GPT-5 mini (0x)
- Grok Code Fast 1
- Claude Haiku 4.5 (0.33x)
- Claude Opus 4.5 (3x)
- Claude Opus 4.6 (3x)
- Claude Sonnet 4 (1x)
- Claude Sonnet 4.5 (1x)
- Gemini 2.5 Pro (1x)
- Gemini 3 Flash (Preview) (0.33x)
- Gemini 3 Pro (Preview) (1x)
- GPT-5 (1x)
- GPT-5-Codex (Preview) (1x)
- GPT-5.1 (1x)
- GPT-5.1-Codex (1x)
- GPT-5.1-Codex-Max (1x)
- GPT-5.1-Codex-Mini (Preview) (0.33x)
- GPT-5.2 (1x)
- GPT-5.2-Codex (1x)
Practical Guidance
- Use cheaper models for summaries, quick questions, and small edits.
- Use expensive models only when the task is truly complex or high-stakes.
- If you are unsure, start with Auto or a 0.33x or 1x option, then move up only if needed.
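The multiplier math is simple but worth seeing once. The 300-request monthly allowance below is a hypothetical number for illustration, not an official Copilot figure; the multipliers are the ones from the picker list above:

```python
# Illustrative only: MONTHLY_ALLOWANCE is a made-up budget, not an
# official Copilot number. Multipliers are from the model picker.

MONTHLY_ALLOWANCE = 300  # hypothetical premium-request budget

def requests_available(multiplier: float) -> float:
    """Each request costs `multiplier` units of the allowance."""
    return MONTHLY_ALLOWANCE / multiplier

for name, mult in [("Claude Haiku 4.5", 0.33),
                   ("GPT-5.1", 1.0),
                   ("Claude Opus 4.5", 3.0)]:
    print(f"{name}: ~{requests_available(mult):.0f} requests")
```

Whatever the real allowance is, the ratio holds: a 3x model gives you one third the requests of a 1x model, and a 0.33x model gives you about three times as many.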
Chat Modes And Cost
Copilot chat has four modes. The lighter the mode, the less work you ask it to do.
Ask: Questions, summaries, or explanations.
Example prompt:
Summarize this file in 5 bullets and list 2 risks.
Edit: Small, targeted changes with clear constraints.
Example prompt:
Update this function to return nil on empty input. Keep behavior the same otherwise.
Plan: Get steps before edits.
Example prompt:
Give me a 5-step plan to refactor this module. Wait for approval before edits.
Agent: Multi-step work across files or tools.
Example prompt:
Refactor these two files, update tests, run the test task, and summarize results.
Example: Choosing A Model
- Task: "Summarize this file in 5 bullets." Use a 0.33x or 1x model.
- Task: "Refactor three files and update tests." Start with a 1x model. Move to 3x only if the 1x model fails.
- Task: "Explain a confusing production issue with lots of context." Start with 1x, and only move up if needed.
Quick Glossary
- Model: The "brain" Copilot uses to answer your question.
- Multiplier: A cost factor applied to each request. A 3x model spends your allowance three times as fast as a 1x model.
- Tokens: The units that measure AI usage, roughly proportional to input plus output length.
Best Practices to Reduce Usage
- Use clear, bounded requests with specific goals.
- Prefer targeted edits over full rewrites.
- Reuse context by referencing earlier outputs instead of re-pasting.
- Ask for summaries before requesting changes.
Before And After Example
Bad: "Rewrite this entire module and update all tests."
Better: "Only refactor the validation functions in this module. Keep existing behavior. List tests to update."
Examples of Efficient Prompts
- "Summarize this file in 5 bullets. Then propose a refactor plan."
- "Update only the functions in this file that handle validation."
- "List risks in this change and suggest tests to add."
Daily and Monthly Budgeting Tips
- Batch related questions in a single prompt.
- Timebox explorations and stop when enough info is gathered.
- Avoid repeated retries without changing the prompt.
Example: Timeboxed Session
- Ask for a 5-step plan.
- Approve or adjust.
- Ask for just step 1 or 2.
- Stop and summarize before moving on.
Budgeting Routine
- Start with a plan-first request for large tasks.
- Limit each request to one output type.
- End sessions with a short summary for easy follow-up.
Example: One Output Type
- Instead of: "Refactor the file, explain it, and add tests."
- Use: "Refactor the file only. Do not explain or add tests."
- Follow up with a separate request if needed.
Red Flags That Burn Tokens Quickly
- Large file pastes with no clear ask.
- Multiple full rewrites in one session.
- Repeated "start over" requests.
How You Can Burn A Full Day Fast (Example Scenarios)
- You paste multiple large files and ask a 3x model to rewrite everything plus tests.
- You keep asking a high-cost model to "start over" with a new approach.
- You do a long debugging session on a big codebase using a 3x model for every step.
- You ask for full architecture diagrams and long explanations from a high-cost model in one session.
Realistic "New User" Scenario
You open a single chat and do this all day:
- Paste a large file and ask for a full rewrite.
- Ask for a different rewrite using another approach.
- Ask for full tests.
- Ask for a full explanation of the changes.
- Repeat with another file.
If each step is done with a 3x model and a growing chat context, your token use can spike quickly and slow down responses.
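To see why this scenario hurts twice, you can sketch the day with made-up token counts. Every number below is an illustrative assumption; the point is that the growing context and the 3x multiplier compound:

```python
# Hypothetical day from the scenario above. Token counts are invented
# for illustration; cost unit is tokens weighted by the model multiplier.

steps = [
    ("full rewrite of large file", 8000),
    ("rewrite with another approach", 8000),
    ("full tests", 6000),
    ("full explanation", 4000),
]

def weighted_cost(steps, multiplier, context_growth=2000):
    total, context = 0, 0
    for _, tokens in steps:
        total += (tokens + context) * multiplier
        context += context_growth  # chat history re-sent at each step
    return total

print(weighted_cost(steps, 3))  # one long chat on a 3x model: 114000
print(weighted_cost(steps, 1))  # same work at 1x: 38000
```

Splitting the work into fresh chats and using a 1x (or 0.33x) model attacks both factors at once.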
Team Habits That Help
- Capture reusable prompts in a shared doc.
- Standardize request templates.
- Agree on when to use agents vs chat.
Use The Right Chat Mode
Using the lightest chat mode for the job keeps outputs smaller and cheaper.
Example prompts:
Ask: Summarize this file in 5 bullets.
Plan: Give me a 5-step plan to refactor this module. Wait for approval.
Edit: Update this function to handle empty input. Keep behavior the same otherwise.
Agent: Refactor these two files, update tests, and summarize results.
Quick Checklist
- Is the request specific and scoped?
- Do I need the whole file or just a section?
- Can I ask for a plan first?