# Usage and Token Budgeting

## How Tokens Are Spent

Tokens are consumed based on input length, output length, and tool usage. Long prompts and repeated context increase usage quickly.

### Simple Example

If you paste a large file and ask for a rewrite, you pay for:

1. The pasted file (input tokens)
2. The model reasoning about it
3. The full rewritten output (output tokens)

If you do that several times in a row, you can burn a large portion of your daily or monthly allowance quickly.

## Chat Context (Why New Chats Matter)

Each chat keeps a running memory of what you said. That memory grows over time and is sent back to the model with every message, which costs more tokens and can slow responses.

### Best Practice

- Start a new chat for each topic or task.
- Do not keep one chat open for weeks and switch subjects.

### Why This Matters

- Bigger context = more tokens used per message.
- Larger contexts can slow response time.
- Old context can confuse the model and reduce answer quality.

### Use Summaries To Reset Context

When a chat gets long, ask for a short summary and start a new chat using that summary. This keeps context small and saves tokens.

#### Example: Resetting A Long Chat

1. In the long chat, ask: "Summarize the current state and decisions in 8 bullets."
2. Copy the summary into a new chat.
3. Continue from there with a smaller context.

## Model Choice Matters (Plain Language)

Think of each model as a different "speed and cost" setting. Some models are cheap and fast; some are smarter but cost more for the same question. If you pick a higher-cost model, you can burn through your daily or monthly allowance much faster.
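The multiplier math above is simple but easy to underestimate. Here is a minimal sketch of it; the 300-unit monthly budget is a made-up number for illustration, and only the multiplier idea (0.33x, 1x, 3x) comes from the model picker:

```python
# Rough sketch: how a model's cost multiplier changes how many requests
# a fixed allowance covers. BUDGET is a hypothetical figure, measured in
# 1x-request units; real allowances vary by plan.

MULTIPLIERS = {"0.33x": 0.33, "1x": 1.0, "3x": 3.0}
BUDGET = 300  # hypothetical monthly allowance

for label, m in MULTIPLIERS.items():
    covered = BUDGET / m
    print(f"A {label} model covers about {covered:.0f} requests")
```

The point: on the same budget, a 3x model covers roughly a ninth as many requests as a 0.33x model, so the choice compounds over a full day of work.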
### Models Available (From The Copilot Picker)

- Auto (10% discount)
- GPT-4.1 (0x)
- GPT-4o (0x)
- GPT-5 mini (0x)
- Grok Code Fast 1
- Claude Haiku 4.5 (0.33x)
- Claude Opus 4.5 (3x)
- Claude Opus 4.6 (3x)
- Claude Sonnet 4 (1x)
- Claude Sonnet 4.5 (1x)
- Gemini 2.5 Pro (1x)
- Gemini 3 Flash (Preview) (0.33x)
- Gemini 3 Pro (Preview) (1x)
- GPT-5 (1x)
- GPT-5-Codex (Preview) (1x)
- GPT-5.1 (1x)
- GPT-5.1-Codex (1x)
- GPT-5.1-Codex-Max (1x)
- GPT-5.1-Codex-Mini (Preview) (0.33x)
- GPT-5.2 (1x)
- GPT-5.2-Codex (1x)

### Practical Guidance (Plain Language)

- Use cheaper models for summaries, quick questions, and small edits.
- Use expensive models only when the task is truly complex or high-stakes.
- If you are unsure, start with Auto or a 0.33x or 1x option, then move up only if needed.

#### Example: Choosing A Model

- Task: "Summarize this file in 5 bullets." Use a 0.33x or 1x model.
- Task: "Refactor three files and update tests." Start with a 1x model. Move to 3x only if the 1x model fails.
- Task: "Explain a confusing production issue with lots of context." Start with 1x, and only move up if needed.

### Quick Glossary

- Model: The "brain" Copilot uses to answer your question.
- Multiplier: A cost factor. Higher number = faster token usage.
- Tokens: The units that count your AI usage (roughly input + output size).

## Best Practices to Reduce Usage

- Use clear, bounded requests with specific goals.
- Prefer targeted edits over full rewrites.
- Reuse context by referencing earlier outputs instead of re-pasting.
- Ask for summaries before requesting changes.

### Before And After Example

Bad: "Rewrite this entire module and update all tests."

Better: "Only refactor the validation functions in this module. Keep existing behavior. List tests to update."

## Examples of Efficient Prompts

- "Summarize this file in 5 bullets. Then propose a refactor plan."
- "Update only the functions in this file that handle validation."
- "List risks in this change and suggest tests to add."

## Daily and Monthly Budgeting Tips

- Batch related questions in a single prompt.
- Timebox explorations and stop when enough info is gathered.
- Avoid repeated retries without changing the prompt.

### Example: Timeboxed Session

1. Ask for a 5-step plan.
2. Approve or adjust.
3. Ask for just step 1 or 2.
4. Stop and summarize before moving on.

## Budgeting Routine

- Start with a plan-first request for large tasks.
- Limit each request to one output type.
- End sessions with a short summary for easy follow-up.

### Example: One Output Type

Instead of: "Refactor the file, explain it, and add tests."

Use: "Refactor the file only. Do not explain or add tests." Then follow up with a separate request if needed.

## Red Flags That Burn Tokens Quickly

- Large file pastes with no clear ask.
- Multiple full rewrites in one session.
- Repeated "start over" requests.

## How You Can Burn A Full Day Fast (Example Scenarios)

- You paste multiple large files and ask a 3x model to rewrite everything plus tests.
- You keep asking a high-cost model to "start over" with a new approach.
- You do a long debugging session on a big codebase using a 3x model for every step.
- You ask for full architecture diagrams and long explanations from a high-cost model in one session.

### Realistic "New User" Scenario

You open a single chat and do this all day:

1. Paste a large file and ask for a full rewrite.
2. Ask for a different rewrite using another approach.
3. Ask for full tests.
4. Ask for a full explanation of the changes.
5. Repeat with another file.

If each step uses a 3x model in a chat whose context keeps growing, your token use can spike quickly and responses can slow down.

## Team Habits That Help

- Capture reusable prompts in a shared doc.
- Standardize request templates.
- Agree on when to use agents vs chat.

### Use The Right Chat Mode

Using the lightest chat mode for the job keeps outputs smaller and cheaper.
Examples:

- Ask: "Summarize this file in 5 bullets."
- Plan: "Give me a 5-step plan to refactor this module. Wait for approval."
- Edit: "Update this function to handle empty input. Keep behavior the same otherwise."
- Agent: "Refactor these two files, update tests, and summarize results."

## Quick Checklist

- Is the request specific and scoped?
- Do I need the whole file or just a section?
- Can I ask for a plan first?
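The chat-context effect described earlier ("bigger context = more tokens per message") compounds faster than most new users expect. This rough sketch assumes about 4 characters per token (a common rule of thumb, not an exact tokenizer) and that the full chat history is resent on every turn; the message sizes are made-up:

```python
# Rough sketch of why long chats get expensive: if the whole history is
# resent on every turn, total input tokens grow roughly quadratically
# with the number of turns. The 4-chars-per-token ratio is a heuristic.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def chat_input_cost(messages):
    """Total input tokens if the full history is resent each turn."""
    total = 0
    history = ""
    for msg in messages:
        history += msg
        total += estimate_tokens(history)  # each turn resends everything so far
    return total

msg = "hello " * 50                     # one ~300-character message
print(chat_input_cost([msg] * 5))       # → 1125 (five turns in one chat)
print(chat_input_cost([msg] * 20))      # → 15750 (twenty turns, same chat)
```

Four times as many turns costs fourteen times as many input tokens in this toy model, which is the arithmetic behind "start a new chat per topic" and "summarize, then reset."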