Context Window

One of the ways of providing data to a model is to keep your conversation with it in its short-term memory. This conversational memory is known as the context window: the amount of information, measured in tokens, that a model can keep in view at once.
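As a minimal sketch of that idea, the function below keeps only the most recent messages that fit within a fixed token budget. The name `fit_to_context` and the word count used as a stand-in for a real tokenizer are illustrative assumptions, not any particular library's API.

```python
# A minimal sketch of "short-term memory": keep only as much recent
# conversation as fits in the model's token budget. Token counts are
# approximated as whitespace-separated words for illustration.

def fit_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the window."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = len(msg.split())        # crude stand-in for a tokenizer
        if used + cost > max_tokens:
            break                      # older messages fall off the "desk"
        kept.append(msg)
        used += cost
    return list(reversed(kept))        # restore chronological order

history = ["Hi!", "Hello, how can I help?", "Summarize this report for me."]
print(fit_to_context(history, max_tokens=10))
# -> ['Hello, how can I help?', 'Summarize this report for me.']
```

Note what happens in the example: the oldest message no longer fits the budget, so it is silently dropped, which is exactly what it means for something to leave the context window.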
Think of a context window like a desk you’re working on. The desk represents how much information the model can “see” at once.
Everything on the desk can be referenced, connected, and reasoned about; anything not on the desk is effectively out of sight.
A small desk (small context window):
Great for focused tasks
Forces you to be concise
Gets cluttered quickly
A large desk (large context window):
Lets you spread out documents
Compare ideas across long texts
Handle complex, multi-part tasks
But: a bigger desk doesn’t automatically make you smarter — it just lets you see more at the same time.
Why does the context window matter when choosing an LLM?
Different tasks need different “desk sizes.”
Example 1: Short, precise tasks
Task: writing a SQL query, fixing a function, summarizing a paragraph
Needs: focus, precision
Context: small
Model choice: smaller or faster LLMs work perfectly
Using a massive context window here is like using a conference table to write a sticky note.
Example 2: Working with long documents
Task: summarizing a research paper, comparing multiple articles, analyzing contracts
Needs: memory across long text
Context: large
Model choice: LLMs with large context windows
Here, a small context window forces you to chunk the text aggressively, which risks losing connections between sections.
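As a rough illustration of that chunking, the sketch below splits a long text into fixed-size, overlapping pieces. The function name `chunk_text` and its sizes are made-up defaults, and a real pipeline would measure chunks in tokens rather than words.

```python
# Split a long document into fixed-size pieces with some overlap so that
# sentences near a boundary appear in two chunks. Sizes are in words for
# simplicity; a production pipeline would count tokens instead.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = chunk_size - overlap        # how far the window advances each time
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

paper = "..."  # imagine a long research paper loaded as a string
for i, chunk in enumerate(chunk_text(paper)):
    print(f"chunk {i}: {len(chunk.split())} words")
```

The overlap preserves connections near chunk boundaries, but relationships between distant sections are still lost, which is exactly the risk described above.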
Many Nano- and Mini-class models ship with smaller context windows, often in the range of 2,000 to 4,000 tokens, which works out to roughly 1,500 to 3,000 English words. By contrast, models such as GPT-4.1 and Google Gemini 2.5 Pro support context windows of around 1 million tokens, equivalent to roughly 750,000 words of English text.
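If you want to check the tokens-to-words ratio yourself, here is a small sketch assuming the tiktoken package (pip install tiktoken) and its cl100k_base encoding, which several OpenAI models use; other tokenizers will give somewhat different counts.

```python
# Count tokens for a snippet of English prose and compare against its
# word count to see the rough 0.75 words-per-token ratio in action.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "A larger context window lets the model condition on more text at once."
tokens = enc.encode(text)

print(f"{len(text.split())} words -> {len(tokens)} tokens")
```

For ordinary English prose the ratio tends to land near 0.75 words per token, which is where rough conversions like 1M tokens to about 750k words come from.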
Important takeaways
A larger context window:
≠ deeper understanding
≠ better reasoning
≠ more intelligent
It simply means the model can condition its output on more tokens at once.
Choosing the “best” LLM is really about matching the model’s strengths (context size, reasoning, speed, cost) to the task.
