2025-06-25 • Guides
When working with large language models (LLMs) like GPT-4, Claude, or Gemini, you might notice something strange — responses get cut off, or the model forgets something you said earlier. The reason often comes down to a concept called token limits.
A token is a chunk of text that a language model processes. Tokens can be as short as one character or as long as one word, depending on the language and complexity. For example:
- "Hello" is one token.
- "LLMs" might be split into "LL" and "Ms".
- "Artificial intelligence" is usually two tokens.

Most English words are 1 to 3 tokens long. The important part? Every prompt you write, and every word the AI generates, eats into the model's token limit.
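If you want to see these splits for yourself, OpenAI's `tiktoken` library (mentioned again later in this guide) can show how a given tokenizer breaks text apart. Here is a minimal sketch using the `cl100k_base` encoding as an example; other models use different tokenizers, so the exact splits will vary:

```python
import tiktoken

# cl100k_base is the encoding used by GPT-3.5 Turbo and GPT-4; Claude and
# Gemini use their own tokenizers, so their splits will differ.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Hello", "LLMs", "Artificial intelligence"]:
    token_ids = enc.encode(text)
    # Decode each token id on its own to see the individual pieces.
    pieces = [
        enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
        for t in token_ids
    ]
    print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")
```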
Token limits define how much input and output the model can process at once. If you go over the limit, the model will start forgetting earlier parts of the conversation or stop generating output altogether.
Think of it like a whiteboard. You can only fit so much text on it before you have to start erasing to make room for new words.
| Model | Max Token Limit | Input + Output Combined? |
|---|---|---|
| GPT-3.5 Turbo | 4,096 | Yes |
| GPT-4 | 8,192 / 128,000 (variant) | Yes |
| Claude 2 | 100,000 | Yes |
| Gemini 1.5 Pro | 1,000,000+ | Yes |
Keep in mind: both your prompt and the model’s response count toward this total.
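Because input and output share the same budget, a quick bit of arithmetic before a call tells you how much room is left for the response. A rough sketch, with the limits from the table hard-coded for illustration (the dictionary keys are informal labels, not exact API model names; always check the provider's current documentation):

```python
# Approximate context limits from the table above (illustrative only).
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4_096,
    "gpt-4": 8_192,
    "gpt-4-128k-variant": 128_000,
    "claude-2": 100_000,
    "gemini-1.5-pro": 1_000_000,
}

def remaining_output_budget(model: str, prompt_tokens: int) -> int:
    """Return how many tokens are left for the model's response."""
    return CONTEXT_LIMITS[model] - prompt_tokens

# Example: a 3,500-token prompt on GPT-3.5 Turbo leaves only 596 tokens
# for the reply, so a long answer would get cut off.
print(remaining_output_budget("gpt-3.5-turbo", 3_500))
```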
If your total token count exceeds the model's limit:

- Earlier parts of the conversation may be dropped, so the model "forgets" what you said.
- The response may be truncated mid-sentence.
- When calling an API directly, the request may be rejected with a context-length error.
Here are some ways to manage your token usage (a sketch of the history-trimming approach follows this list):

- Keep prompts concise and cut boilerplate context you don't need.
- Summarize or trim older turns in long conversations instead of resending everything verbatim.
- Split long documents into smaller chunks and process them separately.
- Cap the output length so the response doesn't consume the remaining budget.
- Switch to a model with a larger context window when you genuinely need more room.
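As a concrete example of the history-trimming idea, the sketch below drops the oldest turns of a chat until the conversation fits a token budget. The `count_tokens` helper and the budget figure are assumptions for illustration; it ignores provider-specific per-message overhead, and any tokenizer-based count would work:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # example encoding

def count_tokens(message: dict) -> int:
    """Rough per-message token count (ignores provider-specific overhead)."""
    return len(enc.encode(message["content"]))

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest messages until the total fits within `budget` tokens."""
    trimmed = list(messages)
    while trimmed and sum(count_tokens(m) for m in trimmed) > budget:
        trimmed.pop(0)  # discard the oldest turn first
    return trimmed

history = [
    {"role": "user", "content": "First question about tokens..."},
    {"role": "assistant", "content": "A long earlier answer..."},
    {"role": "user", "content": "Latest question."},
]
print(trim_history(history, budget=50))
```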
For developers using the OpenAI or Anthropic APIs, tools like `tiktoken` or the token-counting utilities built into the Claude and Gemini SDKs can help estimate token usage before you send a prompt.
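For example, a quick pre-flight count with `tiktoken` looks like this (the model name and prompt are illustrative; the Claude and Gemini token counters serve the same purpose for their own models):

```python
import tiktoken

prompt = "Summarize the following report in three bullet points: ..."

# encoding_for_model picks the tokenizer that matches a given OpenAI model.
enc = tiktoken.encoding_for_model("gpt-4")
prompt_tokens = len(enc.encode(prompt))

print(f"Prompt uses {prompt_tokens} tokens before the response budget is counted.")
```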
Understanding token limits is key to getting the most out of large language models. Whether you’re building a chatbot, writing code, or generating content, keeping your prompts efficient and within the model’s token boundaries ensures smoother, more reliable results.
If you're experimenting with AI tools, token awareness isn't just technical — it's practical. It helps you build smarter, faster, and more responsibly with the models at your fingertips. For more practical guides on working with AI, prompt design, and model performance, explore the rest of our articles at AI Shortlist.