OpenAI Size Estimator
Description
OpenAISizeEstimator measures text size with tiktoken encodings, making
chunk boundaries closer to LLM token budgets than plain words or characters.
Use it when your downstream model has token limits and you need a more realistic
size metric for prompt construction.
Note
Install the optional dependency first:
pip install "chunkipy[tiktoken]"
API / Documentation
- class chunkipy.size_estimators.OpenAISizeEstimator(encoding='cl100k_base')[source]
Bases: BaseSizeEstimator

Estimate size using a tiktoken encoding compatible with OpenAI models.

- Parameters:
  encoding (str)
- estimate_size(text)[source]
Estimate the size of the given text using OpenAI’s tokenization.
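To illustrate the interface this class implements, here is a minimal, self-contained sketch (not chunkipy's source): a size estimator exposes `estimate_size(text) -> int`. The `ApproxTokenEstimator` name and the ~4-characters-per-token rule of thumb are assumptions standing in for real tiktoken counts, so the sketch runs without the optional dependency.

```python
class ApproxTokenEstimator:
    """Hypothetical stand-in for OpenAISizeEstimator: approximates the
    token count instead of calling tiktoken."""

    def estimate_size(self, text: str) -> int:
        # Common rough heuristic for English text: ~4 characters per token.
        return max(1, len(text) // 4)


print(ApproxTokenEstimator().estimate_size("Hello, token-aware world!"))  # → 6
```

The real estimator replaces the heuristic with an exact count from the configured tiktoken encoding, which is why its estimates track OpenAI model token budgets closely.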
Example
This example is included in examples/size_estimators/openai_size_estimator.py.
from chunkipy.size_estimators import OpenAISizeEstimator
from chunkipy.utils import MissingDependencyError


if __name__ == "__main__":
    text = "Token-aware estimation with tiktoken."

    try:
        estimator = OpenAISizeEstimator()
        print(f"Estimated token size: {estimator.estimate_size(text)}")
    except MissingDependencyError as error:
        print(error)
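Once you have a size estimate, a typical next step is checking chunks against a model's token budget. The sketch below is a hypothetical usage pattern, not part of chunkipy's API; `WordCountEstimator` and `fits_budget` are made-up names, and the word-count estimator stands in for `OpenAISizeEstimator` only so the example runs without the tiktoken extra installed.

```python
class WordCountEstimator:
    """Stand-in estimator (hypothetical): counts whitespace-separated words."""

    def estimate_size(self, text: str) -> int:
        return len(text.split())


def fits_budget(text: str, budget: int, estimator) -> bool:
    """Return True when the estimated size is within the given budget."""
    return estimator.estimate_size(text) <= budget


estimator = WordCountEstimator()
print(fits_budget("Token-aware estimation with tiktoken.", 10, estimator))  # → True
```

Because every estimator shares the same `estimate_size(text)` interface, swapping in `OpenAISizeEstimator` makes the same budget check token-accurate.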