chunkipy.size_estimators.openai_size_estimator

Classes

OpenAISizeEstimator([encoding])

Size estimator that uses OpenAI's tokenization to estimate the size of the text.

class chunkipy.size_estimators.openai_size_estimator.OpenAISizeEstimator(encoding='cl100k_base')

Bases: BaseSizeEstimator

Size estimator that uses OpenAI’s tokenization to estimate the size of the text.

Parameters:

encoding (str) – Name of the OpenAI tokenizer encoding to use. Defaults to 'cl100k_base'.

estimate_size(text)

Estimate the size of the given text using OpenAI’s tokenization.

Parameters:

text (str) – The text to estimate the size of.

Returns:

The estimated size of the text in tokens.

Return type:

int
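
Example:

The following is a minimal usage sketch, not an official example. It builds the estimator with the documented default encoding and calls estimate_size on a short string. The helper names estimate_tokens and whitespace_size are hypothetical and exist only for illustration. The word-count fallback is an assumption added so the snippet still runs where chunkipy or its tokenizer data is unavailable; it does not reproduce OpenAI token counts.

```python
def whitespace_size(text: str) -> int:
    """Hypothetical fallback: count whitespace-separated words (not tokens)."""
    return len(text.split())


def estimate_tokens(text: str, encoding: str = "cl100k_base") -> int:
    """Estimate the size of `text` in tokens via OpenAISizeEstimator.

    Falls back to a plain word count if the estimator cannot be used.
    """
    try:
        # Import path and constructor argument as shown in the reference above.
        from chunkipy.size_estimators.openai_size_estimator import (
            OpenAISizeEstimator,
        )

        estimator = OpenAISizeEstimator(encoding=encoding)
        return estimator.estimate_size(text)
    except Exception:
        # Assumption: chunkipy is not installed, or the tokenizer data
        # could not be loaded in this environment.
        return whitespace_size(text)


size = estimate_tokens("Chunk me into token-sized pieces.")
print(size)  # an int: token count, or word count if the fallback was used
```

Because estimate_size returns an int, the result can be compared directly against a chunk-size budget, for example `estimate_tokens(text) <= 512`.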