chunkipy.size_estimators.openai_size_estimator

Classes

OpenAISizeEstimator([encoding])

Estimate size using a tiktoken encoding compatible with OpenAI models.

class chunkipy.size_estimators.openai_size_estimator.OpenAISizeEstimator(encoding='cl100k_base')

Bases: BaseSizeEstimator

Estimate size using a tiktoken encoding compatible with OpenAI models.

Parameters:

encoding (str) – Name of the tiktoken encoding to use. Defaults to 'cl100k_base'.
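A minimal construction sketch, assuming chunkipy is installed together with its tiktoken dependency. The helper name `make_estimator` and its `estimator_cls` parameter are hypothetical and not part of chunkipy; the import is deferred so the module can be loaded (and a stand-in class injected) where the library is unavailable.

```python
def make_estimator(encoding="cl100k_base", estimator_cls=None):
    """Hypothetical helper: build a size estimator for a tiktoken encoding name."""
    if estimator_cls is None:
        # Deferred import: only needed when no stand-in class is supplied.
        from chunkipy.size_estimators.openai_size_estimator import (
            OpenAISizeEstimator,
        )
        estimator_cls = OpenAISizeEstimator
    # The documented constructor takes a single `encoding` argument.
    return estimator_cls(encoding=encoding)
```

Calling `make_estimator()` with no arguments would use the documented default, 'cl100k_base'. Any other name passed in is assumed to be one that tiktoken recognizes.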

estimate_size(text)

Estimate the size of the given text using OpenAI’s tokenization.

Parameters:

text (str) – The text to estimate the size of.

Returns:

The estimated size of the text in tokens.

Return type:

int
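One common use of a token count is checking text against a budget. The sketch below is illustrative only: `take_within_budget` is a hypothetical helper, not part of chunkipy, and it relies solely on the documented `estimate_size(text)` method returning an int.

```python
def take_within_budget(estimator, texts, max_tokens):
    """Keep leading texts while their combined estimated size fits max_tokens.

    `estimator` is any object exposing estimate_size(text) -> int,
    such as an OpenAISizeEstimator instance.
    """
    kept, used = [], 0
    for text in texts:
        size = estimator.estimate_size(text)
        if used + size > max_tokens:
            break  # the next text would exceed the budget
        kept.append(text)
        used += size
    return kept, used
```

Because the helper depends only on the `estimate_size` interface, it should work the same way with any other size estimator that exposes that method.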

segment(text)

Generate token segments from the given text using OpenAI’s tokenization.

Parameters:

text (str) – The text to segment.

Yields:

str – A single token as segmented by the tokenizer.

Return type:

Generator[str, None, None]
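Since `segment` is documented as a generator, its output can be consumed lazily. The following sketch assumes only that interface; `head_tokens` is a hypothetical helper name, not part of chunkipy.

```python
from itertools import islice


def head_tokens(estimator, text, n):
    """Return at most the first n token strings yielded by estimator.segment(text)."""
    # islice stops pulling from the generator once n items have been taken.
    return list(islice(estimator.segment(text), n))
```

With a real estimator, the exact token strings depend on the chosen encoding, so no particular output is shown here.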
