chunkipy.size_estimators

class chunkipy.size_estimators.BaseSizeEstimator

Bases: ABC

Base class for size estimators.

estimate_size(text)

Estimate the size of the given text.

Parameters:

text (str) – The text to estimate the size of.

Returns:

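The estimated size of the text, in the unit defined by the concrete estimator (characters, tokens, or words for the subclasses documented here).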

Return type:

int
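As a minimal sketch of how this interface might be implemented, the snippet below defines a local stand-in for BaseSizeEstimator (so it runs without chunkipy installed) and a hypothetical Utf8ByteSizeEstimator subclass. Both class bodies are assumptions made for illustration; in real code the base class would presumably be imported from chunkipy.size_estimators instead.

```python
from abc import ABC, abstractmethod


class BaseSizeEstimator(ABC):
    """Local stand-in mirroring the documented interface (an assumption)."""

    @abstractmethod
    def estimate_size(self, text: str) -> int:
        """Return the size of ``text`` in the estimator's own unit."""


class Utf8ByteSizeEstimator(BaseSizeEstimator):
    """Hypothetical estimator that measures size in UTF-8 bytes."""

    def estimate_size(self, text: str) -> int:
        # Encode first so multi-byte characters are counted by byte length.
        return len(text.encode("utf-8"))


estimator = Utf8ByteSizeEstimator()
size = estimator.estimate_size("na\u00efve")  # "naive" with a diaeresis on the i
```

Whatever unit a custom estimator returns, it should be the same unit the chunk size limit is expressed in.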

class chunkipy.size_estimators.CharSizeEstimator

Bases: BaseSizeEstimator

Size estimator that counts the number of characters in the text.

estimate_size(text)

Estimate the size of the given text by counting the number of characters.

Parameters:

text (str) – The text to estimate the size of.

Returns:

The estimated size of the text in characters.

Return type:

int
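The difference between a character count and a byte count can be illustrated with a short sketch. The helper count_characters is hypothetical and assumes that characters are counted as Unicode code points via len(); the documentation above only states that characters are counted, not how.

```python
def count_characters(text: str) -> int:
    """Hypothetical helper: size as the number of Unicode code points."""
    return len(text)


text = "caf\u00e9"  # "cafe" with an acute accent on the e
char_count = count_characters(text)      # code points
byte_count = len(text.encode("utf-8"))   # bytes in UTF-8, for comparison
```

Here the accented letter is one character but two bytes in UTF-8, so the two counts differ by one.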

class chunkipy.size_estimators.OpenAISizeEstimator(encoding='cl100k_base')

Bases: BaseSizeEstimator

Size estimator that uses OpenAI’s tokenization to estimate the size of the text.

Parameters:

encoding (str) – Name of the tokenizer encoding to use. Defaults to 'cl100k_base'.

estimate_size(text)

Estimate the size of the given text using OpenAI’s tokenization.

Parameters:

text (str) – The text to estimate the size of.

Returns:

The estimated size of the text in tokens.

Return type:

int
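Token counting of this kind might look roughly like the sketch below. TokenCountSketch is a hypothetical stand-in, not chunkipy's implementation. It assumes the tiktoken package provides the tokenizer, which the documentation above does not confirm, and it accepts an optional injected encode callable so the logic can be exercised offline.

```python
from typing import Callable, List, Optional


class TokenCountSketch:
    """Hypothetical stand-in for a token-based size estimator."""

    def __init__(
        self,
        encoding: str = "cl100k_base",
        encode: Optional[Callable[[str], List[int]]] = None,
    ) -> None:
        self.encoding = encoding
        self._encode = encode  # optional injected encoder (for offline use)

    def _get_encode(self) -> Callable[[str], List[int]]:
        if self._encode is None:
            # Assumption: tiktoken is installed and can load the encoding.
            import tiktoken

            self._encode = tiktoken.get_encoding(self.encoding).encode
        return self._encode

    def estimate_size(self, text: str) -> int:
        # The size is the number of token ids produced by the encoder.
        return len(self._get_encode()(text))


# Offline demonstration with a toy encoder: one "token" per whitespace-separated piece.
toy = TokenCountSketch(encode=lambda s: list(range(len(s.split()))))
toy_size = toy.estimate_size("three toy tokens")
```

With a real tokenizer the count generally differs from both the word count and the character count, so limits expressed in tokens should be checked with the same encoding the target model uses.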

class chunkipy.size_estimators.WordSizeEstimator

Bases: BaseSizeEstimator

Size estimator that counts the number of words in the text.

estimate_size(text)

Estimate the size of the given text by counting the number of words.

Parameters:

text (str) – The text to estimate the size of.

Returns:

The estimated size of the text in words.

Return type:

int
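A word count can be sketched as follows. The helper count_words is hypothetical and assumes that words are delimited by whitespace; the documentation above does not say how word boundaries are determined, so the library's actual rule may differ (for example in how punctuation is handled).

```python
def count_words(text: str) -> int:
    """Hypothetical helper: size as the number of whitespace-separated words."""
    # str.split() with no arguments splits on runs of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


sentence = "Hello, world!  How are you?"
word_count = count_words(sentence)
empty_count = count_words("   ")
```

Under this assumption punctuation stays attached to the neighbouring word, and repeated spaces do not produce empty words.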

Modules

base_size_estimator

char_size_estimator

openai_size_estimator

word_size_estimator