chunkipy.size_estimators.base_size_estimator

Classes

BaseSizeEstimator()

Base class for strategies that measure the size of text and segment it.

class chunkipy.size_estimators.base_size_estimator.BaseSizeEstimator

Bases: ABC

Base class for strategies that measure the size of text and segment it.

abstract estimate_size(text)

Estimate the size of the given text.

Parameters:

text (str) – The text to estimate the size of.

Returns:

Estimated size in units defined by the concrete estimator.

Return type:

int
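
As an illustration of the contract described above, the following is a minimal sketch of a concrete estimator. To stay self-contained it defines a local stand-in, `SizeEstimatorSketch`, that mirrors only the documented `estimate_size` signature; it is not the chunkipy class itself, and `WordCountEstimator` is a hypothetical name chosen for this example. In real code one would presumably subclass `BaseSizeEstimator` instead.

```python
from abc import ABC, abstractmethod


class SizeEstimatorSketch(ABC):
    """Local stand-in mirroring the documented interface (an assumption,
    not the actual chunkipy implementation)."""

    @abstractmethod
    def estimate_size(self, text: str) -> int:
        """Return the size of `text` in units chosen by the subclass."""


class WordCountEstimator(SizeEstimatorSketch):
    """Hypothetical estimator whose unit is the whitespace-separated word."""

    def estimate_size(self, text: str) -> int:
        # str.split() with no arguments splits on runs of whitespace
        # and returns an empty list for an empty string.
        return len(text.split())


estimator = WordCountEstimator()
word_count = estimator.estimate_size("split this text into chunks")
print(word_count)
```

Because `estimate_size` is abstract, the stand-in base class cannot be instantiated directly; only subclasses that define the method can.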

segment(text)

Segment the text into smaller parts for size estimation. Each yielded segment can then be measured individually by downstream methods.

Parameters:

text (str) – The text to be divided into smaller parts.

Yields:

str – A segment of the text for estimation.

Raises:

NotImplementedError – If a subclass does not implement this method.

Return type:

Generator[str, None, None]
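
The behavior documented for `segment` can be sketched as a generator override. This example again uses a local stand-in base class rather than chunkipy itself; the default body that raises `NotImplementedError` is an assumption inferred from the "Raises" entry above, and `CharCountEstimator` with its word-level segmentation is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Generator


class SizeEstimatorSketch(ABC):
    """Local stand-in mirroring the documented interface (an assumption,
    not the actual chunkipy implementation)."""

    @abstractmethod
    def estimate_size(self, text: str) -> int:
        """Return the size of `text` in units chosen by the subclass."""

    def segment(self, text: str) -> Generator[str, None, None]:
        # Assumed default: subclasses that need segmentation override this.
        raise NotImplementedError("segment() must be overridden")


class CharCountEstimator(SizeEstimatorSketch):
    """Hypothetical estimator: size is the character count, and
    segments are whitespace-separated words."""

    def estimate_size(self, text: str) -> int:
        return len(text)

    def segment(self, text: str) -> Generator[str, None, None]:
        # Yield one word at a time so callers can measure each piece lazily.
        for word in text.split():
            yield word


estimator = CharCountEstimator()
segments = list(estimator.segment("measure each piece"))
sizes = [estimator.estimate_size(s) for s in segments]
print(list(zip(segments, sizes)))
```

Yielding segments lazily means a caller can stop early, for example once a running size total reaches a chunk limit, without segmenting the rest of the text.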