chunkipy.text_chunker.base_text_chunker

Classes

BaseTextChunker([chunk_size, size_estimator])

Base class for all chunker implementations.

class chunkipy.text_chunker.base_text_chunker.BaseTextChunker(chunk_size=None, size_estimator=None)

Bases: ABC

Base class for all chunker implementations.

Parameters:
  • chunk_size (int) – Maximum size allowed for a single chunk in the units defined by size_estimator.

  • size_estimator (BaseSizeEstimator) – Strategy used to measure text size. Defaults to WordSizeEstimator.
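To make the relationship between chunk_size and size_estimator concrete, here is a minimal, self-contained sketch. It does not import chunkipy: WordCounter, CharCounter, the estimate method name, and the fits helper are all hypothetical stand-ins chosen for illustration, not the library's actual interface. The sketch only shows that the same chunk_size value means different things depending on the unit the estimator counts.

```python
class WordCounter:
    """Hypothetical stand-in: measures size in whitespace-separated words."""

    def estimate(self, text: str) -> int:
        return len(text.split())


class CharCounter:
    """Hypothetical stand-in: measures size in characters."""

    def estimate(self, text: str) -> int:
        return len(text)


def fits(text: str, chunk_size: int, size_estimator) -> bool:
    """Return True if text is within chunk_size, in the estimator's units."""
    return size_estimator.estimate(text) <= chunk_size


sample = "the quick brown fox"
print(fits(sample, 5, WordCounter()))  # 4 words <= 5, so True
print(fits(sample, 5, CharCounter()))  # 19 characters > 5, so False
```

With a word-based estimator a limit of 5 admits the whole sample, while a character-based one rejects it, so chunk_size should always be chosen with the estimator's unit in mind.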

abstract chunk(text)

Split the provided text into chunks and return them as a Chunks object.

Return type:

Chunks

Parameters:

text (str) – The text to split into chunks.
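Because chunk is abstract, a concrete chunker has to subclass the base class and implement it. The following is a rough sketch of that pattern under stated assumptions: SketchBaseChunker and GreedyWordChunker are hypothetical stand-ins written so the example runs without chunkipy installed, the fallback chunk size of 100 is invented for illustration, the estimator is modeled as a plain callable, and the method returns a plain list of strings rather than the library's Chunks type.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SketchBaseChunker(ABC):
    """Hypothetical stand-in mirroring the documented constructor and abstract method."""

    def __init__(self, chunk_size: Optional[int] = None, size_estimator=None):
        # Assumption: fall back to 100 units and a word count when nothing is passed.
        self.chunk_size = chunk_size if chunk_size is not None else 100
        self.size_estimator = size_estimator or (lambda text: len(text.split()))

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        """Subclasses decide how the text is split."""


class GreedyWordChunker(SketchBaseChunker):
    """Packs words into a chunk until adding one more would exceed chunk_size."""

    def chunk(self, text: str) -> List[str]:
        chunks: List[str] = []
        current: List[str] = []
        for word in text.split():
            candidate = " ".join(current + [word])
            if current and self.size_estimator(candidate) > self.chunk_size:
                # The current chunk is full: emit it and start a new one.
                chunks.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        if current:
            chunks.append(" ".join(current))
        return chunks


chunker = GreedyWordChunker(chunk_size=3)
result = chunker.chunk("one two three four five six seven")
print(result)  # ['one two three', 'four five six', 'seven']
```

Instantiating the stand-in base class directly raises TypeError, which is the standard ABC behavior for a class with an unimplemented abstract method.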