FixedSizeTextChunker

Description

FixedSizeTextChunker splits text into deterministic chunks based on the configured size estimator strategy. It is the most predictable chunker when you need stable chunk boundaries and straightforward overlap behavior. By default, it uses the word estimator and works with no optional dependencies.

API / Documentation

class chunkipy.text_chunker.FixedSizeTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0)

Bases: BaseOverlapTextChunker

Chunk text into fixed-size slices using the configured size estimator.

Each segment emitted by size_estimator.segment is treated as a unit of size 1 during chunk assembly.

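Parameters:

chunk_size (int) – Chunk size, counted in segments produced by the size estimator. Default: None.
size_estimator (BaseSizeEstimator) – Estimator used to segment the text. Default: None, in which case the word estimator is used.
overlap_ratio (float) – Ratio controlling the overlap between consecutive chunks. Default: 0.0 (no overlap).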
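
To make the interaction between chunk_size and overlap_ratio concrete, here is a minimal pure-Python sketch of fixed-size chunk assembly in which every segment counts as one unit. It does not use chunkipy and is not the library's implementation: the function name assemble_fixed_size_chunks is hypothetical, and computing the overlap as chunk_size multiplied by overlap_ratio, rounded down, is an assumption made for illustration.

```python
from typing import List


def assemble_fixed_size_chunks(
    segments: List[str], chunk_size: int, overlap_ratio: float = 0.0
) -> List[List[str]]:
    """Group segments into chunks of at most chunk_size units (1 unit per segment)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0.0, 1.0)")

    # Assumption: overlap is the chunk size scaled by the ratio, rounded down.
    overlap = int(chunk_size * overlap_ratio)
    # Advance by at least one segment so the loop always makes progress.
    step = max(1, chunk_size - overlap)

    chunks: List[List[str]] = []
    for start in range(0, len(segments), step):
        chunks.append(segments[start:start + chunk_size])
        if start + chunk_size >= len(segments):
            break  # this chunk already reaches the end of the input
    return chunks


# Words stand in for the segments a word-based estimator might produce.
words = "This is a sample text that will be split into chunks".split()
for chunk in assemble_fixed_size_chunks(words, chunk_size=4, overlap_ratio=0.25):
    print(" ".join(chunk))
```

With chunk_size=4 and overlap_ratio=0.25, each chunk in this sketch starts with the last segment of the previous one, so boundaries depend only on the segment sequence and the two parameters.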

split_text(text)

Split the provided text into smaller parts using the size estimator. The size estimator cuts the text into segments, and every segment has a size of 1.

Parameters: text (str) – The text to be split.
Yields: TextPart – Objects each containing a piece of text and its estimated size.
Return type: Generator[TextPart, None, None]
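
The generator contract described above can be sketched without the library. In this illustration, SketchPart is a hypothetical stand-in for TextPart (the real class may expose different fields), and whitespace splitting stands in for whatever size_estimator.segment actually does.

```python
from dataclasses import dataclass
from typing import Generator


@dataclass
class SketchPart:
    """Hypothetical stand-in for TextPart: a piece of text plus its size."""
    text: str
    size: int


def split_into_unit_parts(text: str) -> Generator[SketchPart, None, None]:
    """Lazily yield one part per whitespace-separated word, each with size 1."""
    for word in text.split():
        yield SketchPart(text=word, size=1)


parts = list(split_into_unit_parts("split text into unit-sized parts"))
print([part.text for part in parts])
print(sum(part.size for part in parts))  # total size equals the number of parts
```

Because every part has size 1, the total size of any group of parts is simply the number of parts in it, which is what keeps chunk assembly straightforward.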

Example

This example is included in examples/chunkers/fixed_size/prebuilt_text_splitter.py.

from chunkipy import FixedSizeTextChunker


if __name__ == "__main__":
    text_chunker = FixedSizeTextChunker(
        chunk_size=200,
        overlap_ratio=0.25
    )

    text = "This is a sample text that will be split into chunks based on word boundaries."
    chunks = text_chunker.chunk(text)

    for i, chunk in enumerate(chunks):
        print(f"Chunk {i + 1}: {chunk}")

More examples are available under examples/chunkers/fixed_size/.