RecursiveTextChunker

Description

RecursiveTextChunker applies text splitters progressively, from coarse to fine granularity, until each piece fits within chunk_size. It is useful when you want chunk boundaries to follow natural separators (phrases, clauses, words) rather than fixed windows, while still supporting overlap between chunks.
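
The coarse-to-fine strategy can be sketched in plain Python. This is a minimal illustration under stated assumptions, not chunkipy's implementation: the three splitter functions, the word-count size estimator, and recursive_split are hypothetical names invented for this sketch, and it leaves out how parts are merged into chunks and how overlap is applied.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical splitter type for this sketch: takes text, returns its parts.
Splitter = Callable[[str], List[str]]


def word_count(text: str) -> int:
    """Size estimator for this sketch: number of whitespace-separated words."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Coarse splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_clauses(text: str) -> List[str]:
    """Medium splitter: break after ',' or ';' followed by whitespace."""
    return [s for s in re.split(r"(?<=[,;])\s+", text) if s.strip()]


def split_words(text: str) -> List[str]:
    """Finest splitter: break on any whitespace."""
    return text.split()


def recursive_split(text: str, chunk_size: int, splitters: Sequence[Splitter]) -> List[str]:
    """Return pieces of `text` that each fit in `chunk_size`, where possible."""
    if word_count(text) <= chunk_size or not splitters:
        # Either the piece fits, or no finer splitter is left to try.
        return [text]
    first, *finer = splitters
    pieces: List[str] = []
    for part in first(text):
        # Parts that are still too large are handed to the finer splitters.
        pieces.extend(recursive_split(part, chunk_size, finer))
    return pieces


text = "One two three, four five six seven. Eight nine."
pieces = recursive_split(text, 3, [split_sentences, split_clauses, split_words])
for piece in pieces:
    print(piece)
```

With chunk_size set to 3, the first sentence is too large and is split at the comma; the clause that is still too large falls through to the word splitter, while "Eight nine." already fits and is left intact.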

API / Documentation

class chunkipy.text_chunker.RecursiveTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0, text_splitters=None)

Bases: BaseOverlapTextChunker

Chunk text by recursively applying increasingly fine-grained splitters.

The chunker tries each splitter in order until a text part fits within the configured chunk_size. Custom splitters are attempted before the default fallback splitters.

Parameters:

split_text(text)

Split the provided text into smaller parts using the configured text splitters. The splitters are applied recursively until each part fits within the chunk size, as measured by the size estimator.

Parameters:

text (str) – The text to be split.

Yields:

Generator[TextPart, None, None] – A generator yielding TextPart objects, each containing a piece of text and its estimated size.

Return type:

Generator[TextPart, None, None]
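
The Generator[TextPart, None, None] return type means the method yields parts lazily, accepts no sent values, and returns nothing when exhausted. The sketch below illustrates that pattern only. Part and split_into_parts are hypothetical stand-ins written for this example (they are not chunkipy's TextPart or split_text), and the word-count sizing is an assumption.

```python
from dataclasses import dataclass
from typing import Generator


@dataclass
class Part:
    """Hypothetical stand-in for TextPart: a piece of text plus its estimated size."""
    text: str
    size: int


def split_into_parts(text: str) -> Generator[Part, None, None]:
    """Lazily yield one Part per sentence; size is the sentence's word count."""
    for sentence in text.split(". "):
        sentence = sentence.strip()
        if sentence:
            yield Part(text=sentence, size=len(sentence.split()))


parts = split_into_parts("First sentence here. Second one. Third")
first = next(parts)       # only the first part has been computed so far
remaining = list(parts)   # consuming the generator produces the rest
total_size = first.size + sum(p.size for p in remaining)
```

Because the parts are produced on demand, a caller can stop early or stream them into a later step without holding every part in memory.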

Example

This example is included in examples/chunkers/recursive/custom_text_splitter.py.

from chunkipy import RecursiveTextChunker
from chunkipy.text_splitters.base_text_splitter import BaseTextSplitter


if __name__ == "__main__":
    text = "This is a small text -> with custom split strategy."

    class ArrowTextSplitter(BaseTextSplitter):
        def _split(self, text):
            # Split on the arrow and drop empty or single-space fragments
            return [t for t in text.split("->") if t != '' and t != ' ']

    # Create a RecursiveTextChunker with the custom text splitter (uses WordSizeEstimator by default)
    arrow_text_splitter = ArrowTextSplitter()
    text_chunker = RecursiveTextChunker(chunk_size=8, text_splitters=[arrow_text_splitter])
    chunks = text_chunker.chunk(text)

    # Print the resulting chunks
    for i, chunk in enumerate(chunks):
        print(f"Chunk {i + 1}: {chunk}")

More examples are available under examples/chunkers/recursive/.