RecursiveTextChunker
Description
RecursiveTextChunker progressively applies text splitters from coarse to
fine granularity until each piece satisfies chunk_size. It is useful when
you want chunk boundaries to follow natural separators (phrases, clauses, words)
instead of fixed windows while still preserving overlap support.
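The coarse-to-fine idea described above can be illustrated with a minimal, library-independent sketch. This is not chunkipy's implementation: the names `recursive_split`, `word_count`, and `SPLITTERS` are hypothetical, the regular expressions are illustrative stand-ins for real splitters, size is assumed to be measured in whitespace-separated words, and the merging of pieces into chunks and the overlap handling are left out.

```python
import re
from typing import Callable, List

Splitter = Callable[[str], List[str]]


def word_count(text: str) -> int:
    """Assumed size measure: number of whitespace-separated words."""
    return len(text.split())


# Hypothetical splitters, ordered from coarse to fine.
SPLITTERS: List[Splitter] = [
    lambda t: re.split(r"(?<=[.!?])\s+", t),  # sentence-like boundaries
    lambda t: re.split(r"(?<=[,;:])\s+", t),  # clause-like boundaries
    lambda t: t.split(),                      # individual words
]


def recursive_split(text: str, chunk_size: int,
                    splitters: List[Splitter] = SPLITTERS,
                    level: int = 0) -> List[str]:
    """Split text with finer and finer splitters until each piece fits."""
    text = text.strip()
    if not text:
        return []
    # Stop when the piece fits, or when no finer splitter is left.
    if word_count(text) <= chunk_size or level >= len(splitters):
        return [text]
    parts: List[str] = []
    for piece in splitters[level](text):
        parts.extend(recursive_split(piece, chunk_size, splitters, level + 1))
    return parts


sample = "One two three four five six. Seven eight, nine ten eleven twelve thirteen."
pieces = recursive_split(sample, chunk_size=4)
for piece in pieces:
    print(repr(piece))
```

In this sketch a piece that already fits is never split further, so short clauses such as "Seven eight," survive intact while longer spans fall through to word-level splitting.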
API / Documentation
- class chunkipy.text_chunker.RecursiveTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0, text_splitters=None)
  Bases: BaseOverlapTextChunker
  Chunk text by recursively applying increasingly fine-grained splitters.
  The chunker tries each splitter in order until a text part fits within the configured
  chunk_size. Custom splitters are attempted before the default fallback splitters.
  Parameters:
    chunk_size (int)
    size_estimator (BaseSizeEstimator)
    overlap_ratio (float)
    text_splitters (List[BaseTextSplitter])
- split_text(text)
  Split the provided text into smaller parts using the configured text splitters and chunk size. The method applies the splitters recursively until each part fits within the chunk size, as measured by the size estimator.
Example
This example is included in examples/chunkers/recursive/custom_text_splitter.py.
from chunkipy import RecursiveTextChunker
from chunkipy.text_splitters.base_text_splitter import BaseTextSplitter


if __name__ == "__main__":
    text = "This is a small text -> with custom split strategy."

    class ArrowTextSplitter(BaseTextSplitter):
        def _split(self, text):
            # Split on the "->" marker and drop empty or single-space parts
            return [t for t in text.split("->") if t != '' and t != ' ']

    # Create a RecursiveTextChunker with a custom text splitter (WordSizeEstimator is used by default)
    arrow_text_splitter = ArrowTextSplitter()
    text_chunker = RecursiveTextChunker(chunk_size=8, text_splitters=[arrow_text_splitter])
    chunks = text_chunker.chunk(text)

    # Print the resulting chunks
    for i, chunk in enumerate(chunks):
        print(f"Chunk {i + 1}: {chunk}")
More examples are available under examples/chunkers/recursive/.