FixedSizeTextChunker
Description
FixedSizeTextChunker splits text into deterministic chunks based on the
configured size estimator strategy. It is the most predictable chunker when you
need stable chunk boundaries and straightforward overlap behavior. By default,
it uses the word estimator and works with no optional dependencies.
API / Documentation
- class chunkipy.text_chunker.FixedSizeTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0)
  Bases: BaseOverlapTextChunker
  Chunk text into fixed-size slices using the configured size estimator.
  Each segment emitted by size_estimator.segment is treated as a unit of size 1 during chunk assembly.
  - Parameters:
    - chunk_size (int)
    - size_estimator (BaseSizeEstimator)
    - overlap_ratio (float)
  - split_text(text)
    Split the provided text into smaller parts based on the size estimator. The size estimator cuts the text into segments, and every segment counts as size 1.
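To make the "every segment counts as size 1" rule concrete, here is a minimal standalone sketch of fixed-size chunking with overlap. It does not use chunkipy and is not the library's implementation. It assumes that segments are whitespace-separated words and that the overlap is computed as int(chunk_size * overlap_ratio). The function name fixed_size_chunks is hypothetical.

```python
def fixed_size_chunks(text, chunk_size, overlap_ratio=0.0):
    """Group whitespace-separated words into chunks of at most chunk_size words.

    Illustrative only: this mimics the idea of counting each segment as
    size 1, not the actual chunkipy code path.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0.0, 1.0)")

    segments = text.split()  # assumption: one word is one segment of size 1
    overlap = int(chunk_size * overlap_ratio)  # assumption: truncate toward zero
    step = chunk_size - overlap  # how far the window advances per chunk (>= 1)

    chunks = []
    for start in range(0, len(segments), step):
        window = segments[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(segments):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
result = fixed_size_chunks(sample, chunk_size=4, overlap_ratio=0.25)
for i, chunk in enumerate(result):
    print(f"Chunk {i + 1}: {chunk}")
```

With chunk_size=4 and overlap_ratio=0.25, each chunk in this sketch holds four words and shares its last word with the start of the next chunk. Because the boundaries depend only on the segment count, the same input always yields the same chunks.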
Example
This example is included in examples/chunkers/fixed_size/prebuilt_text_splitter.py.
from chunkipy import FixedSizeTextChunker


if __name__ == "__main__":
    # No size_estimator is passed, so the default word estimator is used.
    text_chunker = FixedSizeTextChunker(
        chunk_size=200,
        overlap_ratio=0.25
    )

    text = "This is a sample text that will be split into chunks based on word boundaries."
    chunks = text_chunker.chunk(text)

    # Print each chunk with a 1-based index.
    for i, chunk in enumerate(chunks):
        print(f"Chunk {i + 1}: {chunk}")
More examples are available under examples/chunkers/fixed_size/.