chunkipy.text_chunker
Public chunker classes and data models exposed by chunkipy.text_chunker.
- class chunkipy.text_chunker.BaseOverlapTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0)
Bases: BaseTextChunker, ABC
Base class for chunkers that assemble chunks with overlap from text parts.
- Parameters:
chunk_size (int)
size_estimator (BaseSizeEstimator)
overlap_ratio (float)
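The overlap assembly that this base class provides can be pictured with the following sketch. It is not chunkipy's implementation. The Part class and the assemble_with_overlap function are hypothetical. The sketch also assumes that the overlap budget is int(chunk_size * overlap_ratio) and that every part already fits within chunk_size. Parts are packed greedily. When the next part would overflow the current chunk, that chunk is closed, and its trailing parts (up to the budget) seed the next chunk as overlap.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Part:
    """Hypothetical stand-in for TextPart: a text fragment plus its size."""
    size: int
    text: str


def assemble_with_overlap(
    parts: List[Part], chunk_size: int, overlap_ratio: float = 0.0
) -> List[Tuple[List[Part], List[Part]]]:
    """Greedily pack parts into (overlap, content) pairs of at most chunk_size units."""
    overlap_budget = int(chunk_size * overlap_ratio)  # assumed budget formula
    chunks: List[Tuple[List[Part], List[Part]]] = []
    overlap: List[Part] = []
    content: List[Part] = []

    def used() -> int:
        # Total size currently occupied by overlap and content parts.
        return sum(p.size for p in overlap) + sum(p.size for p in content)

    for part in parts:
        if content and used() + part.size > chunk_size:
            chunks.append((overlap, content))
            # Walk the finished chunk backwards, keeping trailing parts within the budget.
            carried: deque = deque()
            total = 0
            for prev in reversed(overlap + content):
                if total + prev.size > overlap_budget:
                    break
                carried.appendleft(prev)
                total += prev.size
            overlap, content = list(carried), []
            # Drop the oldest carried parts if the incoming part would not fit beside them.
            while overlap and used() + part.size > chunk_size:
                overlap.pop(0)
        content.append(part)
    if content:
        chunks.append((overlap, content))
    return chunks
```

Consider six one-word parts of size 1, with chunk_size=4 and overlap_ratio=0.5. The sketch yields "a b c d" followed by "c d e f", where "c d" is the carried overlap.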
- class chunkipy.text_chunker.BaseTextChunker(chunk_size=None, size_estimator=None)
Bases: ABC
Base class for all chunker implementations.
- Parameters:
chunk_size (int) – Maximum size allowed for a single chunk, in the units defined by size_estimator.
size_estimator (BaseSizeEstimator) – Strategy used to measure text size. Defaults to WordSizeEstimator.
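Sizes are measured in whatever units size_estimator defines, and a word-based estimator is a simple example. The sketch below is hypothetical. SizeEstimatorSketch, WordEstimatorSketch, and the estimate_size method are illustrative names, not chunkipy's BaseSizeEstimator or WordSizeEstimator API. Treating any run of non-whitespace characters as one word is also an assumption. Only the segment method name echoes the size_estimator.segment call mentioned for FixedSizeTextChunker.

```python
import re
from abc import ABC, abstractmethod
from typing import List


class SizeEstimatorSketch(ABC):
    """Hypothetical interface: split text into unit-sized segments and measure it."""

    @abstractmethod
    def segment(self, text: str) -> List[str]:
        """Return the segments that each count as one size unit."""

    def estimate_size(self, text: str) -> int:
        # Size is the number of segments the text breaks into.
        return len(self.segment(text))


class WordEstimatorSketch(SizeEstimatorSketch):
    """Counts whitespace-separated words, one way a word-based estimator could work."""

    def segment(self, text: str) -> List[str]:
        return re.findall(r"\S+", text)
```

For example, "Hello  world, again" segments into three words, so its estimated size is 3 regardless of the repeated space.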
- class chunkipy.text_chunker.Chunk(overlap=<factory>, content=<factory>)
Bases: object
Single chunk returned by a text chunker.
A chunk is composed of two ordered collections:
overlap: text parts repeated from the previous chunk to preserve context
content: text parts that are unique to the current chunk
The text and size properties are computed over the combined text_parts view.
- property size: int
Calculates and returns the total size of all TextPart objects within text_parts.
- Returns:
The total size of all TextPart objects.
- Return type:
int
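The <factory> defaults in the Chunk signature match how Python dataclasses display fields declared with default_factory. The following minimal sketch models the data under that assumption. ChunkSketch and TextPartSketch are hypothetical stand-ins, and plain lists replace the Overlap deque. Joining parts with a single space in text is a guess rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextPartSketch:
    size: int
    text: str


@dataclass
class ChunkSketch:
    # default_factory gives every chunk its own empty lists.
    overlap: List[TextPartSketch] = field(default_factory=list)
    content: List[TextPartSketch] = field(default_factory=list)

    @property
    def text_parts(self) -> List[TextPartSketch]:
        # Overlap parts come first, followed by the chunk's own content.
        return [*self.overlap, *self.content]

    @property
    def size(self) -> int:
        # Total size of all parts, overlap included.
        return sum(part.size for part in self.text_parts)

    @property
    def text(self) -> str:
        # Joining with a single space is an assumption for word-like parts.
        return " ".join(part.text for part in self.text_parts)
```

Suppose a chunk has an overlap part "quick fox" of size 2 and a content part "jumps over it" of size 3. It then reports size 5, and its text puts the overlap before the content.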
- class chunkipy.text_chunker.Chunks(iterable=(), /)
List-like collection of Chunk objects returned by chunkers.
- class chunkipy.text_chunker.FixedSizeTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0)
Bases: BaseOverlapTextChunker
Chunk text into fixed-size slices using the configured size estimator.
Each segment emitted by size_estimator.segment is treated as a unit of size 1 during chunk assembly.
- Parameters:
chunk_size (int)
size_estimator (BaseSizeEstimator)
overlap_ratio (float)
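When every segment counts as size 1, a sliding window can produce fixed-size slices with overlap. The window advances by chunk_size minus the overlap. The sketch below makes three assumptions: segments are whitespace-separated words, the overlap is int(chunk_size * overlap_ratio) segments, and the hypothetical helper fixed_size_chunks stands in for the chunker. FixedSizeTextChunker itself may assemble chunks differently through BaseOverlapTextChunker.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int, overlap_ratio: float = 0.0) -> List[str]:
    """Split text into windows of chunk_size words, repeating a fraction of each window."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")
    # Each whitespace-delimited word is one segment of size 1.
    segments = text.split()
    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # at least 1, because overlap < chunk_size
    chunks: List[str] = []
    start = 0
    while start < len(segments):
        chunks.append(" ".join(segments[start:start + chunk_size]))
        if start + chunk_size >= len(segments):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

Take ten one-letter words with chunk_size=4 and overlap_ratio=0.5. The window advances two words at a time and produces four chunks, each sharing two words with its predecessor.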
- class chunkipy.text_chunker.Overlap
Bases: TextPartsMixin, deque[TextPart]
Deque-like collection used to carry overlap between consecutive chunks.
- class chunkipy.text_chunker.RecursiveTextChunker(chunk_size=None, size_estimator=None, overlap_ratio=0.0, text_splitters=None)
Bases: BaseOverlapTextChunker
Chunk text by recursively applying increasingly fine-grained splitters.
The chunker tries each splitter in order until a text part fits within the configured chunk_size. Custom splitters are attempted before the default fallback splitters.
- Parameters:
chunk_size (int)
size_estimator (BaseSizeEstimator)
overlap_ratio (float)
text_splitters (List[BaseTextSplitter])
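The recursive strategy can be sketched as follows. A text part that already fits is kept. Otherwise, it is split with the next splitter in the list, and each piece is handled the same way. The following are all assumptions for illustration, not chunkipy's BaseTextSplitter API:

- the fallback order (paragraphs, then ". " sentence breaks, then single spaces)
- the split_on and recursive_split helpers
- sizing by word count
- discarding separators

Custom splitters are placed ahead of the defaults, mirroring the ordering described above. The returned pieces would still need to be packed into chunks with overlap.

```python
from typing import Callable, List, Optional

Splitter = Callable[[str], List[str]]


def split_on(separator: str) -> Splitter:
    """Return a splitter that breaks text on separator and drops empty, padded pieces."""
    return lambda text: [p.strip() for p in text.split(separator) if p.strip()]


# Hypothetical fallback order: paragraphs, then sentences, then words.
DEFAULT_SPLITTERS: List[Splitter] = [split_on("\n\n"), split_on(". "), split_on(" ")]


def recursive_split(
    text: str,
    chunk_size: int,
    custom_splitters: Optional[List[Splitter]] = None,
    size: Callable[[str], int] = lambda t: len(t.split()),
    _level: int = 0,
) -> List[str]:
    """Return pieces that each fit within chunk_size, using finer splitters only when needed."""
    if size(text) <= chunk_size:
        return [text]
    # Custom splitters are tried before the default fallbacks.
    splitters = (custom_splitters or []) + DEFAULT_SPLITTERS
    if _level >= len(splitters):
        return [text]  # nothing finer left; emit the oversized piece as-is
    pieces: List[str] = []
    for piece in splitters[_level](text):
        pieces.extend(recursive_split(piece, chunk_size, custom_splitters, size, _level + 1))
    return pieces
```

With chunk_size=3, a short first paragraph is broken only at its sentence boundary. A six-word sentence without an internal ". " break falls through to word-level splitting.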
- class chunkipy.text_chunker.TextPart(size, text)
Bases: object
Represents a fragment of a complete text, along with its size.
- Parameters:
size (int)
text (str)