chunkipy.text_chunker.data_models

Classes

Chunk([overlap, content])

Single chunk returned by a text chunker.

Chunks([iterable])

List-like collection of Chunk objects returned by chunkers.

Overlap

Deque-like collection used to carry overlap between consecutive chunks.

TextPart(size, text)

Represents a fragment or segment of a complete text, along with its size as computed by the SizeEstimator in use.

TextParts([iterable])

List-like collection of TextPart values.

TextPartsMixin()

A base class with utilities for handling collections of TextPart.

class chunkipy.text_chunker.data_models.Chunk(overlap=<factory>, content=<factory>)

Bases: object

Single chunk returned by a text chunker.

A chunk is composed of two ordered collections:

  • overlap: text parts repeated from the previous chunk to preserve context

  • content: text parts that are unique to the current chunk

The text and size properties are computed over the combined text_parts view.

Parameters:
  • overlap (Overlap)

  • content (TextParts)

content: TextParts
overlap: Overlap

property size: int

Calculates and returns the total size of all TextPart objects within text_parts.

Returns:

The total size of all TextPart objects.

Return type:

int

property text: str

Returns the full text of the chunk, built by concatenating the text value of every TextPart in text_parts.

Returns:

The full text of the chunk, concatenated from all text parts.

Return type:

str

property text_parts: TextParts

Return a combined ordered view of overlap and content text parts.
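The relationship between overlap, content, and the derived properties can be sketched without importing chunkipy. The snippet below is a minimal stand-in, not the library's implementation: ChunkSketch is a hypothetical name, plain deque and list containers replace Overlap and TextParts, and it assumes that overlap parts precede content parts and that texts are joined without a separator.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TextPart:
    # Stand-in mirroring the documented fields of TextPart.
    size: int
    text: str


@dataclass
class ChunkSketch:
    # Hypothetical stand-in for Chunk; both fields default to empty containers.
    overlap: deque = field(default_factory=deque)
    content: list = field(default_factory=list)

    @property
    def text_parts(self) -> list:
        # Assumption: overlap parts come first, then the chunk's own content.
        return [*self.overlap, *self.content]

    @property
    def size(self) -> int:
        # Sum of the sizes of every part in the combined view.
        return sum(part.size for part in self.text_parts)

    @property
    def text(self) -> str:
        # Assumption: parts are joined without any separator.
        return "".join(part.text for part in self.text_parts)


chunk = ChunkSketch(
    overlap=deque([TextPart(6, "world ")]),
    content=[TextPart(4, "and "), TextPart(5, "more.")],
)
print(chunk.text)  # world and more.
print(chunk.size)  # 15
```

Note that both size and text include the overlap parts, so a chunk's size here is larger than the size of its content alone.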

class chunkipy.text_chunker.data_models.Chunks(iterable=(), /)

Bases: list[Chunk]

List-like collection of Chunk objects returned by chunkers.

get_all_text()

Return the serialized text for every chunk.

Returns:

A list of strings, where each string is the full text of a chunk.

Return type:

List[str]

get_all_text_parts()

Return the text parts for every chunk.

Returns:

A list of per-chunk TextParts collections.

Return type:

List[TextParts]
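The two accessors can be illustrated with a self-contained stand-in. ChunksSketch and ChunkSketch are hypothetical names that only mirror the documented shapes (a list of chunks, each exposing text and text_parts); the real classes may differ in detail, and joining without a separator is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextPart:
    size: int
    text: str


@dataclass
class ChunkSketch:
    # Hypothetical stand-in for Chunk, using plain lists for both fields.
    overlap: List[TextPart] = field(default_factory=list)
    content: List[TextPart] = field(default_factory=list)

    @property
    def text_parts(self) -> List[TextPart]:
        return [*self.overlap, *self.content]

    @property
    def text(self) -> str:
        # Assumption: parts are joined without a separator.
        return "".join(part.text for part in self.text_parts)


class ChunksSketch(list):
    # Hypothetical stand-in for Chunks: a list with two convenience accessors.
    def get_all_text(self) -> List[str]:
        return [chunk.text for chunk in self]

    def get_all_text_parts(self) -> List[List[TextPart]]:
        return [chunk.text_parts for chunk in self]


first = ChunkSketch(content=[TextPart(4, "One "), TextPart(4, "two ")])
second = ChunkSketch(
    overlap=[TextPart(4, "two ")],
    content=[TextPart(6, "three.")],
)
chunks = ChunksSketch([first, second])

print(chunks.get_all_text())                # ['One two ', 'two three.']
print(len(chunks.get_all_text_parts()[1]))  # 2
```

Because each chunk's text includes its overlap, concatenating the strings from get_all_text() would repeat the overlapping text ("two " in this sketch) rather than reproduce the original input.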

class chunkipy.text_chunker.data_models.Overlap

Bases: TextPartsMixin, deque[TextPart]

Deque-like collection used to carry overlap between consecutive chunks.
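One way a deque can carry overlap forward is sketched below. This is an illustration under stated assumptions, not chunkipy's algorithm: carry_overlap and max_overlap_size are hypothetical names, and the policy (keep the longest tail of the previous chunk that fits a size budget) is chosen only for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TextPart:
    size: int
    text: str


def carry_overlap(parts, max_overlap_size):
    """Hypothetical helper: keep the trailing parts that fit the size budget."""
    overlap = deque()
    total = 0
    # Walk backwards from the end of the previous chunk.
    for part in reversed(parts):
        if total + part.size > max_overlap_size:
            break
        # appendleft is O(1) on a deque and restores the original order.
        overlap.appendleft(part)
        total += part.size
    return overlap


previous = [TextPart(3, "aa "), TextPart(3, "bb "), TextPart(3, "cc ")]
overlap = carry_overlap(previous, max_overlap_size=6)
print([part.text for part in overlap])  # ['bb ', 'cc ']
```

A deque suits this pattern because parts are added or dropped at either end in constant time, whereas inserting at the front of a list is linear in its length.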

class chunkipy.text_chunker.data_models.TextPart(size, text)

Bases: object

Represents a fragment or segment of a complete text, along with its size as computed by the SizeEstimator in use.

Parameters:
  • size (int) – The size of the text based on the SizeEstimator used.

  • text (str) – The text of the segment.

size: int
text: str
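Since size depends on how text is measured, the same text can yield different size values. The sketch below uses a stand-in TextPart dataclass and two hypothetical measuring functions (char_size and word_size); neither is part of chunkipy, and the library's SizeEstimator interface is not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TextPart:
    # Stand-in mirroring the documented fields.
    size: int
    text: str


def char_size(text: str) -> int:
    # Hypothetical estimator: one unit per character.
    return len(text)


def word_size(text: str) -> int:
    # Hypothetical estimator: one unit per whitespace-separated word.
    return len(text.split())


def make_part(text: str, estimate: Callable[[str], int]) -> TextPart:
    # The size is fixed at construction time by whichever estimator is passed.
    return TextPart(size=estimate(text), text=text)


sentence = "Chunk me, please. "
print(make_part(sentence, char_size).size)  # 18
print(make_part(sentence, word_size).size)  # 3
```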

class chunkipy.text_chunker.data_models.TextParts(iterable=(), /)

Bases: TextPartsMixin, list[TextPart]

List-like collection of TextPart values.

This container preserves the normal list API while exposing aggregated size and text properties via TextPartsMixin.
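A minimal sketch of a list subclass with aggregated properties follows. TextPartsSketch is a hypothetical stand-in; it assumes texts are joined without a separator and does not override slicing. Whether chunkipy's TextParts customizes slicing is not stated here.

```python
from dataclasses import dataclass


@dataclass
class TextPart:
    size: int
    text: str


class TextPartsSketch(list):
    # Hypothetical stand-in: ordinary list behavior plus two aggregates.
    @property
    def size(self) -> int:
        return sum(part.size for part in self)

    @property
    def text(self) -> str:
        # Assumption: parts are joined without a separator.
        return "".join(part.text for part in self)


parts = TextPartsSketch([TextPart(2, "a "), TextPart(2, "b ")])
parts.append(TextPart(1, "c"))       # normal list API still works
print(parts.size, repr(parts.text))  # 5 'a b c'

# Slicing a list subclass that does not override __getitem__ yields a plain list,
# so the aggregates are lost until the slice is wrapped again.
head = parts[:2]
print(type(head).__name__)                 # list
print(repr(TextPartsSketch(head).text))    # 'a b '
```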

class chunkipy.text_chunker.data_models.TextPartsMixin

Bases: object

A base class with utilities for handling collections of TextPart.

property size: int

Calculates the total size of all TextPart objects in the collection.

Returns:

The total size of all TextPart objects.

Return type:

int

property text: str

Concatenates and returns the full text of all TextPart objects in the collection.

Returns:

A single string containing the concatenated text of all TextPart objects.

Return type:

str
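The mixin pattern described above can be sketched as a class that relies only on iteration, so the same properties work on both list-based and deque-based containers. All class names below are hypothetical stand-ins, and joining without a separator is an assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TextPart:
    size: int
    text: str


class PartsMixinSketch:
    # Only requires that the host class iterates over TextPart objects.
    @property
    def size(self) -> int:
        return sum(part.size for part in self)

    @property
    def text(self) -> str:
        # Assumption: parts are joined without a separator.
        return "".join(part.text for part in self)


class ListParts(PartsMixinSketch, list):
    """List-backed container, analogous in shape to TextParts."""


class DequeParts(PartsMixinSketch, deque):
    """Deque-backed container, analogous in shape to Overlap."""


items = [TextPart(3, "ab "), TextPart(2, "cd")]
as_list, as_deque = ListParts(items), DequeParts(items)

print(as_list.size, as_deque.size)    # 5 5
print(as_list.text == as_deque.text)  # True
print(ListParts().size, repr(ListParts().text))  # 0 ''
```

Listing the mixin before the container type in the bases matches the order documented for Overlap and TextParts, and an empty collection naturally yields a size of 0 and an empty string.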