chunkipy.text_chunker.data_models

Classes

Chunk([overlap, content])

Represents a single chunk of text, which consists of multiple text parts.

Chunks([iterable])

A list-like collection of chunks with utility methods for aggregation.

Overlap

A deque-like collection of TextParts with utility methods for aggregation.

TextPart(size, text)

Represents a fragment or segment of a complete text, along with its size as measured by the SizeEstimator used.

TextParts([iterable])

A list-like collection of TextParts.

TextPartsMixin()

A base class with utilities for handling collections of TextPart.

class chunkipy.text_chunker.data_models.Chunk(overlap=<factory>, content=<factory>)[source]

Bases: object

Represents a single chunk of text, which consists of multiple text parts.

Computed properties:
  • text (str) – The full text of the chunk, obtained by joining the 'text' values of all its text parts.

Parameters:
  • overlap (Overlap) – The TextPart objects that form the overlap portion of the chunk.

  • content (TextParts) – The TextPart objects that form the content of the chunk.

content: TextParts
overlap: Overlap

property size: int

Calculates and returns the total size of all TextPart objects within text_parts.

Returns:

The total size of all TextPart objects.

Return type:

int

property text: str

Returns the full concatenated text of the chunk by joining all ‘text’ values from the TextPart objects.

Returns:

The full text of the chunk, concatenated from all text parts.

Return type:

str

property text_parts: TextParts

Returns the TextPart objects that make up the chunk.

Returns:

The text parts of the chunk.

Return type:

TextParts
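The relationship between a chunk's fields and its computed properties can be sketched with stand-in classes. This is a minimal illustration, not the library's implementation: the classes below are simplified local copies rather than imports from chunkipy, and two details are assumptions not confirmed by this page, namely that text_parts yields the overlap parts followed by the content parts, and that text joins parts with no separator.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextPart:
    """Stand-in for the documented TextPart (size plus text)."""
    size: int
    text: str


@dataclass
class Chunk:
    """Stand-in chunk; plain lists replace Overlap and TextParts."""
    overlap: List[TextPart] = field(default_factory=list)
    content: List[TextPart] = field(default_factory=list)

    @property
    def text_parts(self) -> List[TextPart]:
        # Assumption: overlap parts come first, then content parts.
        return list(self.overlap) + list(self.content)

    @property
    def size(self) -> int:
        # Sum of the sizes of all parts in the chunk.
        return sum(part.size for part in self.text_parts)

    @property
    def text(self) -> str:
        # Assumption: parts are joined with no separator.
        return "".join(part.text for part in self.text_parts)


chunk = Chunk(
    overlap=[TextPart(4, "end ")],
    content=[TextPart(6, "start "), TextPart(6, "middle")],
)
empty = Chunk()  # both fields fall back to their factories
print(chunk.size, repr(chunk.text))
```

Because both fields use default factories, each Chunk created without arguments gets its own empty collections instead of sharing one mutable default.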

class chunkipy.text_chunker.data_models.Chunks(iterable=(), /)[source]

Bases: List[Chunk]

A list-like collection of chunks with utility methods for aggregation.

Inherits from ‘list’ to act as a standard list, while providing additional methods for aggregated operations.

get_all_text()[source]

Returns the full text from all chunks as a list.

Returns:

A list of strings, where each string is the full text of a chunk.

Return type:

List[str]

get_all_text_parts()[source]

Returns all text parts from each chunk as a list of lists.

Returns:

A list of lists, where each inner list contains the text parts of a chunk.

Return type:

List[List[str]]
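A list subclass with an aggregation helper, as described for Chunks, might look like the sketch below. The classes are simplified stand-ins defined locally (the stand-in Chunk stores its text directly instead of computing it), so this shows the shape of get_all_text() rather than the library's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Simplified stand-in that stores its text directly."""
    text: str


class Chunks(List[Chunk]):
    """Stand-in list-like collection with one aggregation helper."""

    def get_all_text(self) -> List[str]:
        # One string per chunk, in list order.
        return [chunk.text for chunk in self]


chunks = Chunks([Chunk("First chunk."), Chunk("Second chunk.")])
texts = chunks.get_all_text()

# Ordinary list operations still work on the collection.
for index, text in enumerate(texts):
    print(index, text)
```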

class chunkipy.text_chunker.data_models.Overlap[source]

Bases: Deque[TextPart]

A deque-like collection of TextParts with utility methods for aggregation. Inherits from deque to act as a standard deque, while providing additional methods for aggregated operations (e.g. size).

property size: int

Calculates and returns the total size of all TextPart objects.

Returns:

The total size of all TextPart objects.

Return type:

int
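The idea of a deque that also reports an aggregated size can be sketched as follows. The classes are local stand-ins, not imports from chunkipy, and the example values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Deque


@dataclass
class TextPart:
    """Stand-in for the documented TextPart (size plus text)."""
    size: int
    text: str


class Overlap(Deque[TextPart]):
    """Stand-in deque-like collection with an aggregated size."""

    @property
    def size(self) -> int:
        # Total size of the parts currently held.
        return sum(part.size for part in self)


overlap = Overlap()
overlap.append(TextPart(3, "one"))
overlap.append(TextPart(3, "two"))
overlap.appendleft(TextPart(4, "zero"))
size_before = overlap.size

# Removing a part from the left end shrinks the aggregated size.
dropped = overlap.popleft()
size_after = overlap.size
print(size_before, size_after, dropped.text)
```

A deque allows cheap insertion and removal at both ends, which suits a window of parts that grows on one side and is trimmed on the other.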

class chunkipy.text_chunker.data_models.TextPart(size, text)[source]

Bases: object

Represents a fragment or segment of a complete text, along with its size as measured by the SizeEstimator used.

Parameters:
  • size (int) – The size of the text based on the SizeEstimator used.

  • text (str) – The text of the segment.

size: int
text: str
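
Since size depends on the estimator, it need not equal the character count of text. The sketch below illustrates this with a local stand-in dataclass and a hypothetical word_count function; neither is part of chunkipy, and the library's SizeEstimator interface is not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class TextPart:
    """Stand-in mirroring the documented fields; not the library class."""
    size: int
    text: str


def word_count(text: str) -> int:
    """Hypothetical size estimator: counts whitespace-separated words."""
    return len(text.split())


sentence = "Chunking splits long text."
part = TextPart(size=word_count(sentence), text=sentence)

# The stored size follows the estimator, not len(text).
print(part.size, len(part.text))
```
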
class chunkipy.text_chunker.data_models.TextParts(iterable=(), /)[source]

Bases: TextPartsMixin, List[TextPart]

A list-like collection of TextParts. Inherits from list to act as a standard list, and from TextPartsMixin to provide additional methods for aggregated operations (e.g. size, text).

class chunkipy.text_chunker.data_models.TextPartsMixin[source]

Bases: object

A base class with utilities for handling collections of TextPart.

property size: int

Calculates the total size of all TextPart objects in the collection.

Returns:

The total size of all TextPart objects.

Return type:

int

property text: str

Concatenates and returns the full text of all TextParts in the collection.

Returns:

A single string containing the concatenated text of all TextParts.

Return type:

str
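How a mixin can supply size and text to a list-based collection, as TextParts does with TextPartsMixin, can be sketched as below. All three classes are local stand-ins, and joining with no separator is an assumption; the page only says the text is concatenated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextPart:
    """Stand-in for the documented TextPart (size plus text)."""
    size: int
    text: str


class TextPartsMixin:
    """Stand-in mixin: aggregates over any iterable of TextPart."""

    @property
    def size(self) -> int:
        return sum(part.size for part in self)

    @property
    def text(self) -> str:
        # Assumption: parts are joined with no separator.
        return "".join(part.text for part in self)


class TextParts(TextPartsMixin, List[TextPart]):
    """Stand-in list that gains size and text from the mixin."""


parts = TextParts([TextPart(6, "Hello "), TextPart(5, "world")])
parts.append(TextPart(1, "!"))  # regular list methods remain available
print(parts.size, repr(parts.text))
```

The mixin only relies on iteration, so the same properties could be reused by any other iterable container of TextPart objects.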