chunkipy.text_chunker

class chunkipy.text_chunker.Chunk(text_parts=<factory>)[source]

Bases: object

Represents a single chunk of text, which consists of multiple text parts.

Attributes and computed properties:

  • text – The full text of the chunk, obtained by joining the ‘text’ values of its text parts.

  • text_parts – A list of TextPart objects that make up the chunk.

Parameters:

text_parts (List[TextPart])

property size: int

Calculates and returns the total size of all TextPart objects within text_parts.

Returns:

The total size of all TextPart objects.

Return type:

int

property text: str

Returns the full concatenated text of the chunk by joining all ‘text’ values from the TextPart objects.

Returns:

The full text of the chunk, concatenated from all text parts.

Return type:

str

text_parts: List[TextPart]

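
The relationship between a chunk and its parts can be sketched with two small dataclasses. This is a minimal stand-in written from the descriptions above, not chunkipy's actual source. In particular, joining the parts with an empty separator is an assumption; the documentation only says the ‘text’ values are joined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextPart:
    """Stand-in for the documented TextPart: a piece of text plus its size."""
    text: str
    size: int


@dataclass
class Chunk:
    """Stand-in for the documented Chunk, built from its text_parts field."""
    text_parts: List[TextPart] = field(default_factory=list)

    @property
    def size(self) -> int:
        # Total size is the sum of the sizes of the individual parts.
        return sum(part.size for part in self.text_parts)

    @property
    def text(self) -> str:
        # Assumption: parts are joined with no separator between them.
        return "".join(part.text for part in self.text_parts)


chunk = Chunk([TextPart("Hello, ", 7), TextPart("world!", 6)])
print(chunk.size)  # 13
print(chunk.text)  # Hello, world!
```

Because both properties are computed on access, appending to text_parts in this sketch changes size and text without any extra bookkeeping.
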
class chunkipy.text_chunker.Chunks(iterable=(), /)[source]

Bases: list

A list-like collection of chunks with utility methods for aggregation.

Inherits from ‘list’ to act as a standard list, while providing additional methods for aggregated operations.

get_all_text()[source]

Returns the full text from all chunks as a list.

Returns:

A list of strings, where each string is the full text of a chunk.

Return type:

List[str]

get_all_text_parts()[source]

Returns all text parts from each chunk as a list of lists.

Returns:

A list of lists, where each inner list contains the text parts of a chunk.

Return type:

List[List[str]]
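
The two aggregation methods might look roughly like the sketch below. The classes here are hypothetical stand-ins, and returning each part's text (rather than the TextPart object itself) is an assumption based only on the documented List[List[str]] return type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextPart:
    """Stand-in for the documented TextPart."""
    text: str
    size: int


@dataclass
class Chunk:
    """Stand-in for the documented Chunk."""
    text_parts: List[TextPart] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: parts are joined with no separator.
        return "".join(part.text for part in self.text_parts)


class Chunks(list):
    """A plain list of Chunk objects with two aggregation helpers."""

    def get_all_text(self) -> List[str]:
        # One string per chunk.
        return [chunk.text for chunk in self]

    def get_all_text_parts(self) -> List[List[str]]:
        # One inner list per chunk, holding the text of each part.
        return [[part.text for part in chunk.text_parts] for chunk in self]


chunks = Chunks([
    Chunk([TextPart("a ", 2), TextPart("b", 1)]),
    Chunk([TextPart("c", 1)]),
])
print(chunks.get_all_text())        # ['a b', 'c']
print(chunks.get_all_text_parts())  # [['a ', 'b'], ['c']]
```

Since the collection subclasses list, indexing, slicing, and len() keep working as usual alongside the helpers.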

class chunkipy.text_chunker.Overlapping[source]

Bases: deque

A deque-like collection of TextParts with utility methods for aggregation. Inherits from deque to act as a standard deque, while providing additional methods for aggregated operations (e.g. size).

property size: int

Calculates and returns the total size of all TextPart objects.

Returns:

The total size of all TextPart objects.

Return type:

int
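
A deque with an aggregated size can be sketched as follows. The classes are stand-ins, and the trimming loop at the end is only an illustration of why a deque is convenient for an overlap buffer; it is not taken from the library.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TextPart:
    """Stand-in for the documented TextPart."""
    text: str
    size: int


class Overlapping(deque):
    """Stand-in: a deque of TextPart objects with a computed total size."""

    @property
    def size(self) -> int:
        return sum(part.size for part in self)


overlap = Overlapping()
overlap.append(TextPart("one ", 4))
overlap.append(TextPart("two ", 4))
overlap.append(TextPart("three", 5))

# Illustrative use: drop the oldest parts until a hypothetical budget of 9 is met.
while overlap.size > 9:
    overlap.popleft()

print([part.text for part in overlap])  # ['two ', 'three']
print(overlap.size)                     # 9
```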

class chunkipy.text_chunker.TextChunker(chunk_size=1000, size_estimator=None, tokens=False, overlap_ratio=0.0, text_splitters=[])[source]

Bases: object

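Parameters:
  • chunk_size – Defaults to 1000.

  • size_estimator – Defaults to None.

  • tokens – Defaults to False.

  • overlap_ratio – Defaults to 0.0.

  • text_splitters – Defaults to [].
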
chunk(text)[source]

Chunk the provided text into smaller parts based on the configured chunk size and overlap.

Parameters:

text (str) – The text to be chunked

Returns:

A list-like collection of chunks, where each chunk holds the list of text parts that make it up.

Return type:

Chunks
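
One plausible way a chunk size and an overlap ratio can interact is a greedy packer that seeds each new chunk with the tail of the previous one. The sketch below is an assumption about the general technique, not chunkipy's implementation. The function name greedy_chunk and the (text, size) tuples standing in for TextPart are hypothetical.

```python
from typing import List, Tuple

Part = Tuple[str, int]  # (text, size), standing in for a TextPart


def greedy_chunk(parts: List[Part], chunk_size: int,
                 overlap_ratio: float) -> List[List[Part]]:
    """Pack parts into chunks of at most chunk_size, repeating a tail as overlap."""
    overlap_budget = int(chunk_size * overlap_ratio)
    chunks: List[List[Part]] = []
    current: List[Part] = []
    current_size = 0
    for part in parts:
        if current and current_size + part[1] > chunk_size:
            chunks.append(current)
            # Collect trailing parts of the finished chunk that fit the overlap budget.
            tail: List[Part] = []
            tail_size = 0
            for prev in reversed(current):
                if tail_size + prev[1] > overlap_budget:
                    break
                tail.insert(0, prev)
                tail_size += prev[1]
            # Shrink the overlap if it would push the new chunk past chunk_size.
            while tail and tail_size + part[1] > chunk_size:
                tail_size -= tail.pop(0)[1]
            current, current_size = tail, tail_size
        current.append(part)
        current_size += part[1]
    if current:
        chunks.append(current)
    return chunks


parts = [("aa ", 3), ("bb ", 3), ("cc ", 3), ("dd", 2)]
with_overlap = greedy_chunk(parts, chunk_size=6, overlap_ratio=0.5)
print(["".join(text for text, _ in c) for c in with_overlap])
# ['aa bb ', 'bb cc ', 'cc dd']
```

With overlap_ratio=0.0 the same input yields two disjoint chunks, 'aa bb ' and 'cc dd'. A single part larger than chunk_size still gets a chunk of its own in this sketch.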

split_text(text)[source]

Split the provided text into smaller parts based on the configured text splitters and chunk size.

Parameters:

text (str) – The text to be split.

Yields:

Generator[TextPart, None, None] – A generator yielding TextPart objects, each containing a piece of text and its estimated size.

Return type:

Generator[TextPart, None, None]
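
The shape of such a generator can be sketched with a single whitespace-based splitter. This is an illustration only: the function name split_on_whitespace is hypothetical, the size is taken to be the character count, and the real method works from the configured text splitters and chunk size rather than a fixed pattern.

```python
import re
from dataclasses import dataclass
from typing import Generator


@dataclass
class TextPart:
    """Stand-in for the documented TextPart."""
    text: str
    size: int


def split_on_whitespace(text: str) -> Generator[TextPart, None, None]:
    """Yield one TextPart per word, keeping each word's trailing whitespace."""
    # Each match is a run of non-space characters plus the whitespace after it,
    # so joining the parts restores text that has no leading whitespace.
    for match in re.finditer(r"\S+\s*", text):
        piece = match.group(0)
        yield TextPart(text=piece, size=len(piece))


split_parts = list(split_on_whitespace("one two  three"))
print([p.text for p in split_parts])  # ['one ', 'two  ', 'three']
print([p.size for p in split_parts])  # [4, 5, 5]
```

Yielding parts lazily means a caller can start packing chunks before the whole text has been split.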

class chunkipy.text_chunker.TextPart(text, size)[source]

Bases: object

Represents a fragment or segment of a complete text, along with its character size.

Parameters:
  • text (str) – The text of the segment.

  • size (int) – The size of the text in characters.

size: int
text: str
