Overview
Text chunkers define how a long text is divided into chunks before storage,
embedding, retrieval, or prompt construction. In chunkipy, each chunker
shares the same base interface but follows a different splitting philosophy,
from deterministic windows to structure-aware recursive strategies.
Use this page for a quick visual comparison of the available chunkers and those on the roadmap. Each chunker is also documented on its own page, but the animated previews collected here make it faster to choose between them.
When needed, you can implement your own chunker by extending
BaseTextChunker and plugging in
your custom splitting and sizing logic.
See Custom text chunkers for a custom chunker template.
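To illustrate what "plugging in splitting and sizing logic" can look like, here is a minimal, self-contained sketch of the pattern. The names SketchBaseChunker, split, size and chunk are hypothetical and do not reflect chunkipy's actual BaseTextChunker interface. For the real template, refer to the Custom text chunkers page. In this sketch the base class owns the packing loop, and a subclass only decides how to cut text into units and how to measure size.

```python
from abc import ABC, abstractmethod


class SketchBaseChunker(ABC):
    """Hypothetical base class (NOT chunkipy's BaseTextChunker): subclasses
    supply splitting and sizing, the base class packs units into chunks."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size

    @abstractmethod
    def split(self, text: str) -> list[str]:
        """Break text into small units (e.g. words or sentences)."""

    @abstractmethod
    def size(self, text: str) -> int:
        """Estimate the size of a piece of text (e.g. in words or tokens)."""

    def chunk(self, text: str) -> list[str]:
        """Greedily pack units into chunks no larger than chunk_size.
        A single unit larger than chunk_size still becomes its own chunk."""
        chunks, current = [], []
        for unit in self.split(text):
            candidate = " ".join(current + [unit])
            if current and self.size(candidate) > self.chunk_size:
                chunks.append(" ".join(current))  # flush the full chunk
                current = [unit]
            else:
                current.append(unit)
        if current:
            chunks.append(" ".join(current))
        return chunks


class WordChunker(SketchBaseChunker):
    """Example subclass: splits on whitespace and measures size in words."""

    def split(self, text: str) -> list[str]:
        return text.split()

    def size(self, text: str) -> int:
        return len(text.split())
```

For example, WordChunker(chunk_size=3).chunk("one two three four five") returns ["one two three", "four five"]. Swapping in a tokenizer-based size method would change where the boundaries fall without touching the packing loop.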
Chunkers
FixedSizeTextChunker
FixedSizeTextChunker builds
chunks with a fixed target size using the configured size estimator. It is ideal
when you want stable chunk lengths and predictable overlap behavior.
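The fixed-window idea with overlap can be sketched as a sliding window. This is an illustration, not FixedSizeTextChunker's implementation: it assumes size is measured in whitespace-separated words (chunkipy's configured size estimator may count tokens or characters instead), and the function name fixed_size_chunks is hypothetical.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of chunk_size words, each sharing overlap
    words with the previous window. Word count stands in for a size
    estimator (an assumption for illustration)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

With fixed_size_chunks("a b c d e f g", chunk_size=3, overlap=1), each window advances by two words and the result is ["a b c", "c d e", "e f g"]. Every chunk except possibly the last has exactly chunk_size units, which is what makes lengths and overlap predictable.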
RecursiveTextChunker
RecursiveTextChunker
applies splitters recursively (from coarser to finer separators) to keep chunks
within the desired size while preserving more natural text boundaries.
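A common way to implement recursive splitting is sketched below. It tries paragraph breaks first, then line breaks, sentence ends, and finally spaces, and it only descends to a finer separator for pieces that are still too large. The separator list, character-based sizing, and the name recursive_split are assumptions for illustration. chunkipy's RecursiveTextChunker may use different splitters, size estimators, and overlap handling.

```python
def recursive_split(text: str, max_size: int,
                    separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursively split text so every chunk has at most max_size characters,
    trying coarse separators before fine ones. Character length is the size
    measure here (an assumption for illustration)."""
    if len(text) <= max_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut every max_size characters.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, max_size, finer)
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= max_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # the separator is dropped at this boundary
        if len(part) <= max_size:
            current = part
        else:
            # Piece is still too large: recurse with finer separators only.
            chunks.extend(recursive_split(part, max_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

For instance, with max_size=30 the text "First paragraph here.\n\nSecond one is a bit longer than the first." is first cut at the paragraph break. Only the second paragraph is too long, so only it falls back to splitting on spaces. The result is ["First paragraph here.", "Second one is a bit longer", "than the first."].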
DocumentBasedTextChunker (Roadmap)
A planned chunker focused on document structure (sections, paragraphs, headings), useful for Markdown, HTML, and richly formatted sources.
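Since DocumentBasedTextChunker is still on the roadmap, the following is only one possible structure-aware approach, not its planned design. The sketch starts a new chunk at every Markdown ATX heading, so that each heading stays with the text beneath it. The function name is hypothetical, and this simple line scan would also misread "#" lines inside fenced code blocks as headings.

```python
import re

HEADING = re.compile(r"^#{1,6}\s")  # Markdown ATX heading, e.g. "## Setup"


def split_markdown_sections(text: str) -> list[str]:
    """Start a new section at every Markdown heading so that each chunk
    keeps a heading together with the body text beneath it."""
    sections, current = [], []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current).strip())  # close previous section
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]  # drop empty sections
```

In practice, sections produced this way could then be passed to a size-based chunker when a single section exceeds the target size.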
SemanticTextChunker (Roadmap)
A planned chunker based on semantic similarity between adjacent text units, designed to preserve contextual coherence for embeddings and RAG pipelines.
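The core idea of similarity-based chunking can be sketched without any model dependency. Split the text into sentences, compare each sentence with the next, and start a new chunk where similarity drops below a threshold. A real semantic chunker would compare embedding vectors. Here a bag-of-words count vector stands in for the embedding so the example stays self-contained. The function names and the threshold value are hypothetical and unrelated to the planned SemanticTextChunker.

```python
import math
import re
from collections import Counter


def bow_vector(sentence: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    """Group consecutive sentences; start a new chunk when the similarity
    between adjacent sentences drops below threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(bow_vector(prev), bow_vector(sent)) < threshold:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

For "Cats purr softly. Cats sleep a lot. Stocks fell sharply today. Stocks may rebound soon.", the two cat sentences share a word, while the second cat sentence and the first stock sentence share none. The text therefore splits into two topical chunks. Swapping bow_vector for a real embedding function would keep the same control flow.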
LLMBasedChunker (Roadmap)
A planned LLM-driven chunker that uses model reasoning to identify meaningful split points based on topic, discourse, and context.