
Getting Started

  • Installation
    • Optional dependencies overview
    • Install using pip
    • Install using Poetry
    • Install using uv
    • Install using pipx (for CLI or isolated usage)
    • Verification
    • Next steps
  • Quickstart
    • Basic chunking
    • Overlapping
    • Choose a size estimator
    • Recursive chunking with splitters
    • Semantic sentence splitters (spaCy / Stanza)
    • Examples
  • API Reference
    • chunkipy
      • BaseLanguageDetector
        • BaseLanguageDetector.detect()
      • BaseTextChunker
        • BaseTextChunker.chunk()
      • Chunk
        • Chunk.content
        • Chunk.overlap
        • Chunk.size
        • Chunk.text
        • Chunk.text_parts
      • Chunks
        • Chunks.get_all_text()
        • Chunks.get_all_text_parts()
      • FastTextLanguageDetector
        • FastTextLanguageDetector.detect()
      • FixedSizeTextChunker
        • FixedSizeTextChunker.split_text()
      • LangdetectLanguageDetector
        • LangdetectLanguageDetector.detect()
      • Overlap
      • RecursiveTextChunker
        • RecursiveTextChunker.split_text()
      • TextPart
        • TextPart.size
        • TextPart.text
      • chunkipy.language_detectors
        • BaseLanguageDetector
        • FastTextLanguageDetector
        • LangdetectLanguageDetector
        • chunkipy.language_detectors.base_language_detector
        • chunkipy.language_detectors.fasttext_language_detector
        • chunkipy.language_detectors.langdetect_language_detector
      • chunkipy.size_estimators
        • BaseSizeEstimator
        • CharSizeEstimator
        • OpenAISizeEstimator
        • WordSizeEstimator
        • chunkipy.size_estimators.base_size_estimator
        • chunkipy.size_estimators.char_size_estimator
        • chunkipy.size_estimators.openai_size_estimator
        • chunkipy.size_estimators.word_size_estimator
      • chunkipy.text_chunker
        • BaseOverlapTextChunker
        • BaseTextChunker
        • Chunk
        • Chunks
        • FixedSizeTextChunker
        • Overlap
        • RecursiveTextChunker
        • TextPart
        • chunkipy.text_chunker.base_overlap_text_chunker
        • chunkipy.text_chunker.base_text_chunker
        • chunkipy.text_chunker.data_models
        • chunkipy.text_chunker.fixed_size
        • chunkipy.text_chunker.recursive
      • chunkipy.text_splitters
        • BaseTextSplitter
        • ColonTextSplitter
        • CommaTextSplitter
        • FullStopTextSplitter
        • NewlineTextSplitter
        • SemicolonTextSplitter
        • SeparatorTextSplitter
        • WordTextSplitter
        • chunkipy.text_splitters.base_text_splitter
        • chunkipy.text_splitters.basic_text_splitters
        • chunkipy.text_splitters.semantic
      • chunkipy.utils
        • MissingDependencyError
        • format_instructions()
        • import_dependencies()
  • Contributing
    • Development setup
    • Linting & formatting
    • Testing
    • Documentation
    • Submitting your PR

Text Chunkers

  • Overview
    • Chunkers
      • FixedSizeTextChunker
      • RecursiveTextChunker
      • DocumentBasedTextChunker (Roadmap)
      • SemanticTextChunker (Roadmap)
      • LLMBasedChunker (Roadmap)
  • Custom text chunkers
    • Base class
    • Minimal example
    • Guidelines
    • See also
  • FixedSizeTextChunker
    • Description
    • API / Documentation
    • Example
  • RecursiveTextChunker
    • Description
    • API / Documentation
    • Example
  • Document-based chunking (Roadmap)
    • Goal
    • Planned behavior
    • Current status
  • Semantic chunking (Roadmap)
    • Goal
    • Planned behavior
    • Current status
  • LLM-based chunking (Roadmap)
    • Goal
    • Planned behavior
    • Current status

Text Splitters

  • Overview
    • Built-in basic splitters
    • Semantic sentence splitters
    • When to use what
  • Custom text splitters
    • Base class
    • Minimal example
    • Guidelines
    • See also
  • spaCy Sentence Text Splitter
    • Description
    • API / Documentation
  • StanzaSentenceTextSplitter
    • Description
    • API / Documentation
    • Example

Language Detection

  • Overview
    • Available detectors
    • Custom detectors
  • Custom language detectors
    • Base class
    • Minimal example
    • Guidelines
    • See also
  • Langdetect Language Detector
    • Description
    • API / Documentation
    • Common use cases
  • FastText Language Detector
    • Description
    • API / Documentation
    • Example
    • Common use cases
    • Notes

Size Estimators

  • Overview
    • Built-in estimators
    • Quick comparison
    • Choosing an estimator
  • Custom size estimators
    • Base class
    • Minimal example
    • Guidelines
    • See also
  • CharSizeEstimator
    • Description
    • API / Documentation
    • Example
  • WordSizeEstimator
    • Description
    • API / Documentation
    • Example
  • OpenAI Size Estimator
    • Description
    • API / Documentation
    • Example
Chunkipy


© Copyright 2023, Gioele Crispo.
