Overview
Chunkipy provides optional language detectors for multilingual pipelines.
These detectors are especially useful when sentence splitters or downstream processing need to choose models based on the detected language.
All detectors follow the same base contract, so you can swap implementations without changing splitter or chunker APIs.
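The idea of a shared contract can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the `LanguageDetector` class, its `detect` method, the two toy detectors, and the model names are hypothetical stand-ins, not Chunkipy's actual classes or signatures.

```python
from abc import ABC, abstractmethod


class LanguageDetector(ABC):
    """Hypothetical stand-in for a shared detector contract (not Chunkipy's class)."""

    @abstractmethod
    def detect(self, text: str) -> str:
        """Return a language code such as "en" or "it"."""


class FixedDetector(LanguageDetector):
    """Toy detector that always reports the same language."""

    def __init__(self, code: str) -> None:
        self.code = code

    def detect(self, text: str) -> str:
        return self.code


class StopwordDetector(LanguageDetector):
    """Toy detector that counts a few hard-coded stopwords per language."""

    STOPWORDS = {
        "en": {"the", "and", "is", "of"},
        "it": {"il", "che", "di", "per"},
    }

    def detect(self, text: str) -> str:
        words = set(text.lower().split())
        scores = {lang: len(words & sw) for lang, sw in self.STOPWORDS.items()}
        best = max(scores, key=scores.get)
        # "und" (undetermined) when no stopword matched at all.
        return best if scores[best] > 0 else "und"


def pick_model(text: str, detector: LanguageDetector) -> str:
    """Choose a (hypothetical) per-language model name from the detected language."""
    models = {"en": "english-splitter", "it": "italian-splitter"}
    return models.get(detector.detect(text), "fallback-splitter")


# The caller depends only on the contract, so detectors are interchangeable.
print(pick_model("the cat is on the mat", StopwordDetector()))
print(pick_model("the cat is on the mat", FixedDetector("it")))
```

Because `pick_model` only calls `detect`, replacing one detector with another requires no change to the calling code, which is the property the shared contract is meant to provide.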
Available detectors
LangdetectLanguageDetector: lightweight detector powered by the langdetect package.
FastTextLanguageDetector: detector backed by a FastText language ID model loaded from disk.
Both implement the same base interface and can be passed into semantic sentence splitters or used independently in your pipeline.
Custom detectors
When the built-in detectors are not enough, you can create your own by extending BaseLanguageDetector and passing an instance of your subclass to language-aware splitters.
See Custom language detectors for a minimal custom detector template.
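As a rough illustration of what a subclass might look like, here is a sketch that guesses a language from the dominant Unicode script. It makes several assumptions: the `BaseLanguageDetector` defined below is a local stand-in so the example runs on its own, the `detect` method name and signature are guesses rather than Chunkipy's documented contract, and the script-to-language mapping is deliberately crude. The linked template remains the authoritative reference.

```python
import unicodedata
from abc import ABC, abstractmethod
from collections import Counter


class BaseLanguageDetector(ABC):
    """Local stand-in so this sketch is self-contained.

    In real code you would import the base class from Chunkipy instead;
    the method name and signature here are assumptions.
    """

    @abstractmethod
    def detect(self, text: str) -> str:
        """Return a language code for the given text."""


class ScriptLanguageDetector(BaseLanguageDetector):
    """Hypothetical detector that maps the dominant Unicode script to one language."""

    # Crude mapping for illustration: a script does not identify a single language.
    SCRIPT_TO_LANG = {"LATIN": "en", "CYRILLIC": "ru", "GREEK": "el"}

    def __init__(self, default: str = "en") -> None:
        self.default = default

    def detect(self, text: str) -> str:
        counts = Counter()
        for ch in text:
            if ch.isalpha():
                # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER PE".
                script = unicodedata.name(ch, "").split(" ")[0]
                counts[script] += 1
        if not counts:
            return self.default
        script, _ = counts.most_common(1)[0]
        return self.SCRIPT_TO_LANG.get(script, self.default)


detector = ScriptLanguageDetector()
print(detector.detect("Привет мир"))
print(detector.detect("Hello world"))
```

An instance such as `detector` is what you would hand to a language-aware splitter; the name of the splitter parameter that accepts it is not shown here because it depends on the splitter's actual API.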