Overview

Chunkipy provides optional language detectors for multilingual pipelines.

These detectors are especially useful when sentence splitters or downstream processing need to choose models based on the detected language.

All detectors follow the same base contract, so you can swap implementations without changing splitter or chunker APIs.
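As a rough illustration of why a shared contract makes detectors interchangeable, the sketch below uses a stand-in abstract class rather than Chunkipy's own base class. The names DetectorContract, detect, pick_model, and the model identifiers are all assumptions made for this example, not part of Chunkipy's documented API.

```python
from abc import ABC, abstractmethod


class DetectorContract(ABC):
    """Hypothetical stand-in for a shared detector contract (not Chunkipy's class)."""

    @abstractmethod
    def detect(self, text: str) -> str:
        """Return a language code such as 'en' (assumed method name and return type)."""


class AlwaysEnglish(DetectorContract):
    """Toy detector that reports English for every input."""

    def detect(self, text: str) -> str:
        return "en"


class AsciiHeuristic(DetectorContract):
    """Toy detector: ASCII-only text is treated as English, anything else as unknown."""

    def detect(self, text: str) -> str:
        return "en" if text.isascii() else "unknown"


# Hypothetical mapping from language code to a language-specific model name.
MODEL_BY_LANGUAGE = {"en": "english-sentence-model"}


def pick_model(text: str, detector: DetectorContract,
               default: str = "multilingual-sentence-model") -> str:
    """Choose a model name from the detected language, falling back to a default."""
    return MODEL_BY_LANGUAGE.get(detector.detect(text), default)
```

Because pick_model depends only on the contract, either toy detector can be passed in without changing the calling code, which is the property the paragraph above describes.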

Available detectors

  • LangdetectLanguageDetector: lightweight detector powered by the langdetect package.

  • FastTextLanguageDetector: detector backed by a FastText language ID model loaded from disk.

Either detector can be passed into a semantic sentence splitter or used on its own elsewhere in your pipeline.

Custom detectors

When the built-in detectors are not enough, you can write your own by extending BaseLanguageDetector and passing an instance of it to any language-aware splitter.
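To give a feel for what such a detector might look like, here is a minimal sketch based on stopword overlap. It subclasses a local stand-in base class so that it runs without Chunkipy installed. The stand-in class, the detect method name and signature, and the stopword lists are assumptions for illustration only; the real BaseLanguageDetector may require a different method.

```python
from abc import ABC, abstractmethod


class StandInBaseDetector(ABC):
    """Hypothetical stand-in for BaseLanguageDetector; the real interface may differ."""

    @abstractmethod
    def detect(self, text: str) -> str:
        """Return a language code for the given text (assumed contract)."""


class StopwordLanguageDetector(StandInBaseDetector):
    """Guess the language by counting distinct stopwords found in the text."""

    # Tiny illustrative stopword sets; a real detector would need far larger lists.
    STOPWORDS = {
        "en": {"the", "and", "is", "of", "to"},
        "it": {"il", "la", "e", "di", "che"},
        "es": {"el", "los", "y", "de", "que"},
    }

    def __init__(self, fallback: str = "en") -> None:
        self.fallback = fallback

    def detect(self, text: str) -> str:
        tokens = set(text.lower().split())
        # Score each language by how many of its stopwords appear in the text.
        scores = {lang: len(tokens & words) for lang, words in self.STOPWORDS.items()}
        best = max(scores, key=scores.get)
        # With no stopword hits at all, return the configured fallback language.
        return best if scores[best] > 0 else self.fallback
```

A heuristic like this is only a placeholder. The point is that domain-specific logic lives behind a single method, so the splitter that receives the detector does not need to know how the language was determined.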

See Custom language detectors for a minimal custom detector template.