Langdetect Language Detector
Description
LangdetectLanguageDetector detects a language code from raw text using the
langdetect package. It is a lightweight default detector for semantic
splitters and can be replaced with custom detector objects implementing the same
interface.
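As a rough illustration of a replacement detector, the sketch below assumes that "the same interface" means a single detect(text) method returning a language code, as in the example on this page. The class name StopwordLanguageDetector and its word lists are hypothetical and not part of chunkipy. A real replacement may also need to subclass BaseLanguageDetector, which this page does not confirm.

```python
import re


class StopwordLanguageDetector:
    """Hypothetical stand-in detector: guesses a language from stopword overlap.

    Assumption: the detector interface is a single detect(text) -> str method.
    """

    # Tiny illustrative word lists; far too small for real-world use.
    STOPWORDS = {
        "en": {"the", "and", "is", "in", "of", "to", "this"},
        "it": {"il", "la", "è", "in", "di", "che", "questo", "un"},
    }

    def __init__(self, default="en"):
        # Language code returned when no stopword matches at all.
        self.default = default

    def detect(self, text):
        # Lowercase the text and collect its distinct word tokens.
        tokens = set(re.findall(r"\w+", text.lower()))
        # Score each language by how many of its stopwords appear.
        scores = {
            lang: len(tokens & words) for lang, words in self.STOPWORDS.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else self.default


if __name__ == "__main__":
    detector = StopwordLanguageDetector()
    print(detector.detect("Questo testo è scritto in italiano."))
```

Because the sketch relies only on the detect method, it could be swapped in wherever a detector object is expected, provided the consuming code asks for nothing beyond that method.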
Note
Install the optional dependency first:
pip install "chunkipy[langdetect]"
API / Documentation
- class chunkipy.language_detectors.LangdetectLanguageDetector
  Bases: BaseLanguageDetector
  Detect language codes using the optional langdetect dependency.
Example:
This example is included in examples/language_detectors/langdetect_detector.py.
from chunkipy.language_detectors import LangdetectLanguageDetector
from chunkipy.utils import MissingDependencyError


if __name__ == "__main__":
    text = "Questo testo è scritto in italiano."

    try:
        detector = LangdetectLanguageDetector()
        print(f"Detected language: {detector.detect(text)}")
    except MissingDependencyError as error:
        print(error)
Common use cases
- Detect the language of a text before routing it to a language-aware splitter.
- Reuse the same detector instance across multiple semantic processing steps.
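The two use cases above can be sketched together: one detector instance is created once and passed to a routing helper that picks a splitter per detected language. Everything here except the detect(text) call is an assumption made for illustration. FixedDetector is a stub standing in for LangdetectLanguageDetector so the snippet runs without the optional dependency, and the splitters are plain hypothetical callables, not chunkipy splitter classes.

```python
def route_by_language(text, detector, splitters, default_lang="en"):
    """Detect the language, then apply the matching splitter callable.

    Falls back to splitters[default_lang] when the detected code has no entry.
    """
    lang = detector.detect(text)
    splitter = splitters.get(lang, splitters[default_lang])
    return lang, splitter(text)


class FixedDetector:
    """Hypothetical stub: always reports the language code it was given."""

    def __init__(self, lang):
        self.lang = lang

    def detect(self, text):
        return self.lang


# Hypothetical per-language splitters, kept trivial on purpose.
SPLITTERS = {
    "en": lambda text: text.split(". "),
    "it": lambda text: text.split(", "),
}


if __name__ == "__main__":
    # Create the detector once and reuse it for every document.
    detector = FixedDetector("it")
    for document in ["Ciao, mondo", "Buongiorno, a tutti"]:
        print(route_by_language(document, detector, SPLITTERS))
```

Passing the detector in as an argument, rather than constructing it inside the helper, is what allows a single instance to be shared across steps.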