Langdetect Language Detector

Description

LangdetectLanguageDetector detects a language code from raw text using the langdetect package. It is a lightweight default detector for semantic splitters. You can replace it with any custom detector that implements the same BaseLanguageDetector interface.
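A custom detector needs at least the same detect(text) method, returning a language code string. The sketch below uses a hypothetical FixedLanguageDetector that always reports one configured code, which can help when the corpus language is known in advance. It is duck-typed so that it runs without chunkipy installed. In real code you would probably subclass BaseLanguageDetector, but its import path and any other required members are assumptions that this page does not confirm.

```python
class FixedLanguageDetector:
    """Hypothetical detector that always reports one configured language code.

    Duck-typed stand-in for a BaseLanguageDetector subclass (assumption): it
    exposes the same detect(text) -> str method as LangdetectLanguageDetector.
    """

    def __init__(self, language_code: str = "en") -> None:
        # Code returned for every input, e.g. "it" for an all-Italian corpus.
        self.language_code = language_code

    def detect(self, text: str) -> str:
        # The text is ignored: detection is skipped entirely because the
        # language is known ahead of time.
        return self.language_code


if __name__ == "__main__":
    detector = FixedLanguageDetector("it")
    print(f"Detected language: {detector.detect('Questo testo è scritto in italiano.')}")
```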

Note

Install the optional dependency first:

pip install "chunkipy[langdetect]"
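To check from Python whether the optional dependency is importable before you construct the detector, a standard-library check is enough. This is a generic sketch: langdetect_available is a hypothetical helper, not part of chunkipy's API.

```python
import importlib.util


def langdetect_available() -> bool:
    """Return True if the optional langdetect package can be imported."""
    # find_spec returns None when no importable top-level module named
    # "langdetect" exists; it locates the module without importing it.
    return importlib.util.find_spec("langdetect") is not None


if __name__ == "__main__":
    if langdetect_available():
        print("langdetect is installed")
    else:
        print('Missing dependency: run pip install "chunkipy[langdetect]"')
```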

API / Documentation

class chunkipy.language_detectors.LangdetectLanguageDetector

Bases: BaseLanguageDetector

Detect language codes using the optional langdetect dependency.

detect(text)

Return the language code that langdetect detects for text. This is an ISO 639-1-style code such as "it" or "en".

Return type:

str

Parameters:

text (str)

Example:

This example is included in examples/language_detectors/langdetect_detector.py.

from chunkipy.language_detectors import LangdetectLanguageDetector
from chunkipy.utils import MissingDependencyError


if __name__ == "__main__":
    text = "Questo testo è scritto in italiano."

    try:
        detector = LangdetectLanguageDetector()
        print(f"Detected language: {detector.detect(text)}")
    except MissingDependencyError as error:
        print(error)

Common use cases

  • Detect the language of a text before routing it to a language-aware splitter.

  • Reuse the same detector instance across multiple semantic processing steps.