FastText Language Detector
Description
FastTextLanguageDetector detects language codes using a local FastText
language identification model. It is useful when you need reproducible
predictions from a specific model file and full control over the model lifecycle in
production environments.
Note
Install the optional dependency first:
pip install "chunkipy[fasttext]"
API / Documentation
- class chunkipy.language_detectors.FastTextLanguageDetector(model_path, label_prefix='__label__')[source]
Bases: BaseLanguageDetector
Detect language codes using a FastText language identification model.
The detector expects a path to a model compatible with the FastText Python bindings, such as Facebook’s
lid.176.bin.
Example
This example is included in examples/language_detectors/fasttext_detector.py.
import os

from chunkipy.language_detectors import FastTextLanguageDetector
from chunkipy.utils import MissingDependencyError


if __name__ == "__main__":
    # Path to a local FastText language ID model, for example lid.176.bin.
    model_path = os.getenv("FASTTEXT_MODEL_PATH")

    if not model_path:
        print("Set FASTTEXT_MODEL_PATH to run this example with a local FastText model.")
        raise SystemExit(0)

    try:
        detector = FastTextLanguageDetector(model_path=model_path)
        print(f"Detected language: {detector.detect('This text is written in English.')}")
    except MissingDependencyError as error:
        print(error)
Common use cases
Load a local FastText language ID model once and reuse it across requests.
Use a model-based detector when you want reproducible predictions from a specific model file.
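The "load once and reuse" pattern can be sketched with a cached factory. This is a minimal illustration, not part of chunkipy: get_detector and StubDetector are hypothetical names, and StubDetector only stands in for FastTextLanguageDetector so the sketch runs without a model file.

```python
from functools import lru_cache


class StubDetector:
    """Hypothetical stand-in for FastTextLanguageDetector (no model file needed)."""

    instances = 0  # counts how many times a "model" was loaded

    def __init__(self, model_path: str) -> None:
        StubDetector.instances += 1
        self.model_path = model_path

    def detect(self, text: str) -> str:
        # A real detector would run the FastText model here.
        return "en"


@lru_cache(maxsize=None)
def get_detector(model_path: str) -> StubDetector:
    """Build the detector on first use and return the cached instance afterwards."""
    return StubDetector(model_path)


# Simulate several requests that share one detector instance.
for request_text in ["first request", "second request", "third request"]:
    get_detector("/models/lid.176.bin").detect(request_text)

print(StubDetector.instances)  # 1
```

In a real service, the body of get_detector would presumably return FastTextLanguageDetector(model_path=model_path) instead of the stub, so the model file is read from disk once per process rather than once per request.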
Notes
FastText detectors require a model file available on disk.
The returned language code is normalized by stripping the standard
__label__ prefix.
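FastText models emit raw labels such as __label__en, so the normalization amounts to removing the prefix. The sketch below illustrates the idea only; strip_label_prefix is a hypothetical helper, and the library's internal implementation may differ. Its label_prefix parameter mirrors the constructor argument of the same name.

```python
def strip_label_prefix(label: str, label_prefix: str = "__label__") -> str:
    """Return the language code with the label prefix removed, if present.

    Hypothetical helper for illustration; not taken from chunkipy.
    """
    if label.startswith(label_prefix):
        return label[len(label_prefix):]
    # Labels without the prefix are returned unchanged.
    return label


print(strip_label_prefix("__label__en"))  # en
print(strip_label_prefix("fr"))           # fr
```

A model trained with a different label convention would need a matching label_prefix value, which is presumably why the constructor exposes it.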