FastText Language Detector

Description

FastTextLanguageDetector detects language codes using a local FastText language identification model. It is useful when you need reproducible predictions from a specific model file and full control over model lifecycle in production environments.

Note

Install the optional dependency first:

pip install "chunkipy[fasttext]"

API / Documentation

class chunkipy.language_detectors.FastTextLanguageDetector(model_path, label_prefix='__label__')[source]

Bases: BaseLanguageDetector

Detect language codes using a FastText language identification model.

The detector expects a path to a model compatible with the FastText Python bindings, such as Facebook’s lid.176.bin.

Parameters:
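  • model_path (str) – Path to a FastText language identification model file on disk, such as lid.176.bin.

  • label_prefix (str) – Prefix stripped from the predicted label before it is returned. Defaults to '__label__'.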

detect(text)[source]

Return the top predicted FastText language code for text.

Return type:

str

Parameters:

text (str)
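
The FastText Python bindings reject prediction input that contains a newline character, because they process one line at a time. This page does not say whether detect flattens its input, so the sketch below assumes it does not and normalizes the text first. The helper name to_single_line is hypothetical and not part of chunkipy.

```python
def to_single_line(text: str) -> str:
    """Collapse every run of whitespace, including newlines, into one space."""
    # str.split() with no arguments splits on any whitespace and drops empties.
    return " ".join(text.split())


if __name__ == "__main__":
    paragraph = "This text spans\nseveral lines.\r\n  It is written in English."
    print(to_single_line(paragraph))
```

With a detector instance at hand, the call would look like detector.detect(to_single_line(paragraph)).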

Example

This example is included in examples/language_detectors/fasttext_detector.py.

import os

from chunkipy.language_detectors import FastTextLanguageDetector
from chunkipy.utils import MissingDependencyError


if __name__ == "__main__":
    # Path to a local FastText language ID model, e.g. lid.176.bin.
    model_path = os.getenv("FASTTEXT_MODEL_PATH")

    if not model_path:
        print("Set FASTTEXT_MODEL_PATH to run this example with a local FastText model.")
        raise SystemExit(0)

    try:
        # Load the model once, then run detection on a sample sentence.
        detector = FastTextLanguageDetector(model_path=model_path)
        print(f"Detected language: {detector.detect('This text is written in English.')}")
    except MissingDependencyError as error:
        # Raised when the optional fasttext dependency is not installed.
        print(error)

Common use cases

  • Load a local FastText language ID model once and reuse it across requests.

  • Use a model-based detector when you want reproducible predictions from a specific model file.
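
The first use case, loading the model once and reusing it, can be sketched with a small cache keyed by model path. This is one possible approach rather than something chunkipy provides: get_detector, the Detector protocol, and the _DETECTORS cache are hypothetical names introduced for illustration. The factory argument keeps the sketch independent of any installed model.

```python
from typing import Callable, Dict, Protocol


class Detector(Protocol):
    """Anything that exposes detect(text) -> str."""

    def detect(self, text: str) -> str: ...


# Hypothetical process-wide cache: one detector instance per model path.
_DETECTORS: Dict[str, Detector] = {}


def get_detector(model_path: str, factory: Callable[[str], Detector]) -> Detector:
    """Build the detector for model_path on first use, then reuse it."""
    if model_path not in _DETECTORS:
        # The expensive model load happens only here.
        _DETECTORS[model_path] = factory(model_path)
    return _DETECTORS[model_path]
```

In an application, the factory could be lambda path: FastTextLanguageDetector(model_path=path), so each request handler calls get_detector instead of constructing a new detector. The cache is not guarded by a lock, so a multi-threaded server would need to add one or build the detector at startup.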

Notes

  • FastText detectors require a model file available on disk.

  • The returned language code is normalized by stripping the standard __label__ prefix, so a raw label such as __label__en is returned as en.
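
The prefix normalization described in the last note can be illustrated with a few lines of plain Python. This is a minimal sketch of the idea, not the library's actual implementation. The function name strip_label_prefix is hypothetical, and its label_prefix argument mirrors the constructor parameter of the same name.

```python
def strip_label_prefix(label: str, label_prefix: str = "__label__") -> str:
    """Return label without its leading prefix; leave other labels unchanged."""
    if label.startswith(label_prefix):
        return label[len(label_prefix):]
    return label


if __name__ == "__main__":
    # FastText language ID models emit labels such as "__label__en".
    print(strip_label_prefix("__label__en"))
```

Checking startswith before slicing means a label that lacks the prefix passes through untouched instead of losing its first characters.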