FastText Language Detector

Description

FastTextLanguageDetector detects language codes using a local FastText language identification model. Use it when you need reproducible predictions from a specific model file and full control over the model's lifecycle in production environments.

Note

Install the optional dependency first:

pip install "chunkipy[fasttext]"

API / Documentation

class chunkipy.language_detectors.FastTextLanguageDetector(model_path, label_prefix='__label__')[source]

Bases: BaseLanguageDetector

Detect language codes using a FastText language identification model.

The detector expects a path to a model compatible with the FastText Python bindings, such as Facebook’s lid.176.bin.

Parameters:

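model_path (str): path to a FastText model file on disk, such as lid.176.bin.
label_prefix (str): prefix stripped from the predicted label to obtain the language code. Defaults to '__label__'.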

detect(text)[source]

Return the top predicted FastText language code for text.

Return type: str
Parameters: text (str)
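
As a rough illustration of the normalization that detect applies to the model's output, the sketch below strips a label prefix from a raw FastText label. Language identification models such as lid.176.bin emit labels like __label__en. The helper name normalize_label is hypothetical and is not part of chunkipy; the library's actual implementation may differ.

```python
def normalize_label(label: str, label_prefix: str = "__label__") -> str:
    """Strip the label prefix from a raw FastText label (hypothetical helper)."""
    # Only strip when the prefix is actually present, so other labels pass through unchanged.
    if label_prefix and label.startswith(label_prefix):
        return label[len(label_prefix):]
    return label


print(normalize_label("__label__en"))       # en
print(normalize_label("lang:fr", "lang:"))  # fr
```

A model trained with a different label prefix would be handled by passing that prefix as label_prefix when constructing the detector.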

Example

This example is included in examples/language_detectors/fasttext_detector.py.

import os

from chunkipy.language_detectors import FastTextLanguageDetector
from chunkipy.utils import MissingDependencyError


if __name__ == "__main__":
    # Path to a local FastText model file, e.g. lid.176.bin.
    model_path = os.getenv("FASTTEXT_MODEL_PATH")

    if not model_path:
        print("Set FASTTEXT_MODEL_PATH to run this example with a local FastText model.")
        raise SystemExit(0)

    try:
        # Build the detector, then detect the language of a sample sentence.
        detector = FastTextLanguageDetector(model_path=model_path)
        print(f"Detected language: {detector.detect('This text is written in English.')}")
    except MissingDependencyError as error:
        # Raised when the optional FastText dependency is not installed.
        print(error)

Common use cases

Load a local FastText language ID model once and reuse it across requests.
Use a model-based detector when you want reproducible predictions from a specific model file.
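
One way to get the "load once, reuse across requests" behavior is to cache the detector per model path. The following is a minimal sketch, not a chunkipy API: make_detector_loader is a hypothetical helper, and the wiring shown in the trailing comment (including the path /models/lid.176.bin) is an assumption about how an application might use it.

```python
from functools import lru_cache
from typing import Any, Callable


def make_detector_loader(factory: Callable[[str], Any]) -> Callable[[str], Any]:
    """Return a loader that builds one detector per model path, then reuses it."""

    @lru_cache(maxsize=None)
    def load(model_path: str) -> Any:
        # The factory runs only on the first call for a given path.
        return factory(model_path)

    return load


# Assumed wiring in an application (requires chunkipy[fasttext] and a model file):
#
#   from chunkipy.language_detectors import FastTextLanguageDetector
#   get_detector = make_detector_loader(
#       lambda path: FastTextLanguageDetector(model_path=path)
#   )
#   get_detector("/models/lid.176.bin").detect("Bonjour tout le monde")
```

Passing the constructor in as a factory keeps the caching logic independent of the optional dependency, so it can be exercised in tests with a stand-in object.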

Notes

FastText detectors require a model file available on disk.
The returned language code is normalized by stripping the standard __label__ prefix.
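
Because the model file must exist on disk, it can help to validate the path before constructing the detector. The sketch below is one possible check under that assumption. resolve_model_path is a hypothetical helper, not part of chunkipy, and it reuses the FASTTEXT_MODEL_PATH variable from the example above.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_path(env_var: str = "FASTTEXT_MODEL_PATH") -> Optional[Path]:
    """Return the model path from the environment, or None if unset or not a file."""
    raw = os.getenv(env_var)
    if not raw:
        return None
    path = Path(raw).expanduser()
    # Treat anything that is not an existing regular file as unavailable.
    return path if path.is_file() else None
```

A caller can then skip language detection or report a clear configuration error when the function returns None, instead of failing later inside the model loader.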