chunkipy.language_detectors

Public language detector classes exposed by chunkipy.language_detectors.

class chunkipy.language_detectors.BaseLanguageDetector

Bases: ABC

Base class for strategies that detect the language of a text.

abstract detect(text)

Detect the language code for the given text.

Return type:

str

Parameters:

text (str)
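
A custom detector only needs to subclass the base class and implement detect. The sketch below is a minimal illustration, not part of chunkipy: the BaseLanguageDetector defined here is a local stand-in that mirrors the documented interface so the example runs without the library installed, and KeywordLanguageDetector with its stopword table is hypothetical.

```python
from abc import ABC, abstractmethod


class BaseLanguageDetector(ABC):
    """Local stand-in mirroring the documented interface (not the chunkipy class)."""

    @abstractmethod
    def detect(self, text: str) -> str:
        """Return a language code for text."""


class KeywordLanguageDetector(BaseLanguageDetector):
    """Hypothetical detector that guesses a language from a few stopwords."""

    # Tiny illustrative stopword table; a real detector would need far more data.
    _STOPWORDS = {
        "en": {"the", "and", "is"},
        "it": {"il", "e", "che"},
    }

    def __init__(self, default: str = "en") -> None:
        self.default = default

    def detect(self, text: str) -> str:
        words = set(text.lower().split())
        best_code, best_hits = self.default, 0
        # Pick the language whose stopwords overlap most with the text.
        for code, stopwords in self._STOPWORDS.items():
            hits = len(words & stopwords)
            if hits > best_hits:
                best_code, best_hits = code, hits
        return best_code
```

With chunkipy installed, the stand-in class would presumably be replaced by an import of chunkipy.language_detectors.BaseLanguageDetector. Because detect is abstract, instantiating the base class directly raises TypeError.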

class chunkipy.language_detectors.FastTextLanguageDetector(model_path, label_prefix='__label__')

Bases: BaseLanguageDetector

Detect language codes using a FastText language identification model.

The detector expects a path to a model compatible with the FastText Python bindings, such as Facebook’s lid.176.bin.

Parameters:
  • model_path (str)

  • label_prefix (str)

detect(text)

Return the top predicted FastText language code for text.

Return type:

str

Parameters:

text (str)
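
FastText language identification models label each prediction with a prefix, for example __label__en, which is why the constructor takes a label_prefix. The sketch below shows one plausible way to turn a top prediction into a bare language code. It is an assumption about the general technique, not chunkipy's actual implementation: both helper functions are hypothetical, and sample_prediction is an illustrative value shaped like the (labels, probabilities) pair that the FastText Python bindings return for a single text.

```python
def label_to_language_code(label: str, label_prefix: str = "__label__") -> str:
    """Strip the label prefix, if present, leaving only the language code."""
    if label.startswith(label_prefix):
        return label[len(label_prefix):]
    return label


def top_language_code(prediction, label_prefix: str = "__label__") -> str:
    """Return the code of the first (highest-ranked) label in a prediction."""
    labels, _probabilities = prediction
    return label_to_language_code(labels[0], label_prefix)


# Illustrative value only: mimics the shape of a top-1 prediction.
sample_prediction = (("__label__en",), [0.98])
print(top_language_code(sample_prediction))
```

A model trained with a different label convention would pass its own prefix, which is the case the label_prefix parameter appears to cover.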

class chunkipy.language_detectors.LangdetectLanguageDetector

Bases: BaseLanguageDetector

Detect language codes using the optional langdetect dependency.

detect(text)

Return the language code detected by langdetect. Codes are mostly ISO 639-1 (for example en or fr), with a few region-tagged variants such as zh-cn.

Return type:

str

Parameters:

text (str)
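
langdetect raises an exception when the input has no detectable features, such as an empty string or text made only of digits. Whether this detector propagates that error is not stated here, so the sketch below wraps any detect callable defensively. The detect_with_fallback helper and its default argument are hypothetical and not part of chunkipy.

```python
from typing import Callable


def detect_with_fallback(
    detect: Callable[[str], str], text: str, default: str = "en"
) -> str:
    """Call detect(text); return default for blank input or on failure."""
    if not text.strip():
        # Nothing to analyse, so skip the detector entirely.
        return default
    try:
        return detect(text)
    except Exception:
        # Deliberately broad for this sketch; langdetect's own error type
        # is LangDetectException, which could be caught specifically.
        return default
```

With chunkipy installed, the callable would presumably be LangdetectLanguageDetector().detect. langdetect is also non-deterministic by default; its documentation suggests setting DetectorFactory.seed = 0 before the first detection when reproducible results matter.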

Modules

base_language_detector

fasttext_language_detector

langdetect_language_detector