chunkipy.text_splitters.semantic.sentences.stanza_sentence_text_splitter
Classes
- StanzaSentenceTextSplitter: Sentence splitter using Stanza for semantic text splitting.
- class chunkipy.text_splitters.semantic.sentences.stanza_sentence_text_splitter.StanzaSentenceTextSplitter(text_limit=None, language_detector=None)
Bases: BaseSemanticTextSplitter
Sentence splitter using Stanza for semantic text splitting. This class uses Stanza to split text into sentences based on the language detected in the text. It supports multiple languages by loading different Stanza models according to the detected language.
- Parameters:
text_limit (int)
language_detector (BaseLanguageDetector | None)
- text_limit
The maximum length of text to process at once. If None, DEFAULT_LIMIT from the base class is applied.
- Type:
int
- langdetect_stanza_mapping = {'af': 'af', 'ar': 'ar', 'bg': 'bg', 'bn': None, 'ca': 'ca', 'cs': 'cs', 'cy': None, 'da': 'da', 'de': 'de', 'el': 'el', 'en': 'en', 'es': 'es', 'et': 'et', 'fa': 'fa', 'fi': 'fi', 'fr': 'fr', 'gu': None, 'he': 'he', 'hi': 'hi', 'hr': 'hr', 'hu': 'hu', 'id': 'id', 'it': 'it', 'ja': 'ja', 'kn': None, 'ko': 'ko', 'lt': 'lt', 'lv': 'lv', 'mk': None, 'ml': None, 'mr': 'mr', 'ne': None, 'nl': 'nl', 'no': 'no', 'pa': None, 'pl': 'pl', 'pt': 'pt', 'ro': 'ro', 'ru': 'ru', 'sk': 'sk', 'sl': 'sl', 'so': None, 'sq': None, 'sv': 'sv', 'sw': None, 'ta': 'ta', 'te': 'te', 'th': None, 'tl': None, 'tr': 'tr', 'uk': 'uk', 'ur': 'ur', 'vi': 'vi', 'zh-cn': 'zh-hans', 'zh-tw': 'zh-hant'}
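As a sketch of how this mapping might be consulted, the helper below (hypothetical, not part of the chunkipy API) resolves a langdetect language code to a Stanza model code, falling back to a default when langdetect reports a language that has no mapped Stanza model (entries whose value is None):

```python
# Hypothetical helper illustrating use of langdetect_stanza_mapping;
# resolve_stanza_model is NOT part of the chunkipy API.

# Subset of StanzaSentenceTextSplitter.langdetect_stanza_mapping (see above).
LANGDETECT_STANZA_MAPPING = {
    "en": "en",
    "de": "de",
    "bn": None,          # detected by langdetect, but no Stanza model mapped
    "zh-cn": "zh-hans",
    "zh-tw": "zh-hant",
}

def resolve_stanza_model(langdetect_code: str, default: str = "en") -> str:
    """Return the Stanza model code for a langdetect code.

    Falls back to `default` when the language is unknown to the mapping
    or has no corresponding Stanza model (a value of None).
    """
    model = LANGDETECT_STANZA_MAPPING.get(langdetect_code.lower())
    return model if model is not None else default

print(resolve_stanza_model("zh-cn"))  # zh-hans
print(resolve_stanza_model("bn"))     # en (no Stanza model for Bengali)
```

Note that the two Chinese variants map to different Stanza model codes (`zh-hans` for simplified, `zh-hant` for traditional), so the lookup cannot simply pass the langdetect code through unchanged.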