WordSizeEstimator
Description
WordSizeEstimator measures text size as a word count and segments text at word
boundaries. It is the default estimator for chunkers and a practical choice for
most NLP pipelines, where words are more meaningful units than raw characters.
It requires no optional dependencies.
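The description above names two operations: measuring size as a word count and splitting text into word segments. The sketch below illustrates that idea with a hypothetical class, SketchWordSizeEstimator, which is not chunkipy's implementation. It assumes that a "word" is any maximal run of non-whitespace characters, so punctuation stays attached to the neighbouring word; chunkipy's actual tokenization rule is not stated here and may differ.

```python
import re
from typing import Iterator

# Hypothetical stand-in for illustration only; not chunkipy's implementation.
# Assumption: a "word" is any maximal run of non-whitespace characters,
# so punctuation stays attached (e.g. "sentence." is one word).
_WORD = re.compile(r"\S+")


class SketchWordSizeEstimator:
    """Counts and segments text by whitespace-delimited words (assumed rule)."""

    def estimate_size(self, text: str) -> int:
        # Size is the number of word tokens found in the text.
        return len(_WORD.findall(text))

    def segment(self, text: str) -> Iterator[str]:
        # Yield each word token in order of appearance.
        for match in _WORD.finditer(text):
            yield match.group()


if __name__ == "__main__":
    sketch = SketchWordSizeEstimator()
    sample = "Chunkipy estimates by words in this simple sentence."
    print(sketch.estimate_size(sample))  # 8
    print(list(sketch.segment(sample)))
```

Under this assumed rule, runs of spaces, tabs, and newlines all count as a single boundary, so irregular spacing does not inflate the size.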
API / Documentation
- class chunkipy.size_estimators.WordSizeEstimator
Bases: BaseSizeEstimator
Size estimator that counts the number of words in the text.
- estimate_size(text)
Estimate the size of the given text by counting the number of words.
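Because chunkers use the estimator to decide when a chunk is full, it can help to see how a size function like estimate_size might drive chunking. The following is a minimal sketch under stated assumptions, not chunkipy's chunking algorithm: the function names count_words and pack_chunks are hypothetical, segments are assumed to be whitespace-delimited words, and the budget max_size is assumed to be a word count. Words are packed greedily until adding the next one would exceed the budget.

```python
from typing import Callable, List

# Hypothetical helpers for illustration; chunkipy's chunkers may work differently.
# Assumption: segments are whitespace-delimited words and max_size is a word count.


def count_words(text: str) -> int:
    # Stand-in for an estimate_size(text) call: number of whitespace-delimited words.
    return len(text.split())


def pack_chunks(text: str, max_size: int,
                estimate_size: Callable[[str], int] = count_words) -> List[str]:
    """Greedily pack words into chunks whose estimated size stays <= max_size."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and estimate_size(candidate) > max_size:
            # Adding this word would exceed the budget: close the current chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        # Flush whatever is left as the final chunk.
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "Chunkipy estimates by words in this simple sentence."
    for chunk in pack_chunks(sample, max_size=3):
        print(chunk)
```

With max_size=3 the sample splits into "Chunkipy estimates by", "words in this", and "simple sentence.". Since the size function is passed in as a parameter, any callable that maps a string to an integer size could be swapped in; whether WordSizeEstimator().estimate_size fits that shape exactly is an assumption to check against the library.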
Example
This example is included in examples/size_estimators/word_size_estimator.py.
from chunkipy.size_estimators import WordSizeEstimator


if __name__ == "__main__":
    text = "Chunkipy estimates by words in this simple sentence."
    estimator = WordSizeEstimator()

    print(f"Estimated size: {estimator.estimate_size(text)}")
    print("Segments:", list(estimator.segment(text)))