CharSizeEstimator
Description
CharSizeEstimator measures text size by counting characters. It is the
simplest estimator to use when you want deterministic sizing that does not depend
on language-specific tokenization rules. It also provides character-level segmentation that works
without any external dependency.
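As a rough illustration of the idea, the following standalone sketch mimics character-based sizing and segmentation without importing chunkipy. The class name SketchCharEstimator is hypothetical. The sketch assumes that "character count" corresponds to Python's built-in len() and that segmentation yields one character per segment; the library's actual implementation may differ.

```python
class SketchCharEstimator:
    """Hypothetical stand-in for illustration; not chunkipy's implementation."""

    def estimate_size(self, text: str) -> int:
        # Assumption: size is the number of Unicode code points in the text.
        return len(text)

    def segment(self, text: str) -> list[str]:
        # Assumption: every character is its own segment.
        return list(text)


estimator = SketchCharEstimator()
sample = "Chunkipy estimates by characters."
size = estimator.estimate_size(sample)
first_segments = estimator.segment(sample)[:10]
print(size)
print(first_segments)
```

Because nothing here depends on a tokenizer or a language model, the same input always produces the same size, which is the property the description refers to as deterministic sizing.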
API / Documentation
- class chunkipy.size_estimators.CharSizeEstimator
Bases: BaseSizeEstimator
Size estimator that counts the number of characters in the text.
- estimate_size(text)
Estimate the size of the given text by counting the number of characters.
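Counting characters is not the same as counting bytes. The snippet below uses only built-in Python to show the difference for a string with non-ASCII characters. Whether CharSizeEstimator relies on len() internally is an assumption made for this sketch, not something stated on this page.

```python
# Escapes are used so the string is unambiguous: "naïve café" with
# precomposed accented characters.
text = "na\u00efve caf\u00e9"

char_count = len(text)                  # number of Unicode code points
byte_count = len(text.encode("utf-8"))  # number of UTF-8 encoded bytes

print(char_count, byte_count)
```

The two accented characters each take two bytes in UTF-8, so the byte count is larger than the character count even though the text is the same.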
Example
This example is included in examples/size_estimators/char_size_estimator.py.
from chunkipy.size_estimators import CharSizeEstimator


if __name__ == "__main__":
    text = "Chunkipy estimates by characters."
    estimator = CharSizeEstimator()

    # Size is the number of characters in the text.
    print(f"Estimated size: {estimator.estimate_size(text)}")
    # Character-level segmentation; show only the first 10 segments.
    print("First 10 segments:", list(estimator.segment(text))[:10])