Module kuzukiri.kuzukiri
Classes
class Segmenter (terminals, parentheses, force, max_buf_length)
-
Text Segmentation Class
Args
terminals
:Optional[set[str]]
- a set of terminal characters (Default: {'。', '.', '!', '?', '\n'})
parentheses
:Optional[dict[str, str]]
- pairs of parentheses, mapping each opening character to its closing character (Default: {'「': '」', '『': '』', '(': ')', '[': ']', '【': '】'})
force
:Optional[set[str]]
- a set of terminal characters that end a segment even inside parentheses (Default: empty set)
max_buf_length
:Optional[int]
- maximum buffer length (Default: 1000)
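The way the four arguments interact can be sketched in plain Python. This is an illustrative reimplementation, not the library's code: `sketch_split`, `DEFAULT_TERMINALS` and `DEFAULT_PARENTHESES` are hypothetical names, and two behaviors are assumptions that the reference above does not specify: a segment is cut as soon as the buffer reaches `max_buf_length` characters, and a forced cut resets the parenthesis tracking.

```python
from typing import Dict, List, Optional, Set

# Defaults copied from the argument list above.
DEFAULT_TERMINALS = {"。", ".", "!", "?", "\n"}
DEFAULT_PARENTHESES = {"「": "」", "『": "』", "(": ")", "[": "]", "【": "】"}


def sketch_split(
    text: str,
    terminals: Optional[Set[str]] = None,
    parentheses: Optional[Dict[str, str]] = None,
    force: Optional[Set[str]] = None,
    max_buf_length: int = 1000,
) -> List[str]:
    """Hypothetical sketch of terminal-based segmentation (assumed behavior)."""
    terminals = DEFAULT_TERMINALS if terminals is None else terminals
    parentheses = DEFAULT_PARENTHESES if parentheses is None else parentheses
    force = set() if force is None else force

    segments: List[str] = []
    buf: List[str] = []
    stack: List[str] = []  # closing characters that are still expected

    for ch in text:
        buf.append(ch)
        if stack and ch == stack[-1]:
            stack.pop()  # the innermost parenthesis is now closed
        elif ch in parentheses:
            stack.append(parentheses[ch])  # remember the expected closer

        # A terminal only ends a segment outside parentheses;
        # a force character ends it anywhere.
        at_terminal = ch in terminals and not stack
        # Assumption: a full buffer is also flushed as its own segment.
        if at_terminal or ch in force or len(buf) >= max_buf_length:
            segments.append("".join(buf))
            buf.clear()
            stack.clear()  # assumption: a cut resets parenthesis tracking

    if buf:
        segments.append("".join(buf))  # keep any trailing text
    return segments


text = "彼は「行こう。早く。」と言った。次の文。"
print(sketch_split(text))
# ['彼は「行こう。早く。」と言った。', '次の文。']
print(sketch_split(text, force={"。"}))
# ['彼は「行こう。', '早く。', '」と言った。', '次の文。']
```

In the first call the two `。` inside `「…」` do not end a segment. In the second call `。` is also listed in `force`, so it ends a segment regardless of nesting.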
Methods
def split(self, text)
-
Split the text into segments.
Args
text (str) : target text
Returns
List[str]
- list of segmented texts
def split_with_norm(self, text)
-
Split the text into segments and normalize them.
After splitting, each segment is NFKC-normalized and trimmed.
Args
text (str) : target text
Returns
List[str]
- list of segmented texts
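The post-processing step described for `split_with_norm` can be approximated with the standard library. This is a minimal sketch, not the library's implementation: `normalize_segments` is a hypothetical helper, "trimming" is assumed to mean stripping leading and trailing whitespace, and segments that become empty are kept as they are.

```python
import unicodedata
from typing import List


def normalize_segments(segments: List[str]) -> List[str]:
    """Apply NFKC normalization, then strip surrounding whitespace."""
    return [unicodedata.normalize("NFKC", s).strip() for s in segments]


# Full-width letters, digits and punctuation become their ASCII forms,
# half-width katakana becomes full-width katakana, and the surrounding
# whitespace (including the ideographic space U+3000) is removed.
print(normalize_segments(["　ＡＢＣ１２３！ ", "ｶﾀｶﾅ。\n"]))
# ['ABC123!', 'カタカナ。']
```

NFKC folds compatibility characters into their canonical equivalents, so visually similar segments compare equal after this step.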