aboutsummaryrefslogtreecommitdiff
path: root/chinese/p5-Lingua-ZH-WordSegmenter/pkg-descr
blob: a0657187d16265bce2a999eeae5460d2b0cb224a (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
This is a perl version of simplified Chinese word segmentation.

The algorithm for this segmenter is to search the longest word at each point
from both left and right directions, and choose the one with higher frequency
product.

The original program is from the CPAN module Lingua::ZH::WordSegment
(https://metacpan.org/author/CHENYR) I did the follwing changes: 1) make the
interface object oriented; 2) make the internal string into utf8; 3) using
sogou's dictionary (http://www.sogou.com/labs/dl/w.html) as the default
dictionary.

WWW: https://metacpan.org/release/Lingua-ZH-WordSegmenter