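gse: efficient text segmentation in Go, with support for English, Chinese, Japanese, and more
The dictionary is implemented as a double-array trie. The segmenter finds a frequency-based shortest path using dynamic programming. Version v0.30.0 mainly adds DAG and HMM (Viterbi) segmentation, and the new API largely matches that of jieba.
It supports several segmentation modes (normal, search engine, full, precise, and HMM), as well as user dictionaries and part-of-speech tagging, and it can run as a JSON RPC service.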
Release details and project page: gse
    package main

    import (
        "fmt"

        "github.com/go-ego/gse"
    )

    func main() {
        var seg gse.Segmenter
        seg.LoadDict()

        text1 := "你好世界, Hello world"
        fmt.Println(seg.Cut(text1, true))
    }

Rhine River

Add
[NEW] Add HMM cut support
[NEW] Add go mod support and remove dep files
[NEW] Add find word in dictionary func
[NEW] Add Cut(), CutAll(), CutSearch(), LoadModel(), HMMCut() func
[NEW] Add hmm cut test code
[NEW] Add hmm cut example code

Update
[NEW] Split out the dict methods, move dictionary loading to dict_util.go
[NEW] Update example and Add more test
[NEW] Update and clean utils code
[NEW] Simplify test code, add equal benchmark code
[NEW] Update pkg cedar code
[NEW] Update code style
[NEW] Update README.md [ Format README.md and update example ]

Fix
See Commits for more details, after Oct 9.