
[java-nlp-user] How to train chinese word with Stanford segmenter

Rueshyna rueshyna at gmail.com
Mon May 2 03:25:39 PDT 2011


Hi, everyone,

I want to train a new Chinese dictionary that helps me segment Chinese
sentences with white space.

In other words, all the sentences in my corpus are already well segmented;
for instance, the following sentences:

例如 , 用具 有 广谱 抗微生物活性 的 聚 腈基 丙烯酸酯 膜覆盖 皮肤表面 的 不可 缝合 性 小 伤口 将会 减弱 伤口感染 的 可能 。

第一 , 抗微生物剂 在 腈基 丙烯酸酯组合物 内 必须 是 可溶 或 可分散的 , 其 浓度 需要 达到 能 产生 抗微生物性质 。
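
To make the corpus format concrete: a whitespace-segmented sentence like the ones above can be read as character-level training data by marking whether each character starts a word. The following is only a minimal sketch of that common character-tagging view. The function name `label_characters` and the labels "B" (begins a word) and "I" (inside a word) are assumptions chosen for illustration; this is not the Stanford segmenter's own input handling.

```python
def label_characters(segmented_line):
    """Turn a whitespace-segmented sentence into (character, label) pairs.

    "B" marks the first character of a word, "I" marks every other character.
    Both label names are hypothetical, chosen only for this illustration.
    """
    pairs = []
    for word in segmented_line.split():
        for index, char in enumerate(word):
            pairs.append((char, "B" if index == 0 else "I"))
    return pairs


if __name__ == "__main__":
    for char, label in label_characters("必须 是 可溶"):
        print(char, label)
```

Each space in the corpus therefore becomes a "B" label on the character that follows it, so no separate word list is needed to derive the training labels.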

I downloaded the example for training the segmenter and tried to understand
it.

Why does it use POS as a feature?

I don't want to train a dictionary with the POS feature.

In the paper [1] I saw features for the current character, the previous
character, the next character, and the conjunction of the previous and
current characters.

How do I use it?
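
For reference, the four character-level features mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Stanford segmenter's actual feature code: the function name `char_features`, the feature keys, and the padding symbols `<S>` and `</S>` are all hypothetical.

```python
def char_features(chars, i):
    """Build the four features for position i in a list of characters.

    The padding symbols "<S>" and "</S>" stand in for the missing neighbor
    at the sentence edges; they are assumptions made for this sketch.
    """
    current = chars[i]
    previous = chars[i - 1] if i > 0 else "<S>"
    following = chars[i + 1] if i + 1 < len(chars) else "</S>"
    return {
        "cur": current,
        "prev": previous,
        "next": following,
        "prev+cur": previous + current,  # conjunction of previous and current
    }


if __name__ == "__main__":
    sentence = list("必须是可溶")
    for position in range(len(sentence)):
        print(char_features(sentence, position))
```

None of these features depends on POS tags; they are computed from the raw character sequence alone.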

What is "dict-chris6.ser.gz"?

How do I use it when training a dictionary?

Thanks!

[1] P.-C. Chang, et al., "Optimizing Chinese word segmentation for
machine translation performance," in Proceedings of the Third
Workshop on Statistical Machine Translation, Columbus, Ohio, 2008.

by Rueshyna