[java-nlp-user] Re: NER - CRFClassifier Classes

Gerber Daniel dgerber at informatik.uni-leipzig.de
Fri Apr 8 03:07:56 PDT 2011


On 08.04.2011, at 11:47, John Bauer wrote:

> On Fri, Apr 8, 2011 at 2:20 AM, Gerber Daniel
> <dgerber at informatik.uni-leipzig.de> wrote:
>> 
>> The command:
>> $ java -cp stanford-corenlp-2010-11-12.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ~/ner-date-model.ser.gz -testFile ~/Date-Training.tsv
>> 
>> leads to the same result :(
>> 
>> testFile=/Users/gerb/Date-Training.tsv
>> loadClassifier=/Users/gerb/ner-date-model.ser.gz
>> Loading classifier from /Users/gerb/ner-date-model.ser.gz ... Error deserializing /Users/gerb/ner-date-model.ser.gz
>> Exception in thread "main" java.lang.RuntimeException: java.io.InvalidClassException: edu.stanford.nlp.util.Index; local class incompatible: stream classdesc serialVersionUID = 5398562825928375260, local class serialVersionUID = 533612384823468898
>>        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1223)
>>        at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1174)
>>        at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:2293)
>> Caused by: java.io.InvalidClassException: edu.stanford.nlp.util.Index; local class incompatible: stream classdesc serialVersionUID = 5398562825928375260, local class serialVersionUID = 533612384823468898
>> 
> 
> Maybe I misunderstood, but it sounded like you were training your own
> model.  My suggestion is to use the corenlp library when *training* as
> well as running, i.e. train via
> 
> java -cp stanford-corenlp-2010-11-12.jar
> 
> It should work, since all of the dependencies of CRFClassifier are in
> corenlp, but I haven't actually tried it myself.

Oh, sorry, my bad. I used the wrong command to train my model and did not double-check it. Training the model with CoreNLP works fine, and I can deserialize it again in my code. *yeah*
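
For reference, the InvalidClassException above is Java serialization refusing to read an object whose class carries a different serialVersionUID than the class on the current classpath, which is what happens when a model is written with one jar and read back with another. A minimal sketch of how to inspect the UID a class carries follows. DemoIndex is a hypothetical stand-in class, not part of any Stanford library, and its UID of 42 is made up for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

final class SerialUidCheck {

    /** Hypothetical stand-in for a library class such as edu.stanford.nlp.util.Index. */
    static final class DemoIndex implements Serializable {
        private static final long serialVersionUID = 42L; // made-up value
    }

    /**
     * Returns the serialVersionUID that Java serialization compares against
     * the UID stored in a serialized stream. The class must be Serializable.
     */
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(DemoIndex.class)); // prints 42
    }
}
```

Pointing uidOf at Class.forName("edu.stanford.nlp.util.Index") with each jar on the classpath in turn should show which jar matches the UID stored in the model file. That step is an untested assumption here.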

> 
>>> 
>>> 
>>>> One more thing: Is it somehow possible to add the trained DATE class to, let's say, the "conll.closed.iob2.crf" classifier? I would really like to avoid recreating it or using different classifiers in my code.
>>> 
>>> You mean, is it possible to use part of one model and ignore the rest?
>>> Not without coding for that yourself.
>> 
>> Let me explain this in sample code:
>> 
>> What I don't want to do:
>> 
>> AbstractSequenceClassifier companyPersonLocationClassifier = CRFClassifier.getClassifierNoExceptions(companyPersonLocationModel);
>> AbstractSequenceClassifier dateClassifier = CRFClassifier.getClassifierNoExceptions(dateModel);
>> 
>> String companyPersonLocationTagged = companyPersonLocationClassifier.classify(string);
>> String companyPersonLocationDateTagged = dateClassifier.classify(companyPersonLocationTagged);
>> 
>> What I want to do:
>> 
>> AbstractSequenceClassifier classifier = CRFClassifier.getClassifierNoExceptions(companyPersonLocationModel);
>> classifier.addModel(dateModel);
>> 
>> String companyPersonLocationDateTagged = classifier.classify(string);
> 
> 
> Perhaps the class ClassifierCombiner will let you do what you want.  I
> have absolutely no experience using this class, but a brief look at
> the javadoc suggests this is what you want.  There's also the subclass
> NERClassifierCombiner.
> 
> ClassifierCombiner doc:
> 
> """
> Merges the outputs of two or more AbstractSequenceClassifiers
> according to a simple precedence scheme: any given base classifier
> contributes only classifications of labels that do not exist in the
> base classifiers specified before, and that do not have any token
> overlap with labels assigned by higher priority classifiers.
> """
> 
> -John
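
The precedence scheme quoted above can be sketched in plain Java. This is a toy illustration under stated assumptions, not the Stanford implementation and not the ClassifierCombiner API. The class PrecedenceMerge, its nested Output type, and the merge method are all hypothetical names. It assumes that each classifier produces one label per token, that "O" marks a token with no entity, and that all label lists have the same length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrecedenceMerge {
    static final String BACKGROUND = "O"; // assumed "no entity" label

    /** One classifier's declared label set plus its per-token tags for a sentence. */
    static final class Output {
        final Set<String> labels;
        final List<String> tags;
        Output(Set<String> labels, List<String> tags) {
            this.labels = labels;
            this.tags = tags;
        }
    }

    /** Merges outputs in priority order; outputs.get(0) has the highest priority. */
    static List<String> merge(List<Output> outputs) {
        List<String> merged = new ArrayList<>(outputs.get(0).tags);
        Set<String> seenLabels = new HashSet<>(outputs.get(0).labels);
        for (Output aux : outputs.subList(1, outputs.size())) {
            int i = 0;
            while (i < aux.tags.size()) {
                String label = aux.tags.get(i);
                int end = i + 1; // find the run of identical labels [i, end)
                while (end < aux.tags.size() && aux.tags.get(end).equals(label)) end++;
                boolean isNewLabel = !label.equals(BACKGROUND) && !seenLabels.contains(label);
                // Contribute the run only if its label is new and no token is already labeled.
                if (isNewLabel && isFree(merged, i, end)) {
                    for (int k = i; k < end; k++) merged.set(k, label);
                }
                i = end;
            }
            seenLabels.addAll(aux.labels); // later classifiers may not reuse these labels
        }
        return merged;
    }

    /** True if no token in [start, end) has been labeled by a higher-priority classifier. */
    private static boolean isFree(List<String> merged, int start, int end) {
        for (int k = start; k < end; k++) {
            if (!merged.get(k).equals(BACKGROUND)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Tokens: John joined Acme in April 2011
        Output main3 = new Output(
            new HashSet<>(Arrays.asList("PERSON", "ORGANIZATION", "LOCATION")),
            Arrays.asList("PERSON", "O", "ORGANIZATION", "O", "O", "O"));
        Output date = new Output(
            new HashSet<>(Arrays.asList("DATE")),
            Arrays.asList("O", "O", "DATE", "O", "DATE", "DATE"));
        // The DATE guess on "Acme" overlaps ORGANIZATION and is dropped.
        System.out.println(merge(Arrays.asList(main3, date)));
        // prints [PERSON, O, ORGANIZATION, O, DATE, DATE]
    }
}
```

In this sketch the order of the list is the whole policy: putting the date output first would instead let its DATE label win on "Acme".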

This looks promising. I will have a look at that.
Thank you very much!!

Could anybody tell me how many sentences need to be annotated in order to get a decent classifier?

Kind regards,
Daniel

Attached are the model for dates, the training data, and the properties file. I used 10 sentences for each month (each containing that month), for a total of 120 sentences, all from a recent (03/2011) Wikipedia dump. I will extend this model in the future and share it here again.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ner-date-model.ser.gz
Type: application/x-gzip
Size: 584191 bytes
Desc: not available
URL: <http://mailman.stanford.edu/pipermail/java-nlp-user/attachments/20110408/08e500cf/attachment.gz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: date.prop
Type: application/octet-stream
Size: 950 bytes
Desc: not available
URL: <http://mailman.stanford.edu/pipermail/java-nlp-user/attachments/20110408/08e500cf/attachment.prop>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Date-Training.tsv
Type: text/tab-separated-values
Size: 16845 bytes
Desc: not available
URL: <http://mailman.stanford.edu/pipermail/java-nlp-user/attachments/20110408/08e500cf/attachment.tsv>
-------------- next part --------------



