[java-nlp-user] arabic.tagger model error message

Christopher Manning manning at stanford.edu
Thu Aug 21 15:15:17 PDT 2008


Hello,

Sorry, this is a bug in how the Arabic tagger model is specified in  
the arabic.tagger.props file.  We should make a fixed Arabic tagger.   
Hopefully we can do that next week.

In the meantime, you can work around it by overriding the  
specification for the tokenizerFactory.  Due to a second bug, you  
can't actually do that with a command line argument, but you can by  
specifying a properties file :-(.
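
For background on the error itself: Class.newInstance() throws java.lang.InstantiationException when the named class is abstract, is an interface, or has no no-argument constructor. The sketch below reproduces that behaviour in isolation. NeedsArgument and NoArgFactory are hypothetical stand-ins invented for the illustration, not Stanford classes, and the sketch does not claim which of those conditions applies to WhitespaceTokenizer.

```java
public class InstantiationDemo {

    // Hypothetical class that can only be constructed with an argument,
    // so it has no no-argument constructor.
    static class NeedsArgument {
        final String source;
        NeedsArgument(String source) { this.source = source; }
    }

    // Hypothetical class with a public no-argument constructor.
    public static class NoArgFactory {
        public NoArgFactory() {}
    }

    // Attempts reflective construction via Class.newInstance(), the call
    // that appears in the quoted stack trace, and reports the outcome.
    @SuppressWarnings("deprecation")
    static String tryInstantiate(Class<?> cls) {
        try {
            cls.newInstance();
            return "ok";
        } catch (InstantiationException e) {
            return "InstantiationException";
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(NeedsArgument.class));
        System.out.println(tryInstantiate(NoArgFactory.class));
    }
}
```

The first call fails because there is no no-argument constructor to invoke; the second succeeds.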

That is, if you use the attached arabic.tagger.fixed.props, as in  
the following command:

java -mx300m -cp stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model models/arabic.tagger -props models/arabic.tagger.fixed.props -textFile arabic-sent-utf8.txt > arabic-sent-utf8.tag

Then things should work okay, and the tagger should produce the attached  
output for the attached example input.
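
As a generic illustration of how a properties file can override a single default setting, here is a minimal sketch using java.util.Properties. It is not the tagger's actual loading code, and it does not reproduce the contents of the attached file. Only the key name tokenizerFactory comes from this thread; the class names used as values and the "encoding" key are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropsOverrideDemo {

    // Layers the settings in propsFileText on top of the given defaults.
    // Keys present in the text win; all other keys fall through to defaults.
    static Properties overlay(Properties defaults, String propsFileText) {
        Properties merged = new Properties(defaults);
        try {
            merged.load(new StringReader(propsFileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical defaults, standing in for settings stored with a model.
        Properties defaults = new Properties();
        defaults.setProperty("tokenizerFactory", "example.BrokenTokenizer");
        defaults.setProperty("encoding", "UTF-8");

        // Hypothetical override file content: one key, one replacement value.
        String fileText = "tokenizerFactory = example.FixedTokenizerFactory\n";

        Properties merged = overlay(defaults, fileText);
        System.out.println(merged.getProperty("tokenizerFactory"));
        System.out.println(merged.getProperty("encoding"));
    }
}
```

Only the overridden key changes; everything else keeps its default value.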

There's unfortunately no real documentation of the Arabic POS tagger  
at the moment, but apart from its being a tagger rather than a parser,  
almost everything that appears on the Stanford Arabic Parser IAQ page:

	http://nlp.stanford.edu/software/parser-arabic-faq.shtml

also applies to the POS tagger (required tokenization, normalization,  
POS tag set used, etc.)

Let us know if there are other problems!

Best,

Chris.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: arabic.tagger.fixed.props
Type: application/octet-stream
Size: 616 bytes
Desc: not available
URL: <http://mailman.stanford.edu/pipermail/java-nlp-user/attachments/20080821/28261ca0/attachment.props>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: arabic-sent-utf8.txt
URL: <http://mailman.stanford.edu/pipermail/java-nlp-user/attachments/20080821/28261ca0/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arabic-sent-utf8.tag
Type: application/octet-stream
Size: 260 bytes
Desc: not available
URL: <http://mailman.stanford.edu/pipermail/java-nlp-user/attachments/20080821/28261ca0/attachment.tag>
-------------- next part --------------



On Aug 20, 2008, at 2:20 AM, Iman Alodah wrote:

>
> Hi,
> I'm trying to use the arabic.tagger model but I keep getting the  
> following error:
>
> trying to load default properties from trained tagger
> an error occurred while tagging
> java.lang.InstantiationException: edu.stanford.nlp.process.WhitespaceTokenizer
>         at java.lang.Class.newInstance0(Class.java:340)
>         at java.lang.Class.newInstance(Class.java:308)
>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:577)
>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:696)
>
> Any suggestions, please?
>
> Thanking you in anticipation
> Iman Alodah
>


