[java-nlp-user] Tagging Arabic

Hajder hajderr at gmail.com
Thu Apr 28 09:18:20 PDT 2011


Do you mean this applies to English as well? I thought defining a couple  
of Extractors could handle unknown words to some extent, no?

H

On 2011-04-07 18:56:13, John Bauer <horatio at gmail.com> wrote:

> If the lexicons are different to the point that most words are unknown, the
> tagger is going to get most things wrong.  That sounds like the situation
> you are describing.
>
> For a deeper explanation of how it works, let me again refer you to the
> papers on the tagger website,
>
> John
> On Apr 7, 2011 2:37 AM, "Hajder" <hajderr at gmail.com> wrote:
>> Hey again.
>>
>> Thanks John for your reply. I understand the features you mentioned, but
>> how does the actual tagger do it for unseen words, i.e. words that are
>> not in the training corpus?
>>
>> For example, the Penn ATB is Modern Standard Arabic (MSA), while if I try
>> to tag, say, Quranic Arabic, which is classical yet uses the same script,
>> a sample extract with the first 50 words tags the majority of the words as
>> NNP (most freq. tag ?).
>>
>> So there's no analysis of the word as I understand?
>>
>> Thank you
>>
>> Best Regards
>> H
>>
>> On 2011-04-05 17:35:02, John Bauer <horatio at gmail.com> wrote:
>>
>>> You can imagine some features that would let you tag an unknown word:
>>>
>>> 1) Surrounding tags, eg guess what the missing tag is in NNP ? NNP
>>> 2) Word shape, eg capitalization in English, not sure what similar
>>> features there are in Arabic
>>> 3) Specific surrounding words such as "to" will often have a verb next
>>>
>>> etc etc
>>>
>>> John
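
The three kinds of features listed above could be sketched roughly as follows. This is only an illustration, not the tagger's actual extractor code: the class name UnknownWordFeatures, the feature strings and the word-shape encoding are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds context features for the word at position i.
class UnknownWordFeatures {

    // tags[j] may be null where the tag is not (yet) known.
    static List<String> extract(String[] words, String[] tags, int i) {
        List<String> feats = new ArrayList<>();
        feats.add("prevTag=" + at(tags, i - 1));    // 1) surrounding tags
        feats.add("nextTag=" + at(tags, i + 1));
        feats.add("shape=" + shape(words[i]));      // 2) word shape
        feats.add("prevWord=" + at(words, i - 1));  // 3) specific surrounding word
        return feats;
    }

    // Returns a boundary marker outside the sentence and "?" for unknown entries.
    private static String at(String[] seq, int i) {
        if (i < 0) return "<S>";
        if (i >= seq.length) return "</S>";
        return seq[i] == null ? "?" : seq[i];
    }

    // Collapsed shape: uppercase -> X, other letters -> x, digits -> d,
    // anything else kept as-is; runs of the same class are merged.
    static String shape(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            char s = Character.isUpperCase(c) ? 'X'
                   : Character.isLetter(c) ? 'x'
                   : Character.isDigit(c) ? 'd' : c;
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != s) {
                sb.append(s);
            }
        }
        return sb.toString();
    }
}
```

Since Arabic script has no letter case, the shape feature in this sketch collapses to a single class for any all-letter Arabic word, which is one way to see why capitalization-style features carry little information there.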
>>>
>>> On Tue, Apr 5, 2011 at 9:24 AM, Hajder <hajderr at gmail.com> wrote:
>>>> Hello
>>>>
>>>>> As far as I know, we don't use morphology at all in the tagger.
>>>>
>>>> Then I have some questions about how the tagger tags unseen words. I'm
>>>> planning to do the morphological analysis part by integrating an
>>>> analyzer with the tagger; I just need a lot of background info :).
>>>>
>>>> Can you please confirm that in training with a tagged corpus, the
>>>> language model simply learns each word in the training corpus (and the
>>>> tag it appears with)? So then in tagging new, unseen text, it can only
>>>> tag words it has seen in the training corpus? But then, how does it deal
>>>> with unseen words? If I'm wrong on this, there must still be SOME method
>>>> in the Stanford Tagger for dealing with unknown words - right?
>>>>
>>>>
>>>> For a morphologically rich language like Arabic, I would expect a LOT of
>>>> "unseen" words in a new unseen text.
>>>>
>>>> Best regards
>>>> Hajder
>>>>
>>>> On 2011-04-01 19:34:44, John Bauer <horatio at gmail.com> wrote:
>>>>
>>>>> As far as I know, we don't use morphology at all in the tagger. For
>>>>> example, you can see there is the following comment in the tagger
>>>>> which no one has ever followed up on:
>>>>>
>>>>> // TODO: Add a flag to lemmatize words (Morphology class) on output of tagging
>>>>>
>>>>> Sorry, but the best documentation we have for the code is the Javadocs,
>>>>>
>>>>> John
>>>>>
>>>>> On Fri, Apr 1, 2011 at 10:11 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>
>>>>>> Hi again
>>>>>>
>>>>>> Following the previous email, I'd like to be more specific: how was
>>>>>> the Arabic model generated? Would it be possible to integrate another
>>>>>> morphological analyzer written in Java? The only file in the source
>>>>>> code pertaining to this was Morphology.java, which describes a
>>>>>> morphological analyzer for English.
>>>>>>
>>>>>> Thank you in advance.
>>>>>>
>>>>>> Best Regards
>>>>>> Hajder
>>>>>>
>>>>>>
>>>>>> On 2011-04-01 11:20:44, Hajder <hajderr at gmail.com> wrote:
>>>>>>
>>>>>>> Hello
>>>>>>>
>>>>>>> Yes I've read those papers, thank you. But I wonder if there's any
>>>>>>> documentation on the source apart from the API at
>>>>>>>
>>>>>>> http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/tagger/maxent/package-summary.html
>>>>>>> ?
>>>>>>>
>>>>>>> I want to incorporate, or actually get the output from, a morphological
>>>>>>> analyzer for every unknown word in the model so I can try tagging
>>>>>>> texts other than the Quran.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Hajder
>>>>>>>
>>>>>>> On 2011-03-30 19:04:03, John Bauer <horatio at gmail.com> wrote:
>>>>>>>
>>>>>>>> Tag/word is clearly wrong... what are the results if you test after
>>>>>>>> training that way? They should be horrible.
>>>>>>>>
>>>>>>>> I don't know much about the structure of Arabic, but I'm guessing that
>>>>>>>> like most languages, you break it up into sentences. Each sentence
>>>>>>>> should be on its own line, not the whole text on one line.
>>>>>>>>
>>>>>>>> A description of how the tagger works is in the papers cited on this
>>>>>>>> page:
>>>>>>>>
>>>>>>>> http://nlp.stanford.edu/software/tagger.shtml
>>>>>>>>
>>>>>>>> A brief summary is that there are algorithms where you try to solve
>>>>>>>> for several unknown variables at once. You can use the predicted
>>>>>>>> values for some of the variables to influence the predictions for
>>>>>>>> other variables. Depending on the setup of the problem, it's possible
>>>>>>>> to solve for the best values of all the variables at once. That's
>>>>>>>> what we do here, using the values of the tags as the variables.
>>>>>>>>
>>>>>>>> John
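
The idea of solving for all the tags at once can be illustrated with a toy Viterbi decoder over a simple chain. This is a sketch under strong simplifying assumptions and is not the tagger's implementation (the papers describe a richer model): the class name ToyViterbi and the two score tables are hypothetical inputs, with emit[t][k] the score of tag k at position t and trans[j][k] the score of tag k following tag j.

```java
// Toy example: finds the tag sequence with the highest total score,
// where each position's choice depends on the neighbouring tag.
class ToyViterbi {

    static int[] decode(double[][] emit, double[][] trans) {
        int n = emit.length;
        if (n == 0) return new int[0];
        int k = emit[0].length;

        double[][] best = new double[n][k]; // best score of any path ending in tag c at t
        int[][] back = new int[n][k];       // previous tag on that best path
        best[0] = emit[0].clone();

        for (int t = 1; t < n; t++) {
            for (int cur = 0; cur < k; cur++) {
                int arg = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int prev = 0; prev < k; prev++) {
                    double s = best[t - 1][prev] + trans[prev][cur];
                    if (s > max) { max = s; arg = prev; }
                }
                best[t][cur] = max + emit[t][cur];
                back[t][cur] = arg;
            }
        }

        // Pick the best final tag, then follow the back-pointers.
        int last = 0;
        for (int c = 1; c < k; c++) {
            if (best[n - 1][c] > best[n - 1][last]) last = c;
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }
}
```

The point of the sketch is that the tag chosen for one word is not fixed in isolation: a later word can change which tag is best for an earlier one, because the whole sequence is scored together.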
>>>>>>>>
>>>>>>>> On Wed, Mar 30, 2011 at 8:36 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello again.
>>>>>>>>>
>>>>>>>>> Update with good news.
>>>>>>>>>
>>>>>>>>> You're right, the file I sent you before with the data,
>>>>>>>>> "qurancorpus12.txt", contained lines that were not in the format
>>>>>>>>> WORD/TAG (or TAG/WORD if you 'cat' in Bash, bidirectional issues?)
>>>>>>>>> but actually had some leftovers from the cleaning of the original
>>>>>>>>> corpus file. I apologize for that.
>>>>>>>>>
>>>>>>>>> Anyway, I removed the errors from that file, retrained, and also
>>>>>>>>> recreated the file with all the words on one line
>>>>>>>>> (tr '\n' ' ' < corpus.in > corpus.out), and the tagging for the first
>>>>>>>>> 50 words seems pretty accurate, ~95%. It didn't make any difference,
>>>>>>>>> though, having all the words on separate lines; the Arabic text does
>>>>>>>>> not really have any punctuation as in English.
>>>>>>>>>
>>>>>>>>> I generated two models, one from training data of the form
>>>>>>>>> arabicword/tag and the other of the form tag/arabicword.
>>>>>>>>> In Bash the two files look the same if you 'cat' them; if you open
>>>>>>>>> them in mlterm (a multilingual terminal) or any text editor, the
>>>>>>>>> difference is apparent. The arabicword/tag one works and the other
>>>>>>>>> raises an exception. This gives me *some* hint, but I'd like to know
>>>>>>>>> how the tagger reads the input when it's bidirectional text.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Hajder
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2011-03-29 02:01:56, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>>>
>>>>>>>>>>> I remember I couldn't train it if both open and closed tags were set.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It turns out it only lets you specify one of the two.
>>>>>>>>>>
>>>>>>>>>> I did some more investigation, and it also turns out it would crash
>>>>>>>>>> if you told it such-and-such was an open tag but the tag didn't show
>>>>>>>>>> up in the training data anywhere. That will be fixed in the next
>>>>>>>>>> release of the tagger, whenever that is.
>>>>>>>>>>
>>>>>>>>>> However, open/closed makes no difference in your case, I believe. It
>>>>>>>>>> doesn't matter whether you specify open or closed; it just fills in
>>>>>>>>>> the missing class with the other tags. Performance in your particular
>>>>>>>>>> case will be much better if you fix the missing tags, i.e. the lines
>>>>>>>>>> in your training data where the exception comes up, and then
>>>>>>>>>> rearrange the training data so the sentences are correct. One word
>>>>>>>>>> per line makes it think you have one sentence per line, and then it
>>>>>>>>>> won't have any context for features that use surrounding words or
>>>>>>>>>> tags,
>>>>>>>>>>
>>>>>>>>>> John
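
Rearranging a one-token-per-line file into one sentence per line might look something like the sketch below. It assumes that a blank line marks each sentence (or verse) boundary in the input, which may well not hold for a given corpus; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: joins word/tag tokens into one line per sentence.
class SentencePerLine {

    // Each input element is one line of the original file; a blank line
    // is treated as a sentence boundary (an assumption of this sketch).
    static List<String> join(List<String> tokenLines) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : tokenLines) {
            String token = raw.trim();
            if (token.isEmpty()) {
                // Boundary reached: emit the sentence collected so far.
                if (current.length() > 0) {
                    sentences.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) current.append(' ');
                current.append(token);
            }
        }
        if (current.length() > 0) sentences.add(current.toString());
        return sentences;
    }
}
```

Unlike piping everything through tr, this keeps sentence boundaries, so the tagger gets neighbouring words as context without treating the whole corpus as a single sentence.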
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 28, 2011 at 12:53 PM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi John
>>>>>>>>>>>
>>>>>>>>>>> Ok thank you.
>>>>>>>>>>>
>>>>>>>>>>> Regarding
>>>>>>>>>>>>
>>>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>>>
>>>>>>>>>>> I remember I couldn't train it if both open and closed tags were set.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> H
>>>>>>>>>>>
>>>>>>>>>>> On 2011-03-26 19:59:28, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Hajder,
>>>>>>>>>>>>
>>>>>>>>>>>> I believe the exception you found when training the file prevented
>>>>>>>>>>>> the tagger from using any more of the data to train the tagger.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, you probably want to have multiple words on one line. The way
>>>>>>>>>>>> it is set up now, you are only doing one word per sentence, which
>>>>>>>>>>>> contributes to the low accuracy.
>>>>>>>>>>>>
>>>>>>>>>>>> That should get you started on improving the accuracy. In the
>>>>>>>>>>>> meantime, I will try to figure out why there is an exception
>>>>>>>>>>>> occurring here. I don't know what the expected behavior is if there
>>>>>>>>>>>> is a set of open tags specified, but no set of closed tags,
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Mar 26, 2011 at 2:42 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> You mean the corpus with WORD/TAG ? I remember having that problem
>>>>>>>>>>>>> when *training* the tagger, causing an exception. But that
>>>>>>>>>>>>> ArrayIndexOutOfBounds occurs when *testing*. Anyway I will attach
>>>>>>>>>>>>> my props file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2011-03-26 08:22:44, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> By the way, some of the lines in the data file you sent me don't
>>>>>>>>>>>>>> have the tag separator. It is throwing an exception for me when I
>>>>>>>>>>>>>> run that, and the result is it stops reading the text from the
>>>>>>>>>>>>>> point it hits the exception. The first thing I did was to go
>>>>>>>>>>>>>> through and edit those lines as best as I could to give them the
>>>>>>>>>>>>>> proper tag separator and tags.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
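
A quick way to find such lines before training is to scan each token for the separator. The following is a minimal sketch, assuming whitespace-separated tokens and a single-character separator such as '/'; the class name TagSeparatorCheck is hypothetical and not part of the tagger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-training check for tokens that are not word<sep>tag.
class TagSeparatorCheck {

    // Returns the tokens on this line that lack a usable separator:
    // no separator at all, an empty word, or an empty tag.
    static List<String> malformedTokens(String line, char sep) {
        List<String> bad = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // blank line
            int i = token.lastIndexOf(sep);
            if (i <= 0 || i == token.length() - 1) {
                bad.add(token);
            }
        }
        return bad;
    }
}
```

Running this over every line of the training file and printing the line number whenever the result is non-empty gives a list of places to fix by hand.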
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Mar 26, 2011 at 1:00 AM, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry for the delay, caught up in other things. What did you use
>>>>>>>>>>>>>>> as properties when training the tagger?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Mar 25, 2011 at 3:25 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I wonder if you've had the time to look at my problem?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2011-03-21 22:16:57, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Attached is the tagged Quran corpus (it's open-domain basically)
>>>>>>>>>>>>>>>>> and the first 50 lines of the Quran, which are used for testing.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now I'm not sure how it's handling bidirectional text when
>>>>>>>>>>>>>>>>> executing from the terminal... perhaps the problem lies there.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2011-03-21 19:17:48, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you for the crash report... would you send me (not the
>>>>>>>>>>>>>>>>>> whole list) a copy of the data used to train, or perhaps a link
>>>>>>>>>>>>>>>>>> to the data used to train? It will make it much easier to
>>>>>>>>>>>>>>>>>> reproduce the crash,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>> On Mar 21, 2011 10:15 AM, "Hajder" <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've trained the tagger but when I try to tag a sample text I
>>>>>>>>>>>>>>>>>>> get the following
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ....etc....
>>>>>>>>>>>>>>>>>>> Reading POS tagger model from qmodelOpen.model ... done [1.3 sec].
>>>>>>>>>>>>>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>>>>>>>>>>>>>>>>>>> at java.util.ArrayList.get(ArrayList.java:324)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.util.HashIndex.get(HashIndex.java:95)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.TTags.getTag(TTags.java:207)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.TestSentence.setHistory(TestSentence.java:301)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.TestSentence.scoresOf(TestSentence.java:649)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:158)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
>>>>>>>>>>>>>>>>>>> at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now this only occurs when I use openClassTags. If I comment that
>>>>>>>>>>>>>>>>>>> out in my .prop when training and use closedClassTags instead,
>>>>>>>>>>>>>>>>>>> it works. But then many words are tagged incorrectly, so I'm not
>>>>>>>>>>>>>>>>>>> sure if getting it to work with openClassTags would make it
>>>>>>>>>>>>>>>>>>> better.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Anyway...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> These are my tag settings which I switch between:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> #openClassTags = "N PN ADJ IMPN T V"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> #closedClassTags = "PRON DEM REL P EMPH IMPV PRP CONJ SUB ACC AMD ANS AVR CAUS CERT CIRC COM COND EQ EXH EXL EXP FUT INC INT INTG NEG PREV PRO REM RES RET RSLT SUP SUR VOC INL"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (from http://corpus.quran.com/documentation/tagset.jsp)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> And the following command is used when testing:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model qmodelOpen.model -textFile quran-first50.txt
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you !
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>> H
>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>> java-nlp-user mailing list
>>>>>>>>>>>>>>>>>>> java-nlp-user at lists.stanford.edu
>>>>>>>>>>>>>>>>>>> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user


