[java-nlp-user] Tagging Arabic

Hajder hajderr at gmail.com
Thu Apr 7 02:37:28 PDT 2011


Hey again.

Thanks John for your reply. I understand the features you mentioned, but
how does the actual tagger handle unseen words, i.e. words that are not in
the training corpus?

For example, the Penn ATB is Modern Standard Arabic (MSA). If I try to
tag, say, Quranic Arabic, which is classical but written in the same
script, a sample extract of the first 50 words has the majority of its
words tagged as NNP (the most frequent tag?).

So there is no analysis of the word itself, as I understand it?

Thank you

Best Regards
H

On 2011-04-05 17:35:02, John Bauer <horatio at gmail.com> wrote:

> You can imagine some features that would let you tag an unknown word:
>
> 1) Surrounding tags, e.g. guess what the missing tag is in NNP ? NNP
> 2) Word shape, e.g. capitalization in English; I'm not sure what similar
> features there are in Arabic
> 3) Specific surrounding words, e.g. "to" will often have a verb next
>
> etc etc
>
> John
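
The three kinds of cue listed above can be sketched as a small feature function. This is only an illustration of the idea, not the tagger's actual feature extractors: the function name, the feature names, and the choice of two-character prefixes and suffixes as a stand-in for "word shape" in a script without capitalization are all assumptions made for the example.

```python
def unknown_word_features(words, tags_so_far, i):
    """Build context features for the word at position i.

    `tags_so_far` holds the tags already chosen for positions before i.
    All feature names here are hypothetical, chosen for illustration.
    """
    word = words[i]
    feats = {}
    # 1) Surrounding tags: the previous tag, or a start marker.
    feats["prev_tag"] = tags_so_far[i - 1] if i > 0 else "<S>"
    # 2) Word shape: short prefix and suffix, plus a digit flag.
    feats["prefix2"] = word[:2]
    feats["suffix2"] = word[-2:]
    feats["has_digit"] = any(ch.isdigit() for ch in word)
    # 3) Specific surrounding words.
    feats["prev_word"] = words[i - 1] if i > 0 else "<S>"
    feats["next_word"] = words[i + 1] if i + 1 < len(words) else "</S>"
    return feats
```

None of these features needs the word itself to have been seen in training, which is why a model can still make a guess for an unknown word.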
>
> On Tue, Apr 5, 2011 at 9:24 AM, Hajder <hajderr at gmail.com> wrote:
>> Hello
>>
>>> As far as I know, we don't use morphology at all in the tagger.
>>
>> Then I have some questions about how the tagger tags unseen words. I'm
>> planning to do the morphological analysis part by integrating an
>> analyzer with the tagger; I just need a lot of background info :).
>>
>> Can you please confirm that in training with a tagged
>> corpus, the language model simply learns each word in the training
>> corpus (and the tag it appears with)? So then in tagging new, unseen
>> text, it can only tag words it has seen in the training corpus?
>> But then, how does it deal with unseen words? If I'm wrong on this,
>> there must still be SOME method in the Stanford
>> Tagger for dealing with unknown words - right?
>>
>>
>> For a morphologically rich language like Arabic, I would expect a LOT of
>> "unseen" words in a new unseen text.
>>
>> Best regards
>> Hajder
>>
>> On 2011-04-01 19:34:44, John Bauer <horatio at gmail.com> wrote:
>>
>>> As far as I know, we don't use morphology at all in the tagger.  For
>>> example, you can see there is the following comment in the tagger
>>> which no one has ever followed up on:
>>>
>>>  // TODO: Add a flag to lemmatize words (Morphology class) on output of
>>> tagging
>>>
>>> Sorry, but the best documentation we have for the code is the Javadocs,
>>>
>>> John
>>>
>>> On Fri, Apr 1, 2011 at 10:11 AM, Hajder <hajderr at gmail.com> wrote:
>>>>
>>>> Hi again
>>>>
>>>> Following the previous email, I'd like to be more specific: how
>>>> was the Arabic model generated? Would it be
>>>> possible to integrate another morphological analyzer written in Java?
>>>> The only file in the source code pertaining to this was
>>>> Morphology.java, which describes a morphological analyzer for English.
>>>>
>>>> Thank you in advance.
>>>>
>>>> Best Regards
>>>> Hajder
>>>>
>>>>
>>>> On 2011-04-01 11:20:44, Hajder <hajderr at gmail.com> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> Yes, I've read those papers, thank you. But I wonder if there's any
>>>>> documentation on the source apart from the API at
>>>>>
>>>>> http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/tagger/maxent/package-summary.html
>>>>> ?
>>>>>
>>>>> I want to incorporate or actually get the output from a morphological
>>>>> analyzer for every unknown word in the model so I can try tagging on
>>>>> different texts than the Quran.
>>>>>
>>>>> Best Regards
>>>>> Hajder
>>>>>
>>>>> On 2011-03-30 19:04:03, John Bauer <horatio at gmail.com> wrote:
>>>>>
>>>>>> Tag/word is clearly wrong... what are the results if you test after
>>>>>> training that way?  They should be horrible.
>>>>>>
>>>>>> I don't know much about the structure of Arabic, but I'm guessing
>>>>>> that, like most languages, you break it up into sentences.  Each
>>>>>> sentence should be on its own line, not the whole text on one line.
>>>>>>
>>>>>> A description of how the tagger works is in the papers cited on this
>>>>>> page:
>>>>>>
>>>>>> http://nlp.stanford.edu/software/tagger.shtml
>>>>>>
>>>>>> A brief summary is that there are algorithms where you try to solve
>>>>>> for several unknown variables at once.  You can use the predicted
>>>>>> values for some of the variables to influence the predictions for
>>>>>> other variables.  Depending on the setup of the problem, it's
>>>>>> possible to solve for the best values of all the variables at once.
>>>>>> That's what we do here, using the values of the tags as the variables.
>>>>>>
>>>>>> John
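
"Solving for all the tags at once" can be illustrated with a first-order Viterbi search. This is a toy sketch of joint decoding in general, not the tagger's own search code: the function name, the `emit` and `trans` scoring callbacks, and the `"<S>"` start marker are assumptions made for the example, and the real model's features and dependencies are described in the papers cited above.

```python
def viterbi(words, tags, emit, trans):
    """Return the highest-scoring tag sequence under additive scores.

    emit(word, tag) and trans(prev_tag, tag) return log-scores
    (hypothetical callbacks supplied by the caller).
    """
    if not words:
        return []
    # best[i][t] = (score of the best path ending in tag t at i, back-pointer)
    best = [{t: (emit(words[0], t) + trans("<S>", t), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximizes the path score into t.
            prev, score = max(
                ((p, best[i - 1][p][0] + trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit(words[i], t), prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]
```

With scores like these, a word whose own score is uninformative (for instance an unseen word) still receives a tag, because the transition scores from its neighbours decide between the candidates.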
>>>>>>
>>>>>> On Wed, Mar 30, 2011 at 8:36 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>
>>>>>>> Hello again.
>>>>>>>
>>>>>>> Update with good news.
>>>>>>>
>>>>>>> You're right, the file I sent you before with the data,
>>>>>>> "qurancorpus12.txt", contained lines that were not in the format
>>>>>>> WORD/TAG (or TAG/WORD if you 'cat' it in Bash; bidirectional
>>>>>>> issues?) but actually had some leftovers from the cleaning of the
>>>>>>> original corpus file. I apologize for that.
>>>>>>>
>>>>>>> Anyway, I removed the errors from that file, retrained, and also
>>>>>>> recreated the file with all the words on one line
>>>>>>> (tr '\n' ' ' < corpus.in > corpus.out ), and the tagging for the
>>>>>>> first 50 words seems pretty accurate, ~95%. It didn't make any
>>>>>>> difference, though, having all the words on separate lines; the
>>>>>>> Arabic text does not really have any punctuation as in English.
>>>>>>>
>>>>>>> I generated two models, one from training data in the form
>>>>>>> arabicword/tag and the other in the form tag/arabicword.
>>>>>>> In Bash the two files look the same if you 'cat' them; if you open
>>>>>>> them in mlterm (a multilingual terminal) or any text editor, the
>>>>>>> difference is apparent. The arabicword/tag model works and the
>>>>>>> other raises an exception. This gives me *some* hint, but I'd like
>>>>>>> to know: how does the tagger read the input when it's
>>>>>>> bidirectional text?
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Hajder
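
One way to see which of the two files really holds arabicword/tag is to inspect the stored character order instead of the rendered one. Unicode text is stored in logical (reading) order, and a terminal may or may not reorder it for display, so two files can look alike on screen yet differ on disk. The sketch below only inspects tokens; the helper names are made up for the example, and it assumes (without confirmation from the thread) that the tagger works on the stored order.

```python
import unicodedata

def starts_with_rtl_word(token):
    """True if the first stored character of token is a right-to-left letter."""
    # "AL" is the bidi class of Arabic letters; "R" covers other
    # right-to-left letters such as Hebrew.
    return bool(token) and unicodedata.bidirectional(token[0]) in ("R", "AL")

def stored_order(token):
    """List the Unicode names of the characters in storage (logical) order."""
    return [unicodedata.name(ch, "UNNAMED") for ch in token]
```

For a token in arabicword/tag form, `starts_with_rtl_word` returns true and the separator appears after the Arabic letters in `stored_order`, whatever the terminal shows.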
>>>>>>>
>>>>>>>
>>>>>>> On 2011-03-29 02:01:56, John Bauer <horatio at gmail.com> wrote:
>>>>>>>
>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>
>>>>>>>>> I remember I couldn't train it if both open and closed tags were
>>>>>>>>> set.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It turns out it only lets you specify one of the two.
>>>>>>>>
>>>>>>>> I did some more investigation, and it also turns out it would
>>>>>>>> crash if you told it such-and-such was an open tag but the tag
>>>>>>>> didn't show up in the training data anywhere.  That will be fixed
>>>>>>>> in the next release of the tagger, whenever that is.
>>>>>>>>
>>>>>>>> However, open/closed makes no difference in your case, I believe.
>>>>>>>> It doesn't matter whether you specify open or closed; it just
>>>>>>>> fills in the missing class with the other tags.  Performance in
>>>>>>>> your particular case will be much better if you fix the missing
>>>>>>>> tags, i.e. the lines in your training data where the exception
>>>>>>>> comes up, and then rearrange the training data so the sentences
>>>>>>>> are correct.  One word per line makes it think you have one
>>>>>>>> sentence per line, and then it won't have any context from
>>>>>>>> surrounding words or tags,
>>>>>>>> John
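
The three problems named in this message (tokens without a tag separator, one word per line, and open tags that never occur in the data) can all be checked before training. The following is a rough pre-flight sketch, not part of the tagger: the function name and return shape are invented for the example, the `/` separator is taken from the WORD/TAG format mentioned in the thread, and splitting on the last separator is an assumption about how a token should be read.

```python
def check_training_data(lines, separator="/", open_tags=()):
    """Report problems in WORD<sep>TAG training lines, one sentence per line.

    Returns (bad_tokens, one_word_lines, unseen_open_tags).
    """
    bad_tokens = []       # (line_number, token) pairs lacking a usable separator
    one_word_lines = []   # numbers of lines that hold a single token
    seen_tags = set()
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if len(tokens) == 1:
            one_word_lines.append(number)
        for token in tokens:
            # Split on the LAST separator so words containing it still parse.
            word, sep, tag = token.rpartition(separator)
            if not sep or not word or not tag:
                bad_tokens.append((number, token))
            else:
                seen_tags.add(tag)
    # Declared open tags that never occur in the data.
    unseen_open_tags = sorted(set(open_tags) - seen_tags)
    return bad_tokens, one_word_lines, unseen_open_tags
```

A non-empty first or third result points at the crashes discussed here; many entries in the second suggest the file has one word per line, so each word would be treated as its own sentence.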
>>>>>>>>
>>>>>>>> On Mon, Mar 28, 2011 at 12:53 PM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi John
>>>>>>>>>
>>>>>>>>> Ok thank you.
>>>>>>>>>
>>>>>>>>> Regarding
>>>>>>>>>>
>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>
>>>>>>>>> I remember I couldn't train it if both open and closed tags were
>>>>>>>>> set.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> H
>>>>>>>>>
>>>>>>>>> On 2011-03-26 19:59:28, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Hajder,
>>>>>>>>>>
>>>>>>>>>> I believe the exception you found when training on the file
>>>>>>>>>> prevented the tagger from using any more of the data for training.
>>>>>>>>>>
>>>>>>>>>> Also, you probably want to have multiple words on one line.  The
>>>>>>>>>> way it is set up now, you are only doing one word per sentence,
>>>>>>>>>> which contributes to the low accuracy.
>>>>>>>>>>
>>>>>>>>>> That should get you started on improving the accuracy.  In the
>>>>>>>>>> meantime, I will try to figure out why there is an exception
>>>>>>>>>> occurring here.  I don't know what the expected behavior is if
>>>>>>>>>> there is a set of open tags specified, but no set of closed tags,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>> On Sat, Mar 26, 2011 at 2:42 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> You mean the corpus with WORD/TAG? I remember having that problem
>>>>>>>>>>> when *training* the tagger, causing an exception. But that
>>>>>>>>>>> ArrayIndexOutOfBounds occurs when *testing*. Anyway, I will
>>>>>>>>>>> attach my props file.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Hajder
>>>>>>>>>>>
>>>>>>>>>>> On 2011-03-26 08:22:44, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> By the way, some of the lines in the data file you sent me
>>>>>>>>>>>> don't have the tag separator.  It is throwing an exception for
>>>>>>>>>>>> me when I run that, and the result is it stops reading the text
>>>>>>>>>>>> from the point it hits the exception.  The first thing I did was
>>>>>>>>>>>> to go through and edit those lines as best as I could to give
>>>>>>>>>>>> them the proper tag separator and tags.
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Mar 26, 2011 at 1:00 AM, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the delay, caught up in other things.  What did you
>>>>>>>>>>>>> use as properties when training the tagger?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 25, 2011 at 3:25 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wonder if you've had the time to look at my problem?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2011-03-21 22:16:57, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Attached is the tagged Quran corpus (it's open-domain, basically)
>>>>>>>>>>>>>>> and the first 50 lines of the Quran, which are used for testing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now I'm not sure how it's handling bidirectional text when
>>>>>>>>>>>>>>> executing from the terminal... perhaps the problem lies there.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2011-03-21 19:17:48, John Bauer <horatio at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you for the crash report...  would you send me (not the
>>>>>>>>>>>>>>>> whole list) a copy of the data used to train, or perhaps a link
>>>>>>>>>>>>>>>> to the data used to train?  It will make it much easier to
>>>>>>>>>>>>>>>> reproduce the crash,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>> On Mar 21, 2011 10:15 AM, "Hajder" <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've trained the tagger, but when I try to tag a sample text I
>>>>>>>>>>>>>>>>> get the following:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> ....etc....
>>>>>>>>>>>>>>>>> Reading POS tagger model from qmodelOpen.model ... done [1.3 sec].
>>>>>>>>>>>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>>>>>>>>>>>>>>>>>     at java.util.ArrayList.get(ArrayList.java:324)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.util.HashIndex.get(HashIndex.java:95)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.TTags.getTag(TTags.java:207)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.TestSentence.setHistory(TestSentence.java:301)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.TestSentence.scoresOf(TestSentence.java:649)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:158)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
>>>>>>>>>>>>>>>>>     at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now this only occurs when I use openClassTags; if I comment that out
>>>>>>>>>>>>>>>>> in my .prop when training and use closedClassTags instead, it works.
>>>>>>>>>>>>>>>>> But then many words are tagged incorrectly, so I'm not sure whether
>>>>>>>>>>>>>>>>> getting it to work with openClassTags would make it better.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Anyway...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These are my tag settings which I switch between:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #openClassTags = "N PN ADJ IMPN T V"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #closedClassTags = "PRON DEM REL P EMPH IMPV PRP CONJ SUB ACC AMD ANS AVR CAUS CERT CIRC COM COND EQ EXH EXL EXP FUT INC INT INTG NEG PREV PRO REM RES RET RSLT SUP SUR VOC INL"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (from http://corpus.quran.com/documentation/tagset.jsp )
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And the following command is used when testing:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model qmodelOpen.model -textFile quran-first50.txt
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you !
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>> H
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> java-nlp-user mailing list
>>>>>>>>>>>>>>>>> java-nlp-user at lists.stanford.edu
>>>>>>>>>>>>>>>>> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user


