[java-nlp-user] Tagging Arabic

John Bauer horatio at gmail.com
Mon Apr 4 23:18:16 PDT 2011


You seem to have forgotten to set the architecture (e.g. the set of
features to use).  Try adding something like

                    arch = left3words,naacl2003unknowns,unicodeshapes(-1,1)

Also, openClassTags and closedClassTags should not be surrounded by quotes.

Note that the model will still not be very good unless you then put the
words of each sentence on the same line.

-John
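
The two fixes above can be checked mechanically before training. What follows is a minimal Python sketch, not part of the tagger: `parse_props` and `check_tagger_props` are hypothetical helpers, and the sketch assumes the props file consists of plain `key = value` lines with `#` comment lines. The last check reflects the observation elsewhere in this thread that only one of the two tag lists may be specified.

```python
def parse_props(text):
    """Parse simple 'key = value' lines; lines starting with '#' are comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_tagger_props(text):
    """Return a list of human-readable problems found in the props text."""
    props = parse_props(text)
    problems = []
    # The architecture (feature set) must be given explicitly.
    if not props.get("arch"):
        problems.append("arch is not set")
    # Tag lists are space-separated and must not be wrapped in quotes.
    for key in ("openClassTags", "closedClassTags"):
        value = props.get(key, "")
        if value.startswith('"') or value.endswith('"'):
            problems.append(key + " should not be surrounded by quotes")
    # Assumption taken from this thread: only one of the two may be set.
    if "openClassTags" in props and "closedClassTags" in props:
        problems.append("set only one of openClassTags and closedClassTags")
    return problems


example = 'openClassTags = "N PN ADJ IMPN T V"\n'
for problem in check_tagger_props(example):
    print(problem)
```

Run against the quoted, `arch`-less example, the sketch reports both problems; a file with `arch` set and an unquoted tag list produces no output.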

On Mon, Apr 4, 2011 at 5:59 PM, John Bauer <horatio at gmail.com> wrote:
> Excellent, you are sending us all sorts of test cases that cause our
> system to crawl in a hole and die.  When you train the tagger with
> your data file, you get many lines of the form:
>
> Experiments error: for y=3, ptildeY(y)=1.2914890869172156E-5 but Sum_x ptildeXY(x,y)=0.0
>
> Interestingly enough, you get the same thing if you use closed tags
> instead of open tags.  I will figure out what this means (unless
> someone else on this list already knows!) and come up with a solution.
>
> John
>
>
>
> 2011/4/4 Hajder <hajderr at gmail.com>:
>> Hi John
>>
>> The training set is the same; I fixed all the lines that were in the wrong
>> format or had trailing characters. If the training file contained errors,
>> the exception would have been raised during training and not during
>> testing, correct?
>>
>> Just to make sure we're on the same page, I've attached the files I use
>> and the commands I execute, so you can see whether you can reproduce the
>> error.
>>
>> 1. Training a new model with the Quran corpus, OK!
>>        $ java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -prop myPropsFile.prop -model qmodel -trainFile corpus_stanford
>>
>>        attached files related to this step: training_log.txt,
>>        corpus_stanford, myPropsFile.prop (open class tags used)
>>
>> 2. Tag a file which is a subset of the training corpus (corpus_stanford),
>> OK!
>>        $ java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model qmodel -textFile quran-first50words.in
>>
>>        attached files related to this step: quran-first50words.in
>>
>> 3. Tag another Arabic text (bbc.in, Modern Standard Arabic), which is a
>> different variety from Quranic Arabic but still Arabic text. Apart from
>> words that probably do not occur in the training file, bbc.in has no
>> diacritics, which should not matter. ERROR!
>>        $ java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model qmodel -textFile bbc.in
>>
>>        attached files related to this step: bbc.in, tagging_log.txt
>>
>> 4. FYI, the Arabic model that comes with the package can be used for tagging
>> bbc.in.
>>
>> 5. Using a formatted training file in step 1) with several words per line
>> does not make a difference in step 3).
>>
>> See what conclusions you can reach from this.
>>
>> Thank you.
>>
>> Best Regards
>> H
>>
>> Den 2011-04-03 21:51:59 skrev John Bauer <horatio at gmail.com>:
>>
>>> Hi Hajder,
>>>
>>> Are you using the same training set as before?  If so, did you update
>>> it the way we talked about?
>>>
>>> Did you specify open or closed tags?
>>>
>>> John
>>>
>>> On Sun, Apr 3, 2011 at 10:41 AM, Hajder <hajderr at gmail.com> wrote:
>>>>
>>>> Hello again
>>>>
>>>> I reported in a previous e-mail the successful tagging of the first
>>>> 50 words from the Quran; that file is attached. Now I just tried to extend
>>>> this to more words and other Arabic texts, but I get that array index out
>>>> of bounds exception... *strange*.
>>>>
>>>> It ONLY tags the words from the training corpus and nothing else. This is
>>>> not what I expected; at the very least it should assign tags, correct?
>>>>
>>>> I tried just typing some words in a text editor (UTF-8 encoding) and
>>>> copying from an Arabic newspaper (BBC Arabic, attached), but nothing
>>>> changes.
>>>>
>>>> Here's the output, any ideas?
>>>>
>>>> -----------------------------------------------------------
>>>>
>>>> $ java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model quranmodel -textFile bbc.in
>>>>
>>>> Reading POS tagger model from quranmodel ... done [1.2 sec].
>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>>>>       at java.util.ArrayList.get(ArrayList.java:324)
>>>>       at edu.stanford.nlp.util.HashIndex.get(HashIndex.java:95)
>>>>       at edu.stanford.nlp.tagger.maxent.TTags.getTag(TTags.java:207)
>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.setHistory(TestSentence.java:301)
>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.scoresOf(TestSentence.java:649)
>>>>       at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:158)
>>>>       at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
>>>> -----------------------------------------------------------
>>>>
>>>> Thank you in advance.
>>>>
>>>> Best Regards
>>>> Hajder
>>>>
>>>> Den 2011-04-01 11:20:44 skrev Hajder <hajderr at gmail.com>:
>>>>
>>>>> Hello
>>>>>
>>>>> Yes, I've read those papers, thank you. But I wonder if there's any
>>>>> documentation on the source apart from the API documentation at
>>>>>
>>>>> http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/tagger/maxent/package-summary.html
>>>>> ?
>>>>>
>>>>> I want to incorporate or actually get the output from a morphological
>>>>> analyzer for every unknown word in the model so I can try tagging on
>>>>> different texts than the Quran.
>>>>>
>>>>> Best Regards
>>>>> Hajder
>>>>>
>>>>> Den 2011-03-30 19:04:03 skrev John Bauer <horatio at gmail.com>:
>>>>>
>>>>>> Tag/word is clearly wrong... what are the results if you test after
>>>>>> training that way?  They should be horrible.
>>>>>>
>>>>>> I don't know much about the structure of Arabic, but I'm guessing that
>>>>>> like most languages, you break it up into sentences.  Each sentence
>>>>>> should be on its own line, not the whole text on one line.
>>>>>>
>>>>>> A description of how the tagger works is in the papers cited on this
>>>>>> page:
>>>>>>
>>>>>> http://nlp.stanford.edu/software/tagger.shtml
>>>>>>
>>>>>> A brief summary is that there are algorithms where you try to solve
>>>>>> for several unknown variables at once.  You can use the predicted
>>>>>> values for some of the variables to influence the predictions for
>>>>>> other variables.  Depending on the setup of the problem, it's possible
>>>>>> to solve for the best values of all the variables at once.  That's
>>>>>> what we do here, using the values of the tags as the variables.
>>>>>>
>>>>>> John
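
The idea of solving for all the tags at once can be illustrated with a toy dynamic program. The following Python sketch is a generic Viterbi-style search, not the tagger's actual model or code; the function name, the scoring tables, and the three-tag tag set are all made up for illustration.

```python
def best_tag_sequence(words, tags, emit_score, trans_score):
    """Return the highest-scoring tag sequence under additive scores.

    emit_score(word, tag) and trans_score(prev_tag, tag) are caller-supplied
    scores (think log-probabilities); prev_tag is None at the sentence start.
    """
    if not words:
        return []
    # best[i][t] = (score of the best path ending in tag t at position i, backpointer)
    best = [{t: (trans_score(None, t) + emit_score(words[0], t), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0] + trans_score(p, t))
            score = best[i - 1][prev][0] + trans_score(prev, t) + emit_score(words[i], t)
            column[t] = (score, prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def emit(word, tag):
    # Made-up scores: "barks" on its own slightly prefers N over V.
    table = {("the", "DET"): 0.0, ("dog", "N"): 0.0,
             ("barks", "V"): -0.5, ("barks", "N"): -0.4}
    return table.get((word, tag), -5.0)


def trans(prev, tag):
    # Made-up scores: DET -> N -> V is the favoured tag order.
    table = {(None, "DET"): 0.0, ("DET", "N"): 0.0, ("N", "V"): 0.0}
    return table.get((prev, tag), -2.0)


print(best_tag_sequence(["the", "dog", "barks"], ["DET", "N", "V"], emit, trans))
print(best_tag_sequence(["barks"], ["DET", "N", "V"], emit, trans))
```

With these toy scores, "barks" is tagged V when the preceding words are available and N when it stands alone, which is one way to see why a one-word "sentence" gives the search no context to work with.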
>>>>>>
>>>>>> On Wed, Mar 30, 2011 at 8:36 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>
>>>>>>> Hello again.
>>>>>>>
>>>>>>> Update with good news.
>>>>>>>
>>>>>>> You're right, the file I sent you before with the data,
>>>>>>> "qurancorpus12.txt", contained lines that were not in the format
>>>>>>> WORD/TAG (or TAG/WORD if you 'cat' in Bash, bidirectional issues?) but
>>>>>>> actually had some leftovers from the cleaning of the original corpus
>>>>>>> file. I apologize for that.
>>>>>>>
>>>>>>> Anyway, I removed the errors from that file, retrained, and also
>>>>>>> recreated the file with all the words on one line
>>>>>>> (tr '\n' ' ' < corpus.in > corpus.out ), and the tagging for the first
>>>>>>> 50 words seems pretty accurate, ~95%. It didn't make any difference,
>>>>>>> though, compared with having all the words on separate lines; the Arabic
>>>>>>> text does not really have any punctuation as English does.
>>>>>>>
>>>>>>> I generated two models, one from the training data in the form
>>>>>>> arabicword/tag and the other in the form tag/arabicword.
>>>>>>> In Bash the two files look the same if you 'cat' them; if you open them
>>>>>>> in mlterm (a multilingual terminal) or any text editor, the difference is
>>>>>>> apparent. The arabicword/tag model works and the other raises an
>>>>>>> exception. This gives me *some* hint, but how does the tagger read the
>>>>>>> input when it's bidirectional text?
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Hajder
>>>>>>>
>>>>>>>
>>>>>>> Den 2011-03-29 02:01:56 skrev John Bauer <horatio at gmail.com>:
>>>>>>>
>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>
>>>>>>>>> I remember I couldn't train it if both open and closed tags were
>>>>>>>>> set.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It turns out it only lets you specify one of the two.
>>>>>>>>
>>>>>>>> I did some more investigation, and it also turns out it would crash if
>>>>>>>> you told it such-and-such was an open tag but the tag didn't show up
>>>>>>>> anywhere in the training data.  That will be fixed in the next release
>>>>>>>> of the tagger, whenever that is.
>>>>>>>>
>>>>>>>> However, open/closed makes no difference in your case, I believe.  It
>>>>>>>> doesn't matter whether you specify open or closed; it just fills in
>>>>>>>> the missing class with the other tags.  Performance in your particular
>>>>>>>> case will be much better if you fix the missing tags, i.e. the lines in
>>>>>>>> your training data where the exception comes up, and then rearrange
>>>>>>>> the training data so the sentences are correct.  One word per line
>>>>>>>> makes it think you have one sentence per line, and then it won't have
>>>>>>>> any context from surrounding words or tags.
>>>>>>>>
>>>>>>>> John
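
The crash described above, where a tag is declared as open but never appears in the training data, can be caught with a quick pre-training check. This is a minimal Python sketch with hypothetical helper names; it assumes the training file holds space-separated `word/tag` tokens with `/` as the separator, and the sample tokens are placeholders rather than real corpus data.

```python
def tags_in_training(lines, separator="/"):
    """Collect the set of tags from lines of space-separated word/tag tokens."""
    seen = set()
    for line in lines:
        for token in line.split():
            # Split on the last separator so words containing '/' still parse.
            word, sep, tag = token.rpartition(separator)
            if sep and tag:
                seen.add(tag)
    return seen


def unseen_declared_tags(declared, lines, separator="/"):
    """Return declared tags (e.g. an openClassTags value) absent from the data."""
    return sorted(set(declared.split()) - tags_in_training(lines, separator))


training = ["w1/N w2/ADJ", "w3/V"]  # placeholder tokens
print(unseen_declared_tags("N PN ADJ IMPN T V", training))
```

Any tag the sketch reports is a candidate either for removal from the declared list or for a closer look at whether the training data is missing examples.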
>>>>>>>>
>>>>>>>> On Mon, Mar 28, 2011 at 12:53 PM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi John
>>>>>>>>>
>>>>>>>>> Ok thank you.
>>>>>>>>>
>>>>>>>>> Regarding
>>>>>>>>>>
>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>
>>>>>>>>> I remember I couldn't train it if both open and closed tags were
>>>>>>>>> set.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> H
>>>>>>>>>
>>>>>>>>>  Den 2011-03-26 19:59:28 skrev John Bauer <horatio at gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi Hajder,
>>>>>>>>>>
>>>>>>>>>> I believe the exception you found when training prevented the
>>>>>>>>>> tagger from using any more of the data for training.
>>>>>>>>>>
>>>>>>>>>> Also, you probably want to have multiple words on one line.  The way
>>>>>>>>>> it is set up now, you have only one word per sentence, which
>>>>>>>>>> contributes to the low accuracy.
>>>>>>>>>>
>>>>>>>>>> That should get you started on improving the accuracy.  In the
>>>>>>>>>> meantime, I will try to figure out why there is an exception occurring
>>>>>>>>>> here.  I don't know what the expected behavior is if there is a set of
>>>>>>>>>> open tags specified, but no set of closed tags.
>>>>>>>>>>
>>>>>>>>>> John
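
The advice to put several words on one line amounts to regrouping a one-token-per-line corpus into one line per sentence. The Python sketch below shows one way to do that, assuming each token comes with some sentence identifier (for example a verse reference); that identifier, the function name, and the sample tokens are assumptions for illustration, not something the thread specifies.

```python
from itertools import groupby


def tokens_to_sentence_lines(records):
    """Join consecutive tokens that share a sentence id into one line each.

    records: iterable of (sentence_id, "word/tag") pairs in corpus order.
    """
    lines = []
    for _, group in groupby(records, key=lambda record: record[0]):
        lines.append(" ".join(token for _, token in group))
    return lines


records = [("1:1", "w1/N"), ("1:1", "w2/ADJ"), ("1:2", "w3/V")]  # placeholders
for line in tokens_to_sentence_lines(records):
    print(line)
```

Grouping by a sentence identifier keeps sentence boundaries intact, whereas joining every token onto a single line would present the whole text to the tagger as one sentence.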
>>>>>>>>>>
>>>>>>>>>> On Sat, Mar 26, 2011 at 2:42 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> You mean the corpus with WORD/TAG? I remember having that problem when
>>>>>>>>>>> *training* the tagger, causing an exception. But that
>>>>>>>>>>> ArrayIndexOutOfBounds occurs when *testing*. Anyway, I will attach my
>>>>>>>>>>> props file.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards
>>>>>>>>>>> Hajder
>>>>>>>>>>>
>>>>>>>>>>> Den 2011-03-26 08:22:44 skrev John Bauer <horatio at gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> By the way, some of the lines in the data file you sent me don't have
>>>>>>>>>>>> the tag separator.  It is throwing an exception for me when I run
>>>>>>>>>>>> that, and the result is it stops reading the text from the point it
>>>>>>>>>>>> hits the exception.  The first thing I did was to go through and edit
>>>>>>>>>>>> those lines as best as I could to give them the proper tag separator
>>>>>>>>>>>> and tags.
>>>>>>>>>>>>
>>>>>>>>>>>> John
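
Lines without a tag separator can be located before training rather than by waiting for the exception. The following Python sketch uses a hypothetical helper and assumes `/` is the separator, as in the WORD/TAG format mentioned in this thread; the sample lines are placeholders.

```python
def find_bad_tokens(lines, separator="/"):
    """Return (line_number, token) pairs for tokens lacking a word, separator, or tag."""
    bad = []
    for number, line in enumerate(lines, start=1):
        for token in line.split():
            # rpartition yields empty word and separator when no separator is present.
            word, sep, tag = token.rpartition(separator)
            if not (word and sep and tag):
                bad.append((number, token))
    return bad


sample = ["w1/N", "w2", "w3/ V"]  # placeholder lines: the 2nd and 3rd are malformed
for number, token in find_bad_tokens(sample):
    print(number, token)
```

The reported line numbers point at the entries to repair by hand, which matches the manual clean-up described above.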
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Mar 26, 2011 at 1:00 AM, John Bauer <horatio at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the delay, caught up in other things.  What did you use as
>>>>>>>>>>>>> properties when training the tagger?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 25, 2011 at 3:25 AM, Hajder <hajderr at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wonder if you've had the time to look at my problem?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Den 2011-03-21 22:16:57 skrev Hajder <hajderr at gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Attached is the tagged Quran corpus (it's open-domain, basically) and
>>>>>>>>>>>>>>> the first 50 lines of the Quran, which are used for testing.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now I'm not sure how it's handling bidirectional text when executing
>>>>>>>>>>>>>>> from the terminal... perhaps the problem lies there.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Den 2011-03-21 19:17:48 skrev John Bauer <horatio at gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you for the crash report...  would you send me (not the whole
>>>>>>>>>>>>>>>> list) a copy of the data used to train, or perhaps a link to the data
>>>>>>>>>>>>>>>> used to train?  It will make it much easier to reproduce the crash,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>> On Mar 21, 2011 10:15 AM, "Hajder" <hajderr at gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I've trained the tagger but when I try to tag a sample text I get
>>>>>>>>>>>>>>>>> the following
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> ....etc....
>>>>>>>>>>>>>>>>> Reading POS tagger model from qmodelOpen.model ... done [1.3 sec].
>>>>>>>>>>>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>>>>>>>>>>>>>>>>>       at java.util.ArrayList.get(ArrayList.java:324)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.util.HashIndex.get(HashIndex.java:95)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.TTags.getTag(TTags.java:207)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.setHistory(TestSentence.java:301)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.scoresOf(TestSentence.java:649)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:158)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
>>>>>>>>>>>>>>>>>       at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now this only occurs when I use openClassTags; if I comment that out
>>>>>>>>>>>>>>>>> in my .prop file when training and use closedClassTags instead, it
>>>>>>>>>>>>>>>>> works. But then many words are tagged incorrectly, so I'm not sure
>>>>>>>>>>>>>>>>> whether getting it to work with openClassTags would make it better.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Anyway...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> These are my tag settings which I switch between:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #openClassTags = "N PN ADJ IMPN T V"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #closedClassTags = "PRON DEM REL P EMPH IMPV PRP CONJ SUB ACC AMD ANS AVR CAUS CERT CIRC COM COND EQ EXH EXL EXP FUT INC INT INTG NEG PREV PRO REM RES RET RSLT SUP SUR VOC INL"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (from http://corpus.quran.com/documentation/tagset.jsp )
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And the following command is used when testing:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> java -classpath stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model qmodelOpen.model -textFile quran-first50.txt
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you !
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>> H
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> java-nlp-user mailing list
>>>>>>>>>>>>>>>>> java-nlp-user at lists.stanford.edu
>>>>>>>>>>>>>>>>> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user