[java-nlp-user] Tagging Arabic

Christopher Manning manning at stanford.edu
Wed May 4 09:05:47 PDT 2011


On Apr 19, 2011, at 2:05 PM, Hajder wrote:

> Hi John and the list.
> 
> I've reread the papers and inspected the code in order to understand how I
> can integrate a morphological analyzer. As I see it, the papers are on a
> higher level and describe the features and how the MaxEnt model works. In
> the source code I've only found Morphology.java (as I mentioned before)
> and that is not being used even - not sure how it would be either!
> 
> In MaxentTagger and TestSentence.java, tagSentence() returns an int[] that's
> supposed to be the "tags", but I cannot make sense of how it maps those
> integers to actual tags; maybe this is something specified in the model. If I
> were to feed in the output from a MA, I understand the tagger would run
> Viterbi over the sentence's tags...but how would I actually do it?

Hi Hajder,

Yes, the papers are higher level, but they do describe in some detail how unknown words are handled in terms of things like character prefix and suffix features and contextual features.

I agree that there is a gap between that level of description and the actual code.

Unfortunately, I don't think that you can really expect to get an understanding of how the code works by successive questions to the mailing list.  It seems like you'd need to spend time learning about Java, learning how feature-based discriminative taggers work in general, and staring at our code in particular.  The kind of thing that you are imagining doing would require significant code extensions, and hence a significant technical understanding of how things work at both a conceptual and an implementation level, which you don't seem to have at present.
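
That said, if the immediate goal is just to run the existing tagger from Java rather than to modify it, you don't need to interpret TestSentence's internal int[] at all: the public MaxentTagger API returns tagged words directly. Here is a minimal sketch (constructor and method names as in recent releases, and the model path is just a placeholder, so check the Javadocs for the version you have):

    import java.util.ArrayList;
    import java.util.List;

    import edu.stanford.nlp.ling.HasWord;
    import edu.stanford.nlp.ling.TaggedWord;
    import edu.stanford.nlp.ling.Word;
    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class TagOneSentence {
      public static void main(String[] args) throws Exception {
        // NOTE: constructor/method names per current Javadocs; older releases may differ.
        // Load a trained model; "arabic-train.tagger" is a placeholder path.
        MaxentTagger tagger = new MaxentTagger("arabic-train.tagger");

        // Build one pre-tokenized sentence as a List<HasWord>.
        List<HasWord> sentence = new ArrayList<HasWord>();
        for (String token : new String[] {"token1", "token2", "token3"}) {
          sentence.add(new Word(token));
        }

        // tagSentence returns TaggedWord objects, so the mapping from the
        // model's internal tag indices back to tag strings is done for you.
        List<TaggedWord> tagged = tagger.tagSentence(sentence);
        for (TaggedWord tw : tagged) {
          System.out.println(tw.word() + "/" + tw.tag());
        }
      }
    }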

> 
> Obviously the idea is to find those tags with the lowest score, take the corresponding
> word, input it to the MA, and rerun the tagger with the new tags from the MA.
> 
> I've also read Ratnaparkhi (1996), who claims that his model performs well on
> unseen text, which is obviously the ultimate goal of a tagger.
> How come the Stanford tagger cannot do this with the current
> model, which is also based on it?

Performance depends on the features and how well they generalize to previously unseen words.  E.g., prefix and suffix features work well for many languages, certainly English.  However, if you think about it, I think it is fairly clear that they won't work effectively if you train on diacritized Arabic and test on undiacritized Arabic, precisely because all the short vowels that show so much of the inflectional morphology are lost.  E.g., in the training data, nominative nouns will be found ending in -i, but at test time they never will :-(.  

That's never going to work; you'd either need to train on undiacritized Arabic, if that's what will be provided at runtime, or to define new feature extractors that can treat words as equivalent regardless of the presence or absence of diacritic characters.
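
To make the second option concrete, the normalization itself is simple; below is a minimal standalone sketch (this is just the idea, not the tagger's actual Extractor API) that strips the Arabic short-vowel and tanween marks so that diacritized and undiacritized forms of a word present the same surface string to the features:

    // Standalone illustration only; not part of the tagger's code base.
    public class StripArabicDiacritics {
      /** Remove harakat (U+064B..U+0652), superscript alef (U+0670) and
       *  tatweel (U+0640) from a token; everything else is kept as-is. */
      public static String strip(String token) {
        StringBuilder sb = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
          char c = token.charAt(i);
          boolean haraka = c >= '\u064B' && c <= '\u0652';
          if (!haraka && c != '\u0670' && c != '\u0640') {
            sb.append(c);
          }
        }
        return sb.toString();
      }

      public static void main(String[] args) {
        // kitabun ("a book") with short vowels -> the same word without them
        System.out.println(strip("\u0643\u0650\u062A\u064E\u0627\u0628\u064C"));
      }
    }

Whether to also strip the shadda or normalize alef/hamza variants is a further design choice.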

Chris.



> 
> Forgive me if I've missed something fundamental and I appreciate any advice on this.
> 
> Best Regards
> Hajder
> 
> Den 2011-04-07 19:56:13 skrev John Bauer <horatio at gmail.com>:
> 
>> If the lexicons are different to the point that most words are unknown, the
>> tagger is going to get most things wrong.  That sounds like the situation
>> you are describing.
>> 
>> For a deeper explanation of how it works, let me again refer you to the
>> papers on the tagger website,
>> 
>> John
>> On Apr 7, 2011 2:37 AM, "Hajder" <hajderr at gmail.com> wrote:
>>> Hey again.
>>> 
>>> Thanks John for your reply. I understand the features you mentioned, but
>>> how does the actual tagger do it for
>>> unseen words, i.e. words that are not in the training corpus?
>>> 
>>> For example, the Penn ATB is Modern Standard Arabic (MSA), while if I try to
>>> tag, say, Quranic Arabic - which is classical, yet the same script - a
>>> sample extract with the first 50 words tags the majority of the words as NNP
>>> (the most frequent tag?).
>>>
>>> So there's no analysis of the word, as I understand?
>>> 
>>> Thank you
>>> 
>>> Best Regards
>>> H
>>> 
>>> Den 2011-04-05 17:35:02 skrev John Bauer <horatio at gmail.com>:
>>> 
>>>> You can imagine some features that would let you tag an unknown word:
>>>> 
>>>> 1) Surrounding tags, eg guess what the missing tag is in NNP ? NNP
>>>> 2) Word shape, eg capitalization in English, not sure what similar
>>>> features there are in Arabic
>>>> 3) Specific surrounding words such as "to" will often have a verb next
>>>> 
>>>> etc etc
>>>> 
>>>> John
>>>> 
>>>> On Tue, Apr 5, 2011 at 9:24 AM, Hajder <hajderr at gmail.com> wrote:
>>>>> Hello
>>>>> 
>>>>>> As far as I know, we don't use morphology at all in the tagger.
>>>>> 
>>>>> Then I have some questions about how the tagger tags unseen words. I'm
>>>>> planning to do the morphological analysis part by integrating an analyzer
>>>>> with the tagger, I just need a lot of background info :).
>>>>>
>>>>> Can you please confirm that in training with a tagged
>>>>> corpus, the language model simply learns each word in the training
>>>>> corpus (and the tag it appears with)? So then in tagging new, unseen
>>>>> text, it can only tag words it has seen in the training corpus?
>>>>> But then, how does it deal with unseen words? If I'm wrong on this,
>>>>> there must still be SOME method in the Stanford
>>>>> Tagger for dealing with unknown words - right?
>>>>> 
>>>>> 
>>>>> For a morphologically rich language like Arabic, I would expect a LOT of
>>>>> "unseen" words in a new unseen text.
>>>>> 
>>>>> Best regards
>>>>> Hajder
>>>>> 
>>>>> Den 2011-04-01 19:34:44 skrev John Bauer <horatio at gmail.com>:
>>>>> 
>>>>>> As far as I know, we don't use morphology at all in the tagger. For
>>>>>> example, you can see there is the following comment in the tagger
>>>>>> which no one has ever followed up on:
>>>>>> 
>>>>>> // TODO: Add a flag to lemmatize words (Morphology class) on output of
>>>>>> tagging
>>>>>> 
>>>>>> Sorry, but the best documentation we have for the code is the Javadocs,
>>>>>> 
>>>>>> John
>>>>>> 
>>>>>> On Fri, Apr 1, 2011 at 10:11 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi again
>>>>>>> 
>>>>>>> Following the previous email, I'd like to be more specific and ask
>>>>>>> how the Arabic model has been generated. Would it be
>>>>>>> possible to integrate another morphological analyzer written in Java?
>>>>>>> The only file in the source code pertaining to this was Morphology.java,
>>>>>>> which implements a morphological analyzer for English.
>>>>>>> 
>>>>>>> Thank you in advance.
>>>>>>> 
>>>>>>> Best Regards
>>>>>>> Hajder
>>>>>>> 
>>>>>>> 
>>>>>>> Den 2011-04-01 11:20:44 skrev Hajder <hajderr at gmail.com>:
>>>>>>> 
>>>>>>>> Hello
>>>>>>>> 
>>>>>>>> Yes I've read those papers, thank you. But I wonder if there's any
>>>>>>>> doc on the source apart from the API on
>>>>>>>> http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/tagger/maxent/package-summary.html ?
>>>>>>>> 
>>>>>>>> I want to incorporate, or actually just get the output from, a
>>>>>>>> morphological analyzer for every unknown word in the model, so I can
>>>>>>>> try tagging texts other than the Quran.
>>>>>>>> 
>>>>>>>> Best Regards
>>>>>>>> Hajder
>>>>>>>> 
>>>>>>>> Den 2011-03-30 19:04:03 skrev John Bauer <horatio at gmail.com>:
>>>>>>>> 
>>>>>>>>> Tag/word is clearly wrong... what are the results if you test after
>>>>>>>>> training that way? They should be horrible.
>>>>>>>>> 
>>>>>>>>> I don't know much about the structure of Arabic, but I'm guessing
>>>>>>>>> that
>>>>>>>>> like most languages, you break it up into sentences. Each sentence
>>>>>>>>> should be on its own line, not the whole text on one line.
>>>>>>>>> 
>>>>>>>>> A description of how the tagger works is in the papers cited on this
>>>>>>>>> page:
>>>>>>>>> 
>>>>>>>>> http://nlp.stanford.edu/software/tagger.shtml
>>>>>>>>> 
>>>>>>>>> A brief summary is that there are algorithms where you try to solve
>>>>>>>>> for several unknown variables at once. You can use the predicted
>>>>>>>>> values for some of the variables to influence the predictions for
>>>>>>>>> other variables. Depending on the setup of the problem, it's
>>>>>>>>> possible
>>>>>>>>> to solve for the best values of all the variables at once. That's
>>>>>>>>> what we do here, using the values of the tags as the variables.
>>>>>>>>> 
>>>>>>>>> John
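
To connect John's summary to the code: "solve for the best values of all the variables at once" is exact best-sequence (Viterbi-style) decoding over the tag variables, which is what the ExactBestSequenceFinder in the stack trace further down is doing. Here is a toy sketch of that idea, with a made-up score table standing in for the tagger's real feature-based scores:

    import java.util.Arrays;

    // Toy sketch of joint inference over a tag sequence: given scores that
    // depend on the previous tag, pick the single best assignment of all
    // tags at once (first-order Viterbi). Not the tagger's actual code.
    public class ViterbiSketch {

      // scores[pos][prev][cur]: made-up numbers purely for illustration.
      static int[] bestTagSequence(double[][][] scores, int numTags) {
        int n = scores.length;
        double[][] best = new double[n][numTags];
        int[][] backptr = new int[n][numTags];
        for (double[] row : best) Arrays.fill(row, Double.NEGATIVE_INFINITY);

        for (int t = 0; t < numTags; t++) {
          best[0][t] = scores[0][0][t];   // dummy "previous tag" 0 at position 0
        }
        for (int i = 1; i < n; i++) {
          for (int cur = 0; cur < numTags; cur++) {
            for (int prev = 0; prev < numTags; prev++) {
              double s = best[i - 1][prev] + scores[i][prev][cur];
              if (s > best[i][cur]) {
                best[i][cur] = s;
                backptr[i][cur] = prev;
              }
            }
          }
        }
        // Follow back-pointers from the best final tag to recover the sequence.
        int[] tags = new int[n];
        int last = 0;
        for (int t = 1; t < numTags; t++) {
          if (best[n - 1][t] > best[n - 1][last]) last = t;
        }
        tags[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
          tags[i - 1] = backptr[i][tags[i]];
        }
        return tags;
      }

      public static void main(String[] args) {
        int n = 3, numTags = 2;
        double[][][] scores = new double[n][numTags][numTags];
        scores[0][0][1] = 2.0;   // prefer tag 1 at position 0
        scores[1][1][0] = 2.0;   // prefer tag 0 after tag 1
        scores[2][0][1] = 2.0;   // prefer tag 1 after tag 0
        System.out.println(Arrays.toString(bestTagSequence(scores, numTags)));  // [1, 0, 1]
      }
    }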
>>>>>>>>> 
>>>>>>>>> On Wed, Mar 30, 2011 at 8:36 AM, Hajder <hajderr at gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hello again.
>>>>>>>>>> 
>>>>>>>>>> Update with good news.
>>>>>>>>>> 
>>>>>>>>>> You're right, the file I sent you before with the data
>>>>>>>>>> "qurancorpus12.txt" contained lines that were not in the format
>>>>>>>>>> WORD/TAG (or TAG/WORD if you 'cat' it in Bash - bidirectional
>>>>>>>>>> issues?) but actually had some leftovers from the cleaning of the
>>>>>>>>>> original corpus file; I apologize for that.
>>>>>>>>>>
>>>>>>>>>> Anyway, I removed the errors from that file, retrained, and also
>>>>>>>>>> recreated the file with all the words on one line
>>>>>>>>>> (tr '\n' ' ' < corpus.in > corpus.out), and the tagging for the first
>>>>>>>>>> 50 words seems pretty accurate, ~95%. It didn't make any difference
>>>>>>>>>> though having all the words on separate lines; the Arabic text does
>>>>>>>>>> not really have any punctuation as in English.
>>>>>>>>>> 
>>>>>>>>>> I generated two models, one from training data in the form
>>>>>>>>>> arabicword/tag and the other in the form tag/arabicword.
>>>>>>>>>> Now in Bash the two files look the same if you 'cat' them; if you
>>>>>>>>>> open them up in mlterm - a multilingual terminal - or any text
>>>>>>>>>> editor, the difference is apparent. The arabicword/tag form works
>>>>>>>>>> and the other raises an exception. This gives me *some* hint, but
>>>>>>>>>> I'd like to know how the tagger reads the input when it's
>>>>>>>>>> bidirectional text?
>>>>>>>>>> 
>>>>>>>>>> Best Regards
>>>>>>>>>> Hajder
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Den 2011-03-29 02:01:56 skrev John Bauer <horatio at gmail.com>:
>>>>>>>>>> 
>>>>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>>>> 
>>>>>>>>>>>> I remember I couldn't train it if both open and closed tags were
>>>>>>>>>>>> set.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> It turns out it only lets you specify one of the two.
>>>>>>>>>>> 
>>>>>>>>>>> I did some more investigation, and it also turns out it would crash
>>>>>>>>>>> if you told it such-and-such was an open tag but the tag didn't show
>>>>>>>>>>> up in the training data anywhere. That will be fixed in the next
>>>>>>>>>>> release of the tagger, whenever that is.
>>>>>>>>>>>
>>>>>>>>>>> However, open/closed makes no difference in your case, I believe.
>>>>>>>>>>> It doesn't matter whether you specify open or closed, it just fills
>>>>>>>>>>> in the missing class with the other tags. Performance in your
>>>>>>>>>>> particular case will be much better if you fix the missing tags,
>>>>>>>>>>> i.e. the lines in your training data where the exception comes up,
>>>>>>>>>>> and then rearrange the training data so the sentences are correct.
>>>>>>>>>>> One word per line makes it think you have one sentence per line,
>>>>>>>>>>> and then it won't have any context for features that use
>>>>>>>>>>> surrounding words or tags,
>>>>>>>>>>> 
>>>>>>>>>>> John
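
(To spell out the format John is describing: the training file should have one tokenized sentence per line, with each token joined to its tag by whatever tagSeparator the props file specifies, along the lines of

    word1/N word2/P word3/PN word4/ADJ

where the words and tags above are invented purely to show the shape of a line.)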
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Mar 28, 2011 at 12:53 PM, Hajder <hajderr at gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi John
>>>>>>>>>>>> 
>>>>>>>>>>>> Ok thank you.
>>>>>>>>>>>> 
>>>>>>>>>>>> Regarding
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don't know what the expected behavior is if there is a set of
>>>>>>>>>>>>> open tags specified, but no set of closed tags
>>>>>>>>>>>> 
>>>>>>>>>>>> I remember I couldn't train it if both open and closed tags were
>>>>>>>>>>>> set.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best Regards
>>>>>>>>>>>> H
>>>>>>>>>>>> 
>>>>>>>>>>>> Den 2011-03-26 19:59:28 skrev John Bauer <horatio at gmail.com>:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Hajder,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I believe the exception you found when training on the file
>>>>>>>>>>>>> prevented the tagger from using any more of the data for training.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, you probably want to have multiple words on one line. The
>>>>>>>>>>>>> way it is set up now, you are only doing one word per sentence,
>>>>>>>>>>>>> which contributes to the low accuracy.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> That should get you started on improving the accuracy. In the
>>>>>>>>>>>>> meantime, I will try to figure out why there is an exception
>>>>>>>>>>>>> occurring here. I don't know what the expected behavior is if
>>>>>>>>>>>>> there is a set of open tags specified, but no set of closed tags,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> John
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Mar 26, 2011 at 2:42 AM, Hajder <hajderr at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You mean the corpus with WORD/TAG? I remember having that
>>>>>>>>>>>>>> problem when *training* the tagger, causing an exception. But
>>>>>>>>>>>>>> that ArrayIndexOutOfBounds occurs when *testing*. Anyway, I will
>>>>>>>>>>>>>> attach my props file.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Den 2011-03-26 08:22:44 skrev John Bauer <horatio at gmail.com>:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> By the way, some of the lines in the data file you sent me
>>>>>>>>>>>>>>> don't have the tag separator. It is throwing an exception for
>>>>>>>>>>>>>>> me when I run that, and the result is it stops reading the text
>>>>>>>>>>>>>>> from the point it hits the exception. The first thing I did was
>>>>>>>>>>>>>>> to go through and edit those lines as best as I could to give
>>>>>>>>>>>>>>> them the proper tag separator and tags.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Sat, Mar 26, 2011 at 1:00 AM, John Bauer
>>>>>>>>>>>>>>> <horatio at gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Sorry for the delay, caught up in other things. What did you
>>>>>>>>>>>>>>>> use as properties when training the tagger?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Mar 25, 2011 at 3:25 AM, Hajder <hajderr at gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hello again.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I wonder if you've had the time to look at my problem?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Den 2011-03-21 22:16:57 skrev Hajder <hajderr at gmail.com>:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Attached is the tagged Quran corpus (it's open-domain,
>>>>>>>>>>>>>>>>>> basically) and the first 50 lines of the Quran, which are
>>>>>>>>>>>>>>>>>> used for testing.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now I'm not sure how it's handling bidirectional text when
>>>>>>>>>>>>>>>>>> executing from the terminal...perhaps the problem lies there.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>> Hajder
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Den 2011-03-21 19:17:48 skrev John Bauer
>>>>>>>>>>>>>>>>>> <horatio at gmail.com>:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thank you for the crash report... would you send me (not the
>>>>>>>>>>>>>>>>>>> whole list) a copy of the data used to train, or perhaps a
>>>>>>>>>>>>>>>>>>> link to the data used to train? It will make it much easier
>>>>>>>>>>>>>>>>>>> to reproduce the crash,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>>> On Mar 21, 2011 10:15 AM, "Hajder" <hajderr at gmail.com>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Hello
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> I've trained the tagger but when I try to tag a sample text I get the following
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>> ....etc....
>>>>>>>>>>>>>>>>>>>> Reading POS tagger model from qmodelOpen.model ... done [1.3 sec].
>>>>>>>>>>>>>>>>>>>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
>>>>>>>>>>>>>>>>>>>>         at java.util.ArrayList.get(ArrayList.java:324)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.util.HashIndex.get(HashIndex.java:95)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.TTags.getTag(TTags.java:207)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.TestSentence.setHistory(TestSentence.java:301)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.TestSentence.scoresOf(TestSentence.java:649)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequenceNew(ExactBestSequenceFinder.java:158)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.sequences.ExactBestSequenceFinder.bestSequence(ExactBestSequenceFinder.java:98)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.TestSentence.runTagInference(TestSentence.java:277)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.TestSentence.testTagInference(TestSentence.java:258)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.TestSentence.tagSentence(TestSentence.java:110)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.tagSentence(MaxentTagger.java:825)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1319)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1225)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.runTagger(MaxentTagger.java:1183)
>>>>>>>>>>>>>>>>>>>>         at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1358)
>>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Now this only occurs when I use openClassTags; if I comment that out in
>>>>>>>>>>>>>>>>>>>> my .prop when training and use the closedClassTags instead, it works. But
>>>>>>>>>>>>>>>>>>>> then many words are tagged incorrectly, so I'm not sure if getting it to
>>>>>>>>>>>>>>>>>>>> work with openClassTags would make it better.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Anyway...
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> These are my tag settings which I switch between:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> #openClassTags = "N PN ADJ IMPN T V"
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> #closedClassTags = "PRON DEM REL P EMPH IMPV PRP CONJ SUB ACC AMD ANS AVR
>>>>>>>>>>>>>>>>>>>> CAUS CERT CIRC COM COND EQ EXH EXL EXP FUT INC INT INTG NEG PREV PRO REM
>>>>>>>>>>>>>>>>>>>> RES RET RSLT SUP SUR VOC INL"
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> (from http://corpus.quran.com/documentation/tagset.jsp )
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> And the following command is used when testing:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> java -classpath stanford-postagger.jar
>>>>>>>>>>>>>>>>>>>> edu.stanford.nlp.tagger.maxent.MaxentTagger -model qmodelOpen.model
>>>>>>>>>>>>>>>>>>>> -textFile quran-first50.txt
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Thank you !
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Best Regards
>>>>>>>>>>>>>>>>>>>> H
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>> java-nlp-user mailing list
>>>>>>>>>>>>>>>>>>>> java-nlp-user at lists.stanford.edu
>>>>>>>>>>>>>>>>>>>> 
>> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user
> _______________________________________________
> java-nlp-user mailing list
> java-nlp-user at lists.stanford.edu
> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user



