[java-nlp-user] Tagging imperatives

John Bauer horatio at gmail.com
Fri Apr 8 02:32:14 PDT 2011


In case this never got followed up on...  generally this will be a
training data issue.  Probably "report" was more common as a noun in
the training data, and "update" had more weight as a verb.

Fixing this almost certainly involves using different or additional training data.

John
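
Before collecting more training data, it can help to see which imperative verbs are actually affected. The following is a minimal sketch, assuming the tagger output is saved one sentence per line in the word_TAG form quoted below; the helper names `split_token` and `mistagged_imperatives` are hypothetical and not part of the Stanford tagger.

```python
from collections import Counter

def split_token(token, sep="_"):
    """Split 'word_TAG' on the last separator, so words containing '_' survive."""
    word, _, tag = token.rpartition(sep)
    return word, tag

def mistagged_imperatives(lines, expected="VB"):
    """Count sentence-initial words whose tag differs from the expected one."""
    misses = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        word, tag = split_token(tokens[0])
        if tag != expected:
            misses[(word, tag)] += 1
    return misses

# Sample lines copied from the tagger output quoted in this thread.
tagged = [
    "report_NN subscribed_VBD contacts_NNS ._.",
    "run_NN ._.",
    "update_VB subscribed_VBN contact_NN attribute_NN ._.",
]

misses = mistagged_imperatives(tagged)
for (word, tag), count in sorted(misses.items()):
    print(f"{word}: tagged {tag} ({count}x), expected VB")
```

The words that show up most often in this report are the ones most worth covering in any extra training data.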

On Mon, May 24, 2010 at 3:14 AM, Jan-Christian Krause
<Jan-Christian.Krause at akra.de> wrote:
> Hi Chris,
>
> for my tags I used the model "bidirectional-wsj-0-18.tagger". With the model
> "left3words-wsj-0-18.tagger" I get better results, but many sentences like
> this remain:
>
> report_NN subscribed_VBD contacts_NNS ._.
> run_NN ._.
>
> I expected the tokens "run" and "report" to be tagged as VB. My exact
> command to start the tagger is:
>
> [COPY]
> java -mx1g -cp ./stanfordTagger-1.6.0.jar
> edu.stanford.nlp.tagger.maxent.MaxentTagger -model
> ./tagger/left3words-wsj-0-18.tagger -textFile imperatives.txt
> [/COPY]
>
> I do not understand the result of the tagger, because the verb in sentences
> like
>
> [COPY]
> update_VB subscribed_VBN contact_NN attribute_NN ._.
> [/COPY]
>
> is tagged correctly. Could you give me a hint about this? Is there a
> way to improve the result?
>
> Greetings from Hamburg, Germany
>
> Jan Christian
>
>
> -----Original Message-----
> From: Christopher Manning [mailto:manning at stanford.edu]
> Sent: Wed 23.12.2009 07:37
> To: Jan-Christian Krause
> Cc: java-nlp-user at lists.stanford.edu
> Subject: Re: [java-nlp-user] Tagging imperatives
>
> On Dec 10, 2009, at 2:03 AM, Jan-Christian Krause wrote:
>
>> Dear all,
>>
>> I use the Stanford Tagger to tag a lot of very short imperative
>> sentences like "Invite buddy" or "Ignore buddy" (all sentences are in
>> English). The tagger seems to have trouble identifying the verbs in
>> those sentences (with all three English models). The output looked like
>> this:
>>
>> [COPY]
>> invite \NNP buddy \NNP
>> ignore \VB buddy \NNP
>> withdraw \VB request \NNP
>> delete \NNB buddy \NNP
>> accept \NNB buddy \NNP
>> deny \NNP buddy \NNP
>> [/COPY]
>>
>> I expected something like this:
>> [COPY]
>> invite \VBP buddy \NNP
>> ignore \VBP buddy \NNP
>> withdraw \VBP request \NNP
>> delete \VBP buddy \NNP
>> accept \VBP buddy \NNP
>> deny \VBP buddy \NNP
>> [/COPY]
>
> Dear Jan-Christian,
>
> According to the Penn Treebank tagset, the correct output should actually be
> with tag VB, since imperatives don't count as a finite verb form (cf. "Be
> good!").
>
> But for the main issue: I'm not sure what you're doing to get this output.
> While the tagger is trained on newswire style text and can be expected to
> perform less well on lowercased text on topics like this, I just don't get
> this behavior.  For this input file, I get just what I would expect:
>
> [manning at jerome wsj3t0-18-bidirectional]$ cat imperatives.txt
> invite buddy.
> withdraw request.
> deny buddy.
> [manning at jerome wsj3t0-18-bidirectional]$ java -mx1g -cp
> /u/nlp/distrib/stanford-postagger-2008-09-28/stanford-postagger.jar
> edu.stanford.nlp.tagger.maxent.MaxentTagger -model
> /u/nlp/distrib/stanford-postagger-2008-09-28/models/left3words-wsj-0-18.tagger
> -textFile imperatives.txt
> Loading default properties from trained tagger
> /u/nlp/distrib/stanford-postagger-2008-09-28/models/left3words-wsj-0-18.tagger
> Reading POS tagger from
> /u/nlp/distrib/stanford-postagger-2008-09-28/models/left3words-wsj-0-18.tagger
> ... done [2.9 sec].
> invite_VB buddy_NN ._.
> withdraw_VB request_NN ._.
> deny_VB buddy_NN ._.
> Tagged 9 words at 78.95 words per second.
>
> If I just put the 6 words in a file, on 3 lines, then the whole sequence is
> treated as a single sentence, so the tagging gets rather irregular. It's
> still not so bad, though, and not what you reported:
>
> [manning at jerome wsj3t0-18-bidirectional]$ cat imperatives2.txt
> invite buddy
> withdraw request
> deny buddy
> [manning at jerome wsj3t0-18-bidirectional]$ java -mx1g -cp
> /u/nlp/distrib/stanford-postagger-2008-09-28/stanford-postagger.jar
> edu.stanford.nlp.tagger.maxent.MaxentTagger -model
> /u/nlp/distrib/stanford-postagger-2008-09-28/models/left3words-wsj-0-18.tagger
> -textFile imperatives2.txt
> Loading default properties from trained tagger
> /u/nlp/distrib/stanford-postagger-2008-09-28/models/left3words-wsj-0-18.tagger
> Reading POS tagger from
> /u/nlp/distrib/stanford-postagger-2008-09-28/models/left3words-wsj-0-18.tagger
> ... done [2.8 sec].
> invite_VB buddy_NN withdraw_VB request_NN deny_VBP buddy_NN
> Tagged 6 words at 41.10 words per second.
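
The difference between the two runs above suggests giving every command its own sentence-final period before tagging. A minimal sketch of that preprocessing step, assuming one command per input line; the function name `one_sentence_per_line` is hypothetical and the script is not part of the tagger distribution.

```python
def one_sentence_per_line(lines, terminator="."):
    """Append a terminator to each non-empty line that lacks end punctuation."""
    sentences = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # drop blank lines
        if text[-1] not in ".!?":
            text += terminator
        sentences.append(text)
    return sentences

# Commands as in imperatives2.txt above; the last one already has a period.
commands = ["invite buddy", "withdraw request", "deny buddy."]
for sentence in one_sentence_per_line(commands):
    print(sentence)
```

Writing the result to a file produces input shaped like imperatives.txt, which was tagged as three separate sentences.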
>
> If you still think there's a problem here, could you send the exact
> command/input you used?
>
> It is certainly the case that there are very few imperatives (of any sort,
> but especially this sort) in the POS tagger training data, and training a
> model with extra appropriate data could well improve performance.
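
Extra data of this kind could be assembled from hand-corrected imperatives. The following is only a sketch: it assumes the training file takes one sentence per line with word_TAG tokens, mirroring the output format shown in this thread. The file layout, the separator, and the helper name `format_sentence` are assumptions to check against the tagger's own documentation.

```python
def format_sentence(pairs, sep="_"):
    """Join (word, tag) pairs into one line of word_TAG tokens."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in pairs)

# Hand-corrected examples; the imperative verb gets VB, following the
# Penn Treebank convention described earlier in this message.
corrected = [
    [("invite", "VB"), ("buddy", "NN"), (".", ".")],
    [("report", "VB"), ("subscribed", "VBN"), ("contacts", "NNS"), (".", ".")],
    [("run", "VB"), (".", ".")],
]

for sentence in corrected:
    print(format_sentence(sentence))
```

Keeping the corrections as (word, tag) pairs makes it easy to switch separators if the tagger expects a different one.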
>
> Best,
>
> Chris.
>
>
>
>>
>> Could you give me a tip on how to improve the output of the tagger? I
>> thought of tagging some samples and training my own model. Where can I
>> find more information on how to create my own model?
>>
>> Greetings from Hamburg, Germany
>>
>> Jan Christian
>> _______________________________________________
>> java-nlp-user mailing list
>> java-nlp-user at lists.stanford.edu
>> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user
>
