[java-nlp-user] CoreNLP vs Stanford Parser
horatio at gmail.com
Fri Aug 19 18:18:08 PDT 2011
On Fri, Aug 19, 2011 at 2:26 PM, Sandro <ord_nas at live.com> wrote:
> Hello all,
> I have a few questions about using the Stanford CoreNLP vs the Stanford
> 1. I have noticed differences between the parse trees that the CoreNLP
> generates and that the Online Parser generates. Is this just because they
> are using different parser models? Which model does the Online Parser use,
> and how could I programmatically configure the CoreNLP to use the same
> model? If I have only downloaded the CoreNLP, would I have to download
> another model?
They use the same models, but CoreNLP first POS tags the data, then
parses it. This leads to different results.
> 2. What is the difference between using the POSTaggerAnnotator in the
> CoreNLP and using the POS tags that the Parser generates? I have also
> noticed differences in output on this front.
The tagger is generally better at POS tagging than the parser is,
which may or may not produce better parse trees. However, the parser
was trained with more data than the tagger, so there are a few
situations in which the parser has better training data than the
tagger.
> 3. Finally, I know that I can configure the parser to accept my own POS
> tags. How would I accomplish this using the CoreNLP (from within a Java
> program)? Would I have to add a custom annotator, and if so, how exactly is
> this done? Could someone elaborate on the explanation at
> http://nlp.stanford.edu/software/corenlp.shtml#newannotators , or provide
> sample code?
Programmatically, you would need to turn off the tagger and build the
StanfordCoreNLP with "enforceRequirements" set to false. It also
depends on whether your text is already tokenized and sentence split.
If so, you can skip those steps, too, and just put lists of CoreLabels
in the Annotation object. You would need to call setTag on the
CoreLabels with the tags that you want to use.
If you need to have CoreNLP first tokenize & split sentences, you
probably need to make a new annotator. Which part of the explanation
which means implement annotate(Annotation foo). annotate(Annotation)
would need to get the SentencesAnnotation, and then for each sentence,
get the TokensAnnotation. For each CoreLabel in the TokensAnnotation,
call setTag() to give it the tag you want to use.
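
As a rough sketch of that loop, here is a plain-Java version that runs
without the CoreNLP jar. Token and Sentence are hypothetical stand-ins
for CoreLabel and for a sentence's TokensAnnotation list, and
assignTags plus the positional tag lists are made up for illustration.
In a real annotator the same two nested loops would sit inside
annotate(Annotation) and read the SentencesAnnotation and
TokensAnnotation from the Annotation instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for CoreLabel: a word plus a settable POS tag.
class Token {
    private final String word;
    private String tag;

    Token(String word) { this.word = word; }

    String word() { return word; }
    String tag() { return tag; }
    void setTag(String tag) { this.tag = tag; }
}

// Hypothetical stand-in for one sentence and its token list.
class Sentence {
    private final List<Token> tokens = new ArrayList<>();

    Sentence(String... words) {
        for (String w : words) {
            tokens.add(new Token(w));
        }
    }

    List<Token> tokens() { return tokens; }
}

class GoldTagSketch {
    // For each sentence, for each token, call setTag() with your own tag.
    // Tags are assumed to be aligned with the tokens by position.
    static void assignTags(List<Sentence> sentences, List<List<String>> tags) {
        if (sentences.size() != tags.size()) {
            throw new IllegalArgumentException("sentence count mismatch");
        }
        for (int i = 0; i < sentences.size(); i++) {
            List<Token> tokens = sentences.get(i).tokens();
            List<String> sentenceTags = tags.get(i);
            if (tokens.size() != sentenceTags.size()) {
                throw new IllegalArgumentException("token count mismatch in sentence " + i);
            }
            for (int j = 0; j < tokens.size(); j++) {
                tokens.get(j).setTag(sentenceTags.get(j));
            }
        }
    }

    public static void main(String[] args) {
        List<Sentence> doc = Arrays.asList(new Sentence("The", "dog", "barks"));
        List<List<String>> tags = Arrays.asList(Arrays.asList("DT", "NN", "VBZ"));
        assignTags(doc, tags);

        List<String> tagged = new ArrayList<>();
        for (Token t : doc.get(0).tokens()) {
            tagged.add(t.word() + "/" + t.tag());
        }
        System.out.println(String.join(" ", tagged));
    }
}
```

The length checks are there because tags supplied from outside only
make sense if they line up with the tokenization that was actually
used, which is the main thing to watch when CoreNLP does the
tokenizing for you.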