[java-nlp-user] Replacing named entities in a string before parsing.
myahya at mpi-inf.mpg.de
Fri Apr 1 07:40:00 PDT 2011
Is this consistent across parsing and POS tagging (MaxentTagger)? I
mainly do tagging with MaxentTagger, and supply the tagged list of
words to the parser. Can you please point me to the relevant part of
On Sat, Feb 5, 2011 at 01:24, Christopher Manning <manning at stanford.edu> wrote:
> But you could also replace the whole entity with a String and parse. It basically works. It will be processed by the unknown word handling, so for English, if it's a proper name, you're best off having it start with a capital and the rest be lowercase: Xxxxxxx.
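A minimal sketch of the placeholder substitution suggested above, assuming the entities arrive wrapped in <NE>...</NE> markers as in the original question. The class and method names (NePlaceholder, placeholder, replaceEntities) are hypothetical, and sizing the placeholder by the entity's non-space character count is an illustrative choice, not something the parser requires. The code only prepares the string; it does not call the parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NePlaceholder {
    // Matches one <NE>...</NE> span, as in the example sentence from this thread.
    private static final Pattern NE = Pattern.compile("<NE>(.*?)</NE>");

    /** Builds a single-token placeholder: one capital X followed by lowercase x's. */
    static String placeholder(String entity) {
        // Assumption: length = number of non-space characters in the entity,
        // so the placeholder contains no spaces and stays one token.
        int n = entity.replace(" ", "").length();
        StringBuilder sb = new StringBuilder("X");
        for (int i = 1; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    /** Replaces every tagged entity with its capitalized placeholder. */
    static String replaceEntities(String tagged) {
        Matcher m = NE.matcher(tagged);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement guards against '$' or '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(placeholder(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String tagged = "<NE>Fly Me to the Moon</NE> was composed by <NE>Bart Howard</NE>.";
        System.out.println(replaceEntities(tagged));
    }
}
```

Running main prints the example sentence with each entity collapsed to one capitalized token. To restore the original names in the parse tree afterwards, you would also need to record which placeholder stands for which entity, which this sketch leaves out.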
> On Feb 4, 2011, at 4:10 PM, John Bauer wrote:
>> There is a really hacky interface that accomplishes something along
>> these lines. Look in Test.java in the lexparser package. There is an
>> object, List<Constraint> constraints, which you can set to force a
>> particular span to have a particular structure. For example, if you
>> know that words 3-7 are a noun, you can set a constraint with those as
>> the endpoints (possibly in this example you would need start = 3, end
>> = 8). Then you can set "Pattern state" to be a noun phrase, something
>> such as "NP.*" for example.
>> This requires that you already have tokenized text, which you probably
>> do if you're doing NER, and it requires that you know where the
>> positions you care about are. Also, it's a static variable that you
>> need to set each time through the parser, which is kind of ugly but
>> there it is.
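A rough sketch of how the span endpoints for such constraints could be derived from NE-tagged text. SpanConstraint here is a hypothetical stand-in class, not the parser's own Constraint type, and the zero-based, end-exclusive indexing is an assumption based on the "start = 3, end = 8" remark above. The whitespace tokenization is also a simplification; real indices must come from the same tokenization the parser sees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NeSpans {
    /** Hypothetical stand-in: tokens in [start, end) should get a label matching state. */
    public static final class SpanConstraint {
        public final int start;      // index of the first token in the span (inclusive)
        public final int end;        // index one past the last token (exclusive)
        public final Pattern state;  // label pattern for the span, e.g. "NP.*"

        public SpanConstraint(int start, int end, Pattern state) {
            this.start = start;
            this.end = end;
            this.state = state;
        }
    }

    /**
     * Strips the <NE> markers, appends the remaining tokens to tokensOut,
     * and returns one NP constraint per tagged entity.
     */
    static List<SpanConstraint> constraintsFor(String tagged, List<String> tokensOut) {
        // Pad the markers so they split off as separate pieces.
        String spaced = tagged.replace("<NE>", " <NE> ").replace("</NE>", " </NE> ");
        List<SpanConstraint> result = new ArrayList<>();
        int open = -1; // token index where the current entity started, or -1
        for (String piece : spaced.trim().split("\\s+")) {
            if (piece.equals("<NE>")) {
                open = tokensOut.size();
            } else if (piece.equals("</NE>")) {
                if (open >= 0) {
                    result.add(new SpanConstraint(open, tokensOut.size(), Pattern.compile("NP.*")));
                }
                open = -1;
            } else {
                tokensOut.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>();
        String tagged = "<NE>Fly Me to the Moon</NE> was composed by <NE>Bart Howard</NE>.";
        for (SpanConstraint c : constraintsFor(tagged, tokens)) {
            System.out.println(c.start + " " + c.end + " " + c.state.pattern());
        }
    }
}
```

The resulting start and end values would then be copied into whatever constraint objects the parser version in use actually exposes.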
>> On Fri, Feb 4, 2011 at 5:58 AM, Mohamed Yahya <myahya at mpi-inf.mpg.de> wrote:
>>> I want to generate the parse trees and dependency graphs for some
>>> sentences. Before that, I would like to preprocess the sentence string
>>> so that named entities (obtained through some third party named entity
>>> recognizer) are replaced by some placeholder to hopefully get a better
>>> For example, I have the following sentence:
>>> Fly Me to the Moon was composed by Bart Howard.
>>> With named entity tagging, it becomes:
>>> <NE>Fly Me to the Moon</NE> was composed by <NE>Bart Howard</NE>.
>>> And I would like to send to the parser a sentence like:
>>> NE1 was composed by NE2.
>>> The question is, what strings can NE1 and NE2 be, for example as many
>>> X's as in the NE, so
>>> XXXXXXXXXXXXXXXXXX was composed by XXXXXXXXXXX.
>>> Is there a systematic way to do this?
>>> java-nlp-user mailing list
>>> java-nlp-user at lists.stanford.edu