[java-nlp-user] question about parsing sentences

Gene Golovchinsky gene at fxpal.com
Sat Apr 30 17:01:40 PDT 2011


Thanks for the info. I guess I'll leave it to someone else to address 
this, as it's a bit beyond my NLP chops. For the record, that sentence 
came from an LA Times article that is part of the TREC corpus.

Thanks again,

Gene

On 4/30/2011 4:57 PM, Christopher Manning wrote:
> Yes, there's clearly a possible research project here of writing a high quality sentence splitter, but we don't at present have one.  We have a deterministic sentence splitter which works pretty well on formal written text but won't do the job for embedded multi-sentence quotations or informal texting.
>
> Chris.
>
>
> On Apr 29, 2011, at 4:50 PM, John Bauer wrote:
>
>> There's no such logic in the pipeline, but you could add it yourself to the ssplit annotator if you want.
>>
>> John
>>
>> On Apr 29, 2011 4:44 PM, "Gene Golovchinsky"<gene at fxpal.com>  wrote:
>>> Yeah, I understand that it's a tricky parsing situation. Is there a way
>>> to set a minimum sentence length, or at least a hint in that direction?
>>>
>>> Thanks again for your help,
>>>
>>> Gene
>>>
>>> On 4/29/2011 4:32 PM, John Bauer wrote:
>>>> What you can do is create the pipeline with the property
>>>> "ssplit.isOneSentence = true" and process one sentence at a time.
>>>>
>>>> The splitter itself would be almost impossible to change the way you
>>>> want, because it would then potentially have to process extremely
>>>> long "sentences" as one object just because they were all in one
>>>> quoted passage.
>>>>
>>>> John
>>>>
>>>> On Fri, Apr 29, 2011 at 3:39 PM, Gene Golovchinsky<gene at fxpal.com>  wrote:
>>>>> I've included the stanford-corenlp-2011-04-22.jar and xom.jar in my project.
>>>>>
>>>>> I am using the StanfordCoreNLP pipeline which I initialize like this:
>>>>> Properties props = new Properties();
>>>>> props.put("annotators", "tokenize, ssplit");
>>>>> this.pipeline = new StanfordCoreNLP(props);
>>>>>
>>>>> and use like this:
>>>>>
>>>>> Annotation document = new Annotation(paragraph);
>>>>> this.pipeline.annotate(document);
>>>>> List<CoreMap> sentences = document.get(SentencesAnnotation.class);
>>>>>
>>>>> for (CoreMap s : sentences) {
>>>>>     String sentence = s.get(TextAnnotation.class);
>>>>> }
>>>>>
>>>>> It hums along nicely, but isn't as accurate as I'd like :-)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Gene
>>>>>
>>>>> On 4/29/2011 2:47 PM, John Bauer wrote:
>>>>>> What tool are you using?
>>>>>>
>>>>>> John
>>>>>>
>>>>>> On Fri, Apr 29, 2011 at 2:35 PM, Gene Golovchinsky<gene at fxpal.com>  wrote:
>>>>>>> I am trying to process some TREC data, dividing each article into
>>>>>>> sentences.
>>>>>>> At first, I tried using the standard BreakIterator class which is part of
>>>>>>> Java, but it choked on the following sentence:
>>>>>>>
>>>>>>> Athol Fugard's choice of the La Jolla Playhouse for the first West Coast
>>>>>>> presentation of his latest play, "My Children! My Africa!," may come as
>>>>>>> something of a surprise to the theater community.
>>>>>>>
>>>>>>> It split this into three sentences, using the exclamation marks as
>>>>>>> end-of-sentence delimiters.
>>>>>>>
>>>>>>> I then tried performing the same parse with the ssplit annotator, and it
>>>>>>> produced the identical results.
>>>>>>>
>>>>>>> Is there a way to configure this parser to recognize quoted phrases
>>>>>>> within a sentence?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Gene
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> java-nlp-user mailing list
>>>>>>> java-nlp-user at lists.stanford.edu
>>>>>>> https://mailman.stanford.edu/mailman/listinfo/java-nlp-user
>>>>>>>
>>>>>>>
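
For readers who land on this thread with the same problem: the behaviour Gene asks about (keeping a quoted phrase inside its sentence) can be approximated as a post-processing pass over java.text.BreakIterator output. The sketch below is not part of CoreNLP or its ssplit annotator. The class name QuoteAwareSplitter and both merge heuristics are assumptions made for illustration: a fragment is re-joined with the text before it when a straight double quote is still open, or when the fragment starts with a lowercase letter.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a CoreNLP class.
final class QuoteAwareSplitter {

    /** Splits with BreakIterator, then re-joins fragments cut inside a quotation. */
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        boolean insideQuote = false;

        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String fragment = text.substring(start, end);

            // Emit the buffered text only if it looks complete: no quotation
            // is open and the new fragment does not continue in lowercase.
            if (!insideQuote && !startsWithLowercase(fragment)) {
                flush(pending, sentences);
            }

            pending.append(fragment);

            // Assumption: every straight double quote toggles open/closed.
            for (int i = 0; i < fragment.length(); i++) {
                if (fragment.charAt(i) == '"') {
                    insideQuote = !insideQuote;
                }
            }
        }
        flush(pending, sentences);
        return sentences;
    }

    /** Moves the trimmed buffer into the result list, skipping empty buffers. */
    private static void flush(StringBuilder pending, List<String> sentences) {
        String sentence = pending.toString().trim();
        if (!sentence.isEmpty()) {
            sentences.add(sentence);
        }
        pending.setLength(0);
    }

    private static boolean startsWithLowercase(String fragment) {
        String trimmed = fragment.trim();
        return !trimmed.isEmpty() && Character.isLowerCase(trimmed.charAt(0));
    }
}
```

Calling QuoteAwareSplitter.split(...) on the LA Times sentence quoted in the original question returns it as a single sentence, while ordinary text is still split wherever BreakIterator splits it. The caveat John raises still applies: an unbalanced quotation mark turns everything after it into one long "sentence", and curly quotes are not handled here.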



