[parser-user] Stanford Parser version 1.6.5 released

Christopher Manning manning at
Wed Jan 12 14:07:27 PST 2011

On Dec 1, 2010, at 4:09 PM, Maryam Hasan wrote:

> Dear Chris,
> It's great!
> Can you please explain more about the new option that allows the copula to be treated as the head (with some examples)?
> How can I set this option in my Java program?
> thanks a lot
> --

Hi Maryam,

From the command-line:
 - At present you can't get this directly from just running the parser.  But you can do it in two steps: run LexicalizedParser with -outputFormat penn (or -outputFormat oneLine) to save the trees, and then run EnglishGrammaticalStructure on those trees with the option -makeCopulaHead, plus whatever other options you want, as described in the Stanford Dependencies manual.
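
A rough sketch of that two-step route, written as a small Java helper that only prints the two command lines. It does not run the parser. The jar name stanford-parser.jar, the file names input.txt and trees.txt, the flag spelling -outputFormat, and the -treeFile and -basic options (as I recall them from the Stanford Dependencies manual) are assumptions here, so check them against your installed version.

```java
import java.util.Arrays;
import java.util.List;

class TwoStepCopulaHead {

  // Assumed jar name; adjust to your installation.
  static final String JAR = "stanford-parser.jar";

  /** Step 1: parse raw text and print Penn-style trees to stdout. */
  static List<String> parseCommand(String grammar, String textFile) {
    return Arrays.asList("java", "-cp", JAR,
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-outputFormat", "penn", grammar, textFile);
  }

  /** Step 2: read the saved trees and print dependencies with the copula as head. */
  static List<String> dependencyCommand(String treeFile) {
    return Arrays.asList("java", "-cp", JAR,
        "edu.stanford.nlp.trees.EnglishGrammaticalStructure",
        "-treeFile", treeFile, "-basic", "-makeCopulaHead");
  }

  public static void main(String[] args) {
    // Print shell-ready commands; the output of step 1 is redirected to a tree file.
    System.out.println(String.join(" ", parseCommand("englishPCFG.ser.gz", "input.txt")) + " > trees.txt");
    System.out.println(String.join(" ", dependencyCommand("trees.txt")));
  }
}
```

Running the two printed commands in order should give the dependencies for input.txt with the copula kept as head, provided the assumed flags match your release.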

In a program:
 - You need to set things up with a SemanticHeadFinder constructed with the parameter "false", which makes the copula the head.  The version of ParserDemo below does this.  I changed only the 5th line of main().


import java.io.IOException;
import java.io.StringReader;
import java.util.*;

import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.util.Filters;

class ParserDemo5 {

  /** Usage: ParserDemo5 [[grammar] textFile] */
  public static void main(String[] args) throws IOException {
    String grammar = args.length > 0 ? args[0] : "englishPCFG.ser.gz";
    LexicalizedParser lp = new LexicalizedParser(grammar);
    lp.setOptionFlags("-maxLength", "80", "-retainTmpSubcategories");
    PennTreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory(Filters.<String>acceptFilter(), new SemanticHeadFinder(false));
    DocumentPreprocessor dp = new DocumentPreprocessor();

    Iterable<List<? extends HasWord>> sentences;
    if (args.length > 1) {
      sentences = dp.getSentencesFromText(args[1]);
    } else {
      String sent2 = "This is a slightly longer and more complex sentence requiring tokenization.";
      Tokenizer<? extends HasWord> toke = tlp.getTokenizerFactory().getTokenizer(new StringReader(sent2));
      List<? extends HasWord> sentence2 = toke.tokenize();
      List<List<? extends HasWord>> tmp = new ArrayList<List<? extends HasWord>>();
      tmp.add(sentence2);
      sentences = tmp;
    }

    // Parse each sentence and print its typed dependencies.
    for (List<? extends HasWord> sentence : sentences) {
      Tree parse = lp.apply(sentence);
      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      EnglishGrammaticalStructure.printDependencies(gs, gs.typedDependencies(false), parse, true, false);
    }
  }
}

