[java-nlp-user] Tagging Arabic

John Bauer horatio at
Fri Apr 29 14:45:29 PDT 2011

> I thought defining a couple of Extractors could handle unknown words to some extent, no?

Yes, the tagger uses whatever features it has available to guess at
unknown words.  Unknown words can only go into open classes of tags.

On Fri, Apr 29, 2011 at 10:22 AM, Hajder <hajderr at> wrote:
> I do not have access to any parts of the ATB, I've retrained the tagger with
> the Quran corpus since that's the only one available for me.
> I could try your idea of adding extractors. Just to get started, I did a
> test and copied the CompanyNameDetector extractor in
> and
> created my own test ExtractorArabVerb class, in there I had a Set with only
> one verb (a verb that was being tagged as N and I hoped to tag as a V). In
> extract I had like
> String extract(History h, PairsHolder pH) {
>  if(isArabVerb(pH.getWord(0))) {
>          return "1";
>  }
>  return "0";
> }
> But at the end of the extractor, what should the serialVersionUID be set to?

Doesn't matter.  This is a number that you need to change when
backwards compatibility with old serialized objects is lost.
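As a rough illustration of why the value is arbitrary, here is a minimal standalone sketch. The class names (SerialVersionDemo, VerbFeature) are made up for this example and are not part of the tagger. It serializes an object in memory and reads it back; the round trip works for any fixed serialVersionUID, as long as the value does not change between writing and reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerialVersionDemo {
    // Hypothetical stand-in for a custom extractor; not a tagger class.
    static class VerbFeature implements Serializable {
        // Any fixed value works. Change it only when the class changes in a
        // way that should make previously serialized instances unreadable.
        private static final long serialVersionUID = 1L;
        final String name;

        VerbFeature(String name) {
            this.name = name;
        }
    }

    // Serialize the object to a byte array, then deserialize it again.
    static VerbFeature roundTrip(VerbFeature feature)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(feature);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (VerbFeature) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        VerbFeature copy = roundTrip(new VerbFeature("arabVerb"));
        System.out.println(copy.name);
    }
}
```

If the number were changed after a model had been saved, reading the old file would be expected to fail with an InvalidClassException, which is the lost backwards compatibility mentioned above.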

> The extract-method returns a "1" or "0" but how is that mapped to a tag? In
> my case with a simple verb extractor, how would that let all words in that
> Set be tagged as verbs when encountered?

It's not guaranteed, but the system will hopefully learn that if this
feature is set to "1", it means the word is a verb.  Obviously, for that
to happen, the word has to be tagged as a verb in the training data.
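To make the "1"/"0" behaviour concrete, here is a minimal standalone sketch of the feature logic only. It deliberately does not use the tagger's History or PairsHolder classes; the class name ArabVerbFeatureSketch, the one-word list, and the transliterated entry "kataba" are assumptions for illustration. The feature value is just a string handed to the model, and any association with a tag comes from training.

```java
import java.util.Set;

class ArabVerbFeatureSketch {
    // Illustrative word list (an assumption); a real list would hold the
    // surface forms exactly as they appear in the training corpus.
    private static final Set<String> VERBS = Set.of("kataba");

    // True if the word is in the hand-built verb list.
    static boolean isArabVerb(String word) {
        return word != null && VERBS.contains(word);
    }

    // Binary feature: "1" when the word is in the list, "0" otherwise.
    // This does not assign a tag by itself; the model has to learn the
    // correlation between the value "1" and the verb tag from training data.
    static String extract(String word) {
        return isArabVerb(word) ? "1" : "0";
    }

    public static void main(String[] args) {
        System.out.println(extract("kataba"));
        System.out.println(extract("kitab"));
    }
}
```

In an actual extractor, the word passed to this logic would come from the PairsHolder, as in the quoted snippet.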

Don't forget to include the extractor in the arch.  You might want to
debug it (either via jdb or with print statements) to make sure it is
being created the first few times you try it.

