[java-nlp-user] NER and UTF-8

Christopher Manning manning at stanford.edu
Fri May 13 08:47:51 PDT 2011


The usual reason this happens is that the tagger's default character encoding is UTF-8 (Unicode), but your document is in some other encoding, such as an 8-bit encoding like ISO-8859-1 or Windows cp1252. You can either convert the document to UTF-8 or specify the input document's encoding with the -encoding flag.

Chris.
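A minimal sketch of the first option (converting the document), assuming you know the document's actual encoding. It decodes the raw bytes with that charset and re-encodes them as UTF-8 using only standard java.nio APIs. The class name ConvertToUtf8 and the windows-1252 example are illustrative and are not part of Stanford NER.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ConvertToUtf8 {

    // Decode raw bytes with their real (source) charset, then re-encode as UTF-8.
    // new String(byte[], Charset) never throws: bytes the source charset cannot
    // map are silently replaced, so the source charset must be the correct one.
    public static byte[] toUtf8(byte[] raw, Charset from) {
        return new String(raw, from).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ConvertToUtf8 <in-file> <source-charset> <out-file>");
            System.exit(1);
        }
        // Reads the whole file into memory, which is fine for typical documents
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[2]), toUtf8(raw, Charset.forName(args[1])));
    }
}
```

For example: `java ConvertToUtf8 doc.txt windows-1252 doc-utf8.txt`. Naming the wrong source charset does not raise an error; it silently produces wrong characters, so spot-check a few accented words in the output.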


On May 13, 2011, at 8:36 AM, John Bauer wrote:

> If it worked on your own laptop (including the database calls?), then the problem is most likely a configuration difference on the server. For example, perhaps the server was not set up with UTF-8 as the default encoding. I would investigate differences like that, since it sounds like you already have the NER part figured out.
> 
> John
> 
> On May 13, 2011 1:15 AM, "Gerber Daniel" <dgerber at informatik.uni-leipzig.de> wrote:
> > Hi,
> > I ran into a problem with UTF-8. I'm querying my Lucene index and trying to NER-tag the results. This works perfectly on my personal laptop (a current MacBook Pro), but when I run the program on the server I get this message for almost every tagged sentence:
> > 
> > Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
> > WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
> > 
> > I know that this problem has been discussed before, but those answers didn't help me very much. :(
> > 
> > Kind regards,
> > Daniel
> > 
> > 
> > My configurations:
> > 
> > Server: Linux 2.6.32-21 / x86_64 / GNU/Linux / Ubuntu 10.04 LTS
> > 
> > Laptop: Mac OS X 10.6.7
> > _______________________________________________
> > java-nlp-user mailing list
> > java-nlp-user at lists.stanford.edu
> > https://mailman.stanford.edu/mailman/listinfo/java-nlp-user
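Following John's suggestion, here is a small diagnostic sketch to run on both machines. It prints the JVM's default charset and shows how U+FFFD arises when bytes in an 8-bit encoding are decoded as UTF-8: Java's String constructor silently substitutes U+FFFD for malformed input, which is the character PTBLexer then reports as untokenizable. A strict decoder (CodingErrorAction.REPORT) detects the mismatch instead of hiding it. The class and method names are hypothetical, and where exactly bytes become Strings in your pipeline (index, database driver, file reads) is an assumption you would need to check. On JDK 18 and later the default charset is UTF-8 (JEP 400); on earlier JVMs it follows the OS locale, e.g. LANG on Linux.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Returns true only if the bytes are well-formed UTF-8 (strict, no replacement).
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if decoding has already replaced some input with U+FFFD.
    public static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Compare these two lines between the laptop and the server
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // "Mueller" with u-umlaut; in ISO-8859-1 the umlaut is the single byte 0xFC
        String word = "M\u00fcller";
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);

        // 0xFC is not a valid UTF-8 lead byte, so lenient decoding yields U+FFFD
        String wrong = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(containsReplacementChar(wrong)); // true
        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(word.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```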



