[java-nlp-user] NER and UTF-8

John Bauer horatio at
Fri May 13 08:36:35 PDT 2011

If it worked on your own laptop (including the database calls?), then the
problem is most likely a configuration difference on the server.  For
example, perhaps the server was not set up with UTF-8 as the default
encoding.  I would investigate differences such as that, since it sounds
like you already have the NER part figured out.
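The U+FFFD in the warning is the Unicode replacement character. A decoder usually produces it when bytes are read with the wrong charset, so the text was likely already damaged before it reached the tokenizer. The class below is a minimal sketch of how to check that idea. It assumes the server's JVM falls back to a non-UTF-8 default, for example under a plain C/POSIX locale. It prints the JVM's default encoding and shows UTF-8 bytes turning into U+FFFD when they are decoded as US-ASCII. The class name `EncodingCheck` and its helper methods are hypothetical, written for illustration only. `Charset.defaultCharset()`, the `file.encoding` property and `new String(byte[], Charset)` are standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic class: compare its output on the laptop and the server.
public class EncodingCheck {

    // Decodes raw bytes with the given charset. This String constructor
    // replaces malformed or unmappable input with U+FFFD instead of throwing.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // True if the text contains U+FFFD, the character PTBLexer reports as untokenizable.
    static boolean hasReplacementChar(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // The charset the JVM uses whenever code does not name one explicitly.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // "Zürich" encoded as UTF-8: the u-umlaut becomes the two bytes 0xC3 0xBC.
        byte[] utf8Bytes = "Z\u00FCrich".getBytes(StandardCharsets.UTF_8);

        // Wrong charset: each non-ASCII byte is malformed and becomes U+FFFD.
        String wrong = decode(utf8Bytes, StandardCharsets.US_ASCII);
        // Right charset: the original text comes back intact.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println("US-ASCII decode has U+FFFD: " + hasReplacementChar(wrong));
        System.out.println("UTF-8 decode has U+FFFD:    " + hasReplacementChar(right));
    }
}
```

If the first two lines differ between the laptop and the server, that supports the default-encoding explanation. In that case one option is to name the charset wherever text is read, for example with `new InputStreamReader(in, StandardCharsets.UTF_8)`, rather than relying on the platform default. Another is to start the server's JVM with `-Dfile.encoding=UTF-8`.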

On May 13, 2011 1:15 AM, "Gerber Daniel" <dgerber at> wrote:
> Hi,
> I ran into a problem with UTF-8. I'm querying my Lucene index and trying
> to NER-tag the results. This works perfectly on my personal laptop (a
> current MacBook Pro), but if I run the program on the server I get this
> message for almost every tagged sentence:
> Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
> WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
> I know this problem has been discussed before, but those answers
> didn't help me very much. :(
> Kind regards,
> Daniel
> My configurations:
> Linux 2.6.32-21 / x86_64 / GNU/Linux / Ubuntu 10.04 LTS
> Mac OS X 10.6.7