[java-nlp-user] NER and UTF-8

John Bauer horatio at gmail.com
Fri May 13 08:36:35 PDT 2011


If it worked on your own laptop (including the database calls?), then the
problem is most likely a configuration difference on the server.  For
example, the server may not be set up with UTF-8 as its default encoding.
I would investigate differences of that kind first, since it sounds like
you already have the NER part figured out.

John
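
One way to check the suggestion above is to print the JVM's default encoding on both machines and compare. The sketch below is illustrative only: the class name EncodingCheck and its helper methods are hypothetical, and the sample string is made up. It also shows how U+FFFD can arise when UTF-8 bytes are decoded with a charset that cannot represent them, which is one plausible source of the "Untokenizable" warning quoted below, not a confirmed diagnosis.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Stanford NER or Lucene.
public class EncodingCheck {

    // Decode bytes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // True if the text contains the Unicode replacement character U+FFFD.
    static boolean hasReplacementChar(String text) {
        return text.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Compare these two values between the laptop and the server.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Made-up sample text with one non-ASCII character (a-umlaut).
        byte[] utf8 = "Universit\u00e4t Leipzig".getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as US-ASCII replaces each non-ASCII byte with U+FFFD.
        String wrong = decode(utf8, StandardCharsets.US_ASCII);
        // Decoding with the charset the bytes were written in keeps the text intact.
        String right = decode(utf8, StandardCharsets.UTF_8);

        System.out.println("US-ASCII decode damaged: " + hasReplacementChar(wrong));
        System.out.println("UTF-8 decode damaged:    " + hasReplacementChar(right));
    }
}
```

If the two machines report different defaults, passing an explicit charset wherever bytes are turned into text, or starting the JVM with -Dfile.encoding=UTF-8, are commonly used workarounds. Whether either applies here depends on where the text is actually decoded, which the thread does not say.
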
On May 13, 2011 1:15 AM, "Gerber Daniel" <dgerber at informatik.uni-leipzig.de>
wrote:
> Hi,
> I ran into a problem regarding UTF-8. I'm querying my Lucene index and
> trying to NER-tag the results. This works perfectly on my personal laptop
> (current MacBook Pro), but if I run the program on the server I get this
> message for almost every tagged sentence:
>
> Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
> WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
>
> I know that this problem has been discussed before, but those answers
> didn't help me very much. :(
>
> Kind regards,
> Daniel
>
>
> My configurations:
>
> Linux 2.6.32-21 / x86_64 / GNU/Linux / Ubuntu 10.04 LTS
>
> Mac OS X 10.6.7