[bioontology-support] Annotator help request

Alf Eaton alf at
Tue Aug 17 03:41:10 PDT 2010

I've been testing the NCBO Annotator web service and have found that
the offsets it returns for annotation positions are incorrect: it
seems to be returning offsets in bytes rather than characters, so the
"from" and "to" positions are wrong for any text that follows a
multi-byte character.

This can be seen at when
annotating using ontology 42878 (ChEBI) and the following text:

"an acid selected from the group consisting of hydrogen bromide,
hydrogen chloride, sulfuric acid, phosphoric acid, nitric acid, formic
acid, acetic acid, propionic acid, succinic acid, glycolic acid,
lactic acid, malic acid, tartaric acid, citric acid, ascorbic acid,
α-ketoglutaric acid, glutamic acid, aspartic acid, maleic acid,
hydroxymaleic acid, pyruvic acid, phenylacetic acid, benzoic acid,
p-aminobenzoic acid, anthranilic acid, p-hydroxybenzoic acid,
salicyclic acid, hydroxyethanesulfonic acid, ethylenesulfonic acid,
halobenzenesulfonic acid, toluenesulfonic acid, naphthalenesulfonic
acid, methanesulfonic acid and sulfanilic acid"

(after the α of α-ketoglutaric acid, annotations are offset to the
right because the α character is two bytes long).
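
The difference between the two ways of counting can be sketched in a few
lines of Python. This is only an illustration, assuming the text is
encoded as UTF-8 (where α occupies two bytes); the helper names are
hypothetical and are not part of the Annotator API, and the snippet uses
a shortened excerpt of the text above.

```python
# Illustration only: helper names are hypothetical, not part of the NCBO Annotator API.
# Assumes UTF-8, in which "α" is encoded as two bytes.

text = "citric acid, ascorbic acid, α-ketoglutaric acid, glutamic acid"


def char_to_byte_offset(s: str, char_offset: int, encoding: str = "utf-8") -> int:
    """Return the byte offset that corresponds to a character offset."""
    return len(s[:char_offset].encode(encoding))


def byte_to_char_offset(s: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Return the character offset that corresponds to a byte offset.

    The byte offset must fall on a character boundary, otherwise
    decoding the prefix raises UnicodeDecodeError.
    """
    return len(s.encode(encoding)[:byte_offset].decode(encoding))


term = "glutamic acid"
char_from = text.index(term)                      # position counted in characters
byte_from = char_to_byte_offset(text, char_from)  # position counted in UTF-8 bytes

# For a term after the α, the byte offset is one larger than the character offset.
print(char_from, byte_from)

# Converting the byte offset back recovers the character offset.
print(byte_to_char_offset(text, byte_from))
```

A client could work around the problem by converting the returned
positions with something like byte_to_char_offset, but that only helps
if the service really is counting UTF-8 bytes, which is an assumption
here.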

If this could be fixed, that would be great - at the moment the
annotator's not really usable for anything containing Unicode
characters (which are very common in scientific documents).

