[bioontology-support] Annotator help request
alf at hubmed.org
Tue Aug 17 03:41:10 PDT 2010
I've been testing the NCBO Annotator web service, and have been
finding that the offsets it returns for annotation positions are
incorrect: it seems to be returning offsets in characters rather than
bytes, so the "from" and "to" positions are wrong.
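The mismatch is easy to see in isolation. The sketch below is illustrative and not part of the Annotator API. It computes both the character offset and the UTF-8 byte offset of a term that follows a non-ASCII character. It assumes 0-based offsets and treats "character" as a Unicode code point, which is how Python indexes strings. The service's own "from"/"to" convention may differ.

```python
def char_and_byte_offset(text: str, term: str) -> tuple[int, int]:
    """Return (character offset, UTF-8 byte offset) of the first match of term.

    Illustrative helper (hypothetical, not an NCBO function). Offsets are
    0-based; "character" means Unicode code point.
    """
    char_start = text.index(term)  # raises ValueError if term is absent
    # The byte offset is the length of the UTF-8 encoding of the text before the match.
    byte_start = len(text[:char_start].encode("utf-8"))
    return char_start, byte_start


print(char_and_byte_offset("α-ketoglutaric acid, glutamic acid", "glutamic acid"))
```

This prints `(21, 22)`. Everything after the α sits one byte further along than its character position, which is the one-position shift described in this report.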
This can be seen at http://bioportal.bioontology.org/annotator when
annotating using ontology 42878 (ChEBI) and the following text:
"an acid selected from the group consisting of hydrogen bromide,
hydrogen chloride, sulfuric acid, phosphoric acid, nitric acid, formic
acid, acetic acid, propionic acid, succinic acid, glycolic acid,
lactic acid, malic acid, tartaric acid, citric acid, ascorbic acid,
α-ketoglutaric acid, glutamic acid, aspartic acid, maleic acid,
hydroxymaleic acid, pyruvic acid, phenylacetic acid, benzoic acid,
p-aminobenzoic acid, anthranilic acid, p-hydroxybenzoic acid,
salicyclic acid, hydroxyethanesulfonic acid, ethylenesulfonic acid,
halobenzenesulfonic acid, toluenesulfonic acid, naphthalenesulfonic
acid, methanesulfonic acid and sulfanilic acid"
(after the α of α-ketoglutaric acid, the annotations are shifted one
position to the right, because α is two bytes long in UTF-8).
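Until the service is fixed, a client could convert between the two kinds of offset itself. The following is a minimal sketch under stated assumptions: the offsets are 0-based, the text is UTF-8 encoded, and "character" means Unicode code point. The function names are hypothetical and are not part of the Annotator.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a 0-based UTF-8 byte offset into a 0-based character offset."""
    encoded = text.encode("utf-8")
    # Decoding the prefix and counting code points gives the character offset.
    # Strict decoding raises UnicodeDecodeError if the offset splits a
    # multi-byte character, which signals an invalid byte offset.
    return len(encoded[:byte_offset].decode("utf-8"))


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a 0-based character offset into a 0-based UTF-8 byte offset."""
    return len(text[:char_offset].encode("utf-8"))


text = "α-ketoglutaric acid, glutamic acid"
start = byte_to_char_offset(text, 22)   # byte 22 is character 21
print(start, text[start:start + len("glutamic acid")])
```

Here the byte offset 22 maps back to character 21, and slicing from there recovers "glutamic acid". A Java or JavaScript client would need a different conversion, because those languages index strings in UTF-16 code units rather than code points.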
If this could be fixed, that would be great - at the moment the
annotator's not really usable for anything containing non-ASCII
characters (which are very common in scientific documents).