[parser-user] "words" are [scientifically] baseless things!

dinar qurbanov qdinar at gmail.com
Thu Jul 20 07:14:05 PDT 2017


"words" are [scientifically] baseless things!

where do they come from? just from the spaces between them. who
decided to put spaces there, and why? i think they had no good
justification, or else we would know it. the only theory i know of is
about lexemes, which go into dictionaries, and their word forms.

also, "words" in grammar come from old grammars written long ago for
latin, arabic, etc., but those are not an authoritative source. you
should know how many errors there were in the old sciences of
chemistry, medicine and astronomy.

as far as i know, the stanford parser uses lemmas with tags, and as
far as i know it gives no way to show whether some other word attaches
to the lemma alone or to the lemma together with some suffix(es)...

but i think the real atoms of syntax are morphemes, and this idea has
been put forward by several authors in several books.

also, i think that syntax and morphology should be redivided and
renamed. one of them ("syntax"?) should include all the trees of both
syntax and morphology (a similar idea is also suggested in a book),
and the remaining part of morphology should go to a science named
something like "surface decoration of syntax trees".

the difference lies in the possibly different priority/order in which
morphemes combine. in many cases the resulting meaning is the same,
because in those cases a(bc) = (ab)c: something can be written "a bc"
and yet have the meaning (ab)c, and there is not much practical
problem if a language analyser program treats it as a(bc), since
a(bc) = (ab)c. for example, "a" can be an adverb, "b" a verb and "c" a
gerund suffix, as in "frankly speaking".
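
a minimal sketch in python of the a(bc) / (ab)c point above: it
enumerates every binary bracketing of a morpheme sequence. the names
bracketings and show, and the segmentation "frankly" + "speak" +
"-ing", are assumptions made only for illustration.

```python
def bracketings(items):
    """Yield every binary bracketing of a sequence as nested tuples."""
    if len(items) == 1:
        yield items[0]
        return
    # Try every split point; bracket the left and right parts recursively.
    for split in range(1, len(items)):
        for left in bracketings(items[:split]):
            for right in bracketings(items[split:]):
                yield (left, right)


def show(tree):
    """Render a nested-tuple tree with parentheses."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(show(child) for child in tree) + ")"


# Hypothetical segmentation: adverb "a", verb "b", gerund suffix "c".
morphemes = ["frankly", "speak", "-ing"]
trees = [show(t) for t in bracketings(morphemes)]
for t in trees:
    print(t)
```

for three morphemes this gives exactly the two structures a(bc) and
(ab)c. an analyser that only sees "frankly" and "speaking" as two
words can represent the first one but not the second.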

i can give an example where this makes a practical difference. in
turkic languages the verb negation suffix is written attached to the
verb. usually an adverb applies to the verb stem (i.e. to the part
without the negation suffix), and the negation applies to the phrase
consisting of verb and adverb. for example: "кызу бармады" - "qozu
barmado" in tatar is "did not go fast" and has the structure
"{{кызу бар}ма}ды" - "did not {go fast}". but you cannot use this as a
rule: a similarly written sequence of morphemes can also have another
structure. "бөтенләй эшләмәде" - "botonlay islamadi" means
"(he/she/it/they) has not worked at all" and it has the structure
"{бөтенләй} {эшләмәде}" - "{did not work} {at all}", or
"{{бөтенләй} {эшләмә}}де" - "did {{not work} {at all}}".
(alternatively it could have the structure "{{{бөтенләй эшлә}мә}де}"
and the meaning "did not make wholly" - "did not {make wholly}".)
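
a small python sketch of the structures above, with morphemes as tree
leaves. the nested-tuple encoding and the helper names leaves and
scope_of are assumptions for illustration, not something taken from
the stanford parser. scope_of returns the morphemes that a given
suffix applies to in a given tree.

```python
def leaves(tree):
    """Return the morphemes of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def scope_of(tree, suffix):
    """Return the leaves of the sister of `suffix`, or None if absent."""
    if isinstance(tree, str):
        return None
    left, right = tree
    if right == suffix:
        return leaves(left)
    return scope_of(left, suffix) or scope_of(right, suffix)


# "{{кызу бар}ма}ды" - "did not {go fast}"
go_fast = ((("кызу", "бар"), "ма"), "ды")
# "{{бөтенләй} {эшләмә}}де" - "did {{not work} {at all}}"
not_at_all = (("бөтенләй", ("эшлә", "мә")), "де")
# "{{{бөтенләй эшлә}мә}де}" - "did not {make wholly}"
not_wholly = ((("бөтенләй", "эшлә"), "мә"), "де")

print(scope_of(go_fast, "ма"))      # negation covers adverb + verb
print(scope_of(not_at_all, "мә"))   # negation covers the verb only
print(scope_of(not_wholly, "мә"))   # negation covers adverb + verb
```

the last two trees have the same morphemes in the same order and
differ only in what the negation suffix covers, which a word-level
tree has no place to record.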

to translate this correctly from tatar to english it is better to use
morphemes, instead of words, as the atoms, i.e. as tree nodes, because
you have to find the correct tree structure before you translate, and
you have to be able to place morphemes at the correct positions in the
tree.

probably there are also similar examples with other suffixes. tatar
also has an imperative mood suffix, with which i expect to find a
similar example, and i do not rule out the same problem with other
suffixes, as with the negation and gerund suffixes, when translating
from one language to another.

