9. Computational Lexicography

Quiz 1

1. What is a KWIC concordance?

A KWIC (KeyWord In Context) concordance is a corpus-based dictionary of a text corpus: each word is listed as a keyword and paired with the words around it (its left and right context) wherever it occurs.

2. What are the two main components of lexicon construction based on empirical data?

The two main components of lexicon construction based on empirical data are the corpus (with its primary and secondary data) and the lexicon built from it.

3. Which layers of abstraction are involved in corpus acquisition?

There are two layers involved in corpus acquisition: primary data (audio/video recordings) and secondary data (transcriptions, annotations, metadata).

4. Which layers of abstraction are involved in lexicon construction? Describe them.

The layers of abstraction build on each other. The first layer is a corpus lexicon consisting of word lists. The second layer is a lexicon matrix with data categories, but with no generalisation. The third layer is a lexicon with selected generalisations. Finally, a lexicon with hierarchies is created.
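
The first three layers can be illustrated with a small sketch. It assumes a toy corpus and hypothetical data categories (lemma and part of speech); the names CORPUS, CATEGORIES, word_list, lexicon_matrix and generalise_by_lemma are made up for illustration. Grouping word forms by lemma is only one possible "selected generalisation", and the hierarchy layer is not modelled.

```python
from collections import Counter

# Hypothetical toy corpus and data categories (assumptions for illustration).
CORPUS = "the dogs barked and the dog ran"
CATEGORIES = {
    "dogs": {"lemma": "dog", "pos": "noun"},
    "dog": {"lemma": "dog", "pos": "noun"},
    "barked": {"lemma": "bark", "pos": "verb"},
    "ran": {"lemma": "run", "pos": "verb"},
}

def word_list(text):
    """Layer 1: corpus lexicon, i.e. a word list with frequencies."""
    return Counter(text.split())

def lexicon_matrix(words, categories):
    """Layer 2: one row per word form, one column per data category.
    Nothing is generalised yet: every word form keeps its own row."""
    rows = []
    for form in sorted(words):
        row = {"form": form, "frequency": words[form]}
        row.update(categories.get(form, {}))
        rows.append(row)
    return rows

def generalise_by_lemma(rows):
    """Layer 3: one selected generalisation, grouping word forms under a lemma.
    Forms without a lemma in the matrix are treated as their own lemma."""
    lexicon = {}
    for row in rows:
        lexicon.setdefault(row.get("lemma", row["form"]), []).append(row["form"])
    return lexicon

if __name__ == "__main__":
    rows = lexicon_matrix(word_list(CORPUS), CATEGORIES)
    print(generalise_by_lemma(rows))
```

Here "dog" and "dogs" have separate rows in the matrix (layer 2) but are collected under the lemma "dog" in layer 3.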

5. Which layer do standard dictionary types typically belong to?

To the third layer, a lexicon with selected generalisations: semasiological or onomasiological dictionaries.

Quiz 2

1. What are the six main steps in KWIC concordance construction? Explain each of these steps.

a) Corpus creation

Create a written text (or transcribe a spoken message into a written text).

b) Tokenisation

First the punctuation marks are deleted, then the text is divided into units (words).

c) Keyword list extraction

Create an alphabetically ordered list of the words that occur in the text and delete the duplicates.

d) Context collation

Each occurrence of a keyword is paired with its context: the words to its left and to its right.

e) Keyword search

Search for a keyword and retrieve it together with its left and right context.

f) Output formatting

The output is an alphabetically ordered list of keywords, each shown with its left and right context.
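
The six steps above can be sketched as one small program, one function per step. This is a minimal illustration, not the demonstration software from the course: the sample text, the context width of three words, the lower-casing of tokens and all function names are assumptions.

```python
import string

def create_corpus():
    # a) Corpus creation: a tiny hypothetical text stands in for a real corpus.
    return "The cat sat on the mat. The dog sat too."

def tokenise(text):
    # b) Tokenisation: delete punctuation marks, then split into words.
    # Lower-casing is an extra assumption so that "The" and "the" share one keyword.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    return cleaned.lower().split()

def keyword_list(tokens):
    # c) Keyword list extraction: alphabetical order, duplicates removed.
    return sorted(set(tokens))

def collate_contexts(tokens, width=3):
    # d) Context collation: for every occurrence, store up to `width`
    #    words of left context and `width` words of right context.
    contexts = {}
    for i, token in enumerate(tokens):
        left = tokens[max(0, i - width):i]
        right = tokens[i + 1:i + 1 + width]
        contexts.setdefault(token, []).append((left, right))
    return contexts

def search(contexts, keyword):
    # e) Keyword search: all occurrences of one keyword with their contexts.
    return contexts.get(keyword.lower(), [])

def format_output(contexts, keywords):
    # f) Output formatting: keywords in alphabetical order, one line per
    #    occurrence, left context right-aligned so that the keywords line up.
    lines = []
    for keyword in keywords:
        for left, right in contexts[keyword]:
            lines.append(f"{' '.join(left):>20} [{keyword}] {' '.join(right)}")
    return lines

if __name__ == "__main__":
    tokens = tokenise(create_corpus())
    contexts = collate_contexts(tokens)
    for line in format_output(contexts, keyword_list(tokens)):
        print(line)
```

Storing the contexts in a dictionary keyed by keyword makes step e) a single lookup; a realistic project would also need a proper tokeniser (e.g. for apostrophes and hyphens) rather than simply deleting all punctuation.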

Quiz 3

1. In which programming languages could the concordance software be implemented?

2. What problems with the demonstration software would need to be removed in a later, realistic project?


HOMEWORK: see quizzes

Acronym=A word formed from the first letters of a phrase and having the same meaning as the phrase, e.g. NATO.

Affix=A bound morpheme which is added to a stem; it can be a prefix, suffix, infix, circumfix or superfix.

Allomorph=One of the different forms of the same (typically bound) morpheme, e.g. the English plural morpheme realised as /s/, /z/ or /ɪz/.

Antonym=A word which has a different form and the opposite meaning to another word (e.g. hot/cold); the two words are antonyms of each other.

Bahuvrihi (exocentric compound)=A compound of two stems, each with its own meaning, whose combined meaning is different from the meanings of its parts (e.g. a redhead is not a kind of head but a person with red hair). It has the formula:

Co-hyponym=One of several equivalent subordinate terms of the same superordinate term (e.g. dog and cat are co-hyponyms of animal).

Complex word=A word that consists of at least one stem plus further material; it can be a compound, a derivation (a stem plus a derivational affix), a blend or an abbreviation.

Computational linguistics=A discipline that involves both linguistics and computer science (informatics); it is concerned with processing huge amounts of text and with developing precise models of grammar and the lexicon which can be processed automatically.

Concordance=A list of all the words in a corpus held in a computer database, showing every occurrence of a particular word in the corpus together with its context.

Corpus linguistics= The branch of linguistics which analyses large corpora (bodies, collections) of written texts or recordings of speech.

Database=An organised set of data that is stored in a computer and can be looked at and used in various ways.

Dvandva (bicentric compounds)=The meaning of these compounds is calculated according to the formula: an AB is an A and a B.

Etymology=The study of the origins, history, and changing meanings of words.

Grapheme=(linguistics) The smallest distinctive unit of a writing system (cf. phoneme).

Homonym=A word with the same pronunciation and the same spelling as another word but an unrelated meaning, e.g. bank (of a river) and bank (financial institution).

Hyperonym=Superordinate term (e.g. animal is a hyperonym of ape, because an ape is an animal; dog is a hyperonym of poodle, because a poodle is a dog).
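
Hyperonymy, together with the co-hyponym relation from this glossary, can be modelled with a simple data structure. The sketch below assumes a hypothetical mini-taxonomy in which each term points to its immediate hyperonym; the names HYPERONYM, hyperonyms and co_hyponyms are made up for illustration.

```python
# Hypothetical mini-taxonomy: each term maps to its immediate hyperonym.
HYPERONYM = {
    "ape": "animal",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
    "terrier": "dog",
}

def hyperonyms(term):
    """All superordinate terms of `term`, nearest first."""
    chain = []
    while term in HYPERONYM:
        term = HYPERONYM[term]
        chain.append(term)
    return chain

def co_hyponyms(term):
    """Terms that share the same immediate hyperonym as `term`."""
    parent = HYPERONYM.get(term)
    if parent is None:
        return []
    return sorted(t for t, p in HYPERONYM.items() if p == parent and t != term)

if __name__ == "__main__":
    print(hyperonyms("poodle"))   # nearest first: dog, then animal
    print(co_hyponyms("dog"))     # other subordinate terms of animal
```

Because each term has only one parent here, the structure is a tree; a real lexicon may need several hyperonyms per term.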

IPA (abbr.)=1. International Phonetic Alphabet (a notation system that is used to show the pronunciation of words in any language).
2. International Phonetic Association (founded in 1886)

Lemma=The headword in a dictionary, thesaurus or encyclopedia, i.e. the form that is looked up (a degrammaticalised word).

Lexeme=The smallest meaningful unit having a lexical meaning.

Lexicography=The theory and practice of writing dictionaries

Macrostructure=The structure according to which categorisation and search functions can be defined. In general, it is possible to differentiate between two models. Semasiological macrostructure is form-based, assigning meaning to a form. Onomasiological macrostructure is not form-based, but concept- or meaning-oriented.
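
The difference between the two macrostructures can be sketched as two indexes built from the same entries. The toy (form, concept) pairs and the function names are assumptions for illustration only.

```python
# Hypothetical toy entries: (word form, concept) pairs.
ENTRIES = [
    ("bank", "riverside"),
    ("bank", "financial institution"),
    ("start", "starting something"),
    ("begin", "starting something"),
    ("commence", "starting something"),
]

def semasiological_index(entries):
    """Form-based: look up a word form, get its meanings."""
    index = {}
    for form, concept in entries:
        index.setdefault(form, []).append(concept)
    return index

def onomasiological_index(entries):
    """Concept-based: look up a meaning, get the forms that express it."""
    index = {}
    for form, concept in entries:
        index.setdefault(concept, []).append(form)
    return index

if __name__ == "__main__":
    print(semasiological_index(ENTRIES)["bank"])
    print(onomasiological_index(ENTRIES)["starting something"])
```

A conventional alphabetical dictionary corresponds to the first index; a thesaurus corresponds roughly to the second.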

Meronymy=Semantic relation in which something is composed of different parts (e.g. a wheel is part of a car).

Metadata=Data giving information about the production of a product (e.g. place and year of publication, publisher and code for a dictionary).

Microstructure=The internal structure of each lexicon entry (e.g. headword, pronunciation, part of speech, definition); it is the most fine-grained level of a lexicon's structure.

Morpheme=The smallest meaningful unit, with lexical or grammatical meaning. A word can consist of one or more morphemes.

Orthography=The standard system used to write a language (from Greek ortho = correct, graphy = writing).

Ostensive definition=Definition by showing (e.g. a picture or a real-life example).

Phoneme=The smallest meaning-distinguishing unit of sound in a language: speech sounds that make two words distinguishable (e.g. /p/ and /b/ in pat and bat).

Polysemy=One word that has two (or more) (closely) related meanings.

Recursive definition=A definition that defines something in terms of itself. It is used for defining infinite sets of items: start with a base case (e.g. the root of a word) and repeatedly add something to it.
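
A recursive definition can be illustrated with the infinite set of words "grandfather", "great-grandfather", "great-great-grandfather", and so on: the base case is "grandfather", and the recursive case adds "great-" to any member. The example set and function names are assumptions chosen for illustration.

```python
def is_member(word):
    """Recursive membership test for the set defined by:
    base case:      "grandfather" is a member;
    recursive case: if X is a member, then "great-" + X is a member."""
    if word == "grandfather":
        return True
    if word.startswith("great-"):
        return is_member(word[len("great-"):])
    return False

def members(depth):
    """The first depth + 1 members, built by applying the recursive case."""
    if depth == 0:
        return ["grandfather"]
    previous = members(depth - 1)
    return previous + ["great-" + previous[-1]]

if __name__ == "__main__":
    print(members(2))
    print(is_member("great-great-grandfather"), is_member("great-uncle"))
```

The finite definition (two clauses) describes an infinite set, because the recursive case can be applied again and again.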

Spelling-sound rule=A rule relating graphemes to phonemes. Phonotactics=The possible combinations of phonemes into syllables and words.

Stem (grammar)=The main part of a word that stays the same when endings are added to it: ‘Writ’ is the stem of the forms ‘writes’, ‘writing’ and ‘written’. A stem is the root or roots of a word, together with any derivational affixes, to which inflectional affixes are added.

Suffix (grammar)=A letter or group of letters added to the end of a word to make another word, such as -ly in quickly or -ness in sadness (compare affix, prefix).

Synonym=Words which mean the same (or a similar) thing and can be exchanged for each other (e.g. start/begin, near/close).

Syntagmatic relations=Relations of compositionality which construct larger units out of smaller units, e.g. syllables out of phonemes; expressed by rules and networks.

Tatpurusa (endocentric compounds)=The meaning of these compounds is calculated according to the formula: an AB is a B (e.g. a doghouse is a house).
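
The compound formulas for tatpurusa ("an AB is a B") and dvandva ("an AB is an A and a B") can be applied mechanically. The sketch below turns them into paraphrases; the bahuvrihi formula is not given in these notes, so that type is left out. The function names and the a/an rule (a rough spelling-based heuristic) are assumptions.

```python
def article(word):
    # Rough heuristic: "an" before a written vowel, otherwise "a".
    return "an" if word[0].lower() in "aeiou" else "a"

def paraphrase(compound, a, b, kind):
    """Paraphrase a compound AB with parts A and B using its formula."""
    if kind == "tatpurusa":   # endocentric: an AB is a B
        return f"{article(compound)} {compound} is {article(b)} {b}"
    if kind == "dvandva":     # an AB is an A and a B
        return f"{article(compound)} {compound} is {article(a)} {a} and {article(b)} {b}"
    raise ValueError(f"no formula available for compound type: {kind}")

if __name__ == "__main__":
    print(paraphrase("doghouse", "dog", "house", "tatpurusa"))
    print(paraphrase("singer-songwriter", "singer", "songwriter", "dvandva"))
```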

Taxonomy=The science of classification according to a pre-determined system, with the resulting catalogue used to provide a conceptual framework for discussion, analysis, or information retrieval (from the Greek words taxis 'arrangement' and nomos 'law'). See also Ontology.

Text technology=The interdisciplinary field which involves both text linguistics and computer science, and which is in some respects a specialisation of computational linguistics in the area of computation with text data (e.g. the world-wide web, information retrieval in libraries).

Zero-derivation=A phenomenon that is especially common in English: a word changes its word class without the addition of another morpheme (or, on another analysis, by adding an empty morpheme), e.g. the verb hope in 'I hope' and the noun hope in 'I have hope'.

