About – LEMLAT 3.0

LEMLAT 3.0 is a new and updated version of the Latin Lemmatiser and Morphological Analyser LEMLAT, curated by the Istituto di Linguistica Computazionale “A. Zampolli”, CNR, Pisa, Italy (CNR-ILC) and the Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione, Università Cattolica del Sacro Cuore, Milan, Italy (UCSC-CIRCSE).

LEMLAT 3.0 is a derivative work of LEMLAT 2.0, which in turn reimplemented and extended the original version of LEMLAT. Click here for more details about credits.

The lexical basis of LEMLAT 3.0 results from the collation of three Latin dictionaries (Georges and Georges, 1913-1918; Glare, 1982; Gradenwitz, 1904). It counts 40,014 lexical entries and 43,432 lemmas (there are more lemmas than entries because one entry can include more than one lemma). LEMLAT 3.0 was further enriched with 26,250 of the 28,178 lemmas of the Onomasticon by Forcellini (1940). In total, the lexical basis of LEMLAT 3.0 includes 69,652 lemmas. A large number of graphical variants is also handled.

Given as input a Latin word form that it recognises, LEMLAT 3.0 outputs the corresponding lemma(s) and their morphological features. These features are divided into lemma-level features (PoS and inflectional paradigm, e.g. first conjugation verb) and word-form-level features (gender, number, case, tense, person, etc., e.g. singular nominative).

No contextual disambiguation is performed: when a word form is ambiguous, all of its possible analyses are returned.

For instance, given the word form rosam as input, LEMLAT 3.0 outputs two lemmas, each with its PoS and inflectional paradigm: (a) rosa, -ae (first declension noun, “rose”) and (b) rodo, -ere (third conjugation verb, “to gnaw”).

The word-form-level features that LEMLAT 3.0 assigns to rosam are the following:

lemma rosa, -ae: singular, accusative
lemma rodo, -ere: perfect participle, feminine, singular, accusative
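The two analyses above can be pictured as a small data structure in which lemma-level features (PoS, inflectional paradigm) are kept apart from word-form-level features. The Python sketch below is hypothetical: the class and field names are assumptions made for illustration, not the actual LEMLAT 3.0 output format. Since no contextual disambiguation is performed, a word form maps to a list of analyses rather than to a single one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lemma:
    """Lemma-level features (hypothetical field names)."""
    lemma: str      # citation form, e.g. "rosa, -ae"
    pos: str        # part of speech
    paradigm: str   # inflectional paradigm
    gloss: str


@dataclass
class Analysis:
    """One reading of a word form: a lemma plus word-form-level features."""
    lemma: Lemma
    form_features: dict[str, str] = field(default_factory=dict)


# Hand-written from the rosam example; no disambiguation, so both readings are kept.
ROSAM = [
    Analysis(
        Lemma("rosa, -ae", "noun", "first declension", "rose"),
        {"number": "singular", "case": "accusative"},
    ),
    Analysis(
        Lemma("rodo, -ere", "verb", "third conjugation", "to gnaw"),
        {"verb_form": "perfect participle", "gender": "feminine",
         "number": "singular", "case": "accusative"},
    ),
]

for a in ROSAM:
    feats = ", ".join(a.form_features.values())
    print(f"{a.lemma.lemma} ({a.lemma.paradigm}): {feats}")
```

Keeping the lemma as its own frozen object means that several analyses of different word forms can share the same lemma record, while the form-level features vary per reading.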

To process word forms, LEMLAT 3.0 uses data stored in the tables of an SQL database (“lemlat_db”). See here for more details on the database.

One of the tables stored in “lemlat_db” is “lessario”, which collects the basic lexical components that LEMLAT 3.0 uses to analyse input word forms. These components are called LES (“LExical Segment”). A LES is defined as the invariable part of the inflected forms (e.g. ros for ros-am). In other words, the LES is the sequence (or one of the sequences) of characters that remains unchanged throughout the inflectional paradigm of a lemma; hence, the LES does not necessarily correspond to the word stem.
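As a rough illustration of this definition, the invariable part shared by a set of inflected forms can be approximated by their longest common prefix. This is only a sketch under simplifying assumptions: the LES in “lessario” are stored, not computed on the fly; the form lists are hand-picked subsets written without vowel length; and the helper name invariable_prefix is hypothetical. The second example shows why a lemma may need more than one LES: rodo alternates between rod- and ros-, so a single common prefix is too short to be useful.

```python
def invariable_prefix(forms: list[str]) -> str:
    """Return the longest character sequence shared at the start of all forms."""
    prefix = forms[0]
    for form in forms[1:]:
        n = 0
        # Count matching characters from the left; stop at the first mismatch.
        for a, b in zip(prefix, form):
            if a != b:
                break
            n += 1
        prefix = prefix[:n]
    return prefix


# A subset of the first-declension paradigm of rosa, -ae (macrons omitted).
rosa_forms = ["rosa", "rosae", "rosam", "rosarum", "rosis", "rosas"]
print(invariable_prefix(rosa_forms))

# rodo has two invariable sequences (rod-, ros-), so one prefix is not enough.
rodo_forms = ["rodo", "rodere", "rosi", "rosum"]
print(invariable_prefix(rodo_forms))
```

For rosa the sketch recovers ros, matching the example in the text; for rodo it yields only ro, which is why a paradigm can contribute several LES to the table.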

In the “lessario” table, each LES is assigned an ID and a number of inflectional features, including a tag for the gender of the lemma (for nouns only) and a code (called CODLES) for its inflectional category. The CODLES determines which endings the LES is compatible with, namely those of its inflectional paradigm.
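A toy version of such a table can be sketched with Python's built-in sqlite3 module. The column names (id, les, lemma, codles, gender) and all rows except the ros / N1 / F pairing are assumptions made for illustration; the actual schema of “lessario” in “lemlat_db” may differ, and the codes V3_HYP and V3PP_HYP are invented placeholders, not real CODLES values.

```python
import sqlite3

# In-memory stand-in for the "lessario" table. Column names and every CODLES
# value except N1 are hypothetical, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE lessario (
           id     INTEGER PRIMARY KEY,
           les    TEXT NOT NULL,   -- lexical segment
           lemma  TEXT NOT NULL,   -- lemma the LES belongs to (assumed column)
           codles TEXT NOT NULL,   -- inflectional category code
           gender TEXT             -- nouns only; NULL otherwise
       )"""
)
conn.executemany(
    "INSERT INTO lessario (les, lemma, codles, gender) VALUES (?, ?, ?, ?)",
    [
        ("ros", "rosa", "N1", "F"),         # from the example in the text
        ("rod", "rodo", "V3_HYP", None),    # hypothetical code
        ("ros", "rodo", "V3PP_HYP", None),  # hypothetical code (participle LES)
    ],
)

# All rows whose segment is "ros": one per candidate lemma.
rows = conn.execute(
    "SELECT lemma, codles, gender FROM lessario WHERE les = ? ORDER BY id",
    ("ros",),
).fetchall()
for row in rows:
    print(row)
conn.close()
```

The same LES string can appear in several rows, one per lemma it may belong to; this is what lets a single segment such as ros lead to more than one analysis.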

For instance, the CODLES of the LES ros (when analysed as part of the lemma rosa, -ae) is N1 (first declension nouns), and its gender is F (feminine). The word form rosam is thus analysed as containing the LES ros, because the remaining segment -am is recognised as an ending compatible with a LES assigned CODLES N1.
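The matching step described above can be sketched as follows: try every split of the word form into a candidate LES and a remaining ending, look the LES up, and keep the split only if the ending belongs to the endings allowed by that LES's CODLES. Both tables below are tiny hypothetical dictionaries, and the function name analyse is an assumption; only the LES ros, the CODLES N1 and the ending -am come from the text, while V3_HYP and V3PP_HYP are invented codes. The actual LEMLAT 3.0 procedure and data are considerably richer.

```python
# Hypothetical miniature lexicon: LES -> list of (lemma, CODLES).
LESSARIO = {
    "ros": [("rosa", "N1"), ("rodo", "V3PP_HYP")],  # V3PP_HYP is invented
    "rod": [("rodo", "V3_HYP")],                    # V3_HYP is invented
}

# Hypothetical ending tables: CODLES -> {ending: word-form-level features}.
# Only a few unambiguous endings are listed.
ENDINGS = {
    "N1": {"am": "singular accusative", "arum": "plural genitive"},
    "V3PP_HYP": {"am": "feminine singular accusative",
                 "arum": "feminine plural genitive"},
    "V3_HYP": {"o": "present indicative, first person singular",
               "it": "present indicative, third person singular"},
}


def analyse(form: str) -> list[tuple[str, str, str]]:
    """Return (lemma, CODLES, features) for every LES + ending split that fits."""
    results = []
    for cut in range(1, len(form) + 1):
        les, ending = form[:cut], form[cut:]
        for lemma, codles in LESSARIO.get(les, []):
            features = ENDINGS.get(codles, {}).get(ending)
            if features is not None:  # ending compatible with this CODLES
                results.append((lemma, codles, features))
    return results


for analysis in analyse("rosam"):
    print(analysis)
```

Because every compatible split is kept and nothing is ranked, rosam receives both the noun and the participle reading, mirroring the absence of contextual disambiguation.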

See here for the full LEMLAT 3.0 tagset.

LEMLAT 3.0 is free software. See here for details on the distribution licence.

On the first version of LEMLAT, see:

Bozzi, Andrea, and Giuseppe Cappelli. “A project for Latin Lexicography: 2. A Latin morphological analyzer.” Computers and the Humanities 24.5-6 (1990): 421-426.
Marinone, Nino. “A project for Latin Lexicography: 1. Automatic lemmatization and word-list.” Computers and the Humanities 24.5-6 (1990): 417-420.

On LEMLAT 2.0, see:

Passarotti, Marco. “Development and perspectives of the Latin morphological analyser LEMLAT.” Linguistica Computazionale XX-XXI (2004): 397-414.

On LEMLAT 3.0, see:

Passarotti, Marco, Marco Budassi, Eleonora Litta, and Paolo Ruffolo. “The Lemlat 3.0 Package for Morphological Analysis of Latin.” Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language (2017): 24-31. Gothenburg, Sweden: Northern European Association for Language Technology (NEALT).

On integrating the Onomasticon of Forcellini into the lexical basis of LEMLAT 3.0, see:

Budassi, Marco, and Marco Passarotti. “Nomen Omen. Enhancing the Latin Morphological Analyser Lemlat with an Onomasticon.” Proceedings of the 10th Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities (LaTeCH 2016).