The model definition file
This is a small TypeScript source file that tells the compiler how to build our model.
For wordlist lexical models, the model definition file indicates where to find the TSV source files, and gives us the option to tell the compiler a little more about our language's spelling system, or orthography.
The model definition template
Keyman Developer provides a default model definition similar to the following. If you want to create the file yourself, copy and paste the following template and save it as model.ts. Place this file in the same folder as wordlist.tsv.
/*
  sencoten 1.0 generated from template.
  This is a minimal lexical model source that uses a tab delimited wordlist.
  See documentation online at https://help.keyman.com/developer/ for
  additional parameters.
*/
const source: LexicalModelSource = {
  format: 'trie-1.0',
  sources: ['wordlist.tsv'],
};
export default source;
Let's step through this file, line-by-line.
On the first line after the opening comment, we're declaring the source code of a new lexical model.
const source: LexicalModelSource = {
On the second line, we're saying the lexical model will use the trie-1.0 format. The trie format creates a lexical model from one or more word lists; the trie structures the lexical model so that it can search through thousands of words very quickly.
format: 'trie-1.0',
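To give a feel for why a trie makes lookups fast, here is a minimal sketch of a prefix tree. It is only an illustration of the general data structure, not Keyman's actual implementation; the class names TrieNode and Trie and the method complete are made up for this example.

```typescript
// Illustrative trie only; NOT Keyman's implementation. All names are hypothetical.
class TrieNode {
  children = new Map<string, TrieNode>();
  isWord = false;
}

class Trie {
  private root = new TrieNode();

  // Adds a word, creating one node per character as needed.
  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      let next = node.children.get(ch);
      if (!next) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.isWord = true;
  }

  // Returns every stored word that begins with `prefix`.
  complete(prefix: string): string[] {
    let node = this.root;
    for (const ch of prefix) {
      const next = node.children.get(ch);
      if (!next) return []; // no stored word starts with this prefix
      node = next;
    }
    const results: string[] = [];
    const walk = (n: TrieNode, soFar: string): void => {
      if (n.isWord) results.push(soFar);
      n.children.forEach((child, ch) => walk(child, soFar + ch));
    };
    walk(node, prefix);
    return results;
  }
}

const trie = new Trie();
for (const w of ['the', 'then', 'there', 'tab']) trie.insert(w);
const suggestions = trie.complete('the');
console.log(suggestions);
```

Finding the starting node costs one step per typed character, regardless of how many words are stored, which is why this kind of structure scales well to large wordlists.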
On the third line, we're telling the trie where to find our wordlist.
sources: ['wordlist.tsv'],
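Since the trie format accepts one or more word lists, the sources array could plausibly list several files. The sketch below assumes a second file named names.tsv, which is hypothetical, and assumes that paths are resolved relative to the model definition file. The SketchModelSource interface is a local stand-in so the snippet compiles on its own; in a real model.ts you would use LexicalModelSource as in the template.

```typescript
// Local stand-in type for illustration; a real model.ts uses LexicalModelSource.
interface SketchModelSource {
  format: 'trie-1.0';
  sources: string[];
}

const multiSource: SketchModelSource = {
  format: 'trie-1.0',
  // 'names.tsv' is a hypothetical second wordlist in the same folder.
  sources: ['wordlist.tsv', 'names.tsv'],
};
// A real model.ts would end with an `export default` of this object.
```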
The fourth line marks the termination of the lexical model source code. If we specify any customizations, they must be declared above this line:
};
The fifth line is necessary to allow external applications to read the lexical model source code.
export default source;
Customizing a wordlist lexical model
The template, as described in the previous section, is a good starting point, and may be all you need for your language. However, most languages require a few customizations. The trie-1.0 wordlist model supports the following customizations:
- Punctuation: how to define certain punctuation in your language
- Word breaker: how to determine when words start and end in the writing system
- Search term to key: how and when to ignore accents and letter case
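As a rough sketch of how these three customizations might sit together in a model definition, consider the following. The property names punctuation, wordBreaker, and searchTermToKey follow the customization names above, but the exact shapes and allowed values shown here are assumptions; check them against the LexicalModelSource interface before relying on them. The stand-in interfaces exist only so the snippet compiles on its own.

```typescript
// Stand-in types for illustration only. In a real model.ts the compiler
// supplies LexicalModelSource; the shapes below are assumptions.
interface SketchPunctuation {
  quotesForKeepSuggestion: { open: string; close: string };
  insertAfterWord: string;
}
interface SketchModelSource {
  format: 'trie-1.0';
  sources: string[];
  punctuation?: SketchPunctuation;
  wordBreaker?: 'default' | 'ascii';
  searchTermToKey?: (term: string) => string;
}

// Search term to key: ignore letter case and combining accents (U+0300..U+036F)
// so that "Café" and "cafe" look up the same entry.
function toKey(term: string): string {
  return term
    .normalize('NFD')                // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase();
}

const customSource: SketchModelSource = {
  format: 'trie-1.0',
  sources: ['wordlist.tsv'],
  // Punctuation: which quotes and trailing character suit the language.
  punctuation: {
    quotesForKeepSuggestion: { open: '«', close: '»' },
    insertAfterWord: ' ',
  },
  // Word breaker: which rule decides where words start and end.
  wordBreaker: 'default',
  searchTermToKey: toKey,
};

console.log(toKey('Café'));
```

Whether to strip accents at all is a per-language decision: in orthographies where a diacritic distinguishes two different words, a key function like this one would make them collide in search.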
To see all of the things possible in a model definition file, see the LexicalModelSource interface.