...
The FirstVoices Unicode 'Confusables' Database is a collection of Unicode characters found primarily within BC-based Indigenous languages, alongside the Unicode characters that can be mistakenly used in place of them. As an example:
ḵ |
---|
Latin Small Letter K With Line Below |
\u1E35 |
is a character found in many Indigenous alphabets. Some Unicode characters commonly mistaken for this one are:
k̲ | k̠ | ḵ |
---|---|---|
Latin Small Letter K + Combining Low Line | Latin Small Letter K + Combining Minus Sign Below | Latin Small Letter K + Combining Macron Below |
\u006B\u0332 | \u006B\u0320 | \u006B\u0331 |
...
What is the FirstVoices Unicode 'Adjacency Pair' Database?
...
Here is one example adjacency pair by codepoint, involving one lowercase letter and one combining diacritic:
k̓ |
---|
Latin Small Letter K + Combining Comma Above |
\u1E35 + \u0313 |
Because it involves a combining diacritic, it is also an example of a grapheme.
Here is one example adjacency pair by grapheme, which involves a total of two non-combining characters:
k̓w |
---|
Latin Small Letter K, Combining Comma Above + Latin Small Letter W |
\u1E35\u0313 + \u0077 |
...
Where is the database?
...
The FirstVoices Unicode Databases can be found here.
...
Why did we curate these databases?
...