...

The FirstVoices Unicode 'Confusables' Database is a collection of Unicode characters found primarily within BC-based Indigenous languages, alongside the Unicode characters that can be mistakenly used in place of them. As an example:

Latin Small Letter K With Line Below

\u1E35

is a character found in many Indigenous alphabets. Some Unicode characters commonly mistaken for this one are:

Latin Small Letter K + Combining Low Line: \u006B\u0332

Latin Small Letter K + Combining Minus Sign Below: \u006B\u0320

Latin Small Letter K + Combining Macron Below: \u006B\u0331
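As a rough illustration of how entries like these might be used, the sketch below replaces each confusable sequence with the intended character. The `CONFUSABLES` table and the `fix_confusables` function are hypothetical names invented for this example; they are not the database's actual format or API. The sketch also shows why a dedicated table is useful: standard Unicode NFC normalization already maps K + Combining Macron Below to \u1E35, but it leaves the other two sequences unchanged.

```python
import unicodedata

# The intended character: LATIN SMALL LETTER K WITH LINE BELOW.
TARGET = "\u1E35"

# Hypothetical in-memory stand-in for a few database entries:
# each confusable sequence maps to the character it is mistaken for.
CONFUSABLES = {
    "\u006B\u0332": TARGET,  # k + COMBINING LOW LINE
    "\u006B\u0320": TARGET,  # k + COMBINING MINUS SIGN BELOW
    "\u006B\u0331": TARGET,  # k + COMBINING MACRON BELOW
}


def fix_confusables(text, table=CONFUSABLES):
    """Replace every confusable sequence in `text` with its intended character."""
    for wrong, right in table.items():
        text = text.replace(wrong, right)
    return text


# NFC composes only the canonically equivalent sequence (k + macron below).
nfc_macron = unicodedata.normalize("NFC", "\u006B\u0331")
nfc_low_line = unicodedata.normalize("NFC", "\u006B\u0332")

# The table-based fix handles all three sequences.
fixed = [fix_confusables(seq + "a") for seq in CONFUSABLES]
```

Here `nfc_macron` equals the intended character, `nfc_low_line` is left as two codepoints, and every item in `fixed` is the intended character followed by "a".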

...

What is the FirstVoices Unicode 'Adjacency Pair' Database?

...

Here is one example adjacency pair by codepoint, involving one lowercase letter and one combining diacritic:

Latin Small Letter K + Combining Comma Above

\u006B + \u0313

Because the combining diacritic attaches to the letter before it, this pair also forms a single grapheme.

Here is one example adjacency pair by grapheme, which involves a total of two non-combining characters:

k̓w

Latin Small Letter K, Combining Comma Above + Latin Small Letter W

\u006B\u0313 + \u0077
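The difference between the two kinds of pairs can be sketched in a few lines of Python. The `graphemes` and `adjacency_pairs` helpers are hypothetical names made up for this illustration, and the grouping rule (a base character plus any combining marks that follow it) is a simplifying assumption, not full Unicode grapheme-cluster segmentation and not necessarily how the database was built.

```python
import unicodedata


def graphemes(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption: a character belongs to the previous cluster
    when its canonical combining class is non-zero.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def adjacency_pairs(units):
    """Return every pair of neighbouring units, in order."""
    return list(zip(units, units[1:]))


# Latin Small Letter K, Combining Comma Above, Latin Small Letter W
word = "\u006B\u0313\u0077"

# Pairs by codepoint: (k, comma above) and (comma above, w).
codepoint_pairs = adjacency_pairs(list(word))

# Pairs by grapheme: (k with comma above, w).
grapheme_pairs = adjacency_pairs(graphemes(word))
```

With this input, `codepoint_pairs` holds two pairs, the first of which is the letter-plus-diacritic pair shown above, while `grapheme_pairs` holds the single pair of two non-combining units.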

...

Where is the database?

...

The FirstVoices Unicode Databases can be found here.

...

Why did we curate these databases?

...