Data sovereignty in Indigenous contexts is the principle that data originating from an Indigenous community should be owned and controlled by that community. It includes the right of a nation to govern the collection, ownership, and application of its own language data. Technology can complicate this, which is why FirstVoices takes data sovereignty seriously from both a legal and a technical standpoint.
What does data sovereignty on FirstVoices look like?
From a legal standpoint, data sovereignty is built into our copyright terms:
"All materials on this site are protected by copyright laws and are owned by the individual Indigenous language communities who created the archival content. Language and multimedia data available on this site is intended for private, non-commercial use by individuals. Any commercial use of the language data or multimedia data in whole or in part, directly or indirectly, is specifically forbidden except with the prior written authority of the owner of the copyright. Users may, subject to these Terms and Conditions, print or otherwise save individual pages for private use. However, language and/or multimedia data may not be modified or altered in any respect, merged with other data or published in any form, in whole or in part. The prohibited uses include "screen scraping," "database scraping" and any other activity intended to collect, store, reorganize or manipulate data on the pages produced by, or displayed on the FirstVoices websites."
From a technology standpoint, we ensure data sovereignty by hosting all FirstVoices content on servers located in Canada, so that all data is protected by Canadian data privacy laws. These laws provide a higher level of protection than laws in the United States, where no single unifying law governs data privacy.
Other aspects of data sovereignty in your Language Technology Program (LTP) project
Data sovereignty also applies to personal and community storage of information. While FirstVoices is a useful tool for sharing language, it is not a long-term repository or archive. How you choose to store and save your master files and other language content is part of your community's and team's data sovereignty as well.
It is important to consider where your data is stored and who has access to it. Some options have more benefits than others. For example:
If you are using a cloud storage option (like iCloud), check where the physical servers are located, because the privacy laws that apply to your data may depend on the country (e.g. the United States).
Tools like Dropbox and Google Drive are not secure long-term storage solutions, because of their companies' terms and agreements and the possibility of data loss.
Some of these sites are helpful for sharing information among your team and other community members, but they are not real databases. Documents stored in them are harder to organize and search than those in a library or archive.
A great way to maintain data sovereignty is to set up your own server for language information, housed in community. If you are working with an IT or digital assets management organization, consider talking through these questions:
Ask them how they house data
Ask whether there are options to store information locally or closer to community