(New) Managing Your Data for Your Project



Introduction




Data is central to any technology-based project. It is both the building block of your tools and the information you share and use to exchange language.

How we think about data is also important. Data is not something we can easily see by itself, but it makes up the images, audio, and platform you host content on.

For these reasons, we can think about data management and maintenance as a form of care and safeguarding.

To make sure our tools and projects continue to run smoothly and remain accessible, there needs to be stewardship and continued action to ensure the gears keep turning (so to speak).

The following article outlines ways to think about data and plan a strategy to maintain and record information about data (also known as metadata). At every step of your project, you can address and work with data differently, depending on how you plan to use it.

Frequently Used Terms

In this article, there are many terms and themes around data and technology. Some of them might be more common than others.

If you are unsure about any wording or terms, please consult these resources that describe many words relating to language and technology that would be considered 'jargon' or specialized vocabulary:



Recording Practices




Recording audio is an interactive and often fun task involved with FirstVoices projects and other language initiatives.

Some ways to include data maintenance in your recording practices are:

  • Stating the time, date, place, environment, people with you, and session number at the start of your recording

This information is a type of metadata and will be helpful for you when reviewing the audio later.

  • Assessing your devices and the recording standards you are using

The type of device you are using to record will influence how detailed your recordings will be, which relates to how much data you collect.

The settings you record with will also influence how much data and other information are stored in a recording. The higher the sample rate (measured in Hertz, or Hz) and the bit depth (measured in bits), the more data and detail you will record. Sometimes, it is not necessary to record in such detail.

A good standard is a sample rate of 48,000 Hz (48 kHz) and a bit depth of 24 bits, both of which can be set in a software program called Audacity.
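To get a feel for how these settings relate to the amount of data you collect, the arithmetic can be sketched in a few lines of Python. This is only an estimate under stated assumptions: it covers uncompressed (PCM) audio such as WAV files, ignores the small file header, and the function name wav_size_bytes is made up for this illustration.

```python
def wav_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Estimate the size of uncompressed audio, ignoring the file header."""
    bytes_per_sample = bit_depth // 8  # 24 bits -> 3 bytes
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


# One minute of mono audio at the standard suggested above
recommended = wav_size_bytes(48_000, 24, channels=1, seconds=60)

# One minute of mono audio at a lower setting (44,100 Hz, 16-bit) for comparison
lower = wav_size_bytes(44_100, 16, channels=1, seconds=60)

print("48 kHz / 24-bit:", recommended / 1_000_000, "MB per minute")
print("44.1 kHz / 16-bit:", lower / 1_000_000, "MB per minute")
```

At the suggested standard, one minute of mono audio works out to about 8.64 MB, so an hour-long session is roughly half a gigabyte. This can help when planning how much storage your project will need.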

Related resources to this topic can be found here:



Labelling




Labelling might not seem like a big deal. However, once you start recording lots of audio and need to store and find individual recordings out of thousands, labelling is helpful in your workflow.

How you manage your files includes many aspects of data.

File naming conventions are a type of metadata. The speaker, date of recording, session, topic, and file type are all examples of information you can record in a file name.
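As one possible illustration of a naming convention, the sketch below builds a file name from the speaker, date, session, and topic, and reads the same details back out of it. The pattern itself (speaker_date_session_topic.filetype) and the function names are assumptions made for this example, not a FirstVoices requirement. The sketch also reduces labels to plain letters and numbers, which your team may not want if you prefer to keep characters from your language in file names.

```python
import re
from datetime import date


def clean(text):
    """Lower-case a label and replace anything that is not a letter or digit with '-'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()


def build_filename(speaker, recorded_on, session, topic, filetype="wav"):
    """Join the parts with underscores, e.g. speaker_2024-05-14_s03_topic.wav."""
    return f"{clean(speaker)}_{recorded_on.isoformat()}_s{session:02d}_{clean(topic)}.{filetype}"


def parse_filename(name):
    """Recover the metadata stored in a file name built by build_filename."""
    stem, filetype = name.rsplit(".", 1)
    speaker, recorded_on, session, topic = stem.split("_")
    return {
        "speaker": speaker,
        "date": recorded_on,
        "session": int(session.lstrip("s")),
        "topic": topic,
        "filetype": filetype,
    }


name = build_filename("Mary Smith", date(2024, 5, 14), 3, "Animal names")
print(name)
print(parse_filename(name))
```

Writing the date as year-month-day means files sort in the order they were recorded when listed alphabetically.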

How you arrange and track this recording information is another aspect of data management and maintenance. We can call this our audio inventory.
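An audio inventory can be as simple as a spreadsheet. The sketch below shows one way to keep it as a CSV file that any spreadsheet program can open. The column names and the example rows are assumptions chosen for illustration; use whatever details matter to your project.

```python
import csv
import io

# Hypothetical columns for the inventory; adjust to suit your project
FIELDS = ["filename", "speaker", "date", "place", "session", "topic"]


def write_inventory(rows, handle):
    """Write one line per recording, with a header row of column names."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def find_by_speaker(handle, speaker):
    """Return the file names of every recording made with the given speaker."""
    return [row["filename"] for row in csv.DictReader(handle) if row["speaker"] == speaker]


rows = [
    {"filename": "speaker-a_2024-05-14_s01_greetings.wav", "speaker": "Speaker A",
     "date": "2024-05-14", "place": "Language office", "session": 1, "topic": "Greetings"},
    {"filename": "speaker-b_2024-05-15_s01_animals.wav", "speaker": "Speaker B",
     "date": "2024-05-15", "place": "Language office", "session": 1, "topic": "Animals"},
]

# An in-memory buffer stands in for a real file such as open("inventory.csv", "w", newline="")
buffer = io.StringIO()
write_inventory(rows, buffer)
buffer.seek(0)
print(find_by_speaker(buffer, "Speaker A"))
```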

Related resources to this topic can be found here:



Checksums




Checksums are another aspect of data that is often used in archiving. They are also helpful to adopt in your FirstVoices project. To use checksums, you will need to download and install additional software. We recommend BWF MetaEdit.

A checksum is a short value calculated from the contents of a file. If the file changes, even slightly, the checksum calculated from it changes too, so comparing checksums tells you whether there have been changes in your file. These changes occur for many reasons, including naturally over time.
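To show the idea behind a checksum, here is a minimal Python sketch that calculates one for a file and shows that changing a single character produces a different value. It is an illustration only and not a replacement for BWF MetaEdit: it assumes the SHA-256 algorithm and checks the whole file, and the function name file_checksum is made up for this example.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256"):
    """Read a file in pieces and return its checksum as a hexadecimal string."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration with a throwaway file standing in for a real recording
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.wav")
    with open(path, "wb") as handle:
        handle.write(b"original audio data")
    before = file_checksum(path)

    # Simulate a tiny change to the file (one character is different)
    with open(path, "wb") as handle:
        handle.write(b"original audio dat4")
    after = file_checksum(path)

print("Checksum before:", before)
print("Checksum after: ", after)
print("File unchanged? ", before == after)
```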

Once your data goes through enough changes, it might become corrupted or unusable.

For this reason, checksums can give you a heads-up about when you might want to replace a file with a backed-up copy that has fewer changes.

These changes are sometimes called bit rot or data degradation. Bit rot cannot be fully prevented, as it can simply happen as files age. Backing up your data is a solid strategy to make sure that the effects of bit rot do not slow down or hinder your progress.

One important step in using checksums is reviewing and monitoring changes. It will be important to establish a procedure and schedule for running BWF MetaEdit on your files again to check on them. The software will also tell you if there have been changes to the files’ checksums that have already been embedded.
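The review step can be pictured as keeping a list of checksums and comparing your files against that list on a schedule. The sketch below is one way to do this in Python, under the same assumptions as any general-purpose script: it uses SHA-256 on whole files, and the function names are invented for this example. BWF MetaEdit embeds checksums inside the audio files themselves, which this sketch does not do.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Record a checksum for every file in the folder (including subfolders)."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def verify_manifest(folder, manifest):
    """Compare the folder as it is today against previously recorded checksums."""
    current = build_manifest(folder)
    return {
        "changed": sorted(n for n in manifest if n in current and current[n] != manifest[n]),
        "missing": sorted(n for n in manifest if n not in current),
        "new": sorted(n for n in current if n not in manifest),
    }
```

In practice, you would run build_manifest once, save the result somewhere outside the folder (for example with json.dump), and then run verify_manifest on your chosen schedule. Any file listed under "changed" or "missing" is a candidate to restore from a backup.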

Related resources to this topic can be found here:



Backups & Storage




Backups are copies of your data that are stored in multiple settings or locations to be used in a pinch and in case of data loss.

Depending on the amount of data you are backing up, it may be faster to enable automatic backups. Some software programs like ELAN, which is used for transcribing audio and videos, can back up automatically if you change the settings within them.

Some programs that will help you make backups quickly on your computer include:

  • FBackup: free backup software
  • Areca: free, open-source backup software

A backup is also only useful if it has copies of itself and if its storage space is protected.

When creating backups, it is important to keep in mind these factors:

Copies

A backup is not a backup if there is only one of them! For example, if this one backup is lost, then there is no backup, and you would be back to square one.

You will need to make multiple copies to strengthen your backup plan. Three backups is a good number to have.
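If you are comfortable with a little scripting, making several copies can be automated. The following Python sketch copies one file into each folder you list and checks that every copy matches the original. It is a simplified illustration, not a full backup program: the function name back_up is invented here, and the folders you pass in would be your own locations, such as an external drive or a synced cloud folder.

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source, destination_folders):
    """Copy one file into each folder and confirm each copy matches the original."""
    source = Path(source)
    copies = []
    for folder in destination_folders:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)  # create the folder if needed
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps the file's timestamps
        # Compare the full contents, not just the size and date
        if not filecmp.cmp(source, target, shallow=False):
            raise OSError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies
```

For example, back_up("recording.wav", ["E:/backups", "F:/backups", "C:/Users/you/CloudDrive/backups"]) would aim for three copies, where the three paths are placeholders for your own storage locations.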

Type of Storage

The type of storage includes the settings and devices you store data on. Some cheaper options include flash drives and portable hard drives, which can store some data, but not as much as a server or cloud storage.

However, even with some of these more expensive, larger storage solutions, there are other factors to consider. If a cloud service has its servers located outside of Canada, the laws of other countries may apply to the data stored there. If you have your own server for storage, be sure that it has the latest software and updates to reduce its vulnerability to hackers or ransomware.

Security

These storage considerations also relate to data security. Keep your data secure by applying passwords to storage systems and asking IT professionals in your community or organization (if applicable) whether there are other measures that the Band takes to prevent data loss or theft.

Combining methods such as staying aware of phishing scams, adding passwords, and using 2-step verification is helpful to protect your data.

Environment

Where you store data is also very important.

If you are in a location that is at risk of flooding or wildfires, investing in a waterproof & fireproof safe for hard drives and servers is advised. Generally, you want to store at least one backup off-site.

Keeping servers and archives in a cool, dry, temperature-controlled space is also suggested. Heat can increase the risk of data loss, and moisture with heat can lead to mould, which can also damage storage devices and materials.

If you write down information and have manuscripts or other paper-based materials, be sure to keep your workspace and storage pest-free! You do not want to open your maintenance closet and find a squirrel has gotten into your audio-cassette tapes, camera equipment, and memory cards.  

Access

Finally, who can access your data and where they can access it are also important questions to consider.

Some third-party storage solutions may have off-site servers or charge your organization every time you access your backups. Sometimes, third-party organizations will provide you with an on-site server and free service requests as part of your subscription. If you use a professional storage provider, it will be necessary to ask about access to storage and the provider’s related data management policies.

In general, having backups be accessible is helpful in case you need to restore files. Needing to restore files due to bit rot can be a stressful situation. You do not want to add to that stress by having to locate or search for your backups.

Backups should always be easy to locate and be available to language technology workers for their use.

Related resources to this topic can be found here:



Check Before You Tech




Another resource you can use to evaluate and consider your options before beginning your language technology project is the Check Before You Tech document from the First Peoples’ Cultural Council.

This informative document includes checklists and insights about language apps and software. You can use it to create dialogue with your team and as a self-reflection tool. Understanding your own expectations and capacity involving data control, ownership, access, and possession is important as you begin your project.

Read more here: Check Before You Tech

Data Maintenance Checklist

The following checklist will also help you organize and maintain your data for your FirstVoices or language technology project.

Download the PDF here: