Controlled Vocabularies & File Naming


Introduction


Organizing and filing your resources in a retrievable way is important for you and for interested community members.

When storing your files, especially recordings of speakers, it is important to have a well-organized and consistent file management system in place. This helps support your workflow and makes sure that someone who inherits your data will be able to look through and use your files.


File Naming


If you are working on a FirstVoices project, you may be aware that one of the first steps in data management is to set up a file naming convention.

This will be the rule you follow when you name different types of files. The most important thing when determining a file naming convention is consistency. If you are consistent when naming files, it will be easier to organize them and find them.

The following are examples of file naming conventions that we recommend using for FirstVoices projects (e.g., speaker’s initials_date of recording_topic of recording).

Master Files​

[Speaker’s Initials]_[Date]_[Session number/Topic].wav

Example: KF_2019-08-20_Greetings.wav​

Word Files

[Speaker’s Initials]_[Date]_[Word/Phrase].wav​

Example: KF_2019-08-20_Hello.wav​
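
The two patterns above can also be written as a small helper. This is only a minimal Python sketch, not part of FirstVoices: the function name `build_filename` and its parameters are hypothetical, and it assumes dates are always written as YYYY-MM-DD, as in the examples.

```python
from datetime import date


def build_filename(initials, recorded_on, label, extension="wav"):
    """Join the parts as [Speaker's Initials]_[Date]_[Label].[extension]."""
    # isoformat() writes the date as YYYY-MM-DD, matching the examples above.
    return f"{initials}_{recorded_on.isoformat()}_{label}.{extension}"


print(build_filename("KF", date(2019, 8, 20), "Greetings"))  # master file
print(build_filename("KF", date(2019, 8, 20), "Hello"))      # word file
```

Writing the date year-first means that files from the same speaker sort in chronological order when they are listed by name.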

File names are sensitive to certain characters. Our system cannot process file names that contain special characters (e.g., č, ł, ā), spaces, or punctuation other than - and _.
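
If you want to check names before uploading, a rule like this can be tested automatically. The sketch below is an illustration only, not the check FirstVoices itself runs. It assumes that "special characters" means anything other than unaccented Latin letters and digits, and that the single dot before the file extension is allowed. The name `is_safe_filename` is hypothetical.

```python
import re

# Assumption: only unaccented Latin letters, digits, - and _ are allowed,
# followed by one dot and a plain extension such as "wav".
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_safe_filename(filename):
    """Return True if the whole file name uses only the allowed characters."""
    return SAFE_NAME.fullmatch(filename) is not None


for name in ["KF_2019-08-20_Hello.wav", "KF 2019-08-20 Hello.wav"]:
    print(name, is_safe_filename(name))
```

The first name passes and the second fails because it contains spaces.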


Controlled Vocabularies


For digitization projects, and for archiving in general, another common practice is to develop a controlled vocabulary or thesaurus system.

A controlled vocabulary is the agreed set of standard names and terms for the topics, subtopics, and other important navigational themes in your inventory.

Controlled vocabularies are based on the logical order in which things are grouped.

Check out this example below:

Controlled vocabulary of Canines

Another name for a controlled vocabulary is a ‘taxonomy’, like the one used to classify the animal kingdom. It is the same concept.

 

In this controlled vocabulary, we see that there is a larger group name for the different members that all share some similar characteristics.

The subtopics and categories then branch off as the features that make each member unique become more specific.
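
One way to picture this branching is as a nested structure, where each term holds its narrower terms. The Python sketch below is an illustration only: the terms are hypothetical and may not match the figure, and `list_paths` is a made-up helper name.

```python
# Hypothetical terms for illustration; they may not match the figure.
VOCABULARY = {
    "Canines": {
        "Dogs": {},
        "Wolves": {"Grey wolf": {}, "Red wolf": {}},
        "Foxes": {"Red fox": {}, "Arctic fox": {}},
    }
}


def list_paths(tree, parents=()):
    """Return every term as a path from the broadest group down to it."""
    paths = []
    for term, narrower in tree.items():
        path = parents + (term,)
        paths.append(" > ".join(path))
        # Recurse into the narrower terms, carrying the path so far.
        paths.extend(list_paths(narrower, path))
    return paths


for line in list_paths(VOCABULARY):
    print(line)
```

Each printed line runs from the broadest group to a more specific member, for example "Canines > Foxes > Arctic fox".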

As another example, the categories on FirstVoices act as a controlled vocabulary. Looking at the ‘Body’ category, you can see how it is subdivided:

 

For your archiving and DiGI purposes, it may be beneficial to come up with a controlled vocabulary of the topics, themes, and subdivisions that you will want to use to order your information.

Combined with a solid file naming convention, a controlled vocabulary lets you index and look up information much faster, both by theme and by date and speaker.

Here is an example of a controlled vocabulary with categories more relevant to your work:
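
To show how consistent names make this kind of lookup possible, here is a minimal Python sketch. It assumes every file follows the [Speaker’s Initials]_[Date]_[Topic] pattern described earlier. The function names `parse_filename` and `index_by` are hypothetical.

```python
from collections import defaultdict


def parse_filename(filename):
    """Split 'KF_2019-08-20_Greetings.wav' into speaker, date and topic."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    # Split on the first two underscores only, so a topic may contain "_".
    speaker, recorded_on, topic = stem.split("_", 2)
    return {"speaker": speaker, "date": recorded_on, "topic": topic}


def index_by(filenames, field):
    """Group file names by one field: 'speaker', 'date' or 'topic'."""
    index = defaultdict(list)
    for name in filenames:
        index[parse_filename(name)[field]].append(name)
    return dict(index)


files = ["KF_2019-08-20_Greetings.wav", "KF_2019-08-20_Hello.wav"]
print(index_by(files, "speaker"))
print(index_by(files, "topic"))
```

A name that does not follow the pattern raises a `ValueError` in `parse_filename`, which is one quick way to spot files that were named inconsistently.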

Check out some Frequently Asked Questions (FAQs) below on both file naming & controlled vocabularies:


File Naming FAQ


Question (Q)

Answer (A)

Should my filenames be in English or my language?

This is up to you. Some teams prefer to use English to avoid special characters (see next question).

How can I write my file names in my language if I can't use special characters?

Work with your team to agree on character conversion conventions. For each special character you need, choose a replacement that uses only plain Latin letters (and, if needed, - or _), and use that replacement whenever you write your file names.

Examples:

Ł → l_

ƛ̓ → tl-

č → cv

ā → aa
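
Once your team has agreed on its conversions, they can be applied consistently with a small lookup table. The Python sketch below uses only the four example conversions above; your own table will differ. The names `CONVERSIONS` and `to_filename_safe` are hypothetical. The characters are written as Unicode escapes so that the combining mark in ƛ̓ is not lost when the code is copied.

```python
import unicodedata

# The four example conversions from the text; replace with your team's own.
CONVERSIONS = {
    "\u0141": "l_",          # Ł
    "\u019b\u0313": "tl-",   # ƛ̓ (ƛ followed by a combining comma above)
    "\u010d": "cv",          # č
    "\u0101": "aa",          # ā
}


def to_filename_safe(text):
    """Replace each special character with its agreed Latin-only form."""
    # NFC joins a base letter and a combining mark (c + caron) into the single
    # character used as a key above, where such a character exists.
    text = unicodedata.normalize("NFC", text)
    # Replace longer keys first so multi-character keys are matched whole.
    for special in sorted(CONVERSIONS, key=len, reverse=True):
        text = text.replace(special, CONVERSIONS[special])
    return text


print(to_filename_safe("\u010d\u0101"))  # prints "cvaa"
```

This table lists only the capital Ł, as in the example above. Lowercase letters need their own entries if your team uses them.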

Why should I include the speaker's name and the date?

This information provides context for others working with your files (this could be others on your team or someone who inherits your data in the future). This is important for archival reasons too.

It also helps to maintain the legacy of those who have contributed to language work in your community, ensuring that future language workers can track who worked on the project and when.


Controlled Vocabulary FAQ


Question (Q)

Answer (A)

Are file names and controlled vocabularies the same?

Not exactly. While they can be similar, they do not necessarily have to be.

The file name and unique ID (UID) can differ. Controlled vocabulary codes can be used just to reference physical resources, or also as higher-level folder names that direct people to the right topics.

Can controlled vocabularies change?

Of course! Just as our knowledge and understanding grow and change, terms can be added or rearranged to reflect contemporary or remembered categories and relationships.

The Māori controlled vocabulary Ngā Upoko Tukutuku is a great example of an ever-expanding thesaurus of terms: https://natlib.govt.nz/librarians/nga-upoko-tukutuku

Are controlled vocabularies in our language?

Yes, if you like!

Since controlled vocabularies do not need to be file names, they can be entirely in your language. It is helpful to create quick reference codes for the Indigenous language terms (e.g., Hunting > HNT) so that they can be used and understood in file names if desired.
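
A quick reference coding can be kept as a simple lookup table from each term to its short code. In the Python sketch below, only Hunting > HNT comes from the example above; the other entries and the names `TERM_CODES` and `code_for` are hypothetical. The terms on the left could just as well be written in your language, since only the codes need to be safe for file names.

```python
# Only "Hunting" -> "HNT" comes from the text; the other entries are hypothetical.
TERM_CODES = {
    "Hunting": "HNT",
    "Fishing": "FSH",
    "Greetings": "GRT",
}


def code_for(term):
    """Look up the short code your team agreed on for a vocabulary term."""
    try:
        return TERM_CODES[term]
    except KeyError:
        # A missing term usually means the table needs a new entry.
        raise KeyError(f"No code agreed for term: {term!r}") from None


# Hypothetical file name that uses the code in place of the full topic.
print(f"KF_2019-08-20_{code_for('Hunting')}.wav")
```

Keeping the table in one shared place helps everyone on the team read a code in a file name the same way.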

These diagrams can help illustrate the FAQs above:


File Naming vs Controlled Vocabulary


Let’s compare the two systems visually, using the information above, and look at how to implement them:

There are various pros and cons to using the different convention systems.

Your DiGI Technicians may use file naming more than the Archivists on staff, who arrange and catalogue the results of others' work.

However, file naming is hard to avoid when working with technology, so all members of the team should be aware of how you have collectively agreed to name files.

Ultimately, file naming and controlled vocabularies are closely linked. With consistent file naming, patterns form.

From these patterns, it becomes possible to work out a logical, ‘nesting doll’ system of topics and themes: a controlled vocabulary.

For more information and suggestions about community consultation, please see this article on the sister FirstVoices Knowledge Base: File management best practices

 

For more information, check out these resources: