Common File-Types & Formats


Introduction


You will interact with and use many different file types and formats on your language technology journey.

This article contains introductory information on some common file-types and formats. Learn more about how and when you would expect to use each of them.


Terminology


File-Type

File-types and formats are technically different things. In this article, we use the two terms mostly interchangeably.

Strictly speaking, a file-type refers to the file's extension or the application associated with the file.

File Format

A file format describes how the file is structured, what metadata it carries, and how its information is stored.
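To make the distinction concrete, here is a minimal Python sketch. It treats the extension as the "file-type" and reads the first few bytes of the file to identify the actual "format". The function names are hypothetical, and only a few well-known signatures are included: PNG files start with the bytes 89 50 4E 47 0D 0A 1A 0A, JPEG files with FF D8 FF, and WAV files are RIFF containers with "WAVE" at byte 8.

```python
from pathlib import Path

# Well-known "magic number" signatures at the start of a few formats.
# This short list is illustrative, not exhaustive.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}

def file_type_from_name(name: str) -> str:
    """File-type in the loose sense: the extension, lowercased, without the dot."""
    return Path(name).suffix.lower().lstrip(".")

def format_from_bytes(data: bytes) -> str:
    """File format in the stricter sense: what the bytes actually contain."""
    # WAV files are RIFF containers: "RIFF" at offset 0, "WAVE" at offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    for fmt, signature in SIGNATURES.items():
        if data.startswith(signature):
            return fmt
    return "unknown"
```

For example, a PNG image that someone renamed to photo.jpg would give "jpg" from file_type_from_name but "png" from format_from_bytes, because renaming a file changes its extension, not its structure.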

Lossy

A lossy file has been compressed irreversibly: "non-essential" information has been discarded and cannot be recovered. Lossy compression reduces file size and saves storage space, so these files are usually better for sharing or streaming.

Lossless

A lossless file keeps all of its original data, so it can be fully reconstructed. Lossless files are larger, but they are better for editing.

Compression

Compression means representing information with fewer bits. It shrinks the file so it is easier to store and share. Compression and lossiness often go together, but not always: PNG image files, for example, are compressed but lossless.
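The difference between lossless and lossy compression can be sketched in a few lines of Python. The lossless half uses the standard zlib module (DEFLATE, the same compression method PNG uses internally); the lossy half is a toy illustration, not a real audio or image codec, in which rounding sample values throws detail away for good.

```python
import zlib

# Lossless: zlib shrinks repetitive data and decompresses
# back to exactly the original bytes.
original = b"aaaaabbbbbcccccaaaaabbbbbccccc" * 20
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(packed) < len(original))  # True
print(restored == original)         # True

# Lossy (toy example): rounding to the nearest 10 discards detail.
# The rounded values are simpler to store, but the originals are gone.
samples = [101, 99, 104, 97, 102, 98]
rounded = [round(s, -1) for s in samples]
print(rounded)  # [100, 100, 100, 100, 100, 100]
```

Nothing in rounded tells you that the first sample was 101, which is what "irreversible" means in practice.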


Types of Files


How to Read the Chart 

Check out what each column means below: 

File Extension 

The three- or four-character code after the dot in a file's name, which you can use to identify the file type or format on your device.

File Name 

The full name of the file-type or format.

Media 

The type of content the file stores (e.g., audio, audio/video, text, etc.).

Lossy or Lossless?

Whether the format irreversibly discards non-essential data to reduce its size (lossy) or keeps all of the original data (lossless). This is not quite the same as compression: a file can be compressed and still be lossless, as PNG files are.

Best For

The tasks each format is best suited to.

File Extension | File Name | Media | Lossy or Lossless? | Best For
.wav | Waveform Audio File | Audio | Lossless | Archiving/Editing
.mp3 | MPEG-1 Audio Layer III | Audio | Lossy | Uploading/Sharing
.mp4 | MPEG-4 Part 14 | Audio/Video | Lossy | Archiving/Editing/Uploading/Sharing
.tiff or .tif | Tag Image File Format | Image | Lossless | Archiving
.mov | QuickTime File Format | Audio/Video | Lossless | Archiving/Editing (Mac/Apple)
.avi | Audio Video Interleave | Audio/Video | Lossless | Archiving/Editing (PC/Windows)
.jpeg or .jpg | Joint Photographic Experts Group | Image | Lossy | Uploading/Sharing
.png | Portable Network Graphics | Image | Lossless | Archiving/Editing/Uploading/Sharing
.aup | Audacity Project File | Software Specific | Lossless | Audacity Software Editing
.eaf | ELAN Annotation Format | Software Specific | Lossless | ELAN Software Editing
.txt | Plain Text File | Text | Lossless | Basic Text
.srt | SubRip Text File | Text | Lossless | Subtitles
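If you sort many files at once, the chart can be turned into a simple lookup. This is a minimal Python sketch with a hypothetical look_up function; the values are copied from the chart, and treating .jpg as the same format as .jpeg is an added assumption (a common alternate spelling).

```python
from pathlib import Path

# Rows from the chart: extension -> (media, lossy or lossless, best for).
CHART = {
    ".wav": ("Audio", "Lossless", "Archiving/Editing"),
    ".mp3": ("Audio", "Lossy", "Uploading/Sharing"),
    ".mp4": ("Audio/Video", "Lossy", "Archiving/Editing/Uploading/Sharing"),
    ".tiff": ("Image", "Lossless", "Archiving"),
    ".mov": ("Audio/Video", "Lossless", "Archiving/Editing (Mac/Apple)"),
    ".avi": ("Audio/Video", "Lossless", "Archiving/Editing (PC/Windows)"),
    ".jpeg": ("Image", "Lossy", "Uploading/Sharing"),
    ".png": ("Image", "Lossless", "Archiving/Editing/Uploading/Sharing"),
    ".aup": ("Software Specific", "Lossless", "Audacity Software Editing"),
    ".eaf": ("Software Specific", "Lossless", "ELAN Software Editing"),
    ".txt": ("Text", "Lossless", "Basic Text"),
    ".srt": ("Text", "Lossless", "Subtitles"),
}

# Alternate spellings of the same extension (.jpg is an assumption beyond the chart).
ALIASES = {".tif": ".tiff", ".jpg": ".jpeg"}

def look_up(filename):
    """Return (media, lossy/lossless, best for) for a file name, or None if it is not in the chart."""
    ext = Path(filename).suffix.lower()
    return CHART.get(ALIASES.get(ext, ext))

print(look_up("Elder_interview.WAV"))  # ('Audio', 'Lossless', 'Archiving/Editing')
```

Lowercasing the extension first means Interview.WAV and interview.wav are treated the same, since devices and cameras often disagree on capitalization.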


Differences in File Use


Best for Archiving

Audio files: WAV
Audio-visual files: AVI, MP4 (broadly)
Image files: TIFF, PNG

Best for Editing

Audio files: WAV
Audio-visual files: MP4 (broadly)
Image files: PNG

Best for FirstVoices/Streaming

Audio files: MP3
Audio-visual files: MP4, or via link (YouTube/Vimeo)
Image files: PNG, JPEG


Software Specific Files


Some file-types are specific to a particular piece of software. Keep track of these project files: you need them to reopen and keep editing your media in that software.

Common software specific project files include:

Software | File extension | Use
Audacity | .aup | Audio recording & editing
ELAN | .eaf | Transcription
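An .eaf file is an XML text file, which is one reason it can be kept safely alongside your media. The Python sketch below uses the standard xml.etree.ElementTree module to list the time-aligned annotations in each tier. It is a simplified illustration, not a full ELAN reader: the read_tiers name and the tiny sample file are hypothetical, and it assumes the common EAF layout (TIME_SLOT elements inside TIME_ORDER, and TIER elements holding ALIGNABLE_ANNOTATION entries with an ANNOTATION_VALUE). Reference (symbolic) annotations are ignored.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical EAF-style document with one tier and one annotation.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="Transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>hello</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

def read_tiers(xml_text):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for time-aligned annotations."""
    root = ET.fromstring(xml_text)
    # Map time-slot IDs to milliseconds; unaligned slots have no TIME_VALUE and are skipped.
    slots = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        if value is not None:
            slots[ts.get("TIME_SLOT_ID")] = int(value)
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers

print(read_tiers(SAMPLE_EAF))  # {'Transcription': [(0, 1500, 'hello')]}
```

For a real project you would read the file from disk instead, for example with ET.parse("my_project.eaf").getroot() (a hypothetical file name).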

You might also come across .txt files or .srt files.

These are types of text files, which can be used to keep notes or store written information.

.srt files are also a subtitling format. You can use ELAN to export .srt files to subtitle videos in your language. Learn more about this process here: https://firstvoices.atlassian.net/wiki/spaces/DIGI/pages/1278174
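ELAN writes .srt files for you, but it helps to know what is inside one. Each subtitle is a numbered block: the cue number, a line of the form start --> end with timestamps written as HH:MM:SS,mmm, the subtitle text, and then a blank line. This Python sketch builds that text from (start, end, text) cues given in milliseconds; the function names and sample cues are hypothetical.

```python
def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def to_srt(cues):
    """Build SRT text from (start_ms, end_ms, text) tuples, numbering cues from 1."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)

print(to_srt([(0, 1500, "First line"), (1500, 4250, "Second line")]))
```

Note that SRT uses a comma, not a period, before the milliseconds; some video players reject files that get this wrong.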

 

For quick reference on the job, print out or save this handy comparison chart: