Choose a Scanner: Texts & Documents


Introduction


Digitization is the conversion of analog materials to digital formats. This process can include repurposing paper-based materials and other textual resources.

A DiGI project often includes digitizing related or standalone documents alongside audio-visual assets (e.g., audio cassette tapes).

These materials are invaluable to projects and language learners. They can provide information about:

  • Language change

  • Personal stories and accounts

  • Older speech, phrases, and words that are less common today

  • Changes in writing systems/orthographies

This article offers suggestions and tips for picking the right scanner (or equivalent method) to fit your digitization needs. It is important to consider the types of materials being digitized, their unique needs, your technical capacity, and cost.


Identify Your Documents


Before you decide which scanner to purchase for your project, you need to know the different needs of the materials you will be digitizing. The size, fragility, visual transparency, and other physical features of your texts or documents will all determine the model you will want to buy.

Common Materials

It is important to sort through your collection and know what types of resources you will be scanning. The following are some common types you will likely encounter in a cultural centre or archival space:

Type of Material

Description

Digitization Needs

Maintenance and Care

Notes

Documents

These materials will likely be the most variable and will contain different insights into language and culture.

Documents may be (but are not limited to) handwritten notes, manuscripts, handouts, and miscellaneous pages of text. 

Digitization needs will vary depending on the size and fragility of the materials.

Standard printer paper size is 8.5 by 11 inches, but older documents might not be printed or written on these paper dimensions.

Some paper materials might be too fragile to be fed through an automatic document feeder, so slower methods might have to be used to digitize them safely.

Handling documents safely will depend on the fragility of the resources. Some more recent paper-based resources can be fed through a document feeder like most other papers you would copy in an office setting.

Delicate papers will have to be handled more carefully and scanned individually, often with an overhead or flatbed scanner.

If there are paper clips or staples keeping documents together, it is best to remove them, as they can cause damage and rust over time. They are often considered a nuisance in the long run.

If there is a lot of text on the digitized document, it is recommended that you run OCR (Optical Character Recognition) over the file once it is in a digital format.

OCR programs allow you to search through and copy text from PDFs as if they were word-processing documents.

Depending on the characters used and the amount of text, it might be necessary to transcribe the document manually if OCR cannot be run successfully.

Film and Photographs (Photographic Negatives)

Different visual resources might also be related to language materials in a collection.



These resources are often negative strips, 35mm slides, and prints.

Film and photographs should be handled carefully.

They cannot be fed through a document feeder, must be scanned at a high dpi (dots per inch), and in most cases require a flatbed scanner.

When handling film and related materials, use cotton gloves and make sure your workplace is uncluttered.

Pay attention to your surroundings and try to minimize dust or other debris that could damage the delicate assets.

Negatives should not be left in direct sunlight.

It is best to scan film and photographs as TIFF files.

This format will provide the highest quality image for original and edited copies.

Most high-end flatbed scanners can digitize negatives, but it is important to vet equipment before purchasing anything.

Booklets

It is common to find many booklets and other handouts in archives. They are usually produced by language programs or other organizations as language learning initiatives.

They might contain hand-drawn images and text. 

Depending on how they are bound, booklets can sometimes be a challenge to digitize and scan.

Pressing the content side of the booklet into a flatbed scanner can distort images and text as a result of the angle of the material.

Booklets also often cannot be fed through a typical document feeder because of their dimensions, binding, and the types of paper they are made of.

If the binding can be safely removed, it may be easier to scan the pages of a booklet on a flatbed scanner as one would a typical document.

If a booklet cannot be taken apart, it may be digitized with an overhead scanner (if available).

If this type of scanner is not available, a booklet can be carefully scanned on a flatbed while bound. Alternatively, high-quality digital photographs can be taken of each content-rich page.

It is important to digitize the cover art, preface, and back of a booklet. These sections often include important metadata and acknowledgements.

Maps

Maps might be another flat-surfaced material that you possess in your centre or archive.

These resources can provide information about the surrounding territory and regions as well as other geographic insights.

Maps also may document place names and other land-based knowledge.

Maps can be difficult to digitize as a result of their varying sizes and fragilities.

Some maps might be able to fit into a flatbed or wide format scanner; however, larger maps will need to be photographed from above (or in sections).

Another option for scanning maps is to outsource the work to copy houses that have larger-scale equipment for scanning and printing posters and building plans. These tools can often be adapted for maps too.

Once digitized, maps can be saved like any other image file.

The physical care and storage of oversized or large-scale maps, however, can be a challenge.

Preferably, oversized papers are kept in a flat file cabinet, where they can be safely preserved spread out at full length.

Another storage option is to roll maps and keep them on their sides or roll them around a tube and wrap them in acid-free materials/polyester. 

Maps and posters can be easily damaged by precarious storage.

Investing in oversized boxes is another great option to lay materials flat if more specialized storage containers (e.g. flat file cabinets) are not available.

Books

Books are another common textual resource that is likely available in every archive.

Books can prove a bit of a challenge because of their bound spines. Unlike booklets, which can potentially be disassembled and are generally small enough to fit on a flatbed scanner, books may require other digitization strategies.

If it is at your disposal, an overhead scanner is invaluable to capture and safely digitize fragile, bound materials. These scanners tend to be expensive across all makes and models. Depending on how many books are in your collections, this piece of equipment may be worth the price and a smart investment.

A cheaper alternative to an overhead scanner is photographing each page or carefully scanning them on a flatbed (if possible). However, these methods can be prone to human error and irregularity in quality (e.g. Page 1 might not look exactly like Page 65). 
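To get a feel for what scanning at a "high dpi" means in practice, the following Python sketch estimates the pixel dimensions and approximate uncompressed file size of a scan. The function names and the 300 and 2400 dpi values are illustrative assumptions, not settings recommended by this article; check your scanner's documentation and your project's standards for the values you should actually use.

```python
MM_PER_INCH = 25.4


def scan_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for an original scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_size_mb(width_px, height_px, bytes_per_pixel=3):
    """Approximate uncompressed image size in megabytes (10**6 bytes).

    3 bytes per pixel corresponds to 24-bit colour.
    """
    return width_px * height_px * bytes_per_pixel / 1_000_000


# A letter-size page (8.5 x 11 inches) at an assumed 300 dpi.
page_px = scan_dimensions(8.5, 11, 300)

# A 35mm frame (36 x 24 mm) at an assumed 2400 dpi.
frame_px = scan_dimensions(36 / MM_PER_INCH, 24 / MM_PER_INCH, 2400)

print(page_px, uncompressed_size_mb(*page_px))
print(frame_px, uncompressed_size_mb(*frame_px))
```

An uncompressed 24-bit colour TIFF will be roughly the estimated size, which is worth keeping in mind when planning storage for a large collection.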

A Note on Plastics

While we might have inclinations to avoid using them in our day-to-day lives, plastics are extremely helpful in preserving and storing textual materials. Plastic covers and cases can prevent damage to original resources from handling and use (e.g. fingerprints).

It is important to stick with only these plastics:

  • Polyester

  • Polypropylene

  • Polyethylene

What you should not use is polyvinyl chloride.

Unfortunately, polyvinyl chloride can deteriorate over time and gradually release hydrochloric acid. Your paper-based or textual materials will not fizzle away if they are stored within polyvinyl chloride casing, but ink and other important content on the page can begin to stick to this type of plastic.

Unless they are marked as a different material, common page protectors and 3-ring binders usually contain polyvinyl chloride. Be sure to check and make sure you know what types of plastic your storage containers, sleeves, and folders are if you are purchasing them from a general office supplies store.
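If you keep an inventory of your storage supplies, the guidance above can be written down as a simple lookup. The Python sketch below is only an illustration: the function name and the sample inventory are hypothetical, and "pvc" is included as the common abbreviation for polyvinyl chloride.

```python
# Plastics named in this article as suitable for storing textual materials.
SAFE_PLASTICS = {"polyester", "polypropylene", "polyethylene"}

# Plastics this article says to avoid ("pvc" is the common abbreviation).
UNSAFE_PLASTICS = {"polyvinyl chloride", "pvc"}


def classify_plastic(material):
    """Return 'safe', 'avoid', or 'unknown' for a plastic name."""
    name = material.strip().lower()
    if name in SAFE_PLASTICS:
        return "safe"
    if name in UNSAFE_PLASTICS:
        return "avoid"
    # Unmarked or unfamiliar plastics should be checked before use.
    return "unknown"


# Hypothetical inventory of supplies and the plastic each is made of.
inventory = {
    "photo sleeves": "Polyester",
    "page protectors": "Polyvinyl chloride",
    "map wrap": "unmarked",
}

for item, plastic in inventory.items():
    print(f"{item}: {classify_plastic(plastic)}")
```

Anything that comes back as "unknown" is a prompt to check the packaging or ask the supplier before using it with original materials.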

Makes & Models

As one can see, different types of paper-based resources need different types of scanners and methods to digitize them safely and successfully.

The main strategies and tools that may be at your disposal include:

Strategy

Description

Model(s)

Pros

Cons

Automatic Document Feeder

Automatic document feeders can quickly scan regularly shaped documents and pages that do not require special processing or extra care.

This method of digitization could permanently damage negatives, maps, and other delicate resources.

Fujitsu fi-6670A

Brother ADS-2800W



  • Quick

  • Good for large batches

  • Not safe for all archival resources

  • Should only be used for regularly dimensioned pages

Flatbed Scanner

Flatbed scanners are the usual go-to choice for most delicate materials that need to be scanned in an archive.

This type of scanner is generally safe for photographs and resources that cannot be fed or run through a machine. 

Epson Perfection Flatbed Scanners:

Epson Perfection V850

Epson Perfection Pro 700/750

Epson Perfection V600

  • Can scan various resource types

  • Often comes with attachments to scan negatives

  • Can be expensive

  • A slower process than a document feeder

Wide Format Scanner

Wide format scanners are specially designed to scan oversized formats that would not typically fit inside a document scanner or through a feeder.

Depending on the model, resources are typically fed through the oversized mouth of the scanner to be digitized.

Fragile materials still might need different methods to ensure their preservation and prevent damage during the digitization process.

DS-32000 Large-format Document Scanner

Contex HD 5450

  • Can handle oversized, irregularly shaped documents & posters

  • Expensive

  • Not suitable for all oversized resources (e.g. if they are fragile)

Overhead Scanner

Overhead scanners are used for digitizing delicate, fragile resources or ones that cannot be scanned easily on a flatbed scanner.

Books and other assets that cannot be unbound benefit from being scanned overhead. This strategy can also prevent any distortions from pages pressing into a flatbed scanner.

Fujitsu ScanSnap SV600

Epson DC-21

  • Can safely digitize fragile materials

  • Expensive

  • Requires setup

Outsourcing

Just as you might outsource A/V materials like reel-to-reels to organizations that have the capacity to digitize them, you can also choose to outsource some textual resources.

Many printing houses have the equipment and expertise to digitize and scan oversized papers and formats.

These businesses often deal with blueprints, posters, and other custom paper sizes. They also generally know how to package and ship paper-based materials securely.

N/A

  • Frees up in-house work time

  • No additional equipment purchases needed

  • Can be expensive

  • Does not build in-house capacity

  • Requires transferring assets outside the archive and your care

Camera for High Quality Photographs

If resources are too fragile to be scanned in a flatbed or wide format scanner and an overhead scanner is not an option for purchase, then high-quality photographs can also suffice.

While it might not seem ideal, photographs are often used to document many different types of assets in museums and archives (e.g. textiles, regalia, sculpture, etc.).

Preserving delicate documents and other materials through high quality images that can be edited and cropped later on is a great response to technical capacity limits or equipment sourcing issues.

Canon EOS 5D Mark II-IV

  • Can also be used for non-digitizing purposes

  • User-friendly (if familiar with taking photos)

  • Can document non-textual resources too

  • Benefits from some experience in photography

  • Requires additional equipment (e.g. lighting & tripod)
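When an oversized item such as a large map has to be photographed or scanned in sections, it helps to estimate how many overlapping captures you will need before you start. The Python sketch below is one rough way to do that. The function name, the 12 x 17 inch capture area, and the 1 inch overlap are assumptions chosen for illustration, and the sketch does not try rotating the item to find a better fit.

```python
import math


def sections_needed(item_w, item_h, capture_w, capture_h, overlap=1.0):
    """Return (columns, rows, total) captures needed to cover an item.

    All measurements are in inches. Neighbouring captures share `overlap`
    inches so the sections can be aligned and stitched afterwards.
    """
    if overlap < 0 or overlap >= min(capture_w, capture_h):
        raise ValueError("overlap must be non-negative and smaller than the capture area")

    def count_along(item, capture):
        if item <= capture:
            return 1  # the item fits in a single capture on this side
        step = capture - overlap  # new area gained by each extra capture
        return 1 + math.ceil((item - capture) / step)

    columns = count_along(item_w, capture_w)
    rows = count_along(item_h, capture_h)
    return columns, rows, columns * rows


# A 36 x 55 inch map with an assumed 12 x 17 inch capture area.
print(sections_needed(36, 55, 12, 17))
```

A high section count can be a sign that a wide format scanner or outsourcing is worth pricing out before committing to capturing the item in sections in-house.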

Choose Your Own Scanner Adventure

In the flowchart below, follow the prompts to find out which scanner may be right for you and your archival needs.

Keep in mind that this chart offers options and general ideas about the type of equipment or strategy that might be best for textual or paper-based digitization in a project.

However, details including your technical capacity, collection size, budget, intuition, and other professional considerations specific to your archive will also influence this decision.

Example: You are digitizing a 36"x55" map of waterways with place names ...

 

Example: You are digitizing old 35mm slides ...

If you have questions or concerns about scanning or which scanner might be right for you, please contact Ben to discuss potential options: ben@fpcc.ca 

 

Please follow this link to take you to the interactive module:

https://digi-training.s3.ca-central-1.amazonaws.com/choose-a-scanner/index.html
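For readers who like to see the reasoning written out, the general guidance from the tables in this article can be sketched as a small Python function. This is not a reproduction of the flowchart or the interactive module: the class, field names, and ordering of the suggestions are assumptions made for illustration, and your budget, capacity, and professional judgement should still guide the final decision.

```python
from dataclasses import dataclass


@dataclass
class Material:
    """Hypothetical description of an item to be digitized."""
    kind: str                # e.g. "document", "film", "booklet", "map", "book"
    fragile: bool = False    # too delicate for a document feeder
    bound: bool = False      # binding cannot be safely removed
    oversized: bool = False  # does not fit on a flatbed scanner


def suggest_strategies(material, overhead_available=False):
    """Return a list of strategies to consider, based on this article's tables."""
    if material.kind == "film":
        # Film and photographs: flatbed scanner, high dpi, saved as TIFF.
        return ["flatbed scanner"]
    if material.oversized:
        if material.fragile:
            # Fragile oversized items should not be fed through a scanner.
            return ["camera"]
        return ["wide format scanner", "outsourcing", "camera"]
    if material.bound:
        if overhead_available:
            return ["overhead scanner"]
        return ["flatbed scanner", "camera"]
    if material.fragile:
        options = ["flatbed scanner"]
        if overhead_available:
            options.append("overhead scanner")
        return options
    # Regular, sturdy pages can go through a feeder in batches.
    return ["automatic document feeder"]


# A bound book in an archive without an overhead scanner.
print(suggest_strategies(Material("book", bound=True)))

# A stack of recent, sturdy handouts.
print(suggest_strategies(Material("document")))
```

Returning a list rather than a single answer keeps the cheaper fallbacks visible alongside the preferred equipment.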

Additional Resources

For more detailed information about scanners, digitizing textual resources, photographs, and tutorials, follow these links:

 

If you have additional questions, please contact:

Ben Chung
Cell: (604) 319-7094
Email: ben@fpcc.ca 

FPCC Office: (250) 652-5952