Digitising paper records is not simply a matter of running documents through a scanner. A well-structured digitisation workflow ensures that the resulting digital files are usable over time, correctly linked to their descriptive metadata, and — where relevant — legally equivalent to the originals they replace. European organisations undertaking large-scale digitisation projects are increasingly guided by frameworks such as the European Commission's Digitisation and Online Accessibility recommendations and sector-specific standards in healthcare and public administration.
This article outlines a practical workflow for document digitisation applicable to most organisations managing significant volumes of paper records. It addresses preparation, capture, quality control, metadata, storage, and the relationship between digital outputs and the original paper.
Step 1: Preparation and Appraisal
Before any scanning begins, the records earmarked for digitisation should be reviewed. Not all paper records warrant digitisation — particularly if they are already scheduled for disposal in the near term or if they contain no content that will be referenced again. Scanning low-value material wastes resources and can obscure the more significant records in a digital repository.
During preparation, physical records should be:
- Checked for condition — damaged, torn, or fragile documents may require conservation treatment before scanning
- Unfolded and de-stapled where safe to do so
- Ordered logically within each file so the digital output reflects the correct sequence
- Flagged for any special handling (oversized documents, photographic prints, bound volumes)
Step 2: Scanning Standards
Resolution is the most frequently discussed technical parameter in digitisation. The appropriate resolution depends on the nature of the document and its intended use:
- Standard text documents: 300 dpi is generally sufficient for OCR and human reading. The resulting files are manageable in size.
- Documents with fine detail (architectural drawings, maps, handwritten records): 400–600 dpi is more appropriate.
- Photographic material: Minimum 400 dpi, with higher resolutions for images intended for preservation rather than access copies.
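The effect of resolution on storage can be estimated with simple arithmetic before a project starts. The sketch below is illustrative only: the function name, the A4 dimensions in inches, and the bit depths (8-bit greyscale, 24-bit colour) are assumptions chosen for the example, and real files will usually be smaller once compression is applied.

```python
def uncompressed_size_mb(width_in: float, height_in: float,
                         dpi: int, bits_per_pixel: int) -> float:
    """Estimate the uncompressed size of one scanned page in MB (10^6 bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits / 8 / 1_000_000


# A4 is 210 x 297 mm, roughly 8.27 x 11.69 inches (assumed page size).
A4 = (8.27, 11.69)

for dpi in (300, 400, 600):
    grey = uncompressed_size_mb(*A4, dpi, 8)     # 8-bit greyscale
    colour = uncompressed_size_mb(*A4, dpi, 24)  # 24-bit colour
    print(f"{dpi} dpi: {grey:.1f} MB greyscale, {colour:.1f} MB colour")
```

Under these assumptions, an A4 page at 300 dpi in 24-bit colour comes to about 26 MB uncompressed, and doubling the resolution to 600 dpi quadruples that figure, which is why higher resolutions are best reserved for material that needs them.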
File format selection determines long-term accessibility. The two formats most widely recommended for archival preservation are:
- PDF/A (ISO 19005): a constrained subset of PDF designed for long-term preservation. Widely supported and suitable for most text-based documents.
- TIFF (Tagged Image File Format): a raster image format that can store images uncompressed or with lossless compression. It is preferred for high-resolution preservation masters, particularly where the document will be processed further or where quality must be maintained without loss.
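Commission Recommendation 2011/711/EU on the digitisation and online accessibility of cultural material and digital preservation calls on member states to reinforce their strategies for the long-term preservation of digital material. It sets policy objectives rather than prescribing file formats, so the choice of PDF/A or TIFF for preservation copies rests on archival practice and the relevant standards. For access copies distributed to users, lower-resolution PDF files are typically more practical.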
Step 3: Metadata Assignment
A scanned image without metadata is difficult to find and impossible to manage systematically. Metadata should be assigned at the point of digitisation, not retrospectively. The minimum metadata set for most records digitisation projects includes:
- Title or description of the document
- Date of the original document (not the scanning date)
- Classification code from the organisation's file plan
- Format of the digital file
- Resolution and bit depth of the scan
- Reference to whether the physical original has been retained or disposed of
Where an organisation uses an electronic records management system (ERMS) or document management system (DMS), metadata entry is typically structured around the system's data fields. For smaller operations, a spreadsheet-based approach can work provided it is consistently maintained and linked to the file naming convention used for the digital files.
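For the spreadsheet-based approach, one way to keep the register and the file names in step is to derive both from the same record. The following is a minimal sketch, not a prescribed schema: the `ScanRecord` class, its field names, and the naming pattern (classification code, original date, shortened title) are all hypothetical and should be adapted to the organisation's own file plan.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class ScanRecord:
    title: str
    original_date: date        # date of the original document, not the scan
    classification_code: str   # from the organisation's file plan
    file_format: str           # e.g. "PDF/A-2b" or "TIFF"
    resolution_dpi: int
    bit_depth: int
    original_retained: bool

    def file_name(self) -> str:
        """Hypothetical convention: <code>_<original date>_<title slug>.<ext>"""
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        ext = "pdf" if self.file_format.upper().startswith("PDF") else "tif"
        code = self.classification_code.replace("/", "-")
        return f"{code}_{self.original_date.isoformat()}_{slug}.{ext}"


def write_register(records, out) -> None:
    """Write one CSV row per record, with the derived file name as first column."""
    columns = ["file_name"] + [f.name for f in fields(ScanRecord)]
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for rec in records:
        row = asdict(rec)
        row["original_date"] = rec.original_date.isoformat()
        row["file_name"] = rec.file_name()
        writer.writerow(row)


# Example record (invented values for illustration).
record = ScanRecord(
    title="Board minutes, March meeting",
    original_date=date(1998, 3, 12),
    classification_code="GOV/02/01",
    file_format="PDF/A-2b",
    resolution_dpi=300,
    bit_depth=8,
    original_retained=True,
)
buffer = io.StringIO()
write_register([record], buffer)
print(buffer.getvalue())
```

Because the file name is computed from the metadata rather than typed by hand, a mismatch between the register and the stored file becomes easier to spot.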
Step 4: Optical Character Recognition (OCR)
Applying OCR to scanned documents makes text searchable and significantly improves usability. Modern OCR engines are highly accurate on cleanly typed or printed text, but residual errors are common enough that the output should still be checked against the original, particularly for:
- Proper nouns and names
- Numbers and dates (errors here can have significant consequences)
- Documents with unusual fonts, stamps, or handwritten additions
OCR output is generally embedded into PDF/A files as a searchable text layer, allowing the document to function as both an image and a text document. TIFF preservation masters are typically not OCR-processed at the time of scanning; instead, a derivative PDF/A access copy is created separately.
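Manual checking of numbers and dates can be focused with a simple screening pass over the extracted text. The sketch below assumes the OCR text has already been produced by whichever engine is in use. It flags tokens that mix digits with letters an engine might plausibly have misread as digits. The function name and the set of confusable letters are illustrative assumptions, and the heuristic is a review aid rather than a substitute for comparison with the original.

```python
import re

# Letters that OCR engines commonly confuse with digits
# (an illustrative set, not an exhaustive one).
CONFUSABLE = set("OoIlSBZ")


def flag_suspect_tokens(text: str) -> list[str]:
    """Return tokens made up only of digits and confusable letters,
    containing at least one of each, e.g. '2O21' for '2021'."""
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_digit = any(ch.isdigit() for ch in token)
        has_confusable = any(ch in CONFUSABLE for ch in token)
        only_digit_like = all(ch.isdigit() or ch in CONFUSABLE for ch in token)
        if has_digit and has_confusable and only_digit_like:
            suspects.append(token)
    return suspects


sample = "Invoice 2O21-0l5 dated 12 March 2021, ref A4, total 1,250"
print(flag_suspect_tokens(sample))
```

Tokens flagged this way can be routed to a reviewer along with the page image, while clean numeric tokens and ordinary references such as "A4" pass through without being flagged.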
Step 5: Quality Control
A systematic quality control process catches errors before files are committed to long-term storage. Quality checks should cover:
- Completeness: verify that the number of pages in the digital file matches the physical document
- Image quality: check for skewed pages, cut-off edges, blurring, or excessively dark areas
- Metadata accuracy: cross-reference a sample of metadata records against the physical files
- File integrity: generate and record checksums for all preservation files to enable future integrity verification. SHA-256 is the safer choice; MD5 still detects accidental corruption but is no longer considered resistant to deliberate tampering
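The file integrity check can be sketched with Python's standard `hashlib` module. This is a minimal illustration, assuming the preservation files sit under a single folder. The function names and the idea of a manifest held as a dictionary are assumptions made for the example; in practice the manifest would be written to durable storage alongside the files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def verify_manifest(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(name)
    return problems
```

A typical pattern would be to call `build_manifest` once at sign-off and to run `verify_manifest` on a schedule afterwards; an empty list means every recorded file is present and unchanged.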
Step 6: Disposition of the Physical Original
Once digitisation is complete and quality control has been signed off, a decision must be made about the physical original. Three outcomes are common:
- Retain the original: Required where the document has legal, evidential, or historical value that the digital copy cannot fully replicate. Notarised documents, original signed contracts, and certain medical records often fall into this category.
- Retain temporarily: The physical record is kept for a defined period as a backup while the digital workflow is established and verified.
- Securely destroy the original: Where the digital copy is accepted as the legal record, the physical original may be securely disposed of according to the organisation's retention schedule. Destruction must be documented.
The legal status of digital copies of original paper documents varies across EU member states. In several jurisdictions, specific legislation establishes that a certified digital copy holds the same evidential value as the original. Organisations should seek legal advice if there is any doubt about whether a physical original can be destroyed after digitisation.
Further Reading
- ISO 19005-3:2012: Document management, Electronic document file format for long-term preservation, Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)
- European Commission: Cultural Heritage Digitisation
- Europeana Foundation: European Digital Library standards and practices
Last updated: 28 May 2026