Basic Introduction to PDF/A
PDF/A is a file format for the long-term archiving of electronic documents. It is in fact a subset of PDF, obtained by leaving out PDF features not suited to long-term archiving.
The feature-rich nature of PDF can create difficulties in preserving information over the long term, and some useful features of the PDF file format are incompatible with the demands of long-term preservation. For example, PDF documents are not necessarily self-contained: they can draw on system fonts and other content stored outside the original file. As time passes, and especially as technology changes, these external connections can break, and information is lost along with the broken dependencies. Additionally, because the many PDF development tools on the market are not standardized, the file format is implemented inconsistently. This lack of standardization could be chaotic for the information managers of the future, especially as it would be difficult (if not impossible) for them to “get under the hood” of the PDF files unless a format specification were put in place that specifically addressed long-term preservation needs.
Tremendous quantities of valuable information are currently being created and saved all over the world as PDF, and a specification is needed to ensure that digital PDF documents remain readable, renderable and accessible for the long term. PDF/A is designed to be that specification.
The PDF/A project was initiated in October 2002 when a group of individuals representing the end user, archival, records management and solution provider communities met to discuss a shared concern about the long-term preservation of electronic documents. The PDF/A project was approved by the AIIM Standards Board in October 2002. In August 2003, the project work was approved as an ISO New Work Item. ISO 19005-1 was published by ISO in September 2005. This standard is based on the Adobe PDF Reference 1.4. The U.S. effort is jointly managed by AIIM and NPES.
The PDF/A, A-1a, A-1b, A-2 "Babylon"
PDF/A has been established as a series of standards with several parts. Currently only PDF/A-1 (Part 1) has been approved. PDF/A-1 is further subdivided into two levels of compliance: PDF/A-1a and PDF/A-1b.
PDF/A-1a (referred to as Level A Conformance) denotes full compliance with Part 1 of the PDF/A standard (ISO 19005-1), the only part approved so far. There is also a "minimal compliance" level for PDF/A: PDF/A-1b (referred to as Level B Conformance). PDF/A-1b requirements are meant to ensure that the rendered visual appearance of the file is reproducible over the long term.
PDF/A-1a and PDF/A-1b differ primarily with respect to text extraction.
- PDF/A-1a ensures the preservation of a document's logical structure and content text stream in natural reading order. Text extraction is especially important when the document must be displayed on a mobile device (for example a PDA) or on other devices in accordance with Section 508 of the US Rehabilitation Act. In such cases the text must be rearranged to fit the limited screen size (re-flow). This feature is also known as "Tagged PDF".
- PDF/A-1b ensures that the text (and additional content) can be correctly displayed (e.g. on a computer monitor), but does not guarantee that extracted text will be legible or comprehensible. It therefore does not guarantee compliance with Section 508.
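As an illustration of how the two levels are told apart in practice, the sketch below reads the conformance claim from a document's XMP metadata packet. It assumes the packet has already been extracted from the PDF as a string and that it uses the PDF/A identification properties pdfaid:part and pdfaid:conformance; the function name pdfa_claim and the SAMPLE packet are hypothetical. Note that this only reports what a file claims about itself. It does not validate that the file actually conforms.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Namespace of the PDF/A identification schema (assumed, not given in the text above).
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def pdfa_claim(xmp_packet):
    """Return the (part, conformance) pair declared in an XMP packet, or None."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter("{%s}Description" % RDF_NS):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get("{%s}part" % PDFAID_NS) or desc.findtext("{%s}part" % PDFAID_NS)
        level = (desc.get("{%s}conformance" % PDFAID_NS)
                 or desc.findtext("{%s}conformance" % PDFAID_NS))
        if part and level:
            return part.strip(), level.strip().upper()
    return None


# Hypothetical packet claiming PDF/A-1b.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

if __name__ == "__main__":
    print(pdfa_claim(SAMPLE))
```

A result of ("1", "A") would indicate a claimed PDF/A-1a file and ("1", "B") a claimed PDF/A-1b file; confirming the claim requires a dedicated PDF/A validator.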
A new part of the standard, ISO 19005-2 (Part 2, PDF/A-2), is currently being worked on by the Technical Committee. PDF/A-2 will address some of the new features added with versions 1.5, 1.6 and 1.7 of the PDF Reference. PDF/A-2 should be backwards compatible, i.e. all valid PDF/A-1 documents should also be compliant with PDF/A-2. However, PDF/A-2 compliant files will not necessarily be PDF/A-1 compliant.
PDF vs PDF/A
The PDF/A-1 (ISO 19005-1:2005) standard is based on Adobe’s PDF Reference 1.4, and specifies how to use a subset of PDF components to develop software that creates, renders and otherwise processes a flavor of PDF that is more suitable for archival preservation than traditional PDF.
PDF/A-1 aims to preserve the static visual appearance of electronic documents over time and also aims to support future access and future migration needs by providing frameworks for:
- embedding metadata about electronic documents;
- defining the logical structure and semantic properties of electronic documents.
What does PDF/A-1 allow/disallow?
One of the key differences between PDF and PDF/A is the set of restrictions that PDF/A places on PDF.
PDF/A-1 files must include:
- Embedded fonts
- Device-independent color
- XMP metadata
PDF/A-1 files may not include:
- LZW Compression
- Embedded files
- External content references
- PDF Transparency
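The two lists above can be read as a simple checklist. The following is a minimal sketch, not a validator: it assumes some other tool has already inspected a file and reported which features it found, and the feature labels and the function name check_pdfa1_features are hypothetical names chosen for this illustration.

```python
# Hypothetical labels mirroring the "must include" and "may not include" lists above.
REQUIRED = {"embedded_fonts", "device_independent_color", "xmp_metadata"}
PROHIBITED = {
    "lzw_compression",
    "embedded_files",
    "external_content_references",
    "transparency",
}


def check_pdfa1_features(features):
    """Return a list of problems for a set of detected feature labels.

    An empty list means nothing on the two lists above was violated; it does
    not prove conformance, because the standard contains further requirements.
    """
    found = set(features)
    problems = ["missing required: " + name for name in sorted(REQUIRED - found)]
    problems += ["prohibited: " + name for name in sorted(PROHIBITED & found)]
    return problems


if __name__ == "__main__":
    detected = {"embedded_fonts", "xmp_metadata", "lzw_compression"}
    for problem in check_pdfa1_features(detected):
        print(problem)
```

Detecting these features in a real file is the hard part and is left to dedicated PDF/A tools; the sketch only shows how the inclusion and exclusion rules combine.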
What long-term preservation needs does PDF/A-1 address?
Characteristics identified as objectives for PDF/A were:
- Device Independent - Can be reliably and consistently rendered without regard to the hardware or software platform
- Self-contained - Contains all resources necessary for rendering
- Self-documenting - Contains its own description
- Unfettered - Absence of technical file protection mechanisms
- Available - Authoritative specification publicly available
- Adoption - Widespread use may be the best deterrent against preservation risk
When should PDF/A be used?
PDF/A should be used as a way to standardize the use of PDF for electronic document storage and ensure that these documents will be available well into the future. This is important to support business needs that require reliable rendering of electronic documents over the long term.
Because PDF/A is only a file format specification, users will need to establish their own capture methodology that meets domain-specific policies and procedures (e.g., for reliability, integrity, compliance, comprehensiveness).
For example, for permanent records in PDF, US Federal agencies will need to implement PDF/A-1 in conjunction with additional requirements identified in guidance from the National Archives and Records Administration (NARA) for transferring permanent PDF records to NARA, http://www.archives.gov/records_management/initiatives/pdf_records.html.
It is important to be aware that:
- PDF/A-1 alone does not guarantee preservation
- PDF/A-1 alone does not guarantee exact replication of source material