Basic Introduction to PDF

PDF (Portable Document Format) is an open standard for document exchange. The format, created by Adobe Systems in 1993, represents two-dimensional documents independently of the application software, hardware, and operating system. Each PDF file encapsulates a complete description of a fixed-layout 2D document, including the text, fonts, images, and 2D vector graphics that compose it. More recent versions also allow 3D drawings to be embedded in PDF documents with Acrobat 3D, using U3D, PRC, and various other data formats.

The first version of the Adobe PDF format, 1.0, was introduced in 1993, which made the format 17 years old at the time of writing. Subsequent releases have added new functionality to the specification, and Adobe's flagship products (Adobe Acrobat® and Adobe Reader® software) have progressed accordingly. The following list shows corresponding versions of Acrobat and the PDF specification:

  • (1993) – PDF 1.0 / Acrobat 1.0
  • (1994) – PDF 1.1 / Acrobat 2.0
  • (1996) – PDF 1.2 / Acrobat 3.0
  • (1999) – PDF 1.3 / Acrobat 4.0
  • (2001) – PDF 1.4 / Acrobat 5.0
  • (2003) – PDF 1.5 / Acrobat 6.0
  • (2005) – PDF 1.6 / Acrobat 7.0
  • (2006) – PDF 1.7 / Acrobat 8.0
  • (2008) – PDF 1.7, Adobe Extension Level 3 / Acrobat 9.0
  • (2009) – PDF 1.7, Adobe Extension Level 5 / Acrobat 9.1

File Structure
A PDF file has four main parts. In the order they appear in the file, these are the header, the body, the xref (cross-reference) table, and the trailer:

  • The header: a line that identifies the version of the PDF specification the file follows. Example: %PDF-1.5
  • The body: contains all the objects that make up the document, such as fonts, images, text, bookmarks, and form fields.
  • The xref table: contains one entry for each object in the file. It states how many objects the table covers and, for each object, the byte offset at which it begins, its generation number, and whether it is in use or free.
  • The trailer: contains the trailer dictionary, which points to key objects in the file, followed by the byte offset of the xref table. It ends with %%EOF to mark the end of the file.
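The four parts can be seen by assembling a tiny file and reading it back. The following is a minimal sketch in Python, not a general PDF parser: it assumes a classic (uncompressed) xref table with a single subsection and no incremental updates, and the function names build_minimal_pdf and read_structure are made up for this illustration.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny PDF, recording the byte offset of each object."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.5\n")                       # header
    offsets = []
    for number, body in enumerate(objects, start=1):     # body
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)          # xref table
    out += b"0000000000 65535 f \n"                      # entry 0 is always free
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset              # offset, generation, in use
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset     # trailer ends with %%EOF
    return bytes(out)


def read_structure(data: bytes) -> dict:
    """Read the header version, then follow startxref to the xref table."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header is None:
        raise ValueError("missing %PDF header")
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if tail is None:
        raise ValueError("missing startxref / %%EOF trailer")
    xref_offset = int(tail.group(1))

    lines = data[xref_offset:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    entries = {}
    for index, line in enumerate(lines[2:2 + count]):
        offset, generation, flag = line.split()
        entries[first + index] = (int(offset), int(generation), flag.decode())
    return {
        "version": header.group(1).decode(),
        "xref_offset": xref_offset,
        "entries": entries,
    }


pdf = build_minimal_pdf()
info = read_structure(pdf)
print(info["version"], info["entries"])
```

Each in-use entry gives the byte offset where the corresponding object starts, so a reader can jump straight to an object without scanning the whole body.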


Imaging model

PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. A PDF page description can use a matrix to scale, rotate, or skew graphical elements. A key concept in PDF is the graphics state, a collection of graphical parameters that a page description may change, save, and restore. As of version 1.6, PDF has 24 graphics state properties, of which some of the most important are:

  • The current transformation matrix (CTM), which determines the coordinate system
  • The clipping path
  • The color space
  • The alpha constant, which is a key component of transparency

Interactive elements

PDF files may contain interactive elements such as annotations and form fields. Interactive forms are a mechanism for adding fillable forms to a PDF file. PDF currently supports two different methods for integrating data and PDF forms, and both coexist in the PDF specification:

  • AcroForms (also known as Acrobat forms), introduced in the PDF 1.2 format specification and included in all later PDF specifications.
  • Adobe XML Forms Architecture (XFA) forms, introduced in the PDF 1.5 format specification. The XFA specification is not included in the PDF specification; it is only referenced as an optional feature. Adobe XFA forms are not compatible with AcroForms.

Security and signatures

A PDF file may be encrypted for security, or digitally signed for authentication.
The standard security provided by PDF uses two different passwords, a "user password" and an "owner password". The user password protects the document against being opened. The owner password controls operations that remain restricted even after the document is decrypted: printing; copying text and graphics out of the document; modifying the document; and adding or modifying text notes and AcroForm fields.
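In the standard security handler, these restrictions are stored as bit flags in an integer permissions value (the /P entry of the encryption dictionary). The Python sketch below decodes the four restrictions listed above. It assumes the basic bit assignments (bits 3 to 6, counted from 1), ignores the finer-grained bits added in later revisions, and uses dictionary keys and a function name chosen only for this illustration.

```python
# Bit positions are 1-based, so bit 3 has the value 1 << 2.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate_and_fill_forms": 6,
}


def decode_permissions(p: int) -> dict:
    """Decode a signed 32-bit /P value into a name -> allowed mapping."""
    flags = p & 0xFFFFFFFF  # view the signed value as an unsigned bit field
    return {
        name: bool(flags & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


perms = decode_permissions(-44)
print(perms)
```

A set bit means the operation is allowed. Whether a restriction is actually enforced depends on the viewing application honoring these flags.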



For specific purposes, PDF now has four subsets: PDF/X, PDF/A, PDF/E, and PDF/UA.
They are standardized under ISO for several constituencies:

  • PDF/X for printing and the graphic arts, as ISO 15930 (work done in ISO TC130)
  • PDF/A for archiving in corporate, government, library, and similar environments, as ISO 19005 (work done in ISO TC171)
  • PDF/E for exchange of engineering drawings, as ISO 24517 (work done in ISO TC171)
  • PDF/UA for universally accessible PDF files, as ISO 14289