Creating PDF/A documents

Overview

PDF is a standard for encoding documents in an "as printed" form that is portable between systems. However, the suitability of a PDF file for archival preservation depends on options chosen when the PDF is created: most notably, whether to embed the necessary fonts for rendering the document; whether to use encryption; and whether to preserve additional information from the original document beyond what is needed to print it.

The PDF/A standard does not define an archiving strategy or the goals of an archiving system. It identifies a "profile" for electronic documents that ensures the documents can be reproduced exactly the same way using various software in years to come. A key element to this reproducibility is the requirement for PDF/A documents to be 100% self-contained. All of the information necessary for displaying the document in the same manner is embedded in the file. This includes, but is not limited to, all content (text, raster images and vector graphics), fonts, and color information. A PDF/A document is not permitted to be reliant on information from external sources (e.g., font programs and data streams), but may include annotations (e.g., hypertext links) that link to external documents.

You can use the Pdfium.Net SDK to create a PDF/A document. Because PDF/A is an archival format for long-term preservation of the document’s content, all fonts must be embedded in the file. As a result, a PDF/A document is typically larger than an equivalent standard PDF document. In addition, PDF/A does not permit audio and video content.

How to create a PDF/A document

Creating conforming documents requires detailed knowledge of the PDF/A standards. In general, however, a document must meet the following conditions to satisfy PDF/A.

  • The PDF/A document must contain an embedded ICC profile for the color spaces being used.

    C#
    public void EmbedIccProfile(PdfDocument doc)
    {
        byte[] iccProfile = System.IO.File.ReadAllBytes(@"sRGB-v4.icc");
        var outputIntent = new PdfOutputIntentItem(doc, "sRGB", PdfStandard.PdfA);
        outputIntent.Profile = PdfColorSpace.ICCBased(doc, 3, iccProfile, PdfColorSpace.DeviceRGB());
        doc.OutputIntents = new PdfOutputIntentsCollection(doc) { outputIntent };
    }
    Note

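    PDF/A Part 1 requires ICC profile v2. A v4 profile, such as the one loaded above, is only suitable for later parts of the standard (for example, PDF/A-3).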

  • The logical structure of the conforming file shall be described by a structure hierarchy rooted in the StructTreeRoot entry of the document catalog dictionary, as described in PDF Reference 9.6. If the document does not involve logical structuring, then it is necessary to create an empty structure.

    The document catalog dictionary shall include a MarkInfo dictionary with a Marked entry in it, whose value shall be true.

    C#
    public void CreateEmptyLogicalStructure(PdfDocument doc)
    {
        var structTree = PdfTypeDictionary.Create();
        structTree.SetNameAt("Type", "StructTreeRoot");
        doc.Root.SetIndirectAt("StructTreeRoot", PdfIndirectList.FromPdfDocument(doc), structTree);
    
        var markInfo = PdfTypeDictionary.Create();
        markInfo.SetBooleanAt("Marked", true);
        doc.Root["MarkInfo"] = markInfo;
    }
  • The font programs for all fonts used for rendering within a conforming file shall be embedded within that file, as defined in ISO 32000-1:2008, 9.9.

    C#
    public PdfFont CreateFont(PdfDocument doc)
    {
        byte[] fontProgram = System.IO.File.ReadAllBytes(@"times.ttf");
        return PdfFont.CreateEmbeddedFont(doc, fontProgram, FontCharSet.UNICODE_CHARSET, false);
    }
  • PDF/A documents must include a metadata stream whose contents are consistent with the document information dictionary. The stream must also identify the part and conformance level of the PDF/A standard.

    C#
    public void CreateMetadata(PdfDocument doc, string title, string subject = null, string author = null, string keywords = null)
    {
        // Fill in the document information entries first
        doc.Title = title;
        doc.Subject = subject;
        doc.Author = author;
        doc.Keywords = keywords;
        // Generate the metadata stream and declare part 3, conformance level U
        doc.GenerateMetadata("3", "U"); // PDF/A-3U
    }
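
The last condition refers to the part and conformance level recorded in the metadata stream. In XMP these are commonly expressed as the pdfaid:part and pdfaid:conformance properties. The following is a minimal sketch of that identification fragment, built with System.Xml.Linq only. It does not use the Pdfium.Net SDK and is not meant to show what GenerateMetadata writes internally. The class and method names are hypothetical, and the xpacket wrapper and the remaining metadata properties are omitted.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class PdfAIdentification
{
    private static readonly XNamespace X = "adobe:ns:meta/";
    private static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Namespace of the PDF/A identification schema
    private static readonly XNamespace PdfAId = "http://www.aiim.org/pdfa/ns/id/";

    // Builds an x:xmpmeta element that declares the PDF/A part and conformance level.
    public static string BuildIdentificationXmp(int part, string conformance)
    {
        var description = new XElement(Rdf + "Description",
            new XAttribute(Rdf + "about", ""),
            new XAttribute(XNamespace.Xmlns + "pdfaid", PdfAId),
            new XElement(PdfAId + "part", part),
            new XElement(PdfAId + "conformance", conformance));

        var xmpMeta = new XElement(X + "xmpmeta",
            new XAttribute(XNamespace.Xmlns + "x", X),
            new XElement(Rdf + "RDF",
                new XAttribute(XNamespace.Xmlns + "rdf", Rdf),
                description));

        return xmpMeta.ToString();
    }
}
```

Calling PdfAIdentification.BuildIdentificationXmp(3, "U") yields a fragment that corresponds to the PDF/A-3U declaration used in CreateMetadata. Inspecting the metadata of a saved file for these two properties is a quick way to see which part and level the file claims.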

Now you can create a simple PDF/A.

C#
public void CreatePDFA()
{
    using (var doc = PdfDocument.CreateNew())
    {
        // US Letter page size in points (1 inch = 72 points)
        float width = 8.5f * 72;
        float height = 11.0f * 72;
        PdfPage page = doc.Pages.InsertPageAt(0, width, height);

        CreateMetadata(doc, "Example PDF/A document");
        CreateEmptyLogicalStructure(doc);
        EmbedIccProfile(doc);
        var font = CreateFont(doc);

        float xLocation = 10.0f;
        float yLocation = 10.5f * 72;
        float fontSize = 20.0f;

        //Add text
        var txt = PdfTextObject.Create("Hello, ", xLocation, yLocation, font, fontSize);
        page.PageObjects.Add(txt);

        yLocation = 10.0f * 72;
        txt = PdfTextObject.Create("This is a document that complies with the PDF/A-3U standard.", xLocation, yLocation, font, fontSize);
        page.PageObjects.Add(txt);

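        // Add image
        var img = PdfImageObject.Create(doc);
        img.LoadJpegFile(@"sample_image.jpg");
        // Convert the pixel size to points, assuming the image is meant to print at 300 DPI
        float w = (float)(img.Bitmap.Width) * 72.0f / 300f;
        float h = (float)(img.Bitmap.Height) * 72.0f / 300f;
        // Position the image below the text; the matrix scales it to w x h points
        yLocation = 9.5f * 72 - h;
        img.Matrix = new FS_MATRIX(w, 0, 0, h, xLocation, yLocation);
        page.PageObjects.Add(img);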

        //Save document
        page.GenerateContent();
        doc.Save(@"pdfa_document.pdf", SaveFlags.ObjectStream | SaveFlags.RemoveUnusedObjects | SaveFlags.NoIncremental);
    }
}
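
Because the acceptable ICC profile version depends on the targeted PDF/A part, it can help to inspect a profile before passing it to EmbedIccProfile. The sketch below reads the version and color space from the fixed 128-byte ICC header using plain .NET. It is independent of the Pdfium.Net SDK, and the IccProfileInspector class is a hypothetical helper rather than part of any library.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class IccProfileInspector
{
    // Checks the header length and the 'acsp' signature stored at bytes 36..39.
    private static void Validate(byte[] profile)
    {
        if (profile == null || profile.Length < 128)
            throw new ArgumentException("An ICC profile starts with a 128-byte header.", nameof(profile));
        if (Encoding.ASCII.GetString(profile, 36, 4) != "acsp")
            throw new ArgumentException("Missing 'acsp' signature; not an ICC profile.", nameof(profile));
    }

    // Byte 8 holds the major version; the high nibble of byte 9 holds the minor version.
    public static (int Major, int Minor) ReadVersion(byte[] profile)
    {
        Validate(profile);
        return (profile[8], profile[9] >> 4);
    }

    // Bytes 16..19 hold the color space signature, for example "RGB " or "CMYK".
    public static string ReadColorSpace(byte[] profile)
    {
        Validate(profile);
        return Encoding.ASCII.GetString(profile, 16, 4);
    }
}
```

A possible use is to load the profile with System.IO.File.ReadAllBytes, call IccProfileInspector.ReadVersion, and refuse a profile whose major version is greater than 2 when the target is PDF/A Part 1.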
See Also

Other Resources

Color Spaces