Click or drag to resize

Producing Hybrid PDF/A-3 Documents (Factur-X & ZUGFeRD)

This topic contains the following sections:

Overview

A Hybrid PDF is an engineering marvel that bridges the gap between human readability and machine automation. It embeds an unalterable, structured XML invoice (the machine-readable payload) directly inside a visually pristine, long-term archival PDF/A-3 document.

When a human opens the file, they see a beautiful invoice layout. When an automated accounting system (like a banking API or ERP) ingests the file, it instantly extracts the embedded XML data without flaky OCR or fragile text parsing.

This article demonstrates how to programmatically compile compliant hybrid electronic invoices (Factur-X and ZUGFeRD) using the Patagames Pdfium.Net SDK.

Technical Requirements for PDF/A-3 and Factur-X Compliance

To pass strict archival validation checks (such as ISO 19005-3 and VeraPDF conformance testing), a hybrid PDF document must satisfy the following architectural criteria:

  • Device-Independent Color Space (Output Intent): PDF/A prohibits the use of unmanaged device color spaces (DeviceRGB or DeviceCMYK). The document must contain a global Output Intent dictionary embedding a valid ICC profile (typically sRGB for electronic invoices).

  • Metadata Synchronicity: The classic Document Information Dictionary (Info) and the Extensible Metadata Platform (XMP) data streams must remain identical.

  • Associated Files Specification: Embedded attachments must be explicitly linked to the document catalog using the /AF key. The corresponding file specification dictionary must contain the /AFRelationship attribute set strictly to /Data for the primary XML payload.

The Role of the GenerateMetadata Method

The GenerateMetadata method is designed to abstract the low-level complexities of PDF/A and Factur-X compliance. It performs the following critical tasks:

  1. Automatic Dictionary Synchronization: The method automatically synchronizes the provided metadata values between the modern XMP XML stream and the legacy internal Document Information Dictionary (Info). This eliminates the risk of "synchronicity deviation" errors during validation.

  2. Automated Structural Tagging for PDF/A-3A: When targeting the PDF/A-3A (Accessible) profile, the standard strictly mandates that the document must contain logical structure tags. To fulfill this requirement without forced manual layout parsing, GenerateMetadata automatically initializes and injects a compliant, empty logical structure tree into the document.

    Note  Note

    While GenerateMetadata creates the required structural scaffolding for PDF/A-3A, attaching non-PDF files (such as raw .txt or .csv logs) will still invalidate an A-level document because loose data streams cannot be tagged. For multi-format attachments, use PDF/A-3B or PDF/A-3U.

Code Example: Compiling a Hybrid Document

The following example initializes a new PDF document, embeds an sRGB color profile, generates synchronized archival and invoicing metadata, and attaches both a mandatory Factur-X XML dataset and a supplementary plain text log file.

C#
public class HybridPdfFactory
{
    /// <summary>
    /// Generates a compliant PDF/A-3B document containing an embedded Factur-X XML invoice.
    /// </summary>
    public static void CreateInvoiceBundle()
    {
        // Initialize a new PDF document instance.
        using (var doc = PdfDocument.CreateNew())
        {
            // PDF/A requires the document to contain at least one visual page.
            var page = doc.Pages.InsertPageAt(0, 595, 842);

            // 1. Embed the ICC Color Profile as an Output Intent.
            // This guarantees identical visual rendering across different hardware platforms.
            byte[] iccProfile = System.IO.File.ReadAllBytes(@"C:\Profiles\sRGB-v4.icc");
            var outputIntent = new PdfOutputIntentItem(doc, "sRGB", PdfStandard.PdfA);
            outputIntent.Profile = PdfColorSpace.ICCBased(doc, 3, iccProfile, PdfColorSpace.DeviceRGB());
            doc.OutputIntents = new PdfOutputIntentsCollection(doc) { outputIntent };

            // Define structural archiving compliance parameters.
            // PDF/A-3B (Basic) is the industry standard for hybrid invoice processing.
            string pdfAPart = "3";
            string pdfAConformance = "B";
            DateTimeOffset currentTimestamp = DateTimeOffset.Now;

            // 2. Broadcast XMP Archival & Factur-X Extended Metadata.
            // This method automatically synchronizes the XMP stream with the internal Info dictionary.
            // If pdfAConformance is set to "A", it also generates the mandatory empty structured data tree.
            doc.GenerateMetadata(
                title: "Invoice #1",
                author: null,
                subject: "Patagames Corporate",
                producer: null,
                keywords: null,
                creator: null,
                creationDate: currentTimestamp,
                modificationDate: currentTimestamp,
                pdfAPart: pdfAPart,
                pdfAConformance: pdfAConformance,
                facturXProfile: FacturXProfile.Extended,
                facturXVer: "1.0"
            );

            // Serialize the timestamp to a compliant internal PDF dictionary string.
            string strictPdfDate = ToPdfDateString(currentTimestamp);

            // 3. Embed the primary Factur-X XML Invoice payload.
            byte[] invoiceXmlBytes = System.IO.File.ReadAllBytes(@"C:\Data\factur-x.xml");
            var invoiceAttachment = new PdfAttachment(doc, "factur-x.xml", invoiceXmlBytes);

            // CRITICAL: Establish the mandatory semantic relationship bond.
            invoiceAttachment.FileSpecification.AFRelationship = Relationship.Data;
            invoiceAttachment.FileSpecification.EmbeddedFile.FileType = "text/xml";
            invoiceAttachment.FileSpecification.EmbeddedFile.ModificationDate = strictPdfDate;

            doc.Attachments.Add(invoiceAttachment);

            // 4. (Optional) Embed supporting supplementary documentation.
            // PDF/A-3 explicitly permits attaching other arbitrary file formats.
            byte[] sourceBytes = System.IO.File.ReadAllBytes(@"C:\Data\list_of_measurement.xlsx");
            var companionAttachment = new PdfAttachment(doc, "invoice.xlsx", sourceBytes);

            companionAttachment.FileSpecification.AFRelationship = Relationship.Source;
            companionAttachment.FileSpecification.EmbeddedFile.FileType = "application/octet-stream";
            companionAttachment.FileSpecification.EmbeddedFile.ModificationDate = strictPdfDate;

            doc.Attachments.Add(companionAttachment);

            // 5. Save the document targeting the PDF 1.7 baseline specification.
            string outputPath = @"C:\Output\FacturX_Invoice.pdf";
            doc.Save(outputPath, SaveFlags.NoIncremental, 17);
        }
    }

    /// <summary>
    /// Converts a <see cref="DateTimeOffset"/> into a PDF compliant date string.
    /// Format: D:YYYYMMDDHHmmSSOHH'mm' (e.g., "D:20231025143000+03'00'").
    /// </summary>
    /// <param name="dto">The date and time offset to convert.</param>
    /// <returns>A string representation of the date formatted for PDF standards.</returns>
    public static string ToPdfDateString(DateTimeOffset dto)
    {
        string baseDate = dto.ToString("'D:'yyyyMMddHHmmss");

        TimeSpan offset = dto.Offset;
        if (offset == TimeSpan.Zero)
            return baseDate + "Z";

        string sign = offset.TotalSeconds >= 0 ? "+" : "-";
        int hours = Math.Abs(offset.Hours);
        int minutes = Math.Abs(offset.Minutes);
        return $"{baseDate}{sign}{hours:D2}'{minutes:D2}'";
    }
}
See Also