PDF Document

Overview

A PdfDocument object can be constructed from an existing PDF file using a file path, a memory buffer, an input stream, or a PdfCustomLoader object. A PdfDocument object is used for document-level operations, such as opening and closing files and accessing pages, annotations, and metadata.

Create PDF Document

Create a PDF document from scratch

C#
using (var doc = PdfDocument.CreateNew())
{
    //...
}

Note

The above code creates a new PDF document without any pages.

Load PDF Document

Load an existing PDF document from file path

The following example shows how to load an existing document from a file path.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
}

Load an existing PDF document from an input stream

You can open an existing document from a stream as shown below.

C#
Stream stream = new System.IO.FileStream(@"c:\sample.pdf", FileMode.Open);

using (var doc = PdfDocument.Load(stream))
{
    //...
}

stream.Close();

Important

The application must keep the stream open and valid until the PDF document is closed.

You may have noticed that even a very large PDF document with hundreds of thousands of pages loads very quickly. This is possible because there is no need to parse the entire document body. At first, only the cross-reference table is read; the rest of the data is loaded when needed. As a result, the stream (as well as other resources such as files or memory buffers) must remain open.

Another consequence is that the stream must support seeking.
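
When the source is not seekable (for example, a network or decompression stream), one option is to buffer it before loading. The following is a minimal sketch using only standard .NET types; the helper name EnsureSeekable is hypothetical and is not part of the SDK.

```csharp
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper (not part of the SDK): returns a stream that supports
    // seeking. A seekable input is returned unchanged; anything else is copied
    // into a MemoryStream, which costs memory proportional to the file size.
    public static Stream EnsureSeekable(Stream input)
    {
        if (input.CanSeek)
            return input;

        var buffer = new MemoryStream();
        input.CopyTo(buffer);
        buffer.Position = 0; // rewind so reading starts at the first byte
        return buffer;
    }
}
```

The returned stream can then be passed to PdfDocument.Load(stream) and, like any other source, has to stay open until the document is closed.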

The Load(Stream stream, PdfForms forms, string password, bool leaveOpen) method takes four parameters, the last of which is leaveOpen. The default value of this parameter is true, which leaves the stream open. If you pass false, the stream is closed automatically by the PdfDocument.Dispose() method.

C#
Stream stream = new System.IO.FileStream(@"c:\sample.pdf", FileMode.Open);
using (var doc = PdfDocument.Load(stream, null, null, false))
{
    //...
}
//You shouldn't call stream.Close() anymore.
//stream.Close()

Load an existing PDF document from a memory buffer

You can open an existing document from a byte array as shown in the code snippet below.

C#
byte[] content = System.IO.File.ReadAllBytes(@"c:\sample.pdf");

using (var doc = PdfDocument.Load(content))
{
    //...
}

The content buffer must remain valid and unmodified until the document is closed.

Load an existing PDF document by using PdfCustomLoader

C#
byte[] content = System.IO.File.ReadAllBytes(@"c:\sample.pdf");

var loader = new PdfCustomLoader((uint)content.Length, content);
loader.LoadBlock += (s, e) =>
  {
      Array.Copy(e.UserData, e.Position, e.Buffer, 0, e.Buffer.Length);
      e.ReturnValue = true;
  };
using (var doc = PdfDocument.Load(loader))
{
    //...
}

You can also use a stream to reduce memory consumption, because there is no need to load the entire file into memory.

C#
    // ...

    Stream stream = System.IO.File.OpenRead(@"c:\sample.pdf");

    var loader = new PdfCustomLoader((uint)stream.Length);
    loader.Tag = stream;
    loader.LoadBlock += Loader_LoadBlock;

    using (var doc = PdfDocument.Load(loader))
    {
        //...
    }

// ...

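private void Loader_LoadBlock(object sender, CustomLoadEventArgs e)
{
    var loader = sender as PdfCustomLoader;
    var stream = loader.Tag as Stream;
    stream.Seek(e.Position, SeekOrigin.Begin);
    // Stream.Read may return fewer bytes than requested, so read until the buffer is full.
    int total = 0;
    while (total < e.Buffer.Length)
    {
        int read = stream.Read(e.Buffer, total, e.Buffer.Length - total);
        if (read == 0) break; // unexpected end of stream
        total += read;
    }
    e.ReturnValue = total == e.Buffer.Length;
}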

Load PDF document and get the first page of the document

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    var firstPage = doc.Pages[0];
    //...
}

Opening an encrypted PDF document

You can open an existing encrypted PDF document from the file system, a stream, or a byte array using the overloads shown below.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf", null, "password"))
{
    //...
}

using (var doc = PdfDocument.Load(stream, null, "password"))
{
    //...
}

using (var doc = PdfDocument.Load(byteArray, null, "password"))
{
    //...
}

Load PDF document with interactive forms enabled

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf", new PdfForms(), "password"))
{
    //...
}

The Load(string path, PdfForms forms, string password) method takes three parameters. The PdfForms object passed as the second parameter (forms) is used by the PdfViewer to provide user interaction with AcroForms. If forms is null, the viewer does not provide user interaction with AcroForms.

Save PDF Document

Save a PDF to a file

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
    // Do operations for the PDF document.
    //...
    // Save the changes for the PDF document.
    doc.Save(@"c:\sample-modified.pdf", Patagames.Pdf.Enums.SaveFlags.NoIncremental);
}

As mentioned above, the SDK reads data as needed, so input sources must remain open and consistent for as long as the document is in use. Consequently, a document cannot be saved to the same resource (file, stream, or memory buffer) it was loaded from.
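
If the result has to end up under the original file name, one common workaround is to save to a temporary file, close the document, and only then overwrite the original. The sketch below uses standard .NET file APIs; the helper name RewriteInPlace and the ".tmp" suffix are hypothetical choices, not part of the SDK.

```csharp
using System;
using System.IO;

public static class SafeSave
{
    // Hypothetical helper (not part of the SDK). The callback receives the source
    // path and a temporary path; it must load the document, save it to the
    // temporary path, and dispose the document before returning.
    public static void RewriteInPlace(string path, Action<string, string> loadModifySave)
    {
        string tempPath = path + ".tmp";
        try
        {
            loadModifySave(path, tempPath);
            // The source has been released by now, so it can be overwritten.
            File.Copy(tempPath, path, true);
        }
        finally
        {
            File.Delete(tempPath); // does nothing if the file was never created
        }
    }
}

// Possible usage with the calls shown in this topic:
// SafeSave.RewriteInPlace(@"c:\sample.pdf", (source, temp) =>
// {
//     using (var doc = PdfDocument.Load(source))
//     {
//         // ... modify the document ...
//         doc.Save(temp, Patagames.Pdf.Enums.SaveFlags.NoIncremental);
//     }
// });
```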

Save a PDF to an output stream

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
    // Do operations for the PDF document.
    //...
    // Save the changes for the PDF document.
    using (var stream = new System.IO.FileStream(@"c:\sample-modified.pdf", FileMode.CreateNew))
        doc.Save(stream, Patagames.Pdf.Enums.SaveFlags.NoIncremental);
}

Save a PDF to a stream using a WriteBlock event

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
    // Do operations for the PDF document.
    //...
    // Save the changes for the PDF document.
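    // Open the output file once. FileMode.Create truncates an existing file,
    // so the blocks are never appended to stale content.
    using (var stream = new System.IO.FileStream(@"c:\sample-modified.pdf", FileMode.Create, FileAccess.Write))
    {
        // Each WriteBlock event delivers the next block of the saved document.
        doc.WriteBlock += (s, e) => stream.Write(e.Buffer, 0, e.Buffer.Length);
        doc.Save(Patagames.Pdf.Enums.SaveFlags.NoIncremental);
    }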
}

Incremental saving

The contents of a PDF file can be updated incrementally without rewriting the entire file. Changes are appended to the end of the file, leaving its original contents intact.

The main advantages of updating a file this way:

  • In some cases, incremental saving is the only way to save changes to a document. An accepted practice for minimizing the risk of data loss when saving a document is to write it to a new file and rename the new file to replace the old one. However, in certain contexts, such as when editing a document across an HTTP connection or using OLE embedding (a Windows-specific technology), it is not possible to overwrite the contents of the original file in this manner. Incremental updates can be used to save changes to documents in these contexts.

  • Once a document has been signed, all changes made to the document must be saved using incremental updates, since altering any existing bytes in the file invalidates existing signatures.

  • Small changes to a large document can be saved quickly.

In addition, because the original contents of the document are still present in the file, it is possible to undo saved changes by deleting one or more addenda.

For incremental saving, use SaveFlags.Incremental instead of SaveFlags.NoIncremental.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
    // Do operations for the PDF document.
    //...
    // Save the changes for the PDF document.
    doc.Save(@"c:\sample-modified.pdf", Patagames.Pdf.Enums.SaveFlags.Incremental);
}
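
Because an incremental update only appends data, the original file should be a byte-for-byte prefix of the updated file. The following sketch checks that property with standard .NET file APIs; the helper name IsAppendOnlyUpdate is hypothetical and is not part of the SDK.

```csharp
using System.IO;

public static class IncrementalCheck
{
    // Hypothetical helper: returns true when every byte of the original file
    // reappears unchanged at the start of the updated file.
    public static bool IsAppendOnlyUpdate(string originalPath, string updatedPath)
    {
        using (var original = File.OpenRead(originalPath))
        using (var updated = File.OpenRead(updatedPath))
        {
            if (updated.Length < original.Length)
                return false;

            int b;
            while ((b = original.ReadByte()) != -1)
            {
                if (b != updated.ReadByte())
                    return false;
            }
            return true;
        }
    }
}
```

A check like this can be useful before relying on an existing signature, since any change to the original bytes would invalidate it.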

Remove orphaned objects and references

In a PDF file, an object is not necessarily referenced from only one page (an object of the Page type); the same object may be referenced from several pages of the document. Consequently, when a PDF document is edited, a shared object should not be modified in place, because the change would affect every place that refers to it, which is often not what you need. For instance, if you change the font on one page, it can also change on another page that refers to the same object, and the layout of pages that were not edited may break. It is therefore advisable to create a copy of the object, modify the copy, and replace the reference to the original object with a reference to the copy. The old object should not be deleted at that point, because other pages of the document may still refer to it.

With this approach, the old object may eventually lose its last reference. PDF viewing and editing programs typically do not notice this situation and simply ignore such orphaned objects. As a result, the file size may grow uncontrollably in some cases.

To solve this issue, use SaveFlags.RemoveUnusedObjects.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
    // Do operations for the PDF document.
    //...
    // Save the changes for the PDF document.
    doc.Save(@"c:\sample-modified.pdf", SaveFlags.NoIncremental | SaveFlags.RemoveUnusedObjects);
}
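
Conceptually, an orphaned object is one that can no longer be reached from the document root by following references. The sketch below illustrates that idea on a plain dictionary of object numbers; it is not how the SDK implements RemoveUnusedObjects, and the FindOrphans helper and its data model are hypothetical.

```csharp
using System.Collections.Generic;

public static class OrphanFinder
{
    // refs maps an object number to the object numbers it references.
    // Returns the objects that cannot be reached from the root object.
    public static SortedSet<int> FindOrphans(IDictionary<int, int[]> refs, int root)
    {
        var reachable = new HashSet<int>();
        var pending = new Stack<int>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            int id = pending.Pop();
            if (!reachable.Add(id))
                continue; // already visited

            int[] targets;
            if (refs.TryGetValue(id, out targets))
            {
                foreach (int target in targets)
                    pending.Push(target);
            }
        }

        var orphans = new SortedSet<int>(refs.Keys);
        orphans.ExceptWith(reachable);
        return orphans;
    }
}
```

For example, if object 5 is an old font copy that no page refers to any more, it is reported as an orphan even though it still refers to other, live objects.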

Object Streams

Pdfium.Net SDK supports saving documents with object streams. An object stream contains a sequence of PDF objects; its purpose is to allow a greater number of PDF objects to be compressed, thereby substantially reducing the size of PDF files.

Any PDF object can appear in an object stream, with the following exceptions:

  • Stream objects

  • Objects with a generation number other than zero

  • A document’s encryption dictionary

  • An object representing the value of the Length entry in an object stream dictionary

To save with object streams, use SaveFlags.ObjectStream.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //...
    // Do operations for the PDF document.
    //...
    // Save the changes for the PDF document.
    doc.Save(@"c:\sample-modified.pdf", SaveFlags.NoIncremental | SaveFlags.ObjectStream);
}

Working with document properties

Patagames PDF SDK allows you to read, set, and modify the document information of a PDF, such as Author, CreationDate, Subject, Title, and Producer. The corresponding properties of the PdfDocument class provide access to this information.

The following code snippet illustrates how to set PDF document information.

C#
//Create a new PDF document.
using (var doc = PdfDocument.CreateNew())
{
    //Set document information.
    doc.Author = "Patagames";
    doc.CreationDate = DateTime.Now.ToString();
    doc.Creator = "Patagames PDF SDK";
    doc.Keywords = "PDF";
    doc.ModificationDate = DateTime.Now.ToString();
    doc.Producer = "Patagames";
    doc.Subject = "Document information demo";
    doc.Title = "Hello World Demo";

    //Add a page to the document.
    var page = doc.Pages.InsertPageAt(0, 8.5f * 72, 11.0f * 72);

    //Create the font.
    var font = PdfFont.CreateStock(doc, FontStockNames.Arial);

    //Create a text object and add it to the page
    var txt = PdfTextObject.Create("Hello World!", 10, 10, font, 12);
    page.PageObjects.Add(txt);

    //Generate page content
    page.GenerateContent();

    //Save the document
    doc.Save("Sample.pdf", SaveFlags.NoIncremental);
}
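
DateTime.Now.ToString() produces a culture-dependent string. If the date properties store the string as-is (an assumption; this topic does not say how the SDK interprets the value), you may prefer the date form defined in the PDF Reference 1.7, D:YYYYMMDDHHmmSSOHH'mm'. The helper below is a hypothetical sketch that builds such a string with standard .NET types only.

```csharp
using System;
using System.Globalization;

public static class PdfDateFormat
{
    // Hypothetical helper (not part of the SDK): formats a timestamp as
    // D:YYYYMMDDHHmmSS followed by "Z" for UTC or a signed HH'mm' offset.
    public static string ToPdfDate(DateTimeOffset value)
    {
        string stamp = value.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
        TimeSpan offset = value.Offset;
        if (offset == TimeSpan.Zero)
            return "D:" + stamp + "Z";

        string sign = offset < TimeSpan.Zero ? "-" : "+";
        offset = offset.Duration(); // work with the absolute value
        return string.Format(CultureInfo.InvariantCulture,
            "D:{0}{1}{2:00}'{3:00}'", stamp, sign, offset.Hours, offset.Minutes);
    }
}
```

With this helper, a call such as PdfDateFormat.ToPdfDate(DateTimeOffset.Now) could be assigned to doc.CreationDate in place of DateTime.Now.ToString().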

Printing PDF document to default printer

To print a PDF document, add a reference to the assembly that matches your application type:

  • Patagames.Pdf.WinForms.dll - for Windows Forms Applications

  • Patagames.Pdf.WPF.dll - for WPF Applications

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    var printDoc = new PdfPrintDocument(doc);
    printDoc.Print();
}

The above code shows a standard print progress window. If you want to suppress it, modify the code as shown below.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    var printDoc = new PdfPrintDocument(doc);
    printDoc.PrintController = new StandardPrintController();
    printDoc.Print();
}

Because PdfPrintDocument is derived from the standard PrintDocument class, you can use the standard Microsoft Windows print dialog box (PrintDialog), which configures a PrintDocument according to user input, and then print the document.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    //Create an instance of the PdfPrintDocument class
    var printDoc = new PdfPrintDocument(doc);

    //Create a standard print dialog box
    var dlg = new PrintDialog();
    dlg.AllowCurrentPage = true;
    dlg.AllowSomePages = true;
    dlg.UseEXDialog = true;
    //Set the PrintDocument used to obtain PrinterSettings
    dlg.Document = printDoc;
    //Show the PrintDialog and print the PDF document
    if (dlg.ShowDialog() == DialogResult.OK)
        printDoc.Print();
}

Closing a document

After all document manipulation and save operations are complete, dispose of the PdfDocument instance to release all memory consumed by the PDF DOM.

Split and Merge PDF Documents

You can merge multiple PDF documents as well as split a PDF document into multiple ones using the page import method described here: Importing pages from an existing document
