PDF Document
A PdfDocument object can be constructed from an existing PDF file using a file path, a memory buffer, an input stream, or a PdfCustomLoader object. A PdfDocument object is used for document-level operations such as opening and closing files and getting pages, annotations, metadata, and so on.

    using (var doc = PdfDocument.CreateNew())
    {
        //...
    }

The above code creates a new PDF document without any pages.
The following example shows how to load an existing document from a physical path.

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
    }
You can open an existing document from a stream as shown below:

    Stream stream = new System.IO.FileStream(@"c:\sample.pdf", FileMode.Open);
    using (var doc = PdfDocument.Load(stream))
    {
        //...
    }
    stream.Close();

The application must keep the stream valid until the PDF document is closed.
You may have noticed that even a very large PDF document with hundreds of thousands of pages loads very quickly. This is possible because there is no need to parse the entire document body: at first only the cross-reference table is read, and the rest of the data is loaded when needed. As a result, the stream (like other resources such as files or memory buffers) must remain open.
Another consequence is that the stream must support seeking.
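If the input stream might not be seekable (a network or decompression stream, for example), one option is to buffer it before loading. The following is a minimal sketch that uses only the .NET base class library; the `SeekableStreams` class and the `EnsureSeekable` method are hypothetical names, not part of the SDK. Note that buffering reads the whole input into memory.

```csharp
using System;
using System.IO;

public static class SeekableStreams
{
    // Hypothetical helper (not an SDK API).
    // Returns the source itself when it already supports seeking;
    // otherwise copies it into a MemoryStream, which always does.
    public static Stream EnsureSeekable(Stream source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (source.CanSeek) return source;

        var buffer = new MemoryStream();
        source.CopyTo(buffer);   // reads the source to its end
        buffer.Position = 0;     // rewind so reading starts at the first byte
        return buffer;
    }
}
```

The returned stream can then be passed to PdfDocument.Load(stream) and, like any other input, must stay open until the document is disposed.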
The Load(Stream stream, PdfForms forms, string password, bool leaveOpen) method takes four parameters, the last of which is leaveOpen. Its default value is true. If you pass false, the stream is closed automatically in the PdfDocument.Dispose() method.

    Stream stream = new System.IO.FileStream(@"c:\sample.pdf", FileMode.Open);
    using (var doc = PdfDocument.Load(stream, null, null, false))
    {
        //...
    }
    // You shouldn't call stream.Close() anymore.
    // stream.Close();
You can open an existing document from a byte array as shown in the code snippet below.

    byte[] content = System.IO.File.ReadAllBytes(@"c:\sample.pdf");
    using (var doc = PdfDocument.Load(content))
    {
        //...
    }

The content array must be kept intact until the document is closed.

You can also load a document through a PdfCustomLoader object, which supplies the data block by block through its LoadBlock event:

    byte[] content = System.IO.File.ReadAllBytes(@"c:\sample.pdf");
    var loader = new PdfCustomLoader((uint)content.Length, content);
    loader.LoadBlock += (s, e) =>
    {
        // Copy the requested block from the byte array into the SDK's buffer.
        Array.Copy(e.UserData, e.Position, e.Buffer, 0, e.Buffer.Length);
        e.ReturnValue = true;
    };
    using (var doc = PdfDocument.Load(loader))
    {
        //...
    }
You can also back the custom loader with a stream to reduce memory consumption (there is no need to load the entire file into memory).

    // ...
    Stream stream = System.IO.File.OpenRead(@"c:\sample.pdf");
    var loader = new PdfCustomLoader((uint)stream.Length);
    loader.Tag = stream;
    loader.LoadBlock += Loader_LoadBlock;
    using (var doc = PdfDocument.Load(loader))
    {
        //...
    }
    // ...

    private void Loader_LoadBlock(object sender, CustomLoadEventArgs e)
    {
        var loader = sender as PdfCustomLoader;
        var stream = loader.Tag as Stream;
        stream.Seek(e.Position, SeekOrigin.Begin);

        // Stream.Read may return fewer bytes than requested, so read in a loop.
        int offset = 0;
        while (offset < e.Buffer.Length)
        {
            int read = stream.Read(e.Buffer, offset, e.Buffer.Length - offset);
            if (read == 0)
                break; // end of stream
            offset += read;
        }
        // Report success only if the whole block was filled.
        e.ReturnValue = offset == e.Buffer.Length;
    }
Once a document is loaded, its pages are available through the Pages collection:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        var firstPage = doc.Pages[0];
        //...
    }
You can open an existing encrypted PDF document from the file system, a stream, or a byte array using the following overloads:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf", null, "password"))
    {
        //...
    }
    using (var doc = PdfDocument.Load(stream, null, "password"))
    {
        //...
    }
    using (var doc = PdfDocument.Load(byteArray, null, "password"))
    {
        //...
    }
To enable user interaction with AcroForms, pass a PdfForms object as the second argument:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf", new PdfForms(), "password"))
    {
        //...
    }

The Load(string path, PdfForms forms, string password) method takes three parameters. The PdfForms object passed as the second parameter (forms) is used by the PdfViewer to provide user interaction with AcroForms. If forms is null, the viewer does not provide user interaction with AcroForms.
To save a document to a file, call the Save method with the target path:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
        // Do operations for the PDF document.
        //...
        // Save the changes for the PDF document.
        doc.Save(@"c:\sample-modified.pdf", Patagames.Pdf.Enums.SaveFlags.NoIncremental);
    }
As mentioned above, the SDK reads the data as needed and for this purpose input sources must remain open and consistent. Consequently, saving a document to the same resource (file, stream, memory buffer) is impossible.
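When the goal is nevertheless to end up with the changes in the original file, a common workaround is to save to a temporary file and swap it in once the document has been closed. The following is a minimal sketch that uses only the .NET base class library; `SafeSave` and `ReplaceViaTempFile` are hypothetical names, not SDK APIs, and the `.tmp` suffix is an arbitrary choice.

```csharp
using System;
using System.IO;

public static class SafeSave
{
    // Hypothetical helper (not an SDK API).
    // Runs saveTo against a temporary file next to the original, then
    // replaces the original with it. Every handle to originalPath must
    // be closed by the time saveTo returns.
    public static void ReplaceViaTempFile(string originalPath, Action<string> saveTo)
    {
        if (originalPath == null) throw new ArgumentNullException(nameof(originalPath));
        if (saveTo == null) throw new ArgumentNullException(nameof(saveTo));

        // Same directory as the original, so both files are on one volume.
        string tempPath = originalPath + ".tmp";
        try
        {
            saveTo(tempPath);
            // Moves tempPath over originalPath; null means no backup copy.
            File.Replace(tempPath, originalPath, null);
        }
        finally
        {
            // Clean up if saving or replacing failed part-way.
            if (File.Exists(tempPath)) File.Delete(tempPath);
        }
    }
}
```

A typical saveTo callback would open the document with PdfDocument.Load inside a using block, call doc.Save(tempPath, SaveFlags.NoIncremental), and let the using block dispose the document before returning, so that the original file is no longer in use when it is replaced.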
You can also save the document to a stream:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
        // Do operations for the PDF document.
        //...
        // Save the changes for the PDF document.
        using (var stream = new System.IO.FileStream(@"c:\sample-modified.pdf", FileMode.CreateNew))
            doc.Save(stream, Patagames.Pdf.Enums.SaveFlags.NoIncremental);
    }

Alternatively, handle the WriteBlock event to receive the output block by block:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
        // Do operations for the PDF document.
        //...
        // Save the changes for the PDF document.
        doc.WriteBlock += (s, e) =>
        {
            using (var stream = new System.IO.FileStream(@"c:\sample-modified.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite))
            {
                // Append each block to the end of the output file.
                stream.Seek(0, SeekOrigin.End);
                stream.Write(e.Buffer, 0, e.Buffer.Length);
            }
        };
        doc.Save(Patagames.Pdf.Enums.SaveFlags.NoIncremental);
    }
The contents of a PDF file can be updated incrementally without rewriting the entire file. Changes are appended to the end of the file, leaving its original contents intact.
The main advantages of updating a file this way:

- In some cases, incremental saving is the only way to save changes to a document. An accepted practice for minimizing the risk of data loss when saving a document is to write it to a new file and rename the new file to replace the old one. However, in certain contexts, such as when editing a document across an HTTP connection or using OLE embedding (a Windows-specific technology), it is not possible to overwrite the contents of the original file in this manner. Incremental updates can be used to save changes to documents in these contexts.
- Once a document has been signed, all changes made to the document must be saved using incremental updates, since altering any existing bytes in the file invalidates existing signatures.
- Small changes to a large document can be saved quickly.

In addition, because the original contents of the document are still present in the file, it is possible to undo saved changes by deleting one or more addenda.
For incremental saving, use SaveFlags.Incremental instead of SaveFlags.NoIncremental:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
        // Do operations for the PDF document.
        //...
        // Save the changes for the PDF document.
        doc.Save(@"c:\sample-modified.pdf", Patagames.Pdf.Enums.SaveFlags.Incremental);
    }
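Because every appended update section ends with its own %%EOF marker, counting those markers gives a rough idea of how many addenda a file has accumulated. The sketch below uses only the .NET base class library; `PdfRevisionSketch` and `CountEofMarkers` are hypothetical names, not SDK APIs. It is only a heuristic: linearized files contain two markers even without any update, and the same bytes can occur inside binary stream data.

```csharp
using System;
using System.Text;

public static class PdfRevisionSketch
{
    // Hypothetical helper (not an SDK API).
    // Counts non-overlapping occurrences of the ASCII marker "%%EOF"
    // in the raw bytes of a file.
    public static int CountEofMarkers(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        byte[] marker = Encoding.ASCII.GetBytes("%%EOF");
        int count = 0;
        for (int i = 0; i + marker.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < marker.Length && data[i + j] == marker[j]) j++;
            if (j == marker.Length)
            {
                count++;
                i += marker.Length - 1; // continue after this marker
            }
        }
        return count;
    }
}
```

For example, the bytes of a file can be read with System.IO.File.ReadAllBytes and passed to CountEofMarkers before and after an incremental save to see the count increase.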
In a PDF file, an object is not necessarily referenced from only one page (an object of the Page type); it may also be referenced from other pages of the document. This means that an object in the file should not be modified in place while the document is edited, because doing so can distort the document in several places at once, which is often not what you want. For instance, if you change a font on one page, it also changes on every other page that refers to the same object, and the layout of pages where no changes were made may break. It is therefore advisable to create a copy of the object, modify the copy, and replace the reference to the original object with a reference to the copy. The old object should not be deleted, because other pages of the document may still refer to it.
With this approach, the old object may at some point lose its last reference. PDF viewing and editing programs typically do not notice this and simply ignore such orphaned objects. As a result, the file size can grow uncontrollably in some cases.
To solve this issue, use SaveFlags.RemoveUnusedObjects:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
        // Do operations for the PDF document.
        //...
        // Save the changes for the PDF document.
        doc.Save(@"c:\sample-modified.pdf", SaveFlags.NoIncremental | SaveFlags.RemoveUnusedObjects);
    }
Pdfium.Net SDK supports saving with object streams. An object stream contains a sequence of PDF objects. The purpose of object streams is to allow a greater number of PDF objects to be compressed, thereby substantially reducing the size of PDF files.
Any PDF object can appear in an object stream, with the following exceptions:

- Stream objects
- Objects with a generation number other than zero
- A document's encryption dictionary
- An object representing the value of the Length entry in an object stream dictionary

To save with object streams, use SaveFlags.ObjectStream:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        //...
        // Do operations for the PDF document.
        //...
        // Save the changes for the PDF document.
        doc.Save(@"c:\sample-modified.pdf", SaveFlags.NoIncremental | SaveFlags.ObjectStream);
    }
Patagames PDF SDK allows you to set, read, and modify the document information of a PDF, such as Author, CreationDate, Subject, Title, and Producer. The corresponding properties of the PdfDocument class provide access to this information.
The following code snippet illustrates how to set PDF document information.

    // Create a new PDF document.
    using (var doc = PdfDocument.CreateNew())
    {
        // Set document information.
        doc.Author = "Patagames";
        doc.CreationDate = DateTime.Now.ToString();
        doc.Creator = "Patagames PDF SDK";
        doc.Keywords = "PDF";
        doc.ModificationDate = DateTime.Now.ToString();
        doc.Producer = "Patagames";
        doc.Subject = "Document information demo";
        doc.Title = "Hello World Demo";

        // Add a page to the document.
        var page = doc.Pages.InsertPageAt(0, 8.5f * 72, 11.0f * 72);

        // Create the font.
        var font = PdfFont.CreateStock(doc, FontStockNames.Arial);

        // Create a text object and add it to the page.
        var txt = PdfTextObject.Create("Hello World!", 10, 10, font, 12);
        page.PageObjects.Add(txt);

        // Generate the page content.
        page.GenerateContent();

        // Save the document.
        doc.Save("Sample.pdf", SaveFlags.NoIncremental);
    }
To print a PDF document, add one of the following assemblies as a reference to the project:

- Patagames.Pdf.WinForms.dll - for Windows Forms applications
- Patagames.Pdf.WPF.dll - for WPF applications

The following code prints a document:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        var printDoc = new PdfPrintDocument(doc);
        printDoc.Print();
    }
The above code shows a standard print progress window. If you want to suppress it, modify the code as shown below:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        var printDoc = new PdfPrintDocument(doc);
        printDoc.PrintController = new StandardPrintController();
        printDoc.Print();
    }
Because PdfPrintDocument is derived from the standard PrintDocument class, you can use the standard Microsoft Windows print dialog box (PrintDialog), which configures a PrintDocument according to user input and then prints the document.

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        // Create an instance of the PrintDocument class.
        var printDoc = new PdfPrintDocument(doc);

        // Create a standard print dialog box.
        var dlg = new PrintDialog();
        dlg.AllowCurrentPage = true;
        dlg.AllowSomePages = true;
        dlg.UseEXDialog = true;

        // Set the PrintDocument used to obtain PrinterSettings.
        dlg.Document = printDoc;

        // Show the PrintDialog and print the PDF document.
        if (dlg.ShowDialog() == DialogResult.OK)
            printDoc.Print();
    }
After the document manipulation and save operations are completed, you should dispose of the PdfDocument instance in order to release all the memory consumed by the PDF DOM.
You can merge multiple PDF documents as well as split a PDF document into multiple ones using the page import method described here: Importing pages from an existing document