PDF Page

Overview

A PDF page is the basic building block of a PDF document. A PdfPage object is retrieved from a PDF document. Page-level APIs provide functions to parse, render, and edit a page (including creating, deleting, and flattening), retrieve PDF annotations, read and set page properties, and so on.

Load and parse a PDF page

In most cases, a PDF page needs to be parsed before it is rendered or processed. When you get a page for the first time, it is neither loaded nor parsed. For example:

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    PdfPage page = doc.Pages[0];
    Console.WriteLine($"IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    //...
}

Running this code prints:

Output
IsLoaded = False, IsParsed = False

However, as soon as you call a method, or read or set a property, that requires the page to be loaded or parsed, loading and parsing are done automatically.

C#
//...
float width = page.Width;
Console.WriteLine($"IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
//...

Output
IsLoaded = True, IsParsed = True

Thus, in most cases, you do not need to worry about loading or parsing.

Progressive page loading

For some pages, parsing can take a long time. In this case, you can use progressive page loading, which lets you pause parsing, process user input or display progress, and then resume parsing.

The first step is to call the StartProgressiveLoad method.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    PdfPage page = doc.Pages[0];

    page.StartProgressiveLoad();
    Console.WriteLine($"Start progressive loading: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    //...
}

Output
Start progressive loading: IsLoaded = True, IsParsed = False

When the page is loaded but not yet parsed, some properties and methods are available and others are not. For example, you can read the width or height of the page, but you cannot access its text because the content has not been parsed yet.

C#
//...
float width = page.Width;
Console.WriteLine($"Width = {width}, IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");

//try to extract text length from the page
int len = page.Text.CountChars;
Console.WriteLine($"Text Length = {len}");
//...

Running this code prints:

Output
Width = 531, IsLoaded = True, IsParsed = False
Text Length = 0

Note also that accessing the properties of the PdfPage object no longer triggers automatic parsing: as the output shows, IsParsed is still False.

After calling the StartProgressiveLoad method, you need to call the ContinueProgressiveLoad method until the page is parsed.

C#
while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
{
    Console.WriteLine($"Parsing...");
}
Console.WriteLine($"Parsing completed: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");

Output
Parsing...
Parsing...
Parsing...
Parsing completed: IsLoaded = True, IsParsed = True

To indicate whether parsing should be paused, handle the ProgressiveRender event and set the NeedPause property of the event arguments to true or false.

The complete example:

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    PdfPage page = doc.Pages[0];
    page.ProgressiveRender += (s, e) =>
    {
        e.NeedPause = true; //or false, depending on your own page parsing logic
    };
    page.StartProgressiveLoad();
    Console.WriteLine($"Start progressive loading: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");

    while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
    {
        Console.WriteLine($"Parsing...");
    }
    Console.WriteLine($"Parsing completed: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
}

Output
Start progressive loading: IsLoaded = True, IsParsed = False
Parsing...
Parsing...
Parsing...
Parsing completed: IsLoaded = True, IsParsed = True
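
The handler above always requests a pause. One possible policy for the "own logic" it mentions is a time slice: keep parsing until a fixed amount of time has passed, then pause so the application can react. The sketch below is only an illustration of that idea. PauseBudget is a hypothetical helper that is not part of the SDK, and it assumes the ProgressiveRender event is raised repeatedly while parsing is in progress.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the SDK): decides when parsing should pause
// so the caller can process user input or update a progress indicator.
public sealed class PauseBudget
{
    private readonly TimeSpan _slice;
    private readonly Stopwatch _clock = Stopwatch.StartNew();

    // slice: how long parsing may run before a pause is requested.
    public PauseBudget(TimeSpan slice)
    {
        _slice = slice;
    }

    // True once the current time slice has been used up.
    public bool ShouldPause()
    {
        return _clock.Elapsed >= _slice;
    }

    // Call after each pause, before parsing resumes, to start a new slice.
    public void Restart()
    {
        _clock.Restart();
    }
}
```

With such a helper, the handler body in the complete example would become `e.NeedPause = budget.ShouldPause();`, and the body of the while loop would call `budget.Restart()` before the next ContinueProgressiveLoad call.
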
Render a PDF page

PDF rendering is performed by the Pdfium renderer, a graphics engine that renders a page to a bitmap or a platform graphics device. Patagames PDF SDK provides rendering flags that control, for example, whether annotations are rendered and whether anti-aliasing is applied to images and paths.

How to render a page to a bitmap

C#
using Patagames.Pdf;
using Patagames.Pdf.Net;
using Patagames.Pdf.Enums;
using System.Drawing.Imaging;
//...
using (var doc = PdfDocument.Load("sample.pdf"))
{
    var page = doc.Pages[0];
    int width = (int)page.Width;
    int height = (int)page.Height;

    using (var bitmap = new PdfBitmap(width, height, true))
    {
        bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
        page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_NONE);

        bitmap.Image.Save("sample.png", ImageFormat.Png);
    }
}

How to render a page with annotations

C#
using Patagames.Pdf;
using Patagames.Pdf.Net;
using Patagames.Pdf.Enums;
using System.Drawing.Imaging;
//...
using (var doc = PdfDocument.Load("sample.pdf"))
{
    var page = doc.Pages[0];
    int width = (int)page.Width;
    int height = (int)page.Height;

    using (var bitmap = new PdfBitmap(width, height, true))
    {
        bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
        page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_ANNOT);

        bitmap.Image.Save("sample.png", ImageFormat.Png);
    }
}

Progressive page rendering

Just like with progressive page loading, you may want to use the progressive rendering technique. Progressive page rendering works exactly the same as progressive page loading. You can pause rendering, process user input, or display progress and then continue.

The complete example:

C#
using (var doc = PdfDocument.Load(@"sample.pdf"))
{
    PdfPage page = doc.Pages[0];
    int width = (int)page.Width;
    int height = (int)page.Height;
    page.ProgressiveRender += (s, e) =>
    {
        e.NeedPause = true; //or false, depending on your own logic
    };

    using (var bitmap = new PdfBitmap(width, height, true))
    {
        bitmap.FillRect(0, 0, width, height, FS_COLOR.White);

        Console.WriteLine($"Start progressive render");
        ProgressiveStatus status = page.StartProgressiveRender(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_ANNOT, null);
        while (status == ProgressiveStatus.ToBeContinued)
        {
            Console.WriteLine($"Render in progress...");
            status = page.ContinueProgressiveRender();
        }
    }
    Console.WriteLine($"Render complete");
}

Output
Start progressive render
Render in progress...
Render in progress...
Render in progress...
Render complete

Page coordinate system

Coordinate systems define the canvas on which all painting occurs. They determine the position, orientation, and size of the text, graphics, and images that appear on a page. This section describes each of the coordinate systems used in the SDK, how they are related, and how transformations among them are specified.

Coordinate spaces

Paths and positions are defined in terms of pairs of coordinates on the Cartesian plane. A coordinate pair is a pair of real numbers x and y that locate a point horizontally and vertically within a two-dimensional coordinate space. A coordinate space is determined by the following properties with respect to the current page:

  • The location of the origin

  • The orientation of the x and y axes

  • The lengths of the units along each axis

The SDK defines several coordinate spaces in which the coordinates specifying graphics objects are interpreted. The following sections describe these spaces and the relationships among them.

Device space

The contents of a page ultimately appear on a raster output device such as a display or a printer.

The coordinate system for the PdfPage.Render and PdfPage.StartProgressiveRender methods is based on device coordinates, and the basic unit of measure when rendering is the device unit (typically the pixel, and always the pixel when rendering into a PdfBitmap). Points on the device canvas (PdfBitmap pixels) are described by x- and y-coordinate pairs, with x-coordinates increasing to the right and y-coordinates increasing from top to bottom.

User space

To avoid the device-dependent effects of specifying objects in device space, PDF defines a device-independent coordinate system that always bears the same relationship to the current page, regardless of the output device on which printing or displaying occurs. This device-independent coordinate system is called user space.

The PdfPage.CropBox property specifies the rectangle of user space corresponding to the visible area of the intended output medium (display window or printed page). The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the PdfPage.Rotation property).

The length of a unit along both the x and y axes is set by the UserUnit entry in the PdfPage.Dictionary, expressed as a multiple of 1⁄72 inch. If that entry is not present or not supported, the default value of 1⁄72 inch is used.
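
As a worked example of the unit size described above, the following sketch converts a length in user space units to inches and to device pixels. It is plain arithmetic and does not call the SDK. UserSpaceUnits and its members are hypothetical names introduced for illustration, and the userUnit parameter stands for the value of the UserUnit entry (1.0 when the entry is absent).

```csharp
using System;

// Hypothetical helper (not part of the SDK): pure arithmetic on user space units.
public static class UserSpaceUnits
{
    // With the default unit size of 1/72 inch, there are 72 units per inch.
    public const float DefaultUnitsPerInch = 72f;

    // Length in inches of `length` user space units.
    // userUnit is the UserUnit multiplier (1.0 when the entry is absent).
    public static float ToInches(float length, float userUnit = 1f)
    {
        return length * userUnit / DefaultUnitsPerInch;
    }

    // Number of device pixels that cover `length` user space units at the given DPI.
    public static int ToPixels(float length, float deviceDpi, float userUnit = 1f)
    {
        return (int)Math.Round(ToInches(length, userUnit) * deviceDpi);
    }
}
```

For instance, a page that is 612 units wide with the default unit size is 8.5 inches wide, which corresponds to 816 pixels on a 96 DPI display. With a UserUnit of 2, the same 612 units span 17 inches.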

Mapping from one type of coordinate to another

As mentioned above, the Rotation property can change the direction of the axes. For example, if you render like this:

C#
//...
page.Render(bitmap, 0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, RenderFlags.FPDF_NONE);
//...

The direction of the axes will be:

[Figure: axis directions of the device space and user space coordinate systems]

Occasionally, you may need to map between device space coordinates and user space coordinates. You can do this with the PageToDevice and DeviceToPage methods of the PdfPage class. For example:

C#
//...
page.Render(bitmap, 0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, RenderFlags.FPDF_NONE);

float pagePointX = 10.0f;
float pagePointY = 10.0f;
int devicePointX;
int devicePointY;
page.PageToDevice(0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, pagePointX, pagePointY, out devicePointX, out devicePointY);
//...
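
For intuition about what such a mapping involves, the sketch below works through the simplest case by hand: an unrotated page whose visible area starts at the user space origin and is rendered so that it exactly fills the bitmap. The mapping is then a scale plus a flip of the y axis. This is not the SDK's implementation. UnrotatedMapping is a hypothetical helper, and real code should call the SDK's PageToDevice and DeviceToPage methods, which also account for rotation.

```csharp
using System;

// Hypothetical helper (not part of the SDK). Assumes an unrotated page whose
// visible area has its lower-left corner at the user space origin and is
// rendered at (0, 0) so that it exactly fills the bitmap.
public static class UnrotatedMapping
{
    // pageWidth/pageHeight are in user space units, bitmapWidth/bitmapHeight in pixels.
    public static (int X, int Y) PageToDevice(
        float pageWidth, float pageHeight, int bitmapWidth, int bitmapHeight,
        float pageX, float pageY)
    {
        double scaleX = bitmapWidth / (double)pageWidth;
        double scaleY = bitmapHeight / (double)pageHeight;
        int deviceX = (int)Math.Round(pageX * scaleX);
        // Device y grows downward while user space y grows upward, so flip the axis.
        int deviceY = (int)Math.Round((pageHeight - pageY) * scaleY);
        return (deviceX, deviceY);
    }

    // Inverse of PageToDevice under the same assumptions.
    public static (float X, float Y) DeviceToPage(
        float pageWidth, float pageHeight, int bitmapWidth, int bitmapHeight,
        int deviceX, int deviceY)
    {
        double scaleX = bitmapWidth / (double)pageWidth;
        double scaleY = bitmapHeight / (double)pageHeight;
        return ((float)(deviceX / scaleX), (float)(pageHeight - deviceY / scaleY));
    }
}
```

Under these assumptions, the user space point (10, 10) on a 612 x 792 page rendered into a 612 x 792 bitmap lands at device point (10, 782): near the left edge, but near the bottom of the bitmap rather than the top.
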
How to get the actual page size

C#
//...

var page = doc.Pages[0];

//Number of user space units per inch: 72 by default, or 72 / UserUnit when the entry is present.
float pdfDpi = 72.0f;
if (page.Dictionary.ContainsKey("UserUnit"))
    pdfDpi = 72.0f / page.Dictionary["UserUnit"].As<PdfTypeNumber>().FloatValue;

//The number of dots per inch for a specific output device. For example, 96 pixels per inch for a monitor.
float deviceDpiX = 96.0f;
float deviceDpiY = 96.0f;

//The actual width and height will be
int width = (int)(page.Width / pdfDpi * deviceDpiX);
int height = (int)(page.Height / pdfDpi * deviceDpiY);

//...

How to create a PDF page and set the size

You can insert an empty page at any position in an existing PDF document. The following code snippet shows how.

C#
//...

//Add a new Letter-sized page to the beginning of the document.
int pageIndex = 0;
float width = 8.5f * 72;
float height = 11.0f * 72;
PdfPage page = doc.Pages.InsertPageAt(pageIndex, width, height);

//Add a new A4-sized page at the end of your document.
pageIndex = doc.Pages.Count;
width = 8.27f * 72;   //A4 is 210 x 297 mm, about 8.27 x 11.69 inches
height = 11.69f * 72;
page = doc.Pages.InsertPageAt(pageIndex, width, height);

//...
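
Page sizes passed to InsertPageAt are expressed in points (1⁄72 inch), so paper formats defined in millimeters, such as A4, have to be converted first. The sketch below shows the conversion. PaperSize is a hypothetical helper introduced for illustration and is not part of the SDK.

```csharp
using System;

// Hypothetical helper (not part of the SDK): converts paper dimensions to points.
public static class PaperSize
{
    private const float PointsPerInch = 72f;
    private const float MillimetersPerInch = 25.4f;

    // Converts a length in inches to points.
    public static float FromInches(float inches)
    {
        return inches * PointsPerInch;
    }

    // Converts a length in millimeters to points.
    public static float FromMillimeters(float millimeters)
    {
        return millimeters / MillimetersPerInch * PointsPerInch;
    }
}
```

With this helper, an A4 page could be appended with `doc.Pages.InsertPageAt(doc.Pages.Count, PaperSize.FromMillimeters(210f), PaperSize.FromMillimeters(297f))`, which gives a page of roughly 595.28 x 841.89 points.
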
How to delete a PDF page

C#
//...

// Remove a PDF page by page index.
doc.Pages.DeleteAt(pageIndex);

//...
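
When several pages have to be removed by index, the order of deletion matters if, as is usual for an indexed collection, removing a page shifts the indexes of the pages after it. Deleting from the highest index down avoids the problem. The sketch below only prepares the indexes. PageDeletion is a hypothetical helper, not part of the SDK, and the index-shifting behavior is an assumption rather than something stated above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the SDK).
public static class PageDeletion
{
    // Returns the distinct indexes sorted from highest to lowest, so that
    // deleting one page never changes the index of a page still to be deleted.
    public static List<int> InDeletionOrder(IEnumerable<int> pageIndexes)
    {
        return pageIndexes.Distinct().OrderByDescending(i => i).ToList();
    }
}
```

A loop such as `foreach (int i in PageDeletion.InDeletionOrder(new[] { 0, 2, 5 })) doc.Pages.DeleteAt(i);` would then remove pages 5, 2, and 0 in that order.
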
How to flatten a PDF page

C#
//...

//Regenerate the page content first to fix content issues that could cause problems when flattening.
//Not required but recommended.
page.GenerateContent();

//Flatten a PDF page
page.FlattenPage(FlattenFlags.NormalDisplay);

//...

Importing pages from an existing document

Patagames PDF SDK allows you to import a single page or a range of pages from one document into another. The following code sample illustrates how to import a range of pages from an existing document.

C#
//Load the PDF document.
using (var inputDoc = PdfDocument.Load(@"input.pdf"))
{
    //Create a new PDF document.
    using (var targetDoc = PdfDocument.CreateNew())
    {
        //Import all the pages to the new PDF document.
        targetDoc.Pages.ImportPages(
            inputDoc, 
            string.Format("1-{0}", inputDoc.Pages.Count), 
            0);
        //Save the document.
        targetDoc.Save(@"target.pdf", SaveFlags.NoIncremental);
    }
}
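
The range passed to ImportPages is a string of 1-based page numbers, as in "1-{0}" above. When the pages to import are known as 0-based indexes, the string has to be built first. The sketch below does that. PageRange is a hypothetical helper, not part of the SDK, and the comma-separated form it produces (for example "1,3,5-7") is an assumption based on the page range syntax of the underlying Pdfium engine; the examples on this page only show a single range and a single page number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the SDK).
public static class PageRange
{
    // Builds a 1-based range string such as "1,3,5-7" from 0-based page indexes.
    // Indexes are deduplicated and sorted; consecutive pages are merged into "a-b".
    public static string FromIndexes(IEnumerable<int> zeroBasedIndexes)
    {
        List<int> numbers = zeroBasedIndexes
            .Select(i => i + 1)
            .Distinct()
            .OrderBy(n => n)
            .ToList();

        var parts = new List<string>();
        int pos = 0;
        while (pos < numbers.Count)
        {
            int start = numbers[pos];
            int end = start;
            // Extend the run while the next page number is consecutive.
            while (pos + 1 < numbers.Count && numbers[pos + 1] == end + 1)
            {
                end = numbers[++pos];
            }
            parts.Add(start == end ? start.ToString() : start + "-" + end);
            pos++;
        }
        return string.Join(",", parts);
    }
}
```

The result could then be passed as the second argument of ImportPages, for example `targetDoc.Pages.ImportPages(inputDoc, PageRange.FromIndexes(new[] { 0, 2, 4, 5, 6 }), 0)`. Because the helper sorts the indexes, it cannot be used to reorder pages during import.
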
Splitting a PDF file into individual pages

The SDK allows you to split the pages of an existing PDF document into individual PDF documents. The following code snippet shows how.

C#
//Load the PDF document.
using (var sourceDoc = PdfDocument.Load(@"source.pdf"))
{
    foreach (var page in sourceDoc.Pages)
    {
        //Create a new PDF document.
        using (var targetDoc = PdfDocument.CreateNew())
        {
            //Import only the current page to the new PDF document.
            targetDoc.Pages.ImportPages(sourceDoc, $"{page.PageIndex + 1}", 0);
            //Save the document.
            targetDoc.Save($"target-page{page.PageIndex}.pdf", SaveFlags.NoIncremental);
            //Close page to reduce memory consumption when splitting documents with many pages.
            page.Dispose();
        }
    }
}