PDF Page

Overview

A PDF page is the basic building block of a PDF document. A PdfPage object is retrieved from a PDF document. Page-level APIs let you parse, render, and edit a page (including creating, deleting, and flattening it), retrieve its annotations, and read and set its properties.

Load and parse a PDF page

In most cases, a PDF page needs to be parsed before it is rendered or processed. When you get the page for the first time, it is neither loaded nor parsed. For example:

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    PdfPage page = doc.Pages[0];
    Console.WriteLine($"IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    //...
}

Running this code prints:

Output
IsLoaded = False, IsParsed = False

However, as soon as you call a method, or read or set a property, that requires the page to be loaded or parsed, the page is loaded and parsed automatically.

C#
//...
float width = page.Width;
Console.WriteLine($"IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
//...
Output
IsLoaded = True, IsParsed = True

Thus, in most cases you do not need to worry about loading or parsing.

Progressive page loading

For some pages, parsing can take a long time. In this case, you can use the progressive page loading technique. With progressive loading you can pause parsing, process user input or display progress, and then continue parsing.

The first step is to call the StartProgressiveLoad method.

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    PdfPage page = doc.Pages[0];

    page.StartProgressiveLoad();
    Console.WriteLine($"Start progressive loading: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    //...
}
Output
Start progressive loading: IsLoaded = True, IsParsed = False

While the page is loaded but not yet parsed, some properties and methods are available and others are not. For example, you can read the width or height of the page, but you cannot access the text of the page because the content has not yet been parsed.

C#
//...
float width = page.Width;
Console.WriteLine($"Width = {width}, IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");

//try to extract text length from the page
int len = page.Text.CountChars;
Console.WriteLine($"Text Length = {len}");
//...

Running this code prints:

Output
Width = 531, IsLoaded = True, IsParsed = False
Text Length = 0

In addition, accessing the properties of the PdfPage object no longer triggers automatic parsing. As you can see, IsParsed is still False.

After calling the StartProgressiveLoad method, you need to call the ContinueProgressiveLoad method until the page is parsed.

C#
while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
{
    Console.WriteLine($"Parsing...");
}
Console.WriteLine($"Parsing completed: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
Output
Parsing...
Parsing...
Parsing...
Parsing completed: IsLoaded = True, IsParsed = True

To control whether parsing is paused, handle the ProgressiveRender event and set the NeedPause property of its event arguments to true or false.

The complete example:

C#
using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
{
    PdfPage page = doc.Pages[0];
    page.ProgressiveRender += (s, e) =>
    {
        e.NeedPause = true; //or false, depending on your own parsing logic
    };
    page.StartProgressiveLoad();
    Console.WriteLine($"Start progressive loading: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");

    while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
    {
        Console.WriteLine($"Parsing...");
    }
    Console.WriteLine($"Parsing completed: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
}
Output
Start progressive loading: IsLoaded = True, IsParsed = False
Parsing...
Parsing...
Parsing...
Parsing completed: IsLoaded = True, IsParsed = True
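
One possible pausing policy is a time budget: let each parsing slice run for a fixed interval, then pause so the application can update its UI. The sketch below is only an illustration of that idea. The TimeBudget class is a hypothetical helper, not part of the SDK, and the 30 ms value in the usage comment is an arbitrary assumption.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the SDK): tracks how long the
// current parsing slice has been running.
public sealed class TimeBudget
{
    private readonly Stopwatch _watch = Stopwatch.StartNew();
    private readonly TimeSpan _limit;

    public TimeBudget(TimeSpan limit)
    {
        _limit = limit;
    }

    // True once the current slice has used up its budget.
    public bool IsExhausted => _watch.Elapsed >= _limit;

    // Starts a new slice; call it before each ContinueProgressiveLoad().
    public void Restart() => _watch.Restart();
}

// Usage with the calls shown above:
//   var budget = new TimeBudget(TimeSpan.FromMilliseconds(30));
//   page.ProgressiveRender += (s, e) => e.NeedPause = budget.IsExhausted;
//   page.StartProgressiveLoad();
//   do
//   {
//       // process user input or report progress here
//       budget.Restart();
//   }
//   while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued);
```

A shorter budget keeps the application more responsive at the cost of more pause and resume cycles.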
Render a PDF page

PDF rendering is performed by the Pdfium renderer, a graphics engine that renders a page to a bitmap or a platform graphics device. Patagames PDF SDK provides rendering options (flags) that control, for example, whether annotations are rendered and whether anti-aliasing is applied to images and paths.

How to render a page to a bitmap

C#
using Patagames.Pdf;
using Patagames.Pdf.Net;
using Patagames.Pdf.Enums;
using System.Drawing.Imaging;
//...
using (var doc = PdfDocument.Load("sample.pdf"))
{
    var page = doc.Pages[0];
    int width = (int)page.Width;
    int height = (int)page.Height;

    using (var bitmap = new PdfBitmap(width, height, true))
    {
        bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
        page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_NONE);

        bitmap.GetImage().Save("sample.png", ImageFormat.Png);
    }
}

How to render page and annotations

C#
using Patagames.Pdf;
using Patagames.Pdf.Net;
using Patagames.Pdf.Enums;
using System.Drawing.Imaging;
//...
using (var doc = PdfDocument.Load("sample.pdf"))
{
    var page = doc.Pages[0];
    int width = (int)page.Width;
    int height = (int)page.Height;

    using (var bitmap = new PdfBitmap(width, height, true))
    {
        bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
        page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_ANNOT);

        bitmap.GetImage().Save("sample.png", ImageFormat.Png);
    }
}
Progressive page rendering

Just as with progressive page loading, you may want to render a page progressively. Progressive page rendering works the same way as progressive page loading: you can pause rendering, process user input or display progress, and then continue.

The complete example:

C#
using (var doc = PdfDocument.Load(@"sample.pdf"))
{
    PdfPage page = doc.Pages[0];
    int width = (int)page.Width;
    int height = (int)page.Height;
    page.ProgressiveRender += (s, e) =>
    {
        e.NeedPause = true; //or false, depending on your own logic
    };

    using (var bitmap = new PdfBitmap(width, height, true))
    {
        bitmap.FillRect(0, 0, width, height, FS_COLOR.White);

        Console.WriteLine($"Start progressive render");
        ProgressiveStatus status = page.StartProgressiveRender(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_ANNOT, null);
        while (status == ProgressiveStatus.ToBeContinued)
        {
            Console.WriteLine($"Render in progress...");
            //Continue rendering (not loading) until the status changes.
            status = page.ContinueProgressiveRender();
        }
    }
    Console.WriteLine($"Render complete");
}
Output
Start progressive render
Render in progress...
Render in progress...
Render in progress...
Render complete
Page coordinate system

Coordinate systems define the canvas on which all painting occurs. They determine the position, orientation, and size of the text, graphics, and images that appear on a page. This section describes each of the coordinate systems used in the SDK, how they are related, and how transformations among them are specified.

Coordinate spaces

Paths and positions are defined in terms of pairs of coordinates on the Cartesian plane. A coordinate pair is a pair of real numbers x and y that locate a point horizontally and vertically within a two-dimensional coordinate space. A coordinate space is determined by the following properties with respect to the current page:

  • The location of the origin

  • The orientation of the x and y axes

  • The lengths of the units along each axis

The SDK defines several coordinate spaces in which the coordinates specifying graphics objects are interpreted. The following sections describe these spaces and the relationships among them.

Device space

The contents of a page ultimately appear on a raster output device such as a display or a printer.

The coordinate system for the Render or StartProgressiveRender methods is based on device coordinates, and the basic unit of measure when rendering is the device unit (typically, the pixel; and always pixels when rendering into PdfBitmap). Points on the device canvas (PdfBitmap pixels) are described by x- and y-coordinate pairs, with the x-coordinates increasing to the right and the y-coordinates increasing from top to bottom.

User space

To avoid the device-dependent effects of specifying objects in device space, PDF defines a device-independent coordinate system that always bears the same relationship to the current page, regardless of the output device on which printing or displaying occurs. This device-independent coordinate system is called user space.

The PdfPage.CropBox property specifies the rectangle of user space corresponding to the visible area of the intended output medium (display window or printed page). The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the PdfPage.Rotation property).

The length of a unit along both the x and y axes is set by the UserUnit entry in PdfPage.Dictionary. If that entry is not present or not supported, the default value of 1/72 inch is used.
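
As a small worked example of the unit rule above, user space lengths can be converted to physical lengths as follows. This is a sketch, not SDK code: the UserSpace class and its method names are hypothetical, and the userUnit parameter stands for the value of the UserUnit entry, which the PDF specification expresses in multiples of 1/72 inch (1.0 when the entry is absent).

```csharp
using System;

// Hypothetical helper (not part of the SDK).
public static class UserSpace
{
    // userUnit: value of the UserUnit entry in multiples of 1/72 inch;
    // pass 1.0 (the default) when the page does not define it.
    public static double ToInches(double userSpaceUnits, double userUnit = 1.0)
        => userSpaceUnits * userUnit / 72.0;

    public static double ToMillimeters(double userSpaceUnits, double userUnit = 1.0)
        => ToInches(userSpaceUnits, userUnit) * 25.4;
}

// Usage:
//   UserSpace.ToInches(612)        // 8.5 inches, the width of a Letter page
//   UserSpace.ToInches(612, 2.0)   // 17 inches when UserUnit is 2
```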

Mapping from one type of coordinate to another

As mentioned above, the Rotation property can change the direction of the axes. For example, if you render like this:

C#
//...
page.Render(bitmap, 0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, RenderFlags.FPDF_NONE);
//...

The direction of the axes will be:

[Figure: page coordinate system, showing the axes of device space and of user space]

Occasionally, you may need to map between user space coordinates and device space coordinates. You can do this with the PageToDevice and DeviceToPage methods of the PdfPage class. For example:

C#
//...
page.Render(bitmap, 0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, RenderFlags.FPDF_NONE);

float pagePointX = 10.0f;
float pagePointY = 10.0f;
int devicePointX;
int devicePointY;
page.PageToDevice(0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, pagePointX, pagePointY, out devicePointX, out devicePointY);
//...
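
For intuition, the unrotated case can be written as plain arithmetic: scale the page point to the render rectangle and flip the y axis. The sketch below is an illustration only, not the SDK's implementation. The CoordinateSketch class and its signatures are hypothetical, it assumes PageRotate.Normal and a page whose visible area starts at the user space origin, and its rounding may differ from what PageToDevice returns.

```csharp
using System;

// Hypothetical illustration (not part of the SDK).
public static class CoordinateSketch
{
    // Maps a user space point to a pixel inside the render rectangle
    // (startX, startY, sizeX, sizeY) for an unrotated page.
    public static (int X, int Y) PageToDevice(
        int startX, int startY, int sizeX, int sizeY,
        double pageWidth, double pageHeight, double pageX, double pageY)
    {
        double dx = startX + pageX * sizeX / pageWidth;
        // User space y grows upward, device y grows downward.
        double dy = startY + (pageHeight - pageY) * sizeY / pageHeight;
        return ((int)Math.Round(dx), (int)Math.Round(dy));
    }

    // Inverse mapping: from a pixel back to a user space point.
    public static (double X, double Y) DeviceToPage(
        int startX, int startY, int sizeX, int sizeY,
        double pageWidth, double pageHeight, int deviceX, int deviceY)
    {
        double px = (deviceX - startX) * pageWidth / sizeX;
        double py = pageHeight - (deviceY - startY) * pageHeight / sizeY;
        return (px, py);
    }
}

// Usage: on a 612 x 792 page rendered at 1:1, the page point (10, 10)
// near the bottom-left corner maps to the device point (10, 782).
```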
Page Boundaries

A PDF page may be prepared either for a finished medium, such as a sheet of paper, or as part of a prepress process in which the content of the page is placed on an intermediate medium, such as film or an imposed reproduction plate. In the latter case, it is important to distinguish between the intermediate page and the finished page. The intermediate page may often include additional production-related content, such as bleeds or printer marks, that falls outside the boundaries of the finished page. To handle such cases, a PDF page can define as many as five separate boundaries to control various aspects of the imaging process:

  • The media box defines the boundaries of the physical medium on which the page is to be printed. It may include any extended area surrounding the finished page for bleed, printing marks, or other such purposes. It may also include areas close to the edges of the medium that cannot be marked because of physical limitations of the output device. Content falling outside this boundary can safely be discarded without affecting the meaning of the PDF file.

  • The crop box defines the region to which the contents of the page are to be clipped (cropped) when displayed or printed. Unlike the other boxes, the crop box has no defined meaning in terms of physical page geometry or intended use; it merely imposes clipping on the page contents. However, in the absence of additional information, the crop box determines how the page’s contents are to be positioned on the output medium. The default value is the page’s media box.

  • The bleed box defines the region to which the contents of the page should be clipped when output in a production environment. This may include any extra bleed area needed to accommodate the physical limitations of cutting, folding, and trimming equipment. The actual printed page may include printing marks that fall outside the bleed box. The default value is the page’s crop box.

  • The trim box defines the intended dimensions of the finished page after trimming. It may be smaller than the media box to allow for production-related content, such as printing instructions, cut marks, or color bars. The default value is the page’s crop box.

  • The art box defines the extent of the page’s meaningful content (including potential white space) as intended by the page’s creator. The default value is the page’s crop box.

These boundaries are specified by the MediaBox, CropBox, BleedBox, TrimBox, and ArtBox properties, respectively. All of them are rectangles expressed in default user space units. The crop, bleed, trim, and art boxes should not ordinarily extend beyond the boundaries of the media box. If they do, they are effectively reduced to their intersection with the media box. The figure below illustrates the relationships among these boundaries. (The crop box is not shown in the figure because it has no defined relationship with any of the other boundaries.)

[Figure: page boundaries]
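
The default and clipping rules described above can be sketched in a few lines. This is an illustration under stated assumptions, not SDK code: the Box struct and the PageBoxes class are hypothetical, and a missing box is modeled as null.

```csharp
using System;

// Hypothetical rectangle in default user space units (not an SDK type).
public readonly struct Box
{
    public readonly double Left, Bottom, Right, Top;

    public Box(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public Box Intersect(Box other) => new Box(
        Math.Max(Left, other.Left), Math.Max(Bottom, other.Bottom),
        Math.Min(Right, other.Right), Math.Min(Top, other.Top));
}

public static class PageBoxes
{
    // The crop box defaults to the media box and is clipped to it.
    public static Box EffectiveCrop(Box media, Box? crop)
        => (crop ?? media).Intersect(media);

    // Bleed, trim, and art boxes default to the crop box
    // and are clipped to the media box.
    public static Box EffectiveOther(Box media, Box? crop, Box? box)
        => (box ?? EffectiveCrop(media, crop)).Intersect(media);
}

// Usage: with a 612 x 792 media box and no other boxes defined,
// every effective box equals the media box.
```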
How to get the actual page size
C#
//...

var page = doc.Pages[0];

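//Number of user space units per inch: 72 by default.
//UserUnit is given in multiples of 1/72 inch, so a page that defines it has 72 / UserUnit units per inch.
float pdfDpi = 72.0f;
if (page.Dictionary.ContainsKey("UserUnit"))
    pdfDpi = 72 / page.Dictionary["UserUnit"].As<PdfTypeNumber>().FloatValue;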

//The number of dots per inch for a specific output device. For example, 96 pixels per inch for a monitor.
float deviceDpiX = 96.0f;
float deviceDpiY = 96.0f;

//The actual width and height will be
int width = (int)(page.Width / pdfDpi * deviceDpiX);
int height = (int)(page.Height / pdfDpi * deviceDpiY);

//...
How to create a PDF page and set the size

You can insert an empty page at any position in an existing PDF document. The following code snippet shows how.

C#
//...

//Add a new Letter-sized page to the beginning of the document.
int pageIndex = 0;
float width = 8.5f * 72;
float height = 11.0f * 72;
PdfPage page = doc.Pages.InsertPageAt(pageIndex, width, height);

//Add a new A4-sized page (210 x 297 mm) at the end of your document.
pageIndex = doc.Pages.Count;
width = 210f / 25.4f * 72;
height = 297f / 25.4f * 72;
page = doc.Pages.InsertPageAt(pageIndex, width, height);

//...
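
Since InsertPageAt takes the page size in user space units (1/72 inch by default), a small conversion helper keeps paper sizes readable. The sketch below is a convenience under that assumption. The PageSize class is hypothetical and not part of the SDK.

```csharp
using System;

// Hypothetical helper (not part of the SDK): converts physical
// lengths to default user space units (1/72 inch).
public static class PageSize
{
    public const double PointsPerInch = 72.0;
    public const double MillimetersPerInch = 25.4;

    public static float FromInches(double inches)
        => (float)(inches * PointsPerInch);

    public static float FromMillimeters(double millimeters)
        => (float)(millimeters / MillimetersPerInch * PointsPerInch);
}

// Usage with the call shown above (A5 is 148 x 210 mm):
//   doc.Pages.InsertPageAt(0, PageSize.FromMillimeters(148), PageSize.FromMillimeters(210));
```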
How to delete a PDF page
C#
//...

// Remove a PDF page by page index.
doc.Pages.DeleteAt(pageIndex);

//...
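
When several pages have to be removed by index, it is usually safest to delete from the highest index down, so that earlier deletions do not shift the positions of pages still to be removed. The sketch below assumes that page indexes shift down after a deletion, as in any index-based collection. The PageDeletion class is a hypothetical helper, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the SDK).
public static class PageDeletion
{
    // Returns the indexes in a safe deletion order:
    // duplicates removed, highest index first.
    public static List<int> DeletionOrder(IEnumerable<int> pageIndexes)
        => pageIndexes.Distinct().OrderByDescending(i => i).ToList();
}

// Usage with the call shown above:
//   foreach (int index in PageDeletion.DeletionOrder(new[] { 0, 2, 5 }))
//       doc.Pages.DeleteAt(index);
```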
How to flatten a PDF page
C#
//...

//Regenerate the page content to fix any page content issues that could lead to flatten issues.
//Not required but recommended.
page.GenerateContent();

//Flatten a PDF page
page.FlattenPage(FlattenFlags.NormalDisplay);

//...
Importing pages from an existing document

Patagames PDF SDK allows you to import a single page or a range of pages from one document into another. The following code sample shows how to import a range of pages from an existing document.

C#
//Load the PDF document.
using (var inputDoc = PdfDocument.Load(@"input.pdf"))
{
    //Create a new PDF document.
    using (var targetDoc = PdfDocument.CreateNew())
    {
        //Import all the pages to the new PDF document.
        targetDoc.Pages.ImportPages(
            inputDoc, 
            string.Format("1-{0}", inputDoc.Pages.Count), 
            0);
        //Save the document.
        targetDoc.Save(@"target.pdf", SaveFlags.NoIncremental);
    }
}
Splitting a PDF file to individual pages

The SDK allows you to split an existing PDF document into multiple documents of one page each. The following code snippet shows how.

C#
//Load the PDF document.
using (var sourceDoc = PdfDocument.Load(@"source.pdf"))
{
    foreach (var page in sourceDoc.Pages)
    {
        //Create a new PDF document.
        using (var targetDoc = PdfDocument.CreateNew())
        {
            //Import only the current page (page ranges are 1-based) into the new PDF document.
            targetDoc.Pages.ImportPages(sourceDoc, $"{page.PageIndex + 1}", 0);
            //Save the document.
            targetDoc.Save($"target-page{page.PageIndex}.pdf", SaveFlags.NoIncremental);
            //Close page to reduce memory consumption when splitting documents with many pages.
            page.Dispose();
        }
    }
}
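
The same approach extends to splitting a document into files of several pages each: build one range string per output file and pass it to ImportPages. The sketch below only builds the range strings, using the 1-based "first-last" and single-number forms seen in the examples above. The PageRanges class is a hypothetical helper, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the SDK).
public static class PageRanges
{
    // Yields 1-based, inclusive range strings such as "1-10", "11-20", "21-25".
    public static IEnumerable<string> Chunks(int pageCount, int pagesPerFile)
    {
        if (pagesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        for (int first = 1; first <= pageCount; first += pagesPerFile)
        {
            int last = Math.Min(first + pagesPerFile - 1, pageCount);
            yield return first == last ? $"{first}" : $"{first}-{last}";
        }
    }
}

// Usage with the calls shown above:
//   foreach (string range in PageRanges.Chunks(sourceDoc.Pages.Count, 10))
//       using (var targetDoc = PdfDocument.CreateNew())
//       {
//           targetDoc.Pages.ImportPages(sourceDoc, range, 0);
//           targetDoc.Save($"target-{range}.pdf", SaveFlags.NoIncremental);
//       }
```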