PDF Page
A PDF page is the basic building block of a PDF document. A PdfPage object is retrieved from a PDF document. Page-level APIs provide functions to parse, render, and edit a page (including creating, deleting, and flattening), retrieve PDF annotations, read and set the properties of a page, and so on.
In most cases, a PDF page needs to be parsed before it is rendered or processed. When you get the page for the first time, it is neither loaded nor parsed. For example:

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        PdfPage page = doc.Pages[0];
        Console.WriteLine($"IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
        //...
    }

Running this code prints:

    IsLoaded = False, IsParsed = False
But as soon as you call any method, or read or set any property that requires the page to be loaded/parsed, it will be done automatically.
    //...
    float width = page.Width;
    Console.WriteLine($"IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    //...

    IsLoaded = True, IsParsed = True
Thus, for most cases, you do not need to worry about loading/parsing.
For some pages, parsing can take a long time. In this case, you can use the progressive page loading technique. Progressive loading lets you pause parsing, process user input or display progress, and then continue parsing.

The first step is to call the StartProgressiveLoad() method.

    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        PdfPage page = doc.Pages[0];
        page.StartProgressiveLoad();
        Console.WriteLine($"Start progressive loading: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
        //...
    }

    Start progressive loading: IsLoaded = True, IsParsed = False
In the situation when the page is loaded but not yet parsed, some properties and methods are available, but some are not. For example, you can read the width or height of the page, but you cannot access the text of the page since the content has not yet been parsed.
    //...
    float width = page.Width;
    Console.WriteLine($"Width = {width}, IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    //try to extract text length from the page
    int len = page.Text.CountChars;
    Console.WriteLine($"Text Length = {len}");
    //...

The result of running is

    Width = 531, IsLoaded = True, IsParsed = False
    Text Length = 0
In addition, accessing the properties of a PdfPage object no longer triggers automatic parsing. As the output shows, IsParsed is still False.
After calling the StartProgressiveLoad method, you need to call the ContinueProgressiveLoad() method until the page is parsed.
    while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
    {
        Console.WriteLine($"Parsing...");
    }
    Console.WriteLine($"Parsing completed: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");

    Parsing...
    Parsing...
    Parsing...
    Parsing completed: IsLoaded = True, IsParsed = True
To indicate whether parsing should be interrupted, handle the ProgressiveRender event and set the NeedPause property to true or false.
The complete example:
    using (var doc = PdfDocument.Load(@"c:\sample.pdf"))
    {
        PdfPage page = doc.Pages[0];
        page.ProgressiveRender += (s, e) =>
        {
            e.NeedPause = true; //or false, depending on your own page parsing logic
        };
        page.StartProgressiveLoad();
        Console.WriteLine($"Start progressive loading: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
        while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
        {
            Console.WriteLine($"Parsing...");
        }
        Console.WriteLine($"Parsing completed: IsLoaded = {page.IsLoaded}, IsParsed = {page.IsParsed}");
    }

    Start progressive loading: IsLoaded = True, IsParsed = False
    Parsing...
    Parsing...
    Parsing...
    Parsing completed: IsLoaded = True, IsParsed = True
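The complete example sets NeedPause to true unconditionally. One possible "own logic" is a time slice: pause only once a fixed amount of time has elapsed. The sketch below is an illustration, not part of the SDK. TimeBudget is a hypothetical helper, and the wiring shown in the trailing comments uses only the members that appear in this topic.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not an SDK type): reports when a time slice is used up.
public sealed class TimeBudget
{
    private readonly TimeSpan _slice;
    private readonly Stopwatch _watch = Stopwatch.StartNew();

    public TimeBudget(TimeSpan slice)
    {
        _slice = slice;
    }

    // True once the current slice has been consumed.
    public bool IsExhausted => _watch.Elapsed >= _slice;

    // Starts a new slice, for example after the UI has been refreshed.
    public void Restart() => _watch.Restart();
}

// Intended wiring (assumes the page object from the example above):
//   var budget = new TimeBudget(TimeSpan.FromMilliseconds(50));
//   page.ProgressiveRender += (s, e) => { e.NeedPause = budget.IsExhausted; };
//   page.StartProgressiveLoad();
//   while (page.ContinueProgressiveLoad() == ProgressiveStatus.ToBeContinued)
//   {
//       // display progress or process user input here
//       budget.Restart();
//   }
```

The 50 ms slice is an arbitrary value chosen for illustration; a shorter slice keeps the application more responsive at the cost of more pause and resume cycles.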
PDF rendering is performed by the Pdfium renderer, a graphics engine that renders a page to a bitmap or a platform graphics device. Patagames PDF SDK provides APIs to set rendering options and flags, for example, flags that control whether annotations are rendered and whether anti-aliasing is applied to images and paths.

    using Patagames.Pdf;
    using Patagames.Pdf.Net;
    using Patagames.Pdf.Enums;
    using System.Drawing.Imaging;
    //...
    using (var doc = PdfDocument.Load("sample.pdf"))
    {
        var page = doc.Pages[0];
        int width = (int)page.Width;
        int height = (int)page.Height;
        using (var bitmap = new PdfBitmap(width, height, true))
        {
            bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
            //RenderFlags.FPDF_NONE: no additional flags, annotations are not rendered
            page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_NONE);
            bitmap.GetImage().Save("sample.png", ImageFormat.Png);
        }
    }

    using Patagames.Pdf;
    using Patagames.Pdf.Net;
    using Patagames.Pdf.Enums;
    using System.Drawing.Imaging;
    //...
    using (var doc = PdfDocument.Load("sample.pdf"))
    {
        var page = doc.Pages[0];
        int width = (int)page.Width;
        int height = (int)page.Height;
        using (var bitmap = new PdfBitmap(width, height, true))
        {
            bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
            //RenderFlags.FPDF_ANNOT: render the page together with its annotations
            page.Render(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_ANNOT);
            bitmap.GetImage().Save("sample.png", ImageFormat.Png);
        }
    }
Just like with progressive page loading, you may want to use the progressive rendering technique. Progressive page rendering works exactly the same as progressive page loading. You can pause rendering, process user input, or display progress and then continue.
The complete example:
    using (var doc = PdfDocument.Load(@"sample.pdf"))
    {
        PdfPage page = doc.Pages[0];
        int width = (int)page.Width;
        int height = (int)page.Height;
        page.ProgressiveRender += (s, e) =>
        {
            e.NeedPause = true; //or false, depending on your own logic
        };
        using (var bitmap = new PdfBitmap(width, height, true))
        {
            bitmap.FillRect(0, 0, width, height, FS_COLOR.White);
            Console.WriteLine($"Start progressive render");
            ProgressiveStatus status = page.StartProgressiveRender(bitmap, 0, 0, width, height, PageRotate.Normal, RenderFlags.FPDF_ANNOT, null);
            while (status == ProgressiveStatus.ToBeContinued)
            {
                Console.WriteLine($"Render in progress...");
                //continue rendering (not loading) until the page is complete
                status = page.ContinueProgressiveRender();
            }
        }
        Console.WriteLine($"Render complete");
    }

    Start progressive render
    Render in progress...
    Render in progress...
    Render in progress...
    Render complete
Coordinate systems define the canvas on which all painting occurs. They determine the position, orientation, and size of the text, graphics, and images that appear on a page. This section describes each of the coordinate systems used in the SDK, how they are related, and how transformations among them are specified.
Paths and positions are defined in terms of pairs of coordinates on the Cartesian plane. A coordinate pair is a pair of real numbers x and y that locate a point horizontally and vertically within a two-dimensional coordinate space. A coordinate space is determined by the following properties with respect to the current page:
The location of the origin
The orientation of the x and y axes
The lengths of the units along each axis
The SDK defines several coordinate spaces in which the coordinates specifying graphics objects are interpreted. The following sections describe these spaces and the relationships among them.
The contents of a page ultimately appear on a raster output device such as a display or a printer.
The coordinate system for the Render or StartProgressiveRender methods is based on device coordinates, and the basic unit of measure when rendering is the device unit (typically, the pixel; and always pixels when rendering into PdfBitmap). Points on the device canvas (PdfBitmap pixels) are described by x- and y-coordinate pairs, with the x-coordinates increasing to the right and the y-coordinates increasing from top to bottom.
To avoid the device-dependent effects of specifying objects in device space, PDF defines a device-independent coordinate system that always bears the same relationship to the current page, regardless of the output device on which printing or displaying occurs. This device-independent coordinate system is called user space.
The PdfPage.CropBox property specifies the rectangle of user space corresponding to the visible area of the intended output medium (display window or printed page). The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the PdfPage.Rotation property).
The length of a unit along both the x and y axes is set by the UserUnit entry in the PdfPage.Dictionary, expressed as a multiple of 1⁄72 inch. If that entry is not present or not supported, the default unit size of 1⁄72 inch is used.
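The unit arithmetic can be sketched as follows. This is a minimal illustration that does not call the SDK; UserSpaceUnits and ToInches are hypothetical names, and the userUnit argument stands for the value of the UserUnit entry (1.0 when the entry is absent).

```csharp
using System;

// Hypothetical helper (not an SDK type) for user space unit arithmetic.
public static class UserSpaceUnits
{
    // One default user space unit is 1/72 inch; userUnit scales that size.
    public static double ToInches(double units, double userUnit = 1.0)
    {
        if (userUnit <= 0)
            throw new ArgumentOutOfRangeException(nameof(userUnit));
        return units * userUnit / 72.0;
    }
}
```

For example, UserSpaceUnits.ToInches(612) gives 8.5 (the width of a US Letter page), while UserSpaceUnits.ToInches(612, 2.0) gives 17, because each unit is then 2⁄72 inch long.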
As mentioned above, the Rotation property can change the direction of the axes. For example, if you render like this:
    //...
    page.Render(bitmap, 0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, RenderFlags.FPDF_NONE);
    //...
The direction of the axes will be:
[Figures: Device Space Coordinate System; User Space Coordinate System]
Occasionally, you may need to map from device space coordinates to user space coordinates. You can easily accomplish this by using the PageToDevice and DeviceToPage methods available in the PdfPage class. For example:
    //...
    page.Render(bitmap, 0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, RenderFlags.FPDF_NONE);

    float pagePointX = 10.0f;
    float pagePointY = 10.0f;
    int devicePointX;
    int devicePointY;
    page.PageToDevice(0, 0, (int)page.Height, (int)page.Width, PageRotate.Rotate270, pagePointX, pagePointY, out devicePointX, out devicePointY);
    //...
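To make the relationship between the two spaces concrete, here is a simplified model of the mapping for the unrotated case only. It is a sketch under stated assumptions, not the SDK's implementation: the page is not rotated, the visible box starts at the user space origin, and the page is rendered into the device rectangle (startX, startY, sizeX, sizeY). SimpleMapping and MapUnrotated are hypothetical names.

```csharp
using System;

// Hypothetical helper (not an SDK type): user space to device space, unrotated page.
public static class SimpleMapping
{
    public static (int X, int Y) MapUnrotated(
        double pageWidth, double pageHeight,
        int startX, int startY, int sizeX, int sizeY,
        double pageX, double pageY)
    {
        // Scale x from user units to device units and shift by the render origin.
        double dx = startX + pageX * sizeX / pageWidth;

        // User space y grows upward while device y grows downward, so flip the axis.
        double dy = startY + (pageHeight - pageY) * sizeY / pageHeight;

        return ((int)Math.Round(dx), (int)Math.Round(dy));
    }
}
```

With a 612 x 792 page rendered at the same size, the user space point (10, 10) near the bottom-left corner maps to the device point (10, 782) near the bottom of the bitmap. Rotation and a non-zero box origin change the result, which is why the PdfPage methods take the same rectangle and rotation arguments as Render.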
A PDF page may be prepared either for a finished medium, such as a sheet of paper, or as part of a prepress process in which the content of the page is placed on an intermediate medium, such as film or an imposed reproduction plate. In the latter case, it is important to distinguish between the intermediate page and the finished page. The intermediate page may often include additional production-related content, such as bleeds or printer marks, that falls outside the boundaries of the finished page. To handle such cases, a PDF page can define as many as five separate boundaries to control various aspects of the imaging process:
The media box defines the boundaries of the physical medium on which the page is to be printed. It may include any extended area surrounding the finished page for bleed, printing marks, or other such purposes. It may also include areas close to the edges of the medium that cannot be marked because of physical limitations of the output device. Content falling outside this boundary can safely be discarded without affecting the meaning of the PDF file.
The crop box defines the region to which the contents of the page are to be clipped (cropped) when displayed or printed. Unlike the other boxes, the crop box has no defined meaning in terms of physical page geometry or intended use; it merely imposes clipping on the page contents. However, in the absence of additional information, the crop box determines how the page’s contents are to be positioned on the output medium. The default value is the page’s media box.
The bleed box defines the region to which the contents of the page should be clipped when output in a production environment. This may include any extra bleed area needed to accommodate the physical limitations of cutting, folding, and trimming equipment. The actual printed page may include printing marks that fall outside the bleed box. The default value is the page’s crop box.
The trim box defines the intended dimensions of the finished page after trimming. It may be smaller than the media box to allow for production-related content, such as printing instructions, cut marks, or color bars. The default value is the page’s crop box.
The art box defines the extent of the page’s meaningful content (including potential white space) as intended by the page’s creator. The default value is the page’s crop box.
These boundaries are specified by the MediaBox, CropBox, BleedBox, TrimBox, and ArtBox properties, respectively. All of them are rectangles expressed in default user space units. The crop, bleed, trim, and art boxes should not ordinarily extend beyond the boundaries of the media box. If they do, they are effectively reduced to their intersection with the media box. The below figure illustrates the relationships among these boundaries. (The crop box is not shown in the figure because it has no defined relationship with any of the other boundaries.)
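The default and clipping rules described above can be sketched without the SDK. Box and PageBoxes below are hypothetical types introduced for illustration; they only model the rules stated in this section (the crop box defaults to the media box, the other boxes default to the crop box, and every box is reduced to its intersection with the media box).

```csharp
using System;

// Hypothetical rectangle in user space units (not an SDK type).
public readonly struct Box
{
    public readonly double Left, Bottom, Right, Top;

    public Box(double left, double bottom, double right, double top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    // Reduces this box to its intersection with the media box.
    public Box ClampTo(Box media) => new Box(
        Math.Max(Left, media.Left), Math.Max(Bottom, media.Bottom),
        Math.Min(Right, media.Right), Math.Min(Top, media.Top));
}

public static class PageBoxes
{
    // Crop box: defaults to the media box when it is not specified.
    public static Box EffectiveCrop(Box media, Box? crop)
        => (crop ?? media).ClampTo(media);

    // Bleed, trim, and art boxes: default to the crop box when not specified.
    public static Box EffectiveOther(Box media, Box? crop, Box? box)
        => (box ?? EffectiveCrop(media, crop)).ClampTo(media);
}
```

For a media box of (0, 0, 612, 792), a crop box of (-10, -10, 700, 500) is reduced to (0, 0, 612, 500). The sketch does not handle a box that lies entirely outside the media box; in that case the result has Right less than Left or Top less than Bottom.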
    //...
    var page = doc.Pages[0];

    //Number of user space units per inch.
    //UserUnit gives the unit size in multiples of 1/72 inch, so the default is 72 units per inch.
    float pdfDpi = 72.0f;
    if (page.Dictionary.ContainsKey("UserUnit"))
        pdfDpi = 72.0f / page.Dictionary["UserUnit"].As<PdfTypeNumber>().FloatValue;

    //The number of dots per inch for a specific output device. For example, 96 pixels per inch for a monitor.
    float deviceDpiX = 96.0f;
    float deviceDpiY = 96.0f;

    //The actual width and height in device units will be
    int width = (int)(page.Width / pdfDpi * deviceDpiX);
    int height = (int)(page.Height / pdfDpi * deviceDpiY);
    //...
You can insert an empty page at any location in an existing PDF document. The code snippet below demonstrates this.

    //...
    //Add a new Letter-sized page to the beginning of the document.
    int pageIndex = 0;
    float width = 8.5f * 72;
    float height = 11.0f * 72;
    PdfPage page = doc.Pages.InsertPageAt(pageIndex, width, height);

    //Add a new A4-sized page at the end of your document.
    pageIndex = doc.Pages.Count;
    width = 8.3f * 72;
    height = 11.7f * 72;
    page = doc.Pages.InsertPageAt(pageIndex, width, height);
    //...
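The inch values used for A4 in the snippet above are rounded. Since A4 is defined in millimeters (210 mm x 297 mm), a small conversion helper gives a closer page size. This is a sketch that assumes the default unit size of 1⁄72 inch; PageSize and MillimetersToUnits are hypothetical names, not SDK members.

```csharp
using System;

// Hypothetical helper (not an SDK type): millimeters to default user space units.
public static class PageSize
{
    // 1 inch = 25.4 mm, and with the default unit size 1 inch = 72 units.
    public static float MillimetersToUnits(double mm)
    {
        return (float)(mm / 25.4 * 72.0);
    }
}
```

PageSize.MillimetersToUnits(210) is approximately 595.28 and PageSize.MillimetersToUnits(297) is approximately 841.89; these values could be passed as the width and height arguments of InsertPageAt in place of the rounded ones.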
    //...
    //Remove a PDF page by page index.
    doc.Pages.DeleteAt(pageIndex);
    //...
    //...
    //Regenerate the page content to fix any page content issues that could lead to flatten issues.
    //Not required but recommended.
    page.GenerateContent();

    //Flatten a PDF page
    page.FlattenPage(FlattenFlags.NormalDisplay);
    //...
Patagames PDF SDK allows you to import a page or import a range of pages from one document to the other. The following code sample illustrates how to import a range of pages from an existing document.
    //Load the source PDF document.
    using (var inputDoc = PdfDocument.Load(@"input.pdf"))
    {
        //Create a new PDF document.
        using (var targetDoc = PdfDocument.CreateNew())
        {
            //Import all the pages to the new PDF document.
            targetDoc.Pages.ImportPages(
                inputDoc,
                string.Format("1-{0}", inputDoc.Pages.Count),
                0);

            //Save the document.
            targetDoc.Save(@"target.pdf", SaveFlags.NoIncremental);
        }
    }
The SDK allows you to split the pages of an existing PDF document into multiple individual PDF documents. The following code snippet demonstrates this.

    //Load the PDF document.
    using (var sourceDoc = PdfDocument.Load(@"source.pdf"))
    {
        foreach (var page in sourceDoc.Pages)
        {
            //Create a new PDF document.
            using (var targetDoc = PdfDocument.CreateNew())
            {
                //Import the current page (page numbers in the range string are 1-based) to the new PDF document.
                targetDoc.Pages.ImportPages(sourceDoc, $"{page.PageIndex + 1}", 0);

                //Save the document.
                targetDoc.Save($"target-page{page.PageIndex}.pdf", SaveFlags.NoIncremental);

                //Close page to reduce memory consumption when splitting documents with many pages.
                page.Dispose();
            }
        }
    }