PDFNet Intro
PDFNet is a high-quality, industry-strength PDF
library meeting requirements of the most demanding and diverse applications.
Using PDFNet you can write stand-alone, cross-platform
and reliable commercial applications able to read, write, and edit
PDF documents.
PDFNet is offered on a wide range of platforms
(Windows, Mac OS X, Linux, Android, iOS, WinRT, Windows Phone, etc.) and programming
environments (C/C++, C#, VB.Net, Java, Python, Ruby, PHP, and Objective-C).
The PDFNet API is divided into multiple namespaces.
The most important are the PDF, SDF, and Filters namespaces.
Figure 1. PDFNet API Modules.
The PDF API is a simple, high-level
API for manipulating PDF constructs such as pages, interactive forms,
bookmarks, graphical elements on the page, and so on.
The SDF API is a powerful, low-level
API for manipulating every aspect of a PDF document.
To use the SDF
API, you must be familiar with PDF file structure (documented in the PDF Reference Manual). Using the SDF
API, it is possible to implement functionality not present in the PDF API.
The Filters API deals with various
compression and encryption schemes used in PDF documents. Unless
you plan to implement a custom encryption or compression scheme in PDF
documents, you need only very basic knowledge of the Filters API.
In this section, we present the basic structure of a PDF document.
For details, please refer to the PDF Reference Manual. Below is a listing of a very simple PDF document.
It displays a "Hello World" string on a single page.
%PDF-1.4
1 0 obj <<
/Type /Page
/Parent 5 0 R
/Resources 3 0 R
/Contents 2 0 R
>>
endobj
2 0 obj <<
/Length 51
>>
stream
BT
1 0 0 1 260 330 Tm
(Hello World)Tj
ET
endstream
endobj
3 0 obj <<
/ProcSet [/PDF /Text]
/Font << /F1 4 0 R >>
>>
endobj
4 0 obj <<
/Type /Font
/Subtype /Type1
/BaseFont /Helvetica
>>
endobj
5 0 obj <<
/Type /Pages
/Kids [ 1 0 R ]
/MediaBox [0 0 612 714]
>>
endobj
6 0 obj <<
/Type /Catalog
/Pages 5 0 R
>>
endobj
trailer <<
/Root 6 0 R
>>
A PDF document consists of four sections:
A one-line header identifying the version of the PDF specification
to which the file conforms (Line 0). In the above sample, the header
string is "%PDF-1.4". It identifies this file as a
PDF document adhering to the 1.4 specification.
A body containing the objects that make up the document contained
in the file (Lines 1-45). Our sample file shows 6 objects, each
beginning with "obj" and ending with "endobj".
Each object has its own number followed by a zero. The zero is the revision
level (also known as the generation number), which exists because PDF allows
updates to the file to be made without rewriting the entire file.
A cross-reference table containing information about the indirect
objects in the file (Lines 46-54). The cross-reference table in our
sample notes that it contains 7 entries: a dummy entry for object zero
and one for each of the 6 objects. The table maps each implicit object
index to the byte offset, measured from the beginning of the file, at
which the object is located. For example, Object 1 is
listed in the second entry (after the dummy entry), indicating that it begins at byte 9; Object
3 is listed in the fourth entry, indicating that it is located
at byte 204 in the file, and so on.
A trailer giving the location of the cross-reference table and
of certain special objects within the body of the file.
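As a rough illustration of how a cross-reference table maps object numbers to byte offsets, here is a minimal Python sketch (Python is used so that it runs with only the standard library; it does not use the PDFNet API). It parses classic cross-reference entries of the form "offset generation n|f". The function names are illustrative, and the offset shown for object 2 is a made-up placeholder, since only the offsets of objects 1 and 3 are stated above.

```python
def parse_xref_entry(line):
    """Parse one 'oooooooooo ggggg n|f' cross-reference entry.

    Returns (byte_offset, generation, in_use).
    """
    offset, generation, kind = line.split()
    if kind not in ("n", "f"):
        raise ValueError("entry type must be 'n' (in use) or 'f' (free)")
    return int(offset), int(generation), kind == "n"


def parse_xref_section(lines, first_object=0):
    """Map object numbers to parsed entries; numbering is implicit by position."""
    return {first_object + i: parse_xref_entry(line)
            for i, line in enumerate(lines)}


entries = [
    "0000000000 65535 f",  # dummy entry for object zero
    "0000000009 00000 n",  # object 1 begins at byte 9
    "0000000100 00000 n",  # object 2: placeholder offset, not from the sample
    "0000000204 00000 n",  # object 3 begins at byte 204
]
table = parse_xref_section(entries)
print(table[1], table[3])
```

Because the object number is implied by an entry's position in the section, a reader can seek straight to any object without scanning the file body.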
Note that objects refer to each other using a notation like
"5 0 R". The "R" stands for reference and
uses the two preceding numbers to name a specific object and revision.
Therefore, the file body consists of a collection of objects, each
object potentially referencing any or all objects, including itself.
This set of nodes and directed references constitutes a graph. We
could represent the "Hello World" sample file using the
following abstract graph representation.
Figure 2. Object Graph.
Each object in the graph is represented with an ellipse and each
object cross reference is represented with an arrow.
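The graph view can be made concrete with a small Python sketch (standard library only, not the PDFNet API). It extracts "n m R" references from the dictionaries of the "Hello World" sample and walks the graph from the Catalog. The dictionary text is abbreviated from the sample, and the names REF, OBJECTS, edges, and reachable are illustrative.

```python
import re

# An indirect reference is "<object number> <generation> R".
REF = re.compile(r"(\d+)\s+(\d+)\s+R\b")

# Abbreviated dictionaries of the six sample objects (stream data omitted).
OBJECTS = {
    1: "/Parent 5 0 R /Resources 3 0 R /Contents 2 0 R",
    2: "/Length 51",
    3: "/ProcSet [/PDF /Text] /Font << /F1 4 0 R >>",
    4: "/Type /Font /Subtype /Type1 /BaseFont /Helvetica",
    5: "/Type /Pages /Kids [ 1 0 R ] /MediaBox [0 0 612 714]",
    6: "/Type /Catalog /Pages 5 0 R",
}


def edges(body):
    """Return the object numbers referenced from one object's body."""
    return [int(number) for number, _generation in REF.findall(body)]


def reachable(start, objects):
    """Collect every object reachable from 'start' by following references."""
    seen, stack = set(), [start]
    while stack:
        number = stack.pop()
        if number in seen:
            continue  # cycles are expected, e.g. Page -> Pages -> Page
        seen.add(number)
        stack.extend(edges(objects[number]))
    return seen


print(sorted(reachable(6, OBJECTS)))
```

The visited set matters because the graph is cyclic: the page points to its parent, which points back to the page.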
Each PDF document must have a "Root" node. It must reference
a "Catalog" node, which must reference a "Pages"
node. The "Pages" node further branches and points to each of the pages
in the document. Note that a "Pages" node points to a
group of pages whereas a "Page" node represents a single
page.
The "Page" node references the page's "Contents"
and the page's "Resources". The resource dictionary, in turn,
references the "Fonts" used on the page. The resource dictionary
can reference many other resource types, including Color Spaces, Patterns,
Shadings, Images, Forms, and more. The page contents stream contains markup
operators used to draw the page.
Each PDF document uses this basic object structure to represent a PDF
document.
Before going into the details of the PDFNet SDF/COS object model, we should
review the basics. For a detailed description of the SDF syntax and
semantics, please refer to Chapter 3 (Syntax) of the PDF Reference Manual.
In PDF there are five atomic objects: numbers, boolean values, names, strings, and the null object.
Also, there are two compound objects: arrays and dictionaries.
Objects can be arbitrarily nested using the dictionary and array
compounding operations.
All of the objects listed above are "direct objects"
because they are not surrounded by "obj" and "endobj"
keywords. The body of the PDF document is actually made up of a sequence
of "indirect objects". An indirect object
is created by taking a single direct object (whether it be
atomic or compound) and enclosing it with the "n m obj" and
"endobj" keywords (where n and m are non-negative integers).
Note that, because indirect objects are numbered and can be
referenced by other objects, they can be shared; that is,
referenced by more than one other object.
However, since direct
objects are not numbered, they can't be shared.
In the above PDF example, the object "3 0 obj" is an indirect object
because the "obj" and "endobj" keywords wrap a dictionary
object containing two entries:
/ProcSet [/PDF /Text]
/Font << /F1 4 0 R >>
The "ProcSet" key is mapped to an array, which is a direct
object containing atomic direct objects. In a similar way, the
"Font" key is mapped to a direct dictionary. On the other
hand, "F1" in the inner dictionary is mapped to an indirect
object with object number 4 and generation number 0. Because
"F1" points to an indirect object, the same font
resource can be shared across many different pages.
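To make the difference between direct and indirect values tangible, the following Python sketch (standard library only, not the PDFNet API) serializes a small object tree into PDF syntax. Direct values are written inline, while an indirect value appears only as an "n m R" reference. The Name and Ref classes and the serialize function are hypothetical helpers written for this illustration.

```python
class Name:
    """A PDF name object such as /Helvetica."""
    def __init__(self, text):
        self.text = text


class Ref:
    """A reference to an indirect object: object number plus generation."""
    def __init__(self, number, generation=0):
        self.number = number
        self.generation = generation


def serialize(obj):
    """Write a value in PDF syntax; only Ref produces an 'n m R' token."""
    if isinstance(obj, Ref):
        return f"{obj.number} {obj.generation} R"
    if isinstance(obj, Name):
        return "/" + obj.text
    if isinstance(obj, bool):  # check bool before int: bool is an int subclass
        return "true" if obj else "false"
    if isinstance(obj, (int, float)):
        return str(obj)
    if isinstance(obj, list):
        return "[" + " ".join(serialize(item) for item in obj) + "]"
    if isinstance(obj, dict):
        inner = " ".join(f"/{key} {serialize(value)}" for key, value in obj.items())
        return f"<< {inner} >>"
    raise TypeError(f"unsupported value: {obj!r}")


# Two resource dictionaries sharing one font through the same reference.
font = Ref(4)
page1_resources = {"ProcSet": [Name("PDF"), Name("Text")], "Font": {"F1": font}}
page2_resources = {"Font": {"F1": font}}
print(serialize(page1_resources))
print(serialize(page2_resources))
```

Both dictionaries end up containing "4 0 R", so the font data itself is stored once, whereas the direct array and inner dictionary are written out in full wherever they occur.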
Real-life PDF documents are much more complex than the "Hello
World" sample from the previous section. Streams in a PDF document
can be compressed and encrypted, objects can form complex networks,
and, in PDF 1.5, parts of the object graph can be compressed and embedded
in so-called "object streams". All this makes manual editing of
PDF documents extremely difficult, if not impossible. The good news is that
PDFTron Systems released CosEdit, a graphical
utility for browsing and editing PDF documents at the object level,
offering unprecedented ease and control. PDFNet also provides a full
SDF/COS level API making it very easy to read, write, and edit PDF
and FDF at the atomic level. Furthermore, PDFNet also provides a
high-level API for reading, writing, and editing PDF documents at the
level of pages, bookmarks, graphical primitives, and so on.
SDF (Structured Document Format) and COS (Carousel Object System; Carousel was a codename for Acrobat 1.0)
are synonyms for the PDF low-level object model. SDF is the acronym
used in PDFNet, whereas COS is a legacy term used in the Acrobat SDK.
In many ways, SDF is to PDF what XML and DOM are to SVG
(Scalable Vector Graphics). The SDF/COS object system provides the
object type and file structure used in PDF documents. PDF documents
are graphs of SDF objects. SDF objects can represent document components
such as bookmarks, pages, fonts, annotations, and so on.
PDF is not the only document format built on top of SDF/COS. FDF
(Form Data Format) and PJTF (Portable Job Ticket Format) are also
built on top of SDF/COS.
The SDF layer deals directly with the data that is in a PDF document.
The data types are referred to as SDF objects. There are eight data
types found in PDF documents. They are arrays, dictionaries, numbers,
boolean values, names, strings, streams, and the null object. PDFNet
implements these objects as shown in the following graph:
Figure 3. SDF Obj Hierarchy.
In C#, all objects ultimately derive from the Object class.
Similarly, all SDF objects ultimately derive from the Obj class.
Following the Composite design pattern, Obj implements each method
found in its derived classes.
Thus you can invoke a member function of
any derived object through the base Obj interface.
This is illustrated
in the following code sample:
SDFDoc doc = new SDFDoc("in.pdf");
// Get the trailer
Obj trailer = doc.GetTrailer();
// Get the info dictionary.
Obj info = trailer.Get("Info").Value();
// Replace the Producer entry
info.PutString("Producer", "PDFNet");
// Create a custom inline dictionary within
// Info dictionary
Obj custom_dict = info.PutDict("My Direct Dict");
// Add some key/value pairs
custom_dict.PutNumber("My Number", 100);
Obj my_array = custom_dict.PutArray("My Array");
// Create a custom indirect array within Info dictionary
Obj custom_array = doc.CreateIndirectArray();
info.Put("My Indirect Array", custom_array);
// Create indirect link to root
custom_array.PushBack(trailer.Get("Root").Value());
// Save PDF
doc.Save("out.pdf", 0, "%PDF-1.4");
If a member function is not
supported on a given object (e.g. if you are invoking obj.GetName()
on a Bool object), an Exception will be thrown. Learn more about
PDFNet exception handling in the PDFNet documentation.
To find out type information at run time, use the obj.GetType()
or obj.Is<type>() methods (where <type> could be Array,
Number, Bool, Str, Dict, or Stream).
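The design described above, in which the base class declares every accessor and unsupported calls throw, can be sketched in a few lines of Python (standard library only). This is a toy model of the idea and not PDFNet's implementation; all class and method names below are hypothetical.

```python
class Obj:
    """Base interface: declares every accessor so callers never need a cast."""

    def is_name(self):
        return False

    def is_bool(self):
        return False

    def get_name(self):
        raise TypeError(f"{type(self).__name__} does not support get_name()")

    def get_bool(self):
        raise TypeError(f"{type(self).__name__} does not support get_bool()")


class Name(Obj):
    def __init__(self, value):
        self._value = value

    def is_name(self):
        return True

    def get_name(self):
        return self._value


class Bool(Obj):
    def __init__(self, value):
        self._value = value

    def is_bool(self):
        return True

    def get_bool(self):
        return self._value


def describe(obj):
    """Check the type first when it cannot be inferred from context."""
    if obj.is_name():
        return "name: " + obj.get_name()
    if obj.is_bool():
        return "bool: " + str(obj.get_bool())
    return "other"


print(describe(Name("Helvetica")))
print(describe(Bool(True)))
```

Calling get_name() on a Bool raises TypeError, mirroring the exception behavior described above; the is_* checks are needed only when the expected type is ambiguous.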
Usually, an object's type
can be inferred from the PDF/FDF specification. For example, when you
call doc.GetTrailer(), you can assume that the returned object is a
dictionary object because this is mandated by the PDF specification. If
an object is not a dictionary, calling a dictionary method on it throws
an exception.
These semantics are important for stylistic reasons:
since type casts and type checks are not required, you can
keep your code efficient and elegant.
In case there is an ambiguity in the
PDF/FDF specification, you can use the GetType() or Is<type>()
methods. As mentioned in the previous section, SDF objects can
be either direct or indirect.
Direct objects can be
created using Obj.Create<type>() methods. The following
example illustrates how to create direct number and direct name objects
inside Dict objects.
Note that the same approach will work for Array objects.
// You can create direct objects inside container objects.
doc.GetRoot().PutNumber("My number key", 100);
doc.GetRoot().PutDict("My dict key");
doc.GetRoot().PutName("My name key", "My name value");
New indirect objects can be created using
doc.CreateIndirect<type>() methods on an SDF document. The
following code shows how to create new Number and Dictionary indirect
objects:
Obj mynumber = doc.CreateIndirectNumber(100);
Obj mydict = doc.CreateIndirectDict();
PDFNet SDF provides many utility methods that can be
used to efficiently traverse an SDF object graph. Here is an example
of how to get to a document's page root:
Obj pages = doc.GetTrailer()
    .Get("Root").Value()
    .Get("Pages").Value();
Note that because the PDF specification
mandates that "Root" is always a
dictionary, we can directly reference the "Pages" object
by calling Get("key").
Note also that some so-called
"PDF" documents are corrupt, meaning that the
documents are not in compliance with the PDF specification. In
corrupt PDF documents, the "Root" may be missing or may not
be a dictionary object.
In these and similar cases, the PDFNet SDK
throws an exception.
To retrieve an object that may or may not be present in
a dictionary, use the dict.FindObj("key") method. For example:
Obj my_value = dict.FindObj("my_key");
if (my_value != null) {
    // ...use my_value...
} else {
    // "my_key" does not exist in dict
}
You can use DictIterator to traverse key-value pairs within
a dictionary:
for (DictIterator itr = dict.GetDictIterator();
     itr.HasNext();
     itr.Next())
{
    // itr.Key();
    // itr.Value();
}
To retrieve objects from an Array object, use array.GetAt(idx):
for (int i = 0; i < array.Size(); ++i) {
    Obj obj = array.GetAt(i);
}
In the previous section, we learned how to create indirect objects
by calling the SDFDoc.CreateIndirect<type>() methods.
Now, let's
look at how to create references to those indirect objects. The
following code shows how:
Obj shared_dict = doc.CreateIndirectDict();
shared_dict.PutName("My key", "My value");
Obj trailer_dict = doc.GetTrailer();
if (trailer_dict != null) {
    Obj info_dict = trailer_dict.Get("Info").Value();
    if (info_dict != null) {
        // Add indirect reference to 'shared_dict'.
        info_dict.Put("MyDict", shared_dict);
    }
    Obj root_dict = trailer_dict.Get("Root").Value();
    if (root_dict != null) {
        // Add a second indirect reference to 'shared_dict'.
        root_dict.Put("MyDict", shared_dict);
    }
}
So it's possible for multiple objects to refer to the same object.
We call such objects shared objects.
But shared objects
must always be indirect objects. If
you want to share an object, it must have been created using
SDFDoc.CreateIndirect<type>(), or you should test Obj.IsIndirect() to
make sure it's an indirect object.
Because the PDF document format disallows creating multiple links to
direct objects, PDFNet will throw an exception should you try to create
multiple links/references to a direct object. This is shown
in the following example:
Obj trailer_dict = mydoc.GetTrailer();
if (trailer_dict != null) {
    Obj info_dict = trailer_dict.Get("Info").Value();
    if (info_dict != null) {
        Obj direct_obj = info_dict.PutDict("Link1");
        Obj root_dict = trailer_dict.Get("Root").Value();
        if (root_dict != null) {
            // Attempt to create a second link to direct_obj.
            // This will copy the object. If you want to
            // share objects, create them using the
            // PDFDoc.CreateIndirect() methods.
            root_dict.Put("Link2", direct_obj);
        }
    }
}
In addition to the basic types of objects mentioned so far, PDF
also supports stream objects. A stream object is
essentially a dictionary with an attached binary stream. In PDFNet, all
methods that apply to dictionaries apply to streams as well.
In addition to the methods provided by Dict, streams provide an
interface used to access an associated data stream. Given a stream Obj,
you can use GetDecodedStream() to get decoded data or GetRawStream() to
get raw, undecoded data.
GetRawStreamLength() returns the length of
the raw data stream. This number is the same as the one stored under
the "Length" key in the stream dictionary.
PDFNet supports all compression and encryption schemes used in the
PDF format.
It provides transparent access to decoded stream data.
The following code decodes and extracts the contents of a given stream
to an external file:
Obj stream = ...
Filter dec_stm = stream.GetDecodedStream();
dec_stm.WriteToFile("out.bin", false);
For a more complete discussion of PDFNet Filters,
see the PDFNet Filters section.
Our overview of the SDF object model would not be complete without mentioning
SDFDoc. SDFDoc brings together document security, document utility
methods, and all SDF objects.
An SDF document can be created from scratch using a default constructor:
SDFDoc sdfdoc = new SDFDoc();
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();
An SDF document can also be created from an existing
file, such as an external PDF document:
SDFDoc sdfdoc = new SDFDoc("in.pdf");
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();
Or it can be created from a memory buffer or some other Filter/Stream:
MemoryFilter memory = ....
SDFDoc sdfdoc = new SDFDoc(memory);
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();
Finally, an SDF document can be accessed from a high-level
PDF document as follows:
PDFDoc pdfdoc = new PDFDoc("in.pdf");
pdfdoc.InitSecurityHandler();
SDFDoc sdfdoc = pdfdoc.GetSDFDoc();
sdfdoc.InitSecurityHandler();
Obj trailer = sdfdoc.GetTrailer();
Note that the examples above use sdfdoc.GetTrailer()
in order to access the document trailer, which is the starting SDF
object (root node) in every document. Following the trailer links, we
can visit all low-level objects in a document (e.g. all pages,
outlines, fonts, and so on).
SDFDoc also provides utility methods used to import objects and
object collections from one document to another. These methods can
be useful for copy operations between documents such as a high-level
page merge and document assembly.
One of the basic building blocks of a PDF document is an SDF stream object. For example, in a PDF document all page content, images,
embedded fonts, and files are represented using stream objects that
can be compressed and encrypted using various Filter chains. See the
"Stream Objects" and "Filters" chapters in the PDF Reference Manual for more details.
PDFNet supports an efficient and flexible architecture for processing
streams using Filter pipelines.
A Filter is an abstraction of a sequence of bytes, such as a file,
an input/output device, an inter-process communication pipe, or
a TCP/IP socket. A filter can also perform certain transformations
of input/output data (e.g. data compression/decompression, color
conversion, and so on).
PDFNet enables generic input from external files using the
MappedFile filter. Use MappedFile to open, read from, and close files
on a file system. For example:
MappedFile myfile = new MappedFile("in.jpg");
opens an external image file for reading. MappedFile
buffers input and output for better performance. Although it is
possible to read input data directly through the Filter interface
(MappedFile is a subclass of Filter), it is more convenient to attach
a FilterReader to the filter and then read data through the FilterReader
interface:
FilterReader reader = new FilterReader(myfile);
byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
int bytes;
while ((bytes = reader.Read(buffer)) != 0) { /* ...use the first 'bytes' bytes of buffer... */ }
Data associated with SDF stream objects can be accessed
using Stream.GetRawStream() or Stream.GetDecodedStream() methods.
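void Extract(Obj stream) {
    Filter dec_stm = stream.GetDecodedStream();
    FilterReader reader = new FilterReader(dec_stm);
    byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
    int bytes;
    // Each pass leaves 'bytes' decoded bytes at the start of buffer.
    while ((bytes = reader.Read(buffer)) != 0) { }
}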
Stream.GetRawStream() creates a Filter used to extract
raw data as it appears in a serialized SDF document (or a decrypted version
of the stream if the document is secured). Stream.GetDecodedStream()
creates a Filter pipeline and returns the last filter in the chain.
For example, a given stream may be compressed using JPEG (DCTDecode)
compression and encoded using ASCII85 into an ASCII stream. When
GetDecodedStream() is invoked on this SDF stream, it will return
the last filter in a chain composed of three filters (the file segment
input Filter, the ASCII85Decode Filter, and the DCTDecode Filter, in decoding order).
Data extracted from the returned Filter will be raw image data (i.e.
RGB byte triples).
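The idea of a decode chain can be tried out with Python's standard library (this sketch does not use the PDFNet API). Since the standard library has no JPEG decoder, zlib (Flate) compression stands in for DCTDecode here; that substitution, and the names encode_stream, decode_stream, and DECODE_CHAIN, are assumptions made for the illustration. The point is only the ordering: the encoding applied last must be undone first.

```python
import base64
import zlib


def encode_stream(raw):
    """Compress first, then ASCII85-encode the compressed bytes."""
    return base64.a85encode(zlib.compress(raw))


# Decoders in the order they must run: undo the outermost encoding first.
DECODE_CHAIN = [
    ("ASCII85Decode", base64.a85decode),
    ("FlateDecode", zlib.decompress),  # stand-in for DCTDecode
]


def decode_stream(encoded):
    """Pull data through every decoder; the last stage yields the raw bytes."""
    data = encoded
    for _name, decode in DECODE_CHAIN:
        data = decode(data)
    return data


raw = b"Hello World " * 4
encoded = encode_stream(raw)
print(encoded.isascii(), decode_stream(encoded) == raw)
```

Reversing the two entries in DECODE_CHAIN would fail, because the Flate decoder would be handed ASCII85 text instead of compressed bytes.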
It's possible to iterate through the Filter chain using the Filter.GetAttachedFilter()
method. For example, the following code prints out all the Filter
names in the filter chain.
Filter cur_flt = dec_stm;
while (cur_flt != null) {
    Console.WriteLine(cur_flt.GetName());
    cur_flt = cur_flt.GetAttachedFilter();
}
It's also possible to construct new filter chains, and to edit existing
ones, using the Filter.AttachFilter() method.
To write a filter to a file, simply use Filter.WriteToFile():
dec_stm.WriteToFile("out.bin", false);
After the output file filter/stream is opened, you
can output data using the FilterWriter class:
FilterWriter writer = new FilterWriter(myfile);
writer.WriteString("Hello World");
writer.Flush();
PDFNet provides full support for all common Filters used in PDF.
Although included Filters should cover all common use case scenarios,
advanced users may want to provide custom implementations for certain
filters (e.g. custom color conversion, or a new compression method).
PDFNet provides an open and expandable architecture for creation
of custom filters. To implement a custom Filter, derive a new class
from Filter base class and implement the required interface. A more
detailed guide for implementing custom Filters is available through
the PDFTron Systems developer program. Please contact PDFTron Systems
for more details.
PDF documents can be secured and encrypted using various encryption
schemes.
Control over document security in PDFNet is performed through
security handlers.
Security handlers perform user
authorization and set various permissions over PDF documents.
Although PDFNet offers an extension mechanism through which users can
register custom security handlers, it also provides a standard security
handler.
This built-in security handler is the Standard Security Handler
(StdSecurityHandler).
The Standard Security Handler supports two
passwords:
A user password that permits a user to open and read
a protected document only with whatever permissions the owner
chose.
An owner password that grants a document's owner
free rein over what permissions are granted to users.
An application can also create its own implementation of
SecurityHandler.
For example, a custom SecurityHandler could perform
user authorization requiring the presence of a hardware dongle or even
feedback from a biometric system.
A Security Handler is used when:
A document is opened. The security handler must determine
whether a user is authorized to open the file. It must also set up
the RC4 decryption key used to decrypt the file.
A document is saved. The security handler must set up the RC4
encryption key and write security information into the PDF
file's encryption dictionary.
A user tries to change a document's security settings.
Note
that the Standard Security Handler in PDFNet does not enforce a
document's permissions. For example, it is possible to edit a
document although document modification permission is not granted.
Therefore, it is up to the application to respect PDF
permissions.
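For readers unfamiliar with RC4, the stream cipher mentioned above, here is a compact Python sketch of the textbook algorithm (standard library only). It is included only to show that the same key both encrypts and decrypts, which is why the handler sets up one key for opening and one for saving. How a PDF security handler derives that key from passwords is not shown, and the function name rc4 is illustrative. RC4 is considered weak by modern standards and should not be used in new designs.

```python
def rc4(key, data):
    """Textbook RC4: returns data XORed with the keystream derived from key."""
    # Key-scheduling: permute the state array using the key bytes.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: one keystream byte per input byte.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))
```

Applying the function twice with the same key returns the original bytes, because encryption and decryption are the same XOR operation.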
The number of security handlers associated with a document changes
over time.
When the document is first opened, it isn't associated with
any security handlers.
When InitSecurityHandler (or
InitStdSecurityHandler) is called on the document, that security
handler is associated with the document.
And when SetSecurityHandler
is called on a document, that security handler is also associated with
the document, albeit in a pending state until the document
is saved.
Until the document is saved with the new security handler,
the old security handler rules the document's security.
A document may have both a current and a new security handler associated
with it. A PDF document is not fully loaded in memory and decrypted
when it is loaded. So to fully decrypt the document, even after
applying a new security handler, the original security handler is
still required.
PDFNet fully supports the reading of secured and encrypted PDF documents.
To test whether a document requires a password, check the return value of
PDFDoc.InitSecurityHandler():
// Open a potentially encrypted document
PDFDoc doc = new PDFDoc("in.pdf");
if (!doc.InitSecurityHandler()) {
    Console.WriteLine(
        "in.pdf requires a password.");
} else {
    Console.WriteLine(
        "in.pdf does not require a password.");
}
Because InitSecurityHandler() doesn't have any side effects on
documents that are not encrypted, you should always invoke this
method, or InitStdSecurityHandler(), after constructing a
PDFDoc.
If a document doesn't require authentication data (such as a
user password) in order to view its content, InitSecurityHandler() is
enough to work with encrypted documents.
If, on the other hand, the
document requires a password, InitStdSecurityHandler allows you to
provide one:
// Open a potentially encrypted document
PDFDoc doc = new PDFDoc("in.pdf");
if (doc.InitStdSecurityHandler("test")) {
    Console.WriteLine(
        "in.pdf's password is 'test'.");
} else {
    Console.WriteLine(
        "in.pdf's password is not 'test'.");
}
After the document's security handler is initialized, you can
access it using the doc.GetSecurityHandler() method. You can edit
permissions and authorization data on an existing handler, or set a
completely new security handler using the
doc.SetSecurityHandler(handler) method.
To remove PDF security, set the document's current SecurityHandler
to null:
PDFDoc doc = new PDFDoc("encrypted.pdf");
doc.InitSecurityHandler();
doc.SetSecurityHandler(null);
To secure a document, create a new SecurityHandler, set permission
and authentication data, and call doc.SetSecurityHandler(handler) to
set it as the new handler.
For example:
PDFDoc doc = new PDFDoc("in.pdf");
if (!doc.InitSecurityHandler()) {
    Console.WriteLine(
        "Document authentication error...");
}
StdSecurityHandler new_handler = new StdSecurityHandler();
// Set a user password required to open a document
string user_password = "test";
new_handler.ChangeUserPassword(user_password);
// Set Permissions
new_handler.SetPermission(
    SecurityHandler.Permission.e_print, true);
new_handler.SetPermission(
    SecurityHandler.Permission.e_extract_content, false);
// Associate the new_handler with the document.
doc.SetSecurityHandler(new_handler);
Besides providing full support for standard PDF security, PDFNet
allows users to work with custom security handlers and proprietary
encryption algorithms. To define a custom security handler, derive
a class from SecurityHandler and implement SecurityHandler's
interface.
Please see the
PDFNet Knowledge Base or contact PDFTron Systems for details.
High-level PDF constructs such as pages, interactive forms, a
page's graphical elements, or bookmarks are all implemented in a
namespace called PDF. PDF classes contain methods for copying pages
between documents, reading and writing graphical Elements (including
images, paths, and text), manipulating interactive forms, and more.
Although classes in the PDF namespace implement most commonly-used PDF
functionality, you can always access their underlying SDF objects and
thus leverage the full power of the low-level object model.
The PDFDoc constructor creates a PDF document from scratch:
PDFDoc new_doc = new PDFDoc();
A newly-created document does not yet contain any pages. See the Working with Pages section for details on
creating new pages and working with existing pages.
Using PDFNet, you can open a document from a serialized file, from
a memory buffer, or from a Filter stream.
To open an existing PDF document from a file, specify its file path
in the PDFDoc constructor:
PDFDoc mydoc = new PDFDoc("in.pdf");
Here's how to open an existing PDF document from a memory buffer:
FileStream stm = new FileStream("in.pdf",
    FileMode.Open, FileAccess.Read);
BinaryReader reader = new BinaryReader(stm);
byte[] buffer = reader.ReadBytes(
    (int) reader.BaseStream.Length);
reader.Close();
PDFDoc mydoc = new PDFDoc(buffer);
It's also easy to open a PDF document from a MemoryFilter or a custom Filter such as
HTTPFilter.
After creating a PDFDoc object, it's good practice to call
InitSecurityHandler() on it.
If the document is encrypted, calling the
method will decrypt it. If the document is not encrypted, calling the
method is harmless.
PDFDoc doc = new PDFDoc("in.pdf");
if (!doc.InitSecurityHandler()) {
    Console.WriteLine("Document authentication error...");
}
The PDFNet security API is explained
in detail in the section on document security.
A PDF document can be serialized (or saved) to a file on disk,
to a memory buffer, or to an arbitrary data stream such as MemoryFilter
or HTTPFilter.
To save a PDF document to a file on disk, invoke its Save() method:
doc.Save("out.pdf", 0);
The second argument is a bitwise disjunction of flags used as
options during serialization.
PDFNet allows a document to be saved incrementally (see section 2.2.7
"Incremental Update" in the PDF Reference Manual). Because applications may allow users to modify
PDF documents, users should not have to wait for the entire file
(which can contain hundreds of pages) to be rewritten each time
modifications to the document are saved. PDFNet allows modifications
to be appended to a file, leaving the original data intact. The
addendum appended when a file is incrementally updated contains
only those objects that were actually added or modified. Incremental
update allows an application to save modifications to a PDF document
in an amount of time proportional to the size of the modification
rather than the size of the file. In addition, because the original
contents of the document are still present in the file, it is possible
to undo saved changes by deleting one or more file updates.
Changes can be appended to an existing document using the e_incremental
flag:
doc.Save("in.pdf", PDFDoc.e_incremental);
Note that the file output name matches the input name.
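The cost and undo properties of incremental update can be illustrated with a toy append-only model in Python (standard library only, not the PDFNet API). A real incremental update also appends a new cross-reference section and trailer, which this sketch leaves out; the class and method names are hypothetical.

```python
class AppendOnlyFile:
    """Toy model of a file that is only ever appended to."""

    def __init__(self, original):
        self.data = bytearray(original)
        self.checkpoints = []  # file length before each update

    def save_incremental(self, changed_objects):
        """Append only the changed objects; return the number of bytes written."""
        self.checkpoints.append(len(self.data))
        self.data += changed_objects
        return len(changed_objects)

    def undo_last_update(self):
        """Drop the most recent update by truncating to the previous length."""
        self.data = self.data[: self.checkpoints.pop()]


original = b"A" * 1000           # stands in for a large existing document
pdf = AppendOnlyFile(original)
written = pdf.save_incremental(b"B" * 10)
print(written, len(pdf.data))    # work done scales with the change, not the file
pdf.undo_last_update()
print(bytes(pdf.data) == original)
```

The save touches only the appended bytes, and truncating the file at the recorded length restores the earlier revision exactly.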
Over time, PDF documents may accumulate unused objects such as old updates,
modifications, unused fonts, images, and so on.
To trim down the file
size by removing these unused objects, use the e_remove_unused
flag:
doc.Save("out.pdf", PDFDoc.e_remove_unused);
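Because the options argument is a bitwise disjunction of flags, several options can be packed into one integer. The Python sketch below (standard library only) shows the mechanism with enum.IntFlag. The member names echo the flags mentioned above, but the numeric values are hypothetical and are not taken from PDFNet; which combinations are meaningful is up to the library.

```python
from enum import IntFlag


class SaveOptions(IntFlag):
    # Numeric values are hypothetical; each flag occupies its own bit.
    NONE = 0
    E_INCREMENTAL = 1
    E_REMOVE_UNUSED = 2


def enabled(flags):
    """List the names of the individual options set in a combined value."""
    candidates = (SaveOptions.E_INCREMENTAL, SaveOptions.E_REMOVE_UNUSED)
    return [option.name for option in candidates if flags & option]


combined = SaveOptions.E_INCREMENTAL | SaveOptions.E_REMOVE_UNUSED
print(int(combined), enabled(combined))
print(enabled(SaveOptions.NONE))
```

Passing 0, as in the earlier Save() example, corresponds to no options being set.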
In order to provide user feedback, the PDFDoc.Save() method accepts
objects which inherit from the ProgressMonitor class. ProgressMonitor
provides a callback interface that keeps the client application up to
date about the Save function's progress.
A PDF document can also be serialized into a memory buffer as follows:
byte[] buf = doc.Save(0);
Document's page sequence
A high-level PDF document contains a sequence of Page objects, as
illustrated in the following figure:
Figure 4. PDFDoc Page sequence.
To find the number of pages in a PDF document, call PDFDoc.GetPageCount().
To retrieve a specific page of a document, use
PDFDoc.GetPage(page_num).
Page numbers in the document's page
sequence are indexed from 1.
If the given page number doesn't
index a page in the current document, GetPage(page_num) returns null.
For example:
Page page = doc.GetPage(page_num);
if (page != null) {
    Console.WriteLine(
        "Document does contain page#: {0}", page_num);
} else {
    Console.WriteLine(
        "Document does not contain page#: {0}", page_num);
}
While GetPage(i) is convenient for retrieving an individual page,
it's an inefficient way to enumerate every page of a document. It's
better to traverse the pages with a PageIterator.
To do so, simply call PDFDoc.GetPageIterator().
This returns a PageIterator
object, which provides HasNext(), Next() and Current() methods. The
following code snippet shows how to print the page size for every page
in the document's page sequence:
for (PageIterator itr = doc.GetPageIterator(); itr.HasNext(); itr.Next()) {
    Rect mediabox = itr.Current().GetMediaBox();
    Console.WriteLine("Media box: {0}, {1}, {2}, {3}",
        mediabox.x1, mediabox.y1,
        mediabox.x2, mediabox.y2);
}
(This code finds the page size using the page's media box, which
we'll talk more about in the following sections.)
To jump to a specific page with a PageIterator, call
PDFDoc.GetPageIterator(page_num).
If no such page exists,
PageIterator.GetPageNumber() returns 0.
For example:
PageIterator itr = doc.GetPageIterator(page_num);
if (itr.GetPageNumber() > 0) {
    Console.WriteLine(
        "Document does contain page#: {0}", page_num);
} else {
    Console.WriteLine(
        "Document does not contain page#: {0}", page_num);
}
To create a new page, use the PDFDoc.PageCreate(media_box) method.
PageCreate() takes an optional Rect argument that can be used to
specify page size.
This Rect is called a media box.
A media box is a rectangle, expressed in default user space
units, defining the boundaries of the physical medium on which the page
is intended to be displayed or printed. A user space unit is 1/72 of an
inch. If media_box is unspecified, the default dimensions of the page
are 8.5 x 11 inches (or 8.5*72, 11*72 units).
Page x = doc.PageCreate();
doc.PagePushBack(x);
The above code snippet creates a new 8.5x11 page and adds it at the
end of the document's page sequence.
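The unit arithmetic behind the media box is easy to check. The Python sketch below (standard library only, not the PDFNet API) converts between inches and default user space units at 72 units per inch; the helper names are illustrative.

```python
UNITS_PER_INCH = 72  # one default user space unit is 1/72 of an inch


def media_box(width_in, height_in):
    """Return (x1, y1, x2, y2) in user space units for a page size in inches."""
    return (0, 0, width_in * UNITS_PER_INCH, height_in * UNITS_PER_INCH)


def size_in_inches(box):
    """Return (width, height) in inches for a media box rectangle."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) / UNITS_PER_INCH, (y2 - y1) / UNITS_PER_INCH)


print(media_box(8.5, 11))                # the default page size described above
print(size_in_inches((0, 0, 612, 714)))  # the MediaBox from the sample document
```

So the default 8.5 x 11 inch page corresponds to a 612 x 792 unit rectangle, while the sample document's 612 x 714 box is 8.5 inches wide and a little under 10 inches tall.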
Note that, after the page is created, it does not yet belong to a
document's page sequence. The page needs to be placed within the
page sequence in order to become "visible".
PagePushBack()
appends page x to the end of the document's page sequence.
The recommended way to copy pages from one document to another is
with PDFDoc.InsertPages(). Its arguments are:
insertBeforeThisPage: An integer specifying where the
pages should be inserted
sourceDoc: A PDFDoc from which the pages should be copied
startPage: An integer specifying the first page number to copy
endPage: An integer specifying the last page number to copy
flag: A PDFDoc.InsertFlag value (either
e_insert_bookmark, meaning bookmarks should be inserted, or e_none)
For example, suppose we want to insert the third page of one
document after the first page of a second document.
The following code
snippet performs this with an insertBeforeThisPage value of 2
and startPage and endPage values of 3.
Console.WriteLine(
    "dest_doc has {0} pages prior to calling InsertPages. ",
    dest_doc.GetPageCount());
dest_doc.InsertPages(2, source_doc, 3, 3, PDFDoc.InsertFlag.e_none);
Console.WriteLine(
    "dest_doc has {0} pages following its call to InsertPages. ",
    dest_doc.GetPageCount());
We can also insert a range of pages.
For example, the following
code will insert the second, third, and fourth pages of one document
at the end of the second document. We specify that we're
inserting at the end of the document by using an
insertBeforeThisPage value higher than the number of pages in
the document:
Console.WriteLine(
    "dest_doc has {0} pages prior to calling InsertPages. ",
    dest_doc.GetPageCount());
dest_doc.InsertPages(dest_doc.GetPageCount() + 1,
    source_doc, 2, 4, PDFDoc.InsertFlag.e_none);
Console.WriteLine(
    "dest_doc has {0} pages following its call to InsertPages. ",
    dest_doc.GetPageCount());
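To make the 1-based position arguments concrete, the sketch below replays both InsertPages calls on plain lists of page labels. It does not use PDFNet: InsertBefore is a hypothetical stand-in, and clamping positions past the end so that pages are appended is an assumption modeled on the description above.

```csharp
using System;
using System.Collections.Generic;

static class InsertModel
{
    // Hypothetical helper: applies the 1-based "insert before this page"
    // rule to a plain list. Positions past the end append (an assumption).
    public static void InsertBefore(List<string> dest, int insertBeforeThisPage,
                                    List<string> source, int startPage, int endPage)
    {
        int index = Math.Min(insertBeforeThisPage - 1, dest.Count);
        dest.InsertRange(index, source.GetRange(startPage - 1, endPage - startPage + 1));
    }

    static void Main()
    {
        var source = new List<string> { "S1", "S2", "S3", "S4" };
        var dest = new List<string> { "D1", "D2" };

        InsertBefore(dest, 2, source, 3, 3);               // after the first page
        Console.WriteLine(string.Join(" ", dest));         // D1 S3 D2

        InsertBefore(dest, dest.Count + 1, source, 2, 4);  // at the end
        Console.WriteLine(string.Join(" ", dest));         // D1 S3 D2 S2 S3 S4
    }
}
```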
A Page can also be copied from one document to another (or replicated
within an existing document) using the PDFDoc.PageInsert(where, pg),
PDFDoc.PagePushFront(pg), PDFDoc.PagePushBack(pg) and PDFDoc.ImportPages(list)
methods.
PagePushBack(page) appends the given Page at the end of the page sequence,
whereas PagePushFront(page) inserts the Page at the front of the
sequence. PageInsert(where, page) inserts the page in front of the
page currently pointed to by the where PageIterator.
// Add three copies of the page to the document:
// two at the end and one at the front.
doc.PagePushBack(x);
doc.PagePushBack(x);
doc.PagePushFront(x);
// Create a new page and insert it just before
// the second page
doc.PageInsert(doc.GetPageIterator(2), doc.PageCreate());
Note that it is possible to replicate a given page
within a document by repeatedly adding the same page.
The same methods can also be used to merge documents or copy pages
from one document to another.
In a PDF document, every page object contains references to images,
fonts, color spaces, and other objects required to render the page.
In order to accurately copy a page from one document to another, the
PageInsert / PagePushFront / PagePushBack methods must copy all
referenced resources.
If you are copying several pages between two documents, it's better
to use PDFDoc.ImportPages(page_list) because the resulting document
will be much smaller and the copy operation will be faster.
ImportPages() is better than the other methods for multi-page copies
because it preserves resource sharing in the target document. This
is illustrated in the following figures.
Figure 5. Copying pages between two documents using PageInsert/PagePushFront/PagePushBack
In a PDF document, page resources (such as fonts, images,
color spaces, or forms) can be shared across several pages.
Sharing these resources reduces file size and speeds up page processing. In
Figure 5 above, all three pages of 'Document 1' share the same
font and color space object. 'Document 2' was created by direct
page copy using the PageInsert, PagePushFront, or PagePushBack methods. Note
that each page now refers to its own separate instances of the resource
objects.
On the other hand, the result of page copy using ImportPages() is
identical to the original document. Note that in 'Document 2',
in Figure 6 below, resource objects are shared across pages.
Figure 6. Copying pages between two documents
using ImportPages()
Also note that, if pages are copied/replicated within the same
document (not between two different documents), all methods behave the
same and resources are always shared.
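The difference between Figures 5 and 6 can be imitated with a toy object model. The sketch below does not use PDFNet; Resource, CopyOneByOne, and CopyTogether are hypothetical names, and a "page" is reduced to a list of references to resource objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: a resource is identified by object reference.
class Resource
{
    public string Name;
    public Resource(string name) { Name = name; }
}

static class CopyModel
{
    // Copies each page on its own, so every page gets fresh resource copies.
    public static List<List<Resource>> CopyOneByOne(List<List<Resource>> pages) =>
        pages.Select(p => p.Select(r => new Resource(r.Name)).ToList()).ToList();

    // Copies all pages in one pass, reusing resources that were already copied.
    public static List<List<Resource>> CopyTogether(List<List<Resource>> pages)
    {
        var seen = new Dictionary<Resource, Resource>();
        return pages.Select(p => p.Select(r =>
        {
            if (!seen.TryGetValue(r, out var copy))
                seen[r] = copy = new Resource(r.Name);
            return copy;
        }).ToList()).ToList();
    }

    public static int DistinctResources(List<List<Resource>> pages) =>
        pages.SelectMany(p => p).Distinct().Count();

    static void Main()
    {
        var font = new Resource("font");
        var cs = new Resource("colorspace");
        // Three pages that share one font and one color space.
        var doc1 = Enumerable.Range(0, 3)
                             .Select(_ => new List<Resource> { font, cs }).ToList();

        Console.WriteLine(DistinctResources(CopyOneByOne(doc1)));  // 6
        Console.WriteLine(DistinctResources(CopyTogether(doc1)));  // 2
    }
}
```

Copying page by page triples the number of resource objects in this model, while the single-pass copy keeps the original two.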
The following code copies pages individually, as in Figure 5:
using (PDFDoc in_doc = new PDFDoc("in.pdf"))
{
    in_doc.InitSecurityHandler();
    using (PDFDoc new_doc = new PDFDoc())
    {
        for (PageIterator itr = in_doc.GetPageIterator();
             itr.HasNext(); itr.Next())
            new_doc.PagePushBack(itr.Current());
        // save new_doc...
    }
}
But, as explained above, it's better to import multiple pages
with PDFDoc.ImportPages(), as shown in Figure 6.
ImportPages(page_list) creates a copy of the pages given
in the argument list, while preserving shared resources. Note that the
pages in the returned list are ordered in the same way as the pages
in the argument list and that, although pages are copied, they are
not inserted into the document's page sequence. Therefore, in order
to be visible, imported or copied pages should be appended or inserted at
a specific location within the document's page sequence. For example:
using (PDFDoc in_doc = new PDFDoc("in.pdf"))
{
    in_doc.InitSecurityHandler();
    using (PDFDoc new_doc = new PDFDoc())
    {
        // Create a list of pages to copy.
        ArrayList copy_pages = new ArrayList();
        for (PageIterator itr = in_doc.GetPageIterator();
             itr.HasNext(); itr.Next())
            copy_pages.Add(itr.Current());
        // Import all the pages in the 'copy_pages' list.
        ArrayList imported_pages = new_doc.ImportPages(copy_pages);
        // Note that pages in the 'imported_pages' list are not yet placed in
        // the document's page sequence. This is done in the following step:
        for (int i = 0; i != imported_pages.Count; ++i)
            new_doc.PagePushBack((Page)imported_pages[i]);
        // save new_doc...
    }
}
Given a PageIterator itr pointing to a page, that page can be
deleted using PDFDoc.PageRemove(itr).
For example:
// Remove the fifth page from the page sequence.
doc.PageRemove(doc.GetPageIterator(5));
// Remove the third page.
PageIterator i = doc.GetPageIterator(3);
doc.PageRemove(i);
PDFDoc.PageRemove(itr) only removes the page from the document's
page sequence. The page and its resources are still available until the
document is saved in 'full save mode' with the 'remove
unused objects' flag.
If you are saving the file in
'incremental mode', the serialized document may contain the
content of the removed page.
Using the page insert and delete operations described in
previous sections, it is easy to rearrange and sort pages. For example,
the order of pages in the document can be reversed as follows.
int page_num = doc.GetPageCount();
for (int i = 1; i <= page_num; ++i)
{
    PageIterator itr = doc.GetPageIterator(i);
    Page page = itr.Current();
    doc.PageRemove(itr);
    doc.PagePushFront(page);
}
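Why this loop reverses the page order is easier to see on a plain list. The following sketch applies the same remove-and-push-front steps to a list of page numbers; ReverseModel is a hypothetical name and no PDFNet types are involved.

```csharp
using System;
using System.Collections.Generic;

static class ReverseModel
{
    // Same loop shape as the snippet above, applied to a plain list:
    // take the page at 1-based position i and move it to the front.
    public static void Reverse(List<int> pages)
    {
        int count = pages.Count;
        for (int i = 1; i <= count; ++i)
        {
            int page = pages[i - 1];
            pages.RemoveAt(i - 1);
            pages.Insert(0, page);
        }
    }

    static void Main()
    {
        var pages = new List<int> { 1, 2, 3, 4 };
        Reverse(pages);
        Console.WriteLine(string.Join(" ", pages));  // 4 3 2 1
    }
}
```

After step i, the first i pages are already in reverse order and the rest are untouched, so the whole sequence is reversed once i reaches the page count.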
A page can be rotated clockwise, by multiples of 90 degrees, when
displayed or printed.
The Page.GetRotation() method returns the
Page.Rotate enum specifying the current rotation.
Similarly,
Page.SetRotation() sets the current rotation.
For example:
// Rotate the first page 90 degrees clockwise.
Page.Rotate originalRotation = doc.GetPage(1).GetRotation();
Page.Rotate rotation;
switch (originalRotation)
{
    case Page.Rotate.e_0:   rotation = Page.Rotate.e_90;  break;
    case Page.Rotate.e_90:  rotation = Page.Rotate.e_180; break;
    case Page.Rotate.e_180: rotation = Page.Rotate.e_270; break;
    case Page.Rotate.e_270: rotation = Page.Rotate.e_0;   break;
    default:                rotation = Page.Rotate.e_0;   break;
}
doc.GetPage(1).SetRotation(rotation);
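The four-way switch is a quarter-turn counter that wraps around. A compact way to express the same idea is modular arithmetic, sketched below with a stand-in enum. The Rotate enum here is hypothetical and its numeric values are an assumption; it is not PDFNet's Page.Rotate.

```csharp
using System;

static class RotationModel
{
    // Stand-in for Page.Rotate; the numeric values are an assumption
    // chosen so that each step is one quarter turn.
    public enum Rotate { e_0 = 0, e_90 = 1, e_180 = 2, e_270 = 3 }

    // Advances by one quarter turn clockwise, wrapping from e_270 to e_0.
    public static Rotate NextClockwise(Rotate r) => (Rotate)(((int)r + 1) % 4);

    static void Main()
    {
        Console.WriteLine(NextClockwise(Rotate.e_0));    // e_90
        Console.WriteLine(NextClockwise(Rotate.e_270));  // e_0
    }
}
```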
The crop box defines the region to which the contents of the page
are to be clipped (cropped) when displayed or printed. Unlike the
other boxes, the crop box has no defined meaning in terms of physical
page geometry; it merely imposes clipping on the
page contents. The default value is the page's media box.
A new crop box can be imposed on a page with Page.SetCropBox(), as follows:
page.SetCropBox(Rect.CreateSDFRect(0, 0, 500, 600));
The existing crop box of a page can be discovered with Page.GetCropBox():
Rect crop_box = page.GetCropBox();
// Crop box is:
// crop_box.x1, crop_box.y1,
// crop_box.x2, crop_box.y2
The media box defines the boundaries of the physical medium on
which the page is to be printed. It may include any extended area
surrounding the finished page for bleed, printing marks, or other
such purposes. It may also include areas close to the edges of the
medium that cannot be marked because of physical limitations of
the output device. Content falling outside this boundary can safely
be discarded without affecting the visible output of the PDF document. A
new value for a page's media box can be specified as follows:
page.SetMediaBox(Rect.CreateSDFRect(0, 0, 500, 600));
Page content can be translated horizontally and vertically by shifting the
media box. For example, the following code shifts the media box
2 inches (72 units per inch * 2 inches = 144 units) horizontally:
Rect media_box = page.GetMediaBox();
// translate the page 2 inches horizontally
media_box.x1 += 144;
media_box.x2 += 144;
page.SetMediaBox(media_box);
PDFNet provides a powerful, easy-to-use API, called the Element API,
that can be used to read, write, and edit text, images, and other
graphical entities.
Because the Element API is very efficient,
PDFNet is a good match for interactive applications (such as PDF
viewers and editors) and for content extraction applications (such as
PDF conversion and validation), as well as for dynamic PDF generation.
Page content, a major component of a PDF document, is made up of the
visible marks on a page drawn by PDF marking operators. For details on
PDF content streams and thorough operator descriptions please refer to
Section 3.7.1, "Content Streams," in the PDF Reference Manual.
Although the PDFNet SDF and Filter APIs provide everything required
to decode and parse low-level content streams, using the Element API is
easier and more intuitive.
This is because the Element API
allows you to treat a page's contents as a list of objects
(i.e. a display list or a sequence of Elements) rather than as sets of
cryptic marking operators.
An Element (such as text, a path, or an image) is constructed from a
set of marking operators from the page content stream.
A sequence of Elements represents a display list.
Figure 7. A sequence of page marking operators
represents an Element.
Therefore, the PDFNet Element interface allows you to treat page contents
as a list of objects whose values and attributes can be modified.
Using the Element interface, applications can read, write, edit, and
create page contents and resources.
These contents and resources may in
turn contain fonts, images, shadings, patterns, extended graphics
states, and so on.
An application may use Element methods to modify the appearance
of a page, or it can create page content from scratch.
Each Element is independent of other Elements. Therefore, every
Element encapsulates all the relevant information about itself. A text
object, for example, contains all font attributes.
Element is the concrete base class for all Elements. PDFNet
supports all content elements allowed by the PDF format, namely: path,
text_begin, text, text_new_line, text_end, image, inline_image,
shading, form, group_begin, group_end, marked_content_begin, and
marked_content_end.
Note that some Elements (such as path, text, image,
inline_image, and shading) represent concrete graphical
elements. However, other Elements (such as text_begin/end,
text_new_line, group_begin/end, and marked_content_begin/end)
don't have a graphical representation but are used for logical
grouping of Element sequences or to provide meta-data associated with
Element groups.
The Element class hierarchy implements a composite pattern;
that is, the Element class provides the methods of all derived classes.
Figure 8. Element hierarchy. Only methods listed in the Element
group or base class can be invoked for the given type.
To find the type of an Element object, use the element.GetType()
method.
Be forewarned: calling methods on an object that are not related to
that object's Element type is not allowed. The
behavior when doing so is undefined. For example, it is illegal to call
element.GetImageData() on an e_path element.
Note that, in Figure 8 above, e_group_begin/end and e_text_begin/end
don't add any functionality to the common Element interface (i.e.
GetType()/GetGState()/GetCTM()). The main purpose of these Elements is
to mark sequences of Elements into logical groups. The Element
e_group_begin corresponds to the PDF 'q' operator (saveState),
e_group_end corresponds to the 'Q' operator, e_text_begin
corresponds to the 'BT' (begin text) operator, and e_text_end
corresponds to the 'ET' operator.
e_text_begin initializes a text object, initializing the text matrix
and the text line matrix to the identity matrix. Because PDF text
objects can't be nested, a second e_text_begin element cannot
appear before e_text_end. A text object contains one or more text runs
(that is, e_text elements) and new line markers (that is,
e_text_new_line elements).
e_text and e_text_new_line are not allowed
outside of the text group (that is, outside an element sequence surrounded
by e_text_begin/end).
Every Element has an associated CTM (current transformation matrix)
and graphics state. Element.GetCTM() returns the transformation matrix
used while processing the current Element. Element.GetGState() returns
the Element's associated graphics state. GState keeps track of a
number of style attributes used to visually define graphical Elements.
The methods available through the GState class are listed below:
Figure 9. Graphics State.
For a detailed description of graphics state attributes refer to
section 4.3, "Graphics State", in the PDF Reference Manual.
Page content is represented as a sequence of graphical Elements such
as paths, text, images, and forms. The only effect of the ordering of
Elements in the display list is the order in which Elements are
painted. Elements that occur later in the display list can obscure
earlier elements.
A display list can be traversed using an ElementReader object. For example:
void ReadDoc()
{
    // Open an existing document
    PDFDoc doc = new PDFDoc("in.pdf");
    doc.InitSecurityHandler();
    ElementReader reader = new ElementReader();
    // Read page content on every page in the document
    PageIterator end = doc.PageEnd();
    for (PageIterator itr = doc.PageBegin(); itr != end; itr.Next())
    {
        // Read the page
        reader.Begin(itr.Current());
        ProcessElements(reader);
        reader.End();
    }
}

void ProcessElements(ElementReader reader)
{
    Element element;
    // Traverse the page display list
    while ((element = reader.Next()) != null)
    {
        switch (element.GetType())
        {
            case Element.ElementType.e_path:
                if (element.IsClippingPath())
                {
                    // ...
                }
                break;
            case Element.ElementType.e_text:
                Matrix2D text_mtx = element.GetTextMatrix();
                // ...
                break;
            case Element.ElementType.e_form:
                // Descend into the form's child display list
                reader.FormBegin();
                ProcessElements(reader);
                reader.End();
                break;
        }
    }
}
To start traversing the display list, call reader.Begin().
Then, reader.Next() will return subsequent Elements until null is
returned (marking the end of the display list).
Note that, while ElementReader only works with one page at a time,
the same ElementReader object may be reused to process multiple pages.
Note that a PDF page display list may contain child display lists of
Form XObjects, Type3 font glyphs, and tiling patterns. A form XObject
is a self-contained description of any sequence of graphics objects
(such as path objects, text objects, and sampled images), defined as a
PDF content stream. It may be painted multiple times (either on
several pages or at several locations on the same page) and will
produce the same results each time (subject only to the graphics state
at the time the Form XObject is painted). In order to open a child
display list for a Form XObject, call the reader.FormBegin() method.
To return processing to the parent display list, call reader.End().
Processing of the form XObject display list is illustrated in Figure 10.
Figure 10. Traversing the child display list.
Note that, in the above sample code, a child display list is opened
with the reader.FormBegin() method when an element of type
Element.ElementType.e_form is encountered. The child display list becomes the
current display list until it is closed using reader.End(). At this
point the processing is returned to the parent display list and the
next Element returned will be the Element following the Form XObject.
Also note that, because Form XObjects may be nested, a sub-display list
could have its own child display lists.
The above sample code
traverses these nested Form XObjects recursively.
Similarly, a pattern display list can be opened using
reader.PatternBegin(), and a Type3 glyph display list can be opened
using the reader.Type3FontBegin() method.
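The recursive descent into child display lists can be illustrated without PDFNet. In the sketch below, Node and TraversalModel are hypothetical names: a display list is a list of nodes, and a node of type "form" carries its own child list, which is visited before the parent list continues.

```csharp
using System;
using System.Collections.Generic;

// Toy display list: an element is either a leaf or a form holding a child list.
class Node
{
    public string Type;
    public List<Node> Children = new List<Node>();
    public Node(string type, params Node[] children)
    {
        Type = type;
        Children.AddRange(children);
    }
}

static class TraversalModel
{
    // Visits elements in painting order, descending into forms as they appear.
    public static void Visit(List<Node> list, int depth, List<string> output)
    {
        foreach (Node n in list)
        {
            output.Add(new string(' ', depth * 2) + n.Type);
            if (n.Type == "form")
                Visit(n.Children, depth + 1, output);  // child display list
        }
    }

    static void Main()
    {
        var page = new List<Node>
        {
            new Node("path"),
            new Node("form", new Node("text"), new Node("form", new Node("image"))),
            new Node("text"),
        };
        var lines = new List<string>();
        Visit(page, 0, lines);
        foreach (string line in lines)
            Console.WriteLine(line);
    }
}
```

Returning from the recursive call plays the role of reader.End(): processing resumes with the element that follows the form in the parent list.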
After reading an Element using ElementReader.Next(), it is possible
to access all graphical attributes of the Element through its graphics state. Some applications are more
interested in changes in the graphics state than attribute
values. For example, a transition from one Element to another may not
involve changes in the graphics state. Or, perhaps, there may be
changes only to a couple of attributes. In these cases, it isn't
efficient to make memberwise comparisons between the old and
current graphics states.
To make this easier and more efficient, PDFNet offers an API to
enumerate the list of changes between subsequent Elements.
The list of changes in a graphics state can be traversed using the
ElementReader.ChangesBegin/End() methods, as illustrated by the
following example:
GSChangesIterator itr = reader.ChangesBegin();
GSChangesIterator end = reader.ChangesEnd();
for (; itr != end; itr.Next())
{
    switch (itr.Current())
    {
        case GState.GStateAttribute.e_transform:
            // Get transform matrix for this element.
            // Unlike path.GetCTM(), which returns the full
            // transformation matrix, gs.GetTransform()
            // returns only the transformation matrix
            // that was installed for this element (a
            // cm operator preceding this Element).
            // gs.GetTransform();
            break;
        case GState.GStateAttribute.e_line_width:
            // gs.GetLineWidth();
            break;
        case GState.GStateAttribute.e_line_cap:
            // gs.GetLineCap();
            break;
        case GState.GStateAttribute.e_line_join:
            // gs.GetLineJoin();
            break;
        case GState.GStateAttribute.e_miter_limit:
            // gs.GetMiterLimit();
            break;
        case GState.GStateAttribute.e_dash_pattern:
            // ...
            break;
    }
}
It's also possible to query ElementReader for changes to a specific
attribute:
if (reader.IsChanged(
        GState.GStateAttribute.e_line_width))
{
    // line width was changed.
}
Note that the list of modified attributes is accumulated when
calling ElementReader.Next(). To clear the list of modified attributes
use ElementReader.ClearChangeList() method. A call to ClearChangeList()
serves as a marker in the display list from which further changes in
the graphics state are tracked.
New page content can be added to an existing page or a new page
using ElementBuilder and ElementWriter. ElementBuilder
is used to instantiate one or more Elements, which
can be written to one or more pages using ElementWriter:
Figure 11. Adding new content to a page.
The following sample illustrates how to write page content to a new page:
PDFDoc doc = new PDFDoc();
doc.InitSecurityHandler();
// ElementBuilder is used to build new Element objects
ElementBuilder f = new ElementBuilder();
// ElementWriter is used to write Elements to the page
ElementWriter writer = new ElementWriter();
// Start a new page
// Position an image stream on several places on the page
Page page = doc.PageCreate();
// Begin writing to this page
writer.Begin(page);
// Attach ElementBuilder to the page
f.Begin(page);
// Import an Image that can be reused multiple
// times in the document or multiple times on the
// same page.
MappedFile img_file = new MappedFile("peppers.jpg");
FilterReader img_data = new FilterReader(img_file);
Image img = Image.Create(doc.GetSDFDoc(),
Image.ImageCompression.e_jpeg,
400, 600, 8,
ColorSpace.CreateDeviceRGB());
Element element = f.CreateImage(img,
new Matrix2D(200, -145, 20, 300, 200, 150));
writer.WritePlacedElement(element);
GState gstate = element.GetGState();
// Use the same image (just change its matrix)
gstate.SetTransform(200, 0, 0, 300, 50, 450);
writer.WritePlacedElement(element);
// Use the same image (just change its matrix)
writer.WritePlacedElement(
f.CreateImage(img, 300, 600, 200, -150));
// save changes to the current page
writer.End();
// Add a new page to the document sequence
doc.PagePushBack(page);
// Start a new page
page = doc.PageCreate();
writer.Begin(page);
f.Begin(page);
// Construct and draw a path object using
// different GState attributes
f.PathBegin();
f.MoveTo(306, 396);
f.CurveTo(681, 771, 399.75, 864.75, 306, 771);
f.CurveTo(212.25, 864.75, -69, 771, 306, 396);
f.ClosePath();
// path is now constructed
element = f.PathEnd();
element.SetPathFill(true);
// Set the path color space and color.
gstate = element.GetGState();
gstate.SetFillColorSpace(
ColorSpace.CreateDeviceCMYK());
gstate.SetFillColor(
new ColorPt(1, 0, 0, 0));
gstate.SetTransform(
0.5, 0, 0, 0.5, -20, 300);
writer.WritePlacedElement(element);
// Draw the same path using a different
// stroke color.
// The path will be filled and stroked.
element.SetPathStroke(true);
gstate.SetFillColor(
new ColorPt(0, 0, 1, 0)); // yellow
gstate.SetStrokeColorSpace(
ColorSpace.CreateDeviceRGB());
gstate.SetStrokeColor(new ColorPt(1, 0, 0)); // red
gstate.SetTransform(0.5, 0, 0, 0.5, 280, 300);
gstate.SetLineWidth(20);
writer.WritePlacedElement(element);
// Draw the same path with a given dash pattern.
// This path should only be stroked.
element.SetPathFill(false);
gstate.SetStrokeColor(new ColorPt(0, 0, 1)); // blue
gstate.SetTransform(0.5, 0, 0, 0.5, 280, 0);
double[] dash_pattern = {30};
gstate.SetDashPattern(ref dash_pattern, 0);
writer.WritePlacedElement(element);
writer.End();
// save changes to the current page
doc.PagePushBack(page);
doc.Save("out.pdf", PDFDoc.SaveOptions.e_remove_unused);
Note that once an Element is
instantiated using ElementBuilder, you have full control over its
properties and graphics state.
Page content can also come from existing pages. For example, you can
use ElementReader to read paths, text, and images from existing pages
and copy them to the current page. Note that, along the way, you can
fully modify an Element's properties or its graphics state.
This is how to perform page content editing.
For example, the following code
copies all Elements except images from an existing page and changes
the text color to blue:
ElementWriter writer = new ElementWriter();
ElementReader reader = new ElementReader();
reader.Begin(doc.PageBegin().Current());
Page new_page = doc.PageCreate(new Rect(0, 0, 612, 794));
doc.PagePushBack(new_page);
writer.Begin(new_page);
Element element;
while ((element = reader.Next()) != null)
{
    if (element.GetType() == Element.ElementType.e_text)
    {
        // Set all text to blue color.
        GState gs = element.GetGState();
        gs.SetFillColorSpace(
            ColorSpace.CreateDeviceRGB());
        gs.SetFillColor(new ColorPt(0, 0, 1));
    }
    else if (element.GetType()
             == Element.ElementType.e_image)
    {
        // remove all images by skipping the write below
        continue;
    }
    writer.WriteElement(element);
}
writer.End();
reader.End();
A PDF document may display a document outline on the screen,
allowing the user to navigate interactively from one part of the
document to another. The outline consists of a tree-structured
hierarchy of Bookmarks (sometimes called outline items), which serve as
a "visual table of contents" to display the document's
structure to the user.
Each Bookmark has a title that appears on screen, and an Action that
specifies what happens when a user clicks on the Bookmark.
The typical
Action for a user-created Bookmark is to move to another location in
the current document, although any Action can be specified.
While it's possible to work with outline items using the SDF API
(see section 8.2.2, "Document Outline", in the PDF Reference
Manual for more details), PDFNet simplifies the process by providing
the high-level utility class PDF.Bookmark.
You can use Bookmark.GetNext(), Bookmark.GetPrev(),
Bookmark.GetFirstChild() and Bookmark.GetLastChild() to navigate the
whole outline tree, as the following shows:
// Prints out the outline tree to the standard output
void PrintIdent(Bookmark item)
{
    int ident = item.GetIdent() - 1;
    for (int i = 0; i < ident; ++i)
        Console.Write("  ");
}

void PrintOutlineTree(Bookmark item)
{
    for (; item.IsValid(); item = item.GetNext())
    {
        PrintIdent(item);
        Console.WriteLine("{0:s}{1:s}",
            (item.IsOpen() ? "- " : "+ "), item.GetTitle());
        if (item.HasChildren())
        {
            // Recursively print child subtrees
            PrintOutlineTree(item.GetFirstChild());
        }
    }
}

static void Main(string[] args)
{
    PDFDoc doc = new PDFDoc("../../../Data/out1.pdf");
    doc.InitSecurityHandler();
    Bookmark root = doc.GetFirstBookmark();
    PrintOutlineTree(root);
}
Note that we obtain the root Bookmark using GetFirstBookmark(). If
GetFirstBookmark() returns an invalid Bookmark (where
GetFirstBookmark().IsValid() is false), the document has no outline
tree.
A new outline tree can be created as follows:
PDFDoc doc = new PDFDoc("../Data/in.pdf");
doc.InitSecurityHandler();
Bookmark myitem = Bookmark.Create(doc, "My Item");
doc.AddRootBookmark(myitem);
Sub-items can be added using the Bookmark.AddChild() method:
Bookmark sub_item = myitem.AddChild("My Sub-Item");
myitem.AddChild("My Sub-Item 2");
Note that a Bookmark can be associated with different kinds of
Actions. The most common action is to move to another location in the
current document. This type of Action is called a Destination Action
(see section 8.2.1, "Destinations", in the PDF Reference
Manual for more details). The following sample creates a new page
Destination and sets the Bookmark's action:
// The following example creates an 'explicit' destination
Destination dest = Destination.CreateFit(doc.GetPage(1));
Action action = Action.CreateGoto(dest);
myitem.SetAction(action);
Using PDFNet, it is also possible to quickly create named
destinations (see section 8.2.1, "Destinations", in the PDF
Reference for more details). Named destinations have an advantage over
explicit destinations: they allow the location of the
destination to change without invalidating existing links.
To create a named destination, pass the key (under which the
destination will be stored) to the Action.CreateGoto() method:
// 'key' stands for the name under which the destination is stored.
Action blue_action
    = Action.CreateGoto(key,
        Destination.CreateFit(doc.GetPage(1)));
The Bookmark class also allows you to quickly find Bookmarks based
on the title text. For example, the following code snippet looks for a
Bookmark called foo and then removes it from the outline tree:
Bookmark foo = doc.GetFirstBookmark().Find("foo");
if (foo.IsValid())
    foo.Delete();
The Bookmark API allows you to change any property on an outline
item, including title text, action, color, and formatting. Color and
other formatting can help readers navigate large PDF documents more
easily.
The following code adjusts color and formatting properties on
three Bookmark items:
red.SetColor(1, 0, 0);
green.SetColor(0, 1, 0);
// use bold font for green title text
green.SetFlags(2);
blue.SetColor(0, 0, 1);
// use bold and italic font for blue title text
blue.SetFlags(3);
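The values 2 and 3 passed to SetFlags() are bit combinations: in the PDF Reference, the outline item flags use bit 1 (value 1) for italic and bit 2 (value 2) for bold. The sketch below spells this out with a flags enum. The Style enum is a hypothetical illustration, not a PDFNet type.

```csharp
using System;

static class OutlineFlags
{
    // Bit values of the outline item flags:
    // value 1 is italic, value 2 is bold.
    [Flags]
    public enum Style { None = 0, Italic = 1, Bold = 2 }

    static void Main()
    {
        Console.WriteLine((int)Style.Bold);                   // 2
        Console.WriteLine((int)(Style.Bold | Style.Italic));  // 3
    }
}
```

Naming the bits this way makes a call such as SetFlags(3) easier to read than the bare number.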
An interactive form (sometimes referred to as an AcroForm)
is a collection of fields (such as text boxes, checkboxes, radio
buttons, drop-down lists, and pushbuttons) for gathering information
interactively from the user. A PDF document may contain any number of
Fields appearing on any combination of pages. All these fields make up
a single, global interactive form spanning the entire document. While
PDF forms are similar to HTML forms, there are some important
differences:
Unlike HTML pages, a PDF document has a single, global
interactive form spanning the entire document.
In PDF documents, the field and value appearance can be
completely customized.
Although field appearances give incredible
customization power to PDF forms, developers need to learn to work
with forms where a field's value and appearance are two different
entities.
The PDF format supports combo boxes with text editing.
In the PDF format, it's possible to associate fields with
different kinds of Actions (or Action chains).
PDFNet fully supports reading, writing, and editing PDF forms and
provides utility methods to make working with forms simple and
efficient. Using the PDFNet forms API, arbitrary subsets of form fields
can be imported or exported from the document, new forms can be created
from scratch, and the appearance of existing forms can be modified.
The form shown in the following figure consists of a number of fields.
Every field has its name and value, as well as its annotation appearance.
In the PDFNet SDK, Fields are accessed through FieldIterators.
For example, the list of all Fields present in the document can be
traversed using the following code snippet:
for (FieldIterator itr = doc.GetFieldIterator(); itr.HasNext(); itr.Next())
{
    Field field = itr.Current();
    Console.WriteLine("Field name: {0}", field.GetName());
}
PDF offers six different field types. Each type of form field
is used for a different purpose, and they have different properties,
appearances, options, and actions that can be associated with the
fields. In this section, we will explain how to create all six
field types and some attributes specific to each one.
Common field types are text-box, checkbox, radio-button, combo-box,
and push-button. To find out the type of a Field, use Field.GetType():
Field.FieldType type = field.GetType();
switch (type)
{
    case Field.FieldType.e_button:
        Console.WriteLine("Button");
        break;
    case Field.FieldType.e_check:
        Console.WriteLine("Check");
        break;
    case Field.FieldType.e_radio:
        Console.WriteLine("Radio");
        break;
    case Field.FieldType.e_text:
        Console.WriteLine("Text");
        break;
    case Field.FieldType.e_choice:
        Console.WriteLine("Choice");
        break;
    case Field.FieldType.e_signature:
        Console.WriteLine("Signature");
        break;
}
Regardless of which field type you create, you must provide a Field
name:
Field myfield
    = doc.FieldCreate(
        "address",
        Field.FieldType.e_text);
Under most circumstances, field names must be unique. If you
have a field named "address" and you create a second
field that is also called "address", you cannot supply
different data in the two fields.
Field names can use alphanumeric characters to identify a
field. All field names are case-sensitive. For example, you can use
names such as empFirstName, empSecondName, empNumber, and so on for a
group of fields that are related to the same concept (in our sample,
an employee entity).
Another technique for naming fields is to use parent and child names.
For example, you could name the above fields as follows:
employee.name.first, employee.name.second, employee.number.
This naming convention is not only useful for organizing purposes
but is also well-suited for automatic operations on Fields.
In the PDFNet
SDK, Field.GetName() returns a string representing the fully qualified
name of the field (e.g. "employee.name.first").
To get the
child name ("first") use the Field.GetPartialName() method.
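The relationship between a fully qualified name and its child name is plain string handling. The sketch below shows one way to derive the last component of a dotted name. PartialName is a hypothetical helper written for illustration; it is not how PDFNet implements Field.GetPartialName().

```csharp
using System;

static class FieldNames
{
    // Hypothetical helper: returns the text after the last '.', or the
    // whole name when the field has no parent.
    public static string PartialName(string fullyQualifiedName)
    {
        int dot = fullyQualifiedName.LastIndexOf('.');
        return dot < 0 ? fullyQualifiedName : fullyQualifiedName.Substring(dot + 1);
    }

    static void Main()
    {
        Console.WriteLine(PartialName("employee.name.first"));  // first
        Console.WriteLine(PartialName("address"));              // address
    }
}
```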
For more information about adding Fields, see the FDF code
Form Fields can be populated using the Field.SetValue() method:
field.SetValue("New Value");
// Regenerate appearance stream.
field.RefreshAppearance();
Note that, after modifying the Field's value, we refreshed its
appearance stream. In the PDF format, a Field's value and appearance
are two different entities.
Therefore, if you don't call
RefreshAppearance(), the value displayed on the PDF page will remain
unchanged; it may retain the old value or it may be blank.
One approach used by other PDF libraries is to let the PDF
viewer automatically pre-generate appearance streams by setting the
'NeedAppearances' flag in the AcroForm dictionary:
doc.GetAcroForm().PutBool("NeedAppearances", true);
This will force viewer applications to auto-generate appearance
streams every time the document is opened.
This method is unreliable:
Acrobat does not always generate appearance streams correctly.
Another disadvantage of this approach is that the user will always be
prompted to save the document even if the document was never modified.
Field.GetValueAsString() returns the field's value as a string.
The value returned varies depending on the field type.
A text field will return a string:
if (type == Field.FieldType.e_text
    && field.GetValue() != null)
    Console.WriteLine("Field value: {0}", field.GetValueAsString());
else
    Console.WriteLine("Field is blank");
Other field types, such as check boxes and radio buttons, can also
return text from GetValueAsString().
Similarly, the
Field.GetValueAsString() method is available.
Form flattening refers to the operation that changes
active form fields into a static area that is part of the PDF document,
just like other text and images in the document. A completely flattened
PDF form does not have any widget annotations or interactive fields.
Using the Field.Flatten() or Page.FlattenField() methods, it is
possible to merge individual field appearances with the page content.
PDFNet also allows you to flatten all forms in the document in a single
function call, with PDFDoc.FlattenAnnotations(true). (The true argument
instructs PDFNet to flatten only form fields; were this method passed false,
or no argument, it would flatten all annotations as well as all
form fields.)
An alternative approach, one that can be programmatically reversed,
would be to set the field as read only using the
Field.SetFlag(Field.e_read_only, true) method.
The security mechanism for the high-level document works in the same
way as with an SDF document. To secure a document, use the
PDFDoc.SetSecurityHandler() method. To open a secured document, call
PDFDoc.InitSecurityHandler().
To open a document with a password, call
PDFDoc.InitStdSecurityHandler().
The following table lists security permissions available through the
standard security handler:
PDFNet uses exceptions to report exceptional program states and
corrupt input. For example:
try
{
    PDFDoc doc = new PDFDoc("file.pdf");
    doc.InitSecurityHandler();
    doc.GetPage(9999999);
}
catch (PDFNetException e)
{
    Console.WriteLine(e.Message);
}