Cherry-pick code to improve PDF object tracking

This cherry-picks the following CLs and squashes them into a single CL
for easier merging:

Add RemoveTextObjectWithTwoPagesSharingContentStreamAndResources test

Change-Id: I5fc0f0888d71368c0dd257931e4a1013301f639f
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105212
(cherry picked from commit 7bfe65fa528d99232031ac203b2e56da44643db3)

Add RemoveTextObjectWithTwoPagesSharingContentArrayAndResources test

Change-Id: I396d1cf0a9d3da88337c459aa7ef6f6ec189bb1d
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105491
(cherry picked from commit f3d0f929f461effa86ac27716bd304ba7d534445)

Switch CPDF_PageContentManager to have a CPDF_Document pointer

Change-Id: Iebc74c98071241ea220a185d43937749b9885c76
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105531
(cherry picked from commit 0a111609332bb25f78e7485778f3457648954493)

Fix nits in cpdf_pagecontentmanager.h

Change-Id: I512e5b3d5b976a2485196ae765dd8c1c98275e67
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105472
(cherry picked from commit 4e8a0feaafffd19f747e373bd9f98927ddd3a61f)

Encapsulate more in CPDF_PageContentManager

Change-Id: I198bcd4972603989123b01ff53bd455395f26d5b
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105530
(cherry picked from commit 06a0689d429fde72152dc6c9c2245b86c045ee8a)

Add some using statements in fpdf_save_embeddertest.cpp

Change-Id: If1b9064532740e5de7074f747c23cc0de878ca0c
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105552
(cherry picked from commit 5018240ebf009a67fc9b2613a6f3f8bda1e80c03)

Test trailer generation in FPDFSaveEmbedderTest.SaveLinearizedDoc

Change-Id: I3dfc67b5839af85f73c15555cf23f2fe9b9c687a
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105533
(cherry picked from commit 1f4904f680cd7a63d96f072a371ef20ce939cd7a)

Check for removed resources in saved output in RemoveTextObject test

Change-Id: Ia934158b5fd72a42fba0125aa1637d1c980bda3c
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105570
(cherry picked from commit 4d48929e3ffc3d43f2b72383a761d3df6e859e57)

Add RemoveTextObjectWithTwoPagesSharingResources test

Change-Id: If14cb333430a5bb11e50fbb6fd86f1898cf5f29a
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105590
(cherry picked from commit 01ea024b74dc8fcc16001792ef5b20b53f3959f0)

Save the trailer's object number when parsing

Change-Id: I86f980a09d2214c50412ce65a905dd92ebc85a6a
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105610
(cherry picked from commit 1e5dee361ee112e2b152ae7890a0fef567ccc4e9)

Add object tree traversal utility functions

Change-Id: I28817068d50c79f36d12bd50c1910937241194a5
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105611
(cherry picked from commit 01b85b79a2bba7e895a18a32dbee840eae81eb31)

Avoid generating PDFs with unreferenced objects

Change-Id: I4c9d447ab745732909c4f7b5c6061886428a92dd
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105612
(cherry picked from commit 28f8db4c040f33c1d0955747e2aef11d3803f321)

Keep track of Font and XObject resources

Change-Id: I510e6c51eda28535ed00e87b6e10971f7178122c
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105613
(cherry picked from commit 69703b37cc02aceac37b504d34b50f1c3c24302a)

Keep track of ExtGState resources

Change-Id: I786a515b4ddcfa9ea2dccb94d3d7ad6a189ec7ce
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105614
(cherry picked from commit 9b9bf7539e7795c44a2b9985232a8cda652b53f0)

Remove a duplicate FPDFEditEmbedderTest test case

Change-Id: Ieb0806520af652501930ad10fcdbfd33b5952c9d
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105710
(cherry picked from commit 02516521c768c4a07ffeef6f1f32ad72cfd7c1d1)

Add RemoveTextObjectWithTwoPagesSharingResourcesDict test case

Change-Id: I30d4f35e41c5be78568aab118eb90d460d1c030e
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105730
(cherry picked from commit 6c3577ca2ff7f3e6dc498f1b61be384f29db18d3)

Split CPDF_PageContentGenerator::UpdateContentStreams()

Change-Id: I9a688fd486bb851dceedca633856bbe5471b9b71
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105732
(cherry picked from commit 31f23b9263ab8de97b6884bca23dc30a3c520e1a)

Do copy-on-write in CPDF_PageContentGenerator

Change-Id: I9e5659421ee6e6d8b7807bc4159fe086f70982ef
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105733
(cherry picked from commit 692f0719e4cd4387daca51fb3a0151929648aa11)

Do copy-on-write in CPDF_PageContentManager

Change-Id: I4b52894ab44889bae0df9415542f018c91436c1a
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/105630
(cherry picked from commit ef30200275bbfdea90782f1a1d62c0474aab0e74)

Bug: chromium:1428724,pdfium:2012
Change-Id: I7148a4d6c30666792ea0c8cc6ae5186495afb343
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/106190
Reviewed-by: Tom Sepez <tsepez@chromium.org>
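The object-tracking CLs above ("Add object tree traversal utility functions" and "Avoid generating PDFs with unreferenced objects") appear here by title only. The sketch below illustrates the general idea and is not PDFium's actual implementation. It models a document as a hypothetical graph of object numbers, walks that graph breadth-first from the objects the trailer references, and reports every object that is never reached. The `ObjectGraph` type and both function names were invented for this example.

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical, simplified model: each indirect object is identified by its
// object number and lists the object numbers it references.
using ObjNum = uint32_t;
using ObjectGraph = std::map<ObjNum, std::vector<ObjNum>>;

// Breadth-first walk starting from the objects referenced by the trailer
// (e.g. /Root and /Info). Returns every object number reachable from them.
std::set<ObjNum> ReachableObjects(const ObjectGraph& graph,
                                  const std::vector<ObjNum>& trailer_refs) {
  std::set<ObjNum> seen;
  std::queue<ObjNum> pending;
  for (ObjNum root : trailer_refs) {
    if (seen.insert(root).second)
      pending.push(root);
  }
  while (!pending.empty()) {
    ObjNum current = pending.front();
    pending.pop();
    auto it = graph.find(current);
    if (it == graph.end())
      continue;  // Dangling reference: nothing further to follow.
    for (ObjNum ref : it->second) {
      // Only enqueue objects not visited yet, so reference cycles terminate.
      if (seen.insert(ref).second)
        pending.push(ref);
    }
  }
  return seen;
}

// Objects present in the graph but never reached from the trailer are
// candidates for omission when writing the file.
std::vector<ObjNum> UnreferencedObjects(
    const ObjectGraph& graph,
    const std::vector<ObjNum>& trailer_refs) {
  std::set<ObjNum> reachable = ReachableObjects(graph, trailer_refs);
  std::vector<ObjNum> unreferenced;
  for (const auto& entry : graph) {
    if (!reachable.count(entry.first))
      unreferenced.push_back(entry.first);
  }
  return unreferenced;
}
```

Consider a graph where object 1 references object 2, object 2 references object 3, objects 4 and 5 form a separate chain, and the trailer references only object 1. For that input, `UnreferencedObjects` returns 4 and 5. The `seen` set keeps the walk linear in the number of references. It also prevents infinite loops on reference cycles, which PDF object graphs commonly contain (for example, /Parent links from pages back to the page tree).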
29 files changed
tree: dc17a6e598cc1fa6a8b2846152ebd151b0e0dade
  1. build_overrides/
  2. constants/
  3. core/
  4. docs/
  5. fpdfsdk/
  6. fxbarcode/
  7. fxjs/
  8. public/
  9. samples/
  10. skia/
  11. testing/
  12. third_party/
  13. tools/
  14. xfa/
  15. .clang-format
  16. .gitattributes
  17. .gitignore
  18. .gn
  19. .style.yapf
  20. .vpython3
  21. AUTHORS
  22. BUILD.gn
  23. codereview.settings
  24. CONTRIBUTING.md
  25. DEPS
  26. DIR_METADATA
  27. LICENSE
  28. navbar.md
  29. OWNERS
  30. pdfium.gni
  31. PRESUBMIT.py
  32. PRESUBMIT_test.py
  33. PRESUBMIT_test_mocks.py
  34. README.md
README.md

PDFium

Prerequisites

PDFium uses the same build tooling as Chromium. See the platform-specific Chromium build instructions to get started, but replace Chromium's “Get the code” instructions with PDFium's.

CPU Architectures supported

The default architecture for Windows, Linux, and Mac is “x64”. On Windows, “x86” is also supported. The GN argument target_cpu = "x86" can be used to override the default value. For Android builds, the default CPU architecture is “arm”.

There are likely still some places lurking in the code that will not function properly on big-endian architectures. Bug reports and patches are welcome; however, providing this support is not a priority at this time.
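The generic C++ example below shows how byte-order bugs typically arise. It is not code from PDFium, and the function names are hypothetical. It compares a read that copies raw bytes straight into an integer, whose result depends on the host CPU, with reads that assemble the value explicitly from a fixed byte order:

```cpp
#include <cstdint>
#include <cstring>

// Host-order dependent: copies raw bytes into an integer, so the result
// differs between little-endian and big-endian CPUs.
uint16_t ReadU16HostOrder(const uint8_t* bytes) {
  uint16_t value;
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}

// Portable: assembles the value from little-endian bytes explicitly, giving
// the same result on any architecture.
uint16_t ReadU16LittleEndian(const uint8_t* bytes) {
  return static_cast<uint16_t>(bytes[0] | (bytes[1] << 8));
}

// Portable: the same idea for data stored in big-endian order.
uint16_t ReadU16BigEndian(const uint8_t* bytes) {
  return static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
}
```

On a little-endian x64 machine, `ReadU16HostOrder` and `ReadU16LittleEndian` return the same value. This kind of bug can therefore go unnoticed until the code runs on a big-endian target.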

Google employees

Run: download_from_google_storage --config and follow the authentication instructions. Note that you must authenticate with your @google.com credentials. Enter “0” if asked for a project-id.

Once you've done this, the toolchain will be installed automatically for you in the “Generate the build files” step below.

The toolchain will be in depot_tools\win_toolchain\vs_files\<hash>, and windbg can be found in depot_tools\win_toolchain\vs_files\<hash>\win_sdk\Debuggers.

If you want the IDE for debugging and editing, you will need to install it separately, but this is optional and not needed for building PDFium.

Get the code

The name of the top-level directory does not matter. In the following example, the directory name is “repo”. This directory must not have been used before by gclient config, as each directory can only house a single gclient configuration.

mkdir repo
cd repo
gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git
gclient sync
cd pdfium

On Linux, additional build dependencies need to be installed by running the following from the pdfium directory.

./build/install-build-deps.sh

Generate the build files

PDFium uses GN to generate the build files and Ninja to execute the build files. Both of these are included with the depot_tools checkout.

Selecting build configuration

PDFium may be built either with or without JavaScript support, and with or without XFA forms support. Both of these features are enabled by default. Also note that the XFA feature requires JavaScript.

Configure the build by running gn args <directory>. This will launch an editor in which you can set the following arguments. By convention, <directory> should be named out/foo, and some tools and test support code only work if you follow this convention. A typical <directory> name is out/Debug.

use_goma = true  # Googlers only. Make sure goma is installed and running first.
is_debug = true  # Enable debugging features.

# Set true to enable experimental Skia backend.
pdf_use_skia = false

pdf_enable_xfa = true  # Set false to remove XFA support (implies JS support).
pdf_enable_v8 = true  # Set false to remove JavaScript support.
pdf_is_standalone = true  # Set for a non-embedded build.
is_component_build = false  # Disable component build (a component build should also work).

For sample applications like pdfium_test to build, one must set pdf_is_standalone = true.

By default, the entire project builds with C++17.

When complete, the arguments will be stored in <directory>/args.gn, and GN will automatically use the new arguments to generate build files. Should your files fail to generate, please double-check your use_sysroot setting.

Building the code

You can build the sample program by running:

ninja -C <directory> pdfium_test

You can build the entire product (which includes a few unit tests) by running:

ninja -C <directory> pdfium_all

Running the sample program

The pdfium_test program supports reading, parsing, and rasterizing the pages of a .pdf file to .ppm or .png output image files (Windows supports two other formats). For example: <directory>/pdfium_test --ppm path/to/myfile.pdf. Note that this will write output images to path/to/myfile.pdf.<n>.ppm. Run pdfium_test --help to see all the options.

Testing

There are currently several test suites that can be run:

  • pdfium_unittests
  • pdfium_embeddertests
  • testing/tools/run_corpus_tests.py
  • testing/tools/run_javascript_tests.py
  • testing/tools/run_pixel_tests.py

Tests in the testing directory may fail because fonts differ across platforms. These tests are reliable on the bots. If you see failures, it can be a good idea to run the tests on a tip-of-tree checkout to see whether the same failures appear.

Pixel Tests

If your change affects rendering, a pixel test should be added. Simply add a .in or .pdf file in testing/resources/pixel and the pixel runner will pick it up at the next run.

Make sure that your test case doesn't have any copyright issues. It should also be a minimal test case that focuses on the bug and renders the same way in many PDF viewers. Try to avoid binary data in streams by using the ASCIIHexDecode filter, which makes the PDF more readable in a text editor.
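The hypothetical helper below shows what ASCIIHexDecode-encoded stream data looks like. It converts raw bytes into the hex text that goes between `stream` and `endstream`. Under the PDF specification, each byte becomes two hexadecimal digits, the decoder ignores whitespace, and `>` marks the end of the data. The stream dictionary must also declare `/Filter /ASCIIHexDecode`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encodes raw stream bytes for a stream that uses
// /Filter /ASCIIHexDecode. Every byte becomes two uppercase hex digits.
std::string EncodeAsciiHex(const std::vector<uint8_t>& data) {
  static constexpr char kHexDigits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(data.size() * 2 + 1);
  for (uint8_t byte : data) {
    out.push_back(kHexDigits[byte >> 4]);    // High nibble.
    out.push_back(kHexDigits[byte & 0x0F]);  // Low nibble.
  }
  out.push_back('>');  // End-of-data marker required by ASCIIHexDecode.
  return out;
}
```

For example, the bytes 0x00, 0xFF, 0x1A encode to `00FF1A>`. Because the decoder skips whitespace, long hex runs can also be split across lines. Note that the stream's /Length counts the encoded characters, not the original bytes.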

To try out your new test, you can call the run_pixel_tests.py script:

$ ./testing/tools/run_pixel_tests.py your_new_file.in

To generate the expected image, you can use the make_expected.sh script:

$ ./testing/tools/make_expected.sh your_new_file.pdf

Please make sure optipng is installed; it optimizes the file size of the resulting PNG.

.in files

.in files are PDF template files. PDF files contain many byte offsets that have to be kept correct or the file won't be valid. The template makes this easier by replacing the byte offsets with certain keywords.

This saves space and also allows an easy way to reduce the test case to the essentials as you can simply remove everything that is not necessary.
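The simplified C++ sketch below shows why these offsets are fragile. It performs the offset bookkeeping that a template fixup step must do, but it is not how fixup_pdf_template.py is implemented. `BuildPdfWithXref` is a hypothetical helper. It writes objects 1..N, records where each `N 0 obj` line starts, and emits a classic cross-reference table, a trailer, and a `startxref` value that point at those positions. It assumes that object 1 is the document catalog.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: writes objects numbered 1..N and computes the byte
// offsets that the cross-reference table and startxref must point to.
// Assumes object 1 is the document catalog (/Root).
std::string BuildPdfWithXref(const std::vector<std::string>& object_bodies) {
  std::string pdf = "%PDF-1.7\n";
  std::vector<size_t> offsets;
  for (size_t i = 0; i < object_bodies.size(); ++i) {
    offsets.push_back(pdf.size());  // Byte offset of the "N 0 obj" line.
    pdf += std::to_string(i + 1) + " 0 obj\n" + object_bodies[i] +
           "\nendobj\n";
  }

  const size_t xref_offset = pdf.size();
  const std::string size = std::to_string(object_bodies.size() + 1);
  pdf += "xref\n0 " + size + "\n";
  pdf += "0000000000 65535 f \n";  // Entry for object 0 (free-list head).
  char entry[21];
  for (size_t offset : offsets) {
    // Each entry is exactly 20 bytes: a 10-digit offset, a 5-digit
    // generation number, the in-use flag, and a two-character EOL.
    std::snprintf(entry, sizeof(entry), "%010zu 00000 n \n", offset);
    pdf += entry;
  }
  pdf += "trailer\n<< /Size " + size + " /Root 1 0 R >>\n";
  pdf += "startxref\n" + std::to_string(xref_offset) + "\n%%EOF\n";
  return pdf;
}
```

Any change to an object body shifts every later offset. As a result, hand-editing a finished .pdf quickly invalidates its xref table, which is the problem that generating the file from a template avoids. The fixed-width formatting reflects the rule that each classic xref entry occupies exactly 20 bytes.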

A simple example can be found here.

To transform this into a PDF, you can use the fixup_pdf_template.py tool:

$ ./testing/tools/fixup_pdf_template.py your_file.in

This will create a your_file.pdf in the same directory as your_file.in.

There is no official style guide for .in files, but a consistent style is preferred to help with readability. If possible, object numbers should be consecutive, and /Type and /Subtype should appear at the top of a dictionary to make object identification easier.

Embedding PDFium in your own projects

The public/ directory contains header files for the APIs available for use by embedders of PDFium. The PDFium project endeavors to keep these as stable as possible.

Outside of the public/ directory, code may change at any time, and embedders should not directly call these routines.

Code Coverage

Code coverage reports for PDFium can be generated in Linux development environments. Details can be found here.

Chromium provides code coverage reports for PDFium here. PDFium is located in third_party/pdfium in Chromium's source code. These reports include code coverage from PDFium's fuzzers.

Waterfall

The current health of the source tree can be found here.

Community

There are several mailing lists that have been set up:

Note that the Reviews and Bugs lists are typically read-only.

Bugs

PDFium uses this bug tracker, but for security bugs, please use Chromium's security bug template and add the “Cr-Internals-Plugins-PDF” label.

Contributing code

See the CONTRIBUTING document for more information on contributing to the PDFium project.