# SafetyNet - Performance regression detection for PDFium

[TOC]

This document explains how to use SafetyNet to detect performance regressions
in PDFium.

## Comparing performance of two versions of PDFium

safetynet_compare.py is a script that compares the performance of two versions
of PDFium. Use it to check whether a given change has caused, or will cause,
an improvement or a regression in performance on a set of test cases.

The supported profilers are exclusive to Linux, so for now this can only be run
on Linux.

An illustrative example is below, comparing the local code version to an older
version. A positive % change means the test needs more time/instructions to
run, which is a regression. A negative % change means it needs less
time/instructions, which is an improvement.

```
$ testing/tools/safetynet_compare.py ~/test_pdfs --branch-before beef5e4
================================================================================
   % Change      Time after  Test case
--------------------------------------------------------------------------------
   -0.1980%  45,703,820,326  ~/test_pdfs/PDF Reference 1-7.pdf
   -0.5678%      42,038,814  ~/test_pdfs/Page 24 - PDF Reference 1-7.pdf
   +0.2666%  10,983,158,809  ~/test_pdfs/Rival.pdf
   +0.0447%  10,413,890,748  ~/test_pdfs/dynamic.pdf
   -7.7228%      26,161,171  ~/test_pdfs/encrypted1234.pdf
   -0.2763%     102,084,398  ~/test_pdfs/ghost.pdf
   -3.7005%  10,800,642,262  ~/test_pdfs/musician.pdf
   -0.2266%  45,691,618,789  ~/test_pdfs/no_metadata.pdf
   +1.4440%  38,442,606,162  ~/test_pdfs/test7.pdf
   +0.0335%       9,286,083  ~/test_pdfs/testbulletpoint.pdf
================================================================================
Test cases run: 10
Failed to measure: 0
Regressions: 0
Improvements: 2
```
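As a rough illustration of how the summary counts relate to the % changes, the
Python sketch below recomputes them from the table above. It assumes that a
change counts as a regression or an improvement only when its magnitude exceeds
the --threshold-significant default of 0.02 (2%). `percent_change` and
`classify` are hypothetical helpers written for this example, not functions
taken from safetynet_compare.py.

```python
def percent_change(before, after):
    """Return the relative change from before to after as a fraction."""
    return (after - before) / before


def classify(change, threshold=0.02):
    """Label a fractional change; the 0.02 threshold is an assumed default."""
    if change > threshold:
        return "regression"
    if change < -threshold:
        return "improvement"
    return "not significant"


# % changes copied from the example output above.
example_changes = [-0.1980, -0.5678, 0.2666, 0.0447, -7.7228,
                   -0.2763, -3.7005, -0.2266, 1.4440, 0.0335]
labels = [classify(pct / 100) for pct in example_changes]

print("Regressions:", labels.count("regression"))    # Regressions: 0
print("Improvements:", labels.count("improvement"))  # Improvements: 2
```

Under that assumption, only encrypted1234.pdf and musician.pdf move by more
than 2%, which matches the two improvements reported in the summary.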

### Usage

Run the safetynet_compare.py script in testing/tools to perform a comparison.
Pass one or more paths with test cases - each path can be either a .pdf file or
a directory containing .pdf files. Other files in those directories are
ignored.

The following comparison modes are supported:

1. Compare uncommitted changes against the clean branch:
```shell
$ testing/tools/safetynet_compare.py path/to/pdfs
```

2. Compare the current branch with another branch or commit:
```shell
$ testing/tools/safetynet_compare.py path/to/pdfs --branch-before another_branch
$ testing/tools/safetynet_compare.py path/to/pdfs --branch-before 1a3c5e7
```

3. Compare two other branches or commits:
```shell
$ testing/tools/safetynet_compare.py path/to/pdfs --branch-after another_branch --branch-before yet_another_branch
$ testing/tools/safetynet_compare.py path/to/pdfs --branch-after 1a3c5e7 --branch-before 0b2d4f6
$ testing/tools/safetynet_compare.py path/to/pdfs --branch-after another_branch --branch-before 0b2d4f6
```

4. Compare two build flag configurations:
```shell
$ gn args out/BuildConfig1
$ gn args out/BuildConfig2
$ testing/tools/safetynet_compare.py path/to/pdfs --build-dir out/BuildConfig2 --build-dir-before out/BuildConfig1
```

safetynet_compare.py takes care of checking out the appropriate branch, building
it, running the test cases and comparing results.
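
The path handling described in this section (each path is either a .pdf file or
a directory of .pdf files, and other files are ignored) can be sketched roughly
as follows. This is a minimal illustration, not the script's actual code:
`collect_test_cases` is a hypothetical name, and scanning only the top level of
each directory is an assumption.

```python
import os
import tempfile


def collect_test_cases(paths):
    """Expand .pdf files and directories into a sorted list of .pdf paths."""
    cases = []
    for path in paths:
        if os.path.isdir(path):
            # Assumption: only the top level of a directory is scanned.
            for name in os.listdir(path):
                full_path = os.path.join(path, name)
                if os.path.isfile(full_path) and name.endswith(".pdf"):
                    cases.append(full_path)
        elif os.path.isfile(path) and path.endswith(".pdf"):
            cases.append(path)
    return sorted(cases)


# Demo on a throwaway directory holding two PDFs and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in collect_test_cases([tmp])]

print(found)  # ['a.pdf', 'b.pdf']
```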

### Profilers

safetynet_compare.py uses callgrind as a profiler by default. Use --profiler
to select a different one. The supported profilers are:

#### perfstat

Only works on Linux.
Make sure perf is installed by running it in a terminal:
```shell
$ perf
```

This is a fast profiler, but it uses sampling, so it is slightly inaccurate.
Expect variations of up to 1%, which is below the cutoff for considering a
change significant.

Use this when running over large test sets to get good enough results.

#### callgrind

Only works on Linux.
Make sure valgrind is installed:
```shell
$ valgrind
```

This is a slow and accurate profiler. Expect variations of around 100
instructions. However, it takes about 50 times longer to run than perfstat.

Use this when looking for small variations (< 1%).

One advantage is that callgrind can generate `callgrind.out` files (by passing
--output-dir to safetynet_compare.py), which contain profiling information that
can be analyzed to find the cause of a regression. KCachegrind is a good
visualizer for these files.

#### none

Run without any profiler, always giving a performance score of 1. Useful for
running image comparisons or debugging the script.
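
Before picking a profiler, it can help to confirm that the binary it relies on
is installed. The sketch below is one way to do that from Python. The mapping
from profiler name to binary follows the sections above, while
`available_profilers` is a hypothetical helper, not part of the SafetyNet
scripts.

```python
import shutil

# Profiler name as passed to --profiler, mapped to the binary it relies on.
PROFILER_BINARIES = {"perfstat": "perf", "callgrind": "valgrind"}


def available_profilers():
    """Return {profiler name: True if its binary is found on PATH}."""
    return {name: shutil.which(binary) is not None
            for name, binary in PROFILER_BINARIES.items()}


for name, found in available_profilers().items():
    print(f"{name}: {'available' if found else 'missing'}")
```

The `none` profiler needs no external binary, so it is left out of the check.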

### Common Options

Arguments commonly passed to safetynet_compare.py.

* --profiler: described above.
* --build-dir: specifies the build config as a relative path from the pdfium
  src directory to the build directory. Defaults to out/Release.
* --output-dir: where to place the profiling output files. For callgrind these
  are callgrind.out.[test_case] files; perfstat does not produce any. By
  default they are not written.
* --case-order: sort test case results according to this metric. Can be "after",
  "before", "ratio" or "rating". If not specified, sort by path.
* --this-repo: use the repository where the script is instead of checking out a
  temporary one. This is faster and does not require downloads. The script
  restores the state of the local repo when it finishes, but if it is killed or
  crashes, the uncommitted changes can remain stashed and you may be left on
  another branch.

### Other Options

Most of the time these don't need to be used.

* --build-dir-before: if comparing different build dirs (say, to test what a
  flag flip does), specify the build dir for the "before" branch here and the
  build dir for the "after" branch with --build-dir.
* --interesting-section: measure only the interesting section instead of the
  whole execution of the test harness. This only works in debug builds, since
  the delimiters are stripped out in release builds. It cannot be used to
  compare branches that don't have the callgrind delimiters, since it would be
  unfair to compare a whole run against the interesting section of another run.
* --machine-readable: output the results as JSON, which is easier for code to
  parse.
* --num-workers: how many workers to use to parallelize test case runs. Defaults
  to the number of CPUs in the machine.
* --threshold-significant: highlight differences that exceed this value.
  Defaults to 0.02.
* --tmp-dir: directory in which temporary repos will be cloned and downloads
  will be cached, if --this-repo is not enabled. Defaults to /tmp.

## Set up a nightly job

Create a separate checkout of pdfium in a new directory, for example `~/job`.
The safetynet_job.py script will run from this directory. This checkout needs to
be updated with `git pull` when there are changes to the SafetyNet scripts, but
otherwise it can be left alone.

Create a directory to contain the job results, for example `~/job_results`. In
each run, a `.log` file with the results will be written to this directory and a
subdirectory will be created with the other artifacts.

Set up a cron job to run safetynet_job.py nightly. The example below runs it at
1:42 AM, over the corpus in two directories: `~/pdf_samples/thousand_pdfs` and
`~/pdf_samples/i18n`.

```shell
$ crontab -e
42 1 * * * bash -lc '~/job/pdfium/testing/tools/safetynet_job.py ~/job_results ~/pdf_samples/thousand_pdfs ~/pdf_samples/i18n --output-to-log >> ~/job_results/cron_nightly.log 2>&1'
```

The first time the job runs, it will just create a checkpoint as
`~/job_results/last_revision_covered`. From then on, since a checkpoint is
available, each run will compare performance with the last checkpoint and update
the checkpoint.
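
The checkpoint behavior described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the logic of
safetynet_job.py itself: `run_job` and its `compare` callback are hypothetical,
and the checkpoint file is assumed to hold a revision identifier as plain text.

```python
import os
import tempfile

CHECKPOINT_NAME = "last_revision_covered"


def run_job(results_dir, current_revision, compare):
    """Compare against the previous checkpoint, if any, then advance it."""
    checkpoint_path = os.path.join(results_dir, CHECKPOINT_NAME)
    previous = None
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            previous = f.read().strip()
        compare(previous, current_revision)
    # Assumption: the checkpoint file stores a revision identifier as text.
    with open(checkpoint_path, "w") as f:
        f.write(current_revision + "\n")
    return previous


comparisons = []
with tempfile.TemporaryDirectory() as results_dir:
    record = lambda before, after: comparisons.append((before, after))
    first = run_job(results_dir, "rev1", record)   # only creates the checkpoint
    second = run_job(results_dir, "rev2", record)  # compares rev1 with rev2

print(first, second, comparisons)  # None rev1 [('rev1', 'rev2')]
```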

## Run image comparison

Pass the `--png-dir` option pointing at an output directory to compare the output
images from rendering the "before" and the "after" branches with pdfium_test.

```shell
$ mkdir ~/output_images
$ testing/tools/safetynet_compare.py ~/pdf_samples --branch-before before_visual_changes --branch-after after_visual_changes --png-dir ~/output_images
```

This will output and automatically open a `~/output_images/compare.html` file
showing the before/after and the diff. Hover the mouse cursor over the
before/after image on the left for an easier visual comparison. The "before"
image is displayed until the cursor hovers over the image, which is then
replaced with the "after" image.

It is recommended to use `--profiler=none` with this option.