Add a note about FPDFText_GetText() behavior.

Explain that the returned results contain characters outside the
cropbox, and suggest some APIs to call to determine if that is the case.

Bug: pdfium:1842
Change-Id: I2ac79668184f5313a14a3d3d6f3f4a8eda06d678
Reviewed-on: https://pdfium-review.googlesource.com/c/pdfium/+/95251
Reviewed-by: Nigi <nigi@chromium.org>
Commit-Queue: Lei Zhang <thestig@chromium.org>

commit: 9b8ab7d0d124b7140e02b1342878933719870833
author: Lei Zhang <thestig@chromium.org> Mon Jul 11 22:55:35 2022 +0000
committer: Pdfium LUCI CQ <pdfium-scoped@luci-project-accounts.iam.gserviceaccount.com> Mon Jul 11 22:55:35 2022 +0000
tree: 940dd7938444ecce42eda098751b754c17b4ab53
parent: 2f0309845646e06dd88701a651522b2a989c0b6d
diff --git a/public/fpdf_text.h b/public/fpdf_text.h
index 6d4a020..65604d8 100644
--- a/public/fpdf_text.h
+++ b/public/fpdf_text.h

@@ -341,6 +341,10 @@
 //          trailing terminator.
 // Comments:
 //          This function ignores characters without unicode information.
+//          It returns all characters on the page, even those that are not
+//          visible when the page has a cropbox. To filter out the characters
+//          outside of the cropbox, use FPDF_GetPageBoundingBox() and
+//          FPDFText_GetCharBox().
 //
 FPDF_EXPORT int FPDF_CALLCONV FPDFText_GetText(FPDF_TEXTPAGE text_page,
                                                int start_index,
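The new comment suggests filtering with FPDF_GetPageBoundingBox() and FPDFText_GetCharBox(). A minimal sketch of the geometric half of that filter follows. `char_box_overlaps_page_box()` is a hypothetical helper, not a PDFium API; it only performs the rectangle test so it compiles without PDFium. It assumes both boxes are in the same PDF page coordinate space and treats a character as visible if its box overlaps the page bounding box at all. A stricter caller might require full containment instead.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of PDFium. Returns true if a character box
 * overlaps the visible page area.
 *
 * The character box is taken in the (left, right, bottom, top) order in which
 * FPDFText_GetCharBox() writes its out-parameters. The page box is taken in
 * the (left, top, right, bottom) field order of the FS_RECTF filled in by
 * FPDF_GetPageBoundingBox(). Both are assumed to use the same PDF page
 * coordinate space. */
static bool char_box_overlaps_page_box(double char_left, double char_right,
                                       double char_bottom, double char_top,
                                       double page_left, double page_top,
                                       double page_right, double page_bottom) {
  /* Normalize the page box so the test does not depend on which of
   * top/bottom (or left/right) is numerically larger. */
  double page_min_x = page_left < page_right ? page_left : page_right;
  double page_max_x = page_left < page_right ? page_right : page_left;
  double page_min_y = page_bottom < page_top ? page_bottom : page_top;
  double page_max_y = page_bottom < page_top ? page_top : page_bottom;

  /* Strict comparisons: a box that only touches an edge counts as outside. */
  return char_right > page_min_x && char_left < page_max_x &&
         char_top > page_min_y && char_bottom < page_max_y;
}
```

In a caller, the page rectangle would come from `FS_RECTF rect; FPDF_GetPageBoundingBox(page, &rect)` and each character's box from `FPDFText_GetCharBox(text_page, i, &left, &right, &bottom, &top)` for `i` in `[0, FPDFText_CountChars(text_page))`. Both calls return false on failure, and this sketch leaves it to the caller to decide how to treat characters whose box cannot be retrieved.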