Commit d154028

Update README.md
1 parent 955dd6d commit d154028

1 file changed (+1, -5 lines)

README.md

Lines changed: 1 addition & 5 deletions
@@ -83,14 +83,10 @@ and you compare to this one:
-This is done on purpose: imagine a word that, between the two versions, changes its location because it goes to the next line, and here you want to see it as unchanged. Therefore, its a feature, not a bug. In the same way, changes in the font (color, formatting, etc.) is ignored in the same way. In other words, changes everything else that is not text (say, images, shapes, etc.) is ignored.
-
-
+This is done on purpose: imagine a word that, between the two versions, changes its location because it goes to the next line, and here you want to see it as unchanged. Therefore, its a feature, not a bug. Changes in the font (color, formatting, etc.) is ignored in the same way. In other words, changes everything else that is not text (say, images, shapes, etc.) are ignored.

 Anyway, you should be aware of this behaviour, because it might not always be what you expect.

-
-
 Given this script only compares the text within the PDFs, files have to contain text. Scanned PDF needs to be OCRed first (I have no plan to implement OCR, though PyMuPDF supports OCR through Tesseract). I also noted that some PDF generated by printing from browsers might contain "fake" text (meaning that test is actually rendered by shapes that looks like text, but are not text). For being able to compare these PDFs with this script you first need to OCR them as well.
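
The layout-insensitive behaviour described in the README text can be illustrated with a minimal sketch. This is not the script's actual implementation: it works on plain strings rather than PDFs, uses only Python's standard `difflib`, and the names `words` and `changed_words` are hypothetical. It only shows why a word that moves to the next line is reported as unchanged when the comparison is done on the word sequence.

```python
import difflib


def words(text: str) -> list[str]:
    # str.split() with no argument splits on any run of whitespace,
    # so a line break and a space are treated the same way.
    return text.split()


def changed_words(old: str, new: str) -> list[tuple[str, list[str], list[str]]]:
    """Return the non-equal word-level operations between two texts."""
    a, b = words(old), words(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "The quick brown fox jumps\nover the lazy dog."
new = "The quick brown\nfox jumps over the sleepy dog."

# "fox jumps" moved to the next line but is not reported;
# only the real text change (lazy -> sleepy) shows up.
print(changed_words(old, new))
```

Because the comparison sees only the sequence of words, anything that is not text (fonts, colours, images, shapes) never enters it, which matches the behaviour the README warns about.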