Commit 2e6ef33
refactor: deduplicate image extraction logic
Extract shared image processing and retrieval logic into reusable helper functions, eliminating ~150 lines of duplicate code between extractImagesFromPage and extractPageContent.
**Changes:**
- Add processImageData() - converts raw PDF.js image data to ExtractedImage
- Add retrieveImageData() - handles image retrieval strategy (commonObjs -> sync -> async with timeout)
- Refactor extractImagesFromPage to use shared helpers
- Refactor extractPageContent to use shared helpers while preserving yPosition
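
For illustration, the retrieval order named above (commonObjs -> sync -> async with timeout) could look roughly like the sketch below. This is not the code from the commit: the `ObjectStore` interface is a hypothetical, simplified stand-in for the PDF.js object stores, and the signature, the `timeoutMs` parameter, and the choice to resolve to `null` on timeout are all assumptions.

```typescript
// Hypothetical minimal view of a PDF.js-style object store.
// Assumption for this sketch only; not the real PDF.js typings.
interface ObjectStore {
  has(id: string): boolean;
  get(id: string, callback?: (data: unknown) => void): unknown;
}

// Sketch of a shared retrieval helper; the name matches the commit message,
// the body is an assumed implementation of the described strategy.
async function retrieveImageData(
  name: string,
  commonObjs: ObjectStore,
  objs: ObjectStore,
  timeoutMs = 1000,
): Promise<unknown> {
  // 1. Document-level shared objects are checked first.
  if (commonObjs.has(name)) return commonObjs.get(name);

  // 2. Page-level object that is already resolved: synchronous read.
  if (objs.has(name)) return objs.get(name);

  // 3. Otherwise wait for the async callback, giving up after timeoutMs.
  return new Promise<unknown>((resolve) => {
    let settled = false;
    const finish = (value: unknown) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(value);
    };
    const timer = setTimeout(() => finish(null), timeoutMs);
    try {
      objs.get(name, (data) => finish(data ?? null));
    } catch {
      // A throwing store is treated like a missing image.
      finish(null);
    }
  });
}
```

With a single helper like this, both extractImagesFromPage and extractPageContent would get the same fallback order and the same timeout behavior, which is the consistency benefit the commit describes.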
**Benefits:**
- Reduces code duplication by ~150 lines
- Improves maintainability - fixes/improvements in one place
- Increases test coverage from 90.7% to 95.37%
- Consistent error handling and timeout behavior across both functions
1 parent 7893cf6 commit 2e6ef33
1 file changed, +164 -266 lines changed