Description
1. The service is deployed as follows:
paddlex --install serving && paddlex --serve --pipeline PaddleOCR-VL
2. The test PDF is:
3. Different DPI values give different results:
The PDF is read as follows:
import fitz  # PyMuPDF
doc = fitz.open(pdf_path)
page_index = 0
page = doc[page_index]
pix = page.get_pixmap(dpi=xxx)
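For reference, the `dpi` argument scales the page from PDF points (72 per inch) to pixels, so doubling the DPI roughly doubles each image dimension. The sketch below is only an illustration: `rendered_size` is a hypothetical helper, the 595 x 842 pt page size (roughly A4) is an assumption that is not stated in this report, and rounding up is a guess at how the renderer rounds. Under those assumptions it reproduces the `dataInfo` width and height reported in the two responses.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt, height_pt, dpi):
    """Approximate pixel size of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    # Rounding up is an assumption about the renderer's behavior.
    return math.ceil(width_pt * scale), math.ceil(height_pt * scale)


# Assumed page size in points (roughly A4); not taken from the report.
for dpi in (100, 200):
    print(dpi, rendered_size(595, 842, dpi))
```

This makes the difference between the two requests concrete: the same page is sent to the service at about twice the resolution in each dimension.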
(1) With dpi=100, content is parsed:
{'logId': '45db168a-9913-43e3-ade5-cde8168cddaa', 'result': {'layoutParsingResults': [{'prunedResult': {'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_chart_recognition': False, 'format_block_content': False}, 'parsing_res_list': [{'block_label': 'table', 'block_content': '<tablxxxxxxxxxxxxx/table>', 'block_bbox': [18, 393, 811, 838], 'block_id': 1, 'block_order': None}], 'layout_det_res': {'boxes': [{'cls_id': 21, 'label': 'table', 'score': 0.6057453751564026, 'coordinate': [18.153076171875, 12.545944213867188, 811.0789184570312, 312.51019287109375]}, {'cls_id': 21, 'label': 'table', 'score': 0.5716715455055237, 'coordinate': [18.54168701171875, 393.09271240234375, 811.5581665039062, 838.2356567382812]}]}}, 'markdown': {'text': "\n", 'images': {}}}], 'dataInfo': {'width': 827, 'height': 1170, 'type': 'image'}}, 'errorCode': 0, 'errorMsg': 'Success'}
(2) With dpi=200, however, the parsing result is empty:
{'logId': '89fe54d0-49a2-4f7f-bc17-bfbe98b07f1d', 'result': {'layoutParsingResults': [{'prunedResult': {'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True, 'use_chart_recognition': False, 'format_block_content': False}, 'parsing_res_list': [], 'layout_det_res': {'boxes': []}}, 'markdown': {'text': '', 'images': {}}}], 'dataInfo': {'width': 1653, 'height': 2339, 'type': 'image'}}, 'errorCode': 0, 'errorMsg': 'Success'}
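To compare runs at different DPI values without reading the full responses, a small helper along these lines could summarize each one. This is a minimal sketch: `summarize` is a hypothetical name, and it assumes only the response layout visible in the two outputs above (`result`, `layoutParsingResults`, `prunedResult`, `layout_det_res`, `parsing_res_list`).

```python
def summarize(response):
    """Return, per page, the number of layout boxes and parsed blocks."""
    summary = []
    for page in response["result"]["layoutParsingResults"]:
        pruned = page["prunedResult"]
        boxes = pruned["layout_det_res"]["boxes"]
        summary.append({
            "num_boxes": len(boxes),
            "num_blocks": len(pruned["parsing_res_list"]),
            # Highest detection score, or None when nothing was detected.
            "max_score": max((b["score"] for b in boxes), default=None),
        })
    return summary


# Trimmed stand-in with the same shape as the dpi=200 response.
empty_response = {
    "result": {
        "layoutParsingResults": [
            {"prunedResult": {"parsing_res_list": [],
                              "layout_det_res": {"boxes": []}}}
        ]
    }
}
print(summarize(empty_response))
```

Applied to the dpi=100 response, this would report two layout boxes and one parsed block; applied to the dpi=200 response, it reports none of either, which shows that layout detection itself returned nothing.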
Why does this happen? Could you please take a look? Thanks!