<div align="center">
  <div align="center">
    <h1><b>📊RapidTableDetection</b></h1>
  </div>
  <a href=""><img src="https://img.shields.io/badge/Python->=3.8,<3.12-aff.svg"></a>
  <a href=""><img src="https://img.shields.io/badge/OS-Linux%2C%20Mac%2C%20Win-pink.svg"></a>
  <a href="https://semver.org/"><img alt="SemVer2.0" src="https://img.shields.io/badge/SemVer-2.0-brightgreen"></a>
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg"></a>
  <a href="https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE"><img alt="GitHub" src="https://img.shields.io/badge/license-Apache 2.0-blue"></a>
</div>

### Recent Updates

- **2024.10.15**
  - Completed the initial version of the code, including three modules: object detection, semantic segmentation, and corner direction recognition.
- **2024.11.2**
  - Added YOLOv11 object detection and edge detection models.
  - Added automatic model downloading and reduced the package size.
  - Added ONNX-GPU inference support and published benchmark results.
  - Added an online demo.
### Introduction

💡✨ RapidTableDetection is a powerful and efficient table detection system that supports various types of tables, including those in papers, journals, magazines, invoices, receipts, and sign-in sheets.

🚀 It offers models derived from both PaddlePaddle and YOLO. The default model combination needs only 1.2 s per image on CPU; the smallest ONNX combination takes 0.4 s on a V100 GPU, and the PaddlePaddle-GPU version 0.2 s.

🛠️ The three modules can be freely combined and trained or optimized independently; ONNX conversion scripts and fine-tuning training solutions are provided.

🌟 The whl package is easy to integrate and use, providing strong support for downstream OCR, table recognition, and data collection.

The implementation follows the [2nd place solution in the Baidu Table Detection Competition](https://aistudio.baidu.com/projectdetail/5398861?searchKeyword=%E8%A1%A8%E6%A0%BC%E6%A3%80%E6%B5%8B%E5%A4%A7%E8%B5%9B&searchTab=ALL), retrained on a large amount of real-world data.
Thanks go to the providers of the training datasets. The author maintains this open-source project in his spare time; please show your support by giving it a star.


### Usage Recommendations

- Document scenarios (no perspective distortion or rotation): use object detection only.
- Photographed scenarios with small rotation (-90° to 90°): the top-left corner is assumed by default, so corner direction recognition is unnecessary.
- Use the online demo to find the model combination that best suits your scenario.
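
For larger rotations the corner-direction module decides which detected corner is the true top-left; in the small-angle case above, a standard sum/difference heuristic is enough to order four corner points. A minimal sketch of that heuristic (an illustration of the idea, not this project's classifier):

```python
def order_corners(pts):
    """Order four (x, y) points as lt, rt, rb, lb.

    Common heuristic: the top-left corner has the smallest x + y sum and
    the bottom-right the largest; the top-right has the smallest y - x
    difference and the bottom-left the largest. Only valid for roughly
    upright tables (small rotation angles).
    """
    by_sum = sorted(pts, key=lambda p: p[0] + p[1])
    lt, rb = by_sum[0], by_sum[-1]
    by_diff = sorted(pts, key=lambda p: p[1] - p[0])
    rt, lb = by_diff[0], by_diff[-1]
    return lt, rt, rb, lb

corners = [(90, 110), (10, 100), (100, 10), (0, 0)]
print(order_corners(corners))  # ((0, 0), (100, 10), (90, 110), (10, 100))
```

Once the rotation exceeds this range, two different orderings become plausible, which is why the dedicated classification model exists.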

### Online Experience

[modelscope](https://www.modelscope.cn/studios/jockerK/RapidTableDetDemo) [huggingface](https://huggingface.co/spaces/Joker1212/RapidTableDetection)
### Effect Demonstration

### Installation

Models are downloaded automatically, or you can download them manually from the [modelscope model repository](https://www.modelscope.cn/models/jockerK/TableExtractor).

```shell
pip install rapid-table-det
```

#### Parameter Explanation

Default values:
- `use_cuda: False`: enable GPU acceleration for inference.
- `obj_model_type="yolo_obj_det"`: object detection model type.
- `edge_model_type="yolo_edge_det"`: edge detection model type.
- `cls_model_type="paddle_cls_det"`: corner direction classification model type.

Since ONNX benefits little from GPU acceleration, it is still recommended to use YOLOX directly or to install PaddlePaddle for faster model execution (the author can provide the full procedure if needed).
The quantized PaddlePaddle "s" models are actually slower and slightly less accurate, but considerably smaller.

| `model_type`        | Task Type                | Training Source                      | Size     | Single-Table Inference Time (V100-16G, CUDA 12, cuDNN 9, Ubuntu) |
|:--------------------|:-------------------------|:-------------------------------------|:---------|:-----------------------------------------------------------------|
| **yolo_obj_det**    | Table Object Detection   | `yolo11-l`                           | `100 MB` | `cpu: 570 ms, gpu: 400 ms`  |
| `paddle_obj_det`    | Table Object Detection   | `paddle yoloe-plus-x`                | `380 MB` | `cpu: 1000 ms, gpu: 300 ms` |
| `paddle_obj_det_s`  | Table Object Detection   | `paddle yoloe-plus-x + quantization` | `95 MB`  | `cpu: 1200 ms, gpu: 1000 ms` |
| **yolo_edge_det**   | Semantic Segmentation    | `yolo11-l-segment`                   | `108 MB` | `cpu: 570 ms, gpu: 200 ms`  |
| `yolo_edge_det_s`   | Semantic Segmentation    | `yolo11-s-segment`                   | `11 MB`  | `cpu: 260 ms, gpu: 200 ms`  |
| `paddle_edge_det`   | Semantic Segmentation    | `paddle-dbnet`                       | `99 MB`  | `cpu: 1200 ms, gpu: 120 ms` |
| `paddle_edge_det_s` | Semantic Segmentation    | `paddle-dbnet + quantization`        | `25 MB`  | `cpu: 860 ms, gpu: 760 ms`  |
| **paddle_cls_det**  | Direction Classification | `paddle pplcnet`                     | `6.5 MB` | `cpu: 70 ms, gpu: 60 ms`    |

Execution parameters:
- `det_accuracy=0.7`: detection confidence threshold.
- `use_obj_det=True`: enable the object detection module.
- `use_edge_det=True`: enable the semantic segmentation (edge) module.
- `use_cls_det=True`: enable the corner direction classification module.
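
The `det_accuracy` threshold simply drops low-confidence detections. A minimal sketch of that kind of filtering, using hypothetical detection dicts rather than the project's internal code:

```python
def filter_detections(detections, det_accuracy=0.7):
    """Keep only detections whose confidence score reaches the threshold."""
    return [d for d in detections if d["score"] >= det_accuracy]

detections = [
    {"box": [0, 0, 100, 80], "score": 0.92},   # kept
    {"box": [10, 90, 120, 160], "score": 0.55},  # dropped at the 0.7 default
]
print(filter_detections(detections))
```

Lowering `det_accuracy` recovers more faint or partially occluded tables at the cost of more false positives.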

### Quick Start

```python {linenos=table}
from rapid_table_det.inference import TableDetector

img_path = "tests/test_files/chip.jpg"
table_det = TableDetector()

result, elapse = table_det(img_path)
obj_det_elapse, edge_elapse, rotate_det_elapse = elapse
print(
    f"obj_det_elapse:{obj_det_elapse}, edge_elapse={edge_elapse}, rotate_det_elapse={rotate_det_elapse}"
)
# Output visualization
# import os
# import cv2
# from rapid_table_det.utils.visuallize import img_loader, visuallize, extract_table_img
#
# img = img_loader(img_path)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# file_name_with_ext = os.path.basename(img_path)
# file_name, file_ext = os.path.splitext(file_name_with_ext)
# out_dir = "rapid_table_det/outputs"
# os.makedirs(out_dir, exist_ok=True)
# extract_img = img.copy()
# for i, res in enumerate(result):
#     box = res["box"]
#     lt, rt, rb, lb = res["lt"], res["rt"], res["rb"], res["lb"]
#     # Draw the detection box and top-left corner position
#     img = visuallize(img, box, lt, rt, rb, lb)
#     # Perspective transformation to extract the table image
#     wrapped_img = extract_table_img(extract_img.copy(), lt, rt, rb, lb)
#     cv2.imwrite(f"{out_dir}/{file_name}-extract-{i}.jpg", wrapped_img)
# cv2.imwrite(f"{out_dir}/{file_name}-visualize.jpg", img)
```
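
The commented-out `extract_table_img` call above warps the detected quadrilateral to an upright rectangle. For reference, the underlying perspective transform can be sketched in pure NumPy with hypothetical corner coordinates (the library itself relies on OpenCV for this):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 homography H mapping 4 src points to 4 dst points.

    Each correspondence (x, y) -> (u, v) yields two linear equations in
    the 8 unknown entries of H (H[2, 2] is fixed to 1).
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical detected corners (lt, rt, rb, lb) and the target rectangle
src = [(12, 8), (208, 20), (200, 152), (5, 140)]
dst = [(0, 0), (200, 0), (200, 150), (0, 150)]
H = perspective_matrix(src, dst)

# Applying H to a source corner (in homogeneous coordinates) reproduces
# its destination corner after dividing by the w component
p = np.array([12, 8, 1.0])
q = H @ p
print(q[:2] / q[2])  # approximately [0, 0]
```

In practice `cv2.getPerspectiveTransform` plus `cv2.warpPerspective` does exactly this, then resamples every pixel of the table into the output rectangle.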
### Using the PaddlePaddle Version

You must download the models and specify their locations!

```shell
# the default dependency is the GPU build; you can replace it with the CPU build of paddlepaddle
pip install rapid-table-det-paddle
```
```python
from rapid_table_det_paddle.inference import TableDetector

img_path = "tests/test_files/chip.jpg"

table_det = TableDetector(
    obj_model_path="models/obj_det_paddle",
    edge_model_path="models/edge_det_paddle",
    cls_model_path="models/cls_det_paddle",
    use_obj_det=True,
    use_edge_det=True,
    use_cls_det=True,
)
result, elapse = table_det(img_path)
obj_det_elapse, edge_elapse, rotate_det_elapse = elapse
print(
    f"obj_det_elapse:{obj_det_elapse}, edge_elapse={edge_elapse}, rotate_det_elapse={rotate_det_elapse}"
)
# An image may contain more than one table; visualization requires os, cv2
# and the img_loader / visuallize / extract_table_img helpers shown in the
# Quick Start example.
# img = img_loader(img_path)
# file_name_with_ext = os.path.basename(img_path)
# file_name, file_ext = os.path.splitext(file_name_with_ext)
# out_dir = "rapid_table_det_paddle/outputs"
# os.makedirs(out_dir, exist_ok=True)
# extract_img = img.copy()
# for i, res in enumerate(result):
#     box = res["box"]
#     lt, rt, rb, lb = res["lt"], res["rt"], res["rb"], res["lb"]
#     # Draw the detection box and top-left corner position
#     img = visuallize(img, box, lt, rt, rb, lb)
#     # Perspective transformation to extract the table image
#     wrapped_img = extract_table_img(extract_img.copy(), lt, rt, rb, lb)
#     cv2.imwrite(f"{out_dir}/{file_name}-extract-{i}.jpg", wrapped_img)
# cv2.imwrite(f"{out_dir}/{file_name}-visualize.jpg", img)
```

### FAQ (Frequently Asked Questions)

1. **Q: How do I fine-tune the model for a specific scenario?**
   - A: This project provides detailed visualization steps and datasets for reference. The PaddlePaddle inference model can be obtained from the [Baidu Table Detection Competition](https://aistudio.baidu.com/projectdetail/5398861?searchKeyword=%E8%A1%A8%E6%A0%BC%E6%A3%80%E6%B5%8B%E5%A4%A7%E8%B5%9B&searchTab=ALL). For YOLOv11, use the official training script, which is straightforward: convert your data to COCO format and train following the official guidelines.
2. **Q: How do I export to ONNX?**
   - A: For PaddlePaddle models, use the `onnx_transform.ipynb` notebook in the `tools` directory of this project. For YOLOv11, follow the official export method, which takes a single line.
3. **Q: Can distorted images be corrected?**
   - A: This project only handles rotation and perspective scenarios when extracting tables. Distorted images must be dewarped first.

### Acknowledgments

- [2nd Place Solution in the Baidu Table Detection Competition](https://aistudio.baidu.com/projectdetail/5398861?searchKeyword=%E8%A1%A8%E6%A0%BC%E6%A3%80%E6%B5%8B%E5%A4%A7%E8%B5%9B&searchTab=ALL)
- [WTW Natural Scene Table Dataset](https://tianchi.aliyun.com/dataset/108587)
- [FinTabNet PDF Document Table Dataset](https://developer.ibm.com/exchanges/data/all/fintabnet/)
- [TableBank Table Dataset](https://doc-analysis.github.io/tablebank-page/)
- [TableGeneration Table Auto-Generation Tool](https://github.com/WenmuZhou/TableGeneration)

### Contribution Guidelines

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

If you have other good suggestions or integration scenarios, the author will actively respond and support them.

### Open Source License

This project is licensed under the [Apache 2.0](https://github.com/RapidAI/TableStructureRec/blob/c41bbd23898cb27a957ed962b0ffee3c74dfeff1/LICENSE) open source license.