-
Notifications
You must be signed in to change notification settings - Fork 32
Reduce complexity #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: model-zoo
Are you sure you want to change the base?
Reduce complexity #210
Conversation
layout: Everything not OCR or extra ocr: trocr/cnnrnn models extra: obsolete or niche models
61fab65 to
177d555
Compare
1d3ca0d to
766ed50
Compare
55e7d7a to
4277bc2
Compare
72dc885 to
b161e33
Compare
bertsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised you did not only make -light and -tll defaults, but removed all the other modes entirely.
I was under the impression that perhaps in certain cases these other modes could still serve a purpose. But even if they don't – removing them now (instead of just the extraction and OCR parts) creates much larger diffs, making merge of #206 and my jdeskew branch much more difficult...
| help="upper limit of columns in document image", | ||
| ) | ||
| @click.option( | ||
| "--save_org_scale/--no_save_org_scale", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we get rid of these redundant negative forms.
It is confusing to users, as they don't know what's the default, so to be sure, they must list all things they don't want, making calls quite long.
| help="directory of input images (instead of --image)", | ||
| type=click.Path(exists=True, file_okay=False), | ||
| ) | ||
| @click.option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| raise ValueError("Must pass either a opencv2 image or an image_path") | ||
| if image_path is not None: | ||
| image = cv2.imread(image_path) | ||
| img_last = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, I suggest merging in 086c188 to get separate run and run_single
| @@ -585,7 +479,7 @@ def resize_image_with_column_classifier(self, is_image_enhanced, img_bin): | |||
|
|
|||
| return img, img_new, is_image_enhanced | |||
|
|
|||
| def resize_and_enhance_image_with_column_classifier(self, light_version): | |||
| def resize_and_enhance_image_with_column_classifier(self): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this function should now be renamed resize_image_with_column_classifier and the one already existing under that name should be renamed resize_and_enhance_image_with_column_classifier, because in effect of these changes we would have the paradoxical effect that predict_enhancement only gets called from the other one.
|
|
||
| return img_scaled_padded#, label_scaled_padded | ||
|
|
||
| def do_prediction_new_concept_scatter_nd( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I highly recommend against removing this function. It is still experimental, but can further speed up prediction immensely (because patch processing will be on GPU only instead of CPU-GPU back and forth).
| prediction_regions = self.do_prediction(patches, img, model_region, marginal_of_patch_percent=0.1) | ||
| prediction_regions = resize_image(prediction_regions, img_height_h, img_width_h) | ||
| self.logger.debug("exit extract_text_regions") | ||
| return prediction_regions, prediction_regions2 | ||
| return prediction_regions, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something went wrong here. Did you want to remove that entire (unused) function extract_text_regions? Better remove cleanly instead of cripple it...
| rotation_not_90_func(image_page, textline_mask_tot, text_regions_p, | ||
| table_prediction, slope_deskew) | ||
|
|
||
| text_regions_p_1_n = resize_image(text_regions_p_1_n, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this section will yield lots of conflicts with #206 ...
| def build_pagexml_no_full_layout( | ||
| self, found_polygons_text_region, | ||
| page_coord, order_of_texts, id_of_texts, | ||
| all_found_textline_polygons, | ||
| all_box_coord, | ||
| found_polygons_text_region_img, | ||
| found_polygons_marginals_left, found_polygons_marginals_right, | ||
| all_found_textline_polygons_marginals_left, all_found_textline_polygons_marginals_right, | ||
| all_box_coord_marginals_left, all_box_coord_marginals_right, | ||
| slopes, slopes_marginals_left, slopes_marginals_right, | ||
| cont_page, polygons_seplines, | ||
| found_polygons_tables, | ||
| **kwargs): | ||
| self, | ||
| *, | ||
| found_polygons_text_region, | ||
| page_coord, | ||
| order_of_texts, | ||
| all_found_textline_polygons, | ||
| all_box_coord, | ||
| found_polygons_text_region_img, | ||
| found_polygons_marginals_left, | ||
| found_polygons_marginals_right, | ||
| all_found_textline_polygons_marginals_left, | ||
| all_found_textline_polygons_marginals_right, | ||
| all_box_coord_marginals_left, | ||
| all_box_coord_marginals_right, | ||
| slopes, | ||
| slopes_marginals_left, | ||
| slopes_marginals_right, | ||
| cont_page, | ||
| polygons_seplines, | ||
| found_polygons_tables, | ||
| ): | ||
| return self.build_pagexml_full_layout( | ||
| found_polygons_text_region, [], | ||
| page_coord, order_of_texts, id_of_texts, | ||
| all_found_textline_polygons, [], | ||
| all_box_coord, [], | ||
| found_polygons_text_region_img, found_polygons_tables, [], | ||
| found_polygons_marginals_left, found_polygons_marginals_right, | ||
| all_found_textline_polygons_marginals_left, all_found_textline_polygons_marginals_right, | ||
| all_box_coord_marginals_left, all_box_coord_marginals_right, | ||
| slopes, [], slopes_marginals_left, slopes_marginals_right, | ||
| cont_page, polygons_seplines, | ||
| **kwargs) | ||
| found_polygons_text_region=found_polygons_text_region, | ||
| found_polygons_text_region_h=[], | ||
| page_coord=page_coord, | ||
| order_of_texts=order_of_texts, | ||
| all_found_textline_polygons=all_found_textline_polygons, | ||
| all_found_textline_polygons_h=[], | ||
| all_box_coord=all_box_coord, | ||
| all_box_coord_h=[], | ||
| found_polygons_text_region_img=found_polygons_text_region_img, | ||
| found_polygons_tables=found_polygons_tables, | ||
| found_polygons_drop_capitals=[], | ||
| found_polygons_marginals_left=found_polygons_marginals_left, | ||
| found_polygons_marginals_right=found_polygons_marginals_right, | ||
| all_found_textline_polygons_marginals_left=all_found_textline_polygons_marginals_left, | ||
| all_found_textline_polygons_marginals_right=all_found_textline_polygons_marginals_right, | ||
| all_box_coord_marginals_left=all_box_coord_marginals_left, | ||
| all_box_coord_marginals_right=all_box_coord_marginals_right, | ||
| slopes=slopes, | ||
| slopes_h=[], | ||
| slopes_marginals_left=slopes_marginals_left, | ||
| slopes_marginals_right=slopes_marginals_right, | ||
| cont_page=cont_page, | ||
| polygons_seplines=polygons_seplines, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not really in favour of turning these identifiers into proper kwargs. But if you must, then why still repeat all of them (in the delegation pattern), when you could just
def build_pagexml_no_full_layout(**kwargs):
return self.build_pagexml_full_layout(
found_polygons_text_region_h=[],
all_found_textline_polygons_h=[],
all_box_coord_h=[],
found_polygons_drop_capitals=[],
slopes_h=[],
**kwargs)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...or (as discussed at last) even better provide None as default everywhere, so the caller does not need to pass empty lists. Then there will also no longer be any need to differentiate no/full layout calls, only build_pagexml().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I converted this to kwargs only because there are so many arguments to that method now, it was hard to find the errors resulting from the refactoring. I am open to simplifying this again once the dust has settled.
If it's just about reducing conflict, I can reinstate those code branches temporarily - @vahidrezanezhad knows best under what circumstances the original "heavy" approach might still be useful - but we have agreed that the burden of maintaining both is too high to keep "heavy". Let's discuss in a call how I can make your life easier with the git conflicts. |
| @@ -1615,52 +1370,10 @@ def extract_text_regions(self, img, patches, cols): | |||
| img_width_h = img.shape[1] | |||
| model_region = self.model_zoo.get("region_fl") if patches else self.model_zoo.get("region_fl_np") | |||
|
|
|||
| if not patches: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO for self: Why only delete these if-clauses, deleted from wrong part of the code?
Starting to decouple the functionality:
eynollah layout(which has a dedicatedeynollah ocrcommand)all: contains everythinglayout: everything needed for the - arguably - most used functionality: layout/readingorder/binarization/enhancement (~3.8GB)ocr: TrOCR and CNN/RNN models (~1.8GB)extraeverything else, alternative/older/niche models (~400MB)light_modethe defaultlight_modespecific models toextra.eynollah ocreynollah ocrinto manageable chunks- [ ] Refactorout of scope It's not difficult but error-prone and better done in a separate PR based on thiseynollah ocrto use the PAGE API instead of etree