On some pages I get the error 'KeyValue' object has no attribute 'reading_order' when exporting to markdown in Lambda.
My setup starts Textract through boto3, and job completion flows through an SNS topic to SQS to a Lambda function, where the textractor library exports the markdown and tables to S3.
Some of the pages fail with that error on page.to_markdown(). When I retrieve the job locally with the Textractor client, it works.
I'm also using the most recent Lambda layer for pdfium.
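Since the job works locally but not in Lambda, one thing worth ruling out is a version mismatch between the two environments. The snippet below is a minimal sketch for logging the installed version from both places. The helper name `installed_version` is hypothetical, and the distribution name `amazon-textract-textractor` is assumed to be the PyPI name the library was installed under. If the package was copied into /var/task without its dist-info metadata, the lookup returns None rather than a version.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if no metadata is found."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the library was installed under its PyPI distribution name.
# Run this both locally and inside the Lambda handler, then compare the output.
print("textractor version:", installed_version("amazon-textract-textractor"))
```

If the two environments report different versions, aligning them would be a reasonable first step before digging further.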
2025-03-21T15:51:59.099-07:00
[ERROR] 2025-03-21T22:51:59.099Z 15546c43-451d-47eb-8d9c-c78a3834b88f Traceback (most recent call last):
  File "/var/task/functions/complete_extraction.py", line 62, in process_md
    md = self.page.to_markdown()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/linearizable.py", line 59, in to_markdown
    return self.get_text(config)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/linearizable.py", line 24, in get_text
    text, _ = self.get_text_and_words(config=config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/page.py", line 169, in get_text_and_words
    page_texts_and_words = [l.get_text_and_words(config) for l in sorted_layouts]
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/layout.py", line 142, in get_text_and_words
    sorted(self.children, key=lambda x: x.reading_order)
  File "/var/task/textractor/entities/layout.py", line 142, in <lambda>
    sorted(self.children, key=lambda x: x.reading_order)
                                        ^^^^^^^^^^^^^^^
AttributeError: 'KeyValue' object has no attribute 'reading_order'
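The traceback shows the sort key in layout.py reading x.reading_order on every child, so a single child without that attribute (here a KeyValue) aborts the whole export. The following is only a sketch of a defensive sort key, not the library's code or an official fix: `sort_by_reading_order` is a hypothetical helper, and the children are `SimpleNamespace` stand-ins rather than real textractor entities. Children lacking `reading_order` are placed last, in their original relative order.

```python
import math
from types import SimpleNamespace


def sort_by_reading_order(children):
    """Sort children by reading_order; items without the attribute go last.

    sorted() is stable, so items that share the fallback key keep their
    original relative order.
    """
    return sorted(children, key=lambda c: getattr(c, "reading_order", math.inf))


# Stand-in objects (hypothetical, not textractor classes).
children = [
    SimpleNamespace(name="key_value"),                # no reading_order attribute
    SimpleNamespace(name="line_b", reading_order=2),
    SimpleNamespace(name="line_a", reading_order=1),
]

ordered_names = [c.name for c in sort_by_reading_order(children)]
print(ordered_names)
```

Whether placing such children last gives the right reading order for a given page is a separate question; the point is only that a fallback key avoids the AttributeError.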