
Intermittent failure of to_markdown() in lambda #422

@samiam376

On some pages, I get the error 'KeyValue' object has no attribute 'reading_order' when exporting to markdown in Lambda.

My setup: I start the Textract job with boto3, and the completion notification flows through an SNS topic -> SQS -> Lambda, where the textractor library exports the markdown and tables to S3.
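For anyone trying to reproduce the pipeline, here is a minimal sketch of how the Lambda side might pull the job ID out of the incoming event. It assumes the SQS queue is subscribed to the SNS topic without raw message delivery, so each SQS record body is an SNS envelope whose `Message` field holds the Textract notification as a JSON string. The function name `extract_textract_jobs` is hypothetical and not taken from my actual handler.

```python
import json


def extract_textract_jobs(event):
    """Return (job_id, status, api) tuples from an SQS event.

    Assumption: each SQS record body is an SNS envelope (raw message
    delivery disabled) wrapping a Textract completion notification.
    """
    jobs = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])      # SNS envelope delivered to SQS
        message = json.loads(envelope["Message"])  # Textract notification payload
        jobs.append((message["JobId"], message["Status"], message.get("API")))
    return jobs
```

Only jobs whose status is `SUCCEEDED` would then be passed on to the markdown export step.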

Some of the pages fail with that error on page.to_markdown(). When I retrieve the job locally with the Textractor client, it works.

I'm using the most recent lambda layer for pdfium as well.

	
2025-03-21T15:51:59.099-07:00
[ERROR]	2025-03-21T22:51:59.099Z	15546c43-451d-47eb-8d9c-c78a3834b88f	Traceback (most recent call last):
  File "/var/task/functions/complete_extraction.py", line 62, in process_md
    md = self.page.to_markdown()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/linearizable.py", line 59, in to_markdown
    return self.get_text(config)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/linearizable.py", line 24, in get_text
    text, _ = self.get_text_and_words(config=config)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/page.py", line 169, in get_text_and_words
    page_texts_and_words = [l.get_text_and_words(config) for l in sorted_layouts]
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/textractor/entities/layout.py", line 142, in get_text_and_words
    sorted(self.children, key=lambda x: x.reading_order)
  File "/var/task/textractor/entities/layout.py", line 142, in <lambda>
    sorted(self.children, key=lambda x: x.reading_order)
                                        ^^^^^^^^^^^^^^^
AttributeError: 'KeyValue' object has no attribute 'reading_order'
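The last frames show the sort key reading `x.reading_order` from every child, so a single child without that attribute is enough to raise. Below is a small self-contained sketch of that failure mode and of a tolerant sort key. `FakeLine` and `FakeKeyValue` are hypothetical stand-ins, not textractor classes, and placing children without `reading_order` last is only an assumption about what a reasonable fallback would be, not the library's behavior.

```python
from dataclasses import dataclass


@dataclass
class FakeLine:
    """Hypothetical stand-in for a child that carries reading_order."""
    text: str
    reading_order: int


@dataclass
class FakeKeyValue:
    """Hypothetical stand-in for a child that lacks reading_order."""
    text: str


def sort_children_tolerantly(children):
    """Sort by reading_order; children without it are placed last.

    sorted() is stable, so children lacking the attribute keep their
    original relative order.
    """
    return sorted(children, key=lambda c: getattr(c, "reading_order", float("inf")))


children = [FakeLine("b", 2), FakeKeyValue("kv"), FakeLine("a", 1)]

try:
    # Same key as in the traceback: fails on the child without the attribute.
    sorted(children, key=lambda x: x.reading_order)
except AttributeError as exc:
    print(f"strict sort failed: {exc}")

ordered = sort_children_tolerantly(children)
print([c.text for c in ordered])  # ['a', 'b', 'kv']
```

This only illustrates why the error is page-dependent: pages whose layouts contain no such child sort fine, while pages that do contain one fail.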
