Error during docx conversion #416

@karlg-dyad

Description

Hey!

We are using pypandoc-binary and have noticed that converting docx files fails on 1.16.2 (the same error does not occur on 1.15.0). Unfortunately, I can't share the documents that trigger this error, which may make it difficult to reproduce.

Using this simple little script:

import pypandoc
from pathlib import Path

input_file = "./myfile.docx"
file_path = Path(input_file)
file_format = "docx"

doc_content = file_path.read_bytes()

content_string = pypandoc.convert_text(
    doc_content,
    "html5",
    file_format,
    extra_args=["--standalone", "--embed-resources", "--mathjax"]
)

Running with 1.16.2:

Traceback (most recent call last):
  File "/home/me/python/pandoc/pandoc1.16.2/script.py", line 10, in <module>
    content_string = pypandoc.convert_text(
  File "/home/me/python/pandoc/pandoc1.16.2/venv/lib/python3.10/site-packages/pypandoc/__init__.py", line 118, in convert_text
    return _convert_input(
  File "/home/me/python/pandoc/pandoc1.16.2/venv/lib/python3.10/site-packages/pypandoc/__init__.py", line 528, in _convert_input
    raise RuntimeError(
RuntimeError: Pandoc died with exitcode "63" during conversion: couldn't unpack docx container: not enough bytes

Running (same document) with 1.15.0:

[WARNING] This document format requires a nonempty <title> element.
  Defaulting to '-' as the title.
  To specify a title, use 'title' in metadata or --metadata title="...".
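One possible reading of "couldn't unpack docx container: not enough bytes" is that pandoc received fewer bytes than the zip container expects, so it may be worth confirming that the bytes read in Python are a complete zip archive before they are handed to pypandoc. The following is a minimal sketch using only the standard library; the helper name `inspect_docx_bytes` is hypothetical, and the check for `word/document.xml` assumes the conventional location of the main document part.

```python
import io
import zipfile


def inspect_docx_bytes(data: bytes) -> dict:
    """Report whether `data` looks like a complete docx (zip) container."""
    report = {
        "size": len(data),
        "is_zip": False,
        "has_document_xml": False,
        "corrupt_member": None,
    }
    # is_zipfile looks for the end-of-central-directory record, which sits
    # at the end of the archive, so truncated input usually fails this check.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return report
    report["is_zip"] = True
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Assumption: the main part lives at the conventional path.
        report["has_document_xml"] = "word/document.xml" in zf.namelist()
        # testzip() returns the name of the first member with a bad CRC,
        # or None if every member reads back cleanly.
        report["corrupt_member"] = zf.testzip()
    return report
```

Calling `inspect_docx_bytes(doc_content)` in the script above, just before `pypandoc.convert_text`, would show whether the input is intact on the Python side. If it is, the difference is more likely in how the bytes reach pandoc.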

EDIT: I've had another quick look at this today. The bundled version of pandoc changes between these two releases (we're using pypandoc_binary). I've tried running the pandoc bundled with 1.16.2 (3.8.2.1) directly, but I cannot reproduce the error that way.

I've tried both passing the file path directly and piping the file into pandoc with cat:
cat myfile.docx | ./pandoc -f docx -t html5 --standalone --embed-resources --mathjax
./pandoc myfile.docx -f docx -t html5 --standalone --embed-resources --mathjax
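To narrow down whether the failure comes from the pandoc binary or from how the bytes are passed to it, the piped invocation above can be reproduced from Python with `subprocess`, bypassing pypandoc's own input handling. This is only a sketch: `pipe_bytes` and `pandoc_cmd` are hypothetical helper names, and the flags are copied from the commands above.

```python
import subprocess


def pandoc_cmd(pandoc_path: str) -> list:
    """Build the same command line as the shell invocations above."""
    return [
        pandoc_path,
        "-f", "docx",
        "-t", "html5",
        "--standalone",
        "--embed-resources",
        "--mathjax",
    ]


def pipe_bytes(cmd: list, data: bytes) -> tuple:
    """Run `cmd`, feed `data` on stdin, return (exit code, stdout, stderr)."""
    proc = subprocess.run(cmd, input=data, capture_output=True)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `pipe_bytes(pandoc_cmd("./pandoc"), Path("./myfile.docx").read_bytes())` should behave like the `cat` pipeline. Substituting the path returned by `pypandoc.get_pandoc_path()` would confirm that the binary pypandoc resolves is the same one tested by hand. If this succeeds with exit code 0 while `pypandoc.convert_text` still fails, that would point at the wrapper rather than at pandoc 3.8.2.1 itself.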
