Description
Hey!
We are using pypandoc-binary and have noticed that converting docx files fails on 1.16.2 (the same error does not occur on 1.15.0). Unfortunately, I can't share the documents that trigger this error, which may make it difficult to reproduce.
Using this simple script:
import pypandoc
from pathlib import Path
input_file = "./myfile.docx"
file_path = Path(input_file)
file_format = "docx"
doc_content = file_path.read_bytes()
content_string = pypandoc.convert_text(
    doc_content,
    "html5",
    file_format,
    extra_args=["--standalone", "--embed-resources", "--mathjax"],
)
Running with 1.16.2:
Traceback (most recent call last):
  File "/home/me/python/pandoc/pandoc1.16.2/script.py", line 10, in <module>
    content_string = pypandoc.convert_text(
  File "/home/me/python/pandoc/pandoc1.16.2/venv/lib/python3.10/site-packages/pypandoc/__init__.py", line 118, in convert_text
    return _convert_input(
  File "/home/me/python/pandoc/pandoc1.16.2/venv/lib/python3.10/site-packages/pypandoc/__init__.py", line 528, in _convert_input
    raise RuntimeError(
RuntimeError: Pandoc died with exitcode "63" during conversion: couldn't unpack docx container: not enough bytes
Running (same document) with 1.15.0:
[WARNING] This document format requires a nonempty <title> element.
Defaulting to '-' as the title.
To specify a title, use 'title' in metadata or --metadata title="...".
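Since the 1.16.2 error complains about unpacking the docx container, a quick sanity check is to confirm that the bytes read in Python form a complete zip archive before they are handed to pypandoc. The following is only a minimal sketch using the standard library. The helper name `describe_docx_bytes` is made up for illustration and is not part of pypandoc or pandoc.

```python
import io
import zipfile
from pathlib import Path


def describe_docx_bytes(data: bytes) -> dict:
    """Report whether `data` looks like an intact docx (zip) container.

    Hypothetical diagnostic helper, not part of pypandoc.
    """
    info = {
        "size": len(data),
        "is_zip": False,
        "has_document_xml": False,
        "bad_member": None,
    }
    # is_zipfile looks for the end-of-central-directory record,
    # which is missing when the archive has been truncated.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return info
    info["is_zip"] = True
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Every docx is expected to carry its main body here.
            info["has_document_xml"] = "word/document.xml" in archive.namelist()
            # testzip returns the first member with a bad CRC, or None.
            info["bad_member"] = archive.testzip()
    except zipfile.BadZipFile:
        info["is_zip"] = False
    return info


if __name__ == "__main__":
    # Same file as in the script above.
    print(describe_docx_bytes(Path("./myfile.docx").read_bytes()))
```

If this reports an intact archive, the input bytes themselves are probably fine and the problem is more likely in how they reach pandoc.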
EDIT: I've had another quick look at this today. I can see that the version of pandoc changes between these two releases (we're using pypandoc_binary). I've tried running the version of pandoc bundled with 1.16.2 (3.8.2.1) directly, but I cannot reproduce the error that way.
I've tried both passing the file path directly and piping the file into pandoc with cat:
cat myfile.docx | ./pandoc -f docx -t html5 --standalone --embed-resources --mathjax
./pandoc myfile.docx -f docx -t html5 --standalone --embed-resources --mathjax
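The cat pipe above can also be approximated from Python, which is closer to what the failing script does because the bytes are read in Python first and then written to the child process's stdin. This is a rough sketch only: `run_with_stdin` is a hypothetical helper, and I have not checked that it matches how pypandoc invokes pandoc internally.

```python
import subprocess
import sys
from pathlib import Path


def run_with_stdin(cmd: list[str], data: bytes) -> subprocess.CompletedProcess:
    """Run `cmd`, write `data` to its stdin as raw bytes, capture the output.

    Hypothetical helper for illustration, not part of pypandoc.
    """
    return subprocess.run(cmd, input=data, capture_output=True, check=False)


if __name__ == "__main__":
    # Same binary and flags as the shell commands above.
    pandoc_cmd = [
        "./pandoc", "-f", "docx", "-t", "html5",
        "--standalone", "--embed-resources", "--mathjax",
    ]
    result = run_with_stdin(pandoc_cmd, Path("./myfile.docx").read_bytes())
    print("exit code:", result.returncode)
    print(result.stderr.decode(errors="replace"), file=sys.stderr)
```

If this succeeds while `pypandoc.convert_text` fails on the same bytes, that would point at how the input is passed to pandoc rather than at pandoc's docx reader.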