Can't get contents inside the text box #417

@kungmo

I wrote a function that uses pypandoc. It converts docx files well, but I can't get any of the content inside text boxes.
Here's my code.

Thank you for developing the wonderful package. :)

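import os

import pypandoc


def convert_docx_to_text_with_images(file_path):
    # Folder where extracted images are saved (extracted_media under the current directory)
    media_dir = './extracted_media'

    if not os.path.exists(file_path):
        return "File not found."

    try:
        print(f"Starting conversion... (image output path: {media_dir})")

        output = pypandoc.convert_file(
            file_path,
            to='markdown',     # Convert to Markdown (the output includes image links)
            format='docx',
            extra_args=[
                '--wrap=none',
                '--standalone',
                f'--extract-media={media_dir}'
            ]
        )
        return output
    except (RuntimeError, OSError) as e:
        # The original snippet was cut off after the try body; this handler is an
        # assumed completion so that the function is valid Python.
        return f"Conversion failed: {e}"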
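
One possible workaround, until the text box content shows up in the converted output, is to read it straight from the docx archive. The following is a minimal sketch, not a pypandoc or pandoc feature: it assumes the text boxes are stored as `w:txbxContent` elements in `word/document.xml`, which is the usual WordprocessingML layout, and `extract_textbox_text` is a hypothetical helper name. It uses only the Python standard library. Word usually writes each text box twice, with a legacy copy under `mc:Fallback`, so those copies are skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML and markup-compatibility namespaces used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
MC = "{http://schemas.openxmlformats.org/markup-compatibility/2006}"


def extract_textbox_text(docx_file):
    """Return one string per text box found in the main document part.

    docx_file may be a path or a binary file-like object.
    Hypothetical helper: not part of pypandoc or pandoc.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    # Remember the text boxes that sit under mc:Fallback; they duplicate
    # the ones under mc:Choice and would otherwise be reported twice.
    fallback_copies = set()
    for fallback in root.iter(MC + "Fallback"):
        fallback_copies.update(fallback.iter(W + "txbxContent"))

    boxes = []
    for box in root.iter(W + "txbxContent"):
        if box in fallback_copies:
            continue
        # Join the w:t runs of each paragraph, one paragraph per line.
        paragraphs = [
            "".join(t.text or "" for t in p.iter(W + "t"))
            for p in box.iter(W + "p")
        ]
        boxes.append("\n".join(paragraphs))
    return boxes
```

The returned strings could be appended to the Markdown that `pypandoc.convert_file` returns. This sketch only looks at the main document body, so text boxes in headers or footers (stored in other parts of the archive) would need the same treatment.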
