Handle full-text content in other languages

The result of retrieving a non-English webpage is not encoded correctly. It returns hexadecimal HTML character references (e.g. "&#x4E2D;&#x65B0;&#x7F51;") instead of the decoded text. Is there a way to fix this? I tried the CLI version of Mercury Parser and passed the parameter `--format markdown`, which resulted in correct text. But I don't know how to pass this kind of parameter when calling the mercury-parser-api. Please try the example URLs below to reproduce the problem:
1. https://news.sina.com.cn/c/2021-01-23/doc-ikftssan9988691.shtml
2. http://www.chinanews.com/sh/2021/01-24/9395190.shtml
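
Until there is a way to pass a format option to the API, one possible client-side workaround is to decode the character references after receiving the response. The sketch below is only an illustration: the function names `decode_numeric_refs` and `decode_result` are hypothetical, and it assumes the API response is a JSON object whose `title`, `content`, and `excerpt` fields are strings. It decodes numeric references only and leaves markup-significant characters escaped so that HTML in `content` is not broken.

```python
import re

# Matches hexadecimal (&#x4E2D;) and decimal (&#20013;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped to keep surrounding HTML valid.
_KEEP_ESCAPED = set("<>&\"'")


def _replace(match):
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    if code_point > 0x10FFFF:
        return match.group(0)  # not a valid Unicode code point; leave as-is
    char = chr(code_point)
    return match.group(0) if char in _KEEP_ESCAPED else char


def decode_numeric_refs(text):
    """Replace numeric character references in text with the characters they encode."""
    return _NUMERIC_REF.sub(_replace, text)


def decode_result(result, fields=("title", "content", "excerpt")):
    """Return a copy of a parsed result with the given string fields decoded.

    The field names are an assumption about the response shape.
    """
    decoded = dict(result)
    for name in fields:
        value = decoded.get(name)
        if isinstance(value, str):
            decoded[name] = decode_numeric_refs(value)
    return decoded


if __name__ == "__main__":
    sample = {"title": "&#x4E2D;&#x65B0;&#x7F51;", "word_count": 3}
    print(decode_result(sample)["title"])  # prints 中新网
```

This only post-processes the output; it does not change how the parser itself serializes the page.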

Handle full-text content in other languages #249