Modules with non utf8 values in dictionaries can lead to scan aborting exception

When using a modules_callback during a match, the module values are converted to Python. However, the conversion of the "dictionary" type is buggy: it uses `PyDict_SetItemString` with the dictionary key as the key. However, this function expects a utf-8 string, and the dictionary key is not guaranteed to be utf-8.

This can happen with the pe module and the `version_info` dictionary: keys come from the version info of the binary and are not guaranteed at all to be utf-8.

For example, by taking the `mtxex.dll` from yara tests, and simply changing a byte from the version info to 0xFF, I get this result:

```py
import yara
rules = yara.compile(source="""
import "pe"
rule a { condition: true }
""")

def cb(_):
    pass
rules.match("mtxex.dll", modules_callback=cb)
```

gives:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <function cb at 0x7f21c3722340> returned a result with an exception set
```

The ideal fix would be to use bytestrings as keys instead of strings, but that would be a breaking change

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Modules with non utf8 values in dictionaries can lead to scan aborting exception #273

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Modules with non utf8 values in dictionaries can lead to scan aborting exception #273

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions