Modules with non utf8 values in dictionaries can lead to scan aborting exception #273

@vthib

When a modules_callback is passed during a match, the module values are converted to Python objects. The conversion of the "dictionary" type is buggy: it calls PyDict_SetItemString with the dictionary key as the key. That function expects a UTF-8 encoded string, but the dictionary key is not guaranteed to be valid UTF-8.

This can happen with the pe module and its version_info dictionary: the keys come from the version info of the binary and are not guaranteed to be UTF-8.
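The underlying failure can be reproduced in pure Python, without yara. This is a minimal sketch: the key bytes below are made up for illustration and are not taken from any real binary.

```python
# Hypothetical raw key, as it might appear in a PE version-info block.
# The byte 0xFF can never start (or appear in) a valid UTF-8 sequence.
raw_key = b"FileVers\xffon"

try:
    raw_key.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(f"cannot be used as a str key: {exc}")

# A bytes key carries the same data without any decoding step.
version_info = {raw_key: b"1.0.0.0"}
print(raw_key in version_info)
```

Any API that insists on decoding the key as strict UTF-8 fails in the same way on such input.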

For example, taking mtxex.dll from the yara tests and changing a single byte of the version info to 0xFF produces this result:

import yara
rules = yara.compile(source="""
import "pe"
rule a { condition: true }
""")

def cb(_):
    pass
rules.match("mtxex.dll", modules_callback=cb)

gives:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <function cb at 0x7f21c3722340> returned a result with an exception set

The ideal fix would be to use bytestrings as keys instead of strings, but that would be a breaking change.
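The trade-off can be illustrated in Python. This sketch is not yara-python's code: the helper names `key_as_bytes` and `key_as_str` are hypothetical, and the `surrogateescape` error handler is only one possible way to keep str keys, shown here as an assumption rather than a proposed patch.

```python
def key_as_bytes(raw: bytes) -> bytes:
    """Breaking option: expose the key unchanged, as bytes."""
    return raw


def key_as_str(raw: bytes) -> str:
    """Non-breaking option (assumption): keep str keys and map each
    undecodable byte to a lone surrogate so the key stays reversible."""
    return raw.decode("utf-8", errors="surrogateescape")


good_key = b"FileVersion"
bad_key = b"FileVers\xffon"  # hypothetical malformed key

# With bytes keys, existing callers that index with a str stop working.
as_bytes = {key_as_bytes(good_key): b"1.0.0.0"}
assert "FileVersion" not in as_bytes

# With str keys, existing lookups keep working for well-formed keys...
as_str = {key_as_str(k): b"1.0.0.0" for k in (good_key, bad_key)}
assert "FileVersion" in as_str

# ...and malformed keys no longer raise, while staying recoverable.
escaped = key_as_str(bad_key)
assert escaped.encode("utf-8", errors="surrogateescape") == bad_key
```

One caveat of this approach is that a string containing lone surrogates cannot be encoded with the strict UTF-8 codec, so printing or serializing such a key may still raise unless the caller handles it.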
