Merge pull request #20 from WhatTheFuzz/feature/rename-variables

WhatTheFuzz · web-flow · commit 3893583205c9 · 2022-12-29T13:14:02.000-05:00
Feature/rename variables
diff --git a/README.md b/README.md
@@ -2,10 +2,13 @@
 
 # BinaryNinja-OpenAI
 
-Integrates OpenAI's GPT3 with Binary Ninja via a plugin. Creates a query asking
-"What does this function do?" followed by the instructions in the High Level IL
-function or the decompiled pseudo-C. Returns the response to the user in Binary
-Ninja's console.
+Integrates OpenAI's GPT3 with Binary Ninja via a plugin and currently supports
+two actions:
+
+- Queries OpenAI to determine what a given function does (in Pseudo-C and HLIL).
+  - The results are logged to Binary Ninja's log to assist with RE.
+- Allows users to rename variables in HLIL using OpenAI.
+  - Variables are renamed immediately and the decompiler is reloaded.
 
 ## Installation
 
@@ -60,20 +63,42 @@ You can find your API key at https://beta.openai.com.
 
 ## Usage
 
+### What Does this Function Do?
+
 After installation, you can right-click on any function in Binary Ninja and
-select `Plugins > OpenAI > What Does this Function Do (HLIL)?`. Alternatively,
-select a function in Binary Ninja (by clicking on any instruction in the
-function) and use the menu bar options
-`Plugins > OpenAI > What Does this Function Do (HLIL)?`. If your cursor has
-anything else selected other than an instruction inside a function, `OpenAI`
-will not appear as a selection inside the `Plugins` menu. This can happen if
-you've selected data or instructions that Binary Ninja determined did not belong
-inside of the function.
+select `Plugins > OpenAI > What Does this Function Do (HLIL/Pseudo-C)?`.
+Alternatively, select a function in Binary Ninja (by clicking on any instruction
+in the function) and use the menu bar options `Plugins > OpenAI > ...`. If your
+cursor has anything else selected other than an instruction inside a function,
+`OpenAI` will not appear as a selection inside the `Plugins` menu. This can
+happen if you've selected data or instructions that Binary Ninja determined did
+not belong inside of the function. Additionally, the HLIL options are context
+sensitive; if you're looking at the decompiled results in LLIL, you will not see
+the HLIL options; this is easily fixed by changing the user view to HLIL
+(Pseudo-C should always be visible).
 
 The output will appear in Binary Ninja's Log like so:
 
 ![The output of running the plugin.](https://github.com/WhatTheFuzz/binaryninja-openai/blob/main/resources/output.png?raw=true)
 
+### Renaming Variables
+
+I feel like half of reverse engineering is figuring out variable names (which
+in turn assist with program understanding). This plugin is an experimental look
+to see if OpenAI can assist with that. Right-click on an instruction where a
+variable is initialized and select `OpenAI > Rename Variable (HLIL)`. Watch the
+magic happen. Here's a quick before-and-after.
+
+![Before renaming](https://github.com/WhatTheFuzz/binaryninja-openai/blob/main/resources/rename-before.png?raw=true)
+
+![After renaming](https://github.com/WhatTheFuzz/binaryninja-openai/blob/main/resources/rename-after.png?raw=true)
+
+Renaming variables only works on HLIL instructions that are initializations (i.e.,
+`HighLevelILVarInit`). You might also want this to support assignments
+(`HighLevelILAssign`), but I did not get great results with this. Most of the
+responses were just `result`. If your experience is different, please submit a
+pull request.
+
 ## OpenAI Model
 
 By default, the plugin uses the `text-davinci-003` model, you can tweak this
diff --git a/__init__.py b/__init__.py
@@ -1,20 +1,28 @@
 from binaryninja import PluginCommand
 from . src.settings import OpenAISettings
-from . src.entry import check_function
+from . src.entry import check_function, rename_variable
 
 # Register the settings group in Binary Ninja to store the API key and model.
 OpenAISettings()
 
-PluginCommand.register_for_high_level_il_function("OpenAI\What Does this Function Do (HLIL)?",
+PluginCommand.register_for_high_level_il_function(r"OpenAI\What Does this Function Do (HLIL)?",
                             "Checks OpenAI to see what this HLIL function does." \
                             "Requires an internet connection and an API key "
                             "saved under the environment variable "
                             "OPENAI_API_KEY or modify the path in entry.py.",
                             check_function)
 
-PluginCommand.register_for_function("OpenAI\What Does this Function Do (Pseudo-C)?",
+PluginCommand.register_for_function(r"OpenAI\What Does this Function Do (Pseudo-C)?",
                             "Checks OpenAI to see what this pseudo-C function does." \
                             "Requires an internet connection and an API key "
                             "saved under the environment variable "
                             "OPENAI_API_KEY or modify the path in entry.py.",
                             check_function)
+
+PluginCommand.register_for_high_level_il_instruction(r"OpenAI\Rename Variable (HLIL)",
+                            "If the current expression is an HLIL initialization "
+                            "(HighLevelILVarInit), then query OpenAI to rename the "
+                            "variable to what it believes is correct. If the expression "
+                            "is not a HighLevelILVarInit, then do nothing. Requires "
+                            "an internet connection and an API key.",
+                            rename_variable)
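
A note on the multi-line description strings passed to the `register_for_*` calls: Python joins adjacent string literals with no separator, so every fragment needs its own trailing space or the words run together in the plugin's help text. A minimal sketch of the pitfall; the names `joined` and `fixed` are illustrative and not part of the plugin:

```python
# Adjacent string literals are concatenated with nothing in between.
joined = ("If the expression"
          "is not a HighLevelILVarInit, then do nothing.")

# A trailing space on the first fragment keeps the words apart.
fixed = ("If the expression "
         "is not a HighLevelILVarInit, then do nothing.")

print(joined)
print(fixed)
```

The same applies when the fragments are joined with a trailing backslash; the backslash only continues the line and adds no whitespace.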
diff --git a/resources/rename-after.png b/resources/rename-after.png
diff --git a/resources/rename-before.png b/resources/rename-before.png
diff --git a/src/agent.py b/src/agent.py
@@ -1,3 +1,5 @@
+from __future__ import annotations
+from collections.abc import Callable
 import os
 from typing import Optional, Union
 from pathlib import Path
@@ -9,7 +11,8 @@
 from binaryninja.function import Function
 from binaryninja.lowlevelil import LowLevelILFunction
 from binaryninja.mediumlevelil import MediumLevelILFunction
-from binaryninja.highlevelil import HighLevelILFunction
+from binaryninja.highlevelil import HighLevelILFunction, HighLevelILInstruction, \
+                                    HighLevelILVarInit
 from binaryninja.settings import Settings
 from binaryninja import log, BinaryView
 
@@ -19,11 +22,16 @@
 
 class Agent:
 
-    question: str = '''
+    function_question: str = '''
     This is a function that was decompiled with Binary Ninja.
     It is in IL_FORM. What does this function do?
     '''
 
+    rename_variable_question: str = "In one word, what should the name be " \
+        "for the variable that is assigned the result of the C " \
+        "expression:\n"
+
+
     # A mapping of IL forms to their names.
     il_name: dict[type, str] = {
         LowLevelILFunction: 'Low Level Intermediate Language',
@@ -34,28 +42,17 @@ class Agent:
 
     def __init__(self,
                 bv: BinaryView,
-                function: Union[Function, LowLevelILFunction,
-                                MediumLevelILFunction, HighLevelILFunction],
                 path_to_api_key: Optional[Path]=None) -> None:
 
         # Read the API key from the environment variable.
         openai.api_key = self.read_api_key(path_to_api_key)
 
-        # Ensure that a function type was passed in.
-        if not isinstance(
-                function,
-            (Function, LowLevelILFunction, MediumLevelILFunction,
-                                                HighLevelILFunction)):
-            raise TypeError(f'Expected a BNIL function of type '
-                            f'Function, LowLevelILFunction, '
-                            f'MediumLevelILFunction, or HighLevelILFunction, '
-                            f'got {type(function)}.')
-
         assert bv is not None, 'BinaryView is None. Check how you called this function.'
         # Set instance attributes.
         self.bv = bv
-        self.function = function
         self.model = self.get_model()
+        # Used for the callback function.
+        self.instruction = None
 
     def read_api_key(self, filename: Optional[Path]=None) -> str:
         '''Checks for the API key in three locations.
@@ -72,7 +69,7 @@ def read_api_key(self, filename: Optional[Path]=None) -> str:
         settings: Settings = Settings()
         if settings.contains('openai.api_key'):
             if key := settings.get_string('openai.api_key'):
-                return key
+                return str(key)
 
         # If the settings don't exist, contain the key, or the key is empty,
         # check the environment variable.
@@ -111,7 +108,7 @@ def get_model(self) -> str:
             if model := settings.get_string('openai.model'):
                 # Check that is a valid model by querying the OpenAI API.
                 if self.is_valid_model(model):
-                    return model
+                    return str(model)
         # Return a valid, default model.
         assert self.is_valid_model('text-davinci-003')
         return 'text-davinci-003'
@@ -124,7 +121,7 @@ def get_token_count(self) -> int:
         if settings.contains('openai.max_tokens'):
             # Check that the value is not None.
             if (max_tokens := settings.get_integer('openai.max_tokens')) is not None:
-                return max_tokens
+                return int(max_tokens)
         return 1_024
 
     def instruction_list(self, function: Union[LowLevelILFunction,
@@ -133,6 +130,15 @@ def instruction_list(self, function: Union[LowLevelILFunction,
         '''Generates a list of instructions in string representation given a
         BNIL function.
         '''
+
+        # Ensure that a function type was passed in.
+        if not isinstance(function, (Function, LowLevelILFunction,
+                            MediumLevelILFunction, HighLevelILFunction)):
+            raise TypeError(f'Expected a BNIL function of type '
+                            f'Function, LowLevelILFunction, '
+                            f'MediumLevelILFunction, or HighLevelILFunction, '
+                            f'got {type(function)}.')
+
         if isinstance(function, Function):
             return Pseudo_C(self.bv, function).get_c_source()
         instructions: list[str] = []
@@ -144,21 +150,64 @@ def generate_query(self, function: Union[Function,
                                             LowLevelILFunction,
                                             MediumLevelILFunction,
                                             HighLevelILFunction]) -> str:
-        '''Generates a query string given a BNIL function. Reads the file
-        prompt.txt and replaces the IL form with the name of the IL form.
+        '''Generates a query string given a BNIL function. Returns the query as
+        a string.
         '''
-        prompt: str = self.question
+        prompt: str = self.function_question
         # Read the prompt from the text file.
-        prompt = self.question.replace('IL_FORM', self.il_name[type(function)])
+        prompt = prompt.replace('IL_FORM', self.il_name[type(function)])
         # Add some new lines. Maybe not necessary.
         prompt += '\n\n'
         # Add the instructions to the prompt.
         prompt += '\n'.join(self.instruction_list(function))
         return prompt
 
-    def send_query(self, query: str) -> None:
+    def generate_rename_variable_query(self,
+                                    instruction: HighLevelILInstruction) -> str:
+        '''Generates a query string given a BNIL instruction. Returns the query
+        as a string.
+        '''
+        if not isinstance(instruction, HighLevelILVarInit):
+            raise TypeError('Expected a BNIL instruction of type '
+                            f'HighLevelILVarInit, got {type(instruction)}.')
+        # Assign the instruction to the Agent instance. This is used for the
+        # callback function so we don't need to pass in the instruction to the
+        # Query instance. This is kind of janky and should be examined in future
+        # versions.
+        self.instruction = instruction
+
+        prompt: str = self.rename_variable_question
+        # Append the string form of each instruction operand to the prompt.
+        for line in instruction.instruction_operands:
+            prompt += str(line)
+
+        return prompt
+
+    def rename_variable(self, response: str) -> None:
+        '''Renames the variable of the instruction saved in the Agent instance
+        to the response passed in as an argument.
+        '''
+        if self.instruction is None:
+            raise TypeError('No instruction was saved in the Agent instance.')
+        if response is None or response == '':
+            raise TypeError(f'No response was returned from OpenAI; got type {type(response)}.')
+        # Get just one word from the response. Remove spaces and quotes.
+        try:
+            response = response.split()[0]
+            response = response.replace(' ', '')
+            response = response.replace('"', '')
+            response = response.replace('\'', '')
+        except IndexError as error:
+            raise IndexError(f'Could not split the response: `{response}`.') from error
+        # Assign the variable name to the response.
+        log.log_debug(f'Renaming variable in expression {self.instruction} to {response}.')
+        self.instruction.dest.name = response
+
+
+    def send_query(self, query: str, callback: Optional[Callable]=None) -> None:
         '''Sends a query to the engine and prints the response.'''
         query = Query(query_string=query,
                       model=self.model,
-                      max_token_count=self.get_token_count())
+                      max_token_count=self.get_token_count(),
+                      callback_function=callback)
         query.start()
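
`Agent.rename_variable` takes the first whitespace-separated word of the response and strips quotes from it. Completion models often add trailing punctuation as well, so a stricter clean-up may be worth considering. The sketch below is one possible approach, not the plugin's implementation; `sanitize_variable_name` and its `fallback` parameter are hypothetical names, and it assumes the target is a C-style identifier:

```python
import re


def sanitize_variable_name(response: str, fallback: str = "var") -> str:
    """Reduce a free-form model response to a single C-style identifier."""
    words = response.split()
    if not words:
        # Empty or whitespace-only response: nothing usable.
        return fallback
    # Keep only characters that are legal in a C identifier.
    name = re.sub(r"[^0-9A-Za-z_]", "", words[0])
    if not name:
        return fallback
    # Identifiers may not start with a digit.
    if name[0].isdigit():
        name = "_" + name
    return name


print(sanitize_variable_name('\n\n"buffer_size".'))
```

Returning a fallback instead of raising keeps the background task from failing on an unusable response; whether that is preferable to an explicit error is a design choice.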
diff --git a/src/entry.py b/src/entry.py
@@ -1,14 +1,44 @@
 from pathlib import Path
 from binaryninja import BinaryView, Function
+from binaryninja.highlevelil import HighLevelILInstruction, HighLevelILVarInit
+from binaryninja.log import log_error
 from . agent import Agent
 
 API_KEY_PATH = Path.home() / Path('.openai/api_key.txt')
 
-def check_function(bv: BinaryView, func: Function) -> bool:
+def check_function(bv: BinaryView, func: Function) -> None:
     agent: Agent = Agent(
         bv=bv,
-        function=func,
         path_to_api_key=API_KEY_PATH
     )
     query: str = agent.generate_query(func)
     agent.send_query(query)
+
+def rename_variable(bv: BinaryView, instruction: HighLevelILInstruction) -> None:
+
+    if not isinstance(instruction, HighLevelILVarInit):
+        log_error('Instruction must be of type HighLevelILVarInit, got type: '
+                  f'{type(instruction)}')
+        return
+
+    agent: Agent = Agent(
+        bv=bv,
+        path_to_api_key=API_KEY_PATH
+    )
+    query: str = agent.generate_rename_variable_query(instruction)
+    agent.send_query(query=query, callback=agent.rename_variable)
+
+# Difficult to test without a payment method added, given that the rate limits
+# are so low. This should also probably take place in a background task of its
+# own.
+# def rename_all_variables_in_function(bv: BinaryView, func: HighLevelILFunction) -> None:
+#     # Get each instruction in the High Level IL Function.
+#     for instruction in func.instructions:
+#         match instruction:
+#             # Rename the variable if it is a HighLevelILVarInit.
+#             case HighLevelILVarInit():
+#                 rename_variable(bv, instruction)
+#             # Explicit pass for all other cases.
+#             case _ :
+#                 pass
+
diff --git a/src/query.py b/src/query.py
@@ -1,25 +1,37 @@
+from __future__ import annotations
+from collections.abc import Callable
+from typing import Optional
 import openai
 from binaryninja.plugin import BackgroundTaskThread
-
+from binaryninja.log import log_debug, log_info
 
 class Query(BackgroundTaskThread):
 
     def __init__(self, query_string: str, model: str,
-                 max_token_count: int) -> None:
+                 max_token_count: int, callback_function: Optional[Callable]=None) -> None:
         BackgroundTaskThread.__init__(self,
                                       initial_progress_text="",
                                       can_cancel=False)
         self.query_string: str = query_string
         self.model: str = model
         self.max_token_count: int = max_token_count
+        self.callback = callback_function
 
     def run(self) -> None:
         self.progress = "Submitting query to OpenAI."
 
-        response: str = openai.Completion.create(
+        log_debug(f'Sending query: {self.query_string}')
+
+        response = openai.Completion.create(
             model=self.model,
             prompt=self.query_string,
             max_tokens=self.max_token_count,
         )
-        # Notify the user.
-        print(response.choices[0].text)
+        # Get the response text.
+        result: str = response.choices[0].text
+        # If a callback was supplied, hand it the response text.
+        if self.callback:
+            self.callback(result)
+        # Otherwise, assume we just want to log it.
+        else:
+            log_info(result)
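
The callback-or-log dispatch in `Query.run` can be exercised without Binary Ninja or an OpenAI key. The sketch below is a stand-in under stated assumptions: `FakeQuery` is a hypothetical class that subclasses `threading.Thread` rather than `BackgroundTaskThread`, returns a canned string instead of calling `openai.Completion.create`, and records to a list instead of calling `log_info`:

```python
import threading
from typing import Callable, List, Optional


class FakeQuery(threading.Thread):
    """Stand-in for Query: no network call, just a canned response."""

    def __init__(self, canned_response: str,
                 callback_function: Optional[Callable[[str], None]] = None) -> None:
        super().__init__()
        self.canned_response = canned_response
        self.callback = callback_function
        # Stands in for Binary Ninja's log in this sketch.
        self.logged: List[str] = []

    def run(self) -> None:
        result = self.canned_response
        # Same branching as Query.run: callback if given, otherwise log.
        if self.callback:
            self.callback(result)
        else:
            self.logged.append(result)


received: List[str] = []
with_callback = FakeQuery("length", callback_function=received.append)
with_callback.start()
with_callback.join()

without_callback = FakeQuery("This function copies a buffer.")
without_callback.start()
without_callback.join()

print(received, without_callback.logged)
```

Note that the callback runs on the worker thread, so in the real plugin `Agent.rename_variable` executes off the thread that registered the command.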