Benchmark on LiveSWEBench or LiveCodeBench

AutoCodeRover could be tested on additional benchmarks to see whether the updated codebase can handle memorization-proof benchmarks:
https://github.com/livebench/liveswebench
https://www.kprize.ai/
https://livebench.ai/

Cross-reference to another, newer repo with similar SWE-bot goals: https://github.com/smallcloudai/refact/discussions/796