https://github.com/RUCKBReasoning/RESDSQL
^ This repo has an implementation of this paper: https://arxiv.org/pdf/2302.05965.pdf
It details an advanced strategy for generating accurate SQL to answer a natural language query. As far as I understand, there's a multi-step sequential process:
- table selection
- column selection
- generation of a SQL skeleton (think Mad Libs for SQL queries lol)
- filling in that SQL skeleton
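
To make the four steps concrete, here's a rough toy sketch in Python. To be clear, this is not how the RESDSQL repo implements it: as far as I can tell the paper uses trained models for each stage, while everything below (the function names, the keyword-overlap scoring, the `_` placeholder, the example schema) is made up for illustration, and the slot values in the last step are supplied by hand.

```python
import re

def tokenize(text):
    """Lowercase, split on non-alphanumerics, crudely strip plural 's'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w.rstrip("s") for w in words} - {""}

def select_tables(question, schema, top_k=1):
    """Step 1: rank tables by word overlap with the question (toy heuristic)."""
    q = tokenize(question)
    def score(table):
        return len(q & tokenize(table + " " + " ".join(schema[table])))
    return sorted(schema, key=score, reverse=True)[:top_k]

def select_columns(question, schema, tables):
    """Step 2: keep columns that share a word with the question.
    Falls back to all columns if nothing matches."""
    q = tokenize(question)
    return {t: [c for c in schema[t] if q & tokenize(c)] or list(schema[t])
            for t in tables}

def generate_skeleton(question):
    """Step 3: pick a SQL skeleton with '_' slots (hard-coded stand-in for a model)."""
    if "how many" in question.lower():
        return "SELECT COUNT(_) FROM _ WHERE _ = _"
    return "SELECT _ FROM _"

def fill_skeleton(skeleton, values):
    """Step 4: substitute values into the '_' slots, left to right."""
    parts = skeleton.split("_")
    if len(parts) - 1 != len(values):
        raise ValueError("number of values does not match number of slots")
    sql = parts[0]
    for value, rest in zip(values, parts[1:]):
        sql += value + rest
    return sql

# Hypothetical schema and question, just for the demo.
schema = {
    "singer": ["singer_id", "name", "country"],
    "concert": ["concert_id", "venue", "year"],
}
question = "How many singers have country France?"

tables = select_tables(question, schema)
columns = select_columns(question, schema, tables)
skeleton = generate_skeleton(question)
# In a real system a model would choose these; here they are hand-picked.
sql = fill_skeleton(skeleton, ["*", tables[0], "country", "'France'"])
print(tables, columns, skeleton, sql, sep="\n")
```

The point of splitting it up like this is that the final generation step only ever sees the pruned tables and columns, plus a fixed query shape to fill in.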
It would be dope to implement this in our project. It would help us scale up to larger datasets.
Currently, we just do:
- table selection
- generate SQL given these tables and their columns
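
For comparison, here's a tiny sketch of what the second step of our current flow hands to the generator. I'm assuming the generation step takes a single text input; the function name, the input format, and the schema are hypothetical and not taken from our codebase.

```python
def build_generation_input(question, schema, tables):
    """Serialize the question plus EVERY column of the selected tables.
    There is no column-level filtering at this point."""
    lines = [f"question: {question}"]
    for table in tables:
        lines.append(f"table {table}: {', '.join(schema[table])}")
    return "\n".join(lines)

# Hypothetical schema, just for the demo.
schema = {
    "singer": ["singer_id", "name", "country"],
    "concert": ["concert_id", "venue", "year"],
}
generation_input = build_generation_input(
    "How many singers have country France?", schema, ["singer"]
)
print(generation_input)
```

Since every column of every selected table goes in, the input grows with table width, which is where a column selection step would help.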
