Hi again!
I'd also recommend adding this dataset:
https://zenodo.org/records/11654546
The dataset comprises 3,800 speech audio files covering 3 types of upper respiratory tract surgery plus 1 control set, with an average of 35.51 ± 5.91 audio recordings per patient. It gives the scientific community a valuable resource for systematically investigating the objective effects of upper respiratory tract surgery on voice and speech.
It is a complete corpus with data from 107 Castilian Spanish speakers. It includes voice and speech recordings from control speakers and from patients who underwent upper airway surgery, recorded at pre- and post-operative stages. The surgeries covered are tonsillectomy, functional endoscopic sinus surgery, and septoplasty, all performed by the same surgeon.
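As a quick sanity check on the figures above (my own arithmetic, not something taken from the paper), the speaker count times the mean recordings per speaker should land close to the stated file total. The sketch below assumes the 35.51 average is computed over all 107 speakers, controls included, which the description does not state explicitly.

```python
# Figures quoted from the dataset description.
n_speakers = 107
mean_recordings = 35.51
stated_total = 3800

# Assumption: the mean is taken over all speakers, controls included.
estimated_total = n_speakers * mean_recordings
relative_gap = abs(estimated_total - stated_total) / stated_total

print(f"estimated total files: {estimated_total:.2f}")
print(f"relative gap to stated total: {relative_gap:.4%}")
```

If the two totals agree to within rounding, the description is at least internally consistent under that assumption.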
The dataset is described in this paper:
https://www.nature.com/articles/s41597-024-03540-5
There is also a GitHub repo with code to preprocess the data and run some machine learning experiments:
https://github.com/BYO-UPM/CUCO_Database