Function calling evaluation 

Is there a simple way to measure how accurately the model picks the right tool to execute for a given prompt when it makes a function call? Or do I just need to count the success rate of the function calls?
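
To make the counting approach concrete, here is a rough sketch of what I have in mind. It assumes I have a small hand-labeled test set where each prompt is paired with the tool I expect the model to call. The tool names (`get_weather`, `send_email`, `search_web`) and both helper functions are made up for illustration and are not part of any library.

```python
from collections import Counter

def tool_selection_accuracy(cases):
    """Fraction of cases where the called tool matches the expected tool.

    cases: list of (expected_tool, called_tool) pairs.
    called_tool is None when the model made no function call at all.
    """
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(1 for expected, called in cases if expected == called)
    return correct / len(cases)

def per_tool_accuracy(cases):
    """Same metric, broken down by expected tool, to show which tools get missed."""
    totals = Counter()
    hits = Counter()
    for expected, called in cases:
        totals[expected] += 1
        if expected == called:
            hits[expected] += 1
    return {tool: hits[tool] / totals[tool] for tool in totals}

# Hypothetical results: (tool I expected, tool the model actually called)
cases = [
    ("get_weather", "get_weather"),
    ("get_weather", "search_web"),
    ("send_email", "send_email"),
    ("send_email", None),  # model answered in text instead of calling a tool
]

print(tool_selection_accuracy(cases))
print(per_tool_accuracy(cases))
```

This only checks whether the right tool name was picked, not whether the arguments passed to it were correct.
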
Thanks for any input.