-
Notifications
You must be signed in to change notification settings - Fork 13.9k
common : add minimalist multi-thread progress bar #17602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
common : add minimalist multi-thread progress bar #17602
Conversation
I intentionally kept the bar simple without specifying part numbers (which ultimately don't matter much) the only thing we care about is tracking progress. Signed-off-by: Adrien Gallouët <angt@huggingface.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I missed something, I can't find any references to std::thread in download.cpp - do we actually using more than 1 thread here?
And AFAIK, httplib::Client is single-threaded.
Edit: nevermind, see next comment
| static std::mutex mutex; | ||
| static std::map<std::thread::id, int> lines; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a local variable at first glance, but because it's static, that effectively make it a global variable.
IMO we should avoid global variable if possible. Probably a better way is to make a dedicated class to manage threads and implement the mutex as a member of the class.
|
Hmm, nevermind, I found it: the multi-threads is launched by In this case, I think it's better to introduce a mutex at This is the code where threads are created: std::vector<std::future<bool>> futures_download;
for (auto const & item : urls) {
futures_download.push_back(std::async(std::launch::async, [bearer_token, offline](const std::pair<std::string, std::string> & it) -> bool {
return common_download_file_single(it.first, it.second, bearer_token, offline);
}, item));
}The cleaner idea looks like this (pseudo-code): std::mutex mutex_progress;
std::vector<float> progress(urls.size(), 0); // from 0.0 to 100.0
auto update_progress = [&mutex_progress, &progress](size_t index, float value) {
std::lock_guard<std::mutex> lock(mutex_progress);
progress[index] = value;
// print all progress here
};
std::vector<std::future<bool>> futures_download;
for (auto const & item : urls) {
futures_download.push_back(std::async(std::launch::async, [bearer_token, offline, update_progress](const std::pair<std::string, std::string> & it) -> bool {
return common_download_file_single(it.first, it.second, bearer_token, offline, update_progress);
}, item));
} |
I disagree on that, moving the lock to the caller is error-prone. Statics are generally avoided for good reasons, but in this case it fits perfectly as there is physically only one terminal to share. The only bad thing is the static map, but it keeps the code extremely simple and works for the current usage (avoiding the need to change all prototypes). When wefully remove curl, we can refactor the entire download code for better structure and address this properly. |
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
I intentionally kept the bar simple without specifying part numbers (which ultimately don't matter much) the only thing we care about is tracking progress