Conversation

andsild commented Jun 5, 2025

Adds a cutoff option for features/predictions, so that predictions are only computed on n unlabeled features per round/epoch. All labeled samples are always included. This is to speed up training in cases where you have many slides with many annotations (in our AML case, several million). The benefit is speed; the downside is that feature files may no longer contain all the data if a user wants to download them.

The intuition is that I doubt anyone will annotate more than a few thousand samples in any round.
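To illustrate the idea, here is a minimal sketch of the subsetting (not the actual code in this PR; `labeled_mask`, `cutoff`, and the helper name are illustrative):

```python
import numpy as np

def select_indices(labeled_mask, cutoff, rng=None):
    """Keep every labeled sample and at most `cutoff` randomly chosen unlabeled ones."""
    rng = np.random.default_rng() if rng is None else rng
    labeled_idx = np.flatnonzero(labeled_mask)
    unlabeled_idx = np.flatnonzero(~labeled_mask)
    if cutoff is not None and len(unlabeled_idx) > cutoff:
        unlabeled_idx = rng.choice(unlabeled_idx, size=cutoff, replace=False)
    # Labeled samples are always included; only the unlabeled pool is capped.
    return np.concatenate([labeled_idx, unlabeled_idx])

# used = select_indices(labeled_mask, cutoff=5000)
# predictions are then only computed for features[used]
```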

The commits also sneak in a couple of changes: labeled samples will now occur last in the DSA AL filmstrip, by assigning them a low confidence score, and feature files will have a "used_indices" list that makes it possible to track which feature comes from which superpixel.
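Roughly, the combined effect looks like the sketch below; apart from `used_indices`, the dataset names and HDF5-style layout are assumptions for illustration, not the exact schema:

```python
import h5py
import numpy as np

def write_feature_file(path, features, used_indices, confidences, labeled_mask):
    """Sketch: store the sampled feature rows plus the superpixel index each row
    came from, and give labeled samples a sentinel low confidence so the DSA AL
    filmstrip shows them last."""
    confidences = np.asarray(confidences, dtype=float).copy()
    confidences[labeled_mask] = -1.0  # sentinel low confidence for labeled samples
    with h5py.File(path, 'w') as f:
        f.create_dataset('features', data=features)
        # row i of `features` corresponds to superpixel used_indices[i]
        f.create_dataset('used_indices', data=np.asarray(used_indices))
        f.create_dataset('confidences', data=confidences)
```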

I was paranoid about breaking anything, so I have also included tests. The tests verify each step: superpixel generation, feature extraction, training, and prediction. There is also one test for the whole pipeline, which uses an MNIST slide and verifies that the predictions achieve > 80% accuracy. The tests can also be used for benchmarking.

Also see #31, which allows anyone to disable CUDA for easier testing on local machines.

I have tested this in DSA with my own AML slides, and also with the default superpixels from the UI.

andsild added 5 commits June 5, 2025 14:25
This is to speed up the AL loop.

Not a perfect solution: the UI will now recommend that users predict
"default" for a lot of the labels. But it is a first step to make sure
we can handle large slides with millions of annotations.
This may not have been a bug before, but now that indices may not be in
order (since we are using `cutoff`), it becomes relevant.

andsild commented Jun 5, 2025

Some tests may depend on #33
