## Introduction

In this article, we're going to learn how to load a YOLOv5 model into PyTorch, and then augment the detections with three different techniques:

1. Sorting Detections
2. Cropping and Saving Detections
3. Counting Detected Objects
Finally, I'd like to mention my journey down a *very painful* road: finding *a few* bugs on the Windows operating system while trying to *virtualize* my webcam feed. There was a great plugin that could "replicate" your camera feed into a virtual version usable in any program (you could feed any program's output into the virtual webcam stream, so it looked like your webcam feed was coming from somewhere else), and it was really great:
![OBS great outdated plugin](./images/obs_1.PNG)
This was an [OBS (Open Broadcaster Software)](https://obsproject.com/) plugin. OBS is the go-to program when you're planning to make a livestream. However, this plugin was discontinued in OBS version 28, and that update is where all the problems began. I prepared this bug-compilation image so you can feel the pain too:
![OBS bug compilation](./images/obs_errors.PNG)
So, once we've established that there are several roadblocks that prevent us from happily developing in a stable environment, we finally understand the "why" of this article. Let's begin implementing.

Technical requirements are Python 3.8 or higher, and PyTorch 1.7 or higher.

First, we will use the `argparse` module to include additional parameters in our Python script. I added three optional parameters:
![argparse](./images/argparse.PNG)
> **Note**: with the confidence threshold, detected objects are only displayed if the model's confidence score for the prediction is higher than the given value (0.0-1.0).
These argparse parameters' default values can always be modified. The _frequency_ parameter determines how often detection runs (i.e. a frame step). If the user specifies a number N, only 1 in every N frames will be used for detection. This can be useful if you expect your data to be very similar across sequential frames, since detecting an object in one frame will suffice. Specifying this frame step is also beneficial to avoid cost overages when making these predictions (electricity bill beware, or OCI costs if you're using Oracle Cloud).
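
As a sketch, the argparse setup described above might look like this — the flag names and defaults here are assumptions for illustration, not the article's exact code:

```python
# Hypothetical argparse setup; flag names and defaults are assumptions.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Custom YOLOv5 inference")
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="only display detections scoring above this value (0.0-1.0)")
    parser.add_argument("--frequency", type=int, default=1,
                        help="run detection on only 1 in every N frames (frame step)")
    return parser.parse_args(argv)
```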

After this initial configuration, we're ready to load our custom model.

So, we now load the custom weights file:
![loading PyTorch model](./images/load_model.PNG)
> **Note**: we specify the model as a custom YOLO detector, and give it the model's weights file as input.
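
A minimal sketch of that loading step, using PyTorch Hub's standard YOLOv5 entry point (the `best.pt` path is a placeholder, and the import is deferred so the helper can be defined without PyTorch installed):

```python
def load_custom_model(weights_path: str = "best.pt", confidence: float = 0.5):
    """Load a custom YOLOv5 detector from a local weights file."""
    import torch  # deferred import: only needed when the model is actually loaded

    # 'custom' marks this as a custom YOLO detector, and the model's
    # weights file is given as input via the `path` argument
    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)
    model.conf = confidence  # detections below this score are discarded
    return model
```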
Now, we're ready to get started. We create our main loop, which constantly gets a new image from the input source (either a webcam feed or a screenshot of what we're seeing on our screen) and displays it with the bounding box detections in place:
![main loop](./images/main_loop.PNG)
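
A rough sketch of such a loop, assuming an `infer()` function like the article's that takes the capture object and returns the annotated frame plus the results — every name here is illustrative, not the article's exact code:

```python
def main_loop(infer, source=0):
    """Continuously read frames, run inference, and display the results."""
    import cv2  # deferred import: OpenCV is only needed at run time

    cap = cv2.VideoCapture(source)  # the input source (e.g. a webcam feed)
    try:
        while True:
            frame, results = infer(cap)  # annotated image + detections
            cv2.imshow("YOLOv5 detections", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```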
The most important function in that code is the `infer()` function, which returns an image and the result object, transformed into a pandas object for your convenience.
![infer 1](./images/infer_1.PNG)
In this first part of the function, we obtain a new image from the OpenCV video capture object. You can also find a similar implementation that takes screenshots instead of using the webcam feed in [this file](https://github.com/oracle-devrel/devo.publishing.other/custom_pytorch_yolov5/files/lightweight_screen_torch_inference.py).
Now, we need to pass this image to our Torch model, which will return bounding box results. However, there's an important consideration: we want our predictions to be as fast as possible, and we know the mask detection model has been trained with thousands of images of all shapes and sizes, so we can use a technique that benefits our program's Frames Per Second (FPS): **rescaling** images to a lower resolution (my webcam feed has 1080p resolution, and I use a 2560x1440 monitor, which makes screenshots more detailed than I need).
For this, I chose a `SCALE_FACTOR` variable to hold this value (between 0 and 1). Currently, all images are downscaled to 640 pixels in width, with the height chosen to maintain the original image's aspect ratio.
![infer 2](./images/infer_2.PNG)
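
The target size computation can be sketched like this, assuming a fixed 640-pixel target width as described above (`cv2.resize` would then perform the actual downscale):

```python
TARGET_WIDTH = 640  # width every image is downscaled to before inference

def downscaled_size(width: int, height: int):
    """Return (new_width, new_height), preserving the aspect ratio."""
    scale_factor = TARGET_WIDTH / width  # e.g. 640 / 1920 for a 1080p frame
    return TARGET_WIDTH, round(height * scale_factor)
```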
Now that we have our downscaled image, we pass it to the model, and it returns the object we wanted:
![infer 3](./images/infer_3.PNG)
> **Note**: the `size=640` option tells the model we're going to pass it images with that width, so the model will predict results of those dimensions.
The last thing we do is draw the bounding boxes that we obtained onto the image, and return the image to display it later.
![infer return](./images/infer_5.PNG)
## 1. Sorting Detections
This first technique is the simplest, and can be useful to add value to the standard YOLO functionality in a unique way. The idea is to quickly manipulate the PyTorch-pandas object to sort its values according to one of the columns.
For this, I suggest two ideas: sorting by confidence score, or by detection coordinates. To illustrate how these techniques can be useful, let's look at the following image:
![speed figure](./images/figure_speed.PNG)
> **Note**: this image illustrates how sorting detections can be useful. [(image credits)](https://www.linkedin.com/in/muhammad-moin-7776751a0/)
In the image above, an imaginary line is drawn between both sides of the roadway, in this case **horizontally**. Any object crossing from one side to the other in a specific direction is counted as an "inward" or "downward" vehicle. This can be achieved by specifying (x, y) bounds; any item in the PyTorch-pandas object that surpasses them in either direction is detected.
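
Both sorting ideas can be sketched with a plain DataFrame. The column names below match YOLOv5's pandas output (`xmin`, `ymin`, `xmax`, `ymax`, `confidence`, `name`), but the sample values are made up for illustration:

```python
import pandas as pd

# A made-up detections DataFrame in YOLOv5's pandas output format
detections = pd.DataFrame({
    "xmin": [34.0, 120.0, 80.0],
    "ymin": [50.0, 210.0, 140.0],
    "xmax": [90.0, 180.0, 130.0],
    "ymax": [110.0, 280.0, 200.0],
    "confidence": [0.62, 0.91, 0.78],
    "name": ["car", "car", "truck"],
})

# Idea 1: most confident detections first
by_confidence = detections.sort_values("confidence", ascending=False)

# Idea 2: sort top-to-bottom by vertical position, e.g. to check which
# detections have passed an imaginary horizontal line at y = 150
by_position = detections.sort_values("ymin")
crossed = detections[detections["ymin"] > 150]
```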

To implement this, we will base everything we do on **bounding boxes**.

An important consideration is that, since we pass images to our model with a width of 640 pixels, we need to keep our previously mentioned `SCALE_FACTOR` variable around. The problem is that the original image has a higher resolution than the downscaled image (the one we pass to the model), so the bounding box coordinates we get back will also be downscaled. We need to scale these detections back up by the scale factor in order to _draw_ the bounding boxes over the original image, and then display it:
![infer 4](./images/infer_4.PNG)
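
That coordinate mapping can be sketched as follows, assuming `SCALE_FACTOR` is the 0-1 downscaling ratio described earlier (so dividing by it maps coordinates back up to the original resolution):

```python
def upscale_box(box, scale_factor):
    """Map an (xmin, ymin, xmax, ymax) box from the downscaled inference
    image back to the original image's resolution."""
    xmin, ymin, xmax, ymax = box
    # scale_factor is in (0, 1], so dividing enlarges the coordinates
    return tuple(int(v / scale_factor) for v in (xmin, ymin, xmax, ymax))
```

The upscaled box can then be drawn on the original frame with `cv2.rectangle`.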
We use the `save_cropped_images()` function to save images, while also accounting for the frequency parameter we set: we'll only save the cropped detections if the frame is one we're supposed to process.
Inside this function, we **upscale** the bounding box detections. We'll also only save the image if the detected region is larger than a given (x, y) width and height:
![save cropped images](./images/save_cropped_images.PNG)
The last thing we do is save the cropped image with OpenCV:
![save image](./images/save_image.PNG)
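
Putting these steps together, a sketch of the crop-filter-save flow might look like this — the minimum size values and file naming are assumptions, and box coordinates are taken to be already upscaled to the original image:

```python
import numpy as np

def save_crops(image, boxes, frame_idx, frequency=1, min_w=32, min_h=32):
    """Crop each (xmin, ymin, xmax, ymax) detection out of the original
    image and save it, honoring the frequency (frame step) parameter."""
    saved = []
    if frame_idx % frequency != 0:
        return saved  # not a frame we're supposed to process
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        crop = image[int(y0):int(y1), int(x0):int(x1)]
        # only keep detections larger than the minimum width/height
        if crop.shape[1] >= min_w and crop.shape[0] >= min_h:
            import cv2  # deferred import: only needed when actually saving
            path = f"frame{frame_idx}_det{i}.png"
            cv2.imwrite(path, crop)  # save the cropped image with OpenCV
            saved.append(path)
    return saved
```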
And we successfully implemented the functionality.

This last technique we're going to learn about is very straightforward.

Depending on the problem, you may want to choose one of these two options. In our case, we'll implement the second option:
![draw 2](./images/draw_1.PNG)
> **Note**: to implement the first option, you just need to *increment* the variable every time, instead of setting it. However, you might benefit from looking at implementations like [DeepSORT](https://github.com/ZQPei/deep_sort_pytorch) or [Zero-Shot Tracking](https://github.com/roboflow/zero-shot-object-tracking), which are able to recognize the same object/detection across sequential frames and count them as one, not as separate entities.
With our newly created global variable, we can hold whichever value we like. For example, in the code above, I'm counting detections of the _`mask`_ class. Then, I just need to draw the number of detected objects with OpenCV, along with the bounding boxes, on top of the original image:
![draw 1](./images/draw_2.PNG)
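
The counting step itself can be sketched like this, again assuming YOLOv5's pandas output format (the number would then be drawn onto the frame with `cv2.putText` before display):

```python
import pandas as pd

def count_class(detections: pd.DataFrame, class_name: str) -> int:
    """Count how many detections in the current frame belong to one class."""
    return int((detections["name"] == class_name).sum())
```

With the real model, the per-frame DataFrame comes from `results.pandas().xyxy[0]`, so the call would look like `count_class(results.pandas().xyxy[0], "mask")`.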
As an example, I created this GIF.