|
1 | | -WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose |
2 | | -=== |
3 | | -**Yijun Zhou and James Gregson - BMVC2020** |
| 1 | +# State-of-the-art Head Pose Estimation in TensorFlow 2
4 | 2 |
|
| 3 | +This repository includes: |
| 4 | +- ["WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose" (BMVC 2020)](https://www.bmvc2020-conference.com/assets/papers/0907.pdf), adapted from the [original source code](https://github.com/Ascend-Research/HeadPoseEstimation-WHENet).
5 | 5 |
|
6 | | -**Abstract:** We present an end-to-end head-pose estimation network designed to predict Euler |
7 | | -angles through the full range head yaws from a single RGB image. Existing methods |
8 | | -perform well for frontal views but few target head pose from all viewpoints. This has |
9 | | -applications in autonomous driving and retail. Our network builds on multi-loss approaches |
10 | | -with changes to loss functions and training strategies adapted to wide range |
11 | | -estimation. Additionally, we extract ground truth labelings of anterior views from a |
12 | | -current panoptic dataset for the first time. The resulting Wide Headpose Estimation Network |
13 | | -(WHENet) is the first fine-grained modern method applicable to the full-range of |
14 | | -head yaws (hence wide) yet also meets or beats state-of-the-art methods for frontal head |
15 | | -pose estimation. Our network is compact and efficient for mobile devices and applications. [**ArXiv**](https://arxiv.org/abs/2005.10353) |
16 | 6 |
|
17 | | -## Demo |
18 | | -We provided two use case of the WHENet, image input and video input in this repo. Please make sure you installed all the requirments before running the demo code by `pip install -r requirements.txt`. Additionally, please download the [YOLOv3](https://drive.google.com/file/d/1wGrwu_5etcpuu_sLIXl9Nu0dwNc8YXIH/view?usp=sharing) model for head detection and put it under `yolo_v3/data`. |
| 7 | +- [RetinaFace: Single-stage Dense Face Localisation in the Wild](https://arxiv.org/abs/1905.00641) adapted from https://github.com/StanislasBertrand/RetinaFace-tf2. |
19 | 8 |
|
20 | | -<img src=readme_imgs/video.gif height="220"/> <img src=readme_imgs/turn.JPG height="220"/> |
21 | 9 |
|
22 | | -## Image demo |
23 | | -To run WHENet with image input, please put images and bbox.txt under one folder (E.g. Sample/) and just run `python demo.py`. |
24 | 10 |
|
25 | | -Format of bbox.txt are showed below: |
| 11 | +
| 12 | +<img src=images/output.png height="220"/>
| 13 | +
| 14 | +## Install
| 18 | + |
| 19 | +You can install this package with pip (requires Python >= 3.6):
| 20 | + |
26 | 21 | ``` |
27 | | -image_name,x_min y_min x_max y_max |
28 | | -mov_001_007585.jpeg,240 0 304 83 |
| 22 | +pip install headpose_estimation |
29 | 23 | ``` |
30 | 24 |
|
31 | | -## Video/Webcam demo |
32 | | -We used [YOLO_v3](https://github.com/qqwweee/keras-yolo3) in the video demo to get the cropped head image. |
33 | | -In order to customize some of the functions we have put the yolo implementation and the pre-trained model in the repo. |
34 | | -[Hollywood head](https://www.di.ens.fr/willow/research/headdetection/) and [Crowdhuman](https://www.crowdhuman.org/) are used to train the head detection YOLO model. |
35 | | -```` |
36 | | -demo_video.py [--video INPUT_VIDEO_PATH] [--snapshot WHENET_MODEL] [--display DISPLAY_OPTION] |
37 | | - [--score YOLO_CONFIDENCE_THRESHOLD] [--iou IOU_THRESHOLD] [--gpu GPU#] [--output OUTPUT_VIDEO_PATH] |
38 | | -```` |
39 | | -Please set `--video ''` for webcam input. |
| 25 | +```bash |
| 26 | +pip install git+https://github.com/geekysethi/headpose_estimation |
| 27 | +``` |
| 28 | + |
| 29 | +You can also install it from source with `setup.py`.
| 30 | + |
| 31 | +## Simple API with Face Detection |
| 32 | +To perform detection, you can simply use the following lines:
| 33 | + |
| 34 | +```python
| 35 | +import cv2
| 36 | +from headpose_estimation import Headpose
| 37 | +
| 38 | +if __name__ == "__main__":
| 39 | +    headpose = Headpose()  # loads the face-detection and head-pose models
| 40 | +
| 41 | +    img = cv2.imread("path_to_im.jpg")
| 42 | +    detections, image = headpose.run(img)  # pose estimates plus the output image
| 43 | +```
| 47 | + |
| 48 | +This will return a list of dictionaries, one per detected head, of the form `[{'bbox': [xmin, ymin, xmax, ymax], 'yaw': yaw_value, 'pitch': pitch_value, 'roll': roll_value}]`.
| 49 | + |
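For a quick human-readable dump of the results, a minimal sketch follows (the `summarize` helper is hypothetical, not part of the package; it only assumes the dictionary format shown above):

```python
def summarize(detections):
    """Turn head-pose detection dicts into one-line summary strings."""
    lines = []
    for det in detections:
        xmin, ymin, xmax, ymax = det["bbox"]
        lines.append(
            f"head at ({xmin},{ymin})-({xmax},{ymax}): "
            f"yaw={det['yaw']:.1f}, pitch={det['pitch']:.1f}, roll={det['roll']:.1f}"
        )
    return lines

# Example with one fabricated detection dict:
print(summarize([{"bbox": [240, 0, 304, 83], "yaw": 30.0, "pitch": -5.5, "roll": 2.0}]))
```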
40 | 50 |
|
41 | 51 | ## Dependencies
42 | 52 | * EfficientNet https://github.com/qubvel/efficientnet |
43 | | -* Yolo_v3 https://github.com/qqwweee/keras-yolo3 |
|