Better Face Extraction

After writing my last post, it was apparent that better extraction methods were probably necessary for robust video analysis. Combing through the missed detections, I found ~200 instances where a body was detected but the face associated with that body was not, which suggests the face detector needs improving. To fix this, I incorporated a more complex face extraction model that relies on ageitgey’s face_recognition Python library (which can be found here).

Unlike the Haar cascades, the face_recognition library uses a deep learning-based model from the dlib package. Without GPUs, the code (when applied to video) is far from real-time. However, using nvidia-docker, I was able to make use of a GPU I have installed at my workstation to speed up the program run time.

Short anecdote: I originally thought something was wrong, since the program was taking around 15 seconds to process each frame. Then I realized (after watching nvidia-smi) that another user on the workstation was using the 0th GPU, which is also the default. I did not know exactly how to tell the program to use the second GPU, but I did know I had run into this GPU-sharing problem before. I dusted off an old shell script that builds a Docker container and attaches a user-specified GPU; all I needed to do was change the NV_GPU variable from 0 to 1, and voilà! The program worked perfectly.

tl;dr: Consider who else might be using machine resources (GPUs) if you share a computer with friends, co-workers, enemies, etc.
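If you are not going through nvidia-docker and NV_GPU, a similar effect can usually be had from inside Python. The sketch below is not what I ran; it assumes a CUDA-enabled dlib build, and the helper name select_gpu is made up for illustration. It relies on the standard CUDA_VISIBLE_DEVICES and CUDA_DEVICE_ORDER environment variables.

```python
import os


def select_gpu(index):
    """Expose only one GPU to CUDA libraries loaded later in this process.

    Hypothetical helper: the name and behavior are illustrative only.
    """
    if index < 0:
        raise ValueError("GPU index must be non-negative")
    # Order devices by PCI bus ID so the index matches what nvidia-smi shows.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every GPU except the requested one from CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this before importing face_recognition (and therefore dlib),
# because the variables are read when CUDA is first initialized.
select_gpu(1)
```

Inside the process, the selected card then shows up as device 0, since it is the only one CUDA can see.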

Since I was running through a pre-built Docker image (nvidia/cuda:9.0-cudnn7-devel), I was using Python3 (3.5.2), OpenCV (4.1.0), and face_recognition (1.2.3).

GPU Specs: GP102 [GeForce GTX 1080 Ti].

My full code is available (minus video data) at

Set Up

Using Docker Scripts

Running the shell scripts first uses nvidia-docker to build an image called ‘reid’, then spins up a container from that image with GPU access; I also mounted my workstation’s Research folder into the container.


Using face_recognition

Once the face_recognition library is pip installed, no external files are needed; the Docker setup takes care of the install for you. You could also just pip install the package sans containers.

import face_recognition

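The (rather simple) line below runs the face detector on each frame and sits inside the program’s main while loop. The first argument is the frame, with its channels reversed from OpenCV’s BGR order to the RGB order that face_recognition expects. The second selects dlib’s CNN detector, which runs on the GPU when dlib is built with CUDA support.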

faces = face_recognition.face_locations(frame[:, :, ::-1], model='cnn')
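For context, here is a minimal sketch of how that call might sit inside a frame loop. It is not my exact code: the function names detect_faces and css_to_xywh are made up for illustration, and the video path is whatever you pass in. One real detail worth knowing is that face_locations returns boxes as (top, right, bottom, left), whereas the Haar cascades return (x, y, w, h), so a small conversion makes the outputs comparable.

```python
def css_to_xywh(box):
    """Convert a (top, right, bottom, left) box to OpenCV-style (x, y, w, h)."""
    top, right, bottom, left = box
    return (left, top, right - left, bottom - top)


def detect_faces(video_path, model="cnn"):
    """Yield a list of (x, y, w, h) face boxes for each frame of a video.

    Hypothetical helper; assumes OpenCV and face_recognition are installed.
    """
    # Imported lazily so css_to_xywh can be used without these packages.
    import cv2
    import face_recognition

    capture = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of video or read error
                break
            rgb = frame[:, :, ::-1]  # OpenCV reads BGR; the detector expects RGB
            boxes = face_recognition.face_locations(rgb, model=model)
            yield [css_to_xywh(b) for b in boxes]
    finally:
        capture.release()
```

Passing model="hog" instead would use the CPU-friendly HOG detector that the library also ships with.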

FPS Evaluation

The graph below shows the FPS for the three techniques we have seen thus far. Even with GPU help, face_recognition’s face detection had the lowest FPS… which was disheartening. However, it was interesting to see that the ‘better’ line showed lower variance than the Haar-based methods. Despite the drop in FPS, this supposedly ‘better’ method reports fewer false positives (a common critique of the Haar-based approaches). This is shown in more detail in the false positive analysis section.
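For anyone who wants to reproduce this kind of comparison, here is one way the per-frame FPS and its variance could be measured. This is a sketch rather than my actual benchmarking code; per_frame_fps and summarize are illustrative names, and process_frame stands for whichever detector is being timed.

```python
import statistics
import time


def per_frame_fps(process_frame, frames, clock=time.perf_counter):
    """Time process_frame on each frame and return the instantaneous FPS values."""
    rates = []
    for frame in frames:
        start = clock()
        process_frame(frame)
        elapsed = clock() - start
        if elapsed > 0:  # guard against a zero-length timing interval
            rates.append(1.0 / elapsed)
    return rates


def summarize(rates):
    """Return (mean FPS, population variance of FPS)."""
    return statistics.mean(rates), statistics.pvariance(rates)
```

The clock parameter exists only so the timing can be swapped out (for example in a test); in normal use the default is fine.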

Accuracy Evaluation

Following a methodology similar to last week’s post, I found (tp, fn) = (150, 290), which corresponds to an (abysmal) accuracy of tp / (tp + fn) = 150 / 440 = 0.3409. I believe this accuracy is so low because the detector is unable to find small faces. More face detectors will be tried in the future.
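As a quick check on that number, the arithmetic can be reproduced in a couple of lines. Since only true positives and false negatives enter the calculation, this figure is what is usually called recall; the function name below is just illustrative.

```python
def detection_rate(tp, fn):
    """Fraction of ground-truth faces that were found: tp / (tp + fn)."""
    total = tp + fn
    if total == 0:
        raise ValueError("no ground-truth faces to evaluate")
    return tp / total


rate = detection_rate(150, 290)
print(round(rate, 4))  # 0.3409
```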

False Positive Analysis

In my better_face_extraction notebook, I looked at the true negatives and false positives reported by each of the three methods. Though I expected the Haar cascade-based methods to report many more false positives, there were only a few, and all of them came from the true face extraction. According to the (true negative, false positive) counts below, combined with the earlier results, the best performing (highest accuracy and FPS) pipeline uses Haar cascades for true face detection.

  • (687, 7) for true face extraction
  • (694, 0) for inferred face extraction
  • (694, 0) for better face extraction
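To put those counts on a common scale, the false positive rate fp / (fp + tn) can be computed for each method. The sketch below assumes each pair above is ordered (true negatives, false positives), which matches the order in which the text names them; the dictionary keys are just labels.

```python
def false_positive_rate(tn, fp):
    """Fraction of face-free cases where a face was reported: fp / (fp + tn)."""
    total = tn + fp
    if total == 0:
        raise ValueError("no negative cases to evaluate")
    return fp / total


# (tn, fp) pairs copied from the list above.
counts = {
    "true": (687, 7),
    "inferred": (694, 0),
    "better": (694, 0),
}

for name, (tn, fp) in counts.items():
    print(name, round(false_positive_rate(tn, fp), 4))
```

Under that assumption, the true face extraction has a false positive rate of about 1%, and the other two have none.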

My full code is available at

Patrick Tinsley


My name is Patrick Tinsley, and I am a graduate student at the University of Notre Dame. My focus is computer vision for surveillance video.