minNeighbors

Haar cascades are a common method used to extract bounding boxes for bodies and faces from video. Though less accurate than deep learning-based detectors, cascades can process frames very quickly even without a GPU. Today, I decided to study the impact of the cascade’s detection parameter minNeighbors on FPS. OpenCV defines the parameter as specifying how many neighbors each candidate rectangle should have to retain it during the extraction phase.
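To make the parameter concrete, here is a simplified, stdlib-only sketch of the kind of grouping OpenCV performs after its multi-scale scan. Candidate rectangles that nearly coincide are clustered, and a cluster survives only if each member has at least minNeighbors neighbors, which means the cluster needs more than minNeighbors members. The function names, the 0.2 similarity tolerance, and the rounding are assumptions modeled loosely on OpenCV's cv2.groupRectangles. The real implementation applies extra filtering that this sketch omits. Because this step runs after the scan, it is plausible that minNeighbors changes which boxes are kept more than it changes per-frame runtime.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def similar(a: Rect, b: Rect, eps: float = 0.2) -> bool:
    """Treat two rectangles as neighbors if all four edges lie within a size-scaled tolerance."""
    delta = eps * (min(a[2], b[2]) + min(a[3], b[3])) * 0.5
    return (abs(a[0] - b[0]) <= delta
            and abs(a[1] - b[1]) <= delta
            and abs(a[0] + a[2] - b[0] - b[2]) <= delta
            and abs(a[1] + a[3] - b[1] - b[3]) <= delta)


def group_candidates(rects: List[Rect], min_neighbors: int,
                     eps: float = 0.2) -> List[Rect]:
    """Cluster similar candidates; keep clusters whose members each have >= min_neighbors neighbors."""
    parent = list(range(len(rects)))  # union-find forest over candidate indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # merge every pair of similar rectangles (clusters are the transitive closure)
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if similar(rects[i], rects[j], eps):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Rect]] = {}
    for i, r in enumerate(rects):
        clusters.setdefault(find(i), []).append(r)

    kept = []
    for members in clusters.values():
        n = len(members)
        # in a cluster of n candidates, each has n - 1 neighbors,
        # so requiring min_neighbors neighbors means n > min_neighbors
        if n > min_neighbors:
            # report the cluster as its average rectangle
            kept.append(tuple(round(sum(r[k] for r in members) / n) for k in range(4)))
    return kept
```

For example, three overlapping candidates around (100, 100) plus one isolated candidate survive as a single averaged box when min_neighbors=2, and nothing survives when min_neighbors=3.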

The video file I used contains 1134 frames (about 38 seconds at 30 FPS) and was clipped from a longer surveillance video. The snippet features two subjects walking up and down a tunnel hallway.

The program I wrote uses Python 3.6.3 and OpenCV 4.1.0. The interactive graphs were generated with Plotly 3.10.0.

Set Up

The following code sets up Haar cascades for body and face detection using OpenCV’s built-in functions. The XML files can be found at https://github.com/opencv/opencv/tree/master/data/haarcascades or in the repo for this experiment, https://github.com/pgtinsley/minNeighbors.

import cv2

body_cascade = cv2.CascadeClassifier('haarcascade_fullbody.xml')  # XML path relative to the working directory
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

minNeighbors - Body

The following code goes inside the program’s main while loop and detects bodies in each frame. I ran it three times, setting the num_neighbors variable to 3, 5, and 8; these values were chosen arbitrarily. Note that larger values correspond to stricter detection thresholds.

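start = time.time()

# convert to grayscale
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detect bodies; num_neighbors is the minNeighbors value under test (3, 5, or 8)
bodies = body_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=num_neighbors)

# draw a blue (BGR order) bounding box around each detection
for (x, y, w, h) in bodies:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

# write out resulting image
out_video.write(frame)

end = time.time()
fps = 1.0 / (end - start)  # per-frame FPS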

Using Python’s time module, I recorded how many seconds each individual frame takes to be converted to grayscale, searched for bodies, annotated with bounding boxes, and written back to file. FPS is the reciprocal of that duration, 1 / (end - start).
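The same measurement can be packaged as a reusable helper. This is a minimal sketch rather than the author's code: per_frame_fps and process are hypothetical names, and it swaps time.time for time.perf_counter, the monotonic, high-resolution clock Python provides for measuring short durations.

```python
import time
from typing import Any, Callable, Iterable, List


def per_frame_fps(frames: Iterable[Any], process: Callable[[Any], None]) -> List[float]:
    """Run process(frame) on every frame and return one FPS value per frame."""
    fps_values = []
    for frame in frames:
        start = time.perf_counter()  # monotonic, high-resolution clock
        process(frame)
        elapsed = time.perf_counter() - start
        # FPS for a single frame is the reciprocal of its processing time;
        # guard against a zero reading on a coarse clock
        fps_values.append(1.0 / elapsed if elapsed > 0 else float("inf"))
    return fps_values
```

In this experiment, process would be a function wrapping the grayscale conversion, the detectMultiScale call, the drawing, and out_video.write for one frame, so each returned value corresponds to one point on the FPS graphs.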

minNeighbors - Face

The code below works the same way, but detects faces.

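start = time.time()

# convert to grayscale
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detect faces; num_neighbors is the minNeighbors value under test (3, 5, or 8)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=num_neighbors)

# draw a blue (BGR order) bounding box around each detection
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

# write out resulting image
out_video.write(frame)

end = time.time()
fps = 1.0 / (end - start)  # per-frame FPS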

The graph below shows the per-frame FPS for converting to grayscale, detecting faces, drawing bounding boxes, and writing back to file.

Takeaways

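# one-way ANOVA F test across the three minNeighbors settings
# (df holds one column of per-frame FPS values per setting)
from scipy import stats
F, p = stats.f_oneway(df['3neighbors'], df['5neighbors'], df['8neighbors'])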

In the case of body detection, the above ANOVA (analysis of variance) code returns an F statistic of 1.4989 and a p-value of 0.2235. Because p is well above the conventional 0.05 threshold, there is no evidence that mean FPS differs among the three settings. The table below makes this even clearer: it shows minimal differences in mean and spread between the 3-, 5-, and 8-neighbor groups. From this, I conclude that minNeighbors has little effect on FPS when detecting bodies.

Body detection, FPS per frame
      3neighbors  5neighbors  8neighbors
mean    6.268718    6.232847    6.238072
std     0.569035    0.512565    0.515553
min     1.856827    4.939560    4.911116
25%     5.889520    5.859832    5.859857
50%     6.224236    6.178657    6.181307
75%     6.598960    6.587224    6.597323
max     9.089184    7.710812    7.861804
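As a sketch of what scipy.stats.f_oneway computes, assuming the standard one-way ANOVA definition, the stdlib-only function below (one_way_f is a hypothetical name) returns the F statistic: the between-group mean square divided by the within-group mean square. SciPy additionally converts F into a p-value using the F distribution with (k - 1, N - k) degrees of freedom. Assuming each column holds one FPS value for each of the 1134 frames, that is (2, 3399) here.

```python
from statistics import mean
from typing import Sequence


def one_way_f(*groups: Sequence[float]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

Calling one_way_f(df['3neighbors'], df['5neighbors'], df['8neighbors']) should reproduce the F value returned by f_oneway; a large F means the group means are spread out relative to the frame-to-frame noise within each group.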

However, in the case of face detection, the same scipy code returns an F statistic of 5.3569 and a p-value of 0.0048. Because p is well below 0.05, these values suggest that at least two of the three data streams differ significantly in mean FPS. Regrettably, a one-way F test does not say which two differ; that requires a post-hoc comparison. The 8-neighbor group seems a likely candidate: it has the highest mean FPS (6.30 versus 6.26 and 6.23) and a noticeably larger standard deviation (0.61 versus 0.52 and 0.50) than the other two groups. For the sake of stability in future experiments, I will likely avoid values above 5 when detecting faces.

Face detection, FPS per frame
      3neighbors  5neighbors  8neighbors
mean    6.263116    6.227707    6.302547
std     0.520799    0.496904    0.609897
min     1.971634    4.980596    4.862815
25%     5.924780    5.859245    5.881710
50%     6.238201    6.172629    6.221023
75%     6.588312    6.560227    6.630512
max     8.799104    7.894462    8.920312
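A one-way F test only says that some pair of means differs. The usual follow-up is a post-hoc procedure such as Tukey's HSD (scipy.stats.tukey_hsd in SciPy 1.8 and later, or pairwise_tukeyhsd in statsmodels). The sketch below swaps in a different, dependency-free approach: a two-sided permutation test on the difference in means for each pair of groups, with a Bonferroni correction for the three comparisons. The function names and the default of 2000 permutations are assumptions. Like the ANOVA, it treats frames as exchangeable samples and ignores any correlation between consecutive frames.

```python
import random
from itertools import combinations
from typing import Dict, Sequence, Tuple


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def permutation_p(a: Sequence[float], b: Sequence[float],
                  n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            hits += 1
    # the +1 terms count the observed split itself, so p is never exactly 0
    return (hits + 1) / (n_perm + 1)


def pairwise_bonferroni(groups: Dict[str, Sequence[float]],
                        **kwargs) -> Dict[Tuple[str, str], float]:
    """Permutation-test every pair of groups; return Bonferroni-corrected p-values."""
    pairs = list(combinations(groups, 2))
    return {
        (g1, g2): min(1.0, permutation_p(groups[g1], groups[g2], **kwargs) * len(pairs))
        for g1, g2 in pairs
    }
```

Passing a dictionary such as {'3neighbors': df['3neighbors'], '5neighbors': df['5neighbors'], '8neighbors': df['8neighbors']} would report one corrected p-value per pair; pairs below 0.05 are the ones that plausibly differ.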

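ANOVA's F test assumes that each data stream is approximately normally distributed. The following code checks this assumption with scipy's normaltest (based on D'Agostino and Pearson's test), whose null hypothesis is that a sample comes from a normal distribution. Every p-value in the results is far below 0.05, so the test rejects normality for all six data streams (three for body detection, three for face detection); the F-test results above should therefore be treated as approximate.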

# normality test on each minNeighbors column
for col in df.columns:
    print(stats.normaltest(df[col]))

### BODY 
# NormaltestResult(statistic=149.694489301264, pvalue=3.120718981841747e-33)
# NormaltestResult(statistic=18.671051303728674, pvalue=8.823334088037103e-05)
# NormaltestResult(statistic=28.58292783348486, pvalue=6.212924594111203e-07)

### FACE
# NormaltestResult(statistic=118.9999274353767, pvalue=1.4437569370460975e-26)
# NormaltestResult(statistic=29.089117926904635, pvalue=4.823677903016947e-07)
# NormaltestResult(statistic=241.7485956029695, pvalue=3.1985965235236407e-53)
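Since normaltest rejects normality here, a rank-based cross-check is reasonable. The Kruskal-Wallis H test (scipy.stats.kruskal) compares groups using ranks instead of raw values, so it does not assume normality; for large groups, H is compared against a chi-squared distribution with k - 1 degrees of freedom. The stdlib-only sketch below (kruskal_h is a hypothetical name) computes H with the usual correction for tied values; scipy.stats.kruskal also returns the p-value.

```python
from typing import Sequence


def kruskal_h(*groups: Sequence[float]) -> float:
    """Kruskal-Wallis H statistic, corrected for ties, for two or more groups."""
    # pool every observation with the index of the group it came from, then sort
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0  # accumulates t**3 - t over each run of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # tied values share the average of the 1-based ranks i+1 .. j+1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    # sum the ranks belonging to each group
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    h = (12.0 / (n * (n + 1))
         * sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
         - 3.0 * (n + 1))
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction
```

If kruskal_h(df['3neighbors'], df['5neighbors'], df['8neighbors']) leads to the same conclusion as the F test, the non-normality is less of a concern for the takeaways above.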

My full code is available at https://github.com/pgtinsley/minNeighbors.

Patrick Tinsley

My name is Patrick Tinsley, and I am a graduate student at the University of Notre Dame. My focus is computer vision for surveillance video.