Haar cascades are a common method for extracting bounding boxes around bodies and faces in video. Though less accurate than deep learning-based detectors, cascades can process frames very quickly, even without a GPU. Today, I decided to study how the cascade's minNeighbors detection parameter affects FPS. OpenCV describes minNeighbors as specifying how many neighbors each candidate rectangle should have to retain it. In practice, a cascade fires many overlapping candidate boxes around a real object, and minNeighbors sets how many of those overlapping candidates must agree before a box is kept.
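To make the neighbor idea concrete, here is a simplified, pure-Python illustration of the grouping step. It is loosely modeled on the similarity rule OpenCV documents for cv2.groupRectangles (which detectMultiScale relies on internally, with minNeighbors acting as the group threshold), but it is a sketch, not OpenCV's implementation: it skips OpenCV's weighting and its removal of small boxes nested inside larger ones. The function names, the eps value of 0.2, and the sample boxes are illustrative assumptions.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def similar(a: Rect, b: Rect, eps: float = 0.2) -> bool:
    """Two boxes are 'neighbors' if all four edges lie within a tolerance
    proportional to the smaller box."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    delta = eps * (min(aw, bw) + min(ah, bh)) * 0.5
    return (abs(ax - bx) <= delta and abs(ay - by) <= delta
            and abs(ax + aw - bx - bw) <= delta
            and abs(ay + ah - by - bh) <= delta)


def group_candidates(rects: List[Rect], min_neighbors: int,
                     eps: float = 0.2) -> List[Rect]:
    """Cluster raw candidate boxes, keep clusters with more than
    min_neighbors members, and return one averaged box per kept cluster."""
    if min_neighbors <= 0:
        return list(rects)  # no grouping: every raw candidate survives
    parent = list(range(len(rects)))  # union-find over candidate indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if similar(rects[i], rects[j], eps):
                parent[find(i)] = find(j)  # neighbors join the same cluster

    clusters = {}
    for i, r in enumerate(rects):
        clusters.setdefault(find(i), []).append(r)

    kept = []
    for members in clusters.values():
        n = len(members)
        if n > min_neighbors:  # the box itself plus at least min_neighbors others
            kept.append(tuple(round(sum(m[k] for m in members) / n) for k in range(4)))
    return kept


# Five overlapping hits around one person, plus one stray false positive.
candidates = [(100, 50, 60, 120), (102, 52, 60, 120), (98, 49, 62, 122),
              (101, 51, 59, 119), (99, 50, 61, 121), (400, 300, 40, 80)]
print(group_candidates(candidates, 3))  # [(100, 50, 60, 120)]
print(group_candidates(candidates, 5))  # [] (the cluster has only 5 members)
```

Raising the threshold from 3 to 5 discards a real detection here, which mirrors the "stricter threshold" behavior described below.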
The video I used contains 1,134 frames (about 38 seconds at 30 FPS) and was clipped from a longer surveillance recording. The clip shows two people walking up and down a tunnel hallway.
The program I wrote uses Python 3.6.3 and OpenCV 4.1.0. The interactive graphs were generated with Plotly 3.10.0.
The following code loads the pretrained Haar cascades for body and face detection that ship with OpenCV. The XML files can be found at https://github.com/opencv/opencv/tree/master/data/haarcascades or in the repo for this experiment, https://github.com/pgtinsley/minNeighbors.
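```python
import cv2

# load the pretrained full-body and frontal-face cascades from the working directory
body_cascade = cv2.CascadeClassifier('haarcascade_fullbody.xml')
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
```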
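In OpenCV 4.x, a missing XML file typically does not raise an error when the classifier is constructed; the classifier is simply empty (its empty() method returns True), and the failure only surfaces later inside detectMultiScale. The pip opencv-python packages bundle these XML files and expose their directory as `cv2.data.haarcascades`. The hypothetical helper below searches a few places for a cascade file and fails loudly if none is found; the search order is an assumption, not something OpenCV prescribes.

```python
import os
from typing import Iterable


def find_cascade(filename: str, extra_dirs: Iterable[str] = ()) -> str:
    """Return the first existing path for a Haar cascade XML file.

    Search order: the path as given, each directory in extra_dirs,
    then the directory bundled with opencv-python (if cv2 is importable).
    """
    candidates = [filename] + [os.path.join(d, filename) for d in extra_dirs]
    try:
        import cv2  # the opencv-python wheels ship the XML files here
        candidates.append(os.path.join(cv2.data.haarcascades, filename))
    except (ImportError, AttributeError):
        pass  # OpenCV missing, or a build without the cv2.data module
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise FileNotFoundError('could not locate ' + filename)
```

Usage: `cv2.CascadeClassifier(find_cascade('haarcascade_fullbody.xml'))`, followed by a check that the classifier's empty() method returns False before entering the main loop.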
minNeighbors - Body
The following code goes inside the program's main while loop and detects bodies in each frame. I set num_neighbors to 3, 5, and 8 in separate runs; these values were chosen arbitrarily. Larger values correspond to stricter detection: a candidate box needs more overlapping neighbors to be kept.
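```python
import time

import cv2

# inside the main while loop: frame is the current BGR frame,
# num_neighbors is 3, 5, or 8 depending on the run
start = time.time()
# convert to grayscale
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# detect
bodies = body_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=num_neighbors)
# draw bounding boxes
for (x, y, w, h) in bodies:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
# write out resulting image
out_video.write(frame)
end = time.time()
```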
Using Python's time module, I recorded how many seconds each individual frame takes to convert to grayscale, detect bodies, draw the bounding boxes, and write back to file. Per-frame FPS is the reciprocal of that duration, 1 / (end - start).
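The per-frame measurement can be factored into a small helper. This sketch swaps time.time() for `time.perf_counter()`, which Python's documentation recommends for measuring short durations. The names `per_frame_fps`, `sweep`, `make_frames`, and `make_process` are hypothetical; `make_process(n)` stands in for the grayscale, detect, draw, and write steps with minNeighbors set to n, and the result keys are chosen to match the df column names used in the F test later on.

```python
import time
from typing import Callable, Dict, Iterable, List, Sequence


def per_frame_fps(frames: Iterable, process: Callable) -> List[float]:
    """Run process(frame) on each frame and return 1 / elapsed seconds per frame."""
    fps = []
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        elapsed = time.perf_counter() - start
        # guard against a zero-length interval on a coarse clock
        fps.append(1.0 / elapsed if elapsed > 0 else float('inf'))
    return fps


def sweep(make_frames: Callable[[], Iterable],
          make_process: Callable[[int], Callable],
          settings: Sequence[int] = (3, 5, 8)) -> Dict[str, List[float]]:
    """One FPS series per minNeighbors value, keyed like '3neighbors'."""
    return {f'{n}neighbors': per_frame_fps(make_frames(), make_process(n))
            for n in settings}


# toy stand-ins: "frames" are ints and "processing" is busy arithmetic
results = sweep(lambda: range(20), lambda n: (lambda frame: sum(range(1000 * n))))
print({k: len(v) for k, v in results.items()})
# {'3neighbors': 20, '5neighbors': 20, '8neighbors': 20}
```

In the real program, make_frames would reopen the video for each run so that every setting sees the same 1,134 frames.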
minNeighbors - Face
The code below works the same way, but detects faces.
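```python
import time

import cv2

# inside the main while loop: frame is the current BGR frame,
# num_neighbors is 3, 5, or 8 depending on the run
start = time.time()
# convert to grayscale
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# detect
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=num_neighbors)
# draw bounding boxes
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
# write out resulting image
out_video.write(frame)
end = time.time()
```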
The graph below shows the per-frame FPS for converting to grayscale, detecting faces, drawing bounding boxes, and writing back to file.
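```python
# one-way F test (ANOVA) across the three minNeighbors settings
from scipy import stats

# df holds one column of per-frame FPS values per setting
F, p = stats.f_oneway(df['3neighbors'], df['5neighbors'], df['8neighbors'])
```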
In the case of body detection, the ANOVA (analysis of variance) code above returns an F statistic of 1.4989 and a p-value of 0.2235. At the usual 0.05 significance level, this gives no evidence that mean FPS differs among the three groups. The table below tells the same story: the 3-, 5-, and 8-neighbor groups have nearly identical means and variances. From this, I conclude that minNeighbors has little measurable effect on FPS when detecting bodies.
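To make explicit what the F statistic compares, here is a pure-Python sketch of the one-way ANOVA F statistic: the between-group mean square, sum of n_i (mean_i - grand_mean)^2 over (k - 1), divided by the within-group mean square, sum of (x - mean_i)^2 over (N - k). It computes F only, not the p-value, so it illustrates rather than replaces stats.f_oneway; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def one_way_f(*groups: Sequence[float]) -> float:
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # spread of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


print(one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

Because the denominator pools the within-group variances, F responds to differences between group means; a group with a larger spread inflates the denominator rather than the numerator.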
For face detection, however, the same code returns an F statistic of 5.3569 and a p-value of 0.0048. This suggests that the mean FPS of at least one group differs from at least one other. Regrettably, a one-way F test does not say which groups differ; that requires a post-hoc comparison such as Tukey's HSD. The 8-neighbor group seems a likely candidate, since its variance is much larger than that of the other two groups, although a larger variance does not by itself imply a different mean. For the sake of stability in future experiments, I will likely avoid values above 5 when detecting faces.
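One way to find which face-detection groups differ is to compare each pair directly and correct for the three comparisons. The sketch below swaps in two-sided permutation tests on the difference in mean FPS with a Bonferroni correction, a distribution-free alternative to Tukey's HSD (recent SciPy releases provide the latter as scipy.stats.tukey_hsd). The function names, permutation count, and seed are illustrative choices.

```python
import random
from itertools import combinations
from typing import Dict, Sequence, Tuple


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_p(a: Sequence[float], b: Sequence[float],
                  n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so reruns give the same p-value
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel frames at random, as if there were no difference
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


def pairwise_bonferroni(groups: Dict[str, Sequence[float]],
                        n_perm: int = 2000,
                        seed: int = 0) -> Dict[Tuple[str, str], float]:
    """Bonferroni-adjusted permutation p-value for every pair of groups."""
    pairs = list(combinations(groups, 2))
    return {(x, y): min(1.0, permutation_p(groups[x], groups[y], n_perm, seed) * len(pairs))
            for x, y in pairs}
```

Applied to the face data, this would be called as `pairwise_bonferroni({name: df[name].tolist() for name in df.columns})`; an adjusted p-value below 0.05 for a pair flags those two settings as different.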
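One-way ANOVA assumes that each group is normally distributed. The code below runs scipy's normaltest (D'Agostino and Pearson's test) on each column. Its null hypothesis is that the sample comes from a normal distribution, so the tiny p-values (all far below 0.05) reject normality for every group in both experiments. The normality assumption therefore does not hold here; with over 1,000 frames per group the F test is generally fairly robust to this, but the results above should be read with some caution.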
```python
for col in df.columns:
    print(stats.normaltest(df[col]))

### BODY
# NormaltestResult(statistic=149.694489301264, pvalue=3.120718981841747e-33)
# NormaltestResult(statistic=18.671051303728674, pvalue=8.823334088037103e-05)
# NormaltestResult(statistic=28.58292783348486, pvalue=6.212924594111203e-07)
### FACE
# NormaltestResult(statistic=118.9999274353767, pvalue=1.4437569370460975e-26)
# NormaltestResult(statistic=29.089117926904635, pvalue=4.823677903016947e-07)
# NormaltestResult(statistic=241.7485956029695, pvalue=3.1985965235236407e-53)
```
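Since normaltest's null hypothesis is normality and every p-value printed above is far below 0.05, a rank-based test that does not assume normal data makes a useful cross-check. SciPy provides the Kruskal-Wallis H test directly as `stats.kruskal(df['3neighbors'], df['5neighbors'], df['8neighbors'])`; the pure-Python sketch below spells out what it computes, including the correction for tied values. It assumes exactly three groups, as in this experiment, so the p-value can use the closed-form chi-square survival function for 2 degrees of freedom, exp(-H/2). The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_3(a: Sequence[float], b: Sequence[float],
              c: Sequence[float]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its p-value for exactly three groups."""
    groups = [list(a), list(b), list(c)]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        rank_term += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts: Dict[float, int] = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError('all values are identical')
    h /= correction
    # chi-square survival function with k - 1 = 2 degrees of freedom is exp(-x / 2)
    return h, math.exp(-h / 2)


h, p = kruskal_3([1, 2, 3], [4, 5, 6], [7, 8, 9])  # h is about 7.2, p about 0.027
```

If the Kruskal-Wallis p-values agree with the F test (not significant for bodies, significant for faces), the non-normality is unlikely to have changed the conclusions.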
My full code is available at https://github.com/pgtinsley/minNeighbors.