Face Detection and Feature Extraction Using dlib in Python (with Real-Time Webcam Integration)
Introduction
This tutorial will guide you through using dlib for face detection and feature extraction in Python. We’ll start with detecting faces and extracting facial landmarks from an image, and then extend it to real-time facial feature extraction from a webcam.
Prerequisites
Ensure you have Python installed and the following libraries:
dlib
opencv-python
imutils
Exercise 1
Step 1: Install dlib and Dependencies
Install the required libraries via pip:
pip install dlib opencv-python imutils
If you are on Windows and the build fails, download a prebuilt dlib wheel that matches your Python version and install it with pip.
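Note that pip builds dlib from source, which requires CMake and a C++ compiler on your system; if the build fails, installing CMake from PyPI first often helps:
pip install cmake
pip install dlib opencv-python imutils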
Step 2: Load and Display an Image
Let’s begin by loading and displaying a sample image. Take a picture of your face with your webcam, rename it to ‘my_face.jpg’, and save it next to the following script.
import cv2
# Load a sample image
image = cv2.imread('my_face.jpg')
if image is None:
    raise SystemExit("Could not read 'my_face.jpg' - check the file name and path.")
# Display the image
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Step 3: Face Detection with dlib
Now, we will use dlib’s face detector to find faces in the image.
import dlib
import cv2
# Initialize dlib's face detector
detector = dlib.get_frontal_face_detector()
image = cv2.imread('my_face.jpg')
# Convert image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces in the image
faces = detector(gray)
# Draw rectangles around detected faces
for face in faces:
    x, y, w, h = face.left(), face.top(), face.width(), face.height()
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
# Display the output
cv2.imshow('Face Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
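By the way, dlib’s detector accepts an optional second argument: the number of times to upsample the image before detecting. Upsampling helps find smaller faces at the cost of speed:
# Upsample the image once before detecting - slower, but finds smaller faces
faces = detector(gray, 1)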
Step 4: Facial Landmark Detection
Download the pre-trained shape_predictor_68_face_landmarks.dat model from http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2, extract the .dat file, and place it in your working directory.
Next, detect facial landmarks.
# Load the landmark predictor
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
# Detect landmarks for each face
for face in faces:
    landmarks = predictor(gray, face)
    # Draw circles for each landmark
    for n in range(0, 68):  # 68 is the number of landmarks in this model
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        cv2.circle(image, (x, y), 2, (255, 0, 0), -1)
# Display the image with landmarks
cv2.imshow('Facial Landmarks', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
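Since imutils is listed as a prerequisite, it is worth knowing its shape_to_np helper, which converts the dlib landmark object into a NumPy array and makes slicing point ranges easier. The inner loop above could be rewritten as:
from imutils import face_utils
# Convert the dlib shape object to a (68, 2) NumPy array of (x, y) points
points = face_utils.shape_to_np(landmarks)
for (x, y) in points:
    cv2.circle(image, (x, y), 2, (255, 0, 0), -1)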
Step 5: Extracting the Face
Extract just the face region and save it to a new .jpg file.
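One way to do this, continuing from the detection code in Step 3 (the output file name is arbitrary, and the clamping is a safeguard, since dlib can return coordinates slightly outside the frame):
# Crop and save each detected face (continuing from Step 3)
for i, face in enumerate(faces):
    x, y, w, h = face.left(), face.top(), face.width(), face.height()
    x, y = max(0, x), max(0, y)  # clamp to the image bounds
    face_crop = image[y:y + h, x:x + w]
    cv2.imwrite(f'face_{i}.jpg', face_crop)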
Step 6: Real-Time Feature Extraction from Webcam
Now, let’s extend the functionality to real-time face detection and landmark extraction using your webcam.
Step 6.1: Set up the Webcam
We’ll use OpenCV to access the webcam feed and dlib to perform real-time facial detection and feature extraction.
import cv2
import dlib
# Initialize dlib's face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
# Start webcam feed
cap = cv2.VideoCapture(0)  # this index may be 0, 1, 2, ... depending on your system configuration
while True:
    ret, frame = cap.read()  # Capture frame-by-frame from the webcam
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
    # Detect faces
    faces = detector(gray)
    for face in faces:
        # Detect facial landmarks
        landmarks = predictor(gray, face)
        # Draw bounding box around the face
        x, y, w, h = face.left(), face.top(), face.width(), face.height()
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Draw the facial landmarks
        for n in range(0, 68):
            x = landmarks.part(n).x
            y = landmarks.part(n).y
            cv2.circle(frame, (x, y), 2, (255, 0, 0), -1)
    # Display the frame with detection and landmarks
    cv2.imshow('Webcam Face Detection and Landmarking', frame)
    # Exit the loop when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# Release the webcam and close windows
cap.release()
cv2.destroyAllWindows()
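If the feed is laggy, a common trick is to run the whole pipeline on a smaller frame. imutils provides a resize helper that preserves the aspect ratio; the width of 480 below is just an example value:
import imutils
# Inside the loop, right after reading the frame:
frame = imutils.resize(frame, width=480)  # detect and draw on a smaller frame for speed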
Step 7: Extract Eyes and Mouth
Now, we will isolate specific facial features like the eyes and mouth using the detected landmarks. dlib’s 68 facial landmarks map to the following regions of the face:
- Points 0–16: jawline
- Points 17–21: right eyebrow
- Points 22–26: left eyebrow
- Points 27–35: nose
- Points 36–41: left eye
- Points 42–47: right eye
- Points 48–67: mouth
Now try to complete the following functions by extracting the required regions.
Step 7.1: Extract Eyes
We’ll extract the left and right eyes by selecting the respective landmark coordinates, then drawing bounding boxes around them.
# Define a function to extract the eyes based on landmarks
def extract_eyes(landmarks, frame):
    # Fill in your code here after defining the extraction region.
    # Draw rectangles around the eyes
    cv2.rectangle(frame, (lx_min, ly_min), (lx_max, ly_max), (0, 255, 255), 2)  # Left eye
    cv2.rectangle(frame, (rx_min, ry_min), (rx_max, ry_max), (0, 255, 255), 2)  # Right eye
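If you get stuck, here is one possible completion, using the landmark mapping above (points 36–41 for the left eye and 42–47 for the right eye); the variable names match the skeleton:
def extract_eyes(landmarks, frame):
    # Collect the (x, y) coordinates of each eye's landmarks
    left_pts = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(36, 42)]
    right_pts = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(42, 48)]
    # Bounding box = min/max of the x and y coordinates of each point set
    lx_min, ly_min = min(p[0] for p in left_pts), min(p[1] for p in left_pts)
    lx_max, ly_max = max(p[0] for p in left_pts), max(p[1] for p in left_pts)
    rx_min, ry_min = min(p[0] for p in right_pts), min(p[1] for p in right_pts)
    rx_max, ry_max = max(p[0] for p in right_pts), max(p[1] for p in right_pts)
    # Draw rectangles around the eyes
    cv2.rectangle(frame, (lx_min, ly_min), (lx_max, ly_max), (0, 255, 255), 2)  # Left eye
    cv2.rectangle(frame, (rx_min, ry_min), (rx_max, ry_max), (0, 255, 255), 2)  # Right eye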
Step 7.2: Extract Mouth
Similarly, we’ll extract the mouth by selecting the landmark points for the mouth (48–67).
# Define a function to extract the mouth based on landmarks
def extract_mouth(landmarks, frame):
    # Fill in your code here after defining the extraction region.
    # Draw rectangle around the mouth
    cv2.rectangle(frame, (mx_min, my_min), (mx_max, my_max), (0, 0, 255), 2)
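One possible completion follows the same min/max pattern, using points 48–67:
def extract_mouth(landmarks, frame):
    # Collect the (x, y) coordinates of the mouth landmarks
    mouth_pts = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(48, 68)]
    mx_min, my_min = min(p[0] for p in mouth_pts), min(p[1] for p in mouth_pts)
    mx_max, my_max = max(p[0] for p in mouth_pts), max(p[1] for p in mouth_pts)
    # Draw rectangle around the mouth
    cv2.rectangle(frame, (mx_min, my_min), (mx_max, my_max), (0, 0, 255), 2)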
Step 7.3: Integrating into the Real-Time Webcam Feed
Now, we integrate the eye and mouth extraction into the real-time webcam detection loop:
import cv2
import dlib
# Initialize dlib's face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
# Define functions for extracting eyes and mouth (as defined earlier)
def extract_eyes(landmarks, frame):
    # Fill in your code
    cv2.rectangle(frame, (lx_min, ly_min), (lx_max, ly_max), (0, 255, 255), 2)
    cv2.rectangle(frame, (rx_min, ry_min), (rx_max, ry_max), (0, 255, 255), 2)

def extract_mouth(landmarks, frame):
    # Fill in your code
    cv2.rectangle(frame, (mx_min, my_min), (mx_max, my_max), (0, 0, 255), 2)

# Start webcam feed
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces
    faces = detector(gray)
    for face in faces:
        landmarks = predictor(gray, face)
        # Draw face bounding box and landmarks
        x, y, w, h = face.left(), face.top(), face.width(), face.height()
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        for n in range(0, 68):
            x = landmarks.part(n).x
            y = landmarks.part(n).y
            cv2.circle(frame, (x, y), 2, (255, 0, 0), -1)
        # Extract and highlight eyes and mouth
        extract_eyes(landmarks, frame)
        extract_mouth(landmarks, frame)
    # Show the frame
    cv2.imshow('Real-Time Face and Feature Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Exercise 2: Detect Eye Blinks Using dlib in Python
Understanding the Eye Aspect Ratio (EAR)
The EAR is calculated based on the relative distances of the eye landmarks. We use six points around each eye:
- Points 36–41 for the left eye
- Points 42–47 for the right eye
The formula for the EAR is:
\[ EAR = \frac{||p_2 - p_6|| + ||p_3 - p_5||}{2 \cdot ||p_1 - p_4||} \]
where \( p_1, \dots, p_6 \) are the eye landmark points: \( p_1 \) and \( p_4 \) are the horizontal eye corners, while \( p_2, p_3 \) (upper lid) and \( p_6, p_5 \) (lower lid) form the vertical pairs.
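One standard way to implement the helper that the script below asks you to fill in uses scipy’s Euclidean distance; it assumes the six points arrive in the order \( p_1 \) through \( p_6 \):
from scipy.spatial import distance

def eye_aspect_ratio(eye):
    # Vertical distances: ||p2 - p6|| and ||p3 - p5||
    A = distance.euclidean(eye[1], eye[5])
    B = distance.euclidean(eye[2], eye[4])
    # Horizontal distance: ||p1 - p4||
    C = distance.euclidean(eye[0], eye[3])
    return (A + B) / (2.0 * C)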
Code to Detect Blinks
import cv2
import dlib
from scipy.spatial import distance
# Initialize dlib's face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
# Define a function to calculate the Eye Aspect Ratio (EAR)
def eye_aspect_ratio(eye):
    # Fill in your code here
    # eye is a 6-element list representing the landmarks p_1 to p_6
    pass

# EAR threshold below which a blink is detected
EAR_THRESHOLD = 0.25
# Number of consecutive frames the eye must be below the threshold to count as a blink
CONSEC_FRAMES = 3
frame_counter = 0
blink_counter = 0

# Start webcam feed
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces
    faces = detector(gray)
    for face in faces:
        landmarks = predictor(gray, face)
        # Get coordinates of the left and right eyes
        left_eye = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(36, 42)]
        right_eye = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(42, 48)]
        # Calculate EAR for both eyes
        left_ear = eye_aspect_ratio(left_eye)
        right_ear = eye_aspect_ratio(right_eye)
        # Average EAR for both eyes
        avg_ear = (left_ear + right_ear) / 2.0
        # If EAR is below the threshold, count frames
        if avg_ear < EAR_THRESHOLD:
            frame_counter += 1
        else:
            if frame_counter >= CONSEC_FRAMES:
                blink_counter += 1
                print(f"Blink detected! Total blinks: {blink_counter}")
            frame_counter = 0
        # Draw the eye landmarks
        # Fill in your code here: draw a small circle at each point
        # in left_eye and right_eye (see Step 4 for an example)
        # Display the EAR on the frame
        cv2.putText(frame, f"EAR: {avg_ear:.2f}", (30, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    # Show the frame
    cv2.imshow('Eye Blink Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
Explanation of Code
- EAR Calculation: We calculate the Eye Aspect Ratio for both eyes. When the ratio drops below a threshold, it indicates the eye is closed.
- Blink Detection: If the EAR stays below the threshold for a certain number of consecutive frames, a blink is registered.
- Trigger Action: Each blink increments a counter and prints a message. This could be expanded to trigger commands for a robot (e.g., sending a ROS message to start/stop movement).
Adjustments
- Threshold Adjustment: You may need to adjust the EAR_THRESHOLD and CONSEC_FRAMES values depending on lighting conditions and camera resolution.
- Add ROS Integration: Instead of printing a message, integrate ROS to send commands to a robot. For instance, send a service call to control a TurtleBot, as sketched below.
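A minimal sketch of what such an integration could look like with rospy. For simplicity it publishes velocity commands on a topic rather than calling a service; the node name, topic, and speed are assumptions to adapt to your robot:
import rospy
from geometry_msgs.msg import Twist

# Assumed setup: a ROS master is running and the robot subscribes to /cmd_vel
rospy.init_node('blink_controller')
cmd_pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
moving = False

def on_blink():
    # Toggle the robot between moving forward and stopped on each blink
    global moving
    moving = not moving
    twist = Twist()
    twist.linear.x = 0.2 if moving else 0.0  # modest forward speed, or stop
    cmd_pub.publish(twist)

# In the blink-detection loop, call on_blink() where the print statement is.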