Face Detection and Feature Extraction Using dlib in Python (with Real-Time Webcam Integration)

Introduction

This tutorial will guide you through using dlib for face detection and feature extraction in Python. We’ll start with detecting faces and extracting facial landmarks from an image, and then extend it to real-time facial feature extraction from a webcam.

Prerequisites

Ensure you have Python installed and the following libraries:

  • dlib
  • opencv-python
  • imutils

Exercise 1

Step 1: Install dlib and Dependencies

Install the required libraries via pip:

pip install dlib opencv-python imutils

If you are on Windows and the pip install fails while building dlib, install CMake and a C++ compiler first, or download a pre-built dlib wheel that matches your Python version and install it with pip.
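
To check that dlib is importable after installation, you can print its version from the command line:

python -c "import dlib; print(dlib.__version__)"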

Step 2: Load and Display an Image

Let’s begin by loading and displaying a sample image. Take a picture of your face with your webcam, rename it to ‘my_face.jpg’, and save it in the same directory as the following script.

import cv2

# Load a sample image
image = cv2.imread('my_face.jpg')

# Display the image
cv2.imshow('Image', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
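
Note that cv2.imread does not raise an error if the file is missing; it silently returns None, which then causes cv2.imshow to fail. A quick guard right after loading (using the same ‘my_face.jpg’ filename) makes the failure obvious:

image = cv2.imread('my_face.jpg')
if image is None:
    raise FileNotFoundError("Could not read 'my_face.jpg' - check the file name and path")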

Step 3: Face Detection with dlib

Now, we will use dlib’s face detector to find faces in the image.

import dlib
import cv2


# Initialize dlib's face detector
detector = dlib.get_frontal_face_detector()
image = cv2.imread('my_face.jpg')

# Convert image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces in the image
faces = detector(gray)

# Draw rectangles around detected faces
for face in faces:
    x, y, w, h = face.left(), face.top(), face.width(), face.height()
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)

# Display the output
cv2.imshow('Face Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
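
If the face in your image is small or far from the camera, the detector may miss it. dlib’s detector accepts an optional second argument that upsamples the image a given number of times before detecting, which finds smaller faces at the cost of extra processing time:

faces = detector(gray, 1)  # upsample once to detect smaller faces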

Step 4: Facial Landmark Detection

Download the pre-trained shape_predictor_68_face_landmarks.dat file from the dlib model downloads page (http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2). Extract the .dat file from the .bz2 archive and place it in your working directory.

Next, detect facial landmarks.

# Load the landmark predictor
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Detect landmarks for each face
for face in faces:
    landmarks = predictor(gray, face)

    # Draw circles for each landmark
    for n in range(0, 68): # here 68 is the number of landmarks
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        cv2.circle(image, (x, y), 2, (255, 0, 0), -1)

# Display the image with landmarks
cv2.imshow('Facial Landmarks', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
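
The imutils package installed earlier offers a convenient helper here: face_utils.shape_to_np converts the dlib landmark object into a NumPy array of (x, y) coordinates, which is often easier to slice and index. The drawing loop above could then be written as:

from imutils import face_utils

landmarks_np = face_utils.shape_to_np(landmarks)  # (68, 2) array of (x, y) points
for (x, y) in landmarks_np:
    cv2.circle(image, (x, y), 2, (255, 0, 0), -1)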

Step 5: Extracting the Face

Extract only the face region from the image and save it to a new .jpg file.
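
There is more than one way to do this; a minimal sketch that reuses the image and faces variables from Step 3, clips the box to the image bounds, and writes one file per detected face could look like this:

# Crop each detected face and save it to its own JPEG file
for i, face in enumerate(faces):
    x1, y1 = max(face.left(), 0), max(face.top(), 0)
    x2, y2 = min(face.right(), image.shape[1]), min(face.bottom(), image.shape[0])
    face_crop = image[y1:y2, x1:x2]
    cv2.imwrite(f'face_{i}.jpg', face_crop)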

Step 6: Real-Time Feature Extraction from Webcam

Now, let’s extend the functionality to real-time face detection and landmark extraction using your webcam.

Step 6.1: Set up the Webcam

We’ll use OpenCV to access the webcam feed and dlib to perform real-time facial detection and feature extraction.

import cv2
import dlib

# Initialize dlib's face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Start webcam feed
cap = cv2.VideoCapture(0) # this number may be 0, 1, 2, ... depending on your system config.

while True:
    ret, frame = cap.read()  # Capture frame-by-frame from the webcam
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert to grayscale

    # Detect faces
    faces = detector(gray)

    for face in faces:
        # Detect facial landmarks
        landmarks = predictor(gray, face)

        # Draw bounding box around the face
        x, y, w, h = face.left(), face.top(), face.width(), face.height()
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

        # Draw the facial landmarks
        for n in range(0, 68):
            x = landmarks.part(n).x
            y = landmarks.part(n).y
            cv2.circle(frame, (x, y), 2, (255, 0, 0), -1)

    # Display the frame with detection and landmarks
    cv2.imshow('Webcam Face Detection and Landmarking', frame)

    # Exit the loop when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close windows
cap.release()
cv2.destroyAllWindows()
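
If detection feels sluggish, a common trick is to shrink each frame before running the detector. imutils.resize keeps the aspect ratio for you; the width of 480 below is just an example value, and the call goes right after cap.read() inside the loop:

import imutils

frame = imutils.resize(frame, width=480)  # smaller frames speed up detection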

Step 7: Extract Eyes and Mouth

Now, we will isolate specific facial features such as the eyes and mouth using the detected landmarks. dlib’s 68 facial landmarks map to the parts of the face as follows:

  • Points 0–16: jawline
  • Points 17–26: eyebrows
  • Points 27–35: nose
  • Points 36–41 and 42–47: the two eyes
  • Points 48–67: mouth

Now try to complete the following functions by extracting the required regions.

Step 7.1: Extract Eyes

We’ll extract the left and right eyes by selecting the respective landmark coordinates, then drawing bounding boxes around them.

# Define a function to extract the eyes based on landmarks
def extract_eyes(landmarks, frame):

    # Fill in your code here after defining the extraction region.

    # Draw rectangles around the eyes
    cv2.rectangle(frame, (lx_min, ly_min), (lx_max, ly_max), (0, 255, 255), 2)  # Left eye
    cv2.rectangle(frame, (rx_min, ry_min), (rx_max, ry_max), (0, 255, 255), 2)  # Right eye
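
If you get stuck, here is one possible completion. It assumes the eye landmark ranges used in this tutorial (36–41 and 42–47) and takes the min/max of the landmark coordinates as each bounding box:

def extract_eyes(landmarks, frame):
    # Collect the (x, y) coordinates of each eye's landmarks
    left_pts = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(36, 42)]
    right_pts = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(42, 48)]

    # Bounding boxes from the min/max coordinates of each eye
    lx_min, ly_min = min(p[0] for p in left_pts), min(p[1] for p in left_pts)
    lx_max, ly_max = max(p[0] for p in left_pts), max(p[1] for p in left_pts)
    rx_min, ry_min = min(p[0] for p in right_pts), min(p[1] for p in right_pts)
    rx_max, ry_max = max(p[0] for p in right_pts), max(p[1] for p in right_pts)

    # Draw rectangles around the eyes
    cv2.rectangle(frame, (lx_min, ly_min), (lx_max, ly_max), (0, 255, 255), 2)  # Left eye
    cv2.rectangle(frame, (rx_min, ry_min), (rx_max, ry_max), (0, 255, 255), 2)  # Right eye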

Step 7.2: Extract Mouth

Similarly, we’ll extract the mouth by selecting the landmark points for the mouth (48–67).

# Define a function to extract the mouth based on landmarks
def extract_mouth(landmarks, frame):
   
    
    # Fill in your code here after defining the extraction region.

    # Draw rectangle around the mouth
    cv2.rectangle(frame, (mx_min, my_min), (mx_max, my_max), (0, 0, 255), 2)
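
A matching completion for the mouth (again, only one possibility), using landmarks 48–67:

def extract_mouth(landmarks, frame):
    # Collect the (x, y) coordinates of the mouth landmarks (48-67)
    mouth_pts = [(landmarks.part(n).x, landmarks.part(n).y) for n in range(48, 68)]
    mx_min, my_min = min(p[0] for p in mouth_pts), min(p[1] for p in mouth_pts)
    mx_max, my_max = max(p[0] for p in mouth_pts), max(p[1] for p in mouth_pts)

    # Draw rectangle around the mouth
    cv2.rectangle(frame, (mx_min, my_min), (mx_max, my_max), (0, 0, 255), 2)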

Step 7.3: Integrating into the Real-Time Webcam Feed

Now, we integrate the eye and mouth extraction into the real-time webcam detection loop:

import cv2
import dlib

# Initialize dlib's face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Define functions for extracting eyes and mouth (as defined earlier)

def extract_eyes(landmarks, frame):
    
    # Fill in your code

    cv2.rectangle(frame, (lx_min, ly_min), (lx_max, ly_max), (0, 255, 255), 2)
    cv2.rectangle(frame, (rx_min, ry_min), (rx_max, ry_max), (0, 255, 255), 2)


def extract_mouth(landmarks, frame):

    # Fill in your code

    cv2.rectangle(frame, (mx_min, my_min), (mx_max, my_max), (0, 0, 255), 2)


# Start webcam feed
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces
    faces = detector(gray)

    for face in faces:
        landmarks = predictor(gray, face)

        # Draw the facial landmarks
        for n in range(0, 68):
            x = landmarks.part(n).x
            y = landmarks.part(n).y
            cv2.circle(frame, (x, y), 2, (255, 0, 0), -1)

        # Extract and highlight eyes and mouth
        extract_eyes(landmarks, frame)
        extract_mouth(landmarks, frame)

    # Show the frame
    cv2.imshow('Real-Time Face and Feature Detection', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Understanding the Eye Aspect Ratio (EAR)

The EAR is calculated based on the relative distances of the eye landmarks. We use six points around each eye:

  • Points 36–41 for the left eye
  • Points 42–47 for the right eye

The formula for the EAR is:

\[ EAR = \frac{||p_2 - p_6|| + ||p_3 - p_5||}{2 \cdot ||p_1 - p_4||} \]

Where \( p_1, \dots, p_6 \) are the six eye landmark points: \( p_1 \) and \( p_4 \) are the horizontal eye corners, while \( p_2, p_3 \) are the upper eyelid points and \( p_5, p_6 \) the lower ones.
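
Translated directly into code, the formula can be implemented as follows (one possible version, using scipy.spatial.distance.euclidean; the script below deliberately leaves this as an exercise):

from scipy.spatial import distance

def eye_aspect_ratio(eye):
    # eye is a list of six (x, y) points, ordered p_1 to p_6
    A = distance.euclidean(eye[1], eye[5])  # ||p_2 - p_6||
    B = distance.euclidean(eye[2], eye[4])  # ||p_3 - p_5||
    C = distance.euclidean(eye[0], eye[3])  # ||p_1 - p_4||
    return (A + B) / (2.0 * C)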

import cv2
import dlib
from scipy.spatial import distance

# Initialize dlib's face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Define a function to calculate the Eye Aspect Ratio (EAR)
def eye_aspect_ratio(eye):
    # eye is a list of six (x, y) points representing the landmarks p_1 to p_6
    # Fill in your code here (one possible implementation is sketched above)

# EAR threshold below which a blink is detected
EAR_THRESHOLD = 0.25
# Number of consecutive frames the eye must be below the threshold to count as a blink
CONSEC_FRAMES = 3
frame_counter = 0
blink_counter = 0

# Start webcam feed
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect faces
    faces = detector(gray)

    for face in faces:
        landmarks = predictor(gray, face)

        # Get coordinates of the left and right eyes
        left_eye = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(36, 42)]
        right_eye = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(42, 48)]

        # Calculate EAR for both eyes
        left_ear = eye_aspect_ratio(left_eye)
        right_ear = eye_aspect_ratio(right_eye)

        # Average EAR for both eyes
        avg_ear = (left_ear + right_ear) / 2.0

        # If EAR is below the threshold, count frames
        if avg_ear < EAR_THRESHOLD:
            frame_counter += 1
        else:
            if frame_counter >= CONSEC_FRAMES:
                blink_counter += 1
                print(f"Blink detected! Total blinks: {blink_counter}")
            frame_counter = 0

        # Draw the eye landmarks
        for (x, y) in left_eye + right_eye:
            cv2.circle(frame, (x, y), 2, (255, 0, 0), -1)

        # Display the EAR on the frame
        cv2.putText(frame, f"EAR: {avg_ear:.2f}", (30, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

    # Show the frame
    cv2.imshow('Eye Blink Detection', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Explanation of Code

  • EAR Calculation: We calculate the Eye Aspect Ratio for both eyes. When the ratio drops below a threshold, it indicates the eye is closed.
  • Blink Detection: If the EAR stays below the threshold for a certain number of consecutive frames, a blink is registered.
  • Trigger Action: Each blink increments a counter and prints a message. This could be expanded to trigger commands for a robot (e.g., sending a ROS message to start/stop movement).

Adjustments

  • Threshold Adjustment: You may need to adjust the EAR_THRESHOLD and CONSEC_FRAMES depending on lighting conditions and camera resolution.
  • Add ROS Integration: Instead of only printing a message, integrate ROS to send commands to a robot, for instance a service call or velocity command to control a TurtleBot; a minimal sketch is shown below.
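
The following is a minimal sketch, assuming ROS 1 with rospy and a TurtleBot listening on the standard /cmd_vel topic (the node name, topic, and velocity values are illustrative assumptions, not part of this tutorial's code):

import rospy
from geometry_msgs.msg import Twist

# Initialize a ROS node and a publisher once, before starting the webcam loop
rospy.init_node('blink_controller', anonymous=True)
cmd_pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)

def on_blink(blink_counter):
    # Toggle between moving forward and stopping on every blink (illustrative only)
    cmd = Twist()
    cmd.linear.x = 0.2 if blink_counter % 2 == 1 else 0.0
    cmd_pub.publish(cmd)

Call on_blink(blink_counter) at the point in the loop where the blink message is currently printed.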