
Prevent facial recognition spoofing with stereo vision

[Image: person wearing a mask, holding a smartphone]
Avnet’s Dual-Camera Mezzanine board can help fight facial recognition spoofing.

Without any additional protection, image sensor-based facial recognition is easy to spoof with a printed photo. A system with a single image sensor cannot perceive depth, which means it cannot differentiate between a three-dimensional face and a two-dimensional photo of a face in identical lighting conditions. Adding stereo vision to a sensor system provides powerful protection against spoofing attacks.

Depth perception basics

Humans perceive depth because we have two eyes. Each eye sees objects at a slightly different angle, and our brains use the difference to infer the distance to an object or to features on an object. While this is not infallible—many optical illusions rely on that fact—it is quite reliable. Your depth perception lets you distinguish between a photo and a live human face with ease.

To compensate for the lack of depth perception, facial recognition systems often utilize some tricks. For example, a system could look for movement to eliminate printed photos. But unless the system has additional hardware, such as an infrared sensor or time-of-flight (ToF) sensor, it will remain vulnerable to spoofing. A recorded video of the target’s face would fool this hypothetical system, for instance.
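
To see how thin that protection is, consider a minimal motion check. The sketch below (Python with OpenCV, with an assumed camera index and assumed thresholds) flags a subject as live if enough pixels change between consecutive frames; a printed photo fails it, but a replayed video passes.

    import cv2
    import numpy as np

    def has_motion(prev_gray, curr_gray, diff_threshold=25, min_changed_fraction=0.01):
        # Count pixels that changed noticeably between two grayscale frames.
        diff = cv2.absdiff(prev_gray, curr_gray)
        changed = np.count_nonzero(diff > diff_threshold)
        return changed / diff.size > min_changed_fraction

    cap = cv2.VideoCapture(0)  # single camera: no depth information available
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if has_motion(prev, curr):
            print("Movement detected - probably not a printed photo")
        prev = curr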

Hardware solutions

One hardware solution that can reduce spoofing risk is dual-camera stereo vision. With a second image sensor mounted at a known distance from the first, a system can perceive depth in much the same way as human eyes. Because the baseline between the two sensors is fixed and known, the system can calculate depth from disparity, the shift in pixel position of the same feature between the two images.
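
In practice that calculation is simple triangulation: a feature shifted by d pixels between the two images lies at a distance Z = f × B / d, where f is the focal length in pixels and B is the baseline between the sensors. Here is a minimal sketch in Python, using assumed values rather than the Dual-Camera Mezzanine’s actual focal length and baseline:

    def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
        # Triangulation: Z = f * B / d (f and d in pixels, B in meters).
        return focal_length_px * baseline_m / disparity_px

    # Assumed values for illustration only.
    focal_length_px = 1000.0   # focal length expressed in pixels
    baseline_m = 0.05          # 50 mm between the two image sensors
    print(depth_from_disparity(40.0, focal_length_px, baseline_m))  # -> 1.25 m

A flat photo produces nearly uniform disparity across the face region, which is exactly the signature a stereo system can flag.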

Avnet offers this hardware in the Dual-Camera Mezzanine for Ultra96-V2 development boards. Avnet engineers developed the Dual-Camera Mezzanine for a variety of stereo vision tasks, and it works well for any situation that requires depth perception. When paired with the Ultra96-V2, many interesting machine learning jobs, like object recognition and classification, become possible.

Spoofing attacks

In the paper “Deep Learning for Face Anti-Spoofing: A Survey,” the authors explored several techniques to fight facial spoofing. Their research shows stereo vision-based deep learning is one of the most reliable.

The authors experimented with three facial spoofing attacks: print, replay and mask. The print attack attempts to fool the system with a printed photo. The replay attack does the same thing, but with a recorded video. The mask attack requires a much larger investment, as the attacker must wear a mask that mimics the target. They tested several types of masks, including a paper mask, a resin mask and a mannequin head.

As expected, the stereo vision technique was “very good” at recognizing the print and replay attacks. The faces in those attacks had no depth, which stereo vision detects reliably.

Masks were harder for the stereo vision technique to identify, and the authors gave it a “good” rating on this test. Most of the other techniques had similar results, with the exceptions being heat-detecting techniques based on infrared systems.

The best technique combines stereo vision with infrared sensors to perceive both depth and heat. Depth perception prevents print and replay attacks, while heat detection prevents mask attacks.
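
A rough sketch of what that combined check could look like is shown below. The thresholds, temperature range and inputs (NumPy arrays for a depth map and a thermal image, both cropped to the detected face) are assumptions for illustration, not part of any particular product or the paper’s method.

    BODY_TEMP_RANGE_C = (30.0, 40.0)   # assumed plausible skin-temperature band
    MIN_DEPTH_VARIATION_M = 0.01       # assumed: a real face is not flat

    def is_live_face(depth_roi, thermal_roi):
        # Accept only subjects that are both three-dimensional and warm.
        depth_variation = depth_roi.max() - depth_roi.min()
        mean_temp_c = thermal_roi.mean()
        has_depth = depth_variation > MIN_DEPTH_VARIATION_M    # rejects print and replay
        has_heat = BODY_TEMP_RANGE_C[0] <= mean_temp_c <= BODY_TEMP_RANGE_C[1]  # rejects masks
        return has_depth and has_heat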

Other depth-perception sensors, including ToF sensors, were as good as stereo vision at preventing print and replay attacks, but were worse at preventing mask attacks.

Reducing the load

Another benefit of stereo vision, for anti-spoofing and more general tasks, is that it can reduce processor load and computation time. Avnet’s Mario Bergeron explains this concept in a tutorial on Hackster.io, an Avnet resource for developers.

The Ultra96-V2 is a powerful board with hardware pipelines dedicated to computer vision, but calculating the depth of every pixel in a video feed is still a very resource-intensive task. Fortunately, there are few use cases that require depth data for the entire video frame. In most situations, depth data is only needed for specific objects that make up a fraction of the available pixels.

To leverage that fact, designers can perform depth calculations on only the pixels relevant to the job. To do that, the relevant pixels must be isolated.

Bergeron demonstrates how by first performing face detection, which only needs to run on one of the two video feeds. The resulting bounding box is then enlarged by a small margin to compensate for the slight difference in viewing angle between the two cameras. Apply that box to both video feeds, discard everything outside it, and you have a much smaller array of pixels on which to perform depth calculations.
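
A minimal sketch of that flow is shown below, using plain OpenCV as a stand-in for the accelerated pipelines in Bergeron’s tutorial. The Haar-cascade detector, the 15 percent margin and the block-matcher settings are assumptions chosen for illustration.

    import cv2

    face_detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

    def face_roi_disparity(left_bgr, right_bgr, margin=0.15):
        left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
        right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

        # 1. Detect faces in only one of the two feeds.
        faces = face_detector.detectMultiScale(left, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]

        # 2. Enlarge the bounding box by a small margin.
        dx, dy = int(w * margin), int(h * margin)
        x0, y0 = max(x - dx, 0), max(y - dy, 0)
        x1 = min(x + w + dx, left.shape[1])
        y1 = min(y + h + dy, left.shape[0])

        # 3. Apply the same box to both feeds and compute disparity only there.
        return stereo.compute(left[y0:y1, x0:x1], right[y0:y1, x0:x1])

In a real system the two feeds would be rectified before block matching; this sketch assumes they already are.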

This technique applies to more than just facial recognition. The same principles would be useful for monitoring the speed of a car, for example. If the system only needs to look for cars, modern machine learning models recognize them quickly and without much processing power. From there, the system could compute the car’s distance at two or more points in the video to calculate its speed.
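
As a back-of-the-envelope illustration (with made-up numbers), the speed estimate is just the change in measured depth divided by the elapsed time:

    def speed_from_depths(depth1_m, depth2_m, elapsed_s):
        # Approximate speed along the camera axis, in meters per second.
        return abs(depth2_m - depth1_m) / elapsed_s

    # Example: the car closes from 42 m to 30 m over 0.5 s -> 24 m/s (about 86 km/h)
    print(speed_from_depths(42.0, 30.0, 0.5))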

With hardware like the Dual-Camera Mezzanine, adding stereo vision to a system is easy. There is no guesswork involved as all the relevant parameters (distance between image sensors and focal length) are known. Pipelines for machine learning and computer vision tasks are already available and ready to use.

About Author

Cameron Coward, Senior Technology Writer at Avnet

Cameron Coward is a senior technology writer at Avnet. Before transitioning to a writing career, he ...
