
Prevent facial recognition spoofing with stereo vision

[Image: person wearing a mask, holding a smartphone]
Avnet’s Dual-Camera Mezzanine board can help fight facial recognition spoofing.

Without any additional protection, image sensor-based facial recognition is easy to spoof with a printed photo. A system with a single image sensor cannot perceive depth, which means it cannot differentiate between a three-dimensional face and a two-dimensional photo of a face in identical lighting conditions. Adding stereo vision to a sensor system provides powerful protection against spoofing attacks.

Depth perception basics

Humans perceive depth because we have two eyes. Each eye sees objects at a slightly different angle, and our brains use the difference to infer the distance to an object or to features on an object. While this is not infallible—many optical illusions rely on that fact—it is quite reliable. Your depth perception lets you distinguish between a photo and a live human face with ease.

To compensate for the lack of depth perception, facial recognition systems often utilize some tricks. For example, a system could look for movement to eliminate printed photos. But unless the system has additional hardware, such as an infrared sensor or time-of-flight (ToF) sensor, it will remain vulnerable to spoofing. A recorded video of the target’s face would fool this hypothetical system, for instance.
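
To see how thin that protection is, consider a minimal motion check. The sketch below (Python with OpenCV, with an assumed camera index and assumed thresholds) flags a subject as live if enough pixels change between consecutive frames; a printed photo fails it, but a replayed video passes.

    import cv2
    import numpy as np

    def has_motion(prev_gray, curr_gray, diff_threshold=25, min_changed_fraction=0.01):
        # Count pixels that changed noticeably between two grayscale frames.
        diff = cv2.absdiff(prev_gray, curr_gray)
        changed = np.count_nonzero(diff > diff_threshold)
        return changed / diff.size > min_changed_fraction

    cap = cv2.VideoCapture(0)  # single camera: no depth information available
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if has_motion(prev, curr):
            print("Movement detected - probably not a printed photo")
        prev = curr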

Hardware solutions

One hardware solution that can reduce spoofing risk is dual-camera stereo vision. With a second image sensor mounted at a known distance from the first, a system can perceive depth in much the same way as human eyes. Because the baseline between the two sensors is fixed and known, the system can calculate depth from disparity, the shift in pixel position of the same feature between the two images.
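
In practice that calculation is simple triangulation: a feature shifted by d pixels between the two images lies at a distance Z = f × B / d, where f is the focal length in pixels and B is the baseline between the sensors. Here is a minimal sketch in Python, using assumed values rather than the Dual-Camera Mezzanine’s actual focal length and baseline:

    def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
        # Triangulation: Z = f * B / d (f and d in pixels, B in meters).
        return focal_length_px * baseline_m / disparity_px

    # Assumed values for illustration only.
    focal_length_px = 1000.0   # focal length expressed in pixels
    baseline_m = 0.05          # 50 mm between the two image sensors
    print(depth_from_disparity(40.0, focal_length_px, baseline_m))  # -> 1.25 m

A flat photo produces nearly uniform disparity across the face region, which is exactly the signature a stereo system can flag.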

Avnet offers this hardware in the Dual-Camera Mezzanine for Ultra96-V2 development boards. Avnet engineers developed the Dual-Camera Mezzanine for a variety of stereo vision tasks, and it works well for any situation that requires depth perception. When paired with the Ultra96-V2, many interesting machine learning jobs, like object recognition and classification, become possible.

Spoofing attacks

In the paper “Deep Learning for Face Anti-Spoofing: A Survey,” the authors explored several techniques to fight facial spoofing. Their research shows stereo vision-based deep learning is one of the most reliable.

The authors experimented with three facial spoofing attacks: print, replay and mask. The print attack attempts to fool the system with a printed photo. The replay attack does the same thing, but with a recorded video. The mask attack requires a much larger investment, as the attacker must wear a mask that mimics the target. They tested several types of masks, including a paper mask, a resin mask and a mannequin head.

As expected, the stereo vision technique was “very good” at recognizing the print and replay attacks. The faces in those attacks had no depth, which stereo vision detects reliably.

Masks were harder for the stereo vision technique to identify, and the authors gave it a “good” rating on this test. Most of the other techniques had similar results, with the exceptions being heat-detecting techniques based on infrared systems.

The best technique combines stereo vision with infrared sensors to perceive both depth and heat. Depth perception prevents print and replay attacks, while heat detection prevents mask attacks.
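
A rough sketch of what that combined check could look like is shown below. The thresholds, temperature range and inputs (NumPy arrays for a depth map and a thermal image, both cropped to the detected face) are assumptions for illustration, not part of any particular product or the paper’s method.

    BODY_TEMP_RANGE_C = (30.0, 40.0)   # assumed plausible skin-temperature band
    MIN_DEPTH_VARIATION_M = 0.01       # assumed: a real face is not flat

    def is_live_face(depth_roi, thermal_roi):
        # Accept only subjects that are both three-dimensional and warm.
        depth_variation = depth_roi.max() - depth_roi.min()
        mean_temp_c = thermal_roi.mean()
        has_depth = depth_variation > MIN_DEPTH_VARIATION_M    # rejects print and replay
        has_heat = BODY_TEMP_RANGE_C[0] <= mean_temp_c <= BODY_TEMP_RANGE_C[1]  # rejects masks
        return has_depth and has_heat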

Other depth-perception sensors, including ToF sensors, were as good as stereo vision at preventing print and replay attacks, but were worse at preventing mask attacks.

Reducing the load

Another benefit of stereo vision, for anti-spoofing and more general tasks, is that it can reduce processor load and computation time. Avnet’s Mario Bergeron explains this concept in a tutorial on Hackster.io, an Avnet resource for developers.

The Ultra96-V2 is a powerful board with hardware pipelines dedicated to computer vision, but calculating the depth of every pixel in a video feed is still a very resource-intensive task. Fortunately, there are few use cases that require depth data for the entire video frame. In most situations, depth data is only needed for specific objects that make up a fraction of the available pixels.

To leverage that fact, designers can perform depth calculations on only the pixels relevant to the job. To do that, the relevant pixels must be isolated.

Bergeron demonstrates how by first performing face detection, which only needs to run on one of the two video feeds. The resulting bounding box is then enlarged by a small margin to compensate for the slight difference in viewing angle between the two cameras. Apply that box to both video feeds, discard everything outside it, and you have a much smaller array of pixels on which to perform depth calculations.
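
A minimal sketch of that flow is shown below, using plain OpenCV as a stand-in for the accelerated pipelines in Bergeron’s tutorial. The Haar-cascade detector, the 15 percent margin and the block-matcher settings are assumptions chosen for illustration.

    import cv2

    face_detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

    def face_roi_disparity(left_bgr, right_bgr, margin=0.15):
        left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
        right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

        # 1. Detect faces in only one of the two feeds.
        faces = face_detector.detectMultiScale(left, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = faces[0]

        # 2. Enlarge the bounding box by a small margin.
        dx, dy = int(w * margin), int(h * margin)
        x0, y0 = max(x - dx, 0), max(y - dy, 0)
        x1 = min(x + w + dx, left.shape[1])
        y1 = min(y + h + dy, left.shape[0])

        # 3. Apply the same box to both feeds and compute disparity only there.
        return stereo.compute(left[y0:y1, x0:x1], right[y0:y1, x0:x1])

In a real system the two feeds would be rectified before block matching; this sketch assumes they already are.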

This technique applies to more than just facial recognition. The same principles would be useful for monitoring the speed of a car, for example. If the system only needs to look for cars, modern machine learning models recognize them quickly and without much processing power. From there, the system could compute the car’s distance at two or more points in the video to calculate its speed.
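
As a back-of-the-envelope illustration (with made-up numbers), the speed estimate is just the change in measured depth divided by the elapsed time:

    def speed_from_depths(depth1_m, depth2_m, elapsed_s):
        # Approximate speed along the camera axis, in meters per second.
        return abs(depth2_m - depth1_m) / elapsed_s

    # Example: the car closes from 42 m to 30 m over 0.5 s -> 24 m/s (about 86 km/h)
    print(speed_from_depths(42.0, 30.0, 0.5))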

With hardware like the Dual-Camera Mezzanine, adding stereo vision to a system is easy. There is no guesswork involved as all the relevant parameters (distance between image sensors and focal length) are known. Pipelines for machine learning and computer vision tasks are already available and ready to use.

About Author

Cameron Coward, Senior Technology Writer at Avnet

Cameron Coward is a senior technology writer at Avnet. Before transitioning to a writing career, he ...
