I recognise that face: Machine vision with Omron's HVC




Giving machines the power of vision gives them the ability to interact with people in a much more natural way. Machine vision, using cameras to detect a user’s presence and establish how they wish to interact with the machine, opens up many opportunities for more intuitive user interfaces that adapt to the user’s identity, situation or frame of mind. However, embedding this capability into machines has previously been a complex and difficult task.

Omron has developed an easy solution for machine user interface design that can recognise faces, gestures and even emotions. The B5T HVC (Human Vision Components) module, a small camera board with built-in advanced image processing, outputs ready-to-use data on a person’s gaze direction and blink state, and can estimate their expression, age and gender from an image of the user.

The module incorporates Omron’s proprietary "OKAO® Vision" software, which converts what the camera sees into digital values and text data. Hundreds of millions of licences have already been issued for this software, in applications such as digital cameras, mobile phones and surveillance robots.

Following the release of the original HVC-P model in March 2014, Omron has now introduced an improved version, the HVC-P2, which boasts a maximum recognition speed ten times that of the previous model. Customers can also choose between two camera heads, a long-distance detection type and a wide-angle detection type, depending on the application.

The HVC-P2 features a separate camera and main board, connected via a flexible flat cable, allowing the sensor to be installed on the edge of a flat display unit, which was difficult with the previous model.

Ten sensing functions are included in the module:

Human body detection, face detection and hand detection simply detect the presence of a person, a face or a hand in the image. These features output the number of faces, hands or people in the frame (up to 35 objects may be detected per frame), as well as the co-ordinates of each object relative to the frame and its size in pixels.
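As a rough illustration of the kind of record these outputs map to on the host side, here is a minimal Python sketch. The Detection fields and the summarise() helper are illustrative assumptions, not Omron’s actual data format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    x: int           # centre x co-ordinate, relative to the frame
    y: int           # centre y co-ordinate, relative to the frame
    size: int        # detected object size in pixels
    confidence: int  # degree of confidence, 0 to 1000

def summarise(detections: List[Detection]) -> str:
    """Report where each object was found and how large it appears."""
    return "; ".join(f"({d.x}, {d.y}) size={d.size}px" for d in detections)

# Example: two faces returned for one frame (the module can return up to 35 objects)
faces = [Detection(x=320, y=240, size=120, confidence=850),
         Detection(x=610, y=255, size=96, confidence=790)]
print(f"{len(faces)} face(s) detected: {summarise(faces)}")
```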

Human body detection may be used to detect the presence of people over 7m away in image data. This may be useful for security purposes, with a security system flagging up footage for review by security personnel if people are found somewhere they shouldn’t be. Or perhaps an office camera detects when everyone has gone home for the night and turns off the heating and lights.
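The "lights off when the office is empty" idea could look something like the following sketch, where body_count() and set_building_power() are hypothetical stand-ins for the module read-out and a building controller, and the 15-minute grace period is an invented example value.

```python
import time

EMPTY_GRACE_S = 15 * 60  # how long the office must stay empty before powering down

def monitor_office(body_count, set_building_power):
    """Poll the sensor and switch heating/lighting when nobody has been seen for a while."""
    last_seen = time.monotonic()
    while True:
        if body_count() > 0:              # someone is still in the image
            last_seen = time.monotonic()
            set_building_power(on=True)
        elif time.monotonic() - last_seen > EMPTY_GRACE_S:
            set_building_power(on=False)  # everyone has gone home for the night
        time.sleep(30)                    # re-check every 30 seconds
```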

Hand detection enables detection of up to 35 hands in the image frame at up to 1.5m away. It can only detect hands from the front, that is, with the palm directly facing the camera. This may be a useful feature for control systems where the user presents their palm to the camera in a ‘stop’ gesture to stop the equipment. This function takes only 370ms to respond on average, and can detect hands from a minimum size of 40 pixels.
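A sketch of that palm-as-‘stop’ idea might look like this, where hand_detections() and halt_equipment() are placeholders for the module read-out and the machine’s own control hook, and the 40-pixel minimum simply mirrors the detection limit mentioned above.

```python
def check_stop_gesture(hand_detections, halt_equipment, min_size_px=40):
    """Halt the machine if at least one open palm is presented to the camera."""
    hands = [h for h in hand_detections() if h.size >= min_size_px]
    if hands:
        halt_equipment()   # a palm facing the camera is treated as a 'stop' command
        return True
    return False
```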

Face detection recognises faces from a minimum size of 64 pixels and triggers additional facial recognition features. This may be used to detect when a person is looking at a device in order to tailor the user interface for best results; perhaps if the user isn’t looking, a voice prompt may be used instead.

Facial pose estimation estimates where a person is looking based on the direction their head is turned. Since this feature gives detailed data on the yaw, pitch and roll angles of a person’s face, it can be used to tell what they are looking at with a useful degree of accuracy. It’s easy to imagine this being used to monitor advertising or in retail environments.
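For example, a retail display might map the reported angles onto coarse zones, as in this sketch. The FacePose container, the 10-degree thresholds and the sign conventions are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class FacePose:
    yaw: float    # left/right head turn, in degrees
    pitch: float  # up/down head tilt, in degrees
    roll: float   # sideways head tilt, in degrees

def region_of_attention(pose: FacePose) -> str:
    """Map head pose to a coarse zone in front of the camera."""
    if abs(pose.yaw) < 10 and abs(pose.pitch) < 10:
        return "centre display"   # roughly facing the camera
    if pose.yaw >= 10:
        return "left of display"
    if pose.yaw <= -10:
        return "right of display"
    return "above or below display"
```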

Gaze estimation is similar to facial pose estimation, but it returns yaw and pitch angles for the person’s pupils, again, to help estimate what the person may be looking at. This is typically used in digital cameras to establish when the subject is looking directly at the camera in order to decide the best moment to take a picture.

Blink estimation is also used in digital cameras to determine whether the subject is blinking. It outputs the degree of blinking for the left and right eyes. Both gaze estimation and blink estimation have average response times from 400ms down to as low as 50ms, depending on the model, distance and detection size.
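Combining the two in the digital-camera scenario might look like the following sketch, where the gaze limits, the blink threshold and the blink scale are invented example values rather than figures from the module’s specification.

```python
def good_shutter_moment(gaze_yaw, gaze_pitch, blink_left, blink_right,
                        gaze_limit_deg=5, blink_limit=300):
    """Return True if the subject appears to be looking at the camera with eyes open."""
    looking_at_camera = abs(gaze_yaw) <= gaze_limit_deg and abs(gaze_pitch) <= gaze_limit_deg
    eyes_open = blink_left < blink_limit and blink_right < blink_limit
    return looking_at_camera and eyes_open
```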

Age estimation enables machines to estimate a person’s age, between 0 and 75. This can be used to tailor a system’s operation to the user’s needs, perhaps for children or the elderly.

Gender estimation guesses whether a person is male or female. This is typically used in combination with age estimation.

Expression estimation guesses the person’s mood based on the appearance of their face, and can detect five different moods: neutral, happy, surprised, angry or sad. It compares the image data with known facial indicators of these moods and the expression with the highest score (0 to 100) is output as the person’s most likely expression (along with the score itself). This function also returns the ‘expression degree’, a figure between -100 and +100 that indicates overall how happy or sad the user is.
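A host application might consume those values roughly as follows. The dictionary shape here is illustrative; the real output layout is defined in Omron’s interface specification.

```python
def interpret_expression(scores, expression_degree):
    """Pick the highest-scoring expression and summarise overall mood."""
    # scores maps each of the five expressions to its 0-100 score
    top, top_score = max(scores.items(), key=lambda kv: kv[1])
    if expression_degree > 0:
        mood = "positive"
    elif expression_degree < 0:
        mood = "negative"
    else:
        mood = "neutral"
    return f"{top} (score {top_score}), overall {mood} ({expression_degree:+d})"

print(interpret_expression(
    {"neutral": 20, "happy": 72, "surprised": 5, "angry": 1, "sad": 2},
    expression_degree=55))
```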

Face recognition is the function used to recognise specific users, and perhaps tailor the equipment interface to their individual needs or preferences. The module can recognise a face up to 1.3m away in under a second by comparing image data to photographs in its memory. This function returns a user ID number or “non-registered” if the person is not recognised.
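Acting on that result could be as simple as the sketch below, where load_preferences() and apply_defaults() are hypothetical hooks into the equipment’s own settings rather than part of the module.

```python
def apply_user_profile(recognition_result, load_preferences, apply_defaults):
    """Switch the interface to a registered user's preferences, or fall back to defaults."""
    if recognition_result == "non-registered":
        apply_defaults()                          # unknown face: generic interface
        return None
    prefs = load_preferences(recognition_result)  # registered user ID
    print(f"Welcome back, user {recognition_result}")
    return prefs
```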

Each feature (with the exception of gaze estimation and blink estimation, which are measured in degrees) also outputs a measurement called degree of confidence, which indicates how likely the system considers its result to be correct. The confidence score ranges from 0 to 1000, with a higher value indicating greater confidence.
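A typical way to use the score is to discard low-confidence results before acting on them, as in this one-line sketch; the threshold of 500 is an arbitrary example, not a recommended value.

```python
def confident_results(results, min_confidence=500):
    """Keep only results whose confidence score clears a chosen threshold."""
    return [r for r in results if r.confidence >= min_confidence]
```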

The HVC module can be integrated quickly and easily into existing systems or as part of a new design, without the need to understand the complexities of the underlying algorithms or the optical design. This is a fully integrated, plug-in module whose outputs can be used directly to make decisions in consumer electronics and beyond.

For more information on the B5T HVC-P2 module, or to purchase an evaluation kit, get in touch with one of our technical specialists here, or buy from our online store.


About Author

Giovanna Monari

As a Senior Product Manager, Giovanna is responsible for marketing strategy and supplier management ...
