Vision systems: Processing at the edge, in the cloud, or both?

Generally speaking, computer vision enables computers to derive actionable results from digital images and video. Just as Artificial Intelligence (AI) allows computers to think, computer vision lets them see, observe and understand, all in a fraction of a second. A typical sequence begins with filters that modify the image, followed by extraction of data that is then either sent elsewhere for processing or compared against known tolerances to deliver a pass/fail result.
Visual inspection of peanuts on a conveyor belt, for example, requires the camera and its processing system to view every nut as it speeds by, determine which ones are unacceptable, and send a command to a device that blows just those specimens off the belt. The camera system in an autonomous vehicle takes this to another level: it must scan every detail of the operating environment, analyze what it sees and deliver comprehensive results that determine how the vehicle should respond.
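As a rough sketch of that filter-extract-compare sequence, the Python/OpenCV example below flags dark, burnt-looking items in each frame and triggers a stand-in ejector command. The threshold, blob-size and camera values are assumptions chosen for illustration, not a production recipe:

    import cv2

    REJECT_THRESHOLD = 60   # assumed gray level separating burnt from acceptable nuts
    MIN_BLOB_AREA = 200     # assumed minimum pixel area for a real nut rather than noise

    def eject_nut(cx, cy):
        # Placeholder: on a real line this would command the air-jet ejector
        print(f"Eject item at ({cx}, {cy})")

    def inspect_frame(frame):
        """Return the centroids of items that fail inspection in this frame."""
        # Filter: convert to grayscale and smooth to suppress sensor noise
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)

        # Extract: isolate dark regions and find their contours
        _, mask = cv2.threshold(blurred, REJECT_THRESHOLD, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        # Compare against tolerance: anything dark and large enough is a reject
        rejects = []
        for c in contours:
            if cv2.contourArea(c) >= MIN_BLOB_AREA:
                x, y, w, h = cv2.boundingRect(c)
                rejects.append((x + w // 2, y + h // 2))
        return rejects

    cap = cv2.VideoCapture(0)          # assumed camera index
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        for cx, cy in inspect_frame(frame):
            eject_nut(cx, cy)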
The cloud is immense and so is its latency
In either case, this requires a combination of high-resolution video, formidable processing power and AI, and a high-speed communication path. In many applications today, however, the solution is limited to performing minimal image processing in the camera itself or in a nearby embedded computer connected via Gigabit Ethernet or CoaXPress. When cloud computing emerged, it offered virtually unlimited processing power and storage capability. Those resources continue to grow as Amazon Web Services (AWS) and Microsoft Azure offer comprehensive tools that make the process easier, safer and more affordable.
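As a hedged illustration of how little glue code the cloud route requires, the sketch below sends one JPEG-compressed frame to a hosted AWS SageMaker inference endpoint using the boto3 SDK. The endpoint name and the JSON response format are assumptions made purely for illustration, and Azure offers comparable services:

    import json
    import boto3
    import cv2

    # Assumed name: the endpoint itself would be created separately in SageMaker
    ENDPOINT_NAME = "vision-inspection-endpoint"
    runtime = boto3.client("sagemaker-runtime")

    def classify_in_cloud(frame):
        """Send one frame to a hosted model and return its prediction."""
        # Compress the frame so the payload uploads quickly
        ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 85])
        if not ok:
            raise RuntimeError("JPEG encoding failed")

        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/octet-stream",
            Body=jpeg.tobytes(),
        )
        # Assumes the deployed model returns a JSON document
        return json.loads(response["Body"].read())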
It has nevertheless become apparent that for real-time applications, the round trip between where the images are captured and the cloud simply takes too long. Such applications include robotics and other industrial systems, vehicle Advanced Driver Assistance Systems (ADAS), telesurgery and dozens more. Security is another issue, as the transmission path presents a large attack surface for hackers.
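A quick back-of-the-envelope calculation shows where the time goes. All of the figures in the sketch below (frame size, uplink speed, network and inference times) are assumed values chosen only to illustrate the orders of magnitude involved:

    # Rough latency budget for one 1080p frame, cloud versus edge (all figures assumed)
    FRAME_BYTES = 500_000          # ~0.5 MB after JPEG compression
    UPLINK_MBPS = 20               # assumed industrial uplink speed
    NETWORK_RTT_MS = 60            # assumed round trip to the nearest cloud region
    CLOUD_INFERENCE_MS = 20        # assumed model inference time on a cloud GPU
    EDGE_INFERENCE_MS = 15         # assumed model inference time on an embedded accelerator

    upload_ms = FRAME_BYTES * 8 / (UPLINK_MBPS * 1_000_000) * 1000
    cloud_total = upload_ms + NETWORK_RTT_MS + CLOUD_INFERENCE_MS
    edge_total = EDGE_INFERENCE_MS

    print(f"Cloud round trip: ~{cloud_total:.0f} ms per frame")   # roughly 280 ms
    print(f"On-device:        ~{edge_total:.0f} ms per frame")    # roughly 15 ms

Even with these generous assumptions, the cloud’s answer arrives roughly a quarter of a second after the frame was captured, long after a peanut has passed the ejector or a vehicle has traveled several meters.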
Processing images and video at the edge solves these problems by reducing latency to near zero and keeping critical data at its source. It also requires only a single, low-cost initial capital expenditure rather than the recurring cost of a cloud service, and it lets end-users continue to collect and process data if external communication is disrupted. But even minimal image processing requires considerable computing prowess, and until relatively recently that was either unavailable in a small footprint, too expensive, or both.
Fortunately, that’s no longer the case, as some companies have dramatically increased the performance of their devices by integrating the Central Processing Unit (CPU), Graphics Processing Unit (GPU) and AI processors on a single device. GPUs alone have evolved from 2D and 3D graphics processors into massively parallel “AI accelerators.” They can handle any level of image processing and machine learning directly on the camera, even when the data load is immense, as it is for high-resolution video. In fact, even 32-bit microprocessors and single-board computers can now perform a surprising amount of machine learning at very low cost and power consumption in a very small footprint.
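To give a sense of how little code on-device inference now takes, the sketch below runs a quantized image classifier through the TensorFlow Lite runtime, the sort of workload that fits comfortably on a single-board computer. The model file name and the camera frame are assumptions for illustration only:

    import numpy as np
    import cv2
    from tflite_runtime.interpreter import Interpreter   # lightweight runtime, no full TensorFlow

    # Assumed artifact: a quantized model file sits next to the script
    interpreter = Interpreter(model_path="classifier_int8.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    def classify(frame):
        """Run one frame through the on-device model and return the top class index."""
        h, w = input_details["shape"][1:3]                # model's expected input size
        resized = cv2.resize(frame, (w, h))
        tensor = np.expand_dims(resized, axis=0).astype(input_details["dtype"])

        interpreter.set_tensor(input_details["index"], tensor)
        interpreter.invoke()
        scores = interpreter.get_tensor(output_details["index"])[0]
        return int(np.argmax(scores))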
Putting it all together
That said, designers still face important decisions, from integrating hardware and application software to training computer vision models and incorporating existing models and computer vision libraries into the application. To make all this easier, some suppliers are integrating their compute units behind a single platform-level application programming interface. With one unified programming interface per vendor, programming a computer vision system should be easier whether the driving force is a CPU, GPU or Field Programmable Gate Array (FPGA).
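OpenCV’s “transparent API” gives a public flavor of the idea: wrapping an image in a UMat lets the very same function calls run on the CPU or be dispatched to an OpenCL-capable GPU, and the vendor platforms described above extend that principle to their own CPUs, GPUs and FPGAs. The sketch below is illustrative only, not any particular vendor’s API:

    import cv2

    img = cv2.imread("part.png")                 # assumed input image

    # Plain NumPy array: this edge-detection call executes on the CPU
    edges_cpu = cv2.Canny(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 50, 150)

    # Wrapping the image in a UMat lets OpenCV dispatch the same calls
    # to an OpenCL device (typically the GPU) when one is available
    cv2.ocl.setUseOpenCL(True)
    img_gpu = cv2.UMat(img)
    edges_gpu = cv2.Canny(cv2.cvtColor(img_gpu, cv2.COLOR_BGR2GRAY), 50, 150)

    # The result comes back as an ordinary array for the rest of the pipeline
    edges = edges_gpu.get()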
Better yet, at least one major camera manufacturer provides a complete edge-to-cloud package that combines the camera and the AI and machine learning capabilities of AWS with a custom camera software stack and an edge processing unit. It’s essentially a turnkey solution that eliminates the need to cobble together components from different vendors, an approach that will likely become commonplace.

Cloud, edge or something in-between?
As processing power and AI become more formidable, less expensive, more compact and more frugal with power, it’s logical to assume that soon there will be no need to send computer vision data to the cloud at all. That is unlikely, however: even though cloud data centers can be expensive, they offer extraordinary advantages in sheer processing power, unlimited storage and comprehensive analysis tools.
At the moment, data captured by cameras in vehicles stays local, as there is neither a need to connect with other vehicles nor the means to do so. When autonomous vehicles begin plying the roadways, however, many will use both cameras and lidar, so they will require very high levels of onboard processing power combined with the ability to share data with other vehicles, roadside sensors and other infrastructure. Some of this work will be performed in the vehicle, where low latency is critical, but aggregating data from all transportation-related sources will rely on the performance that only the cloud can provide.
For autonomous applications, the cloud-versus-edge decision comes down to whether enough processing power is available at the edge and whether split-second decision-making is needed. If the answer to both is “yes,” edge processing is mandatory and the cloud matters less. If not, the cloud can be an excellent choice. And, of course, the needs of many applications fall somewhere in between.

