
Discover what’s special about AI processor architectures

Nishant Nishant
What is an AI processor?

Microprocessors (MPUs) and microcontrollers (MCUs) are essential components in every embedded system, whether it is a connected consumer device such as a smart speaker or a complex industrial robot. Large or small, a processor typically integrates all key data flow and control functions into a single IC package, including interfacing with peripherals, providing wireless communications and performing computation.

It is normally the computational effort required by the application that dictates which MCU or MPU is selected for a specific design. The technical capability of both processor types continues to increase, and it is not uncommon for even low-cost devices to feature high-value functional blocks, such as multiple processor cores and cryptographic security.

Parallelism at the hardware level increases performance, but most software applications still execute sequentially. Even with multiple cores and multi-threaded operating systems, instructions and computations are pipelined. Artificial intelligence (AI) is changing that.

AI processing demands challenge computational norms

AI has rapidly become an intrinsic part of our technology engagement. AI and machine learning (ML) techniques are becoming ubiquitous, from smart speakers recognizing voice commands to vehicles identifying road signs and obstacles. Neural networks can perform a diverse set of image, video, and sound recognition tasks. Algorithms implement trained AI models, which run in place of or alongside conventional sequential programs, often in the same processor.

Sequential code relies on pre-defined options: coders must account for every possible logical outcome or risk invoking a runtime error. AI algorithms more closely mimic human intelligence. It would be impractical to represent, in explicit logic, every possible object an image sensor might capture, and it is technically infeasible for sequential code to quantify all outcomes based on pre-defined decisions.

AI allows system engineers to relax these expectations, handling inputs for which no specific decision was previously coded. Not surprisingly, AI algorithms represent a highly complex computational workload on a processor, requiring many mathematical matrix and vector multiplications to be performed rapidly to yield a probable result, termed inferencing, within acceptable time and power budgets. To deliver on this, the processor must execute multiple instructions on multiple data (MIMD) in parallel. The MIMD architecture paradigm isn't new, and applying it to AI achieves significant performance benefits.
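
To make this concrete, the heart of most inference workloads is the multiply-accumulate pattern sketched below in Python/NumPy. This is a minimal illustrative sketch, not vendor code, and the layer sizes are arbitrary assumptions; the point is the sheer volume of independent multiplications, which an AI processor executes concurrently rather than one at a time.

import numpy as np

# A single fully connected layer: each output is a weighted sum of all inputs.
# A sequential CPU performs these multiply-accumulates one after another;
# an AI accelerator performs large numbers of them in parallel.
def dense_layer(x, weights, bias):
    # weights: (outputs, inputs), x: (inputs,), bias: (outputs,)
    return np.maximum(weights @ x + bias, 0.0)  # ReLU activation

# Example: 256 inputs feeding 128 outputs = 32,768 multiply-accumulates
# for just one layer of one inference.
x = np.random.rand(256).astype(np.float32)
w = np.random.rand(128, 256).astype(np.float32)
b = np.zeros(128, dtype=np.float32)
y = dense_layer(x, w, b)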

What are AI processors?

To achieve computational concurrency, also termed parallelism, an AI processor's architecture differs from that of an ordinary processor. It will also feature high-speed on-chip memory to optimize execution speed and reduce latency. Parallelism was in use before the rise of AI, for graphical rendering and vision processing tasks, in the form of graphics processing units (GPUs). It's not surprising that leaders in GPUs are now at the forefront of AI.

Some neural processing unit (NPU) architectures build on GPU fundamentals, but introduce low-power capabilities, an essential attribute for edge AI applications.

AI processors are neural network accelerators that use a parallel architecture to speed processing tasks. Some accelerators are optimized for specific types of neural networks. For example, a convolutional neural network (CNN) is typically used for image recognition tasks, whereas a recurrent neural network (RNN) suits speech recognition applications.
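
As an illustration of why CNNs map so well onto parallel hardware, the naive 2D convolution below (a teaching sketch in Python/NumPy, not how an accelerator actually implements it) shows that every output pixel is an independent weighted sum, so all of them can, in principle, be computed at once.

import numpy as np

def conv2d(image, kernel):
    # Slide a small weight kernel across the image; each output pixel is an
    # independent multiply-accumulate, which is ideal for parallel hardware.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=np.float32)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A 3x3 edge-detection kernel applied to a 28x28 grayscale image
# produces a 26x26 feature map: 676 independent weighted sums.
image = np.random.rand(28, 28).astype(np.float32)
kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=np.float32)
features = conv2d(image, kernel)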

Comparison of general-purpose and AI processors

General-purpose processor:
- A general-purpose microcontroller or microprocessor used in a wide range of embedded systems.
- May have one or more central processing unit (CPU) cores.
- Architecture is optimized for sequential tasks, making AI workloads slow and less efficient.

AI processor:
- Specifically designed to accelerate AI workloads using AI/ML neural network algorithms.
- The accelerator may be integrated on-die with ordinary processor cores for space- and energy-efficient designs.
- Architecture is optimized for parallelism, performing huge numbers of matrix calculations concurrently.

The fundamental differences between general-purpose and AI-optimized processors relate to their architecture, level of parallelism, and memory management.

Although AI processors are available as discrete ICs, the industry trend for many Internet of Things (IoT) and Industrial Internet of Things (IIoT) applications is to combine a neural network accelerator with a conventional processor. Many AI use cases and workloads still need traditional capabilities, such as reading sensors or performing simple mathematical operations. An integrated solution that combines the old with the new delivers an optimal, cost-effective and energy-efficient IC.

How to choose an AI processor

The more complex the neural network, the more computational and storage resources an AI processor requires. Semiconductor vendors offer a variety of neural network-optimized devices, ranging from ultra-low-power microcontrollers with integrated AI accelerator technology, able to perform 100 giga operations per second (GOPS), to high-performance discrete vision processors capable of 20 tera operations per second (TOPS).

Some notable features differentiating AI processors include the number of operations per second based on the maximum clock speed, the type and number of traditional processor cores, the weight storage capacity of the neural network accelerator, and the power consumption characteristics.
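
A rough way to relate those throughput figures to a design is the back-of-envelope estimate below. The numbers are illustrative assumptions, not vendor specifications; the rule of thumb is that a fully connected layer with M x N weights costs roughly 2 x M x N operations (one multiply and one add per weight).

# Back-of-envelope inference-rate estimate (illustrative figures only).
ops_per_inference = 2 * 128 * 256 * 10    # e.g., ten 128x256 dense layers
device_gops = 100                         # a 100 GOPS microcontroller-class device
inferences_per_second = (device_gops * 1e9) / ops_per_inference
print(f"~{inferences_per_second:,.0f} inferences/s at peak throughput")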

Weights are a key aspect of a neural network; they are an analog of the strength of a neuron's signal in the human brain. A weight's value dictates the influence an input to the network has on the output. A low-power microcontroller with a neural network accelerator might offer something like 400k 8-bit weight capacity, compared to a high-performance discrete device offering gigabytes of 16-bit weight storage.
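
The same style of estimate applies to weight storage. In this hedged sketch the model size is a hypothetical figure; with 8-bit quantization each weight occupies one byte, so the whole model must fit within the accelerator's weight capacity.

# Weight-storage estimate: 8-bit quantization stores one byte per weight.
model_weights = 380_000              # hypothetical keyword-spotting model
bytes_needed = model_weights * 1     # 1 byte per 8-bit weight
weight_capacity = 400_000            # e.g., a 400k 8-bit-weight accelerator
print(f"Model needs {bytes_needed / 1024:.0f} KiB; fits: {bytes_needed <= weight_capacity}")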

AI processors continue to evolve, outstripping the performance characteristics of ordinary high-performance microprocessors. For example, in late 2024, a leading European semiconductor vendor released a family of microcontrollers with neural network accelerators designed for edge AI applications, capable of 600 GOPS, a 600x increase compared to its leading high-performance microprocessor.

For the product engineering team, the desired use case dictates the type of device selected. In a consumer fitness tracker, for example, AI algorithms can recognize control gestures and types of activity. Inferring a gesture takes significantly less processing than detecting and differentiating between multiple objects in an automotive vision system, such as pedestrians, other vehicles, and road signs.

What are the differences between edge AI and cloud AI?

As AI processors have continued to evolve, the range of applications they can support has multiplied. One significant trend is edge AI, where neural network models infer results at the point of data acquisition. Unlike a cloud-based AI approach, which sends data to a neural network running in a data center, conducting inference at the edge yields many benefits, including reduced latency, improved security and less need for constant communication.

From a consumer confidence perspective, this also means that edge devices, such as smart speakers, are not constantly transferring ambient conversations to the cloud. Although edge-based AI applications are trendy, cloud AI applications still feature heavily. Cloud AI offers access to dynamic and scalable resources for computationally intensive applications with enormous datasets, suiting medical image analysis, financial fraud detection and uncovering retail data trends.

AI processors differ architecturally

Conventional sequential processors typically use a Harvard or von Neumann architecture. AI neural network accelerator and processor architectures are optimized for parallelism, allowing hundreds or thousands of concurrent calculations. By combining an AI processor with a conventional processor on the same die, embedded system developers reap the benefits of both architectures.

 
