
Use multicore microcontrollers for higher speed at lower power

Nishant Nishant
[Image: MCU on a PCB]

Due in part to licensable cores, dominated by Arm, there are many vendors in the microcontroller marketplace with products for every market niche. RISC-V microcontrollers take advantage of the open-source nature of the underlying instruction-set architecture (ISA) to support the development of devices with custom instructions that are finely tuned for their target applications.

Increased performance and functionality now come without proportional increases in energy consumption. Architectural improvements brought by multicore devices contribute to efficiency. Originally the preserve of high-end computing platforms, multicore architectures are now commonplace in the microcontroller market.

Many of the multicore products developed for this space use heterogeneous-multicore architectures. This contrasts with the homogeneous architectures more often found in high-performance computing applications.

What advantages do multicore devices offer?

Heterogeneous architectures combine different cores on the same integrated circuit, each tuned to particular needs. For example, some microcontrollers integrate an additional core just to run a Bluetooth protocol stack or provide similar communications. The second core may even be a digital signal processor (DSP).

Homogeneous architectures have multiple instantiations of the same processor core on the chip. These devices can share the software workload across many cores to accelerate performance. In between sit hybrid approaches: devices with binary-compatible cores that have different performance attributes.

Until the mid-2000s, the chief mechanism for improving performance was to develop increasingly complex microarchitectures running at higher clock speeds to allow single threads to run as fast as possible. However, this came at the cost of unsustainably higher power consumption.

A multicore platform leverages silicon scaling as it continues to follow Moore’s Law. The advances come at a lower cost per function. Splitting software into multiple threads and executing them in parallel has also proven to be more energy efficient. This approach allows simpler microarchitectures to offer lower average power consumption. And it provides the ability to shut down cores when they are not needed.

Is it harder or easier to program more microcontroller cores?

Using multiple cores requires some planning to maximize performance and minimize the risk of encountering bugs that are difficult to find.

With a single core, the assumption is that only one thread is running at any time. This simplifies the synchronization of data shared by tasks. With a multicore architecture, the developer needs to take extra care. Many threads accessing the same data could be running concurrently.

That care typically involves locking: code that lets a task declare ownership of a variable and prevents others from modifying its data until the variable is unlocked. Failure to do so can cause data corruption. It can also lead to subtle bugs such as race conditions, which can be troublesome in I/O-intensive embedded systems. These affect sections of code where two or more events need to occur in a particular sequence for the code to complete correctly.
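The sketch below illustrates the locking pattern on a desktop host using POSIX threads; the names are purely illustrative, and on a microcontroller the same role is played by an RTOS mutex or a hardware spinlock. Without the lock, the two threads' read-modify-write sequences can interleave and updates are silently lost.

```c
#include <pthread.h>
#include <stdio.h>

/* Shared state: without the mutex, two cores could interleave the
 * read-modify-write and lose updates (a classic race condition). */
static long sample_count = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&count_lock);   /* declare ownership        */
        sample_count++;                    /* critical section         */
        pthread_mutex_unlock(&count_lock); /* release for other tasks  */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, producer, NULL);
    pthread_create(&t2, NULL, producer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("final count: %ld\n", sample_count); /* always 200000 with the lock */
    return 0;
}
```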

Can using more microcontroller cores result in lower power consumption?

The better energy efficiency offered by heterogeneous multicore architectures is one important reason for using them in embedded systems. One approach that can save power is the dual-core microcontroller, which marries a high-performance processor core with an ultra-low-power worker core.

The ‘big.LITTLE’ concept was pioneered by Arm. The smaller core can run simple tasks at lower power than the high-performance core. It may, for example, handle I/O from peripherals while the main processor is in sleep mode. The main processor wakes only when there is sufficient data to work on or when the software running on the worker core encounters a condition that needs more processing power. The result is a system that runs at a fraction of its peak power demand most of the time.
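As a rough sketch of that pattern, the loop below shows a worker core batching sensor I/O and waking the main core only when a full batch is ready. The functions lp_core_read_sensor(), wake_main_core() and enter_light_sleep() are hypothetical placeholders; every vendor SDK exposes its own equivalents.

```c
#include <stdint.h>

/* Hypothetical hooks -- each vendor SDK names these differently.      */
extern uint16_t lp_core_read_sensor(void);   /* runs on the small core */
extern void     wake_main_core(void);        /* signal the big core    */
extern void     enter_light_sleep(void);     /* small core idles here  */

#define BATCH_SIZE 64u              /* wake the big core per 64 samples */
static uint16_t batch[BATCH_SIZE];  /* buffer shared with the big core  */

/* Main loop of the low-power worker core: collect I/O on its own and
 * only wake the high-performance core when a full batch is ready.      */
void worker_core_main(void)
{
    uint32_t filled = 0;

    for (;;) {
        batch[filled++] = lp_core_read_sensor();

        if (filled == BATCH_SIZE) {
            wake_main_core();   /* big core processes the batch, then sleeps */
            filled = 0;
        }

        enter_light_sleep();    /* sleep until the next sensor interrupt */
    }
}
```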

More recent microcontroller designs have augmented heterogeneous-multicore approaches with more specialized cores or even application-specific accelerators. This provides better energy efficiency for the target tasks. A common accelerator function is security, implementing the modulo arithmetic needed for cryptography in hardware.

Asymmetric multicore implementation

[Figure: Example multicore MCU implementation block diagram]

Multicore microcontrollers combine two or more cores, sometimes dedicating them to specific functions. The asymmetric multicore implementation shown here combines cores with different performance levels; some may even have different instruction sets.

How integrated are multicore microcontrollers?

The level of integration and task sharing will depend on the specific product. Many support a fully shared architecture where all peripherals are available to all cores on the device. This provides flexibility but can lead to issues with security or memory contention if the high-performance cores need to wait for worker cores to perform many short I/O transfers that do not exploit the full bandwidth of the shared bus. Partitioning the device into shared-memory clusters interconnected by a lower-speed bus or network-on-chip makes it easier to optimize performance.

Other reasons for partitioning may revolve around security or safety. Some cores may be allowed direct access to a security coprocessor. Or there may be cores that operate in strict lockstep to provide redundancy for safety-critical routines. Other on-chip processors can run non-critical code. This form of partitioning limits the amount of code that needs to be certified to high levels of security, safety or both. It also minimizes cost when implementing software that does not need strong guarantees of availability or security.

Running an RTOS on a multicore device

Traditionally, real-time operating system (RTOS) implementations were designed to run on a single core. However, vendors have responded with RTOS implementations that can run across more than one core in a system. They implement the core synchronization operations needed to ensure tasks can execute safely when running simultaneously on different processor cores.

Such an RTOS can dynamically schedule tasks to run on any available core and, in doing so, perform load balancing that minimizes task latency. A common feature of multicore-aware RTOS designs is the ability to lock threads to specific cores. Developers can use this technique to guarantee execution time to important tasks.
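A minimal sketch of task pinning, assuming the FreeRTOS SMP kernel with configUSE_CORE_AFFINITY enabled; the exact API names and their availability depend on the kernel version and the silicon vendor's port.

```c
#include "FreeRTOS.h"
#include "task.h"

static void vControlLoop(void *pvParameters)
{
    (void)pvParameters;
    for (;;) {
        /* time-critical work ... */
        vTaskDelay(pdMS_TO_TICKS(1));
    }
}

static void vLogger(void *pvParameters)
{
    (void)pvParameters;
    for (;;) {
        /* background housekeeping ... */
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

void create_tasks(void)
{
    /* Pin the control loop to core 0 so its execution time is not
     * disturbed by load balancing. */
    xTaskCreateAffinitySet(vControlLoop, "ctrl", 512, NULL,
                           configMAX_PRIORITIES - 1,
                           (1 << 0),          /* affinity mask: core 0 only */
                           NULL);

    /* Let the scheduler place the logger on any available core. */
    xTaskCreateAffinitySet(vLogger, "log", 512, NULL,
                           tskIDLE_PRIORITY + 1,
                           tskNO_AFFINITY,
                           NULL);
}
```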

Mixed-RTOS example

[Figure: Example mixed-RTOS system block diagram]

A mixed-RTOS system example with multicore and single-core support. The main benefit here is integration. While the software applications and core architectures may be different, the overall device will have shared resources.

In multicore architectures that allow memory spaces to be partitioned under hardware control, it is also possible to run a multicore RTOS across several cores in parallel with a single-core RTOS that is dedicated to another single processor. This is a scenario often used by integrators working with heterogeneous multicore microcontrollers. The single-processor RTOS is used to manage I/O transactions and low-level operations. The multicore RTOS then supports application-level code.
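One common way to pass work between the I/O-focused cores and the application cores is a small mailbox in shared memory. The sketch below is a generic single-producer, single-consumer ring buffer built with C11 atomics; it is not any particular vendor's IPC framework, and the names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u   /* must be a power of two */

/* One mailbox placed in a memory region visible to both cores. The I/O
 * core is the only producer and the application core the only consumer,
 * so no lock is needed -- only acquire/release ordering on the indices. */
typedef struct {
    uint32_t         slots[RING_SIZE];
    _Atomic uint32_t head;   /* written by the producer */
    _Atomic uint32_t tail;   /* written by the consumer */
} ipc_ring_t;

bool ipc_send(ipc_ring_t *r, uint32_t msg)        /* called on the I/O core */
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;                             /* ring full, try later */

    r->slots[head & (RING_SIZE - 1u)] = msg;
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

bool ipc_receive(ipc_ring_t *r, uint32_t *msg)    /* called on the app core */
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;                             /* nothing pending */

    *msg = r->slots[tail & (RING_SIZE - 1u)];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```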

Synchronous and asynchronous processing

Many embedded developers will be familiar with both asynchronous and synchronous processing, even on single-core systems. However, they may not have considered the distinction between the two forms of inter-task cooperation.

Synchronous processing is more familiar as this is the type of communication used to pass data to and from subroutines in the same task. In this situation, execution needs to stop in the calling routine before the called subroutine can perform its work. At completion, the subroutine passes the results back to the caller, which then restarts execution.

In all but the simplest embedded systems, there will be interrupt handlers set up to respond to incoming requests from hardware controllers. These handlers are activated asynchronously to the running task. The data they save to memory is later accessed by a task that may have to wait for some time before the RTOS makes it ready to run. The scheduling priority of that task will determine the wait.

Multicore operation has the same mixture of processing styles but extends the choices available to the developer. Synchronous processing may be used where one task requesting another needs to wait for the answer before it continues execution. In this scenario, the RTOS will suspend the task and run another on one processor until the responding task, which may run on a different core, provides the required answer.
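The fragment below sketches that synchronous pattern using FreeRTOS queues as the transport (an assumption; any RTOS message-passing primitive works the same way). The caller hands off a request and suspends until the reply arrives, freeing its core for other tasks in the meantime.

```c
#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

typedef struct {
    uint32_t input;
    uint32_t result;
} request_t;

static QueueHandle_t xRequestQueue;  /* caller -> worker (possibly another core) */
static QueueHandle_t xReplyQueue;    /* worker -> caller                         */

void ipc_queues_init(void)
{
    xRequestQueue = xQueueCreate(8, sizeof(request_t));
    xReplyQueue   = xQueueCreate(8, sizeof(request_t));
}

/* Synchronous call: the calling task blocks (so the RTOS can run other
 * work on this core) until the responding task, possibly on a different
 * core, posts the answer. */
uint32_t filter_block_synchronously(uint32_t sample)
{
    request_t req = { .input = sample, .result = 0 };

    xQueueSend(xRequestQueue, &req, portMAX_DELAY);   /* hand off the work  */
    xQueueReceive(xReplyQueue, &req, portMAX_DELAY);  /* suspend until done */
    return req.result;
}
```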

Two approaches to software task processing

Explanation_Asynchronous_Synchronous_Processing

The core difference between asynchronous and synchronous processing is shown here. Tasks will either be executed in parallel or queued, based on dependencies.

Tasks may also make use of asynchronous processing. In this situation, one task issues a request to another task but continues working on other data instead of suspending operation to wait for an answer. If, for example, the request is a write operation to a peripheral, the task may not need an answer at all, or may only need to confirm that the output buffer has enough space for a subsequent transaction.
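Continuing the FreeRTOS-based sketch above, an asynchronous request simply posts the data and carries on; the queue and function names are illustrative.

```c
#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

extern QueueHandle_t xTxQueue;   /* drained by a driver task or another core */

void log_sample_asynchronously(uint32_t sample)
{
    /* Fire and forget: do not block if the buffer is full, just note
     * the drop and keep working on other data. */
    if (xQueueSend(xTxQueue, &sample, 0) != pdTRUE) {
        /* Buffer full -- the caller continues and can retry later. */
    }

    /* The task carries on with other work here; before queuing a larger
     * burst it can check uxQueueSpacesAvailable(xTxQueue) instead of
     * waiting for each individual transfer to complete. */
}
```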

In other cases, asynchronous operations provide the opportunity to run more code before the calling task needs to check on the status of a request, and potentially block until a successful outcome is reported. Overall, a mixture of synchronous and asynchronous processing styles gives developers opportunities to optimize the performance of multicore hardware.

Conclusion

Silicon scaling has made multicore designs ever more affordable for a wide range of applications, even enabling ultra-low power microcontroller architectures. Implementing software for these designs involves some changes, but it is relatively simple for embedded developers to take advantage of these products.

> Avnet can supply the 32-bit microcontroller you need

About Author

Nishant Nishant
Avnet Staff

We use Avnet Staff as a collective byline when our team of editors and writers collaborate on the content.
