Feature Story



CLIENT: HSA FOUNDATION

Dec. 19, 2017: Chip Design

Developing Heterogeneous Cache Coherent SoCs

The needs of automotive and other customers are not what they once were.

As with any design challenge, successfully developing heterogeneous cache coherent SoCs demands an understanding of your customers’ requirements, and interviewing your customers builds that understanding. For example, through the interview you should learn:

  • How many IPs need to connect to the heterogeneous system;
  • What kind of bandwidth each IP requires;
  • The types of IPs that are in the system;
  • Which features to enable in the interconnect IP.
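
The answers to those interview questions can be captured in a simple requirements model. The sketch below is illustrative only — the IP names and bandwidth figures are hypothetical, not from any real design — but it shows one way to record per-IP requirements and derive what the interconnect must sustain:

```python
from dataclasses import dataclass

@dataclass
class IPRequirement:
    """Connection requirements for one IP block (all fields illustrative)."""
    name: str              # instance name, e.g. "cpu_cluster"
    ip_type: str           # "CPU", "GPU", "DSP", "USB", ...
    bandwidth_gbps: float  # sustained bandwidth this IP requires
    coherent: bool         # does the IP participate in cache coherency?

def total_bandwidth(reqs):
    """Aggregate bandwidth the interconnect must sustain."""
    return sum(r.bandwidth_gbps for r in reqs)

def coherent_ips(reqs):
    """Names of the IPs that must attach to the coherent domain."""
    return [r.name for r in reqs if r.coherent]

# Hypothetical SoC: two coherent processors plus a non-coherent peripheral.
reqs = [
    IPRequirement("cpu_cluster", "CPU", 25.6, True),
    IPRequirement("gpu", "GPU", 51.2, True),
    IPRequirement("usb3", "USB", 0.5, False),
]
```

Real interconnect configuration flows are vendor-specific; a model like this simply makes the interview answers explicit enough to reason about before configuring anything.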

The next step is to define “heterogeneity” because, while many people use the word “heterogeneous,” it has a number of meanings. Some guidelines:

  • You must have different types of processors within the same system;
  • Different processor types also have different cache structures. For example, an Arm CPU would use the same cache structure as another Arm core, but a CPU of a different type may have a different cache structure;
  • Different types of IPs must also be considered:
    • CPUs, GPUs, and DSPs
    • IPs that make up an SoC, such as those for connectivity, USB, SATA, etc.

A highly flexible snoop filter architecture accommodates different cache structures of different kinds of processors. It also reduces the number of memory bits required to perform snoop filtering.
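To make the snoop-filtering idea concrete, here is a minimal directory-style sketch in Python. It is illustrative only, not ArterisIP’s architecture: it tracks which caching agents may hold each line, so a memory request snoops only those agents instead of broadcasting to every cache:

```python
class SnoopFilter:
    """Minimal directory-style snoop filter sketch (illustrative).
    Tracks which caching agents may hold a copy of each line, so a
    request snoops only the recorded sharers."""

    def __init__(self):
        self.sharers = {}  # line address -> set of agent ids

    def record_fill(self, addr, agent):
        """An agent fetched the line into its cache."""
        self.sharers.setdefault(addr, set()).add(agent)

    def record_evict(self, addr, agent):
        """An agent evicted the line; drop the entry once no sharers remain."""
        s = self.sharers.get(addr)
        if s:
            s.discard(agent)
            if not s:
                del self.sharers[addr]

    def agents_to_snoop(self, addr, requester):
        """Agents that must be snooped for this request; everyone else is skipped."""
        return self.sharers.get(addr, set()) - {requester}

sf = SnoopFilter()
sf.record_fill(0x40, "cpu0")
sf.record_fill(0x40, "gpu")
```

The memory saving comes from tracking presence per line rather than duplicating every cache’s tags; a real snoop filter would also bound the directory with set-associative storage and back-invalidate on capacity conflicts.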

Adapt to Changing Customer Needs

Understanding the customer’s requirements for both coherency and non-coherency is a must. Are the coherent and non-coherent domains fully separated, fully merged, or a customized mix? ArterisIP, for instance, has developed a component called a non-coherent bridge, whose purpose is to drive non-coherent accesses into the coherent domain.
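The behavior of such a bridge can be sketched as follows. This is an assumed model, not the actual ArterisIP component: a read from a non-coherent master is served by snooping the coherent caches for the freshest copy of the line before falling back to memory:

```python
def bridged_read(addr, coherent_caches, memory):
    """Assumed non-coherent bridge behavior: snoop the coherent caches for
    the most recent copy of a line; on a miss everywhere, read main memory."""
    for cache in coherent_caches:   # snoop each coherent cache
        if addr in cache:
            return cache[addr]      # a cache supplies its (possibly dirty) copy
    return memory[addr]             # no cached copy: memory holds current data

# A CPU cache holds a modified copy of line 0x1000 that memory hasn't seen yet.
cpu_cache = {0x1000: 42}
memory = {0x1000: 7, 0x2000: 9}
```

Without the bridge, the non-coherent master would read the stale value straight from memory; the bridge makes its accesses participate in the coherency protocol.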

A few years ago, coherency systems were small and compact, with a maximum of three to four different processors. Coherency was confined to CPU clusters, functionality was grouped under an application, and coherency wasn’t necessarily distributed beyond a subsystem.

However, customer needs are changing, and today there is a need for greater processor performance. Companies are adding more and different types of processors. In addition:

  • SoC layouts are expanding tremendously;
  • Processors are growing larger;
  • Complex layouts are affecting the coherency domain;
  • The coherent domain is expanding across the entire chip.

So how do you handle all of this? First, you must make sure the infrastructure is designed to distribute coherency system-wide. The interconnect technology must enable network packet transport and accommodate a variety of topologies, such as ring and mesh. The infrastructure must also be configurable and flexible because, as design complexity continues to grow, designers need to understand which topologies best suit a particular chip layout. Having the proper tools to predict where complexities might cause performance and power issues at the chip layout stage is critical to adapting to the layout and discovering which topology best resolves those issues.
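Why topology choice matters can be illustrated with average hop counts. The sketch below is a deliberate simplification — it ignores link widths, routing contention, and physical placement — but it shows how a bidirectional ring and a 2D mesh compare for the same node count:

```python
def ring_avg_hops(n):
    """Average shortest-path hops between node pairs on an n-node
    bidirectional ring (each hop can go clockwise or counterclockwise)."""
    total = sum(min(d, n - d)
                for s in range(n)
                for d in ((t - s) % n for t in range(n)))
    return total / (n * n)

def mesh_avg_hops(rows, cols):
    """Average hops between node pairs on a rows x cols mesh,
    assuming dimension-ordered (XY) routing: Manhattan distance."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1) in nodes
                for (r2, c2) in nodes)
    return total / (len(nodes) ** 2)

ring16 = ring_avg_hops(16)    # 4.0 hops on average
mesh4x4 = mesh_avg_hops(4, 4) # 2.5 hops on average
```

For 16 nodes, the ring averages 4.0 hops per transfer while the 4×4 mesh averages 2.5 — one reason meshes become attractive as coherent domains spread across the chip, at the cost of more complex routing and layout.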

Optimizing Power Consumption of Complex Systems

To optimize for power, you first need to provide a power-ready IP. Once this is accomplished, you need to implement some tried-and-true techniques, which may include voltage domains, power domains, clock gating, and high-level clock gating.

When an IP is power-ready, it has connectivity to a power interface and can be controlled by a power management unit (PMU) in the system. The PMU decides when to shut down the IP, i.e., when it is not in use or not needed by the system. At the application level, this power-aware controller can lower system power consumption by putting an IP into an idle state.
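As a toy illustration of that control loop — assumed behavior, not any real PMU’s policy — the sketch below powers an IP down after a fixed number of consecutive idle cycles and wakes it again on demand:

```python
class PMU:
    """Toy power-management-unit sketch (illustrative): shuts down a
    power-ready IP after `idle_threshold` consecutive idle cycles."""

    def __init__(self, idle_threshold=3):
        self.idle_threshold = idle_threshold
        self.idle_cycles = {}    # ip name -> consecutive idle cycle count
        self.powered_on = set()  # ips currently powered

    def tick(self, ip, active):
        """Advance one cycle for `ip`, observing whether it did any work."""
        if active:
            self.idle_cycles[ip] = 0
            self.powered_on.add(ip)          # wake the IP on demand
        else:
            self.idle_cycles[ip] = self.idle_cycles.get(ip, 0) + 1
            if self.idle_cycles[ip] >= self.idle_threshold:
                self.powered_on.discard(ip)  # shut down an unused IP

pmu = PMU(idle_threshold=3)
for active in (True, False, False, False):
    pmu.tick("usb3", active)
print("usb3" in pmu.powered_on)  # False -- idled after 3 inactive cycles
```

A real PMU would sequence isolation cells, retention, and clock/power gating through the IP’s power interface rather than flipping a set membership, but the decision logic — observe activity, act on idleness — is the same shape.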

Maturing to Meet Challenges

Heterogeneous SoCs are still developing and haven’t yet matured. But processors in the coherent domain now share data with each other, and CPUs and GPUs have become cache coherent, although I’m confident we can do a lot more.

Moreover, data sharing is needed not only between the processor and the GPU, but among all the IPs of the system, a concept that is still a work in progress. This idea must be pushed a bit further to achieve total coherency. Today, few non-coherent IPs share data with coherent IPs, but applications are emerging that need coherency, and this will bring new requirements.

Some of these design challenges are hindering product development, for example in automotive Advanced Driver-Assistance Systems (ADAS). Automotive applications have demanding performance requirements and must share data among heterogeneous processors to meet them. We’ll see the introduction of new features to this market; other markets will include artificial intelligence and machine learning.

A decade ago, mobile application processors were driving the need for cache coherency. Next, data center systems took over as the primary drivers. Now the automotive market is fuelling the race to extend cache coherency to all of the heterogeneous processing elements in SoCs. In two or three years, a new trend will emerge to extend heterogeneous cache coherency even further, but designers will need flexibility, configurability, and scalability to ensure that these systems are high-performance, low-latency, and power- and cost-efficient.
