Feature Story

Heterogeneous System Architecture Foundation Unveils HSA 1.1

Today's SoCs are complicated amalgamations of components such as CPUs, GPUs, DSPs, FPGAs, fabrics and fixed-function accelerators, integrating various open and proprietary IP blocks (reusable units of logic, cell or chip layout designs) into a single environment. Tying these distinct components together into one streamlined SoC requires multiple toolchains, profiling methods and debug tools.

The Heterogeneous System Architecture (HSA) creates an abstraction layer and runtime that ties the system together so that a programmer does not have to worry about the underlying processor, which simplifies development. The architecture consists of three components: a programmer's reference manual, which defines the virtual instruction set that abstracts the underlying processor type; a runtime; and a system architecture specification that explains how to build components so that they can communicate with the different processors. HSA allows programmers to employ many common languages, such as Python, Java, OpenCL 2+, OpenMP and C++ AMP (with C++17 planned for the future), and also lets the programmer write the control and compute code in the same programming language, if so desired.

The HSA Foundation, founded in 2012, comprises 40 companies, including chip vendors, tool providers and software developers, as well as 17 universities. The foundation operates under an open architecture, which means that all of its work is published and shared, and can be used by anyone royalty free.

AMD's Carrizo APU, which features a CPU and GPU on the same die, serves as an example of an SoC that operates in a heterogeneous environment. It was the first HSA 1.0-compliant hardware released to the market.

HSA provides a pool of cache-coherent shared virtual memory that eliminates data transfers between components to reduce latency and boost performance. For instance, when a CPU completes a data processing task, the data may still require graphical processing. Without shared memory, the CPU must pass the data from its memory space to the GPU memory, after which the GPU processes the data and returns it to the CPU. This round trip adds latency and incurs a performance penalty. Shared memory instead allows the GPU to access the same memory the CPU was utilizing, which shrinks and simplifies the software stack.
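The difference between the two data paths can be illustrated with a minimal sketch. This is a conceptual model only: plain Python buffers stand in for CPU and GPU memory, and the names `gpu_kernel`, `offload_with_copies` and `offload_shared` are hypothetical and not part of any HSA API.

```python
# Conceptual model only: Python buffers stand in for CPU and GPU memory.
# All function names here are hypothetical, not HSA runtime calls.

def gpu_kernel(buf):
    """Stand-in for GPU work: double every byte in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def offload_with_copies(cpu_data):
    """Separate-memory model: copy to the device, compute, copy back."""
    bytes_copied = 0
    gpu_memory = bytearray(cpu_data)    # host-to-device copy
    bytes_copied += len(gpu_memory)
    gpu_kernel(gpu_memory)
    cpu_data[:] = gpu_memory            # device-to-host copy
    bytes_copied += len(gpu_memory)
    return bytes_copied


def offload_shared(cpu_data):
    """Shared-memory model: the device works on the very same buffer."""
    shared_view = memoryview(cpu_data)  # a view of the same memory, no copy
    gpu_kernel(shared_view)
    return 0                            # nothing was copied


if __name__ == "__main__":
    a = bytearray(range(8))
    b = bytearray(range(8))
    print("copied bytes:", offload_with_copies(a), "result:", list(a))
    print("copied bytes:", offload_shared(b), "result:", list(b))
```

Both paths produce the same result, but the copy-based path moves every byte twice, while the shared path moves none. Real hardware also has to keep caches coherent, which this sketch does not model.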

Cache coherency is a common tool employed in server environments to streamline operations, but HSA enables application of the technique anywhere one can find an SoC, including a broad range of devices in the client desktop, mobile, and tablet segments, among others.

HSA v1.1 can be applied to any HSA 1.0-compliant architecture without hardware changes. It expands support from a single vendor to multiple vendors, which allows greater interoperation between HSA-compliant and non-HSA-compliant devices. The foundation also included a strict formal definition of the HSA memory model and added system-level profiling capabilities built on an architected event/timestamp model, which lets users access hardware information for profile-guided optimization or user code analysis (in any language).

The new revision also includes QoS (Quality of Service) enhancements and expanded runtime improvements, including the ability for agents to wait on multiple signals. Other improvements include non-temporal memory access, which lets hardware avoid retaining rarely reused data that would otherwise pollute a cache layer, and high-level debug information in an updated Finalizer code object.
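Waiting on multiple signals can be pictured with a small threading sketch. It is an illustration under stated assumptions, not the HSA runtime interface: the `SignalGroup` class and its methods are hypothetical, and each signal is modeled as an integer that starts at 1 (pending) and drops to 0 when the associated task completes.

```python
import threading


class SignalGroup:
    """Hypothetical stand-in for a set of completion signals (not an HSA API)."""

    def __init__(self, count):
        self._values = [1] * count        # 1 = pending, 0 = complete
        self._cond = threading.Condition()

    def complete(self, index):
        """Called by a worker (an 'agent') when its task has finished."""
        with self._cond:
            self._values[index] = 0
            self._cond.notify_all()

    def wait_any(self, timeout=None):
        """Block until any signal reaches 0.

        Returns the index of a completed signal, or None on timeout.
        """
        with self._cond:
            done = self._cond.wait_for(lambda: 0 in self._values, timeout=timeout)
            return self._values.index(0) if done else None


if __name__ == "__main__":
    group = SignalGroup(3)
    # Simulate one agent finishing task 2 shortly after the wait begins.
    timer = threading.Timer(0.05, group.complete, args=(2,))
    timer.start()
    print("first completed signal:", group.wait_any(timeout=5))
    timer.join()
```

A single blocking wait over the whole group replaces polling each signal in turn, so the waiting thread wakes only when there is work to consume.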

The HSA Foundation began with the goal of solidifying hardware support, and its next steps focus on software optimization. As more platforms become available, such as the broad AMD Carrizo product stack from vendors that include Dell, Asus and Lenovo, the foundation expects the ecosystem to expand rapidly.

HSA v1.1 is available for download from GitHub and can be used free of charge.
