Oleg Zabluda's blog
Thursday, September 08, 2016
 
ARM adds 2048-bit vectors to v8A with SVE
"""
ARM unveiled their SVE extensions for supercomputing. [...] Scalable Vector Extension and it does indeed scale from 128 to 2048 bits in 128b chunks. It is an optional ISA extension for ARM v8-A/AARCH64 for use in supercomputing, not consumer or media type work. While it may fit some of those workloads, it is not NEON v2, it is separate and distinct by design. It also isn’t fully finalized and public; that release is expected in late 2016 or early 2017, with silicon bearing SVE not expected until 2019 or 2020.
[...]
The nice thing about SVE is that it is vector length agnostic: your hardware can range from 128-2048b and the code can be written for 128-2048b vector units and they don’t have to match. If your vectors are 2048b wide and the hardware is only 128b wide, code will automatically run in 16 passes. If the code is 128b wide vectors and the hardware is 2048b wide, 15/16ths of the hardware will be powered down.
[...]
Given that the marquee customer is Fujitsu and their Post-K supercomputer [to replace Sparc] which will use a 512b wide SVE pipe, you can be pretty sure their data will come in 512b increments. Others making SVE enabled silicon for similar projects can pick the physical widths to suit their projects.

One thing SVE won’t do is pack unfilled vector units with multiple disparate instructions automatically. If you have a 512b SVE unit and four independent 128b vectors, the hardware will not automagically run them in one cycle. If you have a compiler that can pack this type of work together beforehand, you win, but the hardware won’t do it for you. This plus the ISA itself is why SVE isn’t really suited for consumer or image processing work.
[...]
scatter/gather, per-lane predication, and predicate–based loop control.
"""
http://semiaccurate.com/2016/09/07/arm-adds-2048-bit-vectors-v8a-sve/
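
To make the "vector length agnostic" and "predicate-based loop control" points from the quote concrete, here is a rough software model in plain Python. It is only a sketch, not SVE code: `whilelt` and `vla_add` are names I made up for illustration (the first is loosely modelled on the SVE WHILELT instruction), and 32-bit elements are an assumption. The same loop runs unchanged for any hardware width from 128 to 2048 bits, and the predicate masks off the lanes that would run past the end of the data.

```python
def whilelt(i, n, lanes):
    """Hypothetical helper: lane k is active while i + k < n."""
    return [i + k < n for k in range(lanes)]


def vla_add(a, b, vector_bits, elem_bits=32):
    """Add two equal-length lists on a simulated vector unit.

    Returns (result, passes), where passes is the number of loop
    iterations the simulated hardware needed.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector width must be a multiple of 128 in 128..2048")

    lanes = vector_bits // elem_bits  # elements processed per pass
    n = len(a)
    out = [0] * n
    passes = 0
    i = 0
    while i < n:
        pred = whilelt(i, n, lanes)   # predicate drives the loop, no scalar tail
        for k, active in enumerate(pred):
            if active:                # per-lane predication
                out[i + k] = a[i + k] + b[i + k]
        i += lanes
        passes += 1
    return out, passes


# 64 elements of 32 bits = 2048 bits of data.
a = list(range(64))
b = [10] * 64
narrow, narrow_passes = vla_add(a, b, 128)    # 16 passes on 128b hardware
wide, wide_passes = vla_add(a, b, 2048)       # 1 pass on 2048b hardware
```

Both calls produce the same result; only the pass count differs, which is the 16-to-1 ratio mentioned in the quote. Real SVE code would get the lane count from the hardware at run time rather than from a `vector_bits` argument.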
