Intel Looks to Create Gold With its Arc Alchemist and Xe HPG Architecture

The Intel Architecture Day 2021 covered a bunch of new information, including details on Intel’s Alder Lake CPUs, Sapphire Rapids, Ponte Vecchio and the Xe HPC GPU, and more. But here we’re talking about consumer graphics and the new Arc Alchemist GPUs.

Four years after Intel hired GPU guru Raja Koduri away from AMD, its first ‘real’ discrete graphics card ambitions are finally nearing completion. We’ve been hearing details and pieces about the Xe Graphics architecture for a couple of years now, but with the hardware set to launch in the first quarter of 2022, we’re on the final approach, and all of the major details and design decisions are finished. Here’s what we know about the upcoming Intel Arc GPU, the underlying architecture, and what we expect in terms of performance — which might actually be pretty decent (fingers crossed). However, it will need to be more than just “decent” to earn a spot on our list of the best graphics cards.

Intel has been steadily improving its GPU ambitions over the past decade or so, starting with the introduction of HD Graphics back in the Clarkdale era (1st Gen Core) in 2010. From inauspicious beginnings, Intel has become the largest provider of GPUs in the world — provided that you include slow and relatively weak integrated graphics solutions under that umbrella. But when you boost graphics performance by 50 to 100 percent multiple times, eventually you get to the point where even a slow start can reach impressive speeds. That’s where Arc and the Xe HPG architecture come into the picture.

Beyond the Integrated Graphics Barrier 

Over the past decade, we’ve seen several instances where Intel’s integrated GPUs have basically doubled in theoretical performance. HD Graphics 3000 (Gen6) was nearly double the performance of the original HD Graphics (Gen5), and HD Graphics 4600 (Gen7) was another doubling, give or take. Gen8 was relatively short-lived, at least on the desktop side, while Gen9/Gen9.5 was again basically double the performance of Gen7 and has been the top desktop solution since the Core i7-6700K launched in 2015. At least until Xe (Gen12) showed up in this year’s Rocket Lake CPU. Gen11 also potentially doubled performance, with Gen12 potentially doubling it again, but both of those were predominantly limited to mobile solutions.

Intel frankly admits that integrated graphics solutions are constrained by many factors: memory bandwidth and capacity, chip size, and total power requirements all play a role. While CPUs that consume up to 250W of power exist (Intel’s Core i9-10900K and Core i9-11900K both fall into this category), competing CPUs that top out at around 145W are far more common (e.g., AMD’s Ryzen 5000 series). Plus, integrated graphics have to share all of those resources with the CPU, which means they’re typically limited to about half of the total budget. Dedicated graphics solutions have far fewer constraints.

Consider the first generation Xe Graphics found in Tiger Lake. Most of the chips have a 15W TDP, and even the later generation 8-core TGL-H chips only use up to 45W (65W configurable TDP). Except TGL-H also cut the GPU budget down to 32 EUs (Execution Units), where the lower power TGL chips had 96 EUs. In contrast, the top AMD and Nvidia dedicated graphics cards like the Radeon RX 6900 XT and GeForce RTX 3080 Ti have a power budget of 300W to 350W for the reference design, with custom cards pulling as much as 400W.

What could an Intel GPU do with 20X more power available? We’re about to find out — at least, once Intel’s Arc Alchemist GPU launches.

Meet Xe-Core: No More Execution Units 

One of the more interesting announcements from Intel’s Architecture Day is that the heart of its GPU designs, known as Execution Units, will be going away. Fundamentally, we’re still talking about the same basic hardware, but the latest enhancements to the processing pipelines formerly known as EUs are so significant (in Intel’s words) that it has decided to rebrand them. Say hello to the Xe-core, which hosts 16 Vector Engines (what used to be called EUs) as well as 16 Matrix Engines, or XMX, which stands for Xe Matrix eXtension.

What specific enhancements is Intel talking about when referring to the Execution Units? The short answer is that the new Vector Engines support the full DirectX 12 Ultimate feature set. That means support for features including ray tracing, variable rate shading, mesh shaders, and sampler feedback, all of which are also supported by Nvidia’s RTX 20-series Turing architecture from 2018.

The Vector Engine itself still operates on a 256-bit chunk of data, or the equivalent of eight 32-bit (FP32) operations. The Matrix Engine meanwhile operates on 1024-bit chunks of data, and while Intel didn’t go deep into the design, it looks and sounds as though the XMX cores are analogous to Nvidia’s Tensor cores. They’re designed to accelerate machine learning and AI-related functions, likely using FP16 data types, which means up to 64 16-bit FP16 (or BF16) operations per clock.

Much like the AMD and Nvidia GPU architectures, the Xe-core represents just part of the building blocks used for Intel’s Arc GPUs. Like previous designs, the next level up from the Xe-core is called a Render Slice (analogous to an Nvidia GPC, sort of), which contains four Xe-core blocks, for a total of 64 Vector Engines and 64 Matrix Engines, plus additional hardware. That additional hardware includes four ray tracing units, geometry and rasterization pipelines, samplers, and the pixel backend.

The ray tracing units are perhaps the most interesting addition, but other than their presence and their capabilities (they can do ray traversal, bounding box intersection, and triangle intersection), we don’t have any details on how the RT units compare to AMD’s ray accelerators or Nvidia’s RT cores. Are they faster, slower, or similar in overall performance? We’ll have to wait to get hardware in hand to find out for sure.

Intel did provide a demo of Alchemist running an Unreal Engine demo that apparently uses ray tracing, but it’s for an unknown game, running at unknown settings … and running rather poorly, to be frank. Hopefully that’s because this is early hardware and drivers, but skip to the 4:57 mark in this Arc Alchemist video from Intel to see it in action.

Finally, Intel can pair various numbers of render slices together to create the entire GPU, with the L2 cache and the memory fabric tying everything together. The maximum Xe HPG configuration for the initial Arc Alchemist launch will have up to eight render slices. Ignoring the change in naming from EU to Vector Engine, that still gives the same maximum configuration of 512 EU/Vector Engines.
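To make the hierarchy concrete, here is a minimal Python sketch that tallies the building blocks described above. The class and field names are made up for illustration and are not Intel terminology beyond the unit names themselves, and the FP16 lane count rests on our guess that the Matrix Engines work on 16-bit data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XeHpgConfig:
    """Hypothetical container for the Xe HPG building blocks quoted in the article."""
    render_slices: int = 8         # maximum quoted for the initial Alchemist launch
    xe_cores_per_slice: int = 4    # four Xe-cores per render slice
    engines_per_xe_core: int = 16  # 16 Vector Engines and 16 Matrix Engines each
    vector_width_bits: int = 256   # Vector Engine data width
    matrix_width_bits: int = 1024  # Matrix Engine (XMX) data width

    @property
    def xe_cores(self) -> int:
        return self.render_slices * self.xe_cores_per_slice

    @property
    def vector_engines(self) -> int:
        return self.xe_cores * self.engines_per_xe_core

    @property
    def fp32_lanes_per_vector_engine(self) -> int:
        return self.vector_width_bits // 32

    @property
    def fp16_lanes_per_matrix_engine(self) -> int:
        # Assumes FP16/BF16 operands, which Intel has not confirmed.
        return self.matrix_width_bits // 16


full = XeHpgConfig()
half = XeHpgConfig(render_slices=4)  # speculative smaller configuration

print(full.xe_cores, full.vector_engines)   # 32 512
print(half.xe_cores, half.vector_engines)   # 16 256
print(full.fp32_lanes_per_vector_engine,
      full.fp16_lanes_per_matrix_engine)    # 8 64
```

The four-slice variant is included only because a smaller part seems plausible; Intel has not announced it.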

Intel didn’t quote a specific amount of L2 cache, per render slice or for the entire GPU. There will likely be multiple Arc configurations: one with four render slices seems likely, and perhaps even a two-render-slice GPU would be useful. Intel did reveal that its Xe HPC GPUs will have 512KB of L1 cache per Xe-core, and up to 144MB of L2 cache per slice, but that’s a completely different part, and the Xe HPG GPUs will likely have less L1 and L2 cache. Still, given how much benefit AMD saw from its Infinity Cache, we wouldn’t be shocked to see 32MB or more of total cache on the largest Arc GPUs.

While it doesn’t sound like Intel has specifically improved throughput on the Vector Engines compared to the EUs in Gen11/Gen12 solutions, that doesn’t mean performance hasn’t improved. DX12 Ultimate includes some new features that can also help performance, but the biggest change comes via boosted clock speeds. Intel didn’t provide any specific numbers, but it did state that Xe HPG can run at 1.5X frequencies compared to Xe LP, and it also said that Xe HPG delivers 1.5X improved performance per watt. Taken together, we could be looking at clock speeds of 2.0–2.3GHz for the Arc GPUs, which would yield a significant amount of raw compute.

Putting it all together, Arc Alchemist will have up to eight render slices, each with four Xe-cores, 16 Vector Engines per Xe-core, and each Vector Engine can do eight FP32 operations per clock. Double that for FMA operations (Fused Multiply Add, a common matrix operation used in graphics workloads), then multiply by a potential 2.0–2.3GHz clock speed, and we get the theoretical performance in GFLOPS:

8 (RS) * 4 (Xe-core) * 16 (VE) * 8 (FP32) * 2 (FMA) * 2.0–2.3 (GHz) = 16,384–18,841.6 GFLOPS
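The same arithmetic as a short Python sketch, for anyone who wants to plug in different clocks or slice counts. The function name and parameters are our own, and the 2.0 to 2.3GHz clocks are our estimate rather than Intel-confirmed figures.

```python
def peak_fp32_gflops(render_slices: int,
                     xe_cores_per_slice: int,
                     vector_engines_per_core: int,
                     fp32_lanes: int,
                     clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in GFLOPS, counting an FMA as two operations."""
    fp32_ops_per_clock = (render_slices * xe_cores_per_slice
                          * vector_engines_per_core * fp32_lanes * 2)
    return fp32_ops_per_clock * clock_ghz


# Clock speeds are estimates, not official specifications.
for clock in (2.0, 2.3):
    gflops = peak_fp32_gflops(8, 4, 16, 8, clock)
    print(f"{clock:.1f} GHz -> {gflops:,.1f} GFLOPS")
# 2.0 GHz -> 16,384.0 GFLOPS
# 2.3 GHz -> 18,841.6 GFLOPS
```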

Obviously, GFLOPS (or TFLOPS) on its own doesn’t tell us everything, but 16-19 TFLOPS for the top configurations is certainly nothing to scoff at. Nvidia’s Ampere GPUs theoretically have a lot more compute (the RTX 3080, as an example, has a maximum of 29.8 TFLOPS), but some of that gets shared with INT32 calculations. AMD’s RX 6800 XT, by comparison, ‘only’ has 20.7 TFLOPS, but in many games, it can deliver similar performance to the RTX 3080. Either way, depending on final clock speeds, Xe HPG and Arc Alchemist will likely come in below the theoretical level of the current AMD and Nvidia GPUs, but not by much. So on paper at least, it looks like Intel could land in the vicinity of the RTX 3070/3070 Ti and RX 6800, assuming drivers and everything else don’t hold it back.
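For a rough sense of scale, the Python sketch below expresses our estimated Arc range as a fraction of the two theoretical figures quoted above. The helper name is hypothetical, the Arc numbers are our estimate, and the ratios say nothing about real game performance (they ignore Ampere’s shared INT32 throughput, drivers, memory bandwidth, and so on).

```python
# Estimated Arc Alchemist range from the calculation above, in TFLOPS.
ARC_ESTIMATE_TFLOPS = (16.384, 18.8416)

# Theoretical FP32 peaks quoted in the article, in TFLOPS.
QUOTED_TFLOPS = {
    "GeForce RTX 3080": 29.8,
    "Radeon RX 6800 XT": 20.7,
}


def fraction_of(reference_tflops: float,
                estimate_range: tuple = ARC_ESTIMATE_TFLOPS) -> tuple:
    """Return the (low, high) estimate as a fraction of a reference card's peak."""
    low, high = estimate_range
    return low / reference_tflops, high / reference_tflops


for card, tflops in QUOTED_TFLOPS.items():
    low, high = fraction_of(tflops)
    print(f"{card}: {low:.0%} to {high:.0%} of theoretical peak")
```

Against the RX 6800 XT, the estimate works out to roughly 79 to 91 percent of its theoretical peak.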

XMX: Matrix Engines and Deep Learning for XeSS 
