Join the legendary Mark Cerny, Lead Architect at Sony PlayStation, as he presents a technical seminar on PS5 Pro to an audience at Sony Interactive Entertainment HQ in San Mateo, California. This presentation is an in-depth look at the hardware and software implemented in Sony's latest console, the PlayStation 5 Pro.
Transcript
00:00Hi, I'm Mark Cerny. Today I'd like to do a deep dive into the technology behind our latest console, PlayStation 5 Pro.
00:08Now, to be clear, this is a bits-and-bytes talk with no game footage at all.
00:13Principally, I'll be explaining what we put into the PlayStation 5 Pro GPU and why.
00:19Historically, every seven or so years, there's enough interesting technological advances that we release a new generation of console,
00:26like PS3 or PS4 or PS5.
00:30These introduce broad improvements like a more powerful CPU and GPU,
00:34and also significant new capabilities like compute shaders on PS4 or the SSD and 3D audio on PS5.
00:42Games can then be created with the whole console feature set in mind.
00:45It allows for a tremendous step up in what the player experiences.
00:49It does take a certain amount of time for game creators to get up to speed with that new console,
00:54but we're all prepared for that because the benefits of moving to that new generation are so high.
01:00Recently, there have also been console releases during a generation, like PS4 Pro and now PS5 Pro.
01:07These are much more tightly focused, typically on the GPU.
01:10And what developers are making are improved versions of games, never dedicated games.
01:15So targets for Pro consoles are very different.
01:19First, the work that needs to be done by the game creators for the Pro console needs to be kept to an absolute minimum.
01:25They already have a lot of pieces of hardware they're supporting.
01:28We don't want to add much to that burden.
01:31And second, those tightly focused improvements need to be pretty significant.
01:35The games have to play noticeably better.
01:39One of the trickier aspects of console design is that creating a console is roughly a four-year journey.
01:45In order to launch PS5 Pro in 2024, we were actually trying to work out the key feature set in 2020.
01:51In other words, at a time before PlayStation 5 had even been released.
01:56And of course, what we came up with was the set of improvements we've been calling the Big 3.
02:02First, there's that larger GPU.
02:04The idea is simple.
02:06Pretty much anything the game is rendering on PlayStation 5 should get a lot faster on PS5 Pro.
02:12Second, there are the upgrades to the ray tracing hardware.
02:15Those games that move to the new architecture should get a substantial additional speed boost.
02:20And finally, there's AI-driven upscaling.
02:23The upscaling technology is a combination of custom hardware for machine learning
02:27and an AI library called PlayStation Spectral Super Resolution, PSSR for short, that runs on top of that hardware.
02:35PSSR analyzes the game images as it upscales them and can add quite a bit of crispness and detail.
02:42Now to do all that, we needed a larger and more capable GPU.
02:46This is what we used on the original PS5.
02:49It's a GPU from our partner AMD.
02:51More specifically, it's an RDNA 2 GPU, meaning that it uses the second generation of AMD's RDNA technology.
02:59The GPU has subunits called workgroup processors, or WGPs.
03:03PlayStation 5 has 18 of them.
03:07The GPU on PS5 Pro is much larger.
03:10It has 30 workgroup processors.
03:12It's also what I'm calling a hybrid RDNA GPU, which is to say it combines multiple generations of RDNA technology.
03:20The base technology for PS5 Pro is somewhere between RDNA 2 and RDNA 3.
03:26I'm calling it RDNA 2.X.
03:29As I'll shortly explain, that choice makes it much easier for game developers to port their games to the new console.
03:37Raytracing uses what I'm calling future RDNA technology.
03:41It's roadmap RDNA that's well past the feature set available today.
03:45It's showing up here first.
03:48And machine learning is custom, or to be more specific, it's custom enhancements to RDNA.
03:54And just to be clear, I may say machine learning, or ML, or AI today.
03:59These are just different words for the same topic.
04:02Now, to support that GPU and the overall plan for PS5 Pro, we needed faster memory, and we needed more memory.
04:09The faster part is pretty simple.
04:12The system memory on PS5 Pro has a bandwidth of 576 GB per second, which is 28% higher than PlayStation 5.
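As a quick check of that figure (the 448 GB/s baseline is the publicly quoted PS5 spec, not a number from this talk):

```python
# Rough check of the memory bandwidth uplift on PS5 Pro.
ps5_bandwidth_gbps = 448       # publicly quoted PS5 figure (assumption, not from this talk)
ps5_pro_bandwidth_gbps = 576   # stated in the talk

uplift = ps5_pro_bandwidth_gbps / ps5_bandwidth_gbps - 1
print(f"{uplift:.1%}")         # ~28.6%, in line with the quoted 28%
```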
04:20More memory is needed for a variety of usage scenarios on PS5 Pro.
04:26Integrating PSSR takes memory. A few hundred megabytes are needed for its internal buffers.
04:32Adding raytracing takes memory.
04:34Raytracing uses data in the form of an acceleration structure that can easily be a few hundred megabytes in size.
04:41And if the game is targeting higher resolution, that can take memory as well.
04:46It can be just a little memory, perhaps if the maximum rendering resolution is being increased a bit.
04:51Or it can be a lot of memory, for example, if the game is targeting 8K.
04:56So we supply over a gigabyte of extra memory to the games, and we do this in the same way we did on PS4 Pro.
05:03Which is to say, we added a hidden slower RAM.
05:06We used DDR5 for that, and moved a lot of the operating system into it.
05:11That frees up the fast memory for the games, which need the high bandwidth.
05:16Getting back to the hybrid GPU, I'd like to take you through each of the three aspects of our strategy,
05:21beginning with the choice of RDNA 2.X as the base technology.
05:27AMD is continuously updating the GPU technology.
05:30RDNA 3 has more functionality and is more performant than RDNA 2.
05:35There's even a chance to bring in future RDNA technologies, like we did with ray tracing.
05:41If we're making a new generation of console, of course we want the latest and greatest,
05:46but with a mid-generation release like PS5 Pro,
05:49we also have to consider that a single game package needs to support both PS5 and PS5 Pro.
05:56That limited the degree to which we could adopt RDNA 3 technologies.
06:01For example, games have something called shader programs that execute on the GPU.
06:05A game might have over 100,000 of them.
06:08If we adopted RDNA 3 technologies to the extent that code compiled to run on PS5 Pro wouldn't run on PS5,
06:16that would mean creating two versions of each executable piece of code.
06:20One for PS5, another for PS5 Pro.
06:23That's a massive complication.
06:25The game package needs to be patched to include that second version,
06:29and then the game needs to either selectively load just the appropriate version,
06:33or find room for both versions in system memory.
06:36It's a big burden for the developers.
06:39Consequently, PS5 Pro uses a version of RDNA that I'm calling RDNA 2.X,
06:45which is bringing in a number of features from RDNA 3,
06:48but not anything that would cause that degree of complications.
06:52For example, aspects of vertex and primitive processing are faster on PS5 Pro.
06:57That's from bringing in parts of the geometry pipe from RDNA 3 that are powerful,
07:02but either trivial for the game to adopt, or better yet, invisible to the game program.
07:08One thing I'd like to clear up is the erroneous 33.5 teraflop number that's been circulating for PS5 Pro.
07:15That number isn't anywhere in our developer docs.
07:18It comes from a misunderstanding by someone commenting on leaked PS5 Pro technical information.
07:24Part of the confusion comes from RDNA 3 architectures having double the flops of RDNA 2 architectures.
07:30Now, to quote Digital Foundry on this topic,
07:33it's a nice little bonus to have twice the flops,
07:36but it doesn't do anything like double real-world performance.
07:39So there's a certain amount of flopflation going on here.
07:44We did not bring in the doubled floating-point math from RDNA 3,
07:48because achieving that bonus in performance would require a recompile for PS5 Pro.
07:53As I said, having two versions of each compiled piece of code
07:56would create more work than we're comfortable asking the developers to do.
08:01Here, then, are the correct stats for PS5 Pro.
08:04It's pretty simple.
08:05PS5 Pro has 30 workgroup processors, which is 67% more than PS5 has,
08:11so the flops should be 67% higher as well.
08:14If we assume a pretty common operating frequency of 2.17 GHz,
08:18the math works out to 16.7 teraflops on PS5 Pro.
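A minimal sketch of that arithmetic, assuming the standard RDNA layout of two compute units per WGP with 64 FP32 lanes each (not spelled out in the talk):

```python
# Back-of-the-envelope FP32 throughput for PS5 Pro.
wgps = 30                          # workgroup processors on PS5 Pro
fp32_lanes_per_wgp = 2 * 64        # 2 compute units x 64 lanes (standard RDNA, assumed)
flops_per_lane_per_cycle = 2       # one fused multiply-add counts as 2 flops
freq_hz = 2.17e9                   # the "pretty common" operating frequency

tflops = wgps * fp32_lanes_per_wgp * flops_per_lane_per_cycle * freq_hz / 1e12
print(f"{tflops:.1f} TFLOPS")      # ~16.7
```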
08:24Of course, teraflop numbers are pretty meaningless.
08:27What isn't meaningless is the performance of the PS5 Pro GPU.
08:3167% more workgroup processors means that we can create synthetic tests
08:35that show 67% faster processing.
08:38In practice, though, there are a lot of factors involved,
08:41such as memory bandwidth or even how a particular game engine
08:44responds to the details of the new architecture.
08:47So a game team might be looking for something more like a 45% increase
08:51in rendering speed.
08:53That's still a huge improvement, though.
08:55At that performance, it means that if a game is running at 60 FPS
08:59and is taking 16 milliseconds to render on PS5,
09:03then that same frame could be rendered in 11 milliseconds on PS5 Pro.
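The frame-time arithmetic behind that, as a tiny sketch:

```python
# A 45% rendering speedup shrinks a 16 ms PS5 frame to roughly 11 ms on PS5 Pro.
ps5_frame_ms = 16.0
speedup = 1.45

pro_frame_ms = ps5_frame_ms / speedup
print(f"{pro_frame_ms:.0f} ms rendered, {ps5_frame_ms - pro_frame_ms:.0f} ms freed")
# -> "11 ms rendered, 5 ms freed" within the ~16.7 ms budget of a 60 fps frame
```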
09:07That leaves 5 milliseconds to do something new and exciting,
09:11like adding ray tracing,
09:13which is the second of our three key improvements on PS5 Pro.
09:17There's a passion in the game development community for ray tracing.
09:21Even in 2020, before the launch of PlayStation 5,
09:24we could see creators using ray tracing to add reflections
09:28and improved lighting to their games.
09:30At the same time, calculation costs for the rays were pretty high,
09:34so when we kicked off development of PS5 Pro later that year,
09:37one of our top priorities was finding ways to accelerate that computation.
09:42Those conversations with AMD led to our very nice feature set
09:46that's showing up here first.
09:49Note that there's two factors increasing the performance.
09:52It's not just that there are 67% more workgroup processors.
09:56Thanks to the new feature set, each one is more capable.
10:00It's difficult to quote an exact speedup
10:02because it's very dependent on specifics of usage,
10:05but we commonly see the calculation of the rays occurring
10:08at double or triple the speed of PlayStation 5.
10:12The most impactful new features in PS5 Pro
10:15relate to a new acceleration structure and stack management in hardware.
10:19There's a lot to unpack here.
10:22First, let's talk about the improvements related to the acceleration structure.
10:26In order to use ray tracing on PlayStation 5,
10:29you need to have data in system memory that describes your geometry,
10:33say, a million triangles worth.
10:35Then there's something called an intersection engine
10:38inside each of the workgroup processors
10:40that lets you check to see if a ray hit any of those million triangles
10:44and which it hit first.
10:46It would be too slow to test each ray individually
10:49against all million of those triangles,
10:51so there are also boxes in the data structure.
10:54These boxes let the ray tracing hardware
10:56more efficiently home in on the triangles that might be intersected.
11:00For example, we can see that the ray misses that upper left box,
11:04so there's no need to test the ray
11:06against any of the triangles contained within it.
11:09The boxes are actually in a hierarchy,
11:11starting with big ones and progressively reducing in size.
11:15Every time we hit a box, we test against the boxes nested within it
11:19until ultimately we reach some triangles we can test against.
11:24Together, those triangles and boxes are called the acceleration structure.
11:29On the original PlayStation 5,
11:31we used a type of acceleration structure called a BVH-4.
11:35BVH stands for Bounding Volume Hierarchy, meaning hierarchy of boxes,
11:40and the 4 indicates that the boxes are in groups of up to 4.
11:44The intersection engine can then check a ray
11:46against up to 4 boxes a cycle or one triangle.
11:50Generally speaking, there's a lot more checking against boxes.
11:53That's what primarily determines the performance of the ray calculations.
11:58PS5 Pro adds a BVH-8 option for the acceleration structure,
12:02where the boxes are efficiently encoded in groups of 8,
12:05and the intersection engine runs twice as fast.
12:08A ray can be tested against 8 boxes a cycle or 2 triangles.
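To make the traversal concrete, here's an illustrative software sketch of the idea: walk the hierarchy with a stack, descend only into boxes the ray actually hits, and test triangles only at the leaves. The node layout and helper tests are hypothetical; on the console, the box and triangle tests are what the intersection engine does in hardware.

```python
# Illustrative stack-based BVH traversal (not PS5 hardware or API code).
# ray_hits_box / ray_hits_triangle stand in for the intersection tests
# that the intersection engine performs in hardware.
def trace(ray, root, ray_hits_box, ray_hits_triangle):
    closest_hit = None
    stack = [root]                      # traversal stack
    while stack:
        node = stack.pop()
        if node.is_leaf:
            for tri in node.triangles:  # test the handful of leaf triangles
                hit = ray_hits_triangle(ray, tri)
                if hit and (closest_hit is None or hit.t < closest_hit.t):
                    closest_hit = hit
        else:
            # Test the ray against the node's group of child boxes
            # (up to 4 per node for BVH-4, up to 8 for BVH-8).
            for child in node.children:
                if ray_hits_box(ray, child.bounds):
                    stack.append(child)
    return closest_hit
```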
12:13That doubling of the ray intersection speed
12:15has a great theoretical impact on ray tracing performance,
12:19but real-world cases also need a solution to the problem of divergence.
12:23That's what led us to our second big feature,
12:26which is stack management in hardware.
12:29Before I can explain that feature, though, I have to explain divergence.
12:33The workgroup processors handle groups of 32 or 64 items at once.
12:38They could be pixels or vertices or rays.
12:41This strategy is called SIMD, Single Instruction Multiple Data.
12:46So SIMD32 means the same operations are being performed on 32 items.
12:51This works very well when all 32 items are getting the same treatment.
12:55For example, 32 pixels from a triangle
12:58are all reading from locations in the same texture,
13:01and then those 32 pixels all need a lighting calculation.
13:05This is called coherent processing.
13:08There's a difficulty that arises with divergent processing,
13:11where some of the pixels need one action taken,
13:14and others need something else.
13:16In this case, it's quite possible that the processing takes twice as long.
13:20In the limit, if all 32 items need different handling,
13:24it's possible to be dozens of times slower.
13:27So when 32 rays are being processed together,
13:30the degree of divergence has a big impact on performance.
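A toy model of that cost: when lanes in a wave want different code paths, the wave effectively pays for every path any lane takes. This is only an illustration of the principle, not how the GPU is programmed.

```python
# Toy model of SIMD divergence: a 32-wide wave executes every branch path
# taken by at least one lane, so cost grows with the number of distinct paths.
def wave_cost(per_lane_paths, path_costs):
    distinct = set(per_lane_paths)          # paths at least one lane wants
    return sum(path_costs[p] for p in distinct)

path_costs = {"reflect": 10, "refract": 12, "shadow": 6}   # made-up costs
coherent  = ["shadow"] * 32                                 # all 32 lanes agree
divergent = ["reflect", "refract", "shadow"] * 10 + ["reflect", "refract"]

print(wave_cost(coherent, path_costs))    # 6  -> pays for one path
print(wave_cost(divergent, path_costs))   # 28 -> pays for all three paths
```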
13:34Ray tracing can be fairly coherent.
13:36When we compute simple shadows from the sun, the rays are all parallel.
13:41But ray tracing can also get extremely divergent.
13:44If rays are bouncing off of a curved surface or a bumpy object,
13:48then potentially they're all heading in different directions.
13:52The shader code needed to handle divergent rays on PS5
13:56is reasonably complicated.
13:58Part of what the code has to do is manage a stack.
14:01The internal structure of the BVH is quite complex,
14:04and each of the 32 rays can be traversing it in a different fashion.
14:09On PS5 Pro, stack management is in hardware,
14:12which greatly simplifies the shader program.
14:15It's shorter, which means it's faster,
14:17and since it's handling fewer cases, there's less divergence,
14:21which further increases the speed of execution.
14:25You may have noticed that there are now two versions of the code that are needed,
14:29a longer one for PlayStation 5 and a simplified one for PS5 Pro.
14:33But this need for two versions only applies to shader programs
14:36that are calculating how rays travel through the scene.
14:39There's typically not too many of those.
14:41In fact, with some games, it's just a single shader program
14:44that needs the two versions.
14:47Putting that all together, it's great to have the higher performance,
14:50but even better, we're seeing more consistently high performance on PS5 Pro.
14:55That consistency comes from the improved handling of divergence.
14:59Performance testing on PS5 Pro tends to show a good boost
15:02for the coherent cases like shadows or reflections off of flat surfaces,
15:07but a much nicer boost for the divergent cases.
15:10The stack management hardware is really helping there.
15:13Having more consistent performance across a broad set of use cases
15:16will go a long way towards easing adoption of ray tracing.
15:20The final improvements to the GPU are for machine learning.
15:24There are a lot of uses for machine learning or AI,
15:27whichever term you prefer.
15:29Large language models and generative AI are quite interesting tech,
15:33but with ML, it's also possible to go after a very specific target,
15:37which is to give the games a graphical boost.
15:41One of the key ways that can work is that the game renders less.
15:44There's 8 million pixels on a 4K TV.
15:47If the game renders sparsely, say, a quarter of those pixels,
15:50it can do it a lot faster,
15:52and then the right neural network can intuit how to fill in those gaps
15:56and make a high-quality image.
15:59Another way to think about this, which is not quite as accurate,
16:02is that the game renders a smaller, lower-resolution image
16:06and then uses the neural network to upscale that image.
16:10This is called super resolution,
16:12and it's part of a whole family of strategies
16:14that reduce the work involved in rendering the game images.
16:18There's also frame generation or frame extrapolation,
16:21where a neural network inserts additional frames
16:24between the ones the game renders.
16:26That can really reduce the choppiness of low-frame rate games.
16:29Neural networks can also be used
16:31to turn noisy, staticky images into smooth ones,
16:34which is an issue that crops up frequently,
16:36particularly with optimized ray tracing.
16:39Having said that, super resolution is definitely the focus of our current efforts.
16:45It's important to note that high-quality upscaling
16:48changes the way we should be thinking about game rendering resolution.
16:52Let's imagine three games that are rendering at various resolutions.
16:56A reductionist view on this is that the 1440p game engine is the best,
17:00and that the 1080p game is clearly flawed.
17:03But after a super resolution pass,
17:05these are all ending up at 4K resolution for display.
17:09The conversation really needs to be about what's important—image quality.
17:14When game creators improve their lighting or materials or add ray tracing,
17:18then rendering each pixel can get more expensive,
17:21and the resolution will drop.
17:23They move up on this chart.
17:24And that's perfectly fine,
17:26as long as the upscaling technology is ensuring
17:28that the result is a crisp, beautiful image and not a blurry one.
17:33A different way to say that is that high-quality super resolution
17:37lets game creators focus on fewer, richer pixels
17:40and significantly improve the resulting image quality.
17:43It's a world where internal rendering resolution is not the primary concern.
17:48That's the world we want to be in.
17:50When we're using these strategies for graphics,
17:53the work isn't exclusively ML.
17:55The neural networks tend to be preceded by conventional processing
17:58and followed by some as well.
18:00It's the piece in the middle that's the neural network.
18:02More specifically, it's a type of neural network called a CNN,
18:06which stands for Convolutional Neural Network.
18:10Here's a simplified CNN for super resolution.
18:13It's not quite the one we use in PSSR,
18:15but close enough for the purposes of this conversation.
18:19You can see a lot of images.
18:21In the language of machine learning, they're called tensors,
18:24but basically they're images with many bytes of data per pixel.
18:28The colored arrows are the layers of the network.
18:31They're used for processing those images.
18:33There's quite a few of them as well.
18:35Let's zoom in on the first layer.
18:38Its input is a game image, 4K RGB.
18:41Perhaps it's a quickie upscale of what the game rendered.
18:44The first layer does lots of matrix math
18:47and then outputs another 4K image,
18:49now with substantially more information per pixel,
18:52maybe 16 bytes describing edges and the like
18:55that the neural network found in that input game image.
18:59The second layer then picks up the output of the first layer
19:02and does a lot more matrix math.
19:04The resulting image might reflect some deeper understanding
19:07about what the game rendered.
19:09There's also layers that reduce the size of the images.
19:12Downsizing to 540p or even 270p
19:15lets the neural network efficiently analyze
19:18larger scale structures in the input game image.
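For readers who prefer code, here's a made-up skeleton in the same spirit: a first layer that expands RGB into a richer feature image, a second that refines those features, and strided layers that step the resolution down so later layers see larger structures. It is not PSSR's architecture, just an illustration of the shape of such a network.

```python
# Illustrative super-resolution-style CNN skeleton (not PSSR).
import torch
import torch.nn as nn

class TinySuperResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # RGB -> 16 feature channels
        self.layer2 = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # refine those features
        self.down1 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)  # halve resolution
        self.down2 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)  # halve again
        self.act = nn.ReLU()

    def forward(self, x):                 # x: an RGB game image tensor
        f = self.act(self.layer1(x))      # per-pixel features (edges and the like)
        f = self.act(self.layer2(f))      # deeper per-pixel understanding
        low = self.act(self.down1(f))     # lower-resolution view of the scene
        low = self.act(self.down2(low))   # even lower, for larger-scale structure
        return f, low                     # a real network would merge these and upscale

net = TinySuperResNet()
frame = torch.rand(1, 3, 270, 480)        # small stand-in for a rendered frame
features, coarse = net(frame)
```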
19:22As you can tell, there's a phenomenal amount of math going on here.
19:2610,000 operations are being performed on every input pixel.
19:30And we need to do that math in something like a millisecond.
19:35Consequently, ML hardware needs very high performance,
19:38typically hundreds of trillions of operations per second.
19:41Note that we don't call those teraflops
19:43because they're integer operations.
19:45Instead, we say dups.
19:48There were a number of early decisions we had to make.
19:51The very first was deciding where in the hardware
19:54we would put whatever would do all of that matrix math.
19:58Generally speaking, there's two options.
20:01One option is to put ML capabilities into a GPU.
20:04Of course, that's additional logic so the GPU gets larger.
20:08Or one can add an NPU, a neural processing unit.
20:12NPUs are brilliant at executing neural networks,
20:15but perhaps not so good with the preprocessing and postprocessing
20:18surrounding the CNNs.
20:21The deciding factor was the order of graphics processing within a frame.
20:25With this approach, most rendering in a frame
20:28is done at low resolution, maybe 1080p.
20:31Then machine learning is used to upscale to 4K.
20:34And there's no more rendering that can happen
20:36until that upscale finishes.
20:38So we need to process that neural network as quickly as possible,
20:41which is to say we need very powerful ML hardware.
20:45The more powerful, the better.
20:47That need for power is what pushed us towards using an enhanced GPU.
20:51The choice was either adding a large NPU
20:54or making more moderate enhancements to the GPU.
20:57The next big decision was where all of this technology
21:00was going to come from.
21:02When we were starting the PS5 Pro project in 2020,
21:05we knew that we would need performant ML hardware
21:08and a high-quality neural network for super resolution.
21:12But we're not looking for ML hardware that's generically high performance.
21:16We need something that's optimal for our specific kinds of workloads.
21:20And our typical workload is a lightweight CNN,
21:23something that can run in a millisecond or so
21:25and has a lot of little layers.
21:28Broadly speaking, you can license tech or purchase tech or build tech.
21:32But once you're licensing technology, that's what you're doing forever.
21:35So in 2020, despite the degree of effort required,
21:39we decided to build our own hardware and software technology.
21:44I'll start with the hardware.
21:46We made a set of targeted enhancements to the RDNA shader core
21:49and the surrounding memory systems.
21:51We're calling it custom RDNA,
21:53as it is custom hardware created to our design specifications,
21:57but within the overall RDNA architecture
22:00and, of course, implemented by the RDNA experts at AMD.
22:05Our target for the peak computational capabilities was 308 bit tops,
22:09which is to say 300 trillion operations a second
22:13using bytes as input.
22:15There's a lot of thought that needs to go into the details
22:18of exactly how that math functions,
22:20but adding that amount of raw performance is not terribly hard to do.
22:24The difficulty is memory access.
22:27PS5 Pro has 576 gigabytes a second of bandwidth to system memory.
22:32When we compare that bandwidth
22:34with the computational capability of 300 tops,
22:37it's clear that it's easy to be bandwidth limited.
22:40Let me give two examples.
22:42If we do a computation where we read a byte as input
22:45and then eventually write a byte, that's two bytes on the bus,
22:48and the balance point of the system is about 1,000 operations on that byte.
22:53If we're doing more than that, we have a well-designed system.
22:56The 300 TOPS is being meaningfully utilized.
22:59If we do less than that, though, we are bandwidth bound,
23:02and we're wasting some of that machine learning capability.
23:051,000 operations is a lot.
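The arithmetic behind that balance point, as a quick sketch:

```python
# Balance point between 300 TOPS of compute and 576 GB/s of system memory
# bandwidth, for the "read a byte, write a byte" pattern described above.
tops = 300e12                 # operations per second
bandwidth = 576e9             # bytes per second
bytes_on_bus = 2              # one byte read + one byte written

ops_per_byte_pair = tops / (bandwidth / bytes_on_bus)
print(f"{ops_per_byte_pair:.0f}")   # ~1042 operations to stay compute-bound
```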
23:08Alternatively, to understand this issue,
23:11we can take a look at one of the layers of the network,
23:14say the second layer from the example I showed before.
23:17Let's imagine we have to read that input image
23:20and that it has 16 bytes of information for every pixel.
23:23That's about 128 megabytes on the system bus.
23:26Then we do our math, say a pointwise convolution,
23:29and write the output image,
23:31which is another 128 megabytes on the system bus.
23:34We are completely bandwidth bound.
23:37We're only using something like 3% of our potential 300 TOPS.
23:41So we're throwing out 97% of our performance,
23:44and those reads and writes are going to take half a millisecond
23:47just for this one layer.
23:49That's almost half of our budget for the entire CNN.
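Putting rough numbers on that layer; the 16-in/16-out channel counts for the pointwise convolution are an assumption chosen to match the 16 bytes per pixel mentioned above:

```python
# Rough utilization of a single pointwise-convolution layer at 4K when every
# intermediate image has to travel through system memory.
pixels = 3840 * 2160
bytes_per_pixel = 16
bandwidth = 576e9                      # bytes/s
tops = 300e12                          # ops/s

traffic = pixels * bytes_per_pixel * 2             # read input + write output
transfer_time = traffic / bandwidth                # time the bus is busy

# Assumed 16-in/16-out pointwise convolution: 16*16 = 256 multiply-accumulates
# (512 ops) per pixel.
ops = pixels * 16 * 16 * 2
utilization = ops / (tops * transfer_time)
print(f"{transfer_time * 1e3:.2f} ms, {utilization:.1%} of peak")
# ~0.46 ms on the bus, ~3% of the 300 TOPS actually used
```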
23:52One strategy for getting around these bandwidth issues
23:55is to fuse layers.
23:57The idea is to read up that input image,
23:59process the first layer,
24:01and then stick the results somewhere,
24:03maybe in fast on-chip memory,
24:05where the second layer can quickly and easily get access to them.
24:08As a result, we're reading from system memory once
24:11and writing once,
24:13but now we're processing two layers of the CNN
24:16and using something like 6% of our 300 TOPS.
24:19Still terrible, but an improvement.
24:22What we really want here is a fully fused network.
24:26That's the holy grail of neural network implementation.
24:29With a fully fused network,
24:31you're reading the input game image from system memory at the very start,
24:34processing all of the layers of the CNN internally on-chip,
24:38and then writing the results back to system memory at the very end.
24:41With bandwidth that low,
24:43that 300 TOPS number is finally meaningful.
24:46There's two problems we need to solve, though.
24:49The first relates to the amount of on-chip memory required.
24:53There's 8 million pixels in a 4K image.
24:56If each pixel needs 16 bytes, that's about 128 megabytes.
25:00In terms of on-chip memory, that's a lot.
25:03Luckily, we don't need to process the whole screen at once.
25:06We can subdivide the screen
25:08and take just a piece of it at a time through the neural network.
25:11Let's call that piece a tile.
25:13Problem solved, right?
25:16The difficulty we encounter is that as we are processing the tile,
25:19bad data creeps in from the edges,
25:22so we have to throw out part of our results.
25:24The smaller the tile is,
25:26the higher the proportion of data that has to be discarded.
25:29There are therefore effective limits to how small we can make the tile.
25:33And correspondingly, there's a certain amount of fast on-chip memory that's key
25:37if we are to achieve that goal of a fully fused network.
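A conceptual sketch of that fully fused, tile-at-a-time flow; the tile size, halo width, and layer functions are purely illustrative, not PS5 Pro's actual parameters:

```python
# Conceptual fully fused CNN over screen tiles (illustrative only).
# Each tile is read from system memory once, every layer runs out of fast
# on-chip storage, and only the valid interior is written back: data near the
# tile edge is contaminated by missing neighbors and is discarded.
TILE = 64        # tile edge in pixels (hypothetical)
HALO = 8         # pixels lost at each edge across all layers (hypothetical)

def process_frame(read_tile, write_tile, layers, tiles):
    for tile_coord in tiles:
        data = read_tile(tile_coord, TILE)       # one read from system memory
        for layer in layers:                     # all layers stay on-chip
            data = layer(data)
        valid = data[HALO:TILE - HALO, HALO:TILE - HALO]
        write_tile(tile_coord, valid)            # one write back per tile

# Overhead from the discarded border: smaller tiles waste proportionally more.
useful_fraction = (TILE - 2 * HALO) ** 2 / TILE ** 2
print(f"{useful_fraction:.0%} of each processed tile is kept")   # 56% with these numbers
```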
25:42The other problem we need to solve
25:44relates to the bandwidth of that on-chip memory.
25:47Our targets are incredibly high.
25:49We'd like many, many terabytes per second.
25:51When you think in those terms, everything seems small.
25:54For example, we could increase the size of the GPU's L2 cache
25:58and try to use that for the on-chip memory,
26:00but unfortunately the L2 bandwidth is just a few terabytes a second.
26:05This memory problem was the starting point for our custom design.
26:09From there, it's been almost a four-year journey.
26:12I'll hit a few high points of the hardware architecture,
26:15beginning with the memory we ended up using.
26:18It turns out we do have fast on-chip RAM in the RDNA architecture
26:22with an aggregate bandwidth of 200 terabytes per second.
26:26We just need to change our mindset.
26:29What we're doing on PS5 Pro is using the vector registers
26:32in the workgroup processors as that RAM.
26:35Each workgroup processor has four sets of registers,
26:38each 128k in size and with a bandwidth of over a terabyte per second.
26:4330 workgroup processors therefore give us 15 megabytes of memory
26:47at a combined bandwidth of 200 terabytes per second,
26:51which is to say several hundred times faster than system memory.
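The register-RAM arithmetic, as a sketch:

```python
# On-chip "register RAM" capacity and per-set bandwidth from the figures above.
wgps = 30
register_sets_per_wgp = 4
kib_per_set = 128

total_mib = wgps * register_sets_per_wgp * kib_per_set / 1024
print(f"{total_mib:.0f} MB of register RAM")            # 15

aggregate_tbps = 200                                     # stated aggregate bandwidth
per_set_tbps = aggregate_tbps / (wgps * register_sets_per_wgp)
print(f"~{per_set_tbps:.1f} TB/s per register set")      # ~1.7, i.e. "over a terabyte"
```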
26:56Of course, the roadmap RDNA architecture and instruction set
26:59required some modifications to take better advantage of that register RAM.
27:04We ended up adding 44 new shader instructions.
27:07Those instructions take that freer approach to register RAM access
27:11and also implement the math needed for the CNNs,
27:14which is primarily done in 8-bit precision.
27:18These instructions are specifically designed to operate in a takeover mode
27:22where each WGP processes the CNN for a single screen tile.
27:28By the way, the 300 TOPS number has been a real mystery
27:31since it leaked early this year.
27:33No one on the outside has been able to derive that number
27:36from the workgroup processor count and the GPU frequency.
27:39The secret is that there are instructions that perform 3x3 convolutions.
27:44Those use 9 multiplies and 9 adds for a total of 18 operations.
27:49And at that pretty common GPU frequency of 2.17 GHz,
27:53the performance really does work out to 300 TOPS.
27:56Here's the math.
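One way to see how the stated figures hang together; the per-WGP convolution rate here is inferred from the 300 TOPS target rather than quoted directly:

```python
# Working backwards from 300 TOPS using the quoted frequency and the
# 18 ops (9 multiplies + 9 adds) per 3x3 convolution.
freq_hz = 2.17e9
wgps = 30
ops_per_conv = 9 + 9

convs_per_cycle_total = 300e12 / (freq_hz * ops_per_conv)   # ~7,680 per cycle, GPU-wide
convs_per_cycle_per_wgp = convs_per_cycle_total / wgps
print(round(convs_per_cycle_per_wgp))                        # 256 per WGP (inferred)
```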
27:59The CNNs also need 16-bit math, so there's a number of instructions for that.
28:03These instructions tend to be a bit simpler and more straightforward.
28:07We kept the chip area and the cost low for the 16-bit math
28:11simply by targeting lower 16-bit performance,
28:14because most of the processing in these CNNs can be done with 8-bit operations.
28:19As for 32-bit math, nothing in the CNNs particularly seems to need it,
28:23so we just left it as is.
28:26Our custom RDNA solution also involved a number of additional features,
28:31which I'm going to skip over so I can get to the other half of what we built,
28:34which is the neural network for super-resolution
28:37that we created to run on top of that custom RDNA architecture.
28:42PSSR is an original PlayStation design.
28:46The full name, of course, is PlayStation Spectral Super Resolution,
28:50and that Spectral is branding.
28:52It doesn't refer to any particular aspect of the algorithm.
28:55Just like we have Tempest for audio tech,
28:58we're using Spectral for our ML libraries for graphics.
29:02One of the project goals for PSSR is ease of adoption,
29:05so it uses essentially the same set of inputs as FSR or DLSS or XCSS.
29:11Those strategies use the pixel color of the current frame,
29:14but also depth information and motion vectors
29:17that give the flow of the pixels between the previous frame and the current frame.
29:21PSSR is not quite a drop-in replacement for the other strategies, but it's close.
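Schematically, that shared input set looks something like this (field names are illustrative, not PSSR's actual interface):

```python
# Schematic of the per-frame inputs this family of upscalers consumes
# (illustrative field names, not PSSR's actual interface).
from dataclasses import dataclass
import numpy as np

@dataclass
class UpscalerFrameInputs:
    color: np.ndarray           # rendered pixel color, at the (variable) render resolution
    depth: np.ndarray           # per-pixel depth
    motion_vectors: np.ndarray  # per-pixel flow from the previous frame to this one
    render_size: tuple          # (width, height) this frame was rendered at
    output_size: tuple          # (width, height) of the display, e.g. (3840, 2160)
```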
29:27Having said that, PSSR is designed for consoles,
29:30so its primary use case is a little different from the others.
29:34PC games tend to render at a fixed resolution
29:37and with frame rate that varies based on scene complexity.
29:40Gaming monitors can handle that variable frame rate.
29:43So a typical PC game scenario is render at fixed resolution,
29:47upscale by a fixed 2 to 1 ratio, display at fixed resolution.
29:52In contrast, console games tend to have a frame rate that's fixed
29:55because they're displaying on a 60fps TV.
29:58What varies is the rendering resolution.
30:01If the scene is complex, then the rendering resolution is lower.
30:04If the scene is simpler, the rendering resolution is higher.
30:07Since the display resolution is usually fixed at 4K,
30:11PSSR needs to handle a continuously changing upscaling ratio.
30:15That scenario is primarily what we design for and train for.
30:20Of course, PC games are increasingly supporting variable rendering resolution,
30:25and all of these upscaling strategies can handle fixed upscaling ratios
30:29and variable upscaling ratios.
30:31I'm just pointing out that the focus with the PSSR project
30:34has been a little bit different.
30:36So with those goals in mind, starting in 2021,
30:39we considered a lot of types of neural networks.
30:42They were all recurrent networks,
30:44which is to say they feed some of the results back in as inputs.
30:48For what it's worth, we looked at flat networks
30:51that just run at the display resolution,
30:54networks that run at the lower rendering resolution
30:57with a little final bump up to display resolution,
31:00autoencoders that step down the resolution and step it back up,
31:04and U-Nets that do the same but with different connectivity.
31:08And that's where we ended up.
31:10PSSR is a recurrent UNET.
31:13We also learned just how much work remains after a network is chosen.
31:17We did a lot of training and then did beta releases to select developers
31:21and got to see all kinds of issues cropping up
31:24once PSSR was actually integrated into games.
31:26And that required yet more training passes.
31:29Some of those issues were trivial.
31:31We found out that one game used a perfect blue in its sky,
31:34and PSSR had never seen perfect blue in its training.
31:37It had no idea what to do with it.
31:40Of course, some of the issues we encountered were much more complex.
31:44Looking back at the four years since we started this project,
31:47I'm so glad that we made the time-intensive decision
31:50to build our own technology.
31:52Results are good, and just as importantly,
31:55we've learned so much about how AI can improve game graphics.
31:59It can only make our future brighter.
32:02So that was the background and details of our improvements
32:06in these three key areas on PS5 Pro.
32:09The larger GPU, the advanced ray tracing, and the AI-driven upscaling.
32:13I'm going to restate those three somewhat,
32:16and then I'd like to take a moment to do something we very rarely do,
32:20which is talk about the future.
32:22Specifically, I'd like to talk about the future potential
32:25in each of these three key areas.
32:28First, there's rasterized rendering,
32:30by which I mean the conventional rendering strategies
32:32that were all we had up through PS4 Pro or so.
32:35There's not a whole lot of growth left here.
32:37It mostly has to come from making the GPU bigger or memory faster.
32:42Ray tracing is different.
32:44It's still early days for the technology,
32:46and I suspect we're in for several quantum leaps in performance
32:49over the next decade.
32:51Machine learning, though, has the greatest potential for growth,
32:55and that's an area we're beginning to focus on.
32:58Some of that growth in machine learning
33:00will come from more performant and more efficient hardware architectures.
33:04The ML architecture in PS5 Pro is quite good,
33:07but we did not, in fact, achieve that holy grail
33:10of a fully-fused network when running PSSR.
33:13It's close, but PSSR can't quite keep all of its intermediate data on-chip,
33:18and therefore does, to some degree, bottleneck on system memory access.
33:23We see definite room for improvement in future ML hardware.
33:27An additional source of future growth
33:29will come from more sophisticated neural networks.
33:32When fewer higher-quality pixels are combined with the right neural network,
33:36the result is richer graphics.
33:39One way to look at this is supportable upscaling ratio.
33:43If we're able to create quality imagery with a 2-to-1 upscale,
33:46and can then improve the neural network
33:48and reach the same image quality with a 3-to-1 upscale,
33:51then the effective power of the GPU has roughly doubled.
33:55And that stacks on top of whatever is being done
33:57to speed up rasterized rendering or ray tracing.
34:00There's enormous potential here.
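A quick sanity check on that roughly-doubled figure, assuming the ratios are per-axis, so a 2-to-1 upscale means rendering a quarter of the display pixels:

```python
# Rendered-pixel savings when moving from a 2:1 to a 3:1 per-axis upscale,
# with the same 4K output.
display_pixels = 3840 * 2160

rendered_2x = display_pixels / (2 ** 2)     # 1/4 of the display pixels
rendered_3x = display_pixels / (3 ** 2)     # 1/9 of the display pixels
print(f"{rendered_2x / rendered_3x:.2f}x fewer pixels to render")   # 2.25x, roughly doubled
```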
34:03We also hope to be heading towards multiple uses of these CNNs within a frame,
34:07not just super-resolution, but also some of the other targets I was talking about,
34:11such as the denoising that's needed when doing optimized ray tracing.
34:16Through PS5 Pro, we've developed some good understanding
34:19of hardware design for machine learning, as well as neural network design,
34:23and we intend to continue this work with a pinpoint focus on games.
34:27Of course, as part of their broader strategy,
34:29AMD is pursuing many of the same goals.
34:33And so I have some very exciting news to share.
34:36We have begun a deeper collaboration with AMD.
34:40For the project name, we're taking a hint from AMD's Red and PlayStation's Blue.
34:45The code name is Amethyst.
34:47With Amethyst, we've started on another long journey
34:50and are combining our expertise with two goals in mind.
34:54The first goal is a more ideal architecture for machine learning,
34:58something capable of generalized processing of neural networks,
35:01but particularly good at the lightweight CNNs needed for game graphics,
35:05and something focused around achieving that holy grail of fully fused networks.
35:11In going after this, we're combining the lessons AMD has learned
35:14from its multi-generation RDNA roadmap,
35:16and SIE has learned from the custom work in PS5 Pro.
35:21But ML use in games shouldn't and can't be restricted to graphics libraries.
35:27We're also working towards a democratization of machine learning,
35:30something accessible that allows direct work in AI and ML by game developers,
35:35both for graphics and for gameplay.
35:38Amethyst is not about proprietary technology for PlayStation.
35:41In fact, it's the exact opposite.
35:44Through this technology collaboration,
35:46we're looking to support broad work in machine learning across a variety of devices.
35:52The other goal is to develop, in parallel, a set of high-quality CNNs for game graphics.
35:59Both SIE and AMD will independently have the ability to draw from this collection
36:03of network architectures and training strategies,
36:06and these components should be key in increasing the richness of game graphics,
36:10as well as enabling more extensive use of ray tracing and path tracing.
36:15We're looking forward to keeping you posted
36:17throughout what we anticipate to be a multi-year collaboration.
36:22Let me get back to PS5 Pro for one final moment.
36:26You've now heard a bit about our fairly intense last few years
36:29building this console and developing PSSR.
36:33There's been so much learning for us as we delve into these new technologies.
36:37But the payoff, as I said in my PS5 tech video a few years back, is in the games.
36:43And by now we know to expect the unexpected.
36:46It's an absolute guarantee that the development community
36:49will grab a hold of this technology
36:51and move in a direction that we never could have anticipated.
36:54Personally, I can't wait to see what they do with it.
36:58Thank you for your time today.
