Join the legendary Mark Cerny, Lead Architect at Sony PlayStation, as he presents a technical seminar on PS5 Pro to an audience at Sony Interactive Entertainment HQ in San Mateo, California. This presentation is an in-depth look at the hardware and software implemented in Sony's latest console, the PlayStation 5 Pro.
Transcript
00:00Hi, I'm Mark Cerny. Today I'd like to do a deep dive into the technology behind our latest console, PlayStation 5 Pro.
00:08Now, to be clear, this is a bits-and-bytes talk with no game footage at all.
00:13Principally, I'll be explaining what we put into the PlayStation 5 Pro GPU and why.
00:19Historically, every seven or so years, there's enough interesting technological advances that we release a new generation of console,
00:26like PS3 or PS4 or PS5.
00:30These introduce broad improvements like a more powerful CPU and GPU,
00:34and also significant new capabilities like compute shaders on PS4 or the SSD and 3D audio on PS5.
00:42Games can then be created with the whole console feature set in mind.
00:45It allows for a tremendous step up in what the player experiences.
00:49It does take a certain amount of time for game creators to get up to speed with that new console,
00:54but we're all prepared for that because the benefits of moving to that new generation are so high.
01:00Recently, there have also been console releases during a generation, like PS4 Pro and now PS5 Pro.
01:07These are much more tightly focused, typically on the GPU.
01:10And what developers are making are improved versions of games, never dedicated games.
01:15So targets for Pro consoles are very different.
01:19First, the work that needs to be done by the game creators for the Pro console needs to be kept to an absolute minimum.
01:25They already have a lot of pieces of hardware they're supporting.
01:28We don't want to add much to that burden.
01:31And second, those tightly focused improvements need to be pretty significant.
01:35The games have to play noticeably better.
01:39One of the trickier aspects of console design is that creating a console is roughly a four-year journey.
01:45In order to launch PS5 Pro in 2024, we were actually trying to work out the key feature set in 2020.
01:51In other words, at a time before PlayStation 5 had even been released.
01:56And of course, what we came up with was the set of improvements we've been calling the Big 3.
02:02First, there's that larger GPU.
02:04The idea is simple.
02:06Pretty much anything the game is rendering on PlayStation 5 should get a lot faster on PS5 Pro.
02:12Second, there are the upgrades to the ray tracing hardware.
02:15Those games that move to the new architecture should get a substantial additional speed boost.
02:20And finally, there's AI-driven upscaling.
02:23The upscaling technology is a combination of custom hardware for machine learning
02:27and an AI library called PlayStation Spectral Super Resolution, PSSR for short, that runs on top of that hardware.
02:35PSSR analyzes the game images as it upscales them and can add quite a bit of crispness and detail.
02:42Now to do all that, we needed a larger and more capable GPU.
02:46This is what we used on the original PS5.
02:49It's a GPU from our partner AMD.
02:51More specifically, it's an RDNA 2 GPU, meaning that it uses the second generation of AMD's RDNA technology.
02:59The GPU has subunits called workgroup processors, or WGPs.
03:03PlayStation 5 has 18 of them.
03:07The GPU on PS5 Pro is much larger.
03:10It has 30 workgroup processors.
03:12It's also what I'm calling a hybrid RDNA GPU, which is to say it combines multiple generations of RDNA technology.
03:20The base technology for PS5 Pro is somewhere between RDNA 2 and RDNA 3.
03:26I'm calling it RDNA 2.X.
03:29As I'll shortly explain, that choice makes it much easier for game developers to port their games to the new console.
03:37Raytracing uses what I'm calling future RDNA technology.
03:41It's roadmap RDNA that's well past the feature set available today.
03:45It's showing up here first.
03:48And machine learning is custom, or to be more specific, it's custom enhancements to RDNA.
03:54And just to be clear, I may say machine learning, or ML, or AI today.
03:59These are just different words for the same topic.
04:02Now, to support that GPU and the overall plan for PS5 Pro, we needed faster memory, and we needed more memory.
04:09The faster part is pretty simple.
04:12The system memory on PS5 Pro has a bandwidth of 576 GB per second, which is 28% higher than PlayStation 5.
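As a quick check of that figure (the 448 GB/s baseline is the publicly quoted PS5 spec, not a number from this talk):

```python
# Rough check of the memory bandwidth uplift on PS5 Pro.
ps5_bandwidth_gbps = 448       # publicly quoted PS5 figure (assumption, not from this talk)
ps5_pro_bandwidth_gbps = 576   # stated in the talk

uplift = ps5_pro_bandwidth_gbps / ps5_bandwidth_gbps - 1
print(f"{uplift:.1%}")         # ~28.6%, in line with the quoted 28%
```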
04:20More memory is needed for a variety of usage scenarios on PS5 Pro.
04:26Integrating PSSR takes memory. A few hundred megabytes are needed for its internal buffers.
04:32Adding raytracing takes memory.
04:34Raytracing uses data in the form of an acceleration structure that can easily be a few hundred megabytes in size.
04:41And if the game is targeting higher resolution, that can take memory as well.
04:46It can be just a little memory, perhaps if the maximum rendering resolution is being increased a bit.
04:51Or it can be a lot of memory, for example, if the game is targeting 8K.
04:56So we supply over a gigabyte of extra memory to the games, and we do this in the same way we did on PS4 Pro.
05:03Which is to say, we added a hidden slower RAM.
05:06We used DDR5 for that, and moved a lot of the operating system into it.
05:11That frees up the fast memory for the games, which need the high bandwidth.
05:16Getting back to the hybrid GPU, I'd like to take you through each of the three aspects of our strategy,
05:21beginning with the choice of RDNA 2.X as the base technology.
05:27AMD is continuously updating the GPU technology.
05:30RDNA 3 has more functionality and is more performant than RDNA 2.
05:35There's even a chance to bring in future RDNA technologies, like we did with ray tracing.
05:41If we're making a new generation of console, of course we want the latest and greatest,
05:46but with a mid-generation release like PS5 Pro,
05:49we also have to consider that a single game package needs to support both PS5 and PS5 Pro.
05:56That limited the degree to which we could adopt RDNA 3 technologies.
06:01For example, games have something called shader programs that execute on the GPU.
06:05A game might have over 100,000 of them.
06:08If we adopted RDNA 3 technologies to the extent that code compiled to run on PS5 Pro wouldn't run on PS5,
06:16that would mean creating two versions of each executable piece of code.
06:20One for PS5, another for PS5 Pro.
06:23That's a massive complication.
06:25The game package needs to be patched to include that second version,
06:29and then the game needs to either selectively load just the appropriate version,
06:33or find room for both versions in system memory.
06:36It's a big burden for the developers.
06:39Consequently, PS5 Pro uses a version of RDNA that I'm calling RDNA 2.X,
06:45which is bringing in a number of features from RDNA 3,
06:48but not anything that would cause that degree of complications.
06:52For example, aspects of vertex and primitive processing are faster on PS5 Pro.
06:57That's from bringing in parts of the geometry pipe from RDNA 3 that are powerful,
07:02but either trivial for the game to adopt, or better yet, invisible to the game program.
07:08One thing I'd like to clear up is the erroneous 33.5 teraflop number that's been circulating for PS5 Pro.
07:15That number isn't anywhere in our developer docs.
07:18It comes from a misunderstanding by someone commenting on leaked PS5 Pro technical information.
07:24Part of the confusion comes from RDNA 3 architectures having double the flops of RDNA 2 architectures.
07:30Now, to quote Digital Foundry on this topic,
07:33it's a nice little bonus to have twice the flops,
07:36but it doesn't do anything like double real-world performance.
07:39So there's a certain amount of flopflation going on here.
07:44We did not bring in the doubled floating-point math from RDNA 3,
07:48because achieving that bonus in performance would require a recompile for PS5 Pro.
07:53As I said, having two versions of each compiled piece of code
07:56would create more work than we're comfortable asking the developers to do.
08:01Here, then, are the correct stats for PS5 Pro.
08:04It's pretty simple.
08:05PS5 Pro has 30 workgroup processors, which is 67% more than PS5 has,
08:11so the flops should be 67% higher as well.
08:14If we assume a pretty common operating frequency of 2.17 GHz,
08:18the math works out to 16.7 teraflops on PS5 Pro.
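A minimal sketch of that arithmetic, assuming the standard RDNA layout of two compute units per WGP with 64 FP32 lanes each (not spelled out in the talk):

```python
# Back-of-the-envelope FP32 throughput for PS5 Pro.
wgps = 30                          # workgroup processors on PS5 Pro
fp32_lanes_per_wgp = 2 * 64        # 2 compute units x 64 lanes (standard RDNA, assumed)
flops_per_lane_per_cycle = 2       # one fused multiply-add counts as 2 flops
freq_hz = 2.17e9                   # the "pretty common" operating frequency

tflops = wgps * fp32_lanes_per_wgp * flops_per_lane_per_cycle * freq_hz / 1e12
print(f"{tflops:.1f} TFLOPS")      # ~16.7
```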
08:24Of course, teraflop numbers are pretty meaningless.
08:27What isn't meaningless is the performance of the PS5 Pro GPU.
08:3167% more workgroup processors means that we can create synthetic tests
08:35that show 67% faster processing.
08:38In practice, though, there are a lot of factors involved,
08:41such as memory bandwidth or even how a particular game engine
08:44responds to the details of the new architecture.
08:47So a game team might be looking for something more like a 45% increase
08:51in rendering speed.
08:53That's still a huge improvement, though.
08:55At that performance, it means that if a game is running at 60 FPS
08:59and is taking 16 milliseconds to render on PS5,
09:03then that same frame could be rendered in 11 milliseconds on PS5 Pro.
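The frame-time arithmetic behind that, as a tiny sketch:

```python
# A 45% rendering speedup shrinks a 16 ms PS5 frame to roughly 11 ms on PS5 Pro.
ps5_frame_ms = 16.0
speedup = 1.45

pro_frame_ms = ps5_frame_ms / speedup
print(f"{pro_frame_ms:.0f} ms rendered, {ps5_frame_ms - pro_frame_ms:.0f} ms freed")
# -> "11 ms rendered, 5 ms freed" within the ~16.7 ms budget of a 60 fps frame
```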
09:07That leaves 5 milliseconds to do something new and exciting,
09:11like adding ray tracing,
09:13which is the second of our three key improvements on PS5 Pro.
09:17There's a passion in the game development community for ray tracing.
09:21Even in 2020, before the launch of PlayStation 5,
09:24we could see creators using ray tracing to add reflections
09:28and improved lighting to their games.
09:30At the same time, calculation costs for the rays were pretty high,
09:34so when we kicked off development of PS5 Pro later that year,
09:37one of our top priorities was finding ways to accelerate that computation.
09:42Those conversations with AMD led to our very nice feature set
09:46that's showing up here first.
09:49Note that there's two factors increasing the performance.
09:52It's not just that there are 67% more workgroup processors.
09:56Thanks to the new feature set, each one is more capable.
10:00It's difficult to quote an exact speedup
10:02because it's very dependent on specifics of usage,
10:05but we commonly see the calculation of the rays occurring
10:08at double or triple the speed of PlayStation 5.
10:12The most impactful new features in PS5 Pro
10:15relate to a new acceleration structure and stack management in hardware.
10:19There's a lot to unpack here.
10:22First, let's talk about the improvements related to the acceleration structure.
10:26In order to use ray tracing on PlayStation 5,
10:29you need to have data in system memory that describes your geometry,
10:33say, a million triangles worth.
10:35Then there's something called an intersection engine
10:38inside each of the workgroup processors
10:40that lets you check to see if a ray hit any of those million triangles
10:44and which it hit first.
10:46It would be too slow to test each ray individually
10:49against all million of those triangles,
10:51so there are also boxes in the data structure.
10:54These boxes let the ray tracing hardware
10:56more efficiently home in on the triangles that might be intersected.
11:00For example, we can see that the ray misses that upper left box,
11:04so there's no need to test the ray
11:06against any of the triangles contained within it.
11:09The boxes are actually in a hierarchy,
11:11starting with big ones and progressively reducing in size.
11:15Every time we hit a box, we test against the boxes nested within it
11:19until ultimately we reach some triangles we can test against.
11:24Together, those triangles and boxes are called the acceleration structure.
11:29On the original PlayStation 5,
11:31we used a type of acceleration structure called a BVH-4.
11:35BVH stands for Bounding Volume Hierarchy, meaning hierarchy of boxes,
11:40and the 4 indicates that the boxes are in groups of up to 4.
11:44The intersection engine can then check a ray
11:46against up to 4 boxes a cycle or one triangle.
11:50Generally speaking, there's a lot more checking against boxes.
11:53That's what primarily determines the performance of the ray calculations.
11:58PS5 Pro adds a BVH-8 option for the acceleration structure,
12:02where the boxes are efficiently encoded in groups of 8,
12:05and the intersection engine runs twice as fast.
12:08A ray can be tested against 8 boxes a cycle or 2 triangles.
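To make the traversal concrete, here's an illustrative software sketch of the idea: walk the hierarchy with a stack, descend only into boxes the ray actually hits, and test triangles only at the leaves. The node layout and helper tests are hypothetical; on the console, the box and triangle tests are what the intersection engine does in hardware.

```python
# Illustrative stack-based BVH traversal (not PS5 hardware or API code).
# ray_hits_box / ray_hits_triangle stand in for the intersection tests
# that the intersection engine performs in hardware.
def trace(ray, root, ray_hits_box, ray_hits_triangle):
    closest_hit = None
    stack = [root]                      # traversal stack
    while stack:
        node = stack.pop()
        if node.is_leaf:
            for tri in node.triangles:  # test the handful of leaf triangles
                hit = ray_hits_triangle(ray, tri)
                if hit and (closest_hit is None or hit.t < closest_hit.t):
                    closest_hit = hit
        else:
            # Test the ray against the node's group of child boxes
            # (up to 4 per node for BVH-4, up to 8 for BVH-8).
            for child in node.children:
                if ray_hits_box(ray, child.bounds):
                    stack.append(child)
    return closest_hit
```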
12:13That doubling of the ray intersection speed
12:15has a great theoretical impact on ray tracing performance,
12:19but real-world cases also need a solution to the problem of divergence.
12:23That's what led us to our second big feature,
12:26which is stack management in hardware.
12:29Before I can explain that feature, though, I have to explain divergence.
12:33The workgroup processors handle groups of 32 or 64 items at once.
12:38They could be pixels or vertices or rays.
12:41This strategy is called SIMD, Single Instruction Multiple Data.
12:46So SIMD32 means the same operations are being performed on 32 items.
12:51This works very well when all 32 items are getting the same treatment.
12:55For example, 32 pixels from a triangle
12:58are all reading from locations in the same texture,
13:01and then those 32 pixels all need a lighting calculation.
13:05This is called coherent processing.
13:08There's a difficulty that arises with divergent processing,
13:11where some of the pixels need one action taken,
13:14and others need something else.
13:16In this case, it's quite possible that the processing takes twice as long.
13:20In the limit, if all 32 items need different handling,
13:24it's possible to be dozens of times slower.
13:27So when 32 rays are being processed together,
13:30the degree of divergence has a big impact on performance.
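A toy model of that cost: when lanes in a wave want different code paths, the wave effectively pays for every path any lane takes. This is only an illustration of the principle, not how the GPU is programmed.

```python
# Toy model of SIMD divergence: a 32-wide wave executes every branch path
# taken by at least one lane, so cost grows with the number of distinct paths.
def wave_cost(per_lane_paths, path_costs):
    distinct = set(per_lane_paths)          # paths at least one lane wants
    return sum(path_costs[p] for p in distinct)

path_costs = {"reflect": 10, "refract": 12, "shadow": 6}   # made-up costs
coherent  = ["shadow"] * 32                                 # all 32 lanes agree
divergent = ["reflect", "refract", "shadow"] * 10 + ["reflect", "refract"]

print(wave_cost(coherent, path_costs))    # 6  -> pays for one path
print(wave_cost(divergent, path_costs))   # 28 -> pays for all three paths
```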
13:34Ray tracing can be fairly coherent.
13:36When we compute simple shadows from the sun, the rays are all parallel.
13:41But ray tracing can also get extremely divergent.
13:44If rays are bouncing off of a curved surface or a bumpy object,
13:48then potentially they're all heading in different directions.
13:52The shader code needed to handle divergent rays on PS5
13:56is reasonably complicated.
13:58Part of what the code has to do is manage a stack.
14:01The internal structure of the BVH is quite complex,
14:04and each of the 32 rays can be traversing it in a different fashion.
14:09On PS5 Pro, stack management is in hardware,
14:12which greatly simplifies the shader program.
14:15It's shorter, which means it's faster,
14:17and since it's handling fewer cases, there's less divergence,
14:21which further increases the speed of execution.
14:25You may have noticed that there are now two versions of the code that are needed,
14:29a longer one for PlayStation 5 and a simplified one for PS5 Pro.
14:33But this need for two versions only applies to shader programs
14:36that are calculating how rays travel through the scene.
14:39There's typically not too many of those.
14:41In fact, with some games, it's just a single shader program
14:44that needs the two versions.
14:47Putting that all together, it's great to have the higher performance,
14:50but even better, we're seeing more consistently high performance on PS5 Pro.
14:55That consistency comes from the improved handling of divergence.
14:59Performance testing on PS5 Pro tends to show a good boost
15:02for the coherent cases like shadows or reflections off of flat surfaces,
15:07but a much nicer boost for the divergent cases.
15:10The stack management hardware is really helping there.
15:13Having more consistent performance across a broad set of use cases
15:16will go a long way towards easing adoption of ray tracing.
15:20The final improvements to the GPU are for machine learning.
15:24There are a lot of uses for machine learning or AI,
15:27whichever term you prefer.
15:29Large language models and generative AI are quite interesting tech,
15:33but with ML, it's also possible to go after a very specific target,
15:37which is to give the games a graphical boost.
15:41One of the key ways that can work is that the game renders less.
15:44There's 8 million pixels on a 4K TV.
15:47If the game renders sparsely, say, a quarter of those pixels,
15:50it can do it a lot faster,
15:52and then the right neural network can intuit how to fill in those gaps
15:56and make a high-quality image.
15:59Another way to think about this, which is not quite as accurate,
16:02is that the game renders a smaller, lower-resolution image
16:06and then uses the neural network to upscale that image.
16:10This is called super resolution,
16:12and it's part of a whole family of strategies
16:14that reduce the work involved in rendering the game images.
16:18There's also frame generation or frame extrapolation,
16:21where a neural network inserts additional frames
16:24between the ones the game renders.
16:26That can really reduce the choppiness of low-frame rate games.
16:29Neural networks can also be used
16:31to turn noisy, staticky images into smooth ones,
16:34which is an issue that crops up frequently,
16:36particularly with optimized ray tracing.
16:39Having said that, super resolution is definitely the focus of our current efforts.
16:45It's important to note that high-quality upscaling
16:48changes the way we should be thinking about game rendering resolution.
16:52Let's imagine three games that are rendering at various resolutions.
16:56A reductionist view on this is that the 1440p game engine is the best,
17:00and that the 1080p game is clearly flawed.
17:03But after a super resolution pass,
17:05these are all ending up at 4K resolution for display.
17:09The conversation really needs to be about what's important—image quality.
17:14When game creators improve their lighting or materials or add ray tracing,
17:18then rendering each pixel can get more expensive,
17:21and the resolution will drop.
17:23They move up on this chart.
17:24And that's perfectly fine,
17:26as long as the upscaling technology is ensuring
17:28that the result is a crisp, beautiful image and not a blurry one.
17:33A different way to say that is that high-quality super resolution
17:37lets game creators focus on fewer, richer pixels
17:40and significantly improve the resulting image quality.
17:43It's a world where internal rendering resolution is not the primary concern.
17:48That's the world we want to be in.
17:50When we're using these strategies for graphics,
17:53the work isn't exclusively ML.
17:55The neural networks tend to be preceded by conventional processing
17:58and followed by some as well.
18:00It's the piece in the middle that's the neural network.
18:02More specifically, it's a type of neural network called a CNN,
18:06which stands for Convolutional Neural Network.
18:10Here's a simplified CNN for super resolution.
18:13It's not quite the one we use in PSSR,
18:15but close enough for the purposes of this conversation.
18:19You can see a lot of images.
18:21In the language of machine learning, they're called tensors,
18:24but basically they're images with many bytes of data per pixel.
18:28The colored arrows are the layers of the network.
18:31They're used for processing those images.
18:33There's quite a few of them as well.
18:35Let's zoom in on the first layer.
18:38Its input is a game image, 4K RGB.
18:41Perhaps it's a quickie upscale of what the game rendered.
18:44The first layer does lots of matrix math
18:47and then outputs another 4K image,
18:49now with substantially more information per pixel,
18:52maybe 16 bytes describing edges and the like
18:55that the neural network found in that input game image.
18:59The second layer then picks up the output of the first layer
19:02and does a lot more matrix math.
19:04The resulting image might reflect some deeper understanding
19:07about what the game rendered.
19:09There's also layers that reduce the size of the images.
19:12Downsizing to 540p or even 270p
19:15lets the neural network efficiently analyze
19:18larger scale structures in the input game image.
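For readers who prefer code, here's a made-up skeleton in the same spirit: a first layer that expands RGB into a richer feature image, a second that refines those features, and strided layers that step the resolution down so later layers see larger structures. It is not PSSR's architecture, just an illustration of the shape of such a network.

```python
# Illustrative super-resolution-style CNN skeleton (not PSSR).
import torch
import torch.nn as nn

class TinySuperResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # RGB -> 16 feature channels
        self.layer2 = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # refine those features
        self.down1 = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)  # halve resolution
        self.down2 = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)  # halve again
        self.act = nn.ReLU()

    def forward(self, x):                 # x: an RGB game image tensor
        f = self.act(self.layer1(x))      # per-pixel features (edges and the like)
        f = self.act(self.layer2(f))      # deeper per-pixel understanding
        low = self.act(self.down1(f))     # lower-resolution view of the scene
        low = self.act(self.down2(low))   # even lower, for larger-scale structure
        return f, low                     # a real network would merge these and upscale

net = TinySuperResNet()
frame = torch.rand(1, 3, 270, 480)        # small stand-in for a rendered frame
features, coarse = net(frame)
```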
19:22As you can tell, there's a phenomenal amount of math going on here.
19:2610,000 operations are being performed on every input pixel.
19:30And we need to do that math in something like a millisecond.
19:35Consequently, ML hardware needs very high performance,
19:38typically hundreds of trillions of operations per second.
19:41Note that we don't call those teraflops
19:43because they're integer operations.
19:45Instead, we say dups.
19:48There were a number of early decisions we had to make.
19:51The very first was deciding where in the hardware
19:54we would put whatever would do all of that matrix math.
19:58Generally speaking, there's two options.
20:01One option is to put ML capabilities into a GPU.
20:04Of course, that's additional logic so the GPU gets larger.
20:08Or one can add an NPU, a neural processing unit.
20:12NPUs are brilliant at executing neural networks,
20:15but perhaps not so good with the preprocessing and postprocessing
20:18surrounding the CNNs.
20:21The deciding factor was the order of graphics processing within a frame.
20:25With this approach, most rendering in a frame
20:28is done at low resolution, maybe 1080p.
20:31Then machine learning is used to upscale to 4K.
20:34And there's no more rendering that can happen
20:36until that upscale finishes.
20:38So we need to process that neural network as quickly as possible,
20:41which is to say we need very powerful ML hardware.
20:45The more powerful, the better.
20:47That need for power is what pushed us towards using an enhanced GPU.
20:51The choice was either adding a large NPU
20:54or making more moderate enhancements to the GPU.
20:57The next big decision was where all of this technology
21:00was going to come from.
21:02When we were starting the PS5 Pro project in 2020,
21:05we knew that we would need performant ML hardware
21:08and a high-quality neural network for super resolution.
21:12But we're not looking for ML hardware that's generically high performance.
21:16We need something that's optimal for our specific kinds of workloads.
21:20And our typical workload is a lightweight CNN,
21:23something that can run in a millisecond or so
21:25and has a lot of little layers.
21:28Broadly speaking, you can license tech or purchase tech or build tech.
21:32But once you're licensing technology, that's what you're doing forever.
21:35So in 2020, despite the degree of effort required,
21:39we decided to build our own hardware and software technology.
21:44I'll start with the hardware.
21:46We made a set of targeted enhancements to the RDNA shader core
21:49and the surrounding memory systems.
21:51We're calling it custom RDNA,
21:53as it is custom hardware created to our design specifications,
21:57but within the overall RDNA architecture
22:00and, of course, implemented by the RDNA experts at AMD.
22:05Our target for the peak computational capabilities was 308 bit tops,
22:09which is to say 300 trillion operations a second
22:13using bytes as input.
22:15There's a lot of thought that needs to go into the details
22:18of exactly how that math functions,
22:20but adding that amount of raw performance is not terribly hard to do.
22:24The difficulty is memory access.
22:27PS5 Pro has 576 gigabytes a second of bandwidth to system memory.
22:32When we compare that bandwidth
22:34with the computational capability of 300 tops,
22:37it's clear that it's easy to be bandwidth limited.
22:40Let me give two examples.
22:42If we do a computation where we read a byte as input
22:45and then eventually write a byte, that's two bytes on the bus,
22:48and the balance point of the system is about 1,000 operations on that byte.
22:53If we're doing more than that, we have a well-designed system.
22:56The 300 TOPS is being meaningfully utilized.
22:59If we do less than that, though, we are bandwidth bound,
23:02and we're wasting some of that machine learning capability.
23:051,000 operations is a lot.
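The arithmetic behind that balance point, as a quick sketch:

```python
# Balance point between 300 TOPS of compute and 576 GB/s of system memory
# bandwidth, for the "read a byte, write a byte" pattern described above.
tops = 300e12                 # operations per second
bandwidth = 576e9             # bytes per second
bytes_on_bus = 2              # one byte read + one byte written

ops_per_byte_pair = tops / (bandwidth / bytes_on_bus)
print(f"{ops_per_byte_pair:.0f}")   # ~1042 operations to stay compute-bound
```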
23:08Alternatively, to understand this issue,
23:11we can take a look at one of the layers of the network,
23:14say the second layer from the example I showed before.
23:17Let's imagine we have to read that input image
23:20and that it has 16 bytes of information for every pixel.
23:23That's about 128 megabytes on the system bus.
23:26Then we do our math, say a pointwise convolution,
23:29and write the output image,
23:31which is another 128 megabytes on the system bus.
23:34We are completely bandwidth bound.
23:37We're only using something like 3% of our potential 300 TOPS.
23:41So we're throwing out 97% of our performance,
23:44and those reads and writes are going to take half a millisecond
23:47just for this one layer.
23:49That's almost half of our budget for the entire CNN.
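Putting rough numbers on that layer; the 16-in/16-out channel counts for the pointwise convolution are an assumption chosen to match the 16 bytes per pixel mentioned above:

```python
# Rough utilization of a single pointwise-convolution layer at 4K when every
# intermediate image has to travel through system memory.
pixels = 3840 * 2160
bytes_per_pixel = 16
bandwidth = 576e9                      # bytes/s
tops = 300e12                          # ops/s

traffic = pixels * bytes_per_pixel * 2             # read input + write output
transfer_time = traffic / bandwidth                # time the bus is busy

# Assumed 16-in/16-out pointwise convolution: 16*16 = 256 multiply-accumulates
# (512 ops) per pixel.
ops = pixels * 16 * 16 * 2
utilization = ops / (tops * transfer_time)
print(f"{transfer_time * 1e3:.2f} ms, {utilization:.1%} of peak")
# ~0.46 ms on the bus, ~3% of the 300 TOPS actually used
```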
23:52One strategy for getting around these bandwidth issues
23:55is to fuse layers.
23:57The idea is to read up that input image,
23:59process the first layer,
24:01and then stick the results somewhere,
24:03maybe in fast on-chip memory,
24:05where the second layer can quickly and easily get access to them.
24:08As a result, we're reading from system memory once
24:11and writing once,
24:13but now we're processing two layers of the CNN
24:16and using something like 6% of our 300 TOPS.
24:19Still terrible, but an improvement.
24:22What we really want here is a fully fused network.
24:26That's the holy grail of neural network implementation.
24:29With a fully fused network,
24:31you're reading the input game image from system memory at the very start,
24:34processing all of the layers of the CNN internally on-chip,
24:38and then writing the results back to system memory at the very end.
24:41With bandwidth that low,
24:43that 300 TOPS number is finally meaningful.
24:46There's two problems we need to solve, though.
24:49The first relates to the amount of on-chip memory required.
24:53There's 8 million pixels in a 4K image.
24:56If each pixel needs 16 bytes, that's about 128 megabytes.
25:00In terms of on-chip memory, that's a lot.
25:03Luckily, we don't need to process the whole screen at once.
25:06We can subdivide the screen
25:08and take just a piece of it at a time through the neural network.
25:11Let's call that piece a tile.
25:13Problem solved, right?
25:16The difficulty we encounter is that as we are processing the tile,
25:19bad data creeps in from the edges,
25:22so we have to throw out part of our results.
25:24The smaller the tile is,
25:26the higher the proportion of data that has to be discarded.
25:29There are therefore effective limits to how small we can make the tile.
25:33And correspondingly, there's a certain amount of fast on-chip memory that's key
25:37if we are to achieve that goal of a fully fused network.
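A conceptual sketch of that fully fused, tile-at-a-time flow; the tile size, halo width, and layer functions are purely illustrative, not PS5 Pro's actual parameters:

```python
# Conceptual fully fused CNN over screen tiles (illustrative only).
# Each tile is read from system memory once, every layer runs out of fast
# on-chip storage, and only the valid interior is written back: data near the
# tile edge is contaminated by missing neighbors and is discarded.
TILE = 64        # tile edge in pixels (hypothetical)
HALO = 8         # pixels lost at each edge across all layers (hypothetical)

def process_frame(read_tile, write_tile, layers, tiles):
    for tile_coord in tiles:
        data = read_tile(tile_coord, TILE)       # one read from system memory
        for layer in layers:                     # all layers stay on-chip
            data = layer(data)
        valid = data[HALO:TILE - HALO, HALO:TILE - HALO]
        write_tile(tile_coord, valid)            # one write back per tile

# Overhead from the discarded border: smaller tiles waste proportionally more.
useful_fraction = (TILE - 2 * HALO) ** 2 / TILE ** 2
print(f"{useful_fraction:.0%} of each processed tile is kept")   # 56% with these numbers
```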
25:42The other problem we need to solve
25:44relates to the bandwidth of that on-chip memory.
25:47Our targets are incredibly high.
25:49We'd like many, many terabytes per second.
25:51When you think in those terms, everything seems small.
25:54For example, we could increase the size of the GPU's L2 cache
25:58and try to use that for the on-chip memory,
26:00but unfortunately the L2 bandwidth is just a few terabytes a second.
26:05This memory problem was the starting point for our custom design.
26:09From there, it's been almost a four-year journey.
26:12I'll hit a few high points of the hardware architecture,
26:15beginning with the memory we ended up using.
26:18It turns out we do have fast on-chip RAM in the RDNA architecture
26:22with an aggregate bandwidth of 200 terabytes per second.
26:26We just need to change our mindset.
26:29What we're doing on PS5 Pro is using the vector registers
26:32in the workgroup processors as that RAM.
26:35Each workgroup processor has four sets of registers,
26:38each 128k in size and with a bandwidth of over a terabyte per second.
26:4330 workgroup processors therefore give us 15 megabytes of memory
26:47at a combined bandwidth of 200 terabytes per second,
26:51which is to say several hundred times faster than system memory.
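The register-RAM arithmetic, as a sketch:

```python
# On-chip "register RAM" capacity and per-set bandwidth from the figures above.
wgps = 30
register_sets_per_wgp = 4
kib_per_set = 128

total_mib = wgps * register_sets_per_wgp * kib_per_set / 1024
print(f"{total_mib:.0f} MB of register RAM")            # 15

aggregate_tbps = 200                                     # stated aggregate bandwidth
per_set_tbps = aggregate_tbps / (wgps * register_sets_per_wgp)
print(f"~{per_set_tbps:.1f} TB/s per register set")      # ~1.7, i.e. "over a terabyte"
```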
26:56Of course, the roadmap RDNA architecture and instruction set
26:59required some modifications to take better advantage of that register RAM.
27:04We ended up adding 44 new shader instructions.
27:07Those instructions take that freer approach to register RAM access
27:11and also implement the math needed for the CNNs,
27:14which is primarily done in 8-bit precision.
27:18These instructions are specifically designed to operate in a takeover mode
27:22where each WGP processes the CNN for a single screen tile.
27:28By the way, the 300 TOPS number has been a real mystery
27:31since it leaked early this year.
27:33No one on the outside has been able to derive that number
27:36from the workgroup processor count and the GPU frequency.
27:39The secret is that there are instructions that perform 3x3 convolutions.
27:44Those use 9 multiplies and 9 adds for a total of 18 operations.
27:49And at that pretty common GPU frequency of 2.17 GHz,
27:53the performance really does work out to 300 TOPS.
27:56Here's the math.
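One way to see how the stated figures hang together; the per-WGP convolution rate here is inferred from the 300 TOPS target rather than quoted directly:

```python
# Working backwards from 300 TOPS using the quoted frequency and the
# 18 ops (9 multiplies + 9 adds) per 3x3 convolution.
freq_hz = 2.17e9
wgps = 30
ops_per_conv = 9 + 9

convs_per_cycle_total = 300e12 / (freq_hz * ops_per_conv)   # ~7,680 per cycle, GPU-wide
convs_per_cycle_per_wgp = convs_per_cycle_total / wgps
print(round(convs_per_cycle_per_wgp))                        # 256 per WGP (inferred)
```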
27:59The CNNs also need 16-bit math, so there's a number of instructions for that.
28:03These instructions tend to be a bit simpler and more straightforward.
28:07We kept the chip area and the cost low for the 16-bit math
28:11simply by targeting lower 16-bit performance,
28:14because most of the processing in these CNNs can be done with 8-bit operations.
28:19As for 32-bit math, nothing in the CNNs particularly seems to need it,
28:23so we just left it as is.
28:26Our custom RDNA solution also involved a number of additional features,
28:31which I'm going to skip over so I can get to the other half of what we built,
28:34which is the neural network for super-resolution
28:37that we created to run on top of that custom RDNA architecture.
28:42PSSR is an original PlayStation design.
28:46The full name, of course, is PlayStation Spectral Super Resolution,
28:50and that Spectral is branding.
28:52It doesn't refer to any particular aspect of the algorithm.
28:55Just like we have Tempest for audio tech,
28:58we're using Spectral for our ML libraries for graphics.
29:02One of the project goals for PSSR is ease of adoption,
29:05so it uses essentially the same set of inputs as FSR or DLSS or XCSS.
29:11Those strategies use the pixel color of the current frame,
29:14but also depth information and motion vectors
29:17that give the flow of the pixels between the previous frame and the current frame.
29:21PSSR is not quite a drop-in replacement for the other strategies, but it's close.
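Schematically, that shared input set looks something like this (field names are illustrative, not PSSR's actual interface):

```python
# Schematic of the per-frame inputs this family of upscalers consumes
# (illustrative field names, not PSSR's actual interface).
from dataclasses import dataclass
import numpy as np

@dataclass
class UpscalerFrameInputs:
    color: np.ndarray           # rendered pixel color, at the (variable) render resolution
    depth: np.ndarray           # per-pixel depth
    motion_vectors: np.ndarray  # per-pixel flow from the previous frame to this one
    render_size: tuple          # (width, height) this frame was rendered at
    output_size: tuple          # (width, height) of the display, e.g. (3840, 2160)
```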
29:27Having said that, PSSR is designed for consoles,
29:30so its primary use case is a little different from the others.
29:34PC games tend to render at a fixed resolution
29:37and with frame rate that varies based on scene complexity.
29:40Gaming monitors can handle that variable frame rate.
29:43So a typical PC game scenario is render at fixed resolution,
29:47upscale by a fixed 2 to 1 ratio, display at fixed resolution.
29:52In contrast, console games tend to have a frame rate that's fixed
29:55because they're displaying on a 60fps TV.
29:58What varies is the rendering resolution.
30:01If the scene is complex, then the rendering resolution is lower.
30:04If the scene is simpler, the rendering resolution is higher.
30:07Since the display resolution is usually fixed at 4K,
30:11PSSR needs to handle a continuously changing upscaling ratio.
30:15That scenario is primarily what we design for and train for.
30:20Of course, PC games are increasingly supporting variable rendering resolution,
30:25and all of these upscaling strategies can handle fixed upscaling ratios
30:29and variable upscaling ratios.
30:31I'm just pointing out that the focus with the PSSR project
30:34has been a little bit different.
30:36So with those goals in mind, starting in 2021,
30:39we considered a lot of types of neural networks.
30:42They were all recurrent networks,
30:44which is to say they feed some of the results back in as inputs.
30:48For what it's worth, we looked at flat networks
30:51that just run at the display resolution,
30:54networks that run at the lower rendering resolution
30:57with a little final bump up to display resolution,
31:00autoencoders that step down the resolution and step it back up,
31:04and U-Nets that do the same but with different connectivity.
31:08And that's where we ended up.
31:10PSSR is a recurrent UNET.
31:13We also learned just how much work remains after a network is chosen.
31:17We did a lot of training and then did beta releases to select developers
31:21and got to see all kinds of issues cropping up
31:24once PSSR was actually integrated into games.
31:26And that required yet more training passes.
31:29Some of those issues were trivial.
31:31We found out that one game used a perfect blue in its sky,
31:34and PSSR had never seen perfect blue in its training.
31:37It had no idea what to do with it.
31:40Of course, some of the issues we encountered were much more complex.
31:44Looking back at the four years since we started this project,
31:47I'm so glad that we made the time-intensive decision
31:50to build our own technology.
31:52Results are good, and just as importantly,
31:55we've learned so much about how AI can improve game graphics.
31:59It can only make our future brighter.
32:02So that was the background and details of our improvements
32:06in these three key areas on PS5 Pro.
32:09The larger GPU, the advanced ray tracing, and the AI-driven upscaling.
32:13I'm going to restate those three somewhat,
32:16and then I'd like to take a moment to do something we very rarely do,
32:20which is talk about the future.
32:22Specifically, I'd like to talk about the future potential
32:25in each of these three key areas.
32:28First, there's rasterized rendering,
32:30by which I mean the conventional rendering strategies
32:32that were all we had up through PS4 Pro or so.
32:35There's not a whole lot of growth left here.
32:37It mostly has to come from making the GPU bigger or memory faster.
32:42Ray tracing is different.
32:44It's still early days for the technology,
32:46and I suspect we're in for several quantum leaps in performance
32:49over the next decade.
32:51Machine learning, though, has the greatest potential for growth,
32:55and that's an area we're beginning to focus on.
32:58Some of that growth in machine learning
33:00will come from more performant and more efficient hardware architectures.
33:04The ML architecture in PS5 Pro is quite good,
33:07but we did not, in fact, achieve that holy grail
33:10of a fully-fused network when running PSSR.
33:13It's close, but PSSR can't quite keep all of its intermediate data on-chip,
33:18and therefore does, to some degree, bottleneck on system memory access.
33:23We see definite room for improvement in future ML hardware.
33:27An additional source of future growth
33:29will come from more sophisticated neural networks.
33:32When fewer higher-quality pixels are combined with the right neural network,
33:36the result is richer graphics.
33:39One way to look at this is supportable upscaling ratio.
33:43If we're able to create quality imagery with a 2-to-1 upscale,
33:46and can then improve the neural network
33:48and reach the same image quality with a 3-to-1 upscale,
33:51then the effective power of the GPU has roughly doubled.
33:55And that stacks on top of whatever is being done
33:57to speed up rasterized rendering or ray tracing.
34:00There's enormous potential here.
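A quick sanity check on that roughly-doubled figure, assuming the ratios are per-axis, so a 2-to-1 upscale means rendering a quarter of the display pixels:

```python
# Rendered-pixel savings when moving from a 2:1 to a 3:1 per-axis upscale,
# with the same 4K output.
display_pixels = 3840 * 2160

rendered_2x = display_pixels / (2 ** 2)     # 1/4 of the display pixels
rendered_3x = display_pixels / (3 ** 2)     # 1/9 of the display pixels
print(f"{rendered_2x / rendered_3x:.2f}x fewer pixels to render")   # 2.25x, roughly doubled
```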
34:03We also hope to be heading towards multiple uses of these CNNs within a frame,
34:07not just super-resolution, but also some of the other targets I was talking about,
34:11such as the denoising that's needed when doing optimized ray tracing.
34:16Through PS5 Pro, we've developed some good understanding
34:19of hardware design for machine learning, as well as neural network design,
34:23and we intend to continue this work with a pinpoint focus on games.
34:27Of course, as part of their broader strategy,
34:29AMD is pursuing many of the same goals.
34:33And so I have some very exciting news to share.
34:36We have begun a deeper collaboration with AMD.
34:40For the project name, we're taking a hint from AMD's Red and PlayStation's Blue.
34:45The code name is Amethyst.
34:47With Amethyst, we've started on another long journey
34:50and are combining our expertise with two goals in mind.
34:54The first goal is a more ideal architecture for machine learning,
34:58something capable of generalized processing of neural networks,
35:01but particularly good at the lightweight CNNs needed for game graphics,
35:05and something focused around achieving that holy grail of fully fused networks.
35:11In going after this, we're combining the lessons AMD has learned
35:14from its multi-generation RDNA roadmap,
35:16and SIE has learned from the custom work in PS5 Pro.
35:21But ML use in games shouldn't and can't be restricted to graphics libraries.
35:27We're also working towards a democratization of machine learning,
35:30something accessible that allows direct work in AI and ML by game developers,
35:35both for graphics and for gameplay.
35:38Amethyst is not about proprietary technology for PlayStation.
35:41In fact, it's the exact opposite.
35:44Through this technology collaboration,
35:46we're looking to support broad work in machine learning across a variety of devices.
35:52The other goal is to develop, in parallel, a set of high-quality CNNs for game graphics.
35:59Both SIE and AMD will independently have the ability to draw from this collection
36:03of network architectures and training strategies,
36:06and these components should be key in increasing the richness of game graphics,
36:10as well as enabling more extensive use of ray tracing and path tracing.
36:15We're looking forward to keeping you posted
36:17throughout what we anticipate to be a multi-year collaboration.
36:22Let me get back to PS5 Pro for one final moment.
36:26You've now heard a bit about our fairly intense last few years
36:29building this console and developing PSSR.
36:33There's been so much learning for us as we delve into these new technologies.
36:37But the payoff, as I said in my PS5 tech video a few years back, is in the games.
36:43And by now we know to expect the unexpected.
36:46It's an absolute guarantee that the development community
36:49will grab a hold of this technology
36:51and move in a direction that we never could have anticipated.
36:54Personally, I can't wait to see what they do with it.
36:58Thank you for your time today.
