Hello yuz-ers, what a month we’ve had! Great graphical changes, an amazing audio rewrite, preliminary work on LDN support, testing new OpenGL drivers, and plenty of fixes! Continue reading to find out more.
We’ve been teasing it for months in previous articles (and we will continue to do so), so you know Blinkhawk has been working on a bunch of miscellaneous GPU-related fixes and performance improvements. While we will have more information in a dedicated article in the near future, please enjoy the following brief overview.
As the scope of the project grew, the team decided to split it into two parts. The first part, now released, focuses on game fixes and improved accuracy. Part 1 already improves performance, but part 2 will focus exclusively on it.
Becoming playable! (Hades & Mario Golf: Super Rush)
This story started many moons ago.
The awesome devs working on Skyline Emulator finished implementing their NVDRV service (NVIDIA Driver service) and they offered it to us, as it is much more accurate than our old implementation.
YO-KAI WATCH 4, before and after
The changes implemented in part 1 of Project Y.F.C. benefit Super Smash Bros. Ultimate’s “World of Light” mode, Deltarune, the Xenoblade Chronicles games, and several other titles.
The World of Light single-player mode is now playable! (Super Smash Bros. Ultimate)
Sadly, a list of critical changes this large brings regressions with it. We’re working to resolve them, but Blinkhawk is busy with IRL things, so expect a delay before these changes reach Mainline while we sort things out and pave the way for part 2. (Remember: you can check the hovercard or the PR itself to see its merge status!)
Not all games are in perfect shape... yet! (The Witcher 3: Wild Hunt)
Not a name we’ve mentioned before, right? Well, it was a surprise for us as well! Maide is behind this wonderful gift: an almost complete rewrite of yuzu’s audio engine.
The main driving force behind this project was to resolve multi-year-old issues that had accumulated because of our very old initial audio implementation. yuzu was missing many playback features, such as audio effects, and the old code was too hard to maintain, making it impossible to keep up with the Switch’s updates over the years.
Here is a before and after of Metroid Dread while underwater. Notice how the effects are missing in the first recording, as if Samus were out in the open.
Cleaner code makes it easier for developers to stay up to date (the current implementation uses the changes introduced in firmware version 14.0.0) and should help with introducing changes found through reverse engineering in the future.
The list of fixes is so large, it’s practically countless. While over 15 official issues were fixed, it’s impossible to know how many undocumented issues have been resolved.
We plan to have a dedicated article for Andio in the near future where we will dig deeper into the changes introduced.
All users can enjoy the benefits of Project Andio, available in both Mainline and Early Access!
Yes, bunnei, it’s London.
For those that didn’t catch on to the name, Project London is our work to get LDN (Local Wireless) support into yuzu, including hosted rooms for online connectivity.
Such rooms, and their corresponding user interface, are what Tobi has been working on.
The implementation is based on Citra, and while it’s already perfectly functional, it won’t be available for users until the network backend is ready.
Thankfully, as you can see above, internal testing has been positive under ideal conditions, so the “only” remaining work is tweaking and bug fixing. If only it were that simple…
An online service like this requires the transfer of network packets, so ENet was added as a dependency.
Stay tuned for future improvements on this work in progress!
It has been an eventful month for a long-maligned corner of the yuzu codebase, generally referred to in hushed tones among developers as CoreTiming. CoreTiming may be the cause of many timing-related emulation issues in yuzu.
While reviewing Project Andio, Blinkhawk noted that one of his longstanding open pull requests, which implemented a more precise version of CoreTiming, fixed some audio corruption regressions in emulated games, and even fixed some games that were previously having issues with freezing, such as Mario Strikers: Battle League.
With the new audio code being almost ready to go at that point, the team decided to get this pull request rebased and merged so we could have a new audio system without any regressions.
The new CoreTiming implementation used multiple host threads to wait for events and should, in theory, have been much better.
However, it didn’t fix everything. Maide found that there were still some lingering issues with audio callbacks not looping as precisely as they needed to. In yuzu, looping events previously used CoreTiming to reschedule themselves a fixed number of milliseconds after they actually executed, instead of relative to when they were intended to execute. Any lateness was therefore carried into every following iteration, causing significant drift and issues with the new audio renderer. As usual, the most affected were users running CPUs with only 4 threads.
To fix this, Maide reworked the way looping events are handled. Now, CoreTiming automatically computes the correct time to reschedule a looping event, making the implementation significantly more precise for those types of events. With the looping event change in place, and after noticing that the other changes Blinkhawk had added were causing serious regressions, the team opted to remove the multi-threaded host CoreTiming implementation, and then most of Blinkhawk’s new implementation entirely, as it was still causing serious performance problems for a subset of users.
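The difference between the old and new rescheduling can be sketched in a few lines. This is a simplified model of our own, not yuzu’s CoreTiming code: the function names are hypothetical, times are plain nanosecond counts, and the host is assumed to always wake up a fixed latency late.

```cpp
#include <cstdint>

// Hypothetical model, not yuzu's CoreTiming: all times are in nanoseconds.
using Nanoseconds = std::int64_t;

// Old behaviour: the next run is scheduled relative to when the callback
// actually ran, so any lateness is carried forward into every later run.
Nanoseconds NextFireDrifting(Nanoseconds actual_run_time, Nanoseconds period) {
    return actual_run_time + period;
}

// Fixed behaviour: the next run is scheduled relative to when the callback
// was supposed to run, so lateness does not accumulate.
Nanoseconds NextFireAnchored(Nanoseconds intended_time, Nanoseconds period) {
    return intended_time + period;
}

// Fires a looping event `iterations` times, with the host always waking up
// `latency` ns late, and returns the final scheduled (intended) fire time.
Nanoseconds Simulate(bool anchored, Nanoseconds period, Nanoseconds latency,
                     int iterations) {
    Nanoseconds intended = period;
    for (int i = 0; i < iterations; ++i) {
        const Nanoseconds actual = intended + latency; // host wakes up late
        intended = anchored ? NextFireAnchored(intended, period)
                            : NextFireDrifting(actual, period);
    }
    return intended;
}
```

With a 5 ms period and a constant 0.1 ms wake-up latency, `Simulate(false, 5'000'000, 100'000, 200)` ends up 20 ms (200 × 0.1 ms) later than `Simulate(true, 5'000'000, 100'000, 200)` after just one second of callbacks, and the gap keeps growing the longer the game runs.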
But that wasn’t all that changed for timing this month.
Intel Alder Lake (Gen. 12) CPU users on Windows have long been reporting noticeable clock drift in Super Smash Bros. Ultimate, but it got a lot worse since the NVNFlinger rewrite a few months ago.
As previously reported, the resident bunnei rabbit mostly fixed this issue in a follow-up pull request which restored the (inaccurate) behaviour of the old implementation, and the clock drift issue improved significantly for those users.
Maide, not content with just improving audio, discovered that the way yuzu’s NVNFlinger implementation waited on buffers would drift due to the same problem that was previously fixed in CoreTiming!
Instead of reimplementing the fix here as well, he modified NVNFlinger to use a timing callback, which fixed the drifting issues in SSBU, and also resolved many longstanding issues with frametime inconsistency.
This also provides a significant performance boost in many games by keeping frametime presentation consistent, and makes Fire Emblem Warriors: Three Hopes playable.
Time to smash those attack buttons (Fire Emblem Warriors: Three Hopes)
Finally, BreadFish64 implemented a way to read the exact TSC frequency of the host CPU.
The TSC (timestamp counter) is a high-precision timer measuring the number of base clock ticks performed by an Intel or AMD processor since boot. CoreTiming uses this value to emulate the ARM physical count register, which plays a similar role to the TSC on ARM devices like the Switch.
Getting the exact TSC frequency, as opposed to just estimating it, allows CoreTiming to avoid drift caused by a mismatch between the host frequency, which depends on your CPU, and the guest clock frequency, which is fixed at 19.2MHz.
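At its core this is a unit conversion: host TSC ticks in, guest 19.2MHz counter ticks out. The sketch below is our own illustration, not yuzu’s code; the function name and the 3.7GHz example frequency are hypothetical, and it relies on the GCC/Clang `unsigned __int128` extension (not available on MSVC) to avoid overflowing the intermediate product.

```cpp
#include <cstdint>

// The Switch's counter frequency, fixed at 19.2MHz.
constexpr std::uint64_t kGuestClockHz = 19'200'000;

// Converts host TSC ticks to guest counter ticks, given the host TSC
// frequency in Hz. ticks * 19.2MHz overflows 64 bits after roughly 16
// minutes at 3.7GHz, so the product is computed in 128 bits.
std::uint64_t HostTicksToGuestTicks(std::uint64_t host_ticks, std::uint64_t tsc_hz) {
    const unsigned __int128 product =
        static_cast<unsigned __int128>(host_ticks) * kGuestClockHz;
    return static_cast<std::uint64_t>(product / tsc_hz);
}
```

Feeding in one hour of ticks from a CPU whose TSC runs at exactly 3.7GHz yields exactly 69,120,000,000 guest ticks. If the frequency had instead been estimated as 3.699GHz, the same input yields about 18.7 million extra guest ticks, close to one second of guest time gained every hour: the kind of steady drift an exact reading avoids.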
More precision and faster boot times are never a bad thing!
While using the new debugger on games and homebrew, comex spotted an issue causing yuzu to miss breakpoints in code that had already been run, or hit breakpoints that had already been deleted. Merry investigated and discovered an inaccuracy in Dynarmic’s caching of code blocks. Fixing the cache clearing and correctly calculating block hashes resolved the breakpoint issues.
comex also observed an issue with watchpoints, where execution would seemingly fail to resume with the correct state after breaking on a watchpoint. byte[] investigated and found that it happened when Dynarmic failed to update the PC register inside watchpoint callbacks. Merry fixed this one too, by completely rewriting Dynarmic’s watchpoint support, which now breaks immediately when necessary and avoids almost all of the performance penalty of enabling watchpoints. Nice!
byte[] has also been hard at work fixing various kernel issues and inconsistencies throughout June, and this month is no exception.
This time around, while searching for the source of a mysterious freezing bug in Super Mario Galaxy, he rewrote the entire scheduler and brought it in line with the current state of the art in reverse engineering of the Switch kernel. This fixed issues in a number of games, most notably the freezing users experienced in Mario Strikers: Battle League (once you use an intro-skipping mod), and allowed Mononoke Slashdown to boot for the first time.
While preparing the new scheduler for release, byte[] also noticed an inefficiency in the way guest threads were being emulated. To fix it, he changed how fibers are started to take advantage of C++ language features, significantly simplifying the implementation.
Last month, Behunin contributed a new GPU queue implementation intended to improve the performance of submission handling from the emulated game.
Some time after this, freezing issues in Fire Emblem: Three Houses started cropping up.
After a long hunt, byte[] thought the issue had been found and fixed by pull requests #8483 and #8538, but more careful debugging revealed that the freeze was, unfortunately, caused by the new GPU queue implementation!
Morph stepped up and reverted the use of the new queue implementation, finally fixing the issue, at least for now.
Xenoblade Chronicles 3, one of the most anticipated Switch releases in a while, launched and, to the dismay of the yuzu community, crashed on boot when using Vulkan.
Due to differences in time zones, Maide was our first developer to lay hands on the new game, with byte[] lagging behind.
Maide found that some Vulkan shaders crashed the GPU driver when they were compiled. yuzu differs from most Vulkan programs in that it directly generates shaders in binary format (SPIR-V) to match the needs of the game’s shaders, which can cause problems when yuzu translates a shader differently from how a GLSL compiler would.
byte[] quickly helped Maide identify the sources of these shader compilation crashes and, together, they fixed both FSwizzleAdd and ConvertDepthMode, allowing users to run the game in Vulkan.
Thank you Night for the amazing pics! (Xenoblade Chronicles 3)
We’re aware that AMD Radeon GPUs on Windows still crash at boot when using Vulkan. This is because those drivers lack support for the VK_FORMAT_R16G16B16_SFLOAT texture format.
We implemented an alternative path that emulates this format with a similar one to solve this issue.
We’ll cover it more deeply in the next progress report, along with several other bugfixes for this amazing game.
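This report doesn’t say which format the alternative path uses. A natural candidate is the widely supported four-channel VK_FORMAT_R16G16B16A16_SFLOAT, and under that assumption the CPU-side part of such emulation is just padding every texel with an alpha channel. The helper below is a hypothetical sketch of that idea, not yuzu’s code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// IEEE 754 half-precision encoding of 1.0, used as the opaque alpha value.
constexpr std::uint16_t kHalfOne = 0x3C00;

// Expands tightly packed RGB16F texels (3 x 16 bits) into RGBA16F texels
// (4 x 16 bits), filling the new alpha channel with 1.0. Halves are copied
// as raw bits, so no float conversion is needed.
std::vector<std::uint16_t> ExpandRgb16fToRgba16f(const std::vector<std::uint16_t>& rgb) {
    const std::size_t texel_count = rgb.size() / 3;
    std::vector<std::uint16_t> rgba;
    rgba.reserve(texel_count * 4);
    for (std::size_t i = 0; i < texel_count; ++i) {
        rgba.push_back(rgb[i * 3 + 0]); // R
        rgba.push_back(rgb[i * 3 + 1]); // G
        rgba.push_back(rgb[i * 3 + 2]); // B
        rgba.push_back(kHalfOne);       // A = 1.0
    }
    return rgba;
}
```

The trade-off of this kind of substitution is memory: every texel grows from 6 to 8 bytes, which is usually an acceptable price for not crashing.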
Another of the various issues affecting this new release is an absurd level of memory usage when running in OpenGL.
In the past, yuzu cleaned up shader sources after it was done with a shader.
Now, for some reason, this game manages to skip that check.
To reduce the ridiculous memory usage, byte[] implemented glDetachShader, a more “official” way to achieve the same result.
While this doesn’t solve the issue entirely, testing shows a 5GB reduction in RAM usage from the addition of just a single line of code.
Let’s stay on the subject of GPU emulation for a bit longer. In a past Progress Report, we explained how toastUnlimited implemented a status check system to ensure good Vulkan compatibility when opening yuzu for the first time.
The original implementation worked by running a small Vulkan instance at boot, detecting if it crashed, and saving the result in the configuration file. On the next boot after the crash, yuzu informed the user and locked itself to only offer OpenGL. This required two boots to get the whole picture, and the user had to manually re-enable Vulkan as an option by pressing a button in yuzu’s configuration.
The new approach uses a child process that is only tasked with starting the Vulkan loader. If the child process crashes, the parent process marks the currently running instance of yuzu as not being Vulkan compatible. The benefit is that yuzu only has to run once to detect the current status. If the user solves the issue (by updating the drivers or any Vulkan layer application causing problems), simply restarting yuzu is enough, since nothing is written to the configuration files anymore.
This change helps users identify issues and stop potential crashes, but the general recommendations still apply: manually update your GPU drivers (never trust Windows Update), and keep any application that runs an overlay or records the screen updated to its latest version.
Moving on to more specific game fixes not related to GOAT Xenoblade Chronicles 3, our resident Kirby clone, Morph, implemented a texture format MONSTER HUNTER RISE has been asking for: ASTC_10x6_UNORM.
That’s right, another ASTC format. Your GPU will hate you while decoding it.
This doesn’t solve the rendering bugs we face with this game, but it makes things look a bit better!
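To put some numbers on that, here is a sketch of the storage math (the helper names are our own). ASTC always stores 128 bits per block, so a 10x6 footprint packs 60 texels into 16 bytes, about 2.13 bits per texel. Desktop GPUs generally can’t sample ASTC directly, so the texture has to be decoded into a much larger uncompressed format, such as RGBA8, before it can be used.

```cpp
#include <cstdint>

// Every ASTC block occupies 128 bits (16 bytes), whatever its footprint.
constexpr std::uint64_t kAstcBlockBytes = 16;

// Size in bytes of one mip level compressed with a block_w x block_h ASTC
// footprint (10x6 for ASTC_10x6_UNORM). Partial blocks still cost a block.
std::uint64_t AstcLevelSize(std::uint64_t width, std::uint64_t height,
                            std::uint64_t block_w, std::uint64_t block_h) {
    const std::uint64_t blocks_x = (width + block_w - 1) / block_w;  // round up
    const std::uint64_t blocks_y = (height + block_h - 1) / block_h; // round up
    return blocks_x * blocks_y * kAstcBlockBytes;
}

// Size in bytes of the same level once decoded to RGBA8 (4 bytes per texel).
std::uint64_t Rgba8LevelSize(std::uint64_t width, std::uint64_t height) {
    return width * height * 4;
}
```

For a 1920x1080 texture, `AstcLevelSize(1920, 1080, 10, 6)` gives 552,960 bytes, while the decoded RGBA8 copy needs 8,294,400 bytes, 15 times more, and every texel has to go through the decoder to get there.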
While Flatpak is not the recommended way for our users to enjoy their favourite Switch titles on Linux, due to lower performance and some missing desktop integration features, it is a great option for many Linux users who have Flatpak installed by default and want a low-friction way to get access to yuzu. It has been the preferred choice of Steam Deck users since its release. As user reports rolled in, the team fixed some notable Flatpak-exclusive regressions this month.
But why were these issues Flatpak-exclusive, and not found in the regular Linux AppImage builds? Flatpak enables extra checks in the C++ standard library, which aim to catch buffer overflow errors before they happen, to help with debugging. Unfortunately, if a check fails, yuzu instantly crashes, which makes it more difficult to debug the issue from yuzu’s log files alone.
The switch to Vulkan by default caused games which used any CPU-based rendering to crash. If a game wants to render an image to the screen from the CPU, instead of the GPU, it will first convert the image into an optimized layout that the Switch GPU understands, and then ask the GPU to render the optimized image. To deal with this, yuzu undoes this layout conversion and uploads the data to the host GPU for presentation. byte[] discovered that due to the size of the optimized layout and the unoptimized layout being different, a subspan used in unoptimizing the layout would overflow and cause the check to fail. The fix was simple: just use the optimized size for the converted layer, since it would always be larger.
It wouldn’t be a proper yuzu pull request without a seemingly unrelated regression.
Pokémon: Let's Go, Pikachu!/Eevee! had a strange performance regression caused by byte[]’s previous change, where the framerate would drop to approximately 7 frames per second when attempting to play with Pikachu or Eevee.
byte[] investigated it and found that using the larger size caused the process of re-optimizing a frame for the game to read back to be much slower, since it was now dealing with a much larger image.
He then fixed it by using different sizes for the optimized and unoptimized images, finally putting these foolish performance issues to rest.
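Why is the optimized image always the larger one? A simplified model of the Switch GPU’s block-linear (“optimized”) layout helps. The sketch below is our own, assumes the Tegra tiling unit of a GOB (64 bytes wide by 8 rows tall) stacked into blocks that are `block_height_gobs` GOBs tall, and ignores depth and mipmaps; the function names are hypothetical.

```cpp
#include <cstdint>

// Simplified 2D model of block-linear layouts: memory is tiled in GOBs of
// 64 bytes x 8 rows, stacked block_height_gobs GOBs tall.
constexpr std::uint64_t kGobWidthBytes = 64;
constexpr std::uint64_t kGobHeightRows = 8;

constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Plain row-major ("unoptimized") size of an image.
std::uint64_t PitchLinearSize(std::uint64_t width, std::uint64_t height,
                              std::uint64_t bytes_per_pixel) {
    return width * bytes_per_pixel * height;
}

// Block-linear ("optimized") size: rows are padded to whole GOBs and the
// height to whole blocks, so it is never smaller than PitchLinearSize.
std::uint64_t BlockLinearSize(std::uint64_t width, std::uint64_t height,
                              std::uint64_t bytes_per_pixel,
                              std::uint64_t block_height_gobs) {
    const std::uint64_t row_bytes = AlignUp(width * bytes_per_pixel, kGobWidthBytes);
    const std::uint64_t rows = AlignUp(height, kGobHeightRows * block_height_gobs);
    return row_bytes * rows;
}
```

For a 1280x720 RGBA8 frame with 16-GOB-tall blocks, `BlockLinearSize` returns 3,932,160 bytes against 3,686,400 for `PitchLinearSize`, because the 720 rows get padded to 768. Reading the optimized data through a span sized for the smaller layout overruns it, which is what the Flatpak checks caught, while sizing both sides with the larger value makes every frame conversion process extra data. Keeping one size per layout avoids both problems.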
Project Andio introduced a few new regressions in the Flatpak builds as well. One of these was fixed in the pull request itself before it was merged.
When decoding input buffers from the emulated game, a span operation could overflow.
Maide fixed this by being more careful about handling the sample buffers when decoding input.
User reports showed there were still crashes, and Maide found an issue with the DepopPrepare command causing another overflowing span.
Fixing this finally allowed users to enjoy the Flatpak builds once more.
Flatpak Linux users rejoice!
Flatpak isn’t the only one to get a piece of the cake; AppImage receives some love too!
Vulkan detection issues are not exclusive to Windows; they can also happen in free land.
toastUnlimited found out that the libQt5Multimedia library causes issues with Vulkan in AppImage builds.
Since the library is still in use, excluding libwayland-client is the workaround in place for now.
We’ll evaluate the user response to this change and consider either keeping it or removing libQt5Multimedia altogether.
Docteh started working on improving the environment variables used in our build process to give AppImages a proper title bar. Once this work is finished, the title bar should look identical to Windows builds.
A unique feature of the Nintendo Switch is the capability to use infrared cameras installed in the right Joy-Con. The main function of the cameras is to detect shapes and measure the distance to objects, but it can also be used to transmit a feed to a screen, letting you turn your Joy-Con into a heat-seeking monstrosity. Fox-2!
Interested in adding this awesome feature to yuzu and providing full support for games like Game Builder Garage or the Nintendo Labo collection, german77 emulated the clustering processor required to let games access the camera on the Joy-Cons, or any camera the user wants to provide, even a desktop capture obtained from OBS Studio.
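This report doesn’t describe how the emulated clustering processor works internally. As a rough illustration of what “clustering” means for an IR image, the sketch below (our own code, with hypothetical names) thresholds a grayscale frame and groups connected bright pixels into clusters, reporting each one’s pixel count, centroid, and bounding box, roughly the kind of summary a game can use instead of the full picture.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical cluster record: pixel count, centroid and bounding box.
struct Cluster {
    std::size_t pixel_count = 0;
    double centroid_x = 0.0;
    double centroid_y = 0.0;
    std::size_t min_x = 0, min_y = 0, max_x = 0, max_y = 0;
};

// Groups 4-connected pixels brighter than `threshold` into clusters with a
// breadth-first flood fill. `image` is a row-major 8-bit grayscale frame of
// exactly width * height bytes.
std::vector<Cluster> FindClusters(const std::vector<std::uint8_t>& image,
                                  std::size_t width, std::size_t height,
                                  std::uint8_t threshold) {
    std::vector<bool> visited(image.size(), false);
    std::vector<Cluster> clusters;
    for (std::size_t start = 0; start < image.size(); ++start) {
        if (visited[start] || image[start] <= threshold) {
            continue; // dark pixel, or already part of a cluster
        }
        Cluster cluster;
        cluster.min_x = cluster.max_x = start % width;
        cluster.min_y = cluster.max_y = start / width;
        double sum_x = 0.0;
        double sum_y = 0.0;
        std::queue<std::size_t> pending;
        pending.push(start);
        visited[start] = true;
        while (!pending.empty()) {
            const std::size_t idx = pending.front();
            pending.pop();
            const std::size_t x = idx % width;
            const std::size_t y = idx / width;
            ++cluster.pixel_count;
            sum_x += static_cast<double>(x);
            sum_y += static_cast<double>(y);
            cluster.min_x = std::min(cluster.min_x, x);
            cluster.max_x = std::max(cluster.max_x, x);
            cluster.min_y = std::min(cluster.min_y, y);
            cluster.max_y = std::max(cluster.max_y, y);
            // Left, right, up, down, each paired with an in-bounds check
            // (an out-of-bounds index is computed but never used).
            const std::pair<bool, std::size_t> neighbours[] = {
                {x > 0, idx - 1},
                {x + 1 < width, idx + 1},
                {y > 0, idx - width},
                {y + 1 < height, idx + width},
            };
            for (const auto& [in_bounds, n] : neighbours) {
                if (in_bounds && !visited[n] && image[n] > threshold) {
                    visited[n] = true;
                    pending.push(n);
                }
            }
        }
        cluster.centroid_x = sum_x / static_cast<double>(cluster.pixel_count);
        cluster.centroid_y = sum_y / static_cast<double>(cluster.pixel_count);
        clusters.push_back(cluster);
    }
    return clusters;
}
```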
Users wanting to play with this setting can find it in Emulation > Configure… > Controls > Advanced tab > Infrared Camera.
This work doesn’t include the moment processor required by 1-2-Switch! just yet.
Steam Deck users reported having issues when using external controllers, but not while using the integrated Deck controls. toastUnlimited hopped onto the issue and found that the cause was the prerelease SDL2 version we had been bundling. Reverting to a slightly older version solved the issue.
A recent and very interesting community effort focuses on adding online functionality to single-player games, allowing for fun co-op opportunities not possible in the original game.
Super Mario Odyssey recently received a mod that allows for this online functionality, and the one thing keeping yuzu from supporting it was the on-screen keyboard lacking a way to input an IP address!
Luckily, Morph was on the case and implemented the necessary symbols to input the IPv4 addresses required by the online mod.
Link4565 implemented some required fixes in yuzu’s network services to improve compatibility with this awesome mod. Thank you very much!
Have fun ruining Bowser’s wedding!
A small regression from the input rewrite revealed itself just now. The WebApplet’s input bit was assumed incorrectly, causing user input to be completely ignored. Thankfully, Morph found the reason and implemented the fix.
Last month, Docteh renamed the status bar’s DOCKED status (redundancy, yeah!). For consistency, this dumb writer decided to do the same for the Controls configuration window, for consistency.
Sometimes something “functioning as designed” can look stalled from the user’s point of view due to how the UX (user experience) is presented; just ask any new Linux user. In this case, when loading an application, the shader progress bar at boot would appear stuck if a game was started with no previous pipeline cache or if a homebrew was booted. Since this leads to confusion, byte[] decided that it’s better to reset the status bar than let it remain stuck until the program finishes loading. As said before, the devil is in the details.
One of the configurable hotkey options in Emulation > Configure… > General > Hotkeys is Audio Volume Up/Down.
Users requested that the volume curve be tuned to be more sensitive at lower values.
Human hearing perceives loudness logarithmically rather than linearly, so this makes perfect sense.
german77 made the steps finer the closer you get to 0% volume, to better match how our flesh-and-bone bodies perceive the world.
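A fixed step is coarse at the bottom of the range: if the volume setting scales signal amplitude linearly, going from 5% to 10% doubles the amplitude (about +6 dB), exactly like going from 50% to 100%. The sketch below shows one way to make steps finer near 0%; the thresholds and step sizes are hypothetical and not necessarily the values german77 chose.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical step schedule: the lower the volume, the finer the step.
int VolumeStep(int volume_percent) {
    if (volume_percent < 10) return 1;
    if (volume_percent < 30) return 2;
    return 5;
}

int VolumeUp(int volume_percent) {
    return std::min(100, volume_percent + VolumeStep(volume_percent));
}

int VolumeDown(int volume_percent) {
    // Use the step of the range just below the current value, so that
    // stepping down from 30 or 10 lands on the values VolumeUp passes through.
    return std::max(0, volume_percent - VolumeStep(volume_percent - 1));
}

// Perceived loudness change, in decibels, between two linear gain values.
double GainChangeDb(double from, double to) {
    return 20.0 * std::log10(to / from);
}
```

`GainChangeDb(0.10, 0.05)` and `GainChangeDb(1.0, 0.5)` both come out at about -6 dB, which is why a 5% step that feels subtle near full volume feels like a cliff near silence.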
A beautiful feature of tightly integrated systems is their wonderful control over suspend and resume, and the Steam Deck is no exception. If you’ve ever experienced issues with suspend and resume, you know what I mean. Experienced developer devsnek wants to help yuzu take advantage of this feature over the course of three different pull requests. This includes emulating the actual suspend/resume mechanic of the Switch, as some games make use of it as one of their gameplay features. With these changes, users can suspend their games by simply pressing the power button of the Deck, exactly like on a Switch.
For those of us living in remote places, suffering from terrible ISPs, or both (FML), we have fantastic news!
toastUnlimited reduced the size of each yuzu download by around 24MB by only including what specifically belongs to yuzu in its source.
Those interested in building the bundled source that comes with the installer must now run git submodule update --init --recursive in order to be able to compile the project.
This is a new section to communicate and discuss new relevant bugs, fixes, and findings related to specific hardware that can affect the user experience within yuzu.
We mentioned last month how the 516 series of drivers is detrimental to Maxwell and Pascal users, making Vulkan unstable.
We’re still debugging the issue, as it isn’t easy to catch, but we suspect a possible cause: GPU-accelerated ASTC texture decoding.
If you own a Maxwell or Pascal GPU, must remain on the latest driver update, and want to test whether you can make Vulkan stable again, try disabling Accelerate ASTC Texture Decoding in Emulation > Configure… > Graphics.
Please report your results on our forums or Discord server.
Another known issue caused by the 516 series of drivers is some funny flickering on trees in KOEI TECMO games like Hyrule Warriors: Age of Calamity.
Day time party! NVIDIA Vulkan Left: 516.94 & Right: 512.95 (Hyrule Warriors: Age of Calamity)
These issues could either be regressions or undocumented behaviour changes, possibly resulting from following the API specification more rigorously.
There are also performance-related issues affecting users with G-SYNC/FreeSync displays, causing low framerates (games usually get stuck at 24-30 FPS). We have a few ways to bypass this issue: either disable View > Single Window Mode, or select Exclusive Fullscreen from Emulation > Configure… > Graphics > Fullscreen Mode and then play your games in fullscreen by pressing F11.
The root of the problem is a bad combination of running a Qt window inside another window and NVIDIA’s way of detecting the framerate of windowed applications. Removing either of the two factors solves the low framerate while still taking advantage of Variable Refresh Rate.
Hell froze over, pigs learned to fly, and starting with the Windows driver version 22.7.1, AMD introduced a completely new OpenGL driver, making Radeon cards on Windows viable options to use both APIs, not just cool kid Vulkan. Performance is close to 100% higher, or more in some titles, and many rendering bugs are fixed. But why write about it? Let the numbers do the talking:
Wow! That’s a lot of numbers. Let’s try to make them easier to digest:
Thanks toastUnlimited!
We’re not experts in the benchmarking area, so hopefully the above graphs help.
Above are results of an RX 6600 and a GTX 1660 SUPER running a few games in OpenGL and Vulkan. 22.6.1 represents the old, infamous OpenGL driver; 22.7.1 is, of course, the new driver. Linux is represented by Mesa 22.1.3 running radeonsi with the amdgpu kernel module for OpenGL, and RADV for Vulkan. NVIDIA is running its latest (at the time of writing) Windows driver. The rest of the relevant hardware is a 5600X and 16GB of RAM at 3600MHz. The RX 6600 was running at PCIe 4.0 x8 with Smart Access Memory enabled, although that won’t make a difference (more on that later). The operating systems used are Windows 11 and Manjaro Linux, both up to date and on their respective default stable branches. yuzu is on Mainline 1112, with GPU accuracy set to Normal to make GPU driver bottlenecks easier to measure, a 1X resolution multiplier, and the Default value for Anisotropic Filtering.
Aside from a single regression under investigation and reported to AMD (Xenoblade Chronicles 2 crashes when loading Abble’s Fountain, the measuring spot, possibly caused by a driver thread crash), performance is now very close to Vulkan numbers, from either AMD or NVIDIA.
It’s now perfectly valid for a Radeon user to switch to OpenGL if a specific game requires it, for example Xenoblade Chronicles 3, or a Unity/Unreal Engine-based game (SHIN MEGAMI TENSEI V).
As a bonus, while not very stable, the SPIR-V shader back-end can be used in games with “simple” shaders like Super Smash Bros. Ultimate or Super Mario Odyssey, making shader building much more tolerable than with GLSL, with performance much closer to the NVIDIA-only GLASM.
Another lesson learned from this is that some games, like The Legend of Zelda: Breath of the Wild, just outright prefer NVIDIA’s mature OpenGL driver. Ara ara.
Lastly, to end this Red Team section: in the past, we reported a way to defeat RDNA2’s overcorrecting power manager in order to get decent framerates. This method, while simple, has a downside: it’s an overclock, or at least counts as one.
We found an alternative that should be more globally applicable. The trick this time is to make the driver force high clocks on the part of the GPU that matters more for emulation performance in general: the VRAM. All this while keeping your warranty intact.
The process is simple: make the integrated video encoder work in the background while yuzu (or any other emulator) runs.
This is easily achieved from Radeon Software by going to Settings > Record & Stream and enabling Instant Replay.
Intel/Linux owners should be able to reach similar results by instead using the Xbox Game Bar or setting OBS to keep a buffer.
After this, in yuzu, enable Exclusive Fullscreen from Emulation > Configure… > Graphics > Fullscreen Mode.
Then just play your games in fullscreen by pressing F11.
This step can be avoided if you also enable Record Desktop, but please keep in mind this will increase your power consumption even while idling.
The performance gains are the same as with the previous overclocking method, up to 73% in GPU bottlenecked titles.
RX 6500 XT and RX 6400 users, since you lack a video encoder in the first place, refer to our original method mentioned at the start, or ask for a refund.
Intel recently announced that their Windows driver for Gen. 9, Gen. 9.5, and Gen. 11 GPUs (that is any CPU based on the 14nm Skylake architecture and all its many marketing renames, plus Ice Lake) is now in “legacy software support”, which in layman’s terms means they are officially dead. While this doesn’t affect yuzu immediately, any new Vulkan features we add in the future could potentially break functionality in a similar way to what happened with old AMD GCN hardware last year. This leaves integrated Intel GPU users with a single alternative, Linux, which offers support for even older hardware. For example, an ancient HD Graphics 4400 can run yuzu with the Mesa drivers.
Users should consider learning how to use Linux if a hardware upgrade is not a viable option in the near future; Mesa has always offered better performance for Intel GPUs.
Part 2 of Project Y.F.C. is a bit delayed for now due to the real-life issues mentioned earlier, but its feature list and expected progression are laid out.
Project London is progressing in a healthy fashion; we loved the internal testing done so far.
And a possibility has just recently started to open for even better GPU performance in the (not so near) future.
GPU fastmem is one of the features that Rodrigo had to leave for later, before passing the torch and moving on to “greener sides”.
The main roadblock holding GPU fastmem back was driver support, which is now a mostly solved issue. We only need to start talks with AMD, Intel, and the AMD Linux kernel module developers to ask for some increased limits.
Once those obstacles are out of the way, yuzu should, for example, be able to take partial advantage of Resizable BAR/Smart Access Memory, helping reduce PCIe bottlenecks, and should help improve particle rendering to make GPU accuracy a less critical performance setting.
No pressure, Blinkhawk!
That’s all folks! This one turned out to be longer than expected. Thank you for staying until the end, and we hope to see you again next month! Thank you NazD for the summary pic!