Device Loss
Just as software running on the CPU can crash, software running on the GPU can crash too. GPU crashes manifest as device losses and can happen for much the same reasons as CPU crashes, except that they are much harder to track down and fix. One of the big issues is that "device loss" is a very generic term for something that can have many different causes, and in most cases it is unclear what exactly caused the crash.
Part of the issue from our end is that there are extremely good ways to figure out why something crashed on the CPU, while the GPU is a much more opaque black box. GPUs also run asynchronously from the CPU, so by the time we discover that the GPU has crashed, the CPU has long since moved on to different tasks, probably even a different frame. Working backwards from there to see how the GPU crashed is virtually impossible. This also means that a remedy that works for one user will not necessarily help another, because they might be getting device losses for an entirely different reason.
There are, however, a few things that are known to cause device losses:
- Outdated drivers. Before filing a bug report, please make sure your drivers are up to date.
- Overclocking. Your overclock might be stable in other games, but every game utilizes the hardware differently, and overclocking can very easily lead to instability and random crashes. This applies not just to GPU overclocking but to any overclocking or overvolting in your system, as it's also possible for the CPU or memory to glitch, resulting in bad data going into the GPU.
- Software hooking into X-Plane. Software like ReShade or various performance overlays that hook directly into games to inject data can very easily cause GPU instability. Please run X-Plane without these and see if this resolves your device loss issue. If you are unsure whether any additional software is involved, open the X-Plane Log.txt file and look for the "Vulkan Layers" line. Vulkan layers are additional pieces of software that get injected into the Vulkan runtime and get to alter the command stream produced by X-Plane. There are a couple of known good layers: if you see one starting with VK_LAYER_NV, VK_LAYER_AMD, VK_LAYER_KHRONOS, VK_LAYER_LUNARG or VK_LAYER_VALVE, you probably don't need to worry. These mainly come from the hardware vendors and from people involved with Vulkan, and are most likely fine. Note that not all software uses the layer mechanism to hook into Vulkan. For example, the ASUS GPU Tweak Utility is known to cause crashes with X-Plane even though it does not inject itself through layers.
- Anything between your GPU and motherboard. PCIe riser cables are known to introduce interference on the PCIe bus, even when bought from a reputable supplier, and can cause device losses.
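The Log.txt check described in the list above can be automated with a small script. The following is only a rough sketch: it assumes that layer names appear in the log as plain tokens starting with VK_LAYER_, which may not match the exact layout of your log, and the function name find_unrecognized_layers is made up for this example. The prefix list is the one given above.

```python
import re

# Prefixes named above as known good layers.
KNOWN_GOOD_PREFIXES = (
    "VK_LAYER_NV",
    "VK_LAYER_AMD",
    "VK_LAYER_KHRONOS",
    "VK_LAYER_LUNARG",
    "VK_LAYER_VALVE",
)

# Assumption: layer names show up in the log text as tokens that
# start with "VK_LAYER_". The real log layout may differ.
LAYER_PATTERN = re.compile(r"VK_LAYER_[A-Za-z0-9_]+")


def find_unrecognized_layers(log_text):
    """Return the sorted layer names that match no known good prefix."""
    layers = set(LAYER_PATTERN.findall(log_text))
    return sorted(
        name for name in layers
        if not name.startswith(KNOWN_GOOD_PREFIXES)
    )
```

To use it, read your Log.txt into a string (for example with open("Log.txt", errors="replace").read()) and pass it to find_unrecognized_layers. An empty result only means that no unfamiliar layer was found. It does not rule out software that hooks into X-Plane without using layers.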
In general, the best chance of getting a device loss fixed is being able to reproduce it. With good reproduction steps it's possible to fix device losses, although even then it can depend on the specific environment and hardware setup. If you do have a reproducible device loss, please get in contact with us and provide as much detail as possible about your hardware setup, driver versions and settings, as well as the reproduction steps. The more you can narrow it down, the better.
If you happen to be using X-Plane on an Nvidia GeForce 10xx card or newer, there is an additional option to help track down device losses. Nvidia supports a tool called Aftermath, which can be used to collect postmortem crash information from the GPU itself. This comes at the cost of FPS, however, and is therefore disabled by default. To enable it, please run X-Plane with the --aftermath command line option. The error message for device losses with crash information collected from the GPU is different and will say "GPU Crash" instead of device loss. Please submit these crash reports using the auto crash reporter, as they contain usable information. Additionally, if you can stomach the FPS loss, there is also --heavy_aftermath, which enables additional debug information. However, this puts a heavy burden on your framerate and is probably only useful if you are able to reliably reproduce device losses in a specific scenario.
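If you start X-Plane from a script, the two options above can be passed along as in the sketch below. Treat it as an illustration only: the helper name build_xplane_command and the executable path are hypothetical, and the sketch assumes that --heavy_aftermath is passed on its own rather than together with --aftermath, which is not spelled out above.

```python
import subprocess


def build_xplane_command(executable, heavy=False):
    """Build the argument list for starting X-Plane with Aftermath enabled.

    Assumption: each flag is passed on its own; whether --heavy_aftermath
    also needs --aftermath is not confirmed here.
    """
    flag = "--heavy_aftermath" if heavy else "--aftermath"
    return [executable, flag]


def launch_xplane(executable, heavy=False):
    """Start X-Plane and wait for it to exit; returns the exit code."""
    return subprocess.run(build_xplane_command(executable, heavy)).returncode
```

For example, launch_xplane("X-Plane.exe") would start the simulator with --aftermath, assuming the script runs from the folder that contains the executable. Adjust the path and file name for your own installation and operating system.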