Improving Steam Client stability on Linux: setenv and multithreaded environments

Speaking of Steam, the Linux version of Valve’s gaming platform has just received a pretty substantial set of fixes for crashes, and Timothee “TTimo” Besset, who works for Valve on Linux support, has published a blog post with more details about what kind of crashes they’ve been fixing.

The Steam client update on November 5th mentions “Fixed some miscellaneous common crashes.” in the Linux notes, which I wanted to give a bit of background on. There’s more than one fix that made it in under the somewhat generic header, but the one change that made the most significant impact to Steam client stability on Linux has been a revamping of how we are approaching the setenv and getenv functions.

One of my colleagues rightly dubbed setenv “the worst Linux API”. It’s such a simple, common API, available on all platforms that it was a little difficult to convince ourselves just how bad it is. I highly encourage anyone who writes software that will run on Linux at some point to read through “RachelByTheBay”‘s very engaging post on the subject.

↫ Timothee “TTimo” Besset

This indeed seems to be a specific Linux problem, and due to the variability in Linux systems – different distributions, extensive user customisation, and so on – debugging information was more difficult to parse than on Windows and macOS. After a lot of work grouping the debug information to try and make sense of it all, it turned out that the two functions in question were causing issues in threads other than those that used them.

They had to resort to several solutions, from reducing the reliance on setenv and refactoring it with exevpe, to reducing the reliance on getenv through caching, to introducing “an ‘environment manager’ that pre-allocates large enough value buffers at startup for fixed environment variable names, before any threading has started”. It was especially this last one that had a major impact on reducing the number of crashes with Steam on Linux.

Besset does note that these functions are still used far too often, but that at this point it’s out of their control because that usage comes from the libraries of the operating system, like x11, xcb, dbus, and so on. Besset also mentions that it would be much better if this issue can be addressed in glibc, and in the comments, a user by the name of Adhemerval reports that this is indeed something the glibc team is working on.

15 Comments

  1. 2024-11-12 7:08 pm
    • 2024-11-13 9:19 am
      • 2024-11-13 9:56 am
      • 2024-11-13 11:32 am
  2. 2024-11-12 7:11 pm
    • 2024-11-12 7:20 pm
  3. 2024-11-12 7:29 pm
  4. 2024-11-12 8:56 pm
    • 2024-11-13 9:15 am
      • 2024-11-13 3:02 pm
        • 2024-11-13 3:10 pm
          • 2024-11-13 7:32 pm
  5. 2024-11-13 3:20 am
    • 2024-11-13 11:58 am
    • 2024-11-13 12:30 pm