The major takeaway from this is that Rust will be making environment setters unsafe in the next edition. With luck, this will filter down into crates that trigger these crashes (https://github.com/alexcrichton/openssl-probe/issues/30 filed upstream in the meantime).
But that won't actually fix the underlying problem, namely that getenv and setenv (or unsetenv, probably) cannot safely be called from different threads.
It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.
I have a different perspective: the underlying problem is calling setenv(). As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv. It's not a mechanism for exchanging information within a process, as used here with SSL_CERT_FILE.
And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
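A minimal sketch of that approach (my code, POSIX-ish; error handling elided, and spawn_with_cert, the child path, and the cert path are invented for illustration):

    /* Sketch: build a fresh envp for the child instead of calling
       setenv() in the parent. */
    #include <stddef.h>
    #include <unistd.h>

    extern char **environ;

    void spawn_with_cert(void) {
        size_t n = 0;
        while (environ[n] != NULL)
            n++;                         /* count the existing entries */

        char *envp[n + 2];               /* existing vars + 1 new + NULL */
        for (size_t i = 0; i < n; i++)
            envp[i] = environ[i];        /* read-only reuse, no mutation */
        envp[n] = "SSL_CERT_FILE=/etc/ssl/cert.pem";
        envp[n + 1] = NULL;

        char *argv[] = {"child", NULL};
        execve("/usr/bin/child", argv, envp);
    }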
And, lastly, compatibility with ancient systems strikes again: the environment is also accessible through this:

    extern char **environ;

Which is, of course, best described as bullshit.
Environment variables are a gigantic, decades-old hack that nobody should be using... but instead everyone has rejected file-based configuration management and everyone is abusing environment variables to inject config into "immutable" docker containers...
Indeed, environment variables should be used to configure child processes, not to configure the current process, for non-shell programs, IMHO.
Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.
I think there's a narrow window, at least in some programming languages, when environment variables can be set at the start of a process. But since it's global shared state, it needs to be write-once (or never) and read-many. No libraries should set them. No frameworks should set them, only application authors, and it should be dead obvious to the entire team what the last responsible moment is to write an environment variable.
I am fairly certain that somewhere inside the polyhedron that satisfies those constraints is a large subset that could be statically analyzed and proven sound. But I'm less certain whether Rust could express it cleanly.
Sure is painful (mostly when writing tests where the environment variables aren't abstracted in some way).
But I think it was actually possible to hack around up until Java 17.
> As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv.
Mutating argv is actually quite popular, or at least it used to be.
Mutating argv is fine for how it is usually done. That is, to permute the arguments in a getopt() call so that all nonoptions are at the end.
It is fine because it is usually done during the initialization phase, before starting any other thread. setenv() can be used here too, though I prefer to avoid doing that in any case. I also prefer not to touch argv, but since that's how GNU getopt() works, I just go with it.
Once the program is running and has started its threads, I consider setenv() a big no-no. The Rust documentation agrees with me: "In multi-threaded programs on other operating systems, the only safe option is to not use set_var or remove_var at all." Note: here, "other operating systems" means "not Windows".
Yes, and if there were "setargv()" or "getargv()" functions, they'd have the same issues ;) … but argv is a function parameter to main()¹, and only that.
¹ or technically whatever your ELF entry point is, _start in crt0 or your poison of choice.
> but argv is a function parameter to main()¹, and only that.
> ¹ or technically whatever your ELF entry point is, _start in crt0 or your poison of choice.
Once you include the footnote, at least on linux/macos (not sure about Windows), you could take the same perspective with regards to envp and the auxiliary array. It's libc that decided to store a pointer to these before calling your `main`, not the abi. At the time of the ELF entry point these are all effectively stack local variables.
I mean, yes, we're in "violent agreement" there. It's nice that libc squirrels away a copy and gives you a `getenv()` function with a string lookup, but… setenv… that was just a horrible idea. It's not really wrong to view it as a tool that allows you to muck around with main()'s local variables. Which to me sounds like one should take a shower after using it ;D
(Ed.: the man page should say "you are required to take a shower after writing code that uses setenv(), both to get off the dirt, but also to give you time to think about what you are doing" :D)
Oops, didn't mean to come across as disagreeing at all, more of a "yes, and <once you include the footnote>".
Ah, after rereading I think I accidentally read that in, sorry
No amount of locking can make the getenv API thread-safe, because it returns a pointer which gets invalidated by setenv, but lacks a way to release ownership over it and unblock setenv safely (or to free a returned copy).
So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.
C could provide functions to lock/unlock a mutex and require that any attempt to access the environment be done holding the mutex. This would still leave the correctness in the hands of the user, but at least it would provide a standard API to secure the environment in a multi-threaded application that library and application developers could adopt.
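Something like this hypothetical shape (lockenv/unlockenv exist in no libc today; the names and the function are mine):

    /* Hypothetical API, not in any libc: a process-wide environment
       mutex that getenv/setenv would also acquire internally. */
    #include <stdlib.h>

    void lockenv(void);    /* block until the caller owns the environment */
    void unlockenv(void);  /* release it; getenv() pointers die here */

    void configure_certs(void) {
        lockenv();
        if (getenv("SSL_CERT_FILE") == NULL)
            setenv("SSL_CERT_FILE", "/etc/ssl/cert.pem", 1);
        unlockenv();       /* any pointer obtained above is now invalid */
    }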
The underlying problem is that setenv is mutable global state and should never have existed.
The process's current directory is mutable global state as well, and yet chdir(2) is thread-safe.
chdir is thread-safe, but interacting with the current directory in any context other than parsing command-line arguments is still nearly always a mistake. Everything past a program's entry point should be working exclusively in absolute paths.
Yeah, if you chdir() in a multithreaded program, all cwd-relative file accesses in other threads are fucked.
As well as absolute paths, it's ok to work with descriptor-relative paths using openat() and friends.
Welcome to the C standard library, the application of mutable global state to literally everything in it has to be the most consistent and predictable feature of the language standard.
It's the same problem with global vars, but at a machine scope. The real solution here would be for the OS to have a better interface to read and write env vars, more like a file where you have to get rw permission (whether that's implemented as a mutex or what).
This is neither an OS nor a machine scope problem. The environment is provided by the OS at startup. What the process does with it from there on is its own concern.
> The environment is provided by the OS at startup.
That's part of the design of the OS. How the OS implements this is primitive, and so it leaves it up to every language to handle. The blog mentions the issue is with getenv, setenv, and realloc, all system calls. To me, that sounds like bad OS design is causing issues downstream with languages, leaving it up to individual programmers to deal with the fallout.
> getenv, setenv, and realloc, all system calls
None of these 3 functions is a system call. open(), mmap(), sbrk(), poll(), etc. are system calls. What you're referring to is C library API, which as Go has shown (both to its benefit and its detriment) is optional on almost all operating systems (a major exception being OpenBSD.)
If you really want to lose some sanity I would recommend reading the man page for getauxval(), and then look up how that works on the machine level when the process is started. Especially on some of the older architectures. (No liability accepted for any grey hair induced by this.)
ed.: https://lwn.net/Articles/631631/
Neither getenv, setenv nor realloc are system calls; they are all functions from the C standard library, some parts of which, for historical reasons, are required to be almost impossible to use safely/reliably.
People get trained to ignore the ____UNSAFE_payattention__nevermindthatthisappears50timesinthisfile___ blocks and prefixes
This also shows up in web frameworks where Vue has the v-html directive and react has dangerouslySetInnerHTML. Vue definitely has it better.
In the React world, the only times I've seen dangerouslySetInnerHTML consistently used is for outputting string literal CSS content (and this one is increasingly rare as build tools need less handholding), string literal JSON content (for JSON+LD), and string literal premade scripts (i.e. pixel tags from the marketing content). That's not to say there's no danger surface there, but it's not broadly used as a tool outside of code that's either really bad or really exhaustively hand-tuned.
Code syntax highlighting libraries for react use dangerouslySetInnerHTML.
I've only really seen dangerouslySetInnerHTML used while transitioning from certain kinds of server side rendering to React. There is still lots of really old internal tools in ancient html out there.
React doesn't have a tag and attribute sanitizer built in, so having non-JS programmers edit JSX isn't especially safe anyway, as an img or an href could exfiltrate data. If it did, it could simply block the innerHTML attribute outright. A js programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
> A js programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
If DOM nodes during the next render differ from what react-dom expects (i.e. the DOM nodes from the previous render), then react-dom may throw a DOMException. Mutating innerHTML via a ref may violate React's invariants, and the library correctly throws an error when programmers, browser extensions, etc. mutate the DOM such that a node's parent unexpectedly changes.
There are workarounds[1] to mutate DOM nodes managed by React and avoid DOMExceptions, but I haven't worked on a codebase where anything like this was necessary.
[1] https://github.com/facebook/react/issues/11538#issuecomment-...
The reference is used to operate on the subtree when wrapping libraries like CodeMirror https://github.com/uiwjs/react-codemirror/blob/master/core/s... React leaves it alone if the children don't change.
innerHTML is useful when there is a trusted HTML source, which is becoming more popular with stuff like HTMX and FastHTML.
In the Rust std, `set_var` and `remove_var` will correctly require using an `unsafe {}` block in the next edition (2024). The documentation does now mention the safety issue but obviously it was a mistake to make these functions safe originally (albeit a mistake even higher-level languages have made).
https://doc.rust-lang.org/stable/std/env/fn.set_var.html
There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified, but C still allows direct access to environ, so it can't be completely safe in the face of modification: https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
Why require unsafe when the std implementation could take care of the synchronisation?
Because the std implementation cannot force synchronisation on the libc, so any call into a C library which uses getenv will break... which is exactly what happened in TFA: `openssl-probe` called env::set_var on the Rust side, and the Python interpreter called getenv(3) directly.
But the standard implementation could copy the environment at startup, and only use its copy.
And the library's use of setenv is clearly a bug as setenv is documented to be not threadsafe in the C standard library. So that would take care of that problem.
If you clone the environment at startup, then you get a situation where code in the same binary can see different values depending if it uses libc or Rust's std. It's also no longer the same environment as in the process metadata.
Using a copy by default may have worked if it was designed as such before Rust 1.0, but Rust took the decision to expose the real environment and changing this now would be more disruptive than marking mutations as unsafe.
Is it possible to skip libc completely or would this introduce too many portability concerns?
In general, no, because of FFI. In special circumstances, yes, but this isn't really important because the libc implementation is trivial (on all platforms that matter, envp is a char** to strings formatted as KEY=VALUE, and set_env(key, value) amounts to allocating a new KEY=VALUE string and either replacing the entry at the key's index if it exists or appending to the array).
Under the hood the pointer is initialized by the loader, in a special place in executable memory. Most of the time, the loader gets the initial environment variable list by looking at argv* (try reading past the end of the null separator, you'll find the initial environment variables).
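Concretely, something like this (my sketch; it relies on the kernel laying out argc/argv/envp contiguously on the initial stack, which holds on typical Linux setups but is not guaranteed everywhere):

    /* Sketch: envp starts right past argv's NULL terminator on the
       initial stack; libc merely saves this pointer before calling main. */
    #include <stdio.h>

    int main(int argc, char **argv) {
        char **envp = argv + argc + 1;   /* skip argv[0..argc-1] and NULL */
        for (char **e = envp; *e != NULL; e++)
            puts(*e);                    /* prints KEY=VALUE lines */
        return 0;
    }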
It would be possible for a language to hack it such that on load they initialize their own env var set without using libc and be able to safely set/get those env vars without going through libc, and to inherit them when spawning child processes by reading the special location instead of the standard location initialized by your platforms' loader/updated by libc. But how useful is a language with FFI that's fundamentally broken since callees can't set environment variables? (probably very useful, since software that relies on this is questionably designed in the first place)
If you wanted to make a bullet-proof solution, you would specify the location of an envp mutex in the loader's format and make it libc's (or any language runtime's) problem to acquire that mutex.
* there are platforms where this isn't true
It's not just libc, it's any C or C++ library that calls getenv or setenv.
Specifically, any C or C++ library that calls setenv (despite documentation that says that setenv is not threadsafe).
It can only synchronize if everything is using Rust's functions. But that's not a given. People can use C libraries (especially libc) which won't be aware of Rust's locks. Or they could even use a high-level runtime with its own locking, but then its locks will be distinct from Rust's.
The only way to coordinate locking would be to do so in libc itself.
libc does do locking, but it's insufficient. The semantics of getenv/setenv/putenv just aren't safe for multi-threaded mutation, period, because the addresses are exposed. It's not really even a C language issue; were you to design a thread-safe env API, for C or Rust, it would look much different, likely relying on string copying even on reads rather than passing strings by reference (reference counted immutable strings would work, too, but is probably too heavy handed), and definitely not exposing the environ array.
The closest libc can get to MT safety is to never deallocate an environment string or an environ array. Solaris does this--if you continually add new variables with setenv it just leaks environ array memory, or if you continually overwrite a key it just leaks the old value. (IIRC, glibc is halfway there.) But even then it still requires the application to abstain from doing crazy stuff, like modifying the strings you get back from getenv. NetBSD tried adding safer interfaces, like getenv_r, but it's ultimately insufficient to meaningfully address the problem.
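Roughly what that leak-on-update strategy looks like (my simplified sketch; leaky_setenv is an invented name, allocation failures ignored):

    /* Sketch of the "never deallocate" approach: overwrites and appends
       both leak the old storage, so stale getenv() pointers stay valid. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    extern char **environ;
    static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

    void leaky_setenv(const char *key, const char *val) {
        size_t klen = strlen(key);
        char *kv = malloc(klen + strlen(val) + 2);
        sprintf(kv, "%s=%s", key, val);

        pthread_mutex_lock(&env_lock);
        size_t n = 0;
        for (; environ[n] != NULL; n++) {
            if (strncmp(environ[n], key, klen) == 0 && environ[n][klen] == '=') {
                environ[n] = kv;              /* old string leaks on purpose */
                pthread_mutex_unlock(&env_lock);
                return;
            }
        }
        char **bigger = malloc((n + 2) * sizeof *bigger);
        memcpy(bigger, environ, n * sizeof *bigger);
        bigger[n] = kv;
        bigger[n + 1] = NULL;
        environ = bigger;                     /* old array leaks on purpose */
        pthread_mutex_unlock(&env_lock);
    }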
The right answer for safe, portable programs is to not mutate the environment once you go multi-threaded, or even better just treat process environment as immutable once you enter your main loop or otherwise finish with initial process setup. glibc could (and maybe should) fully adopt the Solaris solution (currently, IIRC, glibc leaks env strings but not environ arrays), but if applications are using the environment variable table as a global, shared, mutable key-value store, then leaking memory probably isn't what they want, either. Either way, the best solution is to stop treating it as mutable.
A safe API would look a lot like Windows' GetEnvironmentVariable and SetEnvironmentVariable
https://learn.microsoft.com/en-us/windows/win32/api/winbase/...
https://learn.microsoft.com/en-us/windows/win32/api/winbase/...
Yep. GetEnvironmentStrings and FreeEnvironmentStrings are probably even more noteworthy as they seem to substitute for an exposed environ array, though they push more effort to the application.
It can't ensure synchronization because any code using libc could bypass the sync wrapper. In particular, Rust lets you link C libs which wouldn't use the Rust stdlib.
Because it can still race with C code using the standard library. getenv calls are common in C libraries; the call to getenv in this post was inside of strerror.
you've gotten a lot of answers which say the same thing, but which I don't think answer your question:
synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.
Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.
The real problem is that getenv() and setenv() were created before threads were really a thing.
Well it was better in the short term but is worse in the long term. In particular, the error handling situation is generally atrocious, which is fine for interactive/sysadmin use but much worse for serious production use.
Even if C stdlib maintainers are resistant to making setenv multi-thread safe, at a minimum there should be a new alternative thread-safe API defined, whether within POSIX or as a de facto standard that forces POSIX to adopt it over time. If the effort spent explaining why nothing could be done had instead been spent fixing this problem, a new thread-safe API could have replaced the old setenv, which could then have been deprecated and removed from many software projects.
I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.
The biggest problem is not the absence of a thread safe API, it's the existence of this:
extern char **environ;
As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary.
If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
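That is, nothing stops code from doing this, straight past any lock that setenv/getenv might take internally (dump_path is my name for the sketch):

    /* Direct access; no libc-internal mutex can cover this loop. */
    #include <stdio.h>
    #include <string.h>

    extern char **environ;

    void dump_path(void) {
        for (char **e = environ; *e != NULL; e++)   /* races with setenv() */
            if (strncmp(*e, "PATH=", 5) == 0)
                puts(*e + 5);
    }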
> aka don't let the perfect be the enemy of the good
Exactly my point. Over time *environ would disappear, at least from the major software projects that everyone uses (assuming it's even in use in them in the first place).
Guess that would also require some locking for all the exec() functions that don't take the environment as a parameter or that search PATH for the executable.
I'm not convinced by you that you know more than the experts who have determined there is no backwards-compatible way to fix this.
I'll take existence proofs [1] over personal insults but YMMV. You also may want to be careful assuming the expertise of people on this forum. Some people here are quite technical.
[1] https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
That isn't thread safe, it's safER.
I am also quite technical, thanks.
It's like a rite of passage to be hit by an environment related bug on linux, which is mysteriously less a problem on other unix's. Which is sorta funny given how pragmatic Linus and the kernel are about fixing POSIX bugs by making them not happen, while glibc is still lagging here decades after people tried to at least make the problem better. Sure, there is all the crap around TZ/etc, but simply providing getenv_r(), synchronizing it with setenv(), and warning during compile/link on getenv() would have killed much of the problem. Never mind actually doing a COW-style system where the env pointer(s) are read-only. Instead the problem is pushed to the individual application, which is a huge mistake, because application writers are rarely aware of what their dependencies are doing. Which is the situation I found myself in many many years ago. The closed-source library vendor, at the time, told us to stop using that toy unix clone (linux).
> environment related bug on linux, which is mysteriously less a problem on other unix's.
How do you figure? The problem isn't the implementation, it's the API. setenv(), unsetenv(), putenv(), and especially environ, are inherently unsafe in a multithreaded program. Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer. Sure, getenv_r() fixes the case where you get something back from getenv() and then another thread calls setenv() and makes that memory invalid, but there's no way to protect against the other calls breaking the API.
There are ways to mitigate some of the issues, like having libc hold a mutex when inside getenv()/setenv()/putenv()/unsetenv(), but there's still no way for libc to guarantee that something returned by getenv() remains valid long enough for the calling code to use it (which, right, can be fixed by getenv_r(), which could also be protected by that mutex). But there's no good way to make direct access to environ safe. I suppose you could make environ a thread-local, but then different threads' views of the environment could become out of sync, permanently (and you could get different results between calling getenv_r() and examining environ directly).
Back-compat here is just really hard to do. Even adding a mutex to protect those functions could change the semantics enough to break existing programs. (Arguably they're already broken in that case, but still...)
I think you would have to change the API to return a copy of the string as the getenv result, which the caller is responsible for freeing, or the env implementation would have to ensure returned values from getenv are stable and never change, which is effectively a memory leak.
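The copying variant could look like this (my sketch; getenv_dup is an invented name, and it only helps if setenv takes the same lock):

    /* Sketch: snapshot the value under a lock, hand back an owned copy. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

    char *getenv_dup(const char *key) {
        pthread_mutex_lock(&env_lock);
        const char *v = getenv(key);
        char *copy = v ? strdup(v) : NULL;  /* copied while nothing mutates */
        pthread_mutex_unlock(&env_lock);
        return copy;                        /* caller owns it; free() later */
    }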
> Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer.
Won't that depend on the libc implementation? For example, maybe setenv writes to another buffer, then swaps pointers atomically; wouldn't that work?
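It gets you part of the way. A sketch of the swap (C11 atomics; names mine) shows where it still falls down:

    /* Publish a rebuilt array atomically; readers load one consistent
       snapshot. The catch: 'old' may still be in use by a reader that
       loaded it a moment ago, so it can never be freed safely -- you
       either leak it (the Solaris approach) or need deferred reclamation. */
    #include <stdatomic.h>

    static _Atomic(char **) env_snapshot;

    void swap_in(char **rebuilt) {
        char **old = atomic_exchange(&env_snapshot, rebuilt);
        (void)old;   /* cannot free: no way to know when readers finish */
    }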
Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.
Yup, it's mostly just the story and tools we used to get ourselves out of a mess that was made harder by some decisions made earlier -- the tests were running in a container with stripped symbols (we're going to ship symbols after this, no reason to over-optimize), our custom test runner failed to report process death (an oversight).
There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
Oversights all around and hopefully this continues to improve.
It really does not look like a good idea to setenv(). The very notion is quite terrifying. Messing with a bunch of globals that other code knows about as well? Nuh-uh.
The thing is, the OP people weren't doing that at all, it was some irresponsible library maintainers. If your code does that, you have to include something like the "surgeon general's warning" everywhere: "CAREFUL: USING THIS LIBRARY MAY CAUSE TERMINAL CRASHES".
This reminded me of that whole "12-factor app" movement, which several of my former coworkers had really bought into. One of the "factors" is that apps should be configured by environment variables.
I always thought this was kinda foolish: your configuration method is a flat-namespace basket of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.
Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.
I have similar reservations about env vars. I dislike how they can be read from anywhere--it interrupts the ability to reason about a function's behavior from its signature, and renders impure plenty of functions that could otherwise have been pure.
If there were a language feature that let me mark apps such that during any process env vars are not writable and are readable only once (together, in a batch, not once per var), I'd use it everywhere.
getenv() is perfectly fine, it's setenv() that is the problem. Which in theory this wouldn't be using since the env would be set up prior to starting that mystical app.
But yes, a flat namespace, with string values, shared as a free-for-all with who knows what libraries and modules you're loading… that's not a good idea even if it didn't have safety issues in setenv().
Great article about digging into a non-obvious bug. This one had it all! Intermittent bug, architecture-specific, hidden in a dependency, rust, the python GIL, gettext. Fantastic stuff.
These kinds of detailed troubleshooting reports are the closest you can get to the experience of debugging it yourself. Thanks to the authors. It's easy to say "don't use X, duh" until a dependency relies on it, and how were you supposed to know?
What is the rationale for libc not making setenv/getenv thread safe? It does seem rather odd given how environment variables are explicitly defined as shared between threads in the same process!
It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread local buffer). The coordination between getenv and setenv could be very lightweight - spinlock vs mutex.
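E.g. something like this sketch (getenv_tls is my name; note a second call from the same thread clobbers the first result):

    /* Sketch: copy the value into a thread-local buffer under the same
       lock setenv would take, so the returned pointer survives later
       setenv() calls from other threads. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

    char *getenv_tls(const char *key) {
        static _Thread_local char buf[4096];
        pthread_mutex_lock(&env_lock);
        const char *v = getenv(key);
        if (v != NULL) {
            strncpy(buf, v, sizeof buf - 1);
            buf[sizeof buf - 1] = '\0';
        }
        pthread_mutex_unlock(&env_lock);
        return v != NULL ? buf : NULL;
    }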
If both Rust and C have independent standard libraries loaded into the same process, each would have an independent set of environment variables. So setting a variable from Rust wouldn't make it visible to the C code, which would break the article's usecase of configuring OpenSSL.
The only real solution is to have the operating system provide a thread-safe way of managing environment variables. Windows does so; but in Linux that's the job of libc, which refuses to provide thread-safety.
The crash in the article happened when Python called C's getenv. Rust could very well throw away libc, but then it would also be throwing away its great C interop story. Rust can't force Python to use its own stdlib instead of libc.
> and environment variables require an operating system
Is that true? It's just a process global string -> string map, that can be pre-loaded with values before the process starts, with a copy of the current state being passed to any sub-process. This could be trivially implemented with batch processing/supervisory programs.
Sure, there's a broader concept here, which doesn't require any operating system. But any alternate string->string map you define won't answer to C code calling getenv, won't be passed to child processes created with fork, won't be visible through /proc/$PID/environ, etc.
> They did, it's called core. But it assumes no operating system at all, and environment variables require an operating system.
I think there's some confusion here. The C standard library is an abstraction layer that exists to implement standard behavior on hardware. It's entirely unrelated to the existence of an OS. Things like "/proc/$PID/environ" have nothing to do with C.
There are many standard libraries, for embedded, that implement these things, like getenv, on bare metal [1].
Standard C libraries exist to implement functionality. It does not define how to implement the functionality. That's the whole point of C: it's an abstraction that has very little requirements.
The implementation of environment variables doesn't require an OS. If they made this part of "core", they could trivially implement the concept.
Well, it's used by the OS when exec-ing a new process, but at least the Linux syscall for that takes the environment as an explicit parameter. So it could be managed in whatever way by the runtime until execve() is called.
Linux is an unusual platform in that it allows you to call into it via assembly. Most other platforms require you to go through libc to do so. It's not really in Rust's hands.
This is not unusual at all. Windows allowed it for years before Linux came along. It was also true of some other *nix systems - IIRC, Ultrix (DEC) allowed this, and so did Dynix (Sequent).
*BSD allows it too, or used as of 2022.
What is unusual about Linux is that it guarantees a syscall ABI, meaning that if you follow it, you can make a system call "portably" across "any" version of Linux.
Sure, I’m speaking about platforms that are relevant today, not historical ones. Windows, MacOS, {Free,Open,Net}BSD, Solaris, illumos, none of these do.
It's quite easy to find out the actual situation on this since Go decided to do it their way. Last I checked, OpenBSD is the only OS where they go through libc, but I haven't really kept up.
Yep, in 2022 it finally started using libc on *BSD too.
But ... there's a difference between being able to do direct syscalls via asm, and them being portable across kernel versions, which is what this subthread was about.
Granted, most people want version portability, but still on a technical level, it's not the same thing.
No, my comment was about what APIs a platform considers to be their stable, external API. That you can technically call them anyway (except for ones like OpenBSD that actively check and prevent you) doesn't mean you're not doing something unsupported.
It would be a tremendous amount of work, and would take years. Meanwhile, the problems are avoidable. It's not exactly the "rust way" to just remember and avoid problems, but everything in language design is compromises.
> Why use Eyra? It fixes Rust's set_var unsoundness issue. The environment-variable implementation leaks memory internally (it is optional, but enabled by default), so setenv etc. are thread-safe.
I think glibc made the same trade-off. It makes sense for most types of programs, but there's certainly a lot of classes of programs that wouldn't take it.
It is weird that I got this right before Rust did.
Because I use structured concurrency, I can make it so every thread has its own environment stack. To add a new environment, I duplicate the current one, add the new variable, and push the new environment onto the stack.
Then I can use code blocks to delimit where that stack should be popped. [1]
This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]
Yet another person is burned by calling setenv() in a multi-threaded context. There really needs to be a big warning banner on the manpage for setenv() that warns about this because it seems like a far more common problem than you would expect.
It's time to move beyond this attitude and make things safe by default. For example, Solaris has a safer version of setenv().
"It is ridiculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem." https://www.evanjones.ca/setenv-is-not-thread-safe.html
One of the major differences between X Window and the win32 GUI APIs is that the windows one builds in thread safety, and it cannot be removed. This means that you pay the price of mutexes and the like (what the windows world likes to call "critical sections"), even if you have a single threaded GUI. X Window, on the other hand, decided to do nothing about threads at all, leaving it up to the application.
30 years after these decisions were made, most sensible people do single threaded GUIs anyway (that is, all calls to the windowing API come from a single thread, and all redraws occur synchronously with respect to that thread; this does not block the use of threads functioning as workers on behalf of the GUI, but they are not allowed to make windowing API calls themselves).
Consequently, the overhead present in the win32 API is basically just dead-weight, there to make sure that "things are safe by default".
There's a design lesson here for everyone, though precisely what it is will likely still be argued about.
Yet 30 years later people are calling setenv()/getenv() from different threads even though "it is known" that it crashes. For whatever reason the lesson from GUIs doesn't apply here.
Judging from a lot of the comments in this thread, the idea that there could even be parts of the *POSIX API* that are not thread-safe seems like an idea that hasn't even occured to a lot of (younger?) programmers ...
You could wrap setenv in a mutex, but that's not good enough. It can still be called from different processes, which means you'd need to do a more expensive and complex syncing system to make it safe.
That balloons out to other env-related methods needing to honor the synchronization primitive in order for there to be a semblance of safety.
However, you still end up in a scenario where you can call setenv and then getenv, and that would be incorrect because between the set and the get, even with mutexes properly in place and coordinated amongst different applications, you have a race condition where your set can be overwritten by another application's set before your get can run. Now, instead of actually making these functions safe, you've buried the fact that external processes (or your own threads) can mess with env state.
The solution is to stop using env as some sort of global variable and instead treat it as a constant when the application starts. Using setenv should be mostly discouraged because of these issues.
How does an external process mess with env state? As far as I know, you pass the environment when doing the execvpe() and then you cannot touch it from outside of the process anymore.
You're correct. Parent comment is inaccurate. The problem is that a different library in the same process can use getenv without locking (or without locking the same lock as your code)
Of course you can. Mutexes are system objects, so it's not a huge problem to sync across processes, if you really have to (is it really expected that one process can set env vars inside another process?).
Making global state, especially state that has no reason to be modified or even read very often like the env, thread safe is a trivial issue, well studied and understood. Could an intern do it? Probably not. Could literally any maintainer of a standard C library? Easily.
This is much more of a culture problem preventing such obvious flaws from being recognized as such.
Side-note: your set-then-get example is a theoretical problem in search of a use case. Why would you ever want to concurrently set an env var and expect to be guaranteed to read that same value back? And even if this is a real thing that applications really use, exposing a new function to sync anything on the env mutex is, again, trivial. So, if you really needed that, you could do something like:
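    lockenv();
    setenv("KEY", "value", 1);
    const char *v = getenv("KEY");   /* guaranteed to still be our write */
    unlockenv();

(lockenv()/unlockenv() being the hypothetical environment-mutex functions proposed earlier in the thread; nothing like them exists in any libc today.)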
That doesn't solve anything. You could be using a library (perhaps a closed-source one) that doesn't use these hypothetical lockenv()/unlockenv() functions.
This needs to be fixed inside libc, but there's no way to do so completely without breaking backward-compatibility.
That is a technical solution. What is your solution to the much more serious social problem of adding this check to every codebase in existence? What points of leverage do you have?
I am not sure making things safe by default is a good idea. This always comes with a cost. That's also the reason why basic data types (arrays, dictionaries, etc.) are generally not thread-safe: it's usually not needed, or it's handled at a much higher level.
It's a different story for languages/environments that are supposed to be safe by default and where you have language features that ensure safety (actors, optionals, etc.), but not for something like libc, which has a standard it has to conform to and like 100 years of history.
The problem with `setenv` is that people expect one process to have one set of environment variables, which is shared across multiple languages running in that process. This implies every language must let its environment variables be managed by a central language-independent library -- and on POSIX systems, that's libc.
So if libc refuses to provide thread-safety, that impacts not just C, but all possible languages (except for those that cannot call into C-libraries; as those don't need to bother synchronizing the environment with libc).
In some cases this is true. In the case of setting and getting env vars, it is not. There is no conceivable reason for a process to spend any significant portion of its runtime calling setenv() or getenv(). Even if those calls were a thousand times slower than today, it would still be a non-issue.
By definition, a "reentrant function" is a function that may be invoked even when it has not returned yet from a previous invocation.
So a non-reentrant function is a function that may not be invoked again between a previous invocation and returning from that invocation.
When a function may be invoked from different threads, then it is certain that sometimes it will be invoked by a thread before returning from a previous invocation from a different thread.
Therefore any function that may be invoked from different threads must be reentrant. Otherwise the behavior of the program is unpredictable. Reentrant functions may be required even in single-thread programs, when they may be invoked recursively, or they may be invoked by signal handlers.
An implementation of "malloc" may be reentrant or it may be non-reentrant.
Old "malloc" implementations were usually non-reentrant because they used global variables for managing the heap. Such "malloc" functions could not be used in multi-threaded programs.
Modern "malloc" implementations are reentrant, either by using only thread-local storage or by using shared global variables to which some method for concurrent access is implemented, e.g. with mutual exclusion.
However, reentrancy achieved with a mutex is not enough for use in signal handlers: a signal can interrupt the very thread that currently holds the malloc lock, and any allocation attempt from the handler would then deadlock. Therefore I do not think that anyone has bothered to implement a signal-safe malloc, as this is likely to be complicated.
Allocating memory in a signal handler makes no sense in a well designed program, so not being allowed to use malloc and related functions is not a problem.
The problem is that applications sometimes need to set environment variables which will be read by libraries in the same process. This is safe to do during startup, but at no later times.
Ideally all libraries which use environment variables should have APIs allowing you to override the env variables without calling setenv(), but that isn't always the case.
I’d argue that libraries shouldn’t read environment variables at all. They’re passed on the initial program stack and look just like stack vars, so the issue here is essentially the same as taking the address of a stack variable and misusing it.
Just like a library wouldn’t try to use argv directly, it shouldn’t use envp either (even if done via getenv/setenv)
> The problem is that applications sometimes need to set environment variables which will be read by libraries in the same process. This is safe to do during startup, but at no later times.
No, the problem is that libraries try to do this at all. Libraries should just have those APIs you mention, and not touch env vars, period. If you, the library user, really want to use env vars for those settings, you can getenv() them yourself and pass them to the library's APIs.
Obviously we can't change history; there are libraries that do this anyway. But we should encourage library authors to (in the future) pretend that env vars don't exist.
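Something like this shape, say (my sketch; ssl_probe_set_cert_file is an invented name for the kind of explicit setter meant here):

    /* The library exposes a setter; the application alone decides
       whether an env var feeds it. No setenv() anywhere. */
    #include <stdlib.h>

    void ssl_probe_set_cert_file(const char *path);   /* library-provided */

    void app_init(void) {
        const char *p = getenv("SSL_CERT_FILE");      /* the app's choice */
        if (p != NULL)
            ssl_probe_set_cert_file(p);
    }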
The place where it makes sense for a library to read environment variables is where the program is not written to use that specific library. For example, I can link a program whose author has never heard of TCMalloc against TCMalloc rather than the system malloc, and then configure TCMalloc via environment variables. This does not require modifying a single line of code, while manually forwarding configuration onto the allocator would. Another common example is configuring sanitizers. Not having to do anything other than pass another command-line switch to the compiler is one of the things that makes them really painless to use.
I do think you'd be hard-pressed to find a situation where a program calling setenv() to configure a library actually makes sense. It's a pretty strong sign that someone made a bad decision. People will, however, make mistakes in API design.
If env vars don't exist, that makes it much harder (and more likely impossible) for users to modify library/application behavior at run time.
I agree with you that it would be much better if, when libA needs to set behavior Foo in libB, it called libB:setBehavior(Foo) rather than setenv("LibBehavior", "Foo").
But let's not throw the baby out with the bathwater.
Go ahead and write lots of mutable global statics. But when your program crashes randomly, you need my help to debug it, and the culprit is, once again, a mutable global, then you have to perform a walk of shame.
the problem is not linux, not mutable global state or resources and not libc.
the problem is not getting time at work to do things properly. like spotting this in GDB before the issue hit, because your boss gave you time to tirelessly debug and reverse your code and anything it touches....
there is too much money in halfbaked code. sad but true.
It definitely is the current libc. That one's proven by systems which do not have the same problem. Then the next layer problem is trying to pretend we can get everyone to pay attention and avoid bugs in code instead of forcing interfaces and implementations where those bugs are not possible.
Rust literally bakes data race safety into the language. While it does not resolve general race conditions, thread safety issues which cause memory unsafety (which an UAF or dangling pointer would be) are very much within its remit.
The major takeaway from this is that Rust will be making environment setters unsafe in the next edition. With luck, this will filter down into crates that trigger these crashes (https://github.com/alexcrichton/openssl-probe/issues/30 filed upstream in the meantime).
But that won't actually fix the underlying problem, namely that getenv and setenv (or unsetenv, probably) cannot safely be called from different threads.
It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.
I have a different perspective: the underlying problem is calling setenv(). As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv. It's not a mechanism for exchanging information within a process, as used here with SSL_CERT_FILE.
And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
And, lastly, compatibility with ancient systems strikes again: the environment is also accessible through this:
Which is, of course, best described as bullshit.Environment variables are a gigantic, decades-old hack that nobody should be using... but instead everyone has rejected file-based configuration management and everyone is abusing environment variables to inject config into "immutable" docker containers...
Indeed, environment variables should be used to configure child processes, not to configure the current process, for non-shell programs, IMHO.
Note that Java, and the JVM, doesn't allow changing environment variables. It was the right choice, even if painful at times.
I think there's a narrow window, at least in some programming languages, when environment variables can be set at the start of a process. But since it's global shared state, it needs to be write (0,1) and read many. No libraries should set them. No frameworks should set them, only application authors and it should be dead obvious to the entire team what the last responsible moment is to write an environment variable.
I am fairly certain that somewhere inside the polyhedron that satisfies those constraints, is a large subset that could be statically analyzed and proven sound. But I'm less certain if Rust could express it cleanly.
Sure is painful (mostly when writing tests where the environment variables aren't abstracted in some way).
But I think it was actually possible to hack around up until Java 17.
> As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv.
Mutating argv is actually quite popular, or at least it used to be.
Mutating argv is fine for how it is usually done. That is, to permute the arguments in a getopt() call so that all nonoptions are at the end.
It is fine because it is usually done during the initialization phase, before starting any other thread. setenv() can be used here too, though I prefer to avoid doing that in any case. I also prefer not to touch argv, but since that's how GNU getopt() works, I just go with it.
Once the program is running and has started its threads, I consider setenv() is a big no no. The Rust documentation agrees with me: "In multi-threaded programs on other operating systems, the only safe option is to not use set_var or remove_var at all.". Note: here, "other operating systems" means "not Windows".
Yes, and if there were "setargv()" or "getargv()" functions, they'd have the same issues ;) … but argv is a function parameter to main()¹, and only that.
¹ or technically whatever your ELF entry point is, _start in crt0 or your poison of choice.
> but argv is a function parameter to main()¹, and only that.
> ¹ or technically whatever your ELF entry point is, _start in crt0 or your poison of choice.
Once you include the footnote, at least on linux/macos (not sure about Windows), you could take the same perspective with regards to envp and the auxiliary array. It's libc that decided to store a pointer to these before calling your `main`, not the abi. At the time of the ELF entry point these are all effectively stack local variables.
I mean, yes, we're in "violent agreement" there. It's nice that libc squirrels away a copy and gives you a `getenv()` function with a string lookup, but… setenv… that was just a horrible idea. It's not really wrong to view it as a tool that allows you to muck around with main()'s local variables. Which to me sounds like one should take a shower after using it ;D
(Ed.: the man page should say "you are required to take a shower after writing code that uses setenv(), both to get off the dirt, but also to give you time to think about what you are doing" :D)
Oops, didn't mean to come across as disagreeing at all, more of a "yes, and <once you include the footnote>".
Ah, after rereading I think I accidentally read that in, sorry
No amount of locking can make the getenv API thread-safe, because it returns a pointer which gets invalidated by setenv, but lacks a way to release ownership over it and unblock setenv safely (or to free a returned copy).
So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.
C could provide functions to lock/unlock a mutex and require that any attempt to access the environment has to be done holding the mutex. This would still leave the correctness in the hands of the user, but at least it would provide a standard API to secure the environment in a multi threaded application that library and application developers could adopt.
The underlying problem is that setenv is mutable global state and should never have existed
The process's current directory is mutable global state as well, and yet chdir(2) is thread-safe.
chdir is thread-safe, but interacting with the current directory in any context other than parsing command-line arguments is still nearly always a mistake. Everything past a program's entry point should be working exclusively in absolute paths.
Yeah if you chdir() in a multithreaded program, all cwd-relative file accesses in other threads are fucked.
As well as absolute paths, it’s ok to work with descriptor-relative paths using openat() and friends.
Welcome to the C standard library, the application of mutable global state to literally everything in it has to be the most consistent and predictable feature of the language standard.
It's the same problem with global vars, but at a machine scope. The real solution here would be for the OS to have a better interface to read and write env vars, more like a file where you have to get rw permission (whether that's implemented as a mutex or what).
This is neither an OS nor a machine scope problem. The environment is provided by the OS at startup. What the process does with it from there on is its own concern.
> The environment is provided by the OS at startup.
That's part of the design of the OS. How the OS implements this is primitive, and so it leaves it up to every language to handle. The blog mentions the issue is with getenv, setenv, and realloc, all system calls. To me, that sounds like bad OS design is causing issues downstream with languages, leaving it up to individual programmers to deal with the fallout.
> getenv, setenv, and realloc, all system calls
None of these 3 functions is a system call. open(), mmap(), sbrk(), poll(), etc. are system calls. What you're referring to is C library API, which as Go has shown (both to its benefit and its detriment) is optional on almost all operating systems (a major exception being OpenBSD.)
If you really want to lose some sanity I would recommend reading the man page for getauxval(), and then look up how that works on the machine level when the process is started. Especially on some of the older architectures. (No liability accepted for any grey hair induced by this.)
ed.: https://lwn.net/Articles/631631/
Neither getenv, setenv nor realloc are system calls, they all are functions from C stdandard library, some parts of which for historical reasons are required to be almost impossible to use safely/reliably.
People get trained to ignore the ____UNSAFE_payattention__nevermindthatthisappears50timesinthisfile___ blocks and prefixes
This also shows up in web frameworks where Vue has the v-html directive and react has dangerouslySetInnerHTML. Vue definitely has it better.
In the React world, the only times I've seen dangerouslySetInnerHTML consistently used is for outputting string literal CSS content (and this one is increasingly rare as build tools need less handholding), string literal JSON content (for JSON+LD), and string literal premade scripts (i.e. pixel tags from the marketing content). That's not to say there's no danger surface there, but it's not broadly used as a tool outside of code that's either really bad or really exhaustively hand-tuned.
Code syntax highlighting libraries for react use dangerouslySetInnerHTML.
I've only really seen dangerouslySetInnerHTML used while transitioning from certain kinds of server side rendering to React. There is still lots of really old internal tools in ancient html out there.
React doesn't have a tag and attribute sanitizer built in, so having non-js-programmers edit JSX isn't especially safe anyways, as an img or a href could exfiltrate data. If it were they could just block out an innerHTML attribute. A js programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
> A js programmer can get around it by setting up a ref and then using the reference to set innerHTML without the word dangerously appearing.
If DOM nodes during the next render differ from what react-dom expects (i.e. the DOM nodes from the previous render), then react-dom may throw a DOMException. Mutating innerHTML via a ref may violate React's invariants, and the library correctly throws an error when programmers, browser extensions, etc. mutate the DOM such that a node's parent unexpectedly changes.
There are workarounds[1] to mutate DOM nodes managed by React and avoid DOMExceptions, but I haven't worked on a codebase where anything like this was necessary.
[1] https://github.com/facebook/react/issues/11538#issuecomment-...
The reference is used to operate on the subtree when wrapping libraries like CodeMirror https://github.com/uiwjs/react-codemirror/blob/master/core/s... React leaves it alone if the children doesn't change.
innerHTML is useful when there is a trusted HTML source, which is becoming more popular with stuff like HTMX and FastHTML.
In the Rust std, `set_var` and `remove_var` will correctly require using an `unsafe {}` block in the next edition (2024). The documentation does now mention the safety issue but obviously it was a mistake to make these functions safe originally (albeit a mistake even higher level languages have made).
https://doc.rust-lang.org/stable/std/env/fn.set_var.html
There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified but C still allows direct access to the environ so it can't be completely safe in the face of modification https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
Why requiring unsafe when the std implementation could take care of the synchronisation?
Because the std implementation can not force synchronisation on the libc, so any call into a C library which uses getenv will break... which is exactly what happened in TFA: `openssl-probe` called env::set_var on the Rust side, and the Python interpreter called getenv(3) directly.
But the standard implementation could copy the environment at startup, and only uses its copy.
And the library's use of setenv is clearly a bug as setenv is documented to be not threadsafe in the C standard library. So that would take care of that problem.
If you clone the environment at startup, then you get a situation where code in the same binary can see different values depending if it uses libc or Rust's std. It's also no longer the same environment as in the process metadata.
Using a copy by default may have worked if it was designed as such before Rust 1.0, but Rust took the decision to expose the real environment and changing this now would be more disruptive than marking mutations as unsafe.
Is it possible to skip libc completely or would this introduce too many portability concerns?
In general, no, because of FFI. In special circumstances, yes, but this isn't really important because the libc implementation is trivial (on all platforms that matter, envp is a char** to strings formatted as KEY=VALUE, set_env(key, value) is equivalent to allocating a new KEY=VALUE string and finding the index of a key if it exists or appending to the array).
Under the hood the pointer is initialized by the loader, in a special place in executable memory. Most of the time, the loader gets the initial environment variable list by looking at argv* (try reading past the end of the null separator, you'll find the initial environment variables).
It would be possible for a language to hack it such that on load they initialize their own env var set without using libc and be able to safely set/get those env vars without going through libc, and to inherit them when spawning child processes by reading the special location instead of the standard location initialized by your platforms' loader/updated by libc. But how useful is a language with FFI that's fundamentally broken since callees can't set environment variables? (probably very useful, since software that relies on this is questionably designed in the first place)
If you wanted to make a bullet proof solution, you would specify the location of an envp mutex in the loaders' format and make it libc's (or any language runtime) problem to acquire that mutex.
* there are platforms where this isn't true
It's not just libc, it's any C or C++ library that calls getenv or setenv.
Specifically, any C or C++ library that calls setenv (despite documentation that says that setenv is not threadsafe).
It can only synchronize if everything using is Rust's functions. But that's not a given. People can use C libraries (especially libc) which won't be aware of Rust's locks. Or they could even use a high level runtime with its own locking but then they'll be distinct from Rust's locks.
The only way to coordinate locking would be to do so in libc itself.
libc does do locking, but it's insufficient. The semantics of getenv/setenv/putenv just aren't safe for multi-threaded mutation, period, because the addresses are exposed. It's not really even a C language issue; were you to design a thread-safe env API, for C or Rust, it would look much different, likely relying on string copying even on reads rather than passing strings by reference (reference counted immutable strings would work, too, but is probably too heavy handed), and definitely not exposing the environ array.
The closest libc can get to MT safety is to never deallocate an environment string or an environ array. Solaris does this--if you continually add new variables with setenv it just leaks environ array memory, or if you continually overwrite a key it just leaks the old value. (IIRC, glibc is halfway there.) But even then it still requires the application to abstain from doing crazy stuff, like modifying the strings you get back from getenv. NetBSD tried adding safer interfaces, like getenv_r, but it's ultimately insufficient to meaningfully address the problem.
The right answer for safe, portable programs is to not mutate the environment once you go multi-threaded, or even better just treat process environment as immutable once you enter your main loop or otherwise finish with initial process setup. glibc could (and maybe should) fully adopt the Solaris solution (currently, IIRC, glibc leaks env strings but not environ arrays), but if applications are using the environment variable table as a global, shared, mutable key-value store, then leaking memory probably isn't what they want, either. Either way, the best solution is to stop treating it as mutable.
A safe API would look a lot like Windows' GetEnvironmentVariable and SetEnvironmentVariable
https://learn.microsoft.com/en-us/windows/win32/api/winbase/...
https://learn.microsoft.com/en-us/windows/win32/api/winbase/...
Yep. GetEnvironmentStrings and FreeEnvironmentStrings are probably even more noteworthy as they seem to substitute for an exposed environ array, though they push more effort to the application.
It can't ensure synchronization because any code using libc could bypass the sync wrapper. In particular, Rust lets you link C libs which wouldn't use the Rust stdlib.
Because it can still race with C code using the standard library. getenv calls are common in C libraries; the call to getenv in this post was inside of strerror.
you've gotten a lot of answers which say the same thing, but which I don't think answer your question:
synchronization methods impose various complexity and performance penalties, and single threaded applications which don't need that would pay those penalties and get no benefit.
Unix was designed around a lightweight ethos that allowed simple combining of functions by the user on the command line. See "worse is better", but tl;dr that way of doing things proved better, and that's why you find yourself confronting what it doesn't do.
The real problem is that getenv() and setenv() were created before threads were really a thing.
Well it was better in the short term but is worse in the long term. In particular, the error handling situation is generally atrocious, which is fine for interactive/sysadmin use but much worse for serious production use.
Even if C stdlib maintainers are resistant to making setenv multi-thread safe, at a minimum there should be a new thread-safe alternative API, whether defined within POSIX or established as a de facto standard that POSIX is forced to adopt over time. If the effort spent explaining why nothing could be done had instead been spent fixing the problem, a new thread-safe API could have replaced the old setenv, which could then have been deprecated and removed from many software projects.
I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.
The biggest problem is not the absence of a thread safe API, it's the existence of this:
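    extern char **environ;   /* the POSIX-specified global array of "NAME=value" strings */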
As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary. If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread-safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread-safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
> aka don't let the perfect be the enemy of the good
Exactly my point. Over time *environ would disappear, at least from the major software projects that everyone uses (assuming it's even in use in them in the first place).
Guess that would also require some locking for all the exec() functions that don't take the environment as a parameter or that search PATH for the executable.
I'm not convinced by you that you know more than the experts who have determined there is no backwards-compatible way to fix this.
I'll take existence proofs [1] over personal insults but YMMV. You also may want to be careful assuming the expertise of people on this forum. Some people here are quite technical.
[1] https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...
That isn't thread safe, it's safER.
I am also quite technical, thanks.
It's like a rite of passage to be hit by an environment related bug on linux, which is mysteriously less a problem on other unix's. Which is sorta funny given how pragmatic Linus and the kernel are about fixing POSIX bugs by making them not happen, while glibc is still lagging here decades after people tried to at least make the problem better.

Sure there is all the crap around TZ/etc, but simply providing getenv_r() and synchronizing it with setenv(), and warning during compile/link on getenv(), would have killed much of the problem. Never mind actually doing a COW-style system where the env pointer(s) are read-only.

Instead the problem is pushed to the individual application, which is a huge mistake, because application writers are rarely aware of what their dependencies are doing. Which is the situation I found myself in many many years ago. The closed-source library vendor, at the time, told us to stop using that toy unix clone (linux).
> environment related bug on linux, which is mysteriously less a problem on other unix's.
How do you figure? The problem isn't the implementation, it's the API. setenv(), unsetenv(), putenv(), and especially environ, are inherently unsafe in a multithreaded program. Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer. Sure, getenv_r() fixes the case where you get something back from getenv(), and then another thread calls setenv() and makes that memory invalid, but there's no way to protect against the other calls breaking the API.
There are ways to mitigate some of the issues, like having libc hold a mutex when inside getenv()/setenv()/putenv()/unsetenv(), but there's still no way for libc to guarantee that something returned by getenv() remains valid long enough for the calling code to use it (which, right, can be fixed by getenv_r(), which could also be protected by that mutex). But there's no good way to make direct access to environ safe. I suppose you could make environ a thread-local, but then different threads' views of the environment could become out of sync, permanently (and you could get different results between calling getenv_r() and examining environ directly).
Back-compat here is just really hard to do. Even adding a mutex to protect those functions could change the semantics enough to break existing programs. (Arguably they're already broken in that case, but still...)
Why does adding a mutex break the API? I guess it breaks `char **environ`. But the API itself wouldn't be broken.
I think you would have to change the API to return a copy of the string as the getenv result, which the caller is responsible for freeing; or the env implementation would have to ensure that values returned from getenv are stable and never change, which is effectively a memory leak.
> Even getenv_r() can't really save you, since another thread may be calling setenv() while the (old) value of an env var is being copied into the provided buffer.
Won't that depend on the libc implementation? For example, maybe setenv writes to another buffer, then swaps pointers atomically; wouldn't that work?
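Roughly the idea, reduced to a single variable (hypothetical names; note it only stays safe if old values are never freed, since a reader may still hold the previous pointer):

    #include <stdatomic.h>
    #include <stdlib.h>
    #include <string.h>

    static _Atomic(char *) value;     /* one variable's value; old strings are never freed */

    void swap_set(const char *s) {
        char *copy = strdup(s);       /* build the replacement off to the side */
        atomic_store(&value, copy);   /* publish with a single atomic pointer store */
    }

    const char *swap_get(void) {
        return atomic_load(&value);   /* readers see the old or new string, never a torn one */
    }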
Previously on setenv being a terrible thing: https://www.evanjones.ca/setenv-is-not-thread-safe.html (discussion: https://news.ycombinator.com/item?id=38342642 first comment is even about it causing issues in Rust)
Yes. That's known.
Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.
Why is anyone using "setenv" anyway?
Yup, it's mostly just the story and tools we used to get ourselves out of a mess that was made harder by some decisions made earlier -- the tests were running in a container with stripped symbols (we're going to ship symbols after this, no reason to over-optimize), our custom test runner failed to report process death (an oversight).
There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
Oversights all around and hopefully this continues to improve.
> Why is anyone using "setenv" anyway?
Because it’s there and it looks like a good idea until it takes one of your fingers.
It really does not look like a good idea to setenv(). The very notion is quite terrifying. Messing with a bunch of globals that other code knows about as well? Nuh-uh.
The thing is, the OP people weren't doing that at all, it was some irresponsible library maintainers. If your code does that, you have to include something like the "surgeon general's warning" everywhere: "CAREFUL: USING THIS LIBRARY MAY CAUSE TERMINAL CRASHES".
It's OpenSSL. It's basically a sea urchin turned into code in terms of safe handling.
This reminded me of that whole "12-factor app" movement, which several of my former coworkers had really bought into. One of the "factors" is that apps should be configured by environment variables.
I always thought this was kinda foolish: your configuration method is a flat-namespace basket of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.
Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.
I have similar reservations about env vars. I dislike how they can be read from anywhere--it interrupts the ability to reason about a function's behavior from its signature, and makes plenty of functions impure that could otherwise have been pure.
If there were a language feature that let me mark apps such that during any process env vars are not writable and are readable only once (together, in a batch, not once per var), I'd use it everywhere.
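Even without language support, you can approximate that discipline by snapshotting the environment once, before any threads start, and reading only the snapshot afterwards. A minimal sketch, assuming nothing else mutates environ during startup:

    #include <stdlib.h>
    #include <string.h>

    extern char **environ;
    static char **env_snapshot;   /* treated as immutable once threads exist */

    /* Call at the top of main(), before spawning any threads. */
    void snapshot_env(void) {
        size_t n = 0;
        while (environ[n])
            n++;
        env_snapshot = malloc((n + 1) * sizeof *env_snapshot);
        if (!env_snapshot)
            abort();
        for (size_t i = 0; i < n; i++)
            env_snapshot[i] = strdup(environ[i]);
        env_snapshot[n] = NULL;
    }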
getenv() is perfectly fine; it's setenv() that is the problem. Which, in theory, this scheme wouldn't be using, since the env would be set up prior to starting that mystical app.
But yes, a flat namespace, with string values, shared as a free-for-all with who knows what libraries and modules you're loading… that's not a good idea even if it didn't have safety issues in setenv().
Great article about digging into a non-obvious bug. This one had it all! Intermittent bug, architecture-specific, hidden in a dependency, rust, the python GIL, gettext. Fantastic stuff.
These kinds of detailed troubleshooting reports are the closest thing you can get to having to do it yourself. Thanks to the authors. It's easy to say "don't use X duh" until a dependency relies on it, and how were you supposed to know?
What is the rationale for libc not making setenv/getenv thread safe? It does seem rather odd given how environment variables are explicitly defined as shared between threads in the same process!
It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread local buffer). The coordination between getenv and setenv could be very lightweight - spinlock vs mutex.
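Something like this sketch of what that could look like (tls_getenv is a made-up name; it assumes every mutation in the process takes the same lock):

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

    const char *tls_getenv(const char *name) {
        static _Thread_local char buf[4096];   /* each thread gets its own result buffer */
        const char *out = NULL;
        pthread_mutex_lock(&env_lock);
        const char *v = getenv(name);
        if (v && strlen(v) < sizeof buf) {
            strcpy(buf, v);                    /* copy out while the lock is held */
            out = buf;
        }
        pthread_mutex_unlock(&env_lock);
        return out;
    }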
I wonder why it is so hard for Rust to implement its own safe stdlib independent of C.
How exactly would that help in this situation?
If both Rust and C have independent standard libraries loaded into the same process, each would have an independent set of environment variables. So setting a variable from Rust wouldn't make it visible to the C code, which would break the article's usecase of configuring OpenSSL.
The only real solution is to have the operating system provide a thread-safe way of managing environment variables. Windows does so; but in Linux that's the job of libc, which refuses to provide thread-safety.
The crash in the article happened when Python called C's getenv. Rust could very well throw away libc, but then it would also be throwing away its great C interop story. Rust can't force Python to use its own stdlib instead of libc.
They did, it's called core. But it assumes no operating system at all, and environment variables require an operating system.
> and environment variables require an operating system
Is that true? It's just a process global string -> string map, that can be pre-loaded with values before the process starts, with a copy of the current state being passed to any sub-process. This could be trivially implemented with batch processing/supervisory programs.
Sure, there's a broader concept here, which doesn't require any operating system. But any alternate string->string map you define won't answer to C code calling getenv, won't be passed to child processes created with fork, won't be visible through /proc/$PID/environ, etc.
This is the context:
> They did, it's called core. But it assumes no operating system at all, and environment variables require an operating system.
I think there's some confusion here. The C standard library is an abstraction layer that exists to implement standard behavior on hardware. It's entirely unrelated to the existence of an OS. Things like "/proc/$PID/environ" have nothing to do with C.
There are many standard libraries, for embedded, that implement these things, like getenv, on bare metal [1].
Standard C libraries exist to implement functionality. The standard does not define how to implement that functionality. That's the whole point of C: it's an abstraction that imposes very few requirements.
The implementation of environment variables doesn't require an OS. If they added this to core, they could trivially implement the concept.
[1] https://en.wikipedia.org/wiki/Newlib [2] getenv: https://sourceware.org/newlib/libc.html
Well, it's used by the OS when exec-ing a new process, but at least the Linux syscall for that takes the environment as an explicit parameter. So it could be managed in whatever way by the runtime until execve() is called.
Environment variables are not just technical, they're social. You need to get everyone on board with your scheme.
Linux is an unusual platform in that it allows you to call into it via assembly. Most other platforms require you to go through libc to do so. It's not really in Rust's hands.
This is not unusual at all. Windows allowed it for years before Linux came along. It was also true of some other *nix systems - IIRC, Ultrix (DEC) allowed this, and so did Dynix (Sequent).
*BSD allows it too, or at least did as of 2022.
What is unusual about Linux is that it guarantees a syscall ABI, meaning that if you follow it, you can make a system call "portably" across "any" version of Linux.
Sure, I’m speaking about platforms that are relevant today, not historical ones. Windows, MacOS, {Free,Open,Net}BSD, Solaris, illumos, none of these do.
It's quite easy to find out the actual situation on this since Go decided to do it their way. Last I checked, OpenBSD is the only OS where they go through libc, but I haven't really kept up.
In my understanding, Go initially disregarded various platforms' rules here, and have ended up walking it back. I could be wrong though.
It's hard to find good details here, but here's a mailing list thread from 2019 mentioning libc usage: https://groups.google.com/g/golang-nuts/c/uX8eUeyuuAY/m/Cfhl...
> On Solaris (and Windows), and more recently in macOS as well we link with libc (or equivalent).
> Go used to do raw system calls on macOS, and binaries were occasionally broken by kernel updates. Now Go uses libc on macOS.
Yep, in 2022 it finally started using libc on *BSD too.
But ... there's a difference between being able to do direct syscalls via asm, and them being portable across kernel versions, which is what this subthread was about.
Granted, most people want version portability, but still on a technical level, it's not the same thing.
No, my comment was about what APIs a platform considers to be their stable, external API. That you can technically call them anyway (except for ones like OpenBSD that actively check and prevent you) doesn't mean you're not doing something unsupported.
It would be a tremendous amount of work, and would take years. Meanwhile, the problems are avoidable. It's not exactly the "rust way" to just remember and avoid problems, but everything in language design is compromises.
"Impossibru!!"
https://github.com/sunfishcode/eyra
Oh look:
> Why use Eyra? It fixes Rust's set_var unsoundness issue. The environment-variable implementation leaks memory internally (it is optional, but enabled by default), so setenv etc. are thread-safe.
That only works on Linux though right?
That's quite a trade-off
I think glibc made the same trade-off. It makes sense for most types of programs, but there's certainly a lot of classes of programs that wouldn't take it.
What is? Leaking memory? It's going to be a few kB at absolute most. Not an issue unless you are doing something very weird.
Couldn't we have a better pattern for this?
It is weird that I got this right before Rust did.
Because I use structured concurrency, I can make it so every thread has its own environment stack. To add a variable, I duplicate the current environment, add the new variable, and push the new environment on the stack.
Then I can use code blocks to delimit where that stack should be popped. [1]
This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]
[1]: https://git.yzena.com/Yzena/Yc/src/branch/master/tests/yao/e...
[2]: https://gavinhoward.com/2024/09/rewriting-rust-a-response/#g...
[3]: https://gavinhoward.com/2024/05/what-rust-got-wrong-on-forma...
This isn't _really_ a Rust problem. Rust is a victim of POSIX.
If you have C FFI interop in Yao, there's still a chance that two C libraries cause a crash without your code even being involved.
Except if there is dynamic linking, I can use that to inject my own setenv and getenv, just like people inject jemalloc or other malloc alternatives.
We ended up overriding and replacing with our own thread-safe version years ago when we also hit this.
We had so many of these issues that we ended up LD_PRELOAD-ing patched getenv / setenv / putenv implementations.
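A stripped-down version of such a shim might look like this (a sketch, not the actual code; putenv and unsetenv would need the same treatment, and the lazy dlsym initialization is itself slightly racy):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <pthread.h>

    static pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;

    char *getenv(const char *name) {
        static char *(*real_getenv)(const char *);
        if (!real_getenv)
            real_getenv = (char *(*)(const char *))dlsym(RTLD_NEXT, "getenv");
        pthread_mutex_lock(&env_mutex);
        char *v = real_getenv(name);
        pthread_mutex_unlock(&env_mutex);
        return v;   /* note: the pointer can still go stale once we unlock */
    }

    int setenv(const char *name, const char *value, int overwrite) {
        static int (*real_setenv)(const char *, const char *, int);
        if (!real_setenv)
            real_setenv = (int (*)(const char *, const char *, int))dlsym(RTLD_NEXT, "setenv");
        pthread_mutex_lock(&env_mutex);
        int rc = real_setenv(name, value, overwrite);
        pthread_mutex_unlock(&env_mutex);
        return rc;
    }

Build it as a shared object (cc -shared -fPIC envshim.c -o envshim.so -ldl) and run with LD_PRELOAD=./envshim.so. It narrows the race rather than eliminating it, since returned pointers can still be invalidated after the lock is released.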
With a fixed implementation that leaks environments (like the one that just landed in glibc)?
Yet another person is burned by calling setenv() in a multi-threaded context. There really needs to be a big warning banner on the manpage for setenv() that warns about this because it seems like a far more common problem than you would expect.
The man page says:
> POSIX.1 does not require setenv() or unsetenv() to be reentrant.
A non-reentrant function cannot be thread safe.
In general (for POSIX, libc, and many other libraries): if the docs do not explicitly say "this function is thread-safe", it is not.
It's time to move beyond this attitude and make things safe by default. For example, Solaris has a safer version of setenv().
"It is ridiculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem." https://www.evanjones.ca/setenv-is-not-thread-safe.html
One of the major differences between X Window and the win32 GUI APIs is that the windows one builds in thread safety, and it cannot be removed. This means that you pay the price of mutexes and the like (what the windows world likes to call "critical sections"), even if you have a single threaded GUI. X Window, on the other hand, decided to do nothing about threads at all, leaving it up to the application.
30 years after these decisions were made, most sensible people do single threaded GUIs anyway (that is, all calls to the windowing API come from a single thread, and all redraws occur synchronously with respect to that thread; this does not block the use of threads functioning as workers on behalf of the GUI, but they are not allowed to make windowing API calls themselves).
Consequently, the overhead present in the win32 API is basically just dead-weight, there to make sure that "things are safe by default".
There's a design lesson here for everyone, though precisely what it is will likely still be argued about.
Yet 30 years later people are calling setenv()/getenv() from different threads even though "it is known" that it crashes. For whatever reason the lesson from GUIs doesn't apply here.
Judging from a lot of the comments in this thread, the idea that there could even be parts of the *POSIX API* that are not thread-safe seems like one that hasn't even occurred to a lot of (younger?) programmers ...
You can't.
You could wrap setenv in a mutex, but that's not good enough. It can still be called from different processes, which means you'd need to do a more expensive and complex syncing system to make it safe.
That balloons out to other env-related functions needing to honor the synchronization primitive in order for there to be a semblance of safety.
However, you still end up in a scenario where you can call
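    setenv("MODE", "fast", 1);           /* hypothetical variable and value */
    const char *mode = getenv("MODE");   /* may no longer be "fast" by the time this runs */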
and that would be incorrect because between the set and the get, even with mutexes properly in place and coordinated amongst different applications, you have a race condition where your set can be overwritten by another application's set before your get can run. Now, instead of actually making these functions safe, you've buried the fact that external processes (or your own threads) can mess with env state.

The solution is to stop using env as some sort of global variable and instead treat it as a constant when the application starts. Using setenv should be mostly discouraged because of these issues.
How does an external process mess with env state? As far as I know, you pass the environment when doing the execvpe() and then you cannot touch it from outside of the process anymore.
You're correct. Parent comment is inaccurate. The problem is that a different library in the same process can use getenv without locking (or without locking the same lock as your code)
Of course you can. Mutexes are system objects, so it's not a huge problem to sync across processes, if you really have to (is it really expected that one process can set env vars inside another process?).
Making global state, especially state that has no reason to be modified or even read very often like the env, thread safe is a trivial issue, well studied and understood. Could an intern do it? Probably not. Could literally any maintainer of a standard C library? Easily.
This is much more of a culture problem preventing such obvious flaws from being recognized as such.
Side-note: your set-then-get example is a theoretical problem in search of a use case. Why would you ever want to concurrently set an env var and expect to be guaranteed to read that same value? And even if this is a real thing that applications really use, exposing a new function to sync anything on the env mutex is, again, trivial. So, if you really needed that, you could do
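    lockenv();                           /* hypothetical function over the env mutex */
    setenv("MODE", "fast", 1);
    const char *mode = getenv("MODE");   /* guaranteed to still be "fast" while locked */
    unlockenv();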
And problem solved.

That doesn't solve anything. You could be using a library (perhaps a closed-source one) that doesn't use these hypothetical lockenv()/unlockenv() functions.
This needs to be fixed inside libc, but there's no way to do so completely without breaking backward-compatibility.
That is a technical solution. What is your solution to the much more serious social problem of adding this check to every codebase in existence? What points of leverage do you have?
You didn't read the link, did you?
I am not sure making things safe by default is a good idea. This always comes with a cost. That's also the reason why basic data types (arrays, dictionaries, etc.) are generally not thread-safe: it's usually not needed, or it's handled at a much higher level.
It's a different story for languages/environments that are supposed to be safe by default and where you have language features that ensure safety (actors, optionals, etc.), but not for something like libc, which has a standard it has to conform to and like 100 years of history.
The problem with `setenv` is that people expect one process to have one set of environment variables, which is shared across multiple languages running in that process. This implies every language must let its environment variables be managed by a central language-independent library -- and on POSIX systems, that's libc. So if libc refuses to provide thread-safety, that impacts not just C, but all possible languages (except for those that cannot call into C-libraries; as those don't need to bother synchronizing the environment with libc).
It's not just that "libc refuses to provide thread-safety" ... the POSIX standard specifies that these functions are non-reentrant.
In some cases this is true. In the case of setting and getting env vars, it is not. There is no conceivable reason to write a process that spends any significant portion of its runtime calling setenv() or getenv(). Even if those calls were a thousand times slower than today, it would still be a non-issue.
> A non-reentrant function cannot be thread safe.
Actually, a non-reentrant function can be thread-safe. A common example of such a function in libc is malloc().
By definition, a "reentrant function" is a function that may be invoked even when it has not returned yet from a previous invocation.
So a non-reentrant function is a function that may not be invoked again between a previous invocation and returning from that invocation.
When a function may be invoked from different threads, then it is certain that sometimes it will be invoked by a thread before returning from a previous invocation from a different thread.
Therefore any function that may be invoked from different threads must be reentrant. Otherwise the behavior of the program is unpredictable. Reentrant functions may be required even in single-thread programs, when they may be invoked recursively, or they may be invoked by signal handlers.
An implementation of "malloc" may be reentrant or it may be non-reentrant.
Old "malloc" implementations were usually non-reentrant because they used global variables for managing the heap. Such "malloc" functions could not be used in multi-threaded programs.
Modern "malloc" implementations are reentrant, either by using only thread-local storage or by using shared global variables to which some method for concurrent access is implemented, e.g. with mutual exclusion.
Who has a signal safe malloc?
POSIX does not require malloc to be signal safe.
Therefore I do not think that anyone has bothered to implement a signal-safe malloc, as this is likely to be complicated.
Allocating memory in a signal handler makes no sense in a well designed program, so not being allowed to use malloc and related functions is not a problem.
I could be wrong, but isn't that because each thread has its own heap?
Funny enough, the Rust wrapper `std::env::set_var` does have a big warning https://doc.rust-lang.org/std/env/fn.set_var.html
Looks like that Safety section was added in 1.76.0. It'll be an even bigger warning in the future since it's now going to be unsafe in Rust 2024
Sounds like you just didn't know it's not threadsafe. This is common knowledge in the C and C++ world.
A function which sets global process state is not thread safe? Why, I'm shocked; shocked and chagrined.
But really, I don't understand why a sensitive security-related library would implicitly use an unsafe function like setenv().
> A function which sets global process state is not thread safe? Why, I'm shocked; shocked and chagrined.
This is an oversimplification. Windows has essentially the exact same API and it works just fine in multithreaded contexts.
The issue here is unix allows the underlying pointer to be accessed, bypassing any possible thread-safe APIs.
Mutable global state is evil. Friends don’t let friends use mutable global state.
I hate envvars. It’s “the Linux way”. I avoid them like the plague. A++ strong recommend.
libc is terrible. The world needs to move on.
Env vars are good if you treat them as read-only within the process
Yeah, setenv should probably just not exist, and environment variables should be only set when spawning new processes.
The problem is that applications sometimes need to set environment variables which will be read by libraries in the same process. This is safe to do during startup, but at no later times.
Ideally all libraries which use environment variables should have APIs allowing you to override the env variables without calling setenv(), but that isn't always the case.
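A sketch of what such an API could look like (tls_config/tls_init are made-up names, loosely modeled on the SSL_CERT_FILE case from the article):

    #include <stdlib.h>

    /* Hypothetical library interface: configuration is an explicit parameter
       instead of the library calling getenv("SSL_CERT_FILE") itself. */
    struct tls_config {
        const char *cert_file;
    };

    static int tls_init(const struct tls_config *cfg) {
        (void)cfg;   /* a real library would load cfg->cert_file here */
        return 0;
    }

    int main(void) {
        /* The application reads the env var itself, once, during single-threaded startup. */
        struct tls_config cfg = { .cert_file = getenv("SSL_CERT_FILE") };
        return tls_init(&cfg);
    }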
I’d argue that libraries shouldn’t read environment variables at all. They’re passed on the initial program stack and look just like stack vars, so the issue here is essentially the same as taking the address of a stack variable and misusing it.
Just like a library wouldn’t try to use argv directly, it shouldn’t use envp either (even if done via getenv/setenv)
> The problem is that applications sometimes need to set environment variables which will be read by libraries in the same process. This is safe to do during startup, but at no later times.
No, the problem is that libraries try to do this at all. Libraries should just have those APIs you mention, and not touch env vars, period. If you, the library user, really want to use env vars for those settings, you can getenv() them yourself and pass them to the library's APIs.
Obviously we can't change history; there are libraries that do this anyway. But we should encourage library authors to (in the future) pretend that env vars don't exist.
The place where it makes sense for a library to read environment variables is where the program is not written to use that specific library. For example, I can link a program whose author has never heard of TCMalloc against TCMalloc rather than the system malloc, and then configure TCMalloc via environment variables. This does not require modifying a single line of code, while manually forwarding configuration onto the allocator would. Another common example is configuring sanitizers. Not having to do anything other than pass another command-line switch to the compiler is one of the things that makes them really painless to use.
I do think you'd be hard-pressed to find a situation where a program calling setenv() to configure a library actually makes sense. It's a pretty strong sign that someone made a bad decision. People will, however, make mistakes in API design.
If env vars don't exist, that makes it much harder (and more likely impossible) for users to modify library/application behavior at run time.
I agree with you that it would be much better if, when libA needs to set behavior Foo in libB, it called libB::setBehavior(Foo) rather than setenv("LibBehavior", "Foo").
But let's not throw the baby out with the bathwater.
Yeah, the cows have certainly gotten out already.
I’ll take a config file over an envvar 100% of the time.
> Mutable global state is evil. Friends don’t let friends use mutable global state.
Throw away your CPU and RAM then.
Your CPU has an MMU in order to (among other things) let the OS prevent mutable global state.
And disks. And the cloud. Or basically, you know, computers.
Don't threaten me with a good time.
The universe, you mean.
Ah yes, the cloud where we all happily share compute resources without any restrictions to avoid stomping on each others toes.
I can not possibly roll my eyes hard enough.
Go ahead and write lots of mutable global statics. But when your program crashes randomly and you need my help to debug and it is, once again, a global mutable then you have to perform a walk of shame.
What do you suggest as an alternative?
The problem is not Linux, not mutable global state or resources, and not libc.

The problem is not getting time at work to do things properly, like spotting this in GDB before the issue hit, because your boss gave you time to tirelessly debug and reverse your code and anything it touches...

There is too much money in half-baked code. Sad but true.
It definitely is the current libc. That one's proven by systems which do not have the same problem. Then the next layer problem is trying to pretend we can get everyone to pay attention and avoid bugs in code instead of forcing interfaces and implementations where those bugs are not possible.
libc moved the world into the Information Age
In the same way that Yersinia pestis moved the world into the Renaissance?
Yes, neither were memory or thread safe
What's your preferred alternative?
Don’t use a mouse or a monitor then.
The whole point of Rust is memory safety, not thread safety...
Rust literally bakes data-race safety into the language. While it does not resolve general race conditions, thread-safety issues that cause memory unsafety (which a UAF or dangling pointer would be) are very much within its remit.