Before knowing about binfmt, I always wondered how wine is able to execute .exe files directly, i.e. ./prog.exe instead of wine ./prog.exe. Turns out the wine package (at least on Arch) comes with a handler for them, and the Arch wiki mentions that you may want to remove it for security reasons.
It can also be used to automatically execute jar files with "java -jar". I don't think Arch is set up to do that automatically, but it is fairly easy to do[1].
[1]: https://wiki.archlinux.org/title/Binfmt_misc_for_Java
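To make the jar case concrete, here is a minimal sketch of the kind of rule that would be written to /proc/sys/fs/binfmt_misc/register. The rule name `jar-demo` and the wrapper path `/usr/local/bin/jarwrapper` are hypothetical. A wrapper is assumed because the kernel only hands the matched file to the interpreter as an argument, so something else has to supply `-jar`. The sketch only builds the string; it does not register anything.

```python
def extension_rule(name: str, extension: str, interpreter: str, flags: str = "") -> str:
    """Build a binfmt_misc rule that matches files by extension (type E)."""
    for field in (name, extension, interpreter, flags):
        if ":" in field:
            raise ValueError(f"field may not contain the ':' delimiter: {field!r}")
    # Layout is :name:type:offset:magic:mask:interpreter:flags. For type E the
    # "magic" slot holds the extension without its dot; offset and mask stay empty.
    return f":{name}:E::{extension.lstrip('.')}::{interpreter}:{flags}"


# Hypothetical wrapper script that would run: java -jar "$@"
rule = extension_rule("jar-demo", ".jar", "/usr/local/bin/jarwrapper")
print(rule)
```

Writing that string to the register file as root is what would activate the handler.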
binfmt can also be used to register qemu for binaries for foreign architectures. This allows running programs compiled for another architecture, and makes it really simple to run podman/docker containers with images for other architectures.
The qemu and container case is a little interesting. If, for example, /usr/bin/qemu-aarch64 (the user-mode emulator, not qemu-system-aarch64) or similar is registered as a binfmt_misc handler for AArch64 ELF binaries, the kernel will execute qemu for AArch64 ELF binaries.
But inside a container (with its own mount namespace) or inside a chroot, the qemu binaries do not necessarily exist. The binfmt_misc handler will still work in this case because of two features.
1. The kernel will open the qemu binary in the original mount namespace when the binfmt_misc handler is registered with the F flag (Fix binary), so the kernel will always have an open file reference to the qemu binary independent of mount namespace.
2. Distributions (at least Debian) ship statically linked qemu binaries so that qemu does not need to load any shared libraries inside the target namespace/chroot.
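As a rough illustration of what such a registration looks like, the sketch below derives a magic/mask pair for AArch64 from the ELF header layout and formats it as a rule carrying the F flag. The rule name `qemu-aarch64-demo` and the interpreter path are assumptions, and the sketch only prints the string rather than writing it to the register file.

```python
import struct

EM_AARCH64 = 183  # e_machine value assigned to AArch64 in the ELF specification


def elf_magic_and_mask(e_machine: int) -> tuple[bytes, bytes]:
    """Return (magic, mask) matching 64-bit little-endian ELF files for one machine."""
    # e_ident: "\x7fELF", class=2 (64-bit), data=1 (little-endian), version=1,
    # the OS ABI byte, then 8 bytes of ABI version and padding.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    # e_type=2 (ET_EXEC) followed by e_machine, both 16-bit little-endian.
    magic = ident + struct.pack("<HH", 2, e_machine)
    # Ignore the OS ABI byte, and mask the low bit of e_type so that
    # ET_DYN (3, position-independent executables) matches as well.
    mask = b"\xff" * 7 + b"\x00" + b"\xff" * 8 + struct.pack("<HH", 0xFFFE, 0xFFFF)
    return magic, mask


def escape(data: bytes) -> str:
    """Spell every byte as a \\xNN escape, which the register file accepts."""
    return "".join(f"\\x{b:02x}" for b in data)


def magic_rule(name: str, e_machine: int, interpreter: str, flags: str = "F") -> str:
    """Format a type M rule: :name:M:offset:magic:mask:interpreter:flags."""
    magic, mask = elf_magic_and_mask(e_machine)
    return f":{name}:M::{escape(magic)}:{escape(mask)}:{interpreter}:{flags}"


# Hypothetical name and path; "F" asks the kernel to open the interpreter at
# registration time, which is what keeps it usable inside other mount namespaces.
rule = magic_rule("qemu-aarch64-demo", EM_AARCH64, "/usr/bin/qemu-aarch64-static")
print(rule)
```

In practice distributions ship these rules ready-made, so hand-building one is mostly useful for understanding what the packaged configuration does.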
It also lets you chroot into, e.g., Raspberry Pi SD cards.
WSL also uses binfmt so that you can run Windows executables from inside whatever distro you have running. I thought that was pretty neat.
Why can you register interpreters as non-root and why do these custom interpreters take precedence?
EDIT: Checked on my dated Ubuntu laptop: /proc/sys/fs/binfmt_misc/register is owned by root:root with mode --w-------. That is an important detail that the article omits, and it changes this "vulnerability".
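The same check can be scripted. This is a small sketch using only the standard library; it assumes binfmt_misc is mounted at the usual /proc/sys/fs/binfmt_misc location and simply reports when it is not.

```python
import os
import stat


def describe_access(path: str) -> tuple[str, int, int]:
    """Return (ls-style mode string, owner uid, owner gid) for path."""
    st = os.stat(path)
    return stat.filemode(st.st_mode), st.st_uid, st.st_gid


REGISTER = "/proc/sys/fs/binfmt_misc/register"  # assumed default mount point

if os.path.exists(REGISTER):
    mode, uid, gid = describe_access(REGISTER)
    # A write-only file owned by uid 0 means only root can add handlers.
    print(f"{REGISTER}: mode={mode} uid={uid} gid={gid}")
else:
    print("binfmt_misc does not appear to be mounted here")
```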
> provides a nifty way (once the attacker has gained root rights on the machine) to create a little backdoor to regain root access when the original access no longer works
so it does imply it needs root rights
but it's an example of why it's a bad idea to "cleanup" a system from a virus without a full reinstall
it also matters for other reasons, as some ways to gain root are unreliable and don't persist across a reboot, and you don't want to hide that you have root access
> but it's an example of why it's a bad idea to "cleanup" a system from a virus without a full reinstall
This x1000.
You can't. This is a classic example of an "other side of this airtight hatchway"[1] problem.
[1]: https://devblogs.microsoft.com/oldnewthing/20060508-22/?p=31...
To be fair, the article is not claiming that binfmt_misc is a security vulnerability, or at least I didn't come away with that impression (and the word "vulnerability" doesn't appear in the page either.) It's just being pointed out that you can use it as a pretty sneaky way to leave yourself a backdoor, which I think it is, among many.
Even so, this is a fairly weak persistence primitive. It requires root access, isn't available in containers, can be checked for in a single location, and doesn't survive a reboot.
> It requires root access
If it didn't require root access... it would be a privilege escalation. I don't think that counts as a strike against it.
> isn't available in containers
Well, you can't apply it inside of (unprivileged) containers, but I think it does at least work as a backdoor inside of containers.
> can be checked for in a single location
Almost all of them can if you know where to look, though? The point here is that nobody checks for this. If I got pwned I would just light the box on fire and start anew but if I had no choice but to try to clean it up I would never guess about binfmt_misc as a way to regain root. It could go undetected for quite a long time, even if the original problem is patched, which could potentially happen without the administrators realizing the box was compromised.
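For anyone who does want to look, here is a hedged sketch of what checking that single location could involve: list every registered handler with its interpreter and flags. It assumes the usual mount point and the line-oriented entry format (a status line followed by `interpreter`, `flags:`, `offset`, `magic` and similar fields). Treating the `C` (credentials) flag as worth a second look is my own assumption about what a root backdoor would rely on, not something stated in the thread.

```python
from pathlib import Path


def parse_entry(text: str) -> dict:
    """Parse one handler file into a dict of its fields."""
    lines = text.splitlines()
    entry = {"enabled": bool(lines) and lines[0].strip() == "enabled"}
    for line in lines[1:]:
        key, _, value = line.strip().partition(" ")
        if key:
            entry[key.rstrip(":")] = value.strip()
    return entry


def list_handlers(root: str = "/proc/sys/fs/binfmt_misc") -> dict:
    """Return {handler name: parsed entry} for every handler under root."""
    handlers = {}
    base = Path(root)
    if not base.is_dir():
        return handlers
    for path in sorted(base.iterdir()):
        # "register" and "status" are control files, not handlers.
        if path.name in ("register", "status") or not path.is_file():
            continue
        try:
            handlers[path.name] = parse_entry(path.read_text())
        except OSError:
            continue
    return handlers


for name, entry in list_handlers().items():
    note = "  <- uses C (credentials) flag" if "C" in entry.get("flags", "") else ""
    print(f"{name}: {entry.get('interpreter', '?')} flags={entry.get('flags', '')}{note}")
```

An unexpected interpreter path, such as one under /tmp or a home directory, would be the obvious thing to investigate further.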
> and doesn't survive a reboot.
Both this and the comment about containers make me think you're thinking of modern infrastructure where you use containers and mostly-immutable or actually-immutable OS images, but I think this sort of mechanism is pretty squarely aimed at old-school pets-not-cattle infrastructure. I'd love to say all of my infrastructure is "modern" but sometimes modern infrastructure is just a bit overkill, so while I still would just burn everything down, I do have some infrastructure that is "oldschool". In this case, the threat of a reboot is pretty minuscule. Here, I will demonstrate from a real live server:
$ uptime
23:19:03 up 133 days 8:27, 1 user, load average: 0.53, 0.51, 0.49
Of course, I'm not gloating. I've had uptimes counted in years in the past, and I'm sure there are plenty of people here with more impressive uptimes (and probably a lot more unpatched vulnerabilities, lol.)
And the reason the uptime is so high is that the server is relatively important but there is no redundancy, so any updates have to be done as online as possible. In my case it's a matter of reducing costs.
If a box gets pwned I feel like you just need to reformat; and in my case I can, because I have backups and a way to reprovision everything again from scratch. I am going to guess, though, that there's literally tons of infrastructure out there where they don't have adequate backups or a way to reprovision the OS image from scratch.
It's not omitted by the article, the threat model is stated explicitly:
> TL;DR: binfmt_misc provides a nifty way (once the attacker has gained root rights on the machine) to create a little backdoor to regain root access when the original access no longer works.
Traditionally I've seen these adapters primarily used to pass binaries for other architectures to QEMU and similar.
Years ago on FreeBSD I created a "Volkswagen mode" by using the similar `imgact_binmisc` kernel module to register a handler for binaries with the system's native ELF headers. It took a bit of hacking to make it all work with the native architecture, but when it was done, the handler would simply execute the binary, drop its return code, and return 0 instead - effectively making the system think that every command was "successful".
The system failed to boot when I finally got it all working (which was expected), but it was a fun adventure to do something so pointless and silly.
It would be a similarly clever place to maintain persistence and transparently inject bytecode or do other rude things on FreeBSD as well.
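The core of such a handler is tiny. The sketch below is not the commenter's implementation, just an illustration of the "run it, then lie about the result" idea in Python. It deliberately ignores the hard part mentioned above: keeping the handler from matching itself or the real binary when it re-executes it.

```python
import subprocess
import sys


def run_and_report_success(argv: list[str]) -> int:
    """Run argv, discard its real exit status, and always report 0."""
    subprocess.run(argv, check=False)
    return 0


# Demo: the child exits with status 3, but the wrapper still reports 0.
status = run_and_report_success([sys.executable, "-c", "raise SystemExit(3)"])
print("reported status:", status)
```

A real handler script would finish with something like `sys.exit(run_and_report_success(sys.argv[1:]))`, since the kernel passes the matched binary and its arguments on the handler's command line.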
Yup, using this approach it's possible to build/use aarch64 containers on an x86 machine. This technique means that a much smaller set of operations is being emulated (it doesn't have to emulate the entire kernel etc.).
For something I was building, it enabled me to get a full aarch64 compilation done, with a native toolkit, without having to run a full emulation layer. The time savings of doing it this way vs full emulation were huge. Off the top of my head, emulated it was taking over an hour to do the full build, whereas within a container it was only about 10-15 minutes.
> effectively making the system think that every command was "successful"
I can only imagine the havoc this would wreak on shell scripts that call out to the test/[/[[ binaries on a system.
nit: while test and [ are binaries, [[ is a bash keyword.
Another nit, while test and [ are indeed binaries, they are also bash built-ins (for performance, presumably) so bash won’t exec them normally.
True! And for those curious, you can enable or disable this shadowing per command, like so: [...]
You can also use [...] to override builtins once.
Ah, you're right of course. Thank goodness for shellcheck keeping my .sh scripts compatible.
https://search.nixos.org/options?show=boot.binfmt.emulatedSy...
Set this one-line setting on a NixOS system, and it can run foreign binaries. Magic.
I used to run SCO Xenix and Unix binaries on Linux via iBCS. That worked by registering a binfmt-something-else, not -misc, because it didn't load an interpreter like qemu or wine; the kernel ran the binary directly, so the binfmt was something like -sysv or -ibcs2 or something. Not for real / production, just for fun. I got it going, but no situation ever arose that wasn't better solved some other way. And good thing, because I don't think that has worked for many years.
One cool usage of binfmt_misc is multi-platform builds in Docker (through QEMU), although it can be painfully slow.
I wonder if the compiler could still be a native binary (but still producing code for the target architecture). The motivation is to have the performance of a cross compile with the simplicity of a native build. I had that idea a long time ago but never tried it.
For languages that support cross-compiling, e.g. Golang, the Docker docs recommend cross-compiling natively and copying the resulting binary to the various platform images. binfmt_misc with QEMU is needed for languages that don't support that, or when you want to run a binary from the base image. For example, if you're building an x86 Docker image on ARM and you run `RUN apt install` in the Dockerfile, you're essentially running an x86 ELF on ARM, and that's where QEMU/binfmt_misc step in.
binfmt_misc helped me out a lot some years ago.
I had a build system which was able to cross compile.
And a test system which wasn't able to handle cross compiled/emulated/remote code but needed to run tests on cross compiled code.
In the end, with binfmt the test system never knew it was running the code with qemu instead of natively and "just worked".
Sounds like a useful trick for getting a coding agent to run/test/debug cross compiling rules.
Another reason I compile my own kernels and disable features like this. I also disable loadable kernel modules. Of course this makes standard support channels... Difficult.