Note about GitHub Windows Actions runners: I think I understand what is wrong with them, though it's somewhat conjecture since I don't actually know how they work internally.
It looks like the free CI runners have the C: drive pointing to a disk that is restored from a snapshot, but oftentimes the snapshot hasn't finished restoring by the time your workflow runs, so I/O can be very slow, even if you don't need to read from the still-frozen parts of the disk. Some software run inside workflows does heavy reads and writes on the C: drive, but it's better to move anything that will be written to disk, e.g. caches, to D: if possible. This often leads to much better I/O performance and more predictable runtimes, particularly when there isn't a lot of actual compute to do.
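For anyone who wants to try the "put caches on D:" advice, here is a minimal Python sketch of how a build script might choose a cache location. The function name `pick_cache_root`, the `ci-cache` subdirectory, and the default candidate `D:\` are assumptions for illustration, not anything the runner images define; it simply falls back to the system temp directory when no candidate is usable.

```python
import os
import tempfile
from pathlib import Path


def pick_cache_root(candidates=("D:\\",), subdir="ci-cache"):
    """Return a cache directory under the first usable candidate root.

    `candidates` and `subdir` are hypothetical defaults; adjust them for
    your own runner. Falls back to the system temp directory.
    """
    for root in candidates:
        root_path = Path(root)
        # Only accept roots that exist and that this process may write to.
        if root_path.is_dir() and os.access(root_path, os.W_OK):
            cache = root_path / subdir
            cache.mkdir(parents=True, exist_ok=True)
            return cache
    fallback = Path(tempfile.gettempdir()) / subdir
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback


if __name__ == "__main__":
    print(pick_cache_root())
```

The returned path can then be handed to whatever cache-location setting your tool supports, so the heavy writes land on the faster drive when it exists.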
That would make sense, as that is an optimization that applies to AzDO Pipelines too, and is documented there. Because it is documented there, most third-party extensions do the right thing and try to write to the right/write drive. It would make sense that the agent/container tech on Windows would be similar between AzDO Pipelines and GH Actions. It also makes sense that AzDO Pipelines, being "Windows-first" for a long time, would have better documentation and third-party awareness of the optimizations, whereas Windows use on GitHub Actions is much rarer and awareness is less evenly spread.
TIL. I noticed that lots of GitHub Actions add-ons use the D: drive as intended, but I actually cannot find the documentation that covers this for Azure DevOps Pipelines. If you happen to remember where this is documented, could you link it? It would be great to have an authoritative source for this information. I've searched pretty heavily for any information about this and struggled in the past, though it could just be Google being how Google is lately. I can't even find official documentation mentioning the D: drive beyond the variable definitions that point to it.
The closest I've ever found to a real acknowledgement is this issue relating to GitHub Actions: https://github.com/actions/runner-images/issues/8755
In case you don't see my comment on the parent, it's an Azure VM thing: https://news.ycombinator.com/item?id=42788912
Look up the difference between Dv5 and Ddv5 VMs, for instance, or anything talking about Azure VM temp disks for more info.
This isn't a GitHub runner issue, it's an Azure VM issue. They use Azure VM SKUs with temp disks, and Windows VMs on those SKUs by default spin up with C on a remote-backed, persistent file share and D on the local temp disk. The remote-backed file share OS disk is absolutely slow, especially if they're using Standard HDD or Standard SSD disks. You can spin up your own VMs that use Premium OS disks for slightly better performance, but if you are doing anything serious on an Azure VM, you should use the temp drive (which, by the way, mounts to /mnt on Linux) or ephemeral OS disks.
(And also this is all for v5 and earlier SKUs and changes slightly for v6 SKUs, but whatever.)
Comments like these are why I come to HN :)
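If you'd rather measure than take this on faith, a rough way to compare the OS disk against the temp disk is to time a synced sequential write in each location. This is only a sketch: the function name `time_sequential_write` and the candidate paths are assumptions, and a single small write like this is a crude signal, not a proper disk benchmark.

```python
import os
import tempfile
import time


def time_sequential_write(directory, size_mb=16, chunk_mb=1):
    """Write size_mb of random data to a temp file in `directory`,
    fsync it, and return the elapsed seconds. The file is removed after."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, name = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as fh:
            for _ in range(size_mb // chunk_mb):
                fh.write(chunk)
            fh.flush()
            os.fsync(fh.fileno())  # force the data out of the OS cache
        return time.perf_counter() - start
    finally:
        os.remove(name)


if __name__ == "__main__":
    size_mb = 16
    # Candidate locations are assumptions; adjust for your runner or VM.
    for candidate in ("C:\\", "D:\\", "/mnt", tempfile.gettempdir()):
        if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
            try:
                seconds = time_sequential_write(candidate, size_mb=size_mb)
            except OSError as exc:
                print(f"{candidate}: skipped ({exc})")
                continue
            print(f"{candidate}: {size_mb / seconds:.1f} MB/s")
```

Running it a few times at the start of a job should show whether the two drives really differ on the machine you were given.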
There's also the gotcha that a few years ago Microsoft decided that Defender can no longer be turned off or even uninstalled. Even if you install a third-party antimalware product, both will always run. Similarly, folder exclusion rules just stop it reporting viruses it finds, but it'll still scan them and even report them back to Microsoft's cloud services whether you like it or not.
This is so obnoxiously difficult to work around that Microsoft themselves couldn't do it for the Windows runner images used by both GitHub Actions and Azure DevOps. As a consequence, their performance tanked 4x and stayed there.
We're also seeing massive performance regressions for apps that have many small files, such as SQL Server Analysis Services. Basic operations such as backup, restore, or sync are 10x slower with no recourse.
Similarly, the IntelliJ IDEs have a feature to disable A/V scanning on your source code folders for performance. This now does nothing to improve performance.
This is so bad that Microsoft's own developers had to "hack in a workaround" called Dev Drive into Windows 11 so that they could get their work done despite the best efforts of their own company to slow them down. (This isn't included in Windows Server 2022, and hence isn't available for use by the GitHub Agent runner images.)
See:
https://github.com/actions/runner-images/issues/7320
and
https://github.com/actions/runner-images/issues/8380
I love the various associated PRs and commits futilely twiddling the A/V settings to no avail.
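For the many-small-files case specifically, a sketch like the following can show how much per-file overhead a given folder carries. It cannot tell you that Defender is the cause; it only times creating and re-reading a batch of tiny files, which is the access pattern described above. The function name `time_small_files` and the default counts are assumptions for illustration.

```python
import os
import tempfile
import time


def time_small_files(directory, count=2000, size=1024):
    """Create `count` files of `size` bytes under `directory`, read them
    back, and return the elapsed seconds. Everything is cleaned up after."""
    payload = b"x" * size
    with tempfile.TemporaryDirectory(dir=directory) as work:
        start = time.perf_counter()
        for i in range(count):
            with open(os.path.join(work, f"f{i}.bin"), "wb") as fh:
                fh.write(payload)
        for i in range(count):
            with open(os.path.join(work, f"f{i}.bin"), "rb") as fh:
                fh.read()
        elapsed = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    count = 2000
    seconds = time_small_files(tempfile.gettempdir(), count=count)
    print(f"{count} files: {seconds:.2f}s ({seconds / count * 1000:.3f} ms per file)")
```

Comparing the per-file figure between a Windows runner and a Linux runner, or between two folders on the same machine, gives a rough idea of how much of the slowdown is per-file overhead rather than raw throughput.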
Oh ffs, is that what it is? Wow.
Tale as old as time; Song as old as rhyme; Beauty of the suite