The problem is that mainstream programming languages are incomplete: they have no simple way to specify whether a set of statements must be executed in sequence, exactly in the order in which they appear in the source text, or may be executed in any order, concurrently.
Because of this syntax defect, the compiler must guess when it may execute parts of the program concurrently. Very frequently it is impossible for the compiler to decide whether changing the order of execution is valid, so it gives up and does not parallelize the execution.
There are many programming languages, and extensions for traditional languages such as OpenMP or CUDA, that remove this limitation. With them, parallelization is deterministic, instead of being unpredictable and easily broken by any minor edit of the program source, as it is in mainstream languages.
Instead of relying on a compiler to figure out when an abstract series of operations exactly matches the single operation of a SIMD instruction, I'd rather the language support operations that more closely match SIMD instructions.
Rather than having to recognize a loop that acts on each memory location in an array, a language that supports applying an operation to a whole array can compile it to SIMD instructions much more easily.
The problem is that compilers are really bad at automatically adding SIMD instructions. We need better, smarter compilers that abstract this out.
I am finding the approach to intrinsics in .NET to be compelling. For example, a Vector<T> type is specifically handled by the JIT:
https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/...
https://learn.microsoft.com/en-us/dotnet/api/system.numerics...