intermediate programs (interpreters, compilers, assemblers) are used to translate human programming languages into increasingly repetitive and specific languages until they become hardware-readable machine code. This translation is typically done through strict, unambiguous rules, which is good from an organizational and cleanliness perspective, but often results in code which consumes orders of magnitude more low-level instructions (and consequently, time) than if they were hand-translated by a human. This problem is amplified when those compilers do not understand that they are optimizing for machine learning: compilation protocols optimized to render graphics, or worse for CPUs, are far slower.
This is at best an imperfect description of how compilers work. I’m not sure what you mean by “repetitive”, but yeah, the purpose is to translate high-level languages to machine code. However:
Hardware does not care about code organization and cleanliness, nor does the compiler. When designing a compiler/hardware stack the principal metrics are correctness and performance. (Performance is very important, but in relative terms is a distant second to correctness.)
The number of instructions in a program, assembly or otherwise, is not equivalent to runtime. As a trivial example, “while(1)” is a short program with infinite runtime. Some optimizations such as loop unrolling increase instruction count while reducing runtime.
Such optimizations are trivial for a compiler, and tricky but possible for a human to get right.
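To make the instruction-count point concrete, here's a rough sketch of manual loop unrolling. It uses Python, with CPython bytecode standing in for machine instructions; that stand-in is my assumption and not how a real native compiler works. The function names are made up for illustration. The unrolled version has more static instructions but goes around the loop roughly a quarter as many times. Whether that is faster on a given machine or interpreter is something you'd have to measure.

```python
import dis


def sum_rolled(xs):
    """Straightforward loop: one element handled per iteration."""
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total


def sum_unrolled(xs):
    """Same sum, manually unrolled by 4: more code, fewer loop iterations."""
    total = 0
    n = len(xs)
    main = n - (n % 4)  # largest multiple of 4 that fits
    for i in range(0, main, 4):
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
    for i in range(main, n):  # leftover elements
        total += xs[i]
    return total


def instruction_count(fn):
    """Static count of bytecode instructions in fn (not a runtime measure)."""
    return sum(1 for _ in dis.get_instructions(fn))


if __name__ == "__main__":
    data = list(range(10))
    print(sum_rolled(data), sum_unrolled(data))
    print(instruction_count(sum_rolled), instruction_count(sum_unrolled))
```

The static count says nothing on its own about how many instructions actually execute, which is the number that relates to runtime.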
“often results in code which consumes orders of magnitude more low-level instructions”: not sure what this means. Compilers are pretty efficient. You can play around with source code and see the actual assembly pretty easily (e.g. Godbolt is good for this). There’s no significant section of dead code being produced in the common case.
(Of course the raw number of instructions increases from C or whatever language; this is simply how RISC-like assembly works. “int C = A + B;” turns into “Load A. Load B. Add A and B. Allocate C on the stack. Write the computed value to C’s memory location.”)
Humans can sometimes beat the compiler (particularly for tight loops), but compilers in 2023 are really good. I think the senior/junior engineer vs compiler example is wrong. I would say (for a modest loop or critical function): the senior engineer (who has much more experience and knowledge of which tools, metrics, and techniques to use) can gain a modest improvement by spending significant time. The junior engineer would probably spend even more time for only a slight improvement.
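As a rough illustration of the “int C = A + B;” expansion mentioned above, here's a toy lowering function in Python. The mnemonics, register names, and the function itself are hypothetical and don't correspond to any real ISA or compiler. It also skips stack allocation and just treats each variable name as a memory location.

```python
def lower_add_assign(dest, lhs, rhs):
    """Lower `dest = lhs + rhs` to hypothetical load/store pseudo-assembly."""
    return [
        f"load  r1, [{lhs}]",   # fetch first operand from memory
        f"load  r2, [{rhs}]",   # fetch second operand from memory
        "add   r3, r1, r2",     # arithmetic happens only in registers
        f"store [{dest}], r3",  # write the result back to memory
    ]


if __name__ == "__main__":
    for line in lower_add_assign("C", "A", "B"):
        print(line)
```

One source statement becomes four instructions here, but none of them is wasted work. That's the sense in which the expansion is normal and not a sign of an inefficient compiler.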
“This problem is amplified when those compilers do not understand that they are optimizing for machine learning”: Compilers never know the purpose of the code they are optimizing; as you say, they are following rule-based optimizations based on various forms of analysis. In LLVM this basically means analysis passes which produce data for optimization passes. For something like PyTorch, “compilation” means PyTorch is analyzing the operation graph you created and mapping it to kernel operations which can be performed on your GPU.
“compilation protocols optimized to render graphics, or worse for CPUs, are far slower”: I don’t understand what you mean by this. What is a compilation protocol for graphics? Can you explain in terms of common compiler/ML tools? (E.g. LLVM MLIR, PyTorch, CUDA?)
I honestly don’t understand how the power plant/flashlight analogy corresponds to compilers. Are you saying this maps to something like LLVM analysis and optimization passes? If so, this is wrong; running multiple passes with different optimizations increases performance. Running multiple optimization passes was historically (i.e. circa early 2000s) hard for compilers to do, but (LLVM author) Chris Lattner’s key idea was to perform all the optimizations on a simple intermediate layer of code (IR) before lowering to machine code.
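Here's a minimal sketch of what I mean by passes over an IR, in Python. Everything in it (the tuple-based IR, the op names, the pseudo-assembly) is a toy I made up for illustration and is not LLVM IR or LLVM's pass API. It assumes straight-line code where each name is assigned once. An analysis pass computes which names are live, a transform pass uses that data to delete dead instructions, and constant folding runs first so more code becomes dead.

```python
# Toy three-address IR: each instruction is (dest, op, args).
PROGRAM = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),       # foldable: 2 + 3
    ("d", "mul", ("c", "x")),       # x is a runtime input
    ("unused", "mul", ("a", "a")),  # result is never read
    (None, "ret", ("d",)),
]


def constant_fold(ir):
    """Transform pass: evaluate add/mul whose operands are known constants."""
    known, out = {}, []
    for dest, op, args in ir:
        if op == "const":
            known[dest] = args[0]
        elif op in ("add", "mul") and all(a in known for a in args):
            x, y = (known[a] for a in args)
            value = x + y if op == "add" else x * y
            known[dest] = value
            op, args = "const", (value,)
        out.append((dest, op, args))
    return out


def live_names(ir):
    """Analysis pass: names that contribute to the returned value."""
    live = set()
    for dest, op, args in reversed(ir):
        if (op == "ret" or dest in live) and op != "const":
            live.update(args)
    return live


def eliminate_dead(ir, live):
    """Transform pass: drop instructions whose result is never used."""
    return [ins for ins in ir if ins[1] == "ret" or ins[0] in live]


def optimize(ir):
    """Run the passes in order on the IR, before any lowering."""
    folded = constant_fold(ir)
    return eliminate_dead(folded, live_names(folded))


def lower(ir):
    """Lower the IR to hypothetical pseudo-assembly."""
    asm = []
    for dest, op, args in ir:
        if op == "const":
            asm.append(f"li   {dest}, {args[0]}")
        elif op == "ret":
            asm.append(f"ret  {args[0]}")
        else:
            asm.append(f"{op}  {dest}, {args[0]}, {args[1]}")
    return asm


if __name__ == "__main__":
    print("\n".join(lower(optimize(PROGRAM))))
```

On this input, dead-code elimination alone removes one instruction and constant folding alone removes none, but running both leaves three of the original six. That's the sense in which stacking passes helps, not hurts.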