Strange. Unless the original comment from Gerald has been edited since I responded, I must have misread most of it, as I thought it was making a different point (i.e., "could someone explain how misalignment could happen"). I was tired and distracted when I read it, so that wouldn't be surprising. However, the final paragraph of the comment (which I originally thought was reflected in the rest of it) still seems out of place and arrogant.
The 'final paragraph' was simply noting that when you make AI risks concrete (not an abstract machine that is overwhelmingly smarter than human intelligence and randomly aligned, but a real machine that humans have to train and run on their own computers), numerous technical mitigation methods become obvious. The one I was alluding to was (1).
Sparsity and myopia are general alignment strategies, and they also happen to be established software engineering practices. Many of the alignment enthusiasts on lesswrong have rediscovered software architectures that already exist. Not just exist, but are core to software systems ranging from avionics software to web hyperscalers.
(1) https://www.lesswrong.com/posts/C8XTFtiA5xtje6957/deception-i-ain-t-got-time-for-that
Sparsity maps onto test-driven development (TDD).
Myopia maps onto stateless microservices.
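To make the analogy concrete, here is a minimal sketch (all names are illustrative, not from any real system): a "myopic" component is one whose output depends only on its current inputs, exactly like a stateless microservice handler that retains no memory between requests, while TDD-style tests pin its behavior to explicitly specified cases.

```python
# Hypothetical sketch: "myopia" as statelessness. The handler's output
# depends only on the current request; it has no globals and no retained
# memory, so it cannot accumulate state or pursue goals across calls.

def model_score(features: list[float]) -> float:
    # Stand-in for a model call; deterministic given its inputs.
    return sum(features) / (len(features) or 1)

def handle_request(request: dict) -> dict:
    """Pure, stateless request handler."""
    score = model_score(request["features"])
    return {"approve": score > 0.5, "score": score}

# "Sparsity" as TDD: the system's licensed behavior is pinned down
# case-by-case by tests rather than left open-ended.
def test_handle_request():
    out = handle_request({"features": [1.0, 0.0]})
    assert out == {"approve": False, "score": 0.5}
```

Because each call is independent, the service can be scaled, restarted, or audited per-request, which is precisely why hyperscalers favor this architecture.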