Suppose someone is an ethical realist: the One True Morality is out there, somewhere, for us to discover. Is it likely that AGI will be able to reason its way to finding it?
What are the best examples of AI behavior we have seen where a model does something “unreasonable” to further its goals? Hallucinating citations?