“having strong or intimate connections with employees of Open Philanthropy greatly enhances the chances of having funding, and it seems almost necessary”
Is this a well-identified phenomenon (in the causal inference sense) ?
Consider the following directed acyclic graph:
Connected with OpenPhil employees ----------> Gets funding from OpenPhil
^ ^
| |
| |
| |
| |
+------ Works on alignment ------+
One explanation for this correlation you identify is that being connected with OpenPhil employees leads to people getting funding from OpenPhil (as demonstrated by the horizontal arrow). However, another explanation is that working on alignment causes one to connect with others who are interested in the problem of AI alignment, as well as getting funding from a philanthropic organisation which funds work on AI alignment (as demonstrated by the vertical arrows).
These two explanations are observationally equivalent, in the absence of exogenous variation with respect to how connected one is to OpenPhil employees. Since claiming that it is “almost necessary” to have “strong or intimate connections with employees of Open Philanthropy” to get funding implies wrongdoing from OpenPhil, I’d be interested in evidence which would demonstrate this!
These don’t seem very compelling to me.
This argument proves too much. The same could be said of “go and donate your money, this (list of charities we think are most effective) is the way to do it”.
My takeaway was that messages which could be spread include: “we should worry about conflict between misaligned AI and all humans”, “AIs could behave deceptively, so evidence of safety might be misleading, “AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems”, “alignment research is prosocial and great” and “we’re not ready for this”. (I excluded “it might be important for companies and other institutions to act in unusual ways”, because I agree this doesn’t seem like a straightforward message to spread).
The answer is probably (a).
“Disproportionate” seems like it boils down to an object-level disagreement about relative cause prioritisation between AI safety and other causes.