An independent researcher of ethics, AI safety, and AI impacts. LessWrong: https://www.lesswrong.com/users/roman-leventov. Twitter: https://twitter.com/leventov. E-mail: leventov.ru@gmail.com (the preferred mode of communication).
Roman Leventov
You can say exactly the same about Pause AI.
However, talented individuals who have invested in upskilling themselves to go do AIS research (e.g. SERI MATS graduates) are largely unable to secure research positions.
It would be interesting to see the actual numbers; I think Ryan Kidd should have them.
It’s hard to imagine a more general and capability-demanding activity than doing good (superhuman!) science in such an absurdly cross-disciplinary field as AI safety (and among the disciplines involved are several that are notoriously not very scientific yet: psychology, sociology, economics, the study of consciousness, ethics, etc.). So if there is an AI that can do that but still doesn’t count as AGI, I don’t know what the heck ‘AGI’ should even refer to. Compare this with chess, which is a very narrow problem that can be formally defined and doesn’t require the AI to operate with any science (or world models) whatsoever.
AI safety is a field concerned with preventing negative outcomes from AI systems and ensuring that AI is beneficial to humanity.
This is a poor definition of “AI safety” as a field, and it muddies the waters somewhat. I would say that AI safety is a particular R&D branch within the gamut of activity that strives to “prevent a negative result of the civilisational AI transition” (to which we can also add meta and proxy activities for this R&D field, such as AI safety fieldbuilding, education, outreach and marketing among students, grantmaking, and platform development such as what apartresearch.com is doing).
There are also other sorts of activity that strive for this more or less directly. Some of them are also R&D, such as governance R&D (cip.org) and R&D in cryptography, infosec, and internet decentralisation (trustoverip.org). Others are not R&D: good old activism and outreach to the general public (StopAI, PauseAI), good old governance (policy development, the UK Foundation Model Taskforce), and various “mitigation” or “differential development” projects and startups, such as Optic, Digital Gaia, Ought, social innovations (I don’t know of any good examples yet, though), and innovations in education and psychological training of people (again, no good examples that I know of yet). See more details and ideas in this comment.
It’s misleading to call this whole gamut of activities “AI safety”. “AI risk mitigation”, maybe. By the way, 80,000 Hours, despite properly calling the cause area “Preventing an AI-related catastrophe”, also suggests that the only two ways to apply one’s efforts to this cause are “technical AI safety research” and “governance research and implementation”, which is wrong, as I demonstrated above.
Somebody may ask: isn’t technical AI safety research a more direct and more effective way to tackle this cause area? I suspect it might not be, for people who don’t work at AGI labs. That is, I suspect that independent or academic AI safety research might be inefficient enough (at least for most people attempting it) that it would be more effective for them to apply themselves to the various other activities and “mitigation” or “differential development” projects of the kind described above. (I will publish a post detailing the reasoning behind this suspicion later, but for now this comment has the beginning of it.)
Note: this comment is cross-posted on LessWrong.
Classification of AI safety work
Here I proposed a systematic framework for classifying AI safety work. This is a matrix, where one dimension is the system level:
A monolithic AI system, e.g., a conversational LLM
An AGI lab (= the system that designs, manufactures, operates, and evolves monolithic AI systems and systems of AIs)
A cyborg, human + AI(s)
A system of AIs with emergent qualities (e.g., https://numer.ai/, but in the future, we may see more systems like this, operating on a larger scope, up to fully automatic AI economy; or a swarm of CoEms automating science)
A human+AI group, community, or society (scale-free consideration, supports arbitrary fractal nestedness): collective intelligence, e.g., The Collective Intelligence Project
The whole civilisation, e.g., Open Agency Architecture, or the Gaia network
Another dimension is the “time” of consideration:
Design time: research into how the corresponding system should be designed (engineered, organised), considering its functional properties (“capability”, quality of decisions), adversarial robustness (= misuse safety, memetic virus security), and security. For AGI labs: org design and charter.
Manufacturing and deployment time: research into how to create the desired designs of systems successfully and safely:
AI training and monitoring of training runs.
Offline alignment of AIs during (or after) training.
AI strategy (= research into how to transition into the desirable civilisational state = design).
Designing upskilling and educational programs for people to become cyborgs is also here (= designing efficient procedures for manufacturing cyborgs out of people and AIs).
Operations time: ongoing (online) alignment of systems on all levels to each other, ongoing monitoring, inspection, anomaly detection, and governance.
Evolutionary time: research into how the (evolutionary lineages of) systems at the given level evolve long-term:
How the human psyche evolves when it is in a cyborg
How humans will evolve over generations as cyborgs
How AI safety labs evolve into AGI capability labs :/
How groups, communities, and society evolve.
Designing feedback systems that don’t let systems “drift” into undesired states over evolutionary time.
Considering the system property of flexibility of values (i.e., the property opposite to value lock-in; Riedel (2021)).
IMO, it (sometimes) makes sense to think about this separately from alignment per se. Systems could be perfectly aligned with each other but drift into undesirable states and not even notice this if they don’t have proper feedback loops and procedures for reflection.
There are 6×4 = 24 slots in this matrix; almost all of them have something interesting to research and design, and none of them is “too early” to consider.
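To make the matrix concrete, here is a minimal Python sketch (the level/time labels and example placements are just my shorthand for the dimensions above, not an official taxonomy) that enumerates all 24 slots and shows how research directions, like those classified in the next section, can be slotted into them:

```python
from itertools import product

# Shorthand labels for the two dimensions described above (my own naming).
SYSTEM_LEVELS = [
    "monolithic AI system",
    "AGI lab",
    "cyborg (human + AIs)",
    "system of AIs",
    "human+AI group / society",
    "civilisation",
]
TIMES = ["design", "manufacturing/deployment", "operations", "evolutionary"]

# Every (level, time) pair is one slot of the matrix; 6 * 4 = 24 in total.
matrix = {slot: [] for slot in product(SYSTEM_LEVELS, TIMES)}

# Example placements, mirroring the classification in the next section.
matrix[("monolithic AI system", "manufacturing/deployment")].append("scalable oversight")
matrix[("civilisation", "design")].append("governance research")

print(len(matrix))  # -> 24
```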
Richard’s directions within the framework
Scalable oversight: (monolithic) AI system * manufacturing time
Mechanistic interpretability: (monolithic) AI system * manufacturing time, also design time (e.g., in the context of the research agenda of weaving together theories of cognition and cognitive development, ML, deep learning, and interpretability through the abstraction-grounding stack, interpretability plays the role of empirical/experimental science work)
Alignment theory: Richard phrases it vaguely, but his references, primarily to MIRI-style work, reveal that he mostly means “(monolithic) AI system * design, manufacturing, and operations time”.
Evaluations, unrestricted adversarial training: (monolithic) AI system * manufacturing, operations time
Threat modeling: system of AIs (rarely), human + AI group, whole civilisation * deployment time, operations time, evolutionary time
Governance research, policy research: human + AI group, whole civilisation * mostly design and operations time.
Takeaways
To me, it seems almost certain that many current governance institutions and democratic systems will not survive the AI transition of civilisation. Bengio recently hinted at the same conclusion.
Human+AI group design (scale-free: small group, org, society) and civilisational intelligence design must be modernised.
Richard mostly classifies this as “governance research”, which carries the connotation that this is a sort of “literary” work rather than science, with which I disagree. There is a ton of cross-disciplinary hard science to be done on group intelligence and civilisational intelligence design: game theory, control theory, resilience theory, linguistics, political economy (rebuilt as a hard science, of course, on the basis of resource theory, bounded rationality, economic game theory, etc.), cooperative reinforcement learning, etc.
I feel that the design of group intelligence and civilisational intelligence is an under-appreciated area by the AI safety community. Some people do this (Eric Drexler, davidad, the cip.org team, ai.objectives.institute, the Digital Gaia team, and the SingularityNET team, although the latter are less concerned about alignment), but I feel that far more work is needed in this area.
There is also a place for “literary”, strategic research, but I think it should mostly concern deployment time of group and civilisational intelligence designs, i.e., the questions of transition from the current governance systems to the next-generation, computation and AI-assisted systems.
Also, operations and evolutionary time concerns of everything (AI systems, systems of AIs, human+AI groups, civilisation) seem to be under-appreciated and under-researched: alignment is not a “problem to solve”, but an ongoing, manufacturing-time and operations-time process.
[...] we are impressed by [...] ‘Eliciting Latent Knowledge’ [that] provided conceptual clarity to a previously confused concept
To me, it seems that ELK is (or was) attention-captivating among the AI safety community but doesn’t rest on a solid basis (logic, and theories of cognition and language), and is therefore actually confusing, which prompted at least several clarification and interpretation attempts (1, 2, 3). I’d argue that most people leave the original ELK writings more confused than they were before. So I’d classify ELK as a mind-teaser and maybe a problem statement (maybe more useful than distracting, or maybe more distracting than useful; it’s hard to judge as of now), but definitely not as great “conceptual clarification” work.
If OpenAI still had a moral compass, and were still among the good guys, they would pause AGI (and ASI) capabilities research until they have achieved a viable, scalable, robust set of alignment methods that have the full support and confidence of AI researchers, AI safety experts, regulators, and the general public.
I disagree with multiple things in this sentence. First, you take a deontological stance, whereas OpenAI clearly acts within a consequentialist stance, assuming that if they don’t create ‘safe’ AGI, reckless open-source hackers will (given the continuing exponential decrease in the cost of effective training compute, and/or the next breakthrough in DNN architecture or training that makes it much more efficient and/or enables effective online training). Second, I largely agree with OpenAI, as well as Anthropic, that iteration is important for building an alignment solution. One probably cannot design a robust, safe AI without empirical iteration, including with increasing capabilities.
I agree with your assessment that the strategy they are taking will probably fail, but mainly because I think we have inadequate human intelligence, human psychology, and coordination mechanisms to execute it. That is, I would support Yudkowsky’s proposal: halt all AGI R&D, develop narrow AI and tech for improving the human genome, make humans much smarter (von Neumann-level intelligence should be just the average) and give them a much more peaceful psychology, like bonobos’, reform coordination and collective decision-making, and only then revisit the AGI project with roughly the same methodology as OpenAI proposes, albeit a more diversified one: I agree with your criticism that OpenAI is too narrowly focused on some sort of computationalism, to the detriment of perspectives from psychology, neuroscience, biology, etc. BTW, it seems that DeepMind is more diversified in this regard.
The things that the proposed startup is going to do seem to overlap in various ways with MATS, AI Safety Camp, Orthogonal (https://www.lesswrong.com/posts/b2xTk6BLJqJHd3ExE/orthogonal-a-new-agent-foundations-alignment-organization), the European Network for AI Safety (ENAIS, https://forum.effectivealtruism.org/posts/92TAmcppCL7t54Ajn/announcing-the-european-network-for-ai-safety-enais), Nonlinear.org, and the LTFF (if you plan to ‘hire’ researchers and pay them a salary, i.e., effectively fund them, you basically plan to increase the total fundraising for AI safety, which is currently the LTFF’s role).
Detailing the similarities, differences, and partnerships with these projects and orgs would be useful.
The “nature coin” complicates this experiment a lot. It also sounds like a source of inherent randomness in a policy’s outcomes, i.e., aleatoric uncertainty, which is perhaps rarely or never the case for actual policies; therefore, evaluating such policies ethically feels unnatural: the brain is not trained to do this. When people discuss and think about the ethics of policies, even epistemic uncertainty is often assumed away, though it is actually very common that we don’t know whether a potential policy will turn out good or bad.
Due to this, I would say I have a preference for intervention E, because it’s the only one that doesn’t depend on the “nature coin”.
How about failures of coordination and of multi-scale planning (optimising for both the short term and the long term)? They both have economic value (i.e., economic value is lost when these failures happen), and they are both at least in part due to the selfish, short-term, impulsive motives/desires/“values” of humans.
E.g., I think people would like to buy an AI that manipulates them into following their exercise plan through some tricks, and likewise they would like to collectively “buy” (build) an AI that restricts their selfishness for their median benefit and the benefit of their own children and grandchildren.
(Cross-posted from LW)
Roko would probably call “the most important century” work “building a stable equilibrium to land an AGI/ASI on”.
I broadly agree with you and Roko that this work is important and that it would often make more sense for people to do this kind of work than “narrowly-defined” technical AI safety.
One aspect of why this may be the case that you didn’t mention is money: technical AI safety is probably bottlenecked on funding, but much of the “most important century/stable equilibrium” work is more amenable to conventional VC funding, and the funders don’t even need to be EA/AI x-risk/“most important century”-pilled.
In a comment on Roko’s post, I offered my classification of these “stable equilibrium” systems and the work that should be done. Here I reproduce it, with extra directions that occurred to me later:
Digital trust infrastructure: decentralised identity, secure communication (see Layers 1 and 2 in the Trust Over IP stack), proof-of-humanness, and proof of AI (such as a proof that such-and-such artifact was created with such-and-such agent, e.g., one provided by OpenAI; watermarking failed, so new robust solutions based on zero-knowledge proofs are needed).
Infrastructure for collective sensemaking and coordination: the infrastructure for communicating beliefs and counterfactuals, making commitments, imposing constraints on agent behaviour, and monitoring compliance. We at the Gaia Consortium are doing this.
Infrastructure and systems for collective epistemics: next-generation social networks (e.g., https://subconscious.network/), media, content authenticity, Jim Rutt’s “info agents” (he advises “three different projects that are working on this”).
Related to the previous item, in particular to content authenticity: systems for personal data sovereignty (I don’t know of any good examples besides Inrupt), and, more generally, dataset verification/authenticity and dataset governance.
The science/ethics of consciousness and suffering mostly solved, and much more effort in biology to understand whom (or whose existence, joy, or non-suffering) the civilisation should value, to better inform the constraints and policies for economic agents (which are monitored and verified through the infrastructure from item 2).
Systems for political decision-making and collective ethical deliberation: see Collective Intelligence Project, Policy Synth, simulated deliberative democracy. These types of systems should also be used for governing all of the above layers.
Accelerating enlightenment using AI teachers (Khanmigo, Quantum Leap) and other tools for individual epistemics (Ought) so that the people who participate in governance (the previous item) could do a better job.
The list above covers all the directions mentioned in the post, plus a few more important ones.
Bing definitely “helps” people to over-anthropomorphise it by actively corroborating that it has emotions (via self-report and over-use of emojis), consciousness, etc.
Alien values
Maximalist desire for world domination
Convergence to a utility function
Very competent strategizing, of the “treacherous turn” variety
Self-improvement
Alien values are guaranteed unless we explicitly impart non-alien ethics to AI, which we currently don’t know how to do, and we don’t know (or can’t agree on) what that ethics should be like. The next two points are synonymous with each other and also basically synonymous with “alien values”. The treacherous turn is indeed unlikely (link).
Self-improvement is a given; the only question is where the “ceiling” of this improvement is. It might not be that “far”, by some measure, from human intelligence, or the difference may still not allow AI to plan very far ahead due to the intrinsic unpredictability of the world. So the world may start to move extremely fast (see below), but the horizon of planning and predictability of that movement may not be longer than it is now (it could even be shorter).
For a given operationalization of AGI, e.g., good enough to be forecasted on, I think that there is some possibility that we will reach such a level of capabilities, and yet that this will not be very impressive or world-changing, even if it would have looked like magic to previous generations. More specifically, it seems plausible that AI will continue to improve without soon reaching high shock levels which exceed humanity’s ability to adapt.
I think you implicitly underestimate the cost of coordination among humans. Huge corporations are powerful but also very slow to act. AI corporations will be very powerful and also very fast and potentially very coherent in their strategy. This will be a massive change.
The two measures you quoted may be “short-lived”, or maybe they could (if successful, which Acemoglu himself is very doubtful about) send the economy and society on a somewhat different trajectory, one with rather different eventualities (including in terms of meaning) than if these measures were not applied.
I agree that developing new ideas in the social, psychological, and philosophical domains (the domain of meaning; may also be regarded as part of “psychology”) is essential. But it could only be successful in the context of the current technological, social, and economic reality (which may be “set in motion” by other economic and political measures).
For example, currently, a lot of people seem to derive their meaning in life from blogging on social media. I can relatively easily imagine that this will become a dominant source of meaning for most of the world’s population. Without judging whether this is “good” or “bad” meaning in some grand scheme of things and the effects of this, discussing this seriously is contingent on the existence of social media platforms and their embeddedness in society and the economy.
Hello Agustín, thanks for engaging with our writings and sharing your feedback.
Regarding the ambitiousness, low chances of overall success, and low chances of uptake by human developers and decision-makers (I emphasize “human” because if some tireless near-AGI or AGI comes along, it could dramatically change the cost of building agents for participation in the Gaia Network), we are in complete agreement.
But notice that the Gaia Network could be seen as a much-simplified (from the perspectives of mathematics and machine learning) version of Davidad’s OAA, as we framed it in the first post. Also, the Gaia Network tries to leverage (at least approximately and to some degree) the existing (political) institutions and economic incentives. In contrast, it’s very unclear to me what the political economy in the “OAA world” could look like, and what even a remotely plausible plan would be for switching from the incumbent political economy of civilisation to OAA, or for “plugging” OAA “on top of” the incumbent political economy (and this hasn’t been discussed publicly anywhere, to the best of our knowledge). We also discussed this in the first post. Also, notice that due to its extreme ambitiousness, Davidad doesn’t count on humans implementing OAA with their bare hands: it’s a deal-breaker if there isn’t an AI that can automate 99%+ of the technical work needed to convert current science into Infra-Bayesian language.[1] And yes, the same applies to the Gaia Network: it’s not feasible without massive assistance from AI tools that can do most of the heavy lifting. But if anything, this reliance on AI is less extreme in the case of the Gaia Network than in the case of OAA.
The above makes me think that you should therefore be even more skeptical of OAA’s chances of success than you are about Gaia’s chances. Is this correct? If not, what do you disagree about in the reasoning above, or what elements of OAA make you think it’s more likely to succeed?
Adoption
The “cold start” problem is huge for any system that relies on network effects, and the Gaia Network is no exception. But this also means that the cost of convincing most decision-makers (businesses, scientists, etc.) to use the system is far smaller than the cost of convincing the first few adopters multiplied by the total number of agents. We have also proposed, in this thread, how various early adopters could get value out of the “model-based and free-energy-minimising way” of doing decision-making very soon, in absolutely concrete terms (monetary and real-world risk mitigation); we don’t need adoption of the Gaia Network right off the bat (more on this below).
In fact, we think that if there are sufficiently many AI agents and decision intelligence systems that are model-based, i.e., that use some kind of executable state-space (“world”) models to run simulations, reason counterfactually about different courses of action and external conditions (sometimes in collaboration with other agents, i.e., planning together), and deploy regularisation techniques (from Monte Carlo aggregation of simulation results to the amortized adversarial methods suggested by Bengio on slide 47 here) to permit compositional reasoning about risk and uncertainty that scales beyond the boundary of a single agent, then the benefits of collaborative inference of the most accurate and well-regularised models will be so huge that something like the Gaia Network will emerge pretty much “by default”, because a lot of scientists and industry players will work in parallel to build versions and local patches of it.
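As a toy illustration of what “model-based decision-making with Monte Carlo aggregation of simulation results” can look like at the level of a single agent (the world model, action names, and utility numbers below are entirely made up for illustration; a real Gaia agent would plug in an executable state-space model and share the resulting beliefs with other agents):

```python
import random
from statistics import mean

def simulate_outcome(action: str, rng: random.Random) -> float:
    """Stub world model: sample the utility of an action under random
    external conditions (aleatoric uncertainty)."""
    base = {"expand": 1.0, "hold": 0.4, "retreat": 0.1}[action]
    shock = rng.gauss(0.0, 0.8)  # random external conditions
    return base + shock

def evaluate(action: str, n_samples: int = 10_000, seed: int = 0) -> dict:
    """Monte Carlo aggregation: average many simulated rollouts and keep a
    simple risk measure (probability of a negative outcome) next to the mean."""
    rng = random.Random(seed)
    samples = [simulate_outcome(action, rng) for _ in range(n_samples)]
    return {
        "action": action,
        "expected_utility": mean(samples),
        "p_loss": sum(s < 0 for s in samples) / n_samples,
    }

if __name__ == "__main__":
    for action in ("expand", "hold", "retreat"):
        print(evaluate(action))
```

The point of the Gaia Network, then, is that such per-agent estimates (and the models behind them) become far more accurate and better regularised when agents infer and share them collaboratively rather than in isolation.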
Blockchains, crypto, DeFi, DAOs
I understand why the default prior upon hearing anything about crypto, DeFi, and DAOs these days is that people who propose something like this are either fantasists, or cranks, or, worse, scammers. That’s unfortunate for everyone who just wants to use the technical advances that happen to be loosely associated with this field, which now includes almost anything that has to do with cryptography, identity, digital claims, and zero-knowledge computation.
Generally speaking, zero-knowledge (multi-party) computation is the only way to make certain proofs (of contribution, of impact, of lack of deceit, etc.) without compromising privacy (e.g., proprietary models, know-how, personal data). The ways of dealing with this dilemma “in the real world” today inevitably come down to some kind of surveillance, which many people are very uneasy about. For example, consider the present discussion of data center audits and compute governance. It’s fine with me and most other people (except e/accs) for now, but what about the time when the cost of training powerful/dangerous models drops so much that anyone can buy a chip to train the next rogue AI for $1000? What does compute governance look like in that world?
Governance
I’m also skeptical of the theory of change. Even if AI Safety timelines were long, and we managed to pull this Herculean effort off, we would still have to deal with problems around AI Safety governance.
I don’t think AI safety governance is that special among other kinds of governance. But more generally on this point: of course governance is important, and the Gaia Network doesn’t claim to “solve” it; rather, it plans to rely on solutions developed by other projects (see numerous examples in the CIP ecosystem map, OpenAI’s “Democratic Inputs to AI” grantees, etc.).
We only mention in passing incorporating the preferences of a system’s stakeholders into Gaia agents’ subjective value calculations (i.e., building reward models for these agents/entities, if you wish), but there is a lot to be done there: how the preferences of the stakeholders are aggregated and weighted, who can claim to be a stakeholder of this or that system in the first place, etc. Likewise, on the general Gaia diagram in the post, there is a small arrow from the “Humans and collectives” box to the “Decision Engines” box labelled “Review and oversight”, and, as you can imagine, there is a lot going on there as well.
Why would AGI companies want to stick to this way of developing systems?
IDK; being convinced that this is a safe approach? Being coerced (including economically, not necessarily by force) by the broader consensus of using such Gaia Network-like systems? This is a collective action problem. The same question could be addressed to any AI safety agenda, and the answer would be the same.
Moloch
It’s also the case that this project also claims to be able to basically be able to slay Moloch[3]. This seems typical of solutions looking for problems to solve, especially since apparently this proposal came from a previous project that wasn’t related to AI Safety at all.
I wouldn’t say that we “claim to be able to slay Moloch”. Rafael is bolder in his claims and phrasing than I am, but I think even he wouldn’t say that. I would say that the project looks very likely to help counteract Molochian pressures. But this seems to me an almost self-evident statement, given the nature of the proposal.
Compare with the Collective Intelligence Project. It started with the mission to “fix governance” (and pretty much to “help counteract Moloch” in the domain of political economy, too; they just didn’t use this concept, or maybe they did, I don’t want to check now), and then they “pivoted” to AI safety and achieved great legibility on this path: e.g., they now apparently partner with OpenAI on more than one project. Does this mean that CIP is a “solution looking for a problem”? No, it’s just the kind of project that naturally lends itself to helping both with Moloch and with AI safety. The same could be said of the Gaia Network (if it is realised in some form), and this lies pretty much in plain sight.
Furthermore, this shouldn’t be surprising in general, because the AI transition of the economy is evidently an accelerator and a risk factor in the Moloch model, and therefore these domains (Moloch and AI safety) almost merge in my overall model of risk. Cf. Scott Aaronson’s reasoning that AI will inevitably be in the causal structure of any outcome of this century, so “P(doom from AI)” is not well defined; I agree with him and only think about “P(doom)” without specifying what this doom “comes from”. Again, note that most narratives about possible good outcomes (take OpenAI’s superalignment plan, Conjecture’s CoEm agenda, OAA, the Gaia Network) all rely on developing very advanced (if not superhuman) AI along the way.
[1] Notice here again: you mention that most scientists don’t know about Bayesian methods, but perhaps at least two orders of magnitude fewer scientists have even heard of Infra-Bayesianism, let alone being convinced that it’s a sound and necessary methodology for doing science. For Bayesianism, in contrast, it seems to me there is quite a broad consensus about its soundness: numerous pieces and even books have been written about how p-values are a bullshit way of doing science and about how scientists should take up (Bayesian) causal inference instead.
There are a few notable voices that dismiss Bayesian inference, for example David Deutsch, but then there are no less notable voices, such as Scott Aaronson and Sean Carroll (among the people I’ve heard, anyway), who dismiss Deutsch’s dismissal in turn.
[2]
This post has led me to this idea: “Workshop (hackathon, residence program, etc.) about for-profit AI Safety projects?”
Fertility rate may be important but to me it’s not worth restricting (directly or indirectly) people’s personal choices for.
This is a radical libertarian view that most people don’t share. Is it worth restricting people’s access to hard drugs? Let’s abstract for a moment from the numerous negative secondary effects that come with the fact that hard drugs are illegal, as well as from the crimes committed by drug users: if hard drugs could simply be eliminated from Earth completely, with a magic spell, should we do it, or should we “not restrict people’s choices”? With AI romantic partners, and other forms of tech, we do have a metaphorical magic wand: we can decide whether such products ever get created or not.
A lot of socially regressive ideas have been justified in the name of “raising the fertility rate” – for example, the rhetoric that gay acceptance would lead to fewer babies (as if gay people can simply “choose to be straight” and have babies the straight way).
The example that you give doesn’t work as evidence for your argument at all, due to a direct disanalogy: the “young man” from the “mainline story” I outlined could want to have kids in the future, or may even already want kids when he starts his experiment with the AI relationship, but his experience with the AI partner will prevent him from realising this desire and value over the course of his future life.
I think it’s better to encourage people who are already interested in having kids to do so, through financial and other incentives.
Technology, products, and systems are not value-neutral. We are so afraid of consciously shaping our own values that we are happy to offload this to the blind free market, whose objective is not to shape the values that we would most reflectively endorse.
There are many more interventions that might work on decades-long timelines that you didn’t mention:
Collective intelligence/sense-making/decision-making/governance/democracy innovation (and its introduction into organisations, communities, and societies at larger scales), such as https://cip.org
Innovation in social network technology that fosters better epistemics and social cohesion rather than polarisation
Innovation in economic mechanisms to combat the deficiencies and blind spots of free markets and the modern money-on-money return financial system, such as various crypto projects, or https://digitalgaia.earth
Fixing other structural problems of the internet and money infrastructure that exacerbate risks: too much interconnectedness, too much centralisation of information storage, money being traceless, as I explained in this comment. Possible innovations: https://www.inrupt.com/, https://trustoverip.org/, and other trust-based (cryptocurrency) systems.
Other infrastructure projects that might address certain risks, notably https://worldcoin.org, albeit this is a double-edged sword (could it be used for surveillance?)
OTOH, fostering better interconnectedness between humans, and between humans and computers, primarily via brain-computer interfaces such as Neuralink. (Also, I think that in the mid to long term, human-AI merging is the only viable “good” outcome for humanity.) However, this is a double-edged sword (could it be used by AI to manipulate humans or quickly take them over?)
I think it’s better not to increase the number of distinct Slack spaces without necessity. We can create a channel for independent researchers in the AI Alignment Slack (see https://coda.io/@alignmentdev/alignmentecosystemdevelopment).
Except that RSPs are not concerned with long-term economic, social, and political implications. The ethos of AGI labs is to assume, for the most part, that these things will sort themselves out, and that the labs only need to check technical and momentary implications, i.e., do “evals”.
The public should push for “long-term evals”, or even mandatory innovation in political and economic systems coupled with the progress in AI models.
The current form of capitalism is simply unprepared for autonomous agents; no amount of RLHF and “evals” will fix this.