Once upon a time, some people argued that AI might kill everyone, and that EA resources should address that problem instead of fighting malaria. So OpenPhil poured millions of dollars into orgs such as EpochAI (which received $9 million). Now three people from EpochAI have created a startup that provides training data to help AI replace human workers. Some people worry that this startup increases AI capabilities, and therefore increases the chance that AI will kill everyone.
However, a model trained to obey the RLHF objective will expect negative reward if it decides to take over the world
If an AI takes over the world, there is no one around to give it a negative reward. So the AI will not expect a negative reward for taking over the world.
The issue is not whether the AI understands human morality. The issue is whether it cares.
The arguments from the “alignment is hard” side that I was exposed to don’t rely on the AI misinterpreting what humans want. In fact, a superhuman AI is assumed to be better than humans at understanding human morality. It could still do things that go against human morality. Overall, I get the impression that you misunderstand what alignment is about (or maybe you just have different associations with words such as “alignment” than I do).
Whether a language model can play a nice character that would totally give back dictatorial powers after a takeover is barely any evidence about whether an actual super-human AI system would step back from its position of world dictator after it has accomplished some tasks.
How is that better than individuals just donating wherever they think it makes sense on the margin?
I think the comment already addresses that here:
moreover, rule by committee enables deliberation and information transfer, so that persuasion can be used to make decisions and potentially improve accuracy or competence at the loss of independence.
This article has a lot of downvoting (net karma of 39 from 28)
This does not seem to be an unusual amount of downvoting to me. The net karma is even higher than the number of votes!
As a more general point, I think people should worry less about downvotes on posts with a high net karma.
As for existential risk from AI takeover, I don’t think having a self-sustaining civilization on Mars would help much.
If an AI has completed a takeover on Earth and killed all humans there, taking over Mars too does not sound that hard, especially since a human civilization on Mars is likely quite fragile. (There might be some edge cases where you solve the AI control problem well enough to guarantee that all advanced AIs leave Mars alone, but not well enough for the AI to leave Australia alone; I think scenarios like these are extremely unlikely.)
For other existential risks, it might be useful in principle, but it is practically very difficult. Building a self-sustaining city on Mars would take a lot of time and resources. On the scale of centuries, though, it seems like a viable option.
At the same time though I don’t think you mean to endorse 1).
I have read or skimmed some of his posts and my sense is that he does endorse 1). But at the same time he says
critics seem to frequently conflate my arguments with other, simpler positions that can be more easily dismissed.
so maybe this is one of those cases, and I should be more careful.
A recent comment says that the restriction has been lifted and the website will be updated next week: https://forum.effectivealtruism.org/posts/aBkALPSXBRjnjWLnP/announcing-the-q1-2025-long-term-future-fund-grant-round?commentId=FFFMBth8v7WBqYFzP
the AI won’t ever have more [...] capabilities to hack and destroy infrastructure than Russia, China or the US itself.
Having better hacking capabilities than China seems like a low bar for a super-human AGI. The AGI would only need to be better at writing and understanding code than a small group of talented humans, and to have access to some servers. This sounds easy if you accept the premise of smarter-than-human AGI.
Merely listing EA under “Memetics adjacence” does not support the claim “is also an avowed effective altruist.”
it’s the arguments you least agree with that you should extend the most charity to
I strongly disagree with flat earthers, but I don’t think that I should extend a lot of charity to arguments for a flat earth.
Also, on a quick skim, I could not find where this is argued in the linked “I Can Tolerate Anything Except The Outgroup”.
Caveat: I consider these minor issues; I hope I don’t come across as too accusatory.
Interesting, why’s that? :)
It seems that the reason for cross-posting was that you personally found it interesting. If you use the EA Forum Team account, it sounds a bit like an “official” endorsement, and it makes the Forum Team less neutral.
Even if you use another account name (e.g. “selected linkposts”) that is run by the Forum Team, I think there should be some explanation of how those linkposts are selected; otherwise it seems like arbitrarily privileging some stuff over other stuff.
A “LinkpostBot” account would be good if the cross-posting is automated (e.g. every ACX article that mentions Effective Altruism).
I also personally feel kinda weird getting karma for just linking to someone else’s work
I think it’s fine to gain karma by virtue of link-posting and being an active forum member. I will not be bothered by it, and I think you should not worry about it (although I can understand that it might feel uncomfortable to you). Other people are also allowed to link-post.
Personally when I see a linkpost, I generally assume that the author here is also the original author
I think starting the title with [linkpost] fixes that issue.
I think it’s best to post this under your own account.
there are so many projects which continuously replicate what others have done before; projects with very similar messages, but mostly separate communities, infrastructures and identities.
Some examples would be useful.
have you seen someone that is not top 10 attendee in an EA organization?
Julia Wise, who works at CEA, studied at Bryn Mawr College.
(Another comment of yours makes me think that by “top 10 attendee” you mean “has attended a top 10 university”.)
the child is adding an expected value of $27,275 per year in social surplus.
It would take $133,333 per year to raise a child to adulthood for it to not be worthwhile
I think the comparison of “social surplus” to effective donations is mistaken here. A social surplus of $27,275 (in the US) does not save 5 lives, but an effective donation of that size might.
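To make the implicit arithmetic explicit (the per-life cost below is my assumption, roughly the GiveWell-style figure that the 5-lives comparison implies, not a number from the original post):

$$\frac{\$27{,}275}{\approx \$5{,}500 \text{ per life saved}} \approx 5 \text{ lives, if donated to a highly effective charity}$$

The same $27,275 counted as diffuse social surplus spread across the US economy buys nothing comparable, which is why the two quantities are not interchangeable.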
There used to be such a system: https://forum.effectivealtruism.org/posts/YhPWq784eRDr5999P/announcing-the-ea-donation-swap-system It got shut down 7 months ago (see the comments on that post).
Some (or all?) Lightspeed grants are part of SFF: https://survivalandflourishing.fund/sff-2023-h2-recommendations
One outstanding question is at what point AI capabilities are too close to loss of control. We propose to delegate this question to the AI Safety Institutes set up in the U.K., U.S., China, and other countries.
I consider it clickbait if you write “There Is a Solution”, but then say that there are these AI safety institutes that will figure out the crucial details of the solution some time in the future.
I do not recall seeing this usage in AI safety or LW circles. Can you link to examples?