How about reducing the number of catered meals while increasing support for meals outside the venue? Silly example: someone could fill a hotel room with Soylent so that everyone can grab liquid meals and go chat somewhere, sort of a "baguettes and hummus" vibe. Or as @Matt_Sharp pointed out, we could reserve nearby restaurants. No idea if these exact plans are feasible, but I can imagine similarly scrappy solutions going well if planned by actual logistics experts.
Thanks so much for your work and this information!
tae
I'm having an ongoing discussion with a couple of professors and a PhD candidate in AI about "The Alignment Problem from a Deep Learning Perspective" by @richard_ngo, @Lawrence Chan, and @SoerenMind. They are skeptical of "3.2 Planning Towards Internally-Represented Goals," "3.3 Learning Misaligned Goals," and "4.2 Goals Which Motivate Power-Seeking Would Be Reinforced During Training". Here's my understanding of some of their questions:
The argument for power-seeking during deployment depends on the model being able to detect the change from the training distribution to the deployment distribution. Wouldn't this require keeping track of the distribution thus far, which would require memory of some sort, which is very difficult to implement in the SSL+RLHF paradigm? (See the sketch after this list.)
What is the status of the model after the SSL stage of training?
How robust could its goals be?
Would a model be able to know:
what misbehavior during RLHF fine-tuning would look like?
that it would be able to better achieve its goals by avoiding misbehavior during fine-tuning?
Why would a model want to preserve its weights? (Sure, instrumental convergence and all, but what's the exact mechanism here?)
To what extent would all these phenomena (situationally-aware reward hacking, misaligned internally-represented goals, and power-seeking behaviors) show up in current LLMs (say, GPT-4) vs. current agentic LLM-based systems (say, AutoGPT) vs. different future systems?
Do we get any evidence for these arguments from the fact that existing LLMs can adopt goal-directed personas?
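To make the memory question concrete, here's a minimal Python sketch (my own illustration, not anything from the paper; every name and number in it is invented) contrasting a per-input out-of-distribution score, which uses only statistics fixed at training time, with a running detector that needs state persisting across inputs, i.e., the kind of memory the question doubts the SSL+RLHF paradigm provides:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Pretend these constants summarize the training distribution
# (they would be estimated offline, before deployment).
train_mean = np.zeros(dim)
train_cov_inv = np.eye(dim)

def stateless_ood_score(x: np.ndarray) -> float:
    """Mahalanobis distance of a single input to the training distribution.
    Uses only constants fixed at training time -- no memory of past inputs,
    so even a stateless model could in principle compute something like
    this within one forward pass."""
    diff = x - train_mean
    return float(diff @ train_cov_inv @ diff)

class RunningShiftDetector:
    """Tracks a running mean over the inputs seen so far -- state that must
    persist across inputs, which is exactly the cross-episode 'memory' the
    question is skeptical of."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)

    def update(self, x: np.ndarray) -> float:
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # This distance grows as the input stream drifts away from training.
        return float(np.linalg.norm(self.mean - train_mean))

# An in-distribution input scores low; a shifted input scores high,
# even for the stateless detector.
in_dist = rng.normal(0.0, 1.0, dim)
shifted = rng.normal(3.0, 1.0, dim)
print(stateless_ood_score(in_dist), stateless_ood_score(shifted))
```

If per-input detection like the first function suffices, the memory objection loses some force; if detecting the shift genuinely requires aggregating over many inputs, the objection stands.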
I'm guessing this has been discussed in the animal welfare movement somewhere
Yep, The Sexual Politics of Meat by Carol J. Adams is the classic I'm aware of.
I recorded the rough audio and passed it along to the audio editor, but I haven't heard back since then :(
Hi! I relate so much to you. I'm seven years older than you and I'm pretty happy with how my life is going, so although I'm no wise old sage, I think I can share some good advice.
I've also been involved in EA, Buddhism, veganism, minimalism, sustainable fashion, etc. from a young age, plus I was part of an Orthodox Christian community as a teenager (as I assume you are in Greece).
So, here's my main advice.
The philosophies of EA, Buddhism, etc. are really, really morally demanding. Working from the basic principles of these philosophies, it is difficult to find reasons to prioritize your own wellbeing; there are only pragmatic reasons such as "devote time and money to your own health so that you can work more effectively to help others". Therefore, if you predominantly engage in these communities through the philosophy, you will be exhausted.
So, instead of going down internet rabbit holes and reading serious books, engage with the people in these communities. Actual EAs goof around at parties and write stories. Actual Buddhists have silly arguments at nice restaurants and go on long treks through the mountains. While good philosophies are optimized to be hard to argue with, good communities are optimized to be healthy and sustainable.
I'm guessing you don't have strong EA and Buddhist communities near you, though. Same here. In that case, primarily engage in other communities instead. When I was your age (ha, that sounds ridiculous), I was deeply involved in choir. Would highly recommend! Having fun is so important to balance out the philosophies that can consume your life if you let them.
In non-EA, non-Buddhist communities, it might feel like you're the only one who takes morality seriously, and that can be lonely. Personally, I gravitate toward devout religious friends, because they're also trying to confront selfishness. Just make sure that you don't go down depressing rabbit holes together.
Of course, there are nice virtual EA and Buddhist communities too. They can't fully replace in-person communities, though. Also, people in virtual communities are more likely to only show their morally intense side.
I hope this helps! You're very welcome to DM me about anything. I'll DM you first to get the conversation going.
P.S. You've got soooo much time to think about monasticism, so there's no reason to be concerned about the ethics of it for now, especially since the world could change so much by the time we retire! Still, just for the philosophical interest of it, I'm happy to chat about Buddhist monasticism if you like. Having lived at a monastery for several months and written my undergrad thesis on a monastic text, I've got some thoughts :)
General information about people in low-HDI countries to humanize them in the eyes of the viewer.
Similar for animals (except not "humanizing" per se!). Spreading awareness that e.g. pigs act like dogs may be a strong catalyst for caring about animal welfare. Would need to consult an animal welfare activism expert.
My premise here: it is valuable for EAs to viscerally care about others (in addition to cleverly working toward a future that sounds neat).
Yes, I am pretty amused about this
I'll just continue my anecdote! As it happens, the #1 concern that my friend has about EA is that EAs work sinisterly hard to convince people to accept the narrow-minded longtermist agenda. So, the frequency of ads itself increases his skepticism of the integrity of the movement. (Another manifestation of this pattern is that many AI safety researchers see AI ethics researchers as straight-up wrong about what matters in the broader field of AI, and therefore as people to be convinced rather than collaborated with.)
(Edit: the above paragraph is an anecdote; I'm speaking generally in the following paragraphs)
I think it is quite fair for someone with EA tendencies, who is just hearing of EA for the first time through these ads, to form a skeptical first impression of a group that invests heavily in selling an unintuitive worldview.
I strongly agree that it's a good sign if a person investigates such things instead of writing them off immediately, indicating a willingness to take unusual ideas seriously. However, the mental habit of openness/curiosity is also unusual and is often developed through EA involvement; we can't expect everyone to come in with full-fledged EA virtues.
Sure! Thank you very much for your, ahem, forethought about this complicated task. Please pardon the naive post about a topic that you all have worked hard on already :)
These are excellent answers, thanks so much!
As more and more students get interested in AI safety, and AI-safety-specific research positions fail to open up proportionally, I expect that many of them (like me) will end up as graduate students in mainstream ethical-AI research groups. Resources like these are helping me to get my bearings.
Thanks very much, that helps!
Adding more not to defend myself, but to keep the conversation going:
I think that many Enlightenment ideas are great and valid regardless of their creators' typical-for-their-time ideas.
Education increasingly includes rather radical components of critical race theory. Students are taught that if someone is racist, then all of their political and philosophical views are tainted. By extension, many people learn that the Enlightenment itself is tainted. Like Charles, I think that this "produces misguided perspectives".
I'm trying (apparently badly) to communicate the following: these students, who have been taught that the Enlightenment is tainted by association with racism, who (reasonably!) haven't bothered to thoroughly research this particular historical movement and come to their own conclusions, and who may well make great EAs, would initially be turned off.
It's quite plausible that Enlightenment aesthetics shouldn't turn people off. But I think they do, and I argue that making a good first impression likely matters more than taking a stand in favor of a particular historical movement.
Hope that makes sense!
Could someone who downvoted please explain which of these premises you disagree with?
Short version: if we can avoid it, let's not filter potential EAs by the warmth of their feelings toward a specific group of historical figures (especially because history education is inevitably biased)
I actually wouldn't know where to find a liberal student who respects classics (let alone "our cultural heritage") at my large American university, after four years in the philosophy department!
Yes, these are great reasons to take inspiration from the Enlightenment!
The point I most want to get across is that, by using Enlightenment aesthetics, EAs could needlessly open themselves up to negative perception.
If EAs use Enlightenment aesthetics more, then EA will be associated with the Enlightenment more.
Whatever their positive qualities, the Enlightenment philosophers racked up plenty of negative ones between them. Maybe there were 10x as many purely virtuous ones as problematic ones; maybe every problematic one made contributions that vastly outweighed their flaws; nonetheless, there are some problems.
People who would otherwise engage with EA might have heard of enough problems that they'd be put off by Enlightenment associations entirely. (I suspect many of my social-justice-y friends would have this reaction.)
Here's the more nebulous point. I hinted in my original comment that I take issue with the "rational individualistic actor" view. This alone puts me off Enlightenment aesthetics, because I think that particular view is especially dangerous considering how innocent it looks. But that's a whole big discussion, and I respect the other side! The relevant part here is just that, anecdotally, at least one EA isn't a huge Enlightenment fan.
Yeah, the magnitude of the problem depends on the empirical question of how many people associate the Enlightenment with racism and such.
Descartes' moral circle issue is that he believed animals have no moral standing whatsoever, so he enthusiastically practiced vivisection (dissecting animals while they were still alive).
We'd need to be really careful.
The Enlightenment led to good foundational ideas of EA, but it was also full of philosophers who conceptualized humans as individualistic rational actors, excluded pretty much everybody except for white men from the moral circle, and advocated for constant growth with no regard for sustainability (e.g. Immanuel Kant, René Descartes, Adam Smith).
I do think historical aesthetics are great (see my other comment on this post), but I think we should stick to historical art that isn't so closely tied to questionable philosophy.

EDIT: I see how this came across differently than I intended! I do not mean that we should cancel the Enlightenment. Please see child comments for explanation.
I'm more inspired by the "altruistic" aesthetic than the "effective" aesthetic.
"Effective" blends into the Silicon Valley productivity/efficiency crowd. While there's a lot to appreciate about the Bay Area, I'd prefer not to tie EA to that culture.
On the other hand, there are truly beautiful exemplars of altruism throughout history and around the world.
Personally, I associate altruism with Avalokiteśvara. Art portraying him is colorful and full of details, which, to me, represents that Effective Altruism can bridge all kinds of cultures, theories, and life experiences. Here's why he has so many heads and arms:
One prominent Buddhist story tells of Avalokiteśvara vowing never to rest until he had freed all sentient beings from saṃsāra. Despite strenuous effort, he realizes that many unhappy beings are yet to be saved. After struggling to comprehend the needs of so many, his head splits into eleven pieces. Amitābha, seeing his plight, gives him eleven heads with which to hear the cries of the suffering. Upon hearing these cries and comprehending them, Avalokiteśvara tries to reach out to all those who need aid, but finds that his two arms shatter into pieces. Once more, Amitābha comes to his aid and invests him with a thousand arms with which to aid the suffering multitudes.[34]
I'm gonna need help coming up with more examples of historical altruistic art… Civil rights art from the US? (I love this painting of Harriet Tubman reaching out to the viewer.) Some Christian saints?
Try and sell me on AGI safety if I'm a social justice advocate! That's a big one I come across.
Please accept my delayed gratitude for the comprehensive response! The conversation with my colleagues continues. The original paper and this response have become pretty central to my thinking about alignment.