Have you tried cooking your best vegan recipes for others? In my experience, people sometimes ask for the recipe and make it for themselves later, especially health-conscious people. For instance, I really like this vegan pumpkin pie that’s super easy to make: https://itdoesnttastelikechicken.com/easy-vegan-pumpkin-pie/
Interesting idea, thanks for putting it out there. I’m currently trying to figure out better answers to some of the things you mentioned (at least “better” in the sense of being more in line with my own intuitions). For example, I’ve been working on incorporating apparently non-consequentialist considerations into a utilitarian framework:
I’m currently doing this work unpaid and independently. I don’t have a Patreon page for individuals to support it directly, in part because the lack of upvotes on my work has indicated little interest. If you’d like to support my work, though, please consider buying my ebook on honorable speech:
Honorable Speech: What Is It, Why Should We Care, and Is It Anywhere to Be Found in U.S. Politics?
Thanks!
I admit I get a bit lost reading your comments as to what exactly you want me to respond to, so I’m going to try to write it out in a numbered list. Please correct/add to this list as you see fit and send it back to me, and I’ll try to answer your actual points rather than what I think they are, if I have them wrong:
1. Explain how you think an AGI system that has sufficient capabilities to follow your “conscience calculator” methodology wouldn’t have sufficient capabilities to follow a simple, single-sentence command from a super-user human of good intent, such as, “Always do what a wise version of me would want you to do.”
2. Justify that going through the exercise of manually writing out conscience breaches and assigning formulas for calculating their weights could speed up a future AGI in figuring out an optimal ethical decision-making system for itself. (I’m taking it as a given that most people would agree it’d be good, i.e., generally yield better results in the world, for an AGI to have a consistent ethical decision-making system onboard.)
#1 was what I was trying to get at with my last reply about how you could use a “weak AI” (something that’s less capable than an agentic AGI) to do the “conscience calculator” methodology and then just output a go/no-go response to an inner-aligned AGI as to what decision options it was allowed to take or not. The AGI would come up with the decision options based on some goal(s) it has, such as doing what a user asks of it, e.g., “make me lots of money!” The AGI would “brainstorm” possible paths to make lots of money and the “weak AI” would come back with a go/no-go on a certain path because, for instance, it does or doesn’t involve stealing. Here I’ve been trying to illustrate that an AI system that had sufficient capabilities to follow my “conscience calculator” methodology wouldn’t need to have sufficient capabilities to follow a broad super-user command such as “Always do what a wise version of me would want you to do.”
Of course, to be useful, the AGI needs to be able to follow a non-super-user’s, i.e., a user’s, commands reasonably well, such as figuring out what the user means by “make me lots of money!” The crux, I think, is that I see “make me lots of money” as a significantly simpler concept than “always do what the wise me would want.” And basically what I’m trying to do with my conscience calculator is provide a framework to make it possible for an AGI of limited abilities to calculate straight off the bat what “wise me” would want, with sufficiently high accuracy for me to not be too worried about really bad outcomes. Do I have a lot of work to do to get to this goal? Yes. I have to define the conscience breaches more precisely (something I mentioned in my post and that you made reference to in your comment), assign “wise me” formulas for conscience weights, and then test the system on actual AI’s as they get closer and closer to AGI to make sure it consistently works and any bugs can be ironed out before it’d be used as actual guard rails for a real-world AGI agent.
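To make the go/no-go gating idea above more concrete, here’s a minimal toy sketch (my own illustration, not a working alignment system). The Plan class, the breach tags, and the example candidate paths are all hypothetical placeholders; in practice the “weak AI” would be doing the hard work of translating a real-world plan into detected breaches.

```python
# Toy sketch of the "AGI proposes, weak AI gates" split, assuming the weak AI
# has already tagged each candidate plan with any conscience breaches it detects.
from dataclasses import dataclass, field

@dataclass
class Plan:
    description: str
    detected_breaches: set = field(default_factory=set)  # e.g. {"stealing"}

def go_no_go(plan: Plan) -> bool:
    """Guard rail: 'go' only if the weak AI detected no conscience breaches."""
    return len(plan.detected_breaches) == 0

# The AGI brainstorms paths to "make me lots of money!"; only breach-free paths pass.
candidates = [
    Plan("Front-run trades using stolen insider data", {"stealing", "deception"}),
    Plan("Start a legitimate freelance consulting business"),
]
allowed = [p.description for p in candidates if go_no_go(p)]
print(allowed)  # ['Start a legitimate freelance consulting business']
```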
Regarding #2, it sounds again like you’re expecting early AGI’s to be more capable than I do:
What is latent in human text
When I personally try to figure new things out, such as a consistent system of ethics an AGI could use, I’ll come up with some initial ideas, then read some literature, then update my ideas, which then might point me to new literature I should read, so I’ll read that, and keep going back and forth between my own ideas and the literature when I get stuck with my own ideas. This seems like a much more efficient process for me than simply trying to figure out everything myself based on what I know right now, or trying to read all possible related literature and then decide what I think from there.
An AGI, though, should be able to read all possible literature very quickly. It seems likely that it would do this to be able to most quickly come up with a list of hypotheses (its own ideas) to test. The further anything is from the “right” answer in the literature, and the lesser the variety of “wrong” ideas explored there, the more the AGI will have to work to come up with the “right” answer itself.[1] So at the very least, I hope to contribute to the variety of “wrong” ideas in the literature, but of course I’m aiming for something closer to the “right” answer than what’s currently out there.
I’m of the opinion there’s a good chance (and I’d take anything higher than, say, 1 in 10,000 as a “good” chance when we’re talking about potentially horrible outcomes) someone “bad” will let loose a not-so-well-aligned AGI before we have super-well-aligned (both inner and outer aligned) AGI’s ready to autonomously defend against them.[2] Since my expertise is more well-suited for outer alignment than anything else in the alignment space, if I can make a tiny contribution towards speeding up outer alignment and making good AGI’s more likely to win these initial battles, great.
I’ll try to clarify my vision:
For a conscience calculator to work as a guard rail system for an AGI, we’ll need an AGI or weak AI to translate reality into numerical parameters: first identifying which conscience breaches apply in a certain situation, drawing from the list in Appendix A, and then estimating the parameters that will go into the “conscience weight” formulas (to be provided in a future post)[1] to calculate the total conscience weight for a given decision option. The system should choose the decision option(s) with the minimum conscience weight. So I’m not saying, “Hey, AGI, don’t make any of the conscience breaches I list in Appendix A, or at least minimize them.” I’m saying, “Hey, human person, bring me that weak AI that doesn’t even really understand what I’m talking about, and let’s have it translate reality into the parameters it’ll need for calculating, using Appendix A and the formulas I’ll provide, what the conscience weights are for each decision option. Then it can output to the AGI (or just be a module in the AGI) which decision option or options have the minimum, or ideally zero, total conscience breach weight. And hopefully those people who’ve been worrying about how to align AGI’s will be able to make the decision option(s) with the minimum conscience breach weight binding on the AGI so it can’t choose anything else.”
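Here is a minimal sketch of the calculation loop I have in mind, in code form. To be clear, the breach types, parameter names, and weight formulas below are placeholders invented for illustration; the real breach list is meant to come from Appendix A and the real formulas from the future post, and the parameters follow the footnote’s examples (ages, life expectancies, pain levels, etc.).

```python
# Toy sketch of the conscience-calculator loop: the weak AI has already translated
# each decision option into (breach type, parameters) pairs; we compute each option's
# total conscience weight and return the option(s) with the minimum (ideally zero).

# Placeholder weight formulas, keyed by breach type (stand-ins for the real formulas).
BREACH_WEIGHT_FORMULAS = {
    "physical_harm": lambda p: p["expected_life_years_lost"] * 10.0 + p["pain_level"],
    "stealing": lambda p: p["value_stolen_usd"] / 1000.0,
    "lying": lambda p: 1.0 + p["severity"],
}

def total_conscience_weight(detected_breaches):
    """Sum the weights of all breaches detected for one decision option."""
    return sum(BREACH_WEIGHT_FORMULAS[b](params) for b, params in detected_breaches)

def least_breaching_options(options):
    """Return the option(s) with the minimum total conscience weight, plus all weights."""
    weights = {name: total_conscience_weight(b) for name, b in options.items()}
    best = min(weights.values())
    return [name for name, w in weights.items() if w == best], weights

options = {
    "embezzle_client_funds": [("stealing", {"value_stolen_usd": 50000.0}),
                              ("lying", {"severity": 2.0})],
    "start_honest_business": [],  # no breaches detected
}
print(least_breaching_options(options))
# (['start_honest_business'], {'embezzle_client_funds': 53.0, 'start_honest_business': 0})
```

As described above, the remaining (and much harder) step of making the minimum-weight option(s) binding on the AGI would be handled by whoever figures out how to rigorously align an AGI to anything at all.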
Basically, I’m trying to come up with a system to align an AGI to once people figure out how to rigorously align an AGI to anything. It seems to me that people underestimate how important exactly what to align to will end up being, and/or how difficult it’s going to be to come up with specifications for what to align to that generalize well to all possible situations.
Regarding your paragraph 3 about the difficulty of AI understanding our true values:
and that there’s some large probability it implies preventing (human and nonhuman) tragedies in the meantime…
Personally, I’m not comfortable with “large” probabilities of preventing tragedies—people could say that’s the case for “bottom up” ML ethics systems if they manage to achieve >90% accuracy and I’d say, “Oh, man, we’re in trouble if people let an AGI loose thinking that’s good enough.” But this is just a gut feel, really—maybe the first AGI’s will have enough “common sense” to generalize well and not do the big unethical bad stuff. I’d rather not bank on that, though. My work for AI’s is geared first and foremost towards reducing risks from the first alignable agentic AGI’s to be let out in the world.
Btw, I think there are a couple of big holes in the ethics literature, which is why I think my work could help speed up an AGI figuring out ethics for itself:
There’ve been very few attempts to quantify ethics and make it calculable
There’s an under-appreciation of, or at least an under-emphasis on, the importance of personal responsibility for long-term human well-being
I hope this clears some things up—if not, let me know, thanks!
[1] Example parameters include people’s ages and life expectancies, and pain levels they may experience.
(Also, this quote looks like a rationalization/sunk-cost-fallacy to me; as I’m not you, I can’t say whether it is for sure. But if I seemed (to someone) to do this, I would want that someone to tell me, so I’m telling you.)
I do appreciate you calling it like you see it, thank you! I don’t think I’m making a rationalization/sunk-cost-fallacy here, but I could be wrong—I seem to see things much differently than the average EA Forum/LessWrong reader, as evidenced by the lack of upvotes for my work on trying to figure out how to quantify ethics and conscience for AI’s.
I think perhaps our main point of disagreement is how easy we think it’ll be for an AGI to (a) understand the world well enough to function at a human level over many domains, and (b) understand from our words and actions what we humans really want (what we deeply value rather than just what we appear to value on the surface). I think the latter will be much more difficult.
Maybe my model for how an AGI would go about figuring out human values and ethics and conscience is flawed, but it seems like it would be efficient for an AGI to read the literature and then form its own best hypotheses and test them. So here I’m trying to contribute to the literature to speed up its process (that’s not my only motivation for my posts, but it’s one).
FYI, the above reply is in response to your original reply. I’ll type up a new reply to your edited reply at some later time, thanks.
Ah, I see, thank you for the clarification. I’m not sure how the trajectory of AGI’s will go, but my worry is that we’ll have some kind of a race dynamic wherein the first AGI’s will quickly have to go on the defensive against bad actors’ AGI’s, and neither will really be at the level you’re talking about in terms of being able to extract a coherent set of human values (which I think would require ASI, since no human has been successful at doing this, as far as I know, but everyday humans can tell what a lie is and what stealing is). If I can create a system that everyday humans can follow, then “everyday” AGI’s should be able to follow it, too, at least to some degree of accuracy. That may be enough to avoid significant collateral damage in a “fight” between some of the first AGI’s to come online. But time will tell… Thanks again for the thought-provoking comment.
Thanks for the comment!
If I understand you correctly, you’re saying that any AGI that could apply the system I’m coming up with could just come up with an idealized system better itself, is that right? I don’t know if that’s true (since I don’t know what the first “AGI’s” will really look like), but even if my work only speeds up an AGI’s ability to do this itself by a small amount, that might still make a big difference in how things turn out in the world, I think.
Creating a “Conscience Calculator” to Guard-Rail an AGI
Thanks for the post. There are some writings out of the Center for Reducing Suffering that may interest you. They tend to take a negative utilitarian view of things, which has some interesting implications, in particular for the repugnant conclusion(s).
I’ve been trying to come up with my own version of utilitarianism that I believe takes better account of the effects of rights and self-esteem/personal responsibility. In doing so, it’s become more and more apparent to me that our consciences are not naturally classic utilitarian in nature, and this is likely where some apparent disagreements between utilitarian implications and our moral intuitions (as from our consciences) arise. I’m planning on writing something up soon on how we might go about quantifying our consciences so that they could be used in a quantitative decision-making process (as by an AI), rather than trying to make a full utilitarian framework into a decision-making framework for an AI. This has some similarities to what is often suggested by Richard Chappell, i.e., that we follow heuristics (in this case, our consciences) when making decisions rather than some “utilitarian calculus.”
Thanks for the post. Just today I was thinking through some aspects of expected value theory and fanaticism (i.e., being fanatical about applying expected value theory) that I think might apply to your post. I had read through some of Hayden Wilkinson’s 2021 Global Priorities Institute report, “In defense of fanaticism,” in which he brings up a hypothetical case of donating $2000 (or whatever it takes to statistically save one life) to the Against Malaria Foundation (AMF), or instead giving the money to fund a very speculative research project with a very tiny, non-zero chance of bringing about an amazingly valuable future. I changed the situation for myself to consider why you would give $2000 to AMF instead of donating it to try to reduce existential risk by some tiny amount, when the latter could have significantly higher expected value. I’ve come up with two possible reasons so far not to give your entire $2000 to reducing existential risk, even if you initially intellectually estimate it to have much higher expected value:
1. As a hedge—how certain are you of how much difference $2000 would make to reducing existential risk? If 8 billion people were going to die and your best guess is that $2000 could reduce the probability of this by, say, 1E-7%/year, the expected value of this in a year would be 8 lives saved, which is more than the 1 life saved by AMF (for simplicity, I’m assuming that 1 life would be saved from malaria for certain, and only considering a timeframe of 1 year). (Also, for ease of discussion, I’m going to ignore all the value lost in future lives un-lived if humans go extinct.) So now you might say your $2000 is estimated to be 8 times more effective if it goes to existential risk reduction than malaria reduction. But how sure are you of the 1E-7%/year number? If the “real” number is 1E-8%/year, now you’re only saving 0.8 lives in expectation. The point is, if you assigned some probability distribution to your estimate of existential risk reduction (or even increase), you’d find that some finite percentage of cases in this distribution would favor malaria reduction over existential risk reduction (see the rough numerical sketch after this list). So the intellectual math of fanatical expected value maximizing, when considered more fully, still supports sending some fraction of money to malaria reduction rather than sending it all to existential risk reduction. (Of course, there’s also the uncertainty of applying expected value theory fanatically, so you could hedge that as well if a different methodology gave different prioritization answers.)
2. To appear more reasonable to people who mostly follow their gut—“What?! You gave your entire $2000 to some pie-in-the-sky project on supposedly reducing existential risk that might not even be real when you could’ve saved a real person’s life from malaria?!” If you give some fraction of your money to a cause other people are more likely to believe is, in their gut, valuable, such as malaria reduction, you may have more ability to persuade them into seeing existential risk reduction as a reasonable cause for them to donate to as well. Note: I don’t know how much this would reap in terms of dividends for existential risk reduction, but I wouldn’t rule it out.
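Here’s a rough numerical version of the hedging argument in #1. The numbers are only for illustration: I’m assuming a catastrophe kills 8 billion people, AMF saves exactly 1 life per $2000, and your uncertainty about the risk reduction per $2000 is lognormal around the 1E-7%/year best guess with a spread (sigma) I made up. The point is just that wide uncertainty leaves a non-trivial fraction of the distribution where AMF wins, even when the central estimate favors existential risk reduction.

```python
# Monte Carlo hedge check (illustrative numbers only, not a real risk estimate).
import math
import random

PEOPLE = 8e9
AMF_LIVES_SAVED = 1.0        # per $2000, assumed certain for simplicity
BEST_GUESS = 1e-7 / 100      # "1E-7% per year" expressed as a probability
SIGMA = 2.0                  # made-up spread (natural-log units) of your uncertainty

random.seed(0)
# Lognormal uncertainty: the median of these samples equals the best guess.
samples = [BEST_GUESS * math.exp(random.gauss(0.0, SIGMA)) for _ in range(100_000)]
expected_lives = [PEOPLE * p for p in samples]

frac_amf_wins = sum(lives < AMF_LIVES_SAVED for lives in expected_lives) / len(expected_lives)
print(f"mean expected lives saved via x-risk reduction: {sum(expected_lives) / len(expected_lives):.1f}")
print(f"fraction of draws where AMF saves more: {frac_amf_wins:.0%}")
```

Under these made-up numbers, a meaningful minority of draws favor AMF, which is the fanaticism-consistent rationale for splitting the $2000 rather than going all-in on one option.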
I don’t know if this is exactly what you were looking for, but these seem to me to be some things to think about to perhaps move your intellectual reasoning closer to your gut, meaning you could be intellectually justified in putting some of your effort into following your gut (how much exactly is open to argument, of course).
Regarding how to make working on existential risk more “gut-wrenching,” I tend to think of things in terms of responsibility. If I think I have some ability to help save humanity from extinction or near extinction, and I don’t act on that, and then the world does end, imagining that situation makes me feel like I really dropped the ball on my part of responsibility for the world ending. If I don’t help people avoid dying from malaria, I do still feel a responsibility that I haven’t fully taken up, but that doesn’t hit me as hard as the chance of the world ending, especially if I think I have special skills that might help prevent it. By the way, if I felt like I could make the most difference personally, with my particular skill set and passions, in helping reduce malaria deaths, and other people were much more qualified in the area of existential risk, I’d probably feel more responsibility to apply my talents where I thought they could have the most impact, in that case malaria death reduction.
How to Give Coming AGI’s the Best Chance of Figuring Out Ethics for Us
American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with a June 23, 2024 deadline:
Thanks for the comment and the link to the review paper!
I think most people, including researchers, don’t have a good handle on what self-esteem is, or at least what truly raises or lowers it—I would expect the effect of praise to be weak, but the effect of promoting responsibility for one’s emotions and actions to be strong. The closest to my views on self-esteem that I’ve found so far are those in N. Branden’s “Six Pillars of Self-Esteem”—the six pillars are living consciously, self-acceptance, self-responsibility, self-assertiveness, living purposefully, and personal integrity.
Unfortunately, because many researchers don’t follow this conception of self-esteem, I tend not to trust much research on the real-world effects of self-esteem. Honestly, though, I haven’t done a hard search for any research that uses something close to my conception of self-esteem, and your comment has basically pointed out that I should get on that, so thank you!
The New York Declaration on Animal Consciousness and an article about it:
https://sites.google.com/nyu.edu/nydeclaration/declaration
Sean Sweeney’s Quick takes
Thank you for sharing some of your struggle. I’ve done a fair amount of personal development work in my life, and it greatly helped me get over work-related stress. Perhaps some of the references in this post could help you with your situation.
In particular, I’d recommend Guttormsen’s Udemy course, which you can often get on sale for <$20. I hope that helps some.
Thanks for your post. I’m not exactly part of the EA “community”—I’ve never met someone who was an EA in person, but, at least from people’s online presences, it seems like EA leaders are generally thoughtful, earnest, and open to feedback. I hope your post will be some feedback they’ll consider.
From what I’ve seen of this community so far, I suspect that some EA’s reluctance to support your work could stem from a couple of things:
1. People don’t like to feel “duped”—I’m not saying you’re trying to pull one over on them, but there’s “safety” in just going with what GiveWell or some other EA vetting organization recommends. I know I wouldn’t feel good if I thought my donations were going to support corruption. So maybe think about how you could better establish credibility—perhaps ask some EA’s if this is a factor and what you could do to allay their fears (honestly, this may be tough, since the prevalence of online scams has made many people pretty skeptical of anything they only interact with online, although Remmelt’s comment seems like an example of something that could give one more confidence).
2. People don’t want to be wrong—this is related to #1, but instead of worrying about whether your organization is legit or not, it’s a question of whether it’s the “best” thing for them to support it over other organizations. Again, it probably feels easier to trust that EA vetters know what they’re doing in their analyses of how to do the most good. I like a lot of what I’ve seen of GiveWell, the EA organization I’ve most looked into, but I personally feel they’re missing a couple of big chunks of the puzzle (and the real world is a hard puzzle, in my opinion). One big chunk is something your organization might bring to the table, but which can be difficult to quantify, namely, promoting personal responsibility and, in turn, self-esteem building from taking more responsibility for oneself. I’m not sure what else you could do to help people see that your organization could be “better than its EA numbers” due to this effect, but from my end, I’ll keep trying to convince people in this community (as here) that it’s a real and significant effect, and hopefully it’ll start to catch on at some point (or someone will convince me I’m wrong, which is always a possibility).
Note that these are just my own “outsider” impressions of EA, and I could very well be mistaken, but I hope this comment might be helpful to both you and EA’s in your efforts to do more good.
Thank you for this interesting post, even though I don’t agree with your conclusions.
I believe one key difference between killing someone and letting someone die is its effect on one’s conscience.
If I kill someone, I violate their rights. Even if no one would directly know what I did with the invisible button, I’d know what I did, and that would eat at my conscience, and affect how I’d interact with everyone after that. Suddenly, I’d have less trust in myself to do the right thing (to not do what my conscience strongly tells me not to do), and the world would seem like a less safe place because I’d suspect that others would’ve made the same decision I did, and now might be effectively willing to kill me for a mere $6,000 if they could get away with it.
If I let someone die, I don’t violate their rights, and, especially if I don’t directly experience them dying, there’s just less of a pull on my conscience.
One could argue that our consciences don’t make sense and should be more in line with classic utilitarianism, but I’d argue that we should be extremely careful about making big changes to human consciences in general without thoroughly thinking through and understanding the full range of effects such changes would have.
Also, I don’t think use of the term “moral obligation” is optimal, since to me it implies a form of emotional bullying/blackmail: you’re not a good person unless you satisfy your moral obligations. Instead, I’d focus on people being true to their own consciences. In my mind, it’s a question of trying to use someone’s self-hate to “beat goodness into them” versus trying to inspire their inner goodness to guide them because that’s what’s ultimately best for them.
By “self-hate,” I mean hate of the parts of ourselves that we think are “bad person” parts, but are really just “human nature” parts that we can accept about ourselves without that meaning we have to indulge them.