Ideas of posts I could write in comments. Agreevote with things I should write. Don’t upvote them unless you think I should have karma just for having the idea; instead, upvote the post when I write it :P
Feel encouraged also to comment with prior art in cases where someone’s already written about something. Feel free also to write (your version of) one of these posts, but give me a heads-up to avoid duplication :)
(some comments are upvoted because I wrote this thread before we had agreevotes on every comment; I was previously removing my own upvotes on these but then I learned that your own upvotes don’t affect your karma score)
Edit: This is now The illusion of consensus about EA celebrities
Something to try to dispel the notion that every EA thinker is respected / thought highly of by every EA community member. Like, you tend to hear strong positive feedback, weak positive feedback, and strong negative feedback, but weak negative feedback is kind of awkward and only comes out sometimes.
I would really like this. I’ve been thinking a bunch about whether it would be better if we had slightly more Bridgewater-ish norms on net (I don’t know the actual structure that underlies that and makes it work), where we’re just like: yeah, that person has these strengths, these weaknesses, these things people disagree on, they know it too, it’s not a deep dark secret.
something about the role of emotions in rationality and why the implicit / perceived Forum norm against emotions is unhelpful, or at least not precisely aimed
(there’s a lot of nuance here; I’ll put it in, don’t worry)
edit: I feel like the “notice your confusion” meme is arguably an example of emotional responses providing rational value.
Thinking about this more, I’ve started thinking that
emotions are useful for rationality
the forum should not have a norm against emotional expression
are two separate posts. I’ll probably write it as two posts, but feel free to agree/disagree on this comment to signal that you do/don’t want two posts. (One good reason to want two posts is if you only want to read one of them.)
Take a list of desirable qualities of a non-profit board (either Holden’s or another that was posted recently), compare it against some EA org boards, and review their composition and recent activity.
edit: I hear Nick Beckstead has written about this too
The Optimal Number of Innocent People’s Careers Ruined By False Allegations Is Not Zero
(haha just kidding… unless? 🥺)
Seems like a cheap applause light unless you accompany it with equivalent stories about how the optimal number of almost any bad thing is not zero.
I was surprised to hear anyone claim this was an applause light. My prediction was that many people would hate this idea, and, well, at time of writing the karma score stands at −2. Sure doesn’t seem like I’m getting that much applause :)
I think the optimal number of most bad things is zero, and it’s only not zero when there’s a tradeoff at play. I think most people will agree in the abstract that there’s a tradeoff between stopping bad actors and sometimes punishing the innocent, but they may not concretely be willing to accept some particular costs in the kind of abusive situations we’re faced with at the moment. So, were I to write a post about this, it would be trying to encourage people to more seriously engage with flawed systems of abuse prevention, to judge how their flaws compare to the flaws in doing nothing.
I post about the idea here partly to get a sense of whether this unwillingness to compromise rings true for anyone else as a problem we might have in these discussions. So far, it hasn’t got a lot of traction, but maybe I’ll come back to it if I see more compelling examples in the wild.
I am confused by the parenthetical.
Assuming both false positives and false negatives exist at meaningful rates, and the false-positive rate cannot be zeroed while keeping an acceptable false-negative rate, this seems obviously true (at least to me) and only worthy of a full post if you’re willing to ponder what the balance should be.
ETA: An edgy but theoretically interesting argument is that we should compensate the probably-guilty for the risk of error. E.g., if you are 70 percent confident the person did it, boot them but compensate them 30 percent of the damages that would be fair if they were innocent. The theory would be that a person may be expected to individually bear a brutal cost (career ruin despite innocence), but the benefit (of not allowing people who are 70 percent likely to be guilty to keep running around in power) accrues to the community from which the person has been booted. So compensation for the risk that the person is innocent would transfer some of the cost of providing that benefit to the community. I’m not endorsing that as a policy proposal, mind you...
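A minimal sketch of that arithmetic (the function name and numbers are hypothetical, purely for illustration):

```python
def error_risk_compensation(p_guilty: float, fair_damages_if_innocent: float) -> float:
    """Scale the damages that would be fair if the person were innocent
    by the probability that they are in fact innocent."""
    return (1 - p_guilty) * fair_damages_if_innocent

# 70% confident they did it; career ruin worth 100k if they were innocent:
print(round(error_risk_compensation(0.7, 100_000)))  # 30000
```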
I think “human-level” is often a misleading benchmark for AI, because we already have AIs that are massively superhuman in some respects and substantially subhuman in others. I sometimes worry that this is leading people to make unwarranted assumptions about how closely future dangerous AIs will track humans in terms of what they’re capable of. This is related to a different post I’m writing, but maybe deserves its own separate treatment too.
A problem with a lot of AI thoughts I have is that I’m not really in enough contact with the AI “mainstream” to know what’s obvious to them or what’s novel. Maybe “serious” AI people already don’t say human-level, or apply a generous helping of “you know what I mean” when they do?
Google Doc draft: Stop focusing on “human-level” AI
I’ll ask specific people to comment and aim to publish in the next couple of weeks, but I’m happy for any passers-by to offer their thoughts too.
This became When “human-level” is the wrong threshold for AI
I have an intuition that the baseline level of institutional dysfunction is quite high, and I think I am significantly less bothered by negative news about orgs than many people because I already expect the average organisation (from my experience both inside and outside EA) to have a few internal secrets that seem “shockingly bad” to a naive outsider. This seems tricky to communicate / write about because my sense of what’s bad enough to be worthy of action even relative to this baseline is not very explicit, but maybe something useful could be said.
Things I’ve learned about good mistake culture, no-blame post-mortems, etc. This is pretty standard stuff without a strong EA tilt so I’m not sure it merits a place on the forum, but it’s possible I overestimate how widely known it is, and I think it’s important in basically any org culture.
Disclosure-based regulation (in the SEC style) as a tool either for internal community application or perhaps in AI or biosecurity
Something contra “excited altruism”: lots of our altruistic opportunities exist because the world sucks and it’s ok to feel sad about that and/or let down by people who have failed to address it.
edit: relevant prior work:
https://forum.effectivealtruism.org/posts/Nk5nJYPYYheQsZ6zn/impossible-ea-emotions
https://forum.effectivealtruism.org/posts/bkjNa2WAZvqahqpoH/it-s-supposed-to-feel-like-this-8-emotional-challenges-of
Encouraging people to take community health interventions into their own hands. Like, ask what you wish someone in community health would do, and then consider just doing it. With some caveats for unilateralist curse risks.
I think the forum would be better if people didn’t get hit so hard by negative feedback, or by people not liking what they have to say. I don’t know how to fix this with a post, but at least arguing the case might have some value.
I think the forum would be even better if people were much kinder and more empathic when giving negative feedback. (I think we used to be better at this?) I find it very difficult not to get hit hard by negative feedback that’s delivered in a way that makes it clear the giver is angry with me as a person; I find it relatively easy to not get upset when I feel like they’re not being adversarial. I also find it much easier to learn how to communicate negative feedback in a more considerate way than to learn how to not take things personally. I suspect both of these things are pretty common, and so arguing the case for being nicer to each other is more tractable?
very sad that this got downvoted 😭
(jk)
Assessments of non-AI x-risk are relevant to AI safety discussions because some of the hesitance to pause or slow AI progress is driven by a belief that it will help eliminate other threats if it goes well.
I tend to believe that risk from non-AI sources is pretty low, and I’m therefore somewhat alarmed when I see people suggest or state relatively high probabilities of civilisational collapse without AI intervention. Could be worth trying to assess how widespread this view is and trying to argue directly against it.
This one might be for LW or the AF instead / as well, but I’d like to write a post about:
should we try to avoid some / all alignment research casually making it into the training sets for frontier AI models?
if so, what are the means that we can use to do this? how do they fare on the ratio between reduction in AI access vs. reduction in human access?
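(For what it’s worth, one of the simpler means here is robots.txt directives aimed at known AI crawlers. A hedged sketch: these are the crawler / control tokens I believe the major scrapers currently document, but check their docs, and note that robots.txt is purely advisory.)

```
# Advisory request that known AI-training crawlers / control tokens skip this site
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```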
I made this into two posts, my first LessWrong posts:
Keeping content out of LLM training datasets
Should we exclude alignment research from LLM training datasets?
my other quick take, AI Safety Needs To Get Serious About Chinese Political Culture, is basically a post idea, but it was substantial enough that I put it at the top level rather than have it languish in the comments here. Nevertheless, here it is so I can keep all the things in one place.
“ask not what you can do for EA, but what EA can do for you”
like, you don’t support EA causes or orgs because they want you to and you’re acquiescing; you support them because you want to help people and you believe supporting the org will do that. When you work an EA job, instead of thinking “I am helping them have an impact”, think “they are helping me have an impact”.
of course there is some nuance in this but I think broadly this perspective is the more neglected one
I have a Google Sheet set up that daily records the number of unread emails in my inbox. Might be a cute shortform post.
Some criticism of the desire to be the donor of last resort, and some skepticism of the standard counterfactual-validity concerns.
I think that this already did a decent job; not sure there’s more to say.
If everyone has no idea what other people are funding and instead just donates a scaled down version of their ideal community-wide allocation to everything, what you get is a wealth-weighted average of everyone’s ideal portfolios. Sometimes this is an okay outcome. There’s some interesting dynamics to write about here, but equally I’m not sure it leads to anything actionable.
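A quick sketch of why that’s a wealth-weighted average (the notation is mine: donor i has wealth w_i and would ideally allocate a fraction a_{ij} of community-wide funds to cause j):

```latex
\[
  \mathrm{Funding}_j \;=\; \sum_i w_i \, a_{ij}
  \;=\; W \sum_i \frac{w_i}{W} \, a_{ij},
  \qquad W = \sum_i w_i ,
\]
```

i.e. each cause ends up with total wealth W times the wealth-weighted average of the donors’ ideal fractions for it.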
I’d like to write something about my skepticism of for-profit models of doing alignment research. I think this is a significant part of why I trust Redwood more than Anthropic or Conjecture.
(This could apply to non-alignment fields as well, but I’m less worried about the downsides of product-focused approaches to (say) animal welfare.)
That said, I would want to search for existing discussion of this before I wade into it.
Something about the value of rumours and the whisper network
A related but distinct point is that the disvalue of anonymous rumours is in part a product of how people react to them. Making unfounded accusations is only harmful to the extent that people believe them uncritically. There’s always some tension there, but we do IMO collectively have some responsibility to react to rumours responsibly, as well as to post them responsibly.
I’d love it if it could include something on the disvalue of rumours too? (My inside view is that I’d like to see a lot less gossip, rumours etc in EA. I may be biased by substantial personal costs that I and friends have experienced from false rumours, but I also think that people positively enjoy gossip and exaggerating gossip for a better story and so we generally want to be pushing back on that usually net-harmful incentive.)
I have a doc written on this that I wanted to make a forum post out of but haven’t gotten to; happy to share.
I quite enjoy that this document will be shared in private. Great meta comment.