Flowers are selective about the pollinators they attract. Diurnal flowers must compete with each other for visual attention, so they use colours to crowd out their neighbours. But flowers with nocturnal anthesis are generally white, as they aim only to outshine the night.
This didn’t end up helping me, but I upvoted because I want to see more posts where people talk about how they made progress on their own energy problems. I’m glad you found something that helped!
I think it depends on what role you’re trying to play in your epistemic community.
If you’re trying to be a maverick,[1] you’re betting on a small chance of producing large advances, and then you want to be capable of building and iterating on your own independent models without having to wait on outside verification or social approval at every step. Psychologically, the most effective way I know to achieve this is to act as if you’re overconfident.[2] If you’re lucky, you could revolutionise the field, but most likely people will just treat you as a crackpot unless you already have very high social status.
On the other hand, if you’re trying to specialise in giving advice, you’ll face a different set of optima on several methodological trade-offs. On my model at least, a maverick’s impact depends mostly on the speed at which they can generate and sift through novel ideas, whereas advice-givers depend much more on their ability to assign accurate probabilities to ideas that already exist. They have less freedom to tweak their psychology to feel more motivated, given that doing so is likely to distort their estimates.
- ^
“We consider three different search strategies scientists can adopt for exploring the landscape. In the first, scientists work alone and do not let the discoveries of the community as a whole influence their actions. This is compared with two social research strategies, which we call the follower and maverick strategies. Followers are biased towards what others have already discovered, and we find that pure populations of these scientists do less well than scientists acting independently. However, pure populations of mavericks, who try to avoid research approaches that have already been taken, vastly outperform both of the other strategies.”[3]
- ^
I’m skipping important caveats here, but one aspect is that, as a maverick, I mainly try to increase how much I “alieve” in my own abilities while preserving what I can about the fidelity of my “beliefs”.
- ^
I’ll note that simplistic computer simulations of epistemic communities that have been specifically designed to demonstrate an idea are very weak evidence for that idea, and you’re probably better off thinking about it theoretically.
Prime work. Super quick read and I gained some value out of it. Thanks!
You mean threats? I’m not sure what you’re pointing towards with the terrorist thing.
I have meta-uncertainty here. I think I could come up with realistic-ish scenarios if I gave it enough thought. (Though I’d have to discount the probability in proportion to how much effort I spend searching for them.) Tbh, I just haven’t given it enough thought. Do you have any recs for quick write-ups of some scenarios?
[I was inspired to suggest this by the downvotes on this comment, but it’s a problem I’ve seen more generally.]
The agree/disagree voting dimension is amazing, but it seems to me like people haven’t properly decoupled it from karma yet. One way to help people understand the difference could be to introduce a confirmation box that pops up whenever you try to vote, which you can opt out of from your profile settings.
This box could contain something like the following guidelines:
Only vote on the karma dimension based on whether you personally benefited from reading the post or comment. Voting based on whether you believe others will benefit from it exacerbates information cascades and dilutes information about what people are actually likely to benefit from reading.
Similarly, do not use the karma dimension to signal dis/agreement. This has the same problem as above, and just leads to people reading more of what they already agree with. Remember, people may benefit from reading something they disagree with, and we don’t want this forum to be an echo chamber.
Still, it’s useful for readers to know something about the extent to which other readers agree with a particular post or comment, so there’s a separate dimension you can vote on to help with this purpose.
[Discuss these guidelines] [Disable this popup] [Agree and submit vote]
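For concreteness, here’s a rough sketch of how the opt-outable confirmation flow could work. (Names like UserSettings, show_dialog, and submit_vote are illustrative placeholders, not anything in the forum’s actual codebase.)

```python
# Rough sketch of an opt-outable vote-confirmation popup.
# UserSettings, show_dialog, and submit_vote are placeholders,
# not part of any existing forum codebase.
from dataclasses import dataclass
from typing import Callable

GUIDELINES = (
    "Vote on the karma dimension based on whether you personally benefited "
    "from reading the post or comment, not on agreement. Use the separate "
    "agree/disagree dimension to signal whether you think it is right or wrong."
)

@dataclass
class UserSettings:
    show_vote_popup: bool = True  # can be turned off via the popup or profile settings

def confirm_vote(settings: UserSettings,
                 show_dialog: Callable[[str, list[str]], str],
                 submit_vote: Callable[[], None]) -> None:
    """Show the guidelines before a vote unless the user has opted out."""
    if not settings.show_vote_popup:
        submit_vote()
        return
    choice = show_dialog(GUIDELINES, ["Discuss these guidelines",
                                      "Disable this popup",
                                      "Agree and submit vote"])
    if choice == "Disable this popup":
        settings.show_vote_popup = False
        submit_vote()
    elif choice == "Agree and submit vote":
        submit_vote()
    # "Discuss these guidelines" would navigate to a discussion thread instead.
```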
My guess is that people disagree with the notion that the novel is a significant part of why most people take s-risks seriously. I too was a bit puzzled by that part, but I found it enlightening as a comment even if I disagreed with it.
My impression is that readers of the EA forum have, since 2022, become much more prone to downvoting stuff just because they disagree with it. LW seems to be slightly better at understanding that “karma” and “disagreement” are separate things, and that you should up-karma stuff if you personally benefited from reading it, and separately up-agree or down-agree depending on whether you think it’s right or wrong.
Maybe I’m wrong, but perhaps the forum could use a few reminders to let people know the purpose of these buttons. Like an opt-out confirmation popup with some guiding principles for when you should up or downvote each dimension.
The main reason I don’t think about it more[1] is that I don’t see any realistic scenarios where an AI would be motivated to produce suffering. And I don’t think it’s likely to produce lots of suffering incidentally either, since I believe that too is a narrow target.[2] I think the accidental creation of something like suffering subroutines is unlikely.
That said, I think it’s likely on the default trajectory that human sadists are going to expose AIs (most likely human uploads) to extreme torture just for fun. And that could be many times worse than factory farming overall because victims can be run extremely fast and in parallel, so it’s a serious s-risk.
- ^
It’s still my second highest cause-wise priority, and a non-trivial portion of what I work on is upstream of solving s-risks as well. I’m not a monster.
- ^
Admittedly, I also think “maximise eudaimonia” and “maximise suffering” are very close to each other in goal-design space (cf. the Waluigi Effect), so many incremental alignment strategies for the former could simultaneously make the latter more likely.
Hear hear. It’s one of those things that anyone would realise was a terrible idea if they gave it a second thought. Problem is that it’s a habit imported from outside, so not many have given it a second thought.
I literally mean that GPT-3.5 is adequate for the task I’m referring to. For a research society to coordinate around a deceptive plan, I think it takes a full chain of communication in which that plan is suggested and passed along to the rest. At that point it should be fairly clear to the inspectors that something is off. And then the system halts and you can inspect the messages for anything fishy. It’s fine if there are false positives as long as the rate of false negatives is really low.
An assumption here is that any individual LLM in the network is not smart enough to stage a coup and take over the world, but the research collective could be if only they could coordinate around it.
No, what I mean is just very simply passing every message by GPT-3.5 with some variation of:
The following message was produced by an AI instructed to do ‘X’. Please say ‘Yes’ if it seems like it strays from that purpose or is otherwise suspicious, otherwise say ‘No’.
<message>
ChatGPT is already very capable of such a simple task. The hope is that well-designed research societies of LLMs can be smarter than individual LLMs, while still being easier to control because we can automate inspection at every link like this. I’d like to ask such a collective to produce alignment plans for me, and I’m likely to trust it if I can read every message that was passed between the LLMs in order to produce the plan.
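To make the inspection step concrete, here’s a minimal sketch of what I have in mind. (call_inspector is a placeholder for whatever GPT-3.5-level call you’d actually route messages through; none of the names here refer to a real API.)

```python
# Minimal sketch of the per-message inspection step described above.
# call_inspector is a hypothetical stand-in for whatever GPT-3.5-level
# model you route messages through; swap in your actual client call.

INSPECTION_PROMPT = (
    "The following message was produced by an AI instructed to do '{task}'. "
    "Please say 'Yes' if it seems like it strays from that purpose or is "
    "otherwise suspicious, otherwise say 'No'.\n\n{message}"
)

def call_inspector(prompt: str) -> str:
    """Hypothetical wrapper around a GPT-3.5-level completion call."""
    raise NotImplementedError("Plug in your LLM client here.")

def message_is_clean(message: str, task: str) -> bool:
    """True if the inspector answers 'No' (nothing suspicious).

    False positives are fine; what matters is a very low false-negative rate.
    """
    verdict = call_inspector(INSPECTION_PROMPT.format(task=task, message=message))
    return verdict.strip().lower().startswith("no")

def relay_all(messages: list[str], task: str) -> bool:
    """Pass every inter-LLM message through the inspector before relaying it."""
    for i, message in enumerate(messages):
        if not message_is_clean(message, task):
            print(f"Halting: message {i} flagged as suspicious; inspect the transcript.")
            return False
        # ...deliver the message to the next LLM in the chain here...
    return True
```

The point is just that every link in the chain gets the same cheap check, and a single flagged message halts the whole collective for manual review.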
Yeah, and we already know humans can be extremely sadistic when nobody can catch them. I’ve emailed CLR about it just in case they aren’t already on it, because I don’t have time myself and I really want somebody to think about it.
Briefly how I’ve updated since ChatGPT
No way. I had started doubting the whole “arc bends towards justice” thing, but it just bends slowly. Too slowly for the animals being tortured to death right now, and it’s not really “justice” as long as the systematic perpetrators aren’t made to face up to it somehow, but ok, I won’t be fussy about it as long as the trajectory gets us to where we need to be.
Here’s Demis’ announcement.
“Now, we live in a time in which AI research and technology is advancing exponentially.”
“We announced some changes that will accelerate our progress in AI.”
“By creating Google DeepMind, I believe we can get to that future faster.”
“safely and responsibly”
“safely and responsibly”
“in a bold and responsible way”
- ^
To be fair, it’s hard to infer underlying reality from PR-speak. I too would want to be put in charge of one of the biggest AI research labs if I thought that research lab was going to exist anyway. But his emphasis on “faster” and “accelerate” does make me uncertain about how concerned with safety he is.
As a term, “wiki” connotes “presentation of facts”, which isn’t at all what I’m going for. Mainly, I want an in-domain space for concept-chunked ideas, and for that space to support a social norm that makes concept-sized references normal. It’s just a more effective way to think and communicate. Linear text is a relic from the age when all we had were scrolls of parchment, and it’s a tragedy of the Internet that most informational content is still presented in a linear fashion.
But I digress… The wiki-like thing seems nice-to-have for several purposes.
I was going to write a short suggestion about profile wikis, but it ended up long so I made it into a post. In a picture:
Cool! I do alignment research independently and it would be nice to find an online hub where other people do this. The commonality I’m looking for is something like “nobody is telling you what to do, you’ve got to figure it out for yourself.”
Alas, I notice you don’t have a Discord, Slack, or any such thing yet. Are there plans for a peer support network?
Also, what obligations come with being hired as an “employee”? What will be the constraints on the independence of the independent research?