Non-EA interests include chess and TikTok (@benthamite). We are probably hiring: https://metr.org/hiring
Ben_West🔸
Thanks for the question David! I believe the methodology sections of the paper help answer this, particularly: section 4 goes into more detail on what the horizon means and section 8.1 discusses some limitations of this approach.
So the claim is:
The 50% trend will break down at some length of task
The 80% trend will therefore break at
And maybe is large enough to cause some catastrophic risk, but isn't
?
Figure 4 averages across all models. I think Figure 6 is more illuminating:
Basically, the 80% threshold is ~2 doublings behind the 50% threshold, or ~1 year. An extra year isn't nothing! But you're still not getting to 10+ year timelines.
METR: Measuring AI Ability to Complete Long Tasks
Thanks for writing this up! I really like when people do concrete empirical surveys like this; it's helpful to get a sense of how widely current tools are actually being used.
I'm curious if you have thoughts about what automation would actually speed you up? It sounds like maybe something like "current LLMs but without hallucination"?
Also, do you have a sense for how much investment has been made into AI tools in CEST? My impression is that DeepMind really loves getting into Nature/Science but has very little interest in actually commercializing these tools, so it feels not that surprising to me that the thing which got into Science didn't actually get used.[1] It would update me if they tried very hard to commercialize it but failed.
1. ^ I agree that this doesn't speak well of the editorial process though
This was a great post, thanks for writing it up
It feels appropriate that this post has a lot of hearts and simultaneously disagree reacts. We will miss you, even (perhaps especially) those of us who often disagreed with you.
I would love to reflect with you on the other side of the singularity. If we make it through alive, I think there's a decent chance that it will be in part thanks to your work.
I was excited that they did this and thought it was well produced. The focus on cost cutting feels like a double-edged sword: it absolves viewers of responsibility, which makes them more open to the message but also less likely to do anything. I scrolled through the first couple pages of comments and saw a bunch of "corporations are greedy" complaints but couldn't find anyone suggesting a concrete behavioral change (for themselves or others).
I wonder if there's an adjacent version of this which keeps the viewer absolved of responsibility but still has a call to action. Plausible ideas:
Race to the top: e.g. specifically call out the worst corporate offender in the video
Political stuff, e.g. push for EU Commission to keep their cage banning promise
Maybe YouTube rules about politics prevent them from saying this, not sure
In any case, kudos to the Kurzgesagt team for making a video on this which (as of this writing) has 2M+ views!
If you can get a better score than our human subjects did on any of METR's RE-Bench evals, send it to me and we will fly you out for an onsite interview.
Caveats:
you're employable (we can sponsor visas from most but not all countries)
you use the same hardware
honor system that you didn't take more time than our human subjects (8 hours). If you took longer, still send it to me; we will probably still be interested in talking
(Crossposted from Twitter.)
Wow thatâs great. Congrats to you and all the organizers!
I appreciate you being willing to share your candid reasons publicly, Jesse. Best of luck with your future plans, and best of luck to Tristan and Mia!
EA Awards
I feel worried that the ratio of the amount of criticism that one gets for doing EA stuff to the amount of positive feedback one gets is too high
Awards are a standard way to counteract this
I would like to explore having some sort of awards thing
I currently feel most excited about something like: a small group of people solicits nominations and then chooses a short list of people to be voted on by Forum members, and the winners are presented at a session at EAG BA
I would appreciate feedback on:
whether people think this is a good idea
how to frame this: I want to avoid being seen as speaking on behalf of all EAs
Also, if anyone wants to volunteer to co-organize with me, I would appreciate hearing that
- Jan 19, 2025, 10:26 PM; 7 points: comment on What are we doing about the EA Forum? (Jan 2025)
It looks like she did a giving season fundraiser for Helen Keller International, which she credits to the EA class she took. Maybe we will see her at a future EAG!
Gave ~50% of my income to my DAF. I will probably disburse it mostly to AI Safety things which make sense on ≤5 year AGI timelines.
Adult film star Abella Danger apparently took a class on EA at the University of Miami, became convinced, and posted about EA to raise $10k for One for the World. She was PornHub's most popular female performer in 2023 and has ~10M followers on Instagram. Her post has ~15k likes, and the comments seem mostly positive.
I think this might be the class that @Richard Y Chappell🔸 teaches?
Thanks Abella and kudos to whoever introduced her to EA!
Thank you for sharing your donation choices!
This is great, thanks for doing this survey!
Kudos for making this post! I think it's hard to notice when money would best be spent elsewhere, particularly when you do actually have a use for it, and I appreciate you being willing to share this.
Fair enough! Fwiw, I would not have guessed that most PauseAI supporters have a p(doom) of 90%+. My guess is that the crux between you is actually that they believe it's worth pushing for a policy even if you think it's possible you will change your mind about it in the future. (But people should correct me if I'm wrong!)
Fair enough! My guess is that when the trend breaks, it will be because things have gone super-exponential rather than sub-exponential (some discussion here), but yeah, I agree that this could happen!