Non-EA interests include chess and TikTok (@benthamite). Formerly @ CEA, METR + a couple now-acquired startups.
Ben_West🔸
I also see this, to a lesser extent, in animal sentience arguments
+1, "it would be very easy for me to ignore the possibility that nematodes might be conscious" is a major impediment to thinking clearly about animal sentience (including for me).
I don't disagree, it's more that this feels a bit like privileging the hypothesis? I think the modal reason I've heard from people who did capabilities work and now regret it is something like "I knew I was misaligned with leadership but I thought leaving would be even worse."
If, for some reason, Anthropic asked me how to prevent people from regretting working for them, I would focus much more on "have a thing for people to do once they realize their colleague is corrupt" instead of "have a more nuanced way of telling if their colleague is corrupt."
Downvoted; I think this comment was unnecessarily rude.
Thanks! I only know a handful of people in this category, but for what it's worth, it again feels like people who were predisposed to thinking that working on pretraining would be okay rather than them being "corrupted."
E.g., I recently talked to someone who told me that their main takeaway from a safety fellowship was realizing that they didn't fit in because they actually weren't worried about existential risk in the same way that the other attendees were.
People seem surprised and bewildered when AI folks defect away from AI safety towards capabilities. People trust that as AI companies grow, those gaining power and money from shares will not be adversely influenced by that power and money.
fwiw I don't actually know many examples of this, and the ones I hear cited often seem uncompelling to me. E.g.:
- Greg Brockman doesn't seem like a true believer in OpenAI's nonprofit mission who got corrupted, but rather someone who went into it wanting to make a profit.
- Mechanize's founders don't seem like EAs who got corrupted by AI money, but rather EAs with unusual moral and empirical views which result in them thinking that the best course of action is the exact opposite of what most EAs think.
(Counterexamples appreciated, though!)
And credit to the AI skeptics that they seem to mostly have updated in light of the new evidence (or at least claimed that they never actually believed in long timelines, which is maybe less noble, but ends up in the same place).
Yeah, I agree that if you only have one bit of detail that you can store, then saying it is "hard" rather than "easy" is probably the correct bit. However, I would suggest that for something as important as your career you should investigate in substantially more detail. If you do so, I expect you will come up with a range of needed skills/attributes for these jobs, some of which you might find easy, others of which you might find hard.
I no longer work at METR. I would guess that they'd be excited about applicants who have done this, but don't want to speak for them.
Many people said they wanted to work for METR. I made what I thought was a good offer: take one of the benchmarks we give AIs; if you get a good score, then I guarantee that I will fly you out for an interview, even if you have no work history, no money to pay for the trip, or any other barrier one might have to employment.
Exactly zero people took me up on this.[1]
How is it possible for there to be sky-high rejection rates yet also zero people sending me applications?
I think the answer is that raw rejection rates aren't a very useful metric. After all, an 80% rejection rate means that the AI safety jobs are 1/10th as selective as Walmart!
I would suggest ignoring raw rejection rates in favor of just looking at the criteria for the jobs you want. Particularly for something like s-risks, the criteria are going to be unusual and specific, meaning that even generically qualified people will often have to dedicate substantial time to skilling up, but if you're able to do so, then your odds are pretty good.[2]
1. ^ I wouldn't be surprised to learn that some people tried this, failed, and then were too embarrassed about failing to tell me. But, to the best of my recollection, literally zero people have told me that they even attempted this task.
2. ^ I say this even with the knowledge that you are 19. I don't want to pretend that the deck isn't stacked against younger people (it totally is), but we employ some 19-year-olds, as do other AI safety orgs. If a 19-year-old had sent me a good solution to that METR challenge, for example, I would have been happy to hire them.
Cool! Impressive numbers.
Table 1 shows the techniques used; the teams which were allowed to use SAEs (an interpretability technique) used them; the one which was prohibited from using them searched the data.
Also note that "training data" does not mean "instructions". Section 3 describes their training process.
I see, thanks! I'm not sure exactly what you'd consider as evidence here, but e.g. here are citation counts on papers from the past year vs. AI Lab Watch safety rating[1]
1. ^ Raw data. Note that Anthropic doesn't use arXiv, which affects their citation counts. This is just coming from a dumb search of Semantic Scholar; I expect a lot of disagreement could be had over the exact criteria for considering something "interpretability", but I expect the Ant/GDM > OAI >> * ordering to be true for almost any definition.
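For anyone who wants to poke at this themselves, below is a rough sketch of the kind of "dumb search" I mean, using the public Semantic Scholar Graph API. The query strings and lab names here are illustrative guesses, not the exact criteria behind the numbers above.

```python
# Sketch of a crude Semantic Scholar search for recent "interpretability" papers,
# summing citation counts per lab. Illustrative only: keyword matching on the lab
# name is a rough proxy, not real affiliation data.
import requests

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
LABS = ["Anthropic", "Google DeepMind", "OpenAI"]  # assumed lab names


def interpretability_citations(lab: str, year: str = "2024") -> int:
    """Sum citation counts of papers matching 'interpretability' plus a lab name."""
    resp = requests.get(
        SEARCH_URL,
        params={
            "query": f"interpretability {lab}",  # crude keyword match
            "year": year,
            "fields": "title,citationCount",
            "limit": 100,  # only the first page of results, for simplicity
        },
        timeout=30,
    )
    resp.raise_for_status()
    papers = resp.json().get("data", []) or []
    return sum(p.get("citationCount", 0) for p in papers)


if __name__ == "__main__":
    for lab in LABS:
        print(f"{lab}: {interpretability_citations(lab)} citations")
```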
I suspect that I'm still misunderstanding you, but: e.g. interpretability tools are empirically able to identify misalignment, which feels like a (somewhat simple) example of the thing we want. Neel Nanda's 80k podcast goes over the state of the field; tldr is roughly that there are pretty meaningful advances but also he's skeptical that it will be a silver bullet.
I agree with Ben Stewart that there's a galaxy-brain argument that these positive impacts are outweighed by accelerating progress, but it seems hard to argue that things like interpretability aren't making progress on their own terms.
Wiblin does not explain where his estimate of "hundreds of billions of dollars" of revenue comes from, but it reads to me like pure marketing for potential investors
You quote him as observing that their revenue tripled over the past 3 months, and some basic math tells us that another ~tripling gets them to $100B.
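Spelling out that basic math with the run-rate figures he gives below (my arithmetic, not a model he states explicitly):

$$\$9\text{B} \times 3.3 \approx \$30\text{B}, \qquad \$30\text{B} \times 3.3 \approx \$100\text{B}$$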
I'm in favor of rigor and would also have preferred him to share a more detailed model, but "pure marketing for potential investors" seems like an unfair characterization of a "predict trends will continue unchanged" forecast.
Edit: I've listened to the podcast and now think your framing is unfair to the point of being misleading. He says:
> And also keep in mind that on Monday (the day before Anthropic published all of this) we learned that their annualised revenue run rate had grown from $9 billion at the end of December to $30 billion just three months later. That's 3.3x growth in a single quarter, perhaps the fastest revenue growth rate for a company of that size ever recorded.
>
> That exploding revenue is a pretty good proxy for how much more useful the previous release, Opus 4.6, has become for real-world tasks. If the past relationship between capability measures and usefulness continues to hold, the economic impact of Mythos once it becomes available is going to dwarf everything that came before it, which is part of why Anthropic's decision not to release it is a serious one, and actually quite a costly one for them.
>
> They're sitting on something that would likely push their revenue run rate into the hundreds of billions, but they've decided it's simply not worth the risk.
He seems, to me, to be very straightforwardly explaining where his estimate comes from?
Hmm, but in a "success without dignity" world, making interpretability a bit better, or governments a bit more interested, is relevant, right?
Maybe, but "if EA had just stuck to Earning To Give and malaria nets and decaging chickens then the impact would have been greater" doesn't clearly follow. Malaria nets look a lot worse if we all die in a few years from AI anyway, and cage-free pledges have ~0 value if humanity ends before the pledge can be fulfilled.
Are you asking just about recent graduates, or all graduates?
Your conflict of interest here feels enormous (even if declared), and it's hard to read this and not feel like it might be a bid to directly protect your own interests by asking others to not step into your turf here as a lobbyist.
I think you could also read it as him attempting to solve the problem he's describing.
I would be keen to hear if you think you have any solutions to this bifurcation.
Huh, this feels like prime EA territory to me. We need disagreement so that people can engage in key EA activities like "making persnickety critiques of footnote #237 on someone's 10k-word forum post."
The case for EA feels much weaker to me if we are all confident that X is the best thing to do; then you should just do X and not worry about cause prio etc.
I think the AI ethics crowd is the subject of attacks (though arguably this is because they tried to seek power and influence).