Larks

Karma: 15,958

Larks Jan 15, 2025, 3:29 AM
26 points
5 ∶ 4
in reply to: Ryan Greenblatt’s comment on: Are AI safetyists crying wolf?
Oh wow, I actually think your grandparent comment here was way more misleading than their tweet was! It sounds like they almost verbatim quoted you. Yes, they took out that you set up the experiment… but of course? If I write “John attempted to kill Sally when he was drunk and angry”, and you summarise it as “John attempted to kill Sally, he’s dangerous, be careful!” this is a totally fair summarisation. Yes, it cuts context, but that is always the case: any short summarisation does this.
In contrast, unlike your comment, they never said ‘escape into the wild’. When I read your comment I assumed they had said this.
Also, the tweet directly quotes your tweet, so users can easily look at the original source. In contrast, your comment here doesn’t link to their tweet; before you linked to it I assumed they had done something significantly worse.

Larks Jan 14, 2025, 4:19 AM
7 points
2 ∶ 0
in reply to: Ryan Greenblatt’s comment on: Are AI safetyists crying wolf?
in response to our recent paper “Alignment Faking in Large Language Models”, they posted a tweet which implied that we caught the model trying to escape in the wild. I tried to correct possible misunderstandings here.
Probably would be easier for people to evaluate this if you included a link?

Larks Jan 14, 2025, 3:33 AM
2 points
0 ∶ 0
in reply to: ArthurF’s comment on: Rolling Thresholds for AGI Scaling Regulation
Thanks for the comment! You’re right that this approach would need modification if ‘dangers that only become apparent after mass deployment’ becomes a major risk factor, and that a ‘trial’ commercialisation period could be a good response. My hope is that the regulatory exam period would be able to catch much more than at present though—the regulator would have ample time to design and deploy more sophisticated tests, with the aid of labs who would presumably love to submit a test their competitor would fail (so long as they themselves pass).

Larks Jan 13, 2025, 6:58 PM
3 points
0 ∶ 0
in reply to: James Herbert’s comment on: James Herbert’s Shortform
If EA was a broad and decentralised movement, similar to e.g., environmentalism, I’d classify SMA as an EA project. But right now EA isn’t quite that. Personally, I hope we one day get there.
This seems pretty circular to me?

Larks Jan 12, 2025, 2:24 AM
4 points
0 ∶ 0
in reply to: Jess_Riedel’s comment on: Rolling Thresholds for AGI Scaling Regulation
Interesting suggestion! Continuous or pseudo-continuous threshold raising isn’t something I considered. Here are some quick thoughts:
- Continuous scaling could make eval validity easier, because the jump between eval-train (n-1) and eval-deploy (n) is smaller.
- Continuous scaling encourages training to be done quickly, because you want to get your model launched before it is outdated.
- Continuous scaling means you give up on the idea of models being evaluated side-by-side.

Rolling Thresholds for AGI Scaling Regulation

Larks Jan 12, 2025, 1:30 AM

60 points

4 comments, 6 min read

Larks Jan 11, 2025, 8:03 PM
8 points
1 ∶ 0
on: Trump’s tariffs will probably increase poverty abroad
They rightly note that protectionism constitutes a sales tax which falls hardest on low-income Americans
A bit of a nitpick but no they don’t? They argue it is similar in many ways to a consumption tax, but consumption taxes are not the same as sales taxes. Sales taxes have unique difficulties around compliance which other types of consumption taxes like VAT do not have. Sales taxes are an unusually hard type of tax to enforce (because shops will increasingly under-report sales) leading to distortions in favour of less compliant businesses, but tariffs are unusually easy to enforce because the government controls the ports and airports. My recollection is economists generally think well-designed consumption taxes, like VAT, are unusually good taxes. The problem is that neither sales taxes nor tariffs are particularly good examples of consumption taxes.

Larks Jan 11, 2025, 7:55 PM
2 points
2 ∶ 0
in reply to: Eevee🔹’s comment on: Sample donor screening guidelines (by the Creative Commons org)
Using a very simple Ctrl-F methodology, I estimated that over 95% of this post is the CC Guidelines. In contrast, you spend less than one sentence arguing for having such a policy, and zero words whatsoever considering tradeoffs. If you want engagement on something you need to provide some material to engage with, and if you thought the CC Guidelines were inappropriate for EA orgs and required significant modification you should have said so in the post! I don’t think you can share a lengthy post and then declare, post hoc, that almost the entire post is off-limits, running the risk that orgs might copy-paste it without understanding the issues, and that comments should be restricted to a topic which was barely mentioned in the post.
Indeed, as far as I can see even you agree with this, since one paragraph after chastising me for getting “bogged down in the specifics of the document” you ask for advice on how to adapt these policies, which necessarily involves engagement with the specifics.

Larks Jan 11, 2025, 7:39 PM
4 points
0 ∶ 0
on: Launching Screwworm-Free Future – Funding and Support Request
Thanks for working on this, seems potentially very valuable, good initiative!

Larks Jan 10, 2025, 5:37 PM
0 points
0 ∶ 1
in reply to: Eevee🔹’s comment on: Sample donor screening guidelines (by the Creative Commons org)
by posting it I don’t necessarily endorse its exact contents.
Well you did suggest people could “emulate” and “copy-paste”
Even if taken literally, this would only apply to Palantir employees who work directly on weapons systems or any software that is being used to commit violations of international law (such as war crimes).
No, any employee who is ‘involved’ in arms, including totally unobjectionable sales to the US Government, would be involved. The language is quite clear that, while it includes illegal weapons, there is nothing to suggest it is limited to them. As far as I can see even accountants would be ‘involved’ - you can’t make weapons without accounting!

Larks Jan 10, 2025, 5:27 PM
4 points
0 ∶ 0
in reply to: Eevee🔹’s comment on: Sample donor screening guidelines (by the Creative Commons org)
PopVax is an Indian biotechnology company. Biotech is a type of pharma company.

Larks Jan 10, 2025, 4:37 AM
21 points
2 ∶ 1
on: Sample donor screening guidelines (by the Creative Commons org)
This seems quite burdensome. In order to accept a $10k donation from a ‘high-risk’ PopVax employee (listed on 80k’s job board!) you’d have to:
Involve a wide range of senior people in your org:
A wide range of functions and units across CC are part of the process to decide on Donations and Partnerships. The process will typically include input from CC’s:
- Development Team
- Relevant program or project team
- Legal Team
- CEO
- Development Council
- Members of the Board of Directors, where relevant expertise is needed
Pay for a wealth screening database:
current and prospective Donors that meet the criteria for screening described above will be screened using a wealth screening database
Write a written report that is reviewed by many senior people:
the Development Team will provide a written report to the CEO, the Legal Team, any relevant program / project teams, the Development Council and board members selected by it and the Development Team for review and approval during the Qualification Stage of the fundraising pipeline.
Be very diligent:
Decisions are taken after thorough examination of such Donation [emphasis added]
Review each year:
CC reviews Donors and their fundamentals every year at the renewal or annual anniversary of a multi-year relationship
Regularly stalk them on social media:
regularly monitors the media for developments linked to Donors. The monitoring is carried out by the Development Team with support from staff and relevant project teams as reasonably requested.
Communicate frequently with donors with feedback about their business practices:
CC aims to maintain a frequent, transparent and constructive engagement with Donors. This enables the CC to be a critical friend where appropriate.
… and a $10k donation from a Palantir employee would simply be totally prohibited.
Overall it seems plausible to me that actually following this for a $10k donation would eat up most of the donation in due diligence overhead. My guess is that CC does not actually follow the letter of this policy in practice.

Larks Jan 8, 2025, 4:00 PM
12 points
7 ∶ 1
in reply to: Linda Linsefors’s comment on: Linda Linsefors’s Shortform
This sounds like the sort of thing you should have asked CEA about before posting.

Larks Jan 6, 2025, 2:17 PM
8 points
2 ∶ 1
in reply to: Nathan Young’s comment on: Nathan Young’s Shortform
I don’t know whether such people are going around dming everyone like this.
Presumably not, as most people are not going around creating crime prediction markets that dramatically raise the salience of an implicit accusation. From their point of view I can see their response as being extremely restrained—you are making probabilistic public accusations that will predictably make them look bad, no matter how low the market price, and they’re not responding publicly at all.

Larks Jan 4, 2025, 8:12 PM
2 points
0 ∶ 0
in reply to: Benquo’s comment on: Preference Inversion
I’d like to better understand your criteria for relevance.
There was some mental process that led you to think this was good content to share on the EA forum. What that process was is (at least to me, and I suspect to other readers) very opaque, so I suggest you explicitly mention it.
A good example is this post. It also introduces a topic with no explicit action items and doesn’t provide ‘direct factual support for current EA initiatives’. But it is pretty clear why it might be relevant to EA work, and the author explicitly included a section gesturing at the reasons to make it clear.
Are you suggesting that EA relevance requires either explicit action items or direct factual support for current EA initiatives?
No I am not.

Larks Jan 4, 2025, 5:57 AM
2 points
0 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Preference Inversion
I agree that not everything needs to supply random marginal facts about malaria. But at the same time I think concrete examples are useful to keep things grounded, and I think it’s reasonable to adopt a policy of ‘not relevant to EA until at least some evidence to the contrary is provided’. Apparently the OP does have some relevance in mind:
This matters because a lot of EA work involves studying revealed preferences in contexts with strong power dynamics (development economics, animal welfare, etc). If we miss these dynamics, we risk optimizing for the same coercive equilibria we’re trying to fix.
I feel like it would have been good to spend like half the post on this! Maybe I am just being dumb but it is genuinely unclear to me what preference falsification the OP is worried about with animal welfare. Without this the post seems to be written as a long response to a question about sex that as far as I can tell no-one on the forum asked.

Larks Jan 4, 2025, 5:46 AM
2 points
0 ∶ 0
in reply to: Guive’s comment on: What predictions from theoretical AI Safety research have been confirmed by empirical work?
Conversations people have with un-RLHF’d models.

Larks Jan 3, 2025, 3:58 AM
3 points
1 ∶ 1
in reply to: Benquo’s comment on: Preference Inversion
Yup, I understand the general concept of preference falsification. My question is about the specific application. I think it would be helpful if you had a concrete example of where this would be relevant for e.g. malaria bednets or factory farming?

Larks Jan 2, 2025, 7:17 PM
4 points
1 ∶ 0
on: Preference Inversion
Thanks for sharing this. Perhaps you could explain the relevance to effective altruism a bit more explicitly?

Larks Dec 30, 2024, 1:42 AM
13 points
2 ∶ 1
on: What predictions from theoretical AI Safety research have been confirmed by empirical work?
That mindspace is large and AIs are really weird.
