Marcel2

Karma: 1,908

Marcel2 13 Nov 2024 23:43 UTC
4 points
0 ∶ 0
on: Bad omens for US farmed animal policy work?
Has anyone thought about trying to convince anti-regulatory figures (e.g., Marc Andreessen) in the new admin’s orbit to speak out against the regulatory capture of banning cultivated meat? Has anyone tried painting cultivated meat as “Little Tech”?

Marcel2 18 Jun 2024 13:44 UTC
2 points
0 ∶ 0
in reply to: mlsbt’s comment on: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI
I almost clarified that I know some models technically are multi-modal, but my impression is that the visual reasoning abilities of the current models are very limited, so I’m not at all surprised that they do poorly here. Among other illustrations of this impression, occasionally I’ve found they struggle to properly describe what is happening in an image beyond a relatively general level.

Marcel2 18 Jun 2024 12:56 UTC
2 points
0 ∶ 0
in reply to: Egg Syntax’s comment on: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI
Again, I’d be interested to actually see humans attempt the test by viewing the raw JSON, without being allowed to see/generate any kind of visualization of the JSON. I suspect that most people will solve it by visualizing and manipulating it in their head, as one typically does with these kinds of problems. Perhaps you (a person with syntax in their username) would find this challenge quite easy! Personally, I don’t think I could reliably do it without substantial practice, especially if I’m prohibited from visualizing it.

Marcel2 18 Jun 2024 12:53 UTC
2 points
0 ∶ 0
in reply to: mlsbt’s comment on: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI
Just because an LLM can convert something to a grid representation/visualization does not mean it can itself actually “visualize” the thing. A pure-text model will lack the ability to observe anything visually. Just because a blind human can write out some mathematical function that they can input into a graphing calculator, that does not mean that the human necessarily can visualize what shape the function will take, even if the resulting graph is shown to everyone else.

Marcel2 17 Jun 2024 10:37 UTC
4 points
0 ∶ 0
in reply to: mlsbt’s comment on: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI
I wouldn’t be surprised if that’s correct (though I haven’t seen the tests), but that wasn’t my complaint. A moderately smart/trained human can also probably convert from JSON to a description of the grid, but there’s a substantial difference in experience from seeing even a list of grid square-color labels vs. actually visualizing it and identifying the patterns. I would hazard a guess that humans who are only given a list of square color labels (not just the raw JSON) would perform significantly worse if they are not allowed to then draw out the grids.

And I would guess that even if some people do it well, they are doing it well because they convert from text to visualization.

Marcel2 16 Jun 2024 16:13 UTC
4 points
1 ∶ 1
on: On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI
Can anyone point me to a good analysis of the ARC test’s legitimacy/value? I was a bit surprised when I listened to the podcast, as they made it seem like a high-quality, general-purpose test, but then I was very disappointed to see it’s just a glorified visual pattern abstraction test. Maybe I missed some discussion of it in the podcasts I listened to, but it just doesn’t seem like people pushed back hard enough on the legitimacy of comparing “language model that is trying to identify abstract geometric patterns through a JSON file” vs. “humans that are just visually observing/predicting the patterns.”

Like, is it wrong to demand that humans should have to do this test purely by interpreting the JSON (with no visual aid)?
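To make the comparison concrete, here is a minimal sketch of the two views of the same grid. The task below is made up, and the layout (a grid as a list of rows of integer color codes) is my assumption about how ARC-style puzzles are serialized, not a copy of a real test item:

```python
import json

# Hypothetical ARC-style grid: a list of rows, each cell an integer
# color code (0 = background). Not taken from the real test set.
raw = '{"input": [[0,0,3,0],[0,3,3,0],[0,0,3,0],[0,0,0,0]]}'

def render(grid, symbols=".#"):
    """Draw a grid as text: '.' for 0, '#' for any non-zero color."""
    return "\n".join(
        "".join(symbols[1] if cell else symbols[0] for cell in row)
        for row in grid
    )

grid = json.loads(raw)["input"]
rendered = render(grid)

print(raw)       # what a text-only solver receives
print(rendered)  # closer to what a human solver looks at
```

The information content is identical in both printouts; the question I’m raising is only about how much harder the first form is to work with.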

Marcel2 20 May 2024 4:56 UTC
4 points
0 ∶ 0
on: Project idea: AI for epistemics
I’ve been advocating for something like this for a while (more recently, here and here), but have only ever received lukewarm feedback at best. I’d still be excited to see this take off, and would probably like to hear what other work is happening in this space!

Marcel2 20 May 2024 4:37 UTC
4 points
1 ∶ 0
on: Harrison D’s Shortform
I spent way too much time organizing my thoughts on AI loss-of-control (“x-risk”) debates without any feedback today, so I’m publishing perhaps one of my favorite snippets/threads:
A lot of debates seem to boil down to under-acknowledged and poorly-framed disagreements about questions like “who bears the burden of proof.” For example, some skeptics say “extraordinary claims require extraordinary evidence” when dismissing claims that the risk is merely “above 1%”, whereas safetyists argue that having >99% confidence that things won’t go wrong is the “extraordinary claim that requires extraordinary evidence.”
I think that talking about “burdens” might be unproductive. Instead, it may be better to frame the question more like “what should we assume by default, in the absence of definitive ‘evidence’ or arguments, and why?” “Burden” language is super fuzzy (and seems a bit morally charged), whereas this framing at least forces people to acknowledge that some default assumptions are being made and consider why.
To address that framing, I think it’s better to ask/answer questions like “What reference class does ‘building AGI’ belong to, and what are the base rates of danger for that reference class?” This framing at least pushes people to make explicit claims about what reference class building AGI belongs to, which should make it clearer that it doesn’t belong in your “all technologies ever” reference class.
In my view, the “default” estimate should not be “roughly zero until proven otherwise,” especially given that there isn’t consensus among experts and the overarching narrative of “intelligence proved really powerful in humans, misalignment even among humans is quite common (and is already often observed in existing models), and we often don’t get technologies right on the first few tries.”

Marcel2 25 Mar 2024 5:09 UTC
4 points
1 ∶ 0
in reply to: yanni kyriacos’s comment on: Be skeptical of EAs giving advice on things they’ve never actually been successful in themselves
I definitely think “beware” is too strong. I would recommend “discount” or “be skeptical” or something similar.

Marcel2 19 Mar 2024 2:18 UTC
2 points
0 ∶ 1
in reply to: Denis ’s comment on: Harrison D’s Shortform
Venus is an extreme example of an Earth-like planet with a very different climate. There is nothing in physics or chemistry that says Earth’s temperature could not one day exceed 100 C.
[...]
[Regarding ice melting -- ] That will take time, but very little time on a cosmic scale, maybe a couple of thousand years.
I’ll be blunt, remarks like these undermine your credibility. But regardless, I just don’t have any experience or contributions to make on climate change, other than re-emphasizing my general impression that, as a person who cares a lot about existential risk and has talked to various other people who also care a lot about existential risk, there seems to be very strong scientific evidence suggesting that extinction is unlikely.

Marcel2 14 Mar 2024 15:43 UTC
2 points
0 ∶ 0
in reply to: Denis ’s comment on: Harrison D’s Shortform
Everything is going more or less as the scientists predicted, if anything, it’s worse.
I’m not that focused on climate science, but my understanding is that this is a bit misleading in your context—that there were some scientists in the (90s/2000s?) who forecasted doom or at least major disaster within a few decades due to feedback loops or other dynamics which never materialized. More broadly, my understanding is that forecasting climate has proven very difficult, even if some broad conclusions (e.g., “the climate is changing,” “humans contribute to climate change”) have held up. Additionally, it seems that many engineers/scientists underestimated the pace of alternative energy technology (e.g., solar).
That aside, I would be excited to see someone work on this project, and I still have not discovered any such database.

Forecasting With LLMs—An Open and Promising Research Direction

Marcel2 12 Mar 2024 4:23 UTC
13 points
0 comments, 4 min read

Marcel2 11 Mar 2024 3:32 UTC
2 points
0 ∶ 2
in reply to: Matthew_Barnett’s comment on: Clarifying two uses of “alignment”
I don’t find this response to be a compelling defense of what you actually wrote:
since AIs would “get old” too [...] they could also have reason to not expropriate the wealth of vulnerable old agents because they too will be in such a vulnerable position one day
It’s one thing if the argument is “there will be effective enforcement mechanisms which prevent theft,” but the original statement still just seems to imagine that norms will be a non-trivial reason to avoid theft, which seems quite unlikely for a moderately rational agent.
Ultimately, perhaps much of your scenario was trying to convey a different idea from what I see as the straightforward interpretation, but I think it makes it hard for me to productively engage with it, as it feels like engaging with a motte-and-bailey.

Marcel2 11 Mar 2024 2:23 UTC
11 points
2 ∶ 1
on: Clarifying two uses of “alignment”
Apologies for being blunt, but the scenario you lay out is full of claims that just seem to completely ignore very facially obvious rebuttals. This would be less bad if you didn’t seem so confident, but as written the perspective strikes me as naive and I would really like an explanation/defense.

Take for example:

Furthermore, since AIs would “get old” too, in the sense of becoming obsolete in the face of new generations of improved AIs, they could also have reason to not expropriate the wealth of vulnerable old agents because they too will be in such a vulnerable position one day, and thus would prefer not to establish a norm of expropriating the type of agent they may one day become.

Setting aside the debatable assumptions about AIs getting “old,” this just seems to completely ignore the literature on collective action problems. If the scenario were such that any one AI agent can expect to get away with defecting (expropriation from older agents) and the norm-breaking requires passing a non-small threshold of such actions, a rational agent will recognize that their defection has minimal impact on what the collective will do, so they may as well do it before others do.
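As a rough sketch of that collective action point: if the norm only collapses once some threshold of agents defect, a single agent’s defection matters only when it is the one that tips the count. The model and all numbers below are hypothetical assumptions of mine (independent defection by the other agents, a fixed private gain, a fixed cost if the norm collapses), not anything from the post:

```python
from math import comb

def pivotal_probability(n_agents: int, threshold: int, q: float) -> float:
    """Chance that exactly threshold-1 of the other agents defect,
    i.e. that one more defection is what tips the norm over.
    Assumes each other agent defects independently with probability q."""
    others = n_agents - 1
    k = threshold - 1
    return comb(others, k) * q**k * (1 - q) ** (others - k)

def net_gain_from_defecting(gain, collapse_cost, n_agents, threshold, q):
    """Private gain minus the expected cost of being the pivotal defector."""
    return gain - collapse_cost * pivotal_probability(n_agents, threshold, q)

# Hypothetical numbers: 100 agents, norm fails once 30 defect,
# each other agent defects with probability 0.1.
p = pivotal_probability(n_agents=100, threshold=30, q=0.1)
net = net_gain_from_defecting(
    gain=1.0, collapse_cost=1000.0, n_agents=100, threshold=30, q=0.1
)
print(p, net)
```

Under these assumed numbers the pivotal probability is tiny, so even a collapse cost far larger than the private gain barely dents the incentive to defect.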

There are multiple other problems in your post, but I don’t think it’s worth the time going through them all. I just felt compelled to comment because I was baffled by the karma on this post, unless it was just people liking it because they agreed with the beginning portion…?

Marcel2 9 Mar 2024 18:36 UTC
3 points
0 ∶ 0
in reply to: Rebecca’s comment on: Harrison D’s Shortform
Sure! (I just realized the point about the MNIST dataset problems wasn’t fully explained in my shared memo, but I’ve fixed that now)
Per the assessment section, some of the problems with assuming that FRVT demonstrates NIST’s capabilities for evaluation of LLMs/etc. include:
1. Facial recognition is a relatively “objective” test—i.e., the answers can be linked to some form of “definitive” answer or correctness metric (e.g., name/identity labels). In contrast, many of the potential metrics of interest with language models (e.g., persuasiveness, knowledge about dangerous capabilities) may not have a “definitive” evaluation method, where following X procedure reliably evaluates a response (and does so in a way that onlookers would look silly to dispute).
2. The government arguably had some comparative advantage in specific types of facial image data, due to collecting millions of these images with labels. The government doesn’t have a comparative advantage in, e.g., text data.
3. The government has not at all kept pace with private/academic benchmarks for most other ML capabilities, such as non-face image recognition (e.g., Common Objects in Context) and LLMs (e.g., SuperGLUE).
4. It’s honestly not even clear to me whether FRVT’s technical quality truly is the “gold standard” in comparison to the other public training/test datasets for facial recognition (e.g., MegaFace); it seems plausible that the value of FRVT is largely just that people can’t easily cheat on it (unlike datasets where the test set is publicly available) because of how the government administers it.
For the MNIST case, I now have the following in my memo:
Even NIST’s efforts with handwriting recognition were of debatable quality: Yann LeCun’s widely-used MNIST is a modification of NIST’s datasets, in part because NIST’s approach used census bureau employees’ handwriting for the training set and high school students’ handwriting for the test set.^[1]
1. ^
  Some may argue this assumption was justified at the time because it required that models could “generalize” beyond the training set. However, popular usage appears to have favored MNIST’s approach. Additionally, it is externally unclear that one could effectively generalize from the handwriting of a narrow and potentially unrepresentative segment of society—professional bureaucrats—to high schoolers’, and the assumption that this would be necessary (e.g., due to the inability to get more representative data) seems unrealistic.

Marcel2 8 Mar 2024 17:42 UTC
29 points
2 ∶ 0
on: Harrison D’s Shortform
Seeing the drama with the NIST AI Safety Institute and Paul Christiano’s appointment and this article about the difficulty of rigorously/objectively measuring characteristics of generative AI, I figured I’d post my class memo from last October/November.
The main point I make is that NIST may not be well suited to creating measurements for complex, multi-dimensional characteristics of language models—and that some people may be overestimating the capabilities of NIST because they don’t recognize how incomparable the Facial Recognition Vendor Test is to this situation of subjective metrics for GenAI and they don’t realize NIST arguably even botched MNIST (which was actually produced by Yann LeCun by recompiling NIST’s datasets). Moreover, government is slow, while AI is fast. Instead, I argue we should consider an alternative model such as federal funding for private/academic benchmark development (e.g., prize competitions).
I wasn’t sure if this warranted a full post, especially since it feels a bit late; LMK if you think otherwise!

Marcel2 16 Dec 2023 19:55 UTC
3 points
0 ∶ 0
in reply to: finm’s comment on: The Offense-Defense Balance Rarely Changes
I probably should have been clearer: my true “final” paper actually didn’t focus on this aspect of the model. The offense-defense balance was the original motivation/purpose of my cyber model, but I eventually became far more interested in using the model to test how large language models could improve agent-based modeling by controlling actors in the simulation. I have a final model writeup which explains some of the modeling choices and discusses the original offense/defense purpose in more detail.
(I could also provide the model code which is written in Python and, last I checked, runs fine, but I don’t expect people would find it to be that valuable unless they really want to dig into this further, especially given that it might have bugs.)

Marcel2 16 Dec 2023 16:37 UTC
5 points
0 ∶ 0
in reply to: finm’s comment on: The Offense-Defense Balance Rarely Changes
If offence and defence both get faster, but all the relative speeds stay the same, I don’t see how that in itself favours offence
Funny you should say this: it so happens that I just submitted a final paper last night for an agent-based model which was meant to test exactly this kind of claim for the impacts of improving “technology” (AI) in cybersecurity. Granted, the model was extremely simple + incomplete, but the theoretical results explain how this could be possible.
In short, when assuming a fixed number of vulnerabilities in an attack surface, many vulnerabilities may go unnoticed while attackers’ and defenders’ budgets are very small. For example, suppose they together can only explore 10% of the attack surface, but vulnerabilities are only in 1% of the surface. Then even if atk/def budgets increase by the same factor (e.g., 10x), the increase raises the likelihood that each vulnerability is found by either the attacker or the defender.
The following results are admittedly not very reliable (I didn’t do any formal verification/validation beyond spot checks), but the point of showing these graphs is not “here are the definitive numbers” but more an illustrative “here is what the pattern of relationships between attack surface, atk/def budgets, and theft rate could look like”.
Notice how, as the attack surface increases, multiplying the attackers’ and defenders’ budgets causes more convergence. With a hypothetical 1x1 attack surface (grid) for each actor, the budget multiplication should have no effect on loss rates, because all vulnerabilities are found and it’s just a matter of who found them first, which is not affected by budget multiplication. However, with a hypothetical infinite by infinite grid, the multiplication of budgets strictly benefits the attacker, because the defenders will ~never check the same squares that the attacker checks.
(ultimately my model makes many unrealistic assumptions and may have had bugs, but this seemed like a decent intuition seed—not a true “conclusion” which can be carelessly applied elsewhere.)
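For anyone who wants the intuition without the graphs, here is a back-of-the-envelope sketch. It is not my actual model code; it is a closed-form toy under assumptions I am adding here: attacker and defender each check the same number of distinct cells, chosen uniformly at random and independently, at the same pace, and if both reach a vulnerable cell each is equally likely to have reached it first.

```python
def theft_rate(cells: int, budget: int) -> float:
    """Expected share of all vulnerabilities the attacker reaches first.

    Toy assumptions (not the original model): both sides check `budget`
    distinct cells uniformly at random, independently, at the same pace;
    ties over a shared cell are a 50/50 coin flip.
    """
    p = min(1.0, budget / cells)     # chance either side checks a given cell
    only_attacker = p * (1 - p)      # attacker finds it, defender never does
    both_attacker_first = p * p / 2  # both find it, attacker got there first
    return only_attacker + both_attacker_first

# Compare a 1x budget with a 10x budget on surfaces of different sizes.
for cells in (1, 100, 10_000):
    base = theft_rate(cells, budget=1)
    scaled = theft_rate(cells, budget=10)
    print(f"{cells:>6} cells: 1x -> {base:.6f}, 10x -> {scaled:.6f}")
```

On the single-cell surface the multiplier changes nothing, while on the large surface theft grows almost in proportion to the budget, since the two sides rarely overlap.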

Marcel2 16 Dec 2023 16:14 UTC
3 points
1 ∶ 0
in reply to: Ben Stewart’s comment on: The Offense-Defense Balance Rarely Changes
Thank you so much for articulating a bunch of the points I was going to make!

I would probably just further drive home the last paragraph: it’s really obvious that the “number of people a lone maniac can kill in given time” (in America) has skyrocketed with the development of high fire-rate weapons (let alone knowledge of explosives). It could be true that the O/D balance for states doesn’t change (I disagree) while the O/D balance for individuals skyrockets.

Marcel2 8 Nov 2023 20:00 UTC
11 points
1 ∶ 0
on: How Rethink Priorities is Addressing Risk and Uncertainty
I have increasingly become open to incorporating alternative decision theories as I recognize that I cannot be entirely certain in expected value approaches, which means that (per expected value!) I probably should not solely rely on one approach. At the same time, I am still not convinced that there is a clear, good alternative, and I also repeatedly find that the arguments against using EV are not compelling (e.g., due to ignoring more sophisticated ways of applying EV).
Having grappled with the problem of EV-fanaticism for a long time in part due to the wild norms of competitive policy debate (e.g., here, here, and here), I’ve thought a lot about this, and I’ve written many comments on the forum about this. My expectation is that this comment won’t gain sufficient attention/interest to warrant me going through and collecting all of those instances, but my short summary is something like:
- Fight EV fire with EV fire: Countervailing outcomes—e.g., the risk that doing X has a negative 999999999… effect—are extremely important when dealing with highly speculative estimates. Sure, someone could argue that if you don’t give $20 to the random guy wearing a tinfoil hat and holding a remote which he will use to destroy 3^3^3 galaxies there’s at least a 0.000000...00001% chance he’s telling the truth, but there’s also a decent chance that doing this could have the opposite effect due to some (perhaps hard-to-identify) alternative effect.
- One should probably distinguish between extremely low (e.g., 0.00001%) estimates which are the result of well-understood or “objective”^[1] analyses which you expect cannot be improved by further analysis or information collection (e.g., you can directly see/show the probability written in a computer program, a series of coin flips with a fair coin) vs. estimates that are the result of very subjective probability estimates that you expect you will likely adjust downwards due to further analysis, but where you just can’t immediately rule out some sliver of uncertainty.^[2]
  - Often you should recognize that when you get into small probability spaces for “subjective” questions, you are at a very high risk of being swayed by random noise or deliberate bias in argument/information selection. For example, if you’ve never thought about how nano-tech could cause extinction and listen to someone who gives you a sample of arguments/information in favor of the risks, you likely will not immediately know the counterarguments and you should update downwards based on the expectation that the sample you are exposed to is probably an exaggeration of the underlying evidence.
  - The cognitive/time costs of doing “subjective” analyses likely impose high opportunity costs (going back to the first point);
  - When your analysis is not legible to other people, you risk high reputational costs (again, which goes back to the first point).
- Based on the above, I agree that in some cases, trimming off highly “subjective” risk estimates may be a far more efficient heuristic for decision-making under analytical constraints. However, I make this claim based on EV: I still see EV as the better general-purpose decision-making algorithm, just one that may not be optimized for application under realistic constraints (e.g., other people not being familiar with your method of thinking, short amount of time for discussion or research, error-prone brains which do not reliably handle lots of considerations and small numbers).^[3]
1. ^
  I dislike using “objective” and “subjective” to make these distinctions, but for simplicity’s sake / for lack of a better alternative at the moment, I will use them.
2. ^
  For more on this thread, see here: https://forum.effectivealtruism.org/posts/WSqLHsuNGoveGXhgz/disentangling-some-important-forecasting-concepts-terms
3. ^
  I advocate for something like this in competitive policy debate, since “fighting EV fire with EV fire” risks “burning the discussion,” including the educational value, reputation of participants, etc. But most deliberations do not have to be made within the artificial constraints of competitive policy debate.
