AI Safety Needs To Get Serious About Chinese Political Culture
I worry that Leopold Aschenbrenner’s “China will use AI to install a global dystopia” take is based on crudely analogising the CCP to the USSR, or perhaps even to American cultural imperialism / expansionism, and isn’t based on an even superficially informed analysis of either how China is currently actually thinking about AI, or what China’s long term political goals or values are.
I’m no more of an expert myself, but my impression is that China is much more interested in its own national security interests and its own ideological notions of the ethnic Chinese people and Chinese territory, so that beyond e.g. Taiwan there isn’t an interest in global domination except to the extent that it prevents them being threatened by other expansionist powers.
This or a number of other heuristics / judgements / perspectives could change substantially how we think about whether China would race for AGI, and/or be receptive to an argument that AGI development is dangerous and should be suppressed. China clearly has a lot to gain from harnessing AGI, but they have a lot to lose too, just like the West.
Currently, this is a pretty superficial impression of mine, so I don’t think it would be fair to write an article yet. I need to do my homework first:
I need to actually read Leopold’s own writing about this, instead of making impressions based on summaries of it,
I’ve been recommended to look into what CSET and Brian Tse have written about China,
Perhaps there are other things I should hear about this, feel free to make recommendations.
Alternatively, as always, I’d be really happy for someone who’s already done the homework to write about this, particularly anyone specifically with expertise in Chinese political culture or international relations. Even if I write the article, all it’ll really be able to be is an appeal to listen to experts in the field, or for one or more of those experts to step forward and give us some principles to spread in how to think clearly and accurately about this topic.
I think having even like, undergrad-level textbook mainstream summaries of China’s political mission and beliefs posted on the Forum could end up being really valuable if it puts those ideas more in the cultural and intellectual background of AI safety people in general.
This seems like a really crucial question that inevitably takes a central role in our overall strategy, and Leopold’s take isn’t the only one I’m worried about. I think people are already pushing national security concerns about China to the US Government in an effort to push e.g. stronger cybersecurity controls or export controls on AI. I think that’s a noble end but if the China angle becomes inappropriately charged we’re really risking causing more harm than good.
(For the avoidance of doubt, I think the Chinese government is inhumane, and that all undemocratic governments are fundamentally illegitimate. I think exporting democracy and freedom to the world is a good thing, so I’m not against cultural expansionism per se. Nevertheless, assuming China wants to do it when they don’t could be a really serious mistake.)
I recommend the China sections of this recent CNAS report as a starting point for discussion (it’s definitely from a relatively hawkish perspective, and I don’t think of myself as having enough expertise to endorse it, but I did move in this direction after reading).
From the executive summary:
Taken together, perhaps the most underappreciated feature of emerging catastrophic AI risks from this exploration is the outsized likelihood of AI catastrophes originating from China. There, a combination of the Chinese Communist Party’s efforts to accelerate AI development, its track record of authoritarian crisis mismanagement, and its censorship of information on accidents all make catastrophic risks related to AI more acute.
From the “Deficient Safety Cultures” section:
While such an analysis is of relevance in a range of industry- and application-specific cultures, China’s AI sector is particularly worthy of attention and uniquely predisposed to exacerbate catastrophic AI risks [footnote]. China’s funding incentives around scientific and technological advancement generally lend themselves to risky approaches to new technologies, and AI leaders in China have long prided themselves on their government’s large appetite for risk—even if there are more recent signs of some budding AI safety consciousness in the country [footnote, footnote, footnote]. China’s society is the most optimistic in the world on the benefits and risks of AI technology, according to a 2022 survey by the multinational market research firm Institut Public de Sondage d’Opinion Secteur (Ipsos), despite the nation’s history of grisly industrial accidents and mismanaged crises—not least its handling of COVID-19 [footnote, footnote, footnote, footnote]. The government’s sprint to lead the world in AI by 2030 has unnerving resonances with prior grand, government-led attempts to accelerate industries that have ended in tragedy, as in the Great Leap Forward, the commercial satellite launch industry, and a variety of Belt and Road infrastructure projects [footnote, footnote, footnote]. China’s recent track record in other hightech sectors, including space and biotech, also suggests a much greater likelihood of catastrophic outcomes [footnote, footnote, footnote, footnote, footnote].
From “Further Considerations”:
In addition to having to grapple with all the same safety challenges that other AI ecosystems must address, China’s broader tech culture is prone to crisis due to its government’s chronic mismanagement of disasters, censorship of information on accidents, and heavy-handed efforts to force technological breakthroughs. In AI, these dynamics are even more pronounced, buoyed by remarkably optimistic public perceptions of the technology and Beijing’s gigantic strategic gamble on boosting its AI sector to international preeminence. And while both the United States and China must reckon with the safety challenges that emerge from interstate technology competitions, historically, nations that perceive themselves to be slightly behind competitors are willing to absorb the greatest risks to catch up in tech races [footnote]. Thus, even while the United States’ AI edge over China may be a strategic advantage, Beijing’s self-perceived disadvantage could nonetheless exacerbate the overall risks of an AI catastrophe.
Also, unless one understands the Chinese situation, one should avoid moves that risk escalating a race, like making loud and confident predictions that a race is the only way.
I think it’s better for people to openly express their models that they see a race as the only option. I think it’s the kind of thing that can then lead to arguments and discourse about whether that’s true or not. I think a huge amount of race dynamics stem from people being worried that other people might or might not be intending to race, or are hiding their intention to race, and so I am generally strongly in favor of transparency.
For those who are not deep China nerds but want a somewhat approachable lowdown, I can highly recommend Bill Bishop’s newsletter Sinocism (enough free issues to be worthwhile) and his podcast Sharp China (the latter is a bit more approachable but requires a subscription to Stratechery).
I’m not a China expert so I won’t make strong claims, but I generally agree that we should not treat China as an unknowable, evil adversary who has exactly the same imperial desires as ‘the west’ or past non-Western regimes. I think it was irresponsible of Aschenbrenner to assume this without better research & understanding, since so much of his argument relies on China behaving in a particular way.
I share your concerns. I spent a decade in China, and I can’t count the number of times I’ve seen people confidently share low-quality or inaccurate perspectives on China. I wish that I had a solution better than “assign everyone to read these [NUMBER] different books.”
Even best-selling books and articles by well-respected writers sometimes have misleading and inaccurate narratives in them. But it is hard to parse them critically and to provide a counterargument without both the appropriate background[1] and a large number of hours dedicated to the specific effort.
I would be surprised if someone is able to do so without at least an undergraduate background in something like Chinese studies/sinology (or the equivalent, such as a large amount of self-study and independent exploration).
This reading list is an excellent place to start for getting a sense of China x AI (though it doesn’t have that much about China’s political objectives in general).
Note that you should also understand a) how the US government sees China and why, b) how China sees the US and why in order to be able to have a full analysis here.
Very good point. I hypothesize that the opaque nature of Chinese policy-making (at the national level, setting aside lower-level government) is a key difficulty for anyone outside the upper levels of the Chinese government.
People often propose HR departments as antidotes to some of the harm that’s done by inappropriate working practices in EA. The usual response is that small organisations often have quite informal HR arrangements even outside of EA, which does seem kinda true.
Another response is that it sometimes seems like people have an overly rosy picture of HR departments. If your corporate culture sucks then your HR department will defend and uphold your sucky corporate culture. Abusive employers will use their HR departments as an instrument of their abuse.
Perhaps the idea is to bring more mainstream HR practices or expertise into EA employers, rather than merely going through the motions of creating the department. But I think mainstream HR comes primarily from the private sector and is primarily about protecting the employer, often against the employee. They often cast themselves in a role of being there to help you, but a common piece of folk wisdom is “HR is not your friend”. I think frankly that a lot of mainstream HR culture is at worst dishonest and manipulative, and I’d be really sad to see us uncritically importing more of that.
I feel at least somewhat qualified to speak on this, having read a bunch about human resources, being active in an HR professionals chat group nearly every day, and having worked in HR at a few different organizations (so I have seen some of the variance that exists). I hope you’ll forgive me for my rambling on this topic, as there are several different ideas that came to mind when reading your paragraphs.
The first thing is that I agree with you on at least one aspect: rather than merely creating a department and walking away, adopting and adapting best practices and relevant expertise would be more helpful. If the big boss is okay with [insert bad behavior here] and isn’t open to the HR Manager’s new ideas, then the organization probably isn’t going to change. If an HR department is defending and upholding sucky corporate culture, that is usually because senior leadership is instructing them to do so. Culture generally comes from the top. And if the leader isn’t willing to be convinced by or have his mind changed by the new HRO he hired, then things probably won’t be able to get much better.[1]
“HR is not your friend” is normally used to imply that you can’t trust HR, or that HR is out to get you, or something like that. Well, in a sense it is true that “HR is not your friend.” If you are planning to jump ship, don’t confide in the HR manager about it, trusting that they won’t take action. If that person has a responsibility to take action on the information you provide, you should think twice before volunteering that information and consider whether the action is beneficial to you or not. The job of the people on an HR team (just like the job of everyone else employed by an organization) is to help the organization achieve its goals. Sometimes that means pay raises for everyone, because the salaries aren’t competitive and the company wants to have low attrition. Sometimes that means downsizing, because growth forecasts were wrong and the company over-hired. The accountant is also not your friend, nor is the janitor, nor is the marketing executive, nor is any other role at the organization. So I guess what I am getting at here is that HR is not really more your friend or less your friend than any other department, but HR is the only department that carries out actions that might adversely affect employees. And note that just because HR carries out the actions doesn’t mean HR made the decision or put the company in that situation; this is shooting the messenger.
While it may be true that in some organizations and for some people HR is “primarily about protecting the employer, often against the employee,” I’m skeptical that this is representative of people who do HR work more generally. On the one hand, yes, the job is to help the organization achieve its goals. But when this topic comes up among HR people, the general reaction is along the lines of “I want to do as much as I can for the employees, and the boundaries limiting me are from upper management. I want to give our staff more equitable pay, but leadership doesn’t care that we have high turnover rates. I want to provide parental leave, but the head honcho disagrees. I really do not want to fire John Doe, because it seems unreasonable and unfair and unjust, but this is what leadership has decided.”[2]
The other thought I have about this parallels computer programmers/software engineers/developers and their thoughts on project managers. If you look at online discussions of programmers you will find no shortage of complaints about project managers (and about Scrum, and about agile), and many people writing about how useless their project manager is. But you shouldn’t draw the conclusion that project management isn’t useful. Instead, an alternative explanation is that these programmers are working with project managers that aren’t very skillful, so their impression is biased. Working with a good project manager can be incredibly beneficial. So to leave the parallel and go back to HR, it is easy to find complaints on the internet about bad things that are attributed to HR. I would ask how representative those anecdotes are.
Alternatively, if the leader is simply unaware of some bad things and the new HR manager can bring attention to those things, then improvements are probably on the way. But having HR is not sufficient on its own.
The other common response that tends to come up is to focus on all the things that HR does for the employees, things which are generally framed as limiting the company’s power over employees: No, you can’t pay the employees that little, because it is illegal. No, you can’t fire this person without a documented history of poor performance, and no, scowling at you doesn’t count as poor performance. Yes, you really do need to justify hiring your friend, and him being a ‘great guy’ isn’t enough of a business case. No, it isn’t reasonable to expect staff to be on call for mandatory unpaid overtime every weekend, because we will hemorrhage employees.
I think mainstream HR comes primarily from the private sector and is primarily about protecting the employer, often against the employee. They often cast themselves in a role of being there to help you, but a common piece of folk wisdom is “HR is not your friend”. I think frankly that a lot of mainstream HR culture is at worst dishonest and manipulative, and I’d be really sad to see us uncritically importing more of that.
I see a lot of this online, but it doesn’t match my personal experience. The people working in HR that I’ve been in contact with seem generally kind, aware of tradeoffs, and caring about the wellbeing of employees.
I worry that the online reputation of HR departments is shaped by a minority of terrible experiences, and we overgeneralize that to think that HR cannot or will not help, while in my experience they are often really eager to try to help (in part because they don’t want you and others to quit, but also because they are nice people).
Maybe it’s also related to minimum-wage non-skilled jobs vs higher paying jobs, where employment tends to be less adversarial and less exploitative.
I have a broad sense that AI safety thinking has evolved a bunch over the years, and I think it would be cool to have a retrospective of “here are some concrete things that used to be pretty central that we now think are either incorrect or at least incorrectly focused”
Of course it’s hard enough to get a broad overview of what everyone thinks now, let alone what they used to think but discarded.
(this is probably also useful outside of AI safety, but I think it would be most useful there)
I like this. I’ve occasionally thought previously about what value there would be in having a ‘historian.’ There are many things that I took a while to figure out (such as the history/lineage of various ideas and organizations, or why there was a strategy shift from one thing to another thing), as well as the many things which I’ve simply never encountered. I imagine that there are plenty of lessons that can be learned from those.
EA as a community tends to do a better-than-normal job when it comes to writing and sharing retrospectives, but there are lots of things that I don’t understand and that (I think) aren’t easily available. (simplistic example: was asking for randomized controlled trials (or other methods) to demonstrate effectiveness really shockingly revolutionary in development work?)
was asking for randomized controlled trials (or other methods) to demonstrate effectiveness really shockingly revolutionary
EA didn’t invent RCTs, or even popularize them within the social sciences, but their introduction was indeed a major change in thinking. Abhijit Banerjee, Esther Duflo and Michael Kremer won the Nobel prize in economics largely for demonstrating the experimental approach to the study of development.
I wonder how the recent turn for the worse at OpenAI should make us feel about e.g. Anthropic and Conjecture and other organizations with a similar structure, or whether we should change our behaviour towards those orgs.
How much do we think that OpenAI’s problems are idiosyncratic vs. structural? If e.g. Sam Altman is the problem, we can still feel good about peer organisations. If instead weighing investor concerns and safety concerns is the root of the problem, we should be worried about whether peer organizations are going to be pushed down the same path sooner or later.
Are there any concerns we have with OpenAI that we should be taking this opportunity to put to its peers as well? For example, have peers been publicly asked if they use non-disparagement agreements? I can imagine a situation where another org has really just never thought to use them, and we can use this occasion to encourage them to turn that into a public commitment.
On (1), these issues seem to be structural in nature, but exploited by idiosyncrasies. In theory, both OpenAI’s non-profit board & Anthropic’s LTBT should perform roughly the same oversight function. In reality, a combination of Sam’s rebellion, Microsoft’s financial domination, and the collective power of the workers shifted the decision to being about whether OpenAI would continue independently with a new board or re-form under Microsoft. Anthropic is just as susceptible to this kind of coup (led by Amazon), but only if their leadership and their workers collectively want it, which, in all fairness, I think they’re a lot less likely to.
But in some sense, no corporate structure can protect against all of the key employees organising to direct their productivity somewhere else. Only a state-backed legal structure really has that power. If you’re worried about some bad outcome, I think you either have to trust that the Anthropic people have good intentions and won’t sell themselves to Amazon, or advocate for legal restrictions on AI work.
That’s not as obvious, because the employees probably wouldn’t work in that jurisdiction to begin with, or they’d just move to a competitor in such a jurisdiction. Even in such jurisdictions they’re not as binding as you’d hope!
An industry norm around gardening leave, however, can catch on and play well (companies are concerned about losing their trade secrets). I think it would apply some pressure against such a situation, but it would be possible to engineer similar situations if everyone wanted out of the LTBT (even just not doing the gardening leave and having the new org foot the legal bill).
something I persistently struggle with is that it’s near-impossible to know everything that has been said about a topic, and that makes it really hard to know when an additional contribution is adding something or just repeating what’s already been said, or worse, repeating things that have already been refuted
to an extent this seems inevitable and I just have to do my best and sometimes live with having contributed more noise than signal in a particular case, but I feel like I have an internal tuning knob for “say more” vs. “listen more” and I find it really hard to know which direction is overall best
As weird as it sounds, I think the downvote button should make you a bit less concerned with contribution quality. If it’s obviously bad, people will downvote and read it less. If it’s wrong without being obviously bad, then others likely share the same misconception, and hopefully someone steps in to correct it.
In practice, the failure mode for the forum seems to be devoting too much attention to topics that don’t deserve it. If your topic deserves more attention, I wouldn’t worry a ton about accidentally repeating known info? For one thing, it could be valuable spaced repetition. For another, discussions over time can help turn something over and look at it from various angles. So I suppose the main risk is making subject matter experts bored?
In some sense you could consider the signal/noise question separate from the epistemic hygiene question. If you express uncertainty properly, then in theory, you can avoid harming collective epistemics even for a topic you know very little about.
On the current margin, I actually suspect EAs should be deferring less and asking dumb questions more. Specific example: In a world where EA was more willing to entertain dumb questions, perhaps we could’ve discovered AI Pause without Katja Grace having to write a megapost. We don’t want to create “emperor has no clothes” type situations. Right now, “EA is a cult” seems to be a more common outsider critique than “EAs are ignorant and uneducated”.
Using Kialo for debates rather than the Forum would go a long way. It’s hard to get off the ground because its attractiveness to use is roughly proportional to the number of EAs using it, and at present, the number of EAs using it is zero.
Something I’m trying to do in my comments recently is “hedge only once”; e.g. instead of “I think X seems like it’s Y”, you pick either one of “I think X is Y” or “X seems like it’s Y”. There is a difference in meaning, but often one of the latter feels sufficient to convey what I wanted to say anyway.
This is part of a broader sense I have that hedging serves an important purpose but is also obstructive to good writing, especially concision, and the fact that it’s a particular feature of EA/rat writing can be alienating to other audiences, even though I think it comes from a self-awareness / self-critical instinct that I think is a positive feature of the community.
I was just thinking about this a few days ago when I was flying for the holidays. Outside the plane was a sign that said something like
Warning: Jet fuel emits chemicals that may increase the risk of cancer.
And I was thinking about whether this was a justified double-hedge. The author of that sign has a subjective belief that exposure to those chemicals increases the probability that you get cancer, so you could say “may give you cancer” or “increases the risk of cancer”. On the other hand, perhaps the double-hedge is reasonable in cases like this because there’s some uncertainty about whether a dangerous thing will cause harm, and there’s also uncertainty about whether a particular thing is dangerous, so I suppose it’s reasonable to say “may increase the risk of cancer”. It means “there is some probability that this increases the probability that you get cancer, but also some probability that it has no effect on cancer rates.”
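One way to make the two layers of uncertainty concrete is a small calculation. This is only a sketch with made-up numbers (nothing here is real data about jet fuel), and the function name and parameters are my own. It shows that, for someone who just wants a single subjective probability, the double hedge collapses into one smaller risk increase, while the sign’s wording keeps the two sources of uncertainty visible.

```python
def overall_risk(baseline_risk, p_harmful, added_risk_if_harmful):
    """Probability of cancer, averaging over whether the exposure is harmful at all."""
    risk_if_harmful = baseline_risk + added_risk_if_harmful
    risk_if_harmless = baseline_risk
    return p_harmful * risk_if_harmful + (1 - p_harmful) * risk_if_harmless


# Illustrative numbers only (assumptions, not measurements).
baseline = 0.10

# Single hedge: we are sure the chemical is harmful, and it adds 2 points of risk.
single_hedge = overall_risk(baseline, p_harmful=1.0, added_risk_if_harmful=0.02)

# Double hedge: we are only 50% sure it is harmful at all.
double_hedge = overall_risk(baseline, p_harmful=0.5, added_risk_if_harmful=0.02)

print(single_hedge, double_hedge)
```

Under these assumptions the double-hedged case is equivalent to a certain but smaller increase (about one point instead of two), which is arguably why “may increase the risk” and “increases the risk” can feel interchangeable even though they describe different states of knowledge.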
It sounds like there’s been a licensing change allowing provision of the vaccine outside the NHS as of March 2024 (ish). Pharmadoctor is a company that supplies pharmacies and has been putting about the word that they’ll soon be able to supply them with vaccine doses for private sale—most media coverage I found names them specifically. However, the pharmacies themselves are responsible for setting the price and managing bookings or whatever. All Pharmadoctor does for the end user is tell you which pharmacies they are supplying and give you the following pricing guidance:
Nuvaxovid XBB.1.5 (Novavax) £45-£55 (update: estimated availability from w/c 22/04/2024)
Some places offering bookings:
Rose Pharmacy (Deptford, London) replied to my e-mail on 21st March saying they would offer Pfizer for £80 and later in April said Novavax for £50.
JP Pharmacy (Camden High St, London) offers Pfizer for £85
Fleet Street Clinic (London), £95 “initial price” for the updated Pfizer vaccine.
Doctorcall (at-home service), which vaccine not specified, £90 “in addition to the cost of the visit” which seems to be from £195.
I’ve found that most pharmacies on Pharmadoctor’s Find a Pharmacy button have little or no web presence and often don’t explicitly own up to offering private COVID jabs. I’ve e-mailed a couple to see what they say. Here’s a list of pharmacies I’ve tried but not heard from, mostly for my own records:
Today I got a dose of Novavax for free, largely by luck that’s probably not reproducible.
It turns out that vials of Novavax contain 5 doses and only last a short time, I think for 24 hours. Pharmacies therefore need to batch bookings together, and I guess someone got tired of waiting and opted to just buy the entire vial for themselves, letting whoever wanted them pick up the other doses. I then found out about this via Rochelle Harris, who in turn found out about it via a Facebook group (UK Novavax Vaccine info) for coordinating these things.
I’ve been linked to The benefits of Novavax explained which is optimistic about the strengths of Novavax, suggesting it has the potential to offer longer-term protection, and protection against variants as well.
I think the things the article says or implies about pushback from mRNA vaccine supporters seem unlikely to me—my guess is that in aggregate Wall Street benefits much more from eliminating COVID than it does from selling COVID treatments, though individual pharma companies might feel differently—but they seem like the sort of unlikely thing that someone who had reasonable beliefs about the science but spent too much time arguing on Twitter might end up believing. Regardless, I’m left unsure how to feel about its overall reliability, and would welcome thoughts one way or the other.
The convention in a lot of public writing is to mirror the style of writing for profit, optimized for attention. In a co-operative environment, you instead want to optimize to convey your point quickly, to only the people who benefit from hearing it. We should identify ways in which these goals conflict; the most valuable pieces might look different from what we think of when we think of successful writing.
Consider who doesn’t benefit from your article, and if you can help them filter themselves out.
Consider how people might skim-read your article, and how to help them derive value from it.
Lead with the punchline – see if you can make the most important sentence in your article the first one.
Some information might be clearer in a non-discursive structure (like… bullet points, I guess).
Writing to persuade might still be best done discursively, but if you anticipate your audience already being sold on the value of your information, just present the information as you would if you were presenting it to a colleague on a project you’re both working on.
Agree that there’s a different incentive for cooperative writing than for clickbait-y news in particular. And I agree with your recommendations. That said, I think many community writers may undervalue making their content more goddamn readable. Scott Alexander is verbose and often spends paragraphs getting to the start of his point, but I end up with a better understanding of what he’s saying by virtue of being fully interested.
All in all though, I’d recommend people try to write like Paul Graham more than either Scott Alexander or an internal memo. He is in general more concise than Scott and more interesting than a memo.
Not super EA relevant, but I guess relevant inasmuch as Moskovitz funds us and Musk has in the past too. I think if this were just some random commentator I wouldn’t take it seriously at all, but I’m a bit more inclined to believe Dustin will take some concrete action. I’m not sure I’ve read everything he’s said about it, as I’m not used to how Threads works.
The “non-tweet” feels vague and unsubstantiated (at this point anyway). I hope we’ll get a full article and explanation as to what he means exactly because obviously he’s making HUGE calls.
Though betting money is a useful way to make epistemics concrete, sometimes it introduces considerations that tease apart the bet from the outcome and probabilities you actually wanted to discuss. Here are some circumstances when it can be a lot more difficult to get the outcomes you want from a bet:
When the value of money changes depending on the different outcomes,
When the likelihood of people being able or willing to pay out on bets changes under the different outcomes.
As an example, I saw someone claim that the US was facing civil war. Someone else thought this was extremely unlikely, and offered to bet on it. You can’t make bets on this! The value of the payout varies wildly depending on the exact scenario (are dollars lifesaving or worthless?), and more to the point the last thing on anyone’s minds will be internet bets with strangers.
In general, you can’t make bets about major catastrophes (leaving aside the question of whether you’d want to), and even with non-catastrophic geopolitical events, the bet you’re making may not be the one you intended to make, if the value of money depends on the result.
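To illustrate how far the bet can drift from the probability you meant to discuss, here is a rough sketch. It assumes a simple expected-value model in which a dollar can be worth more or less depending on the outcome, and in which the counterparty may not pay if the event happens. The function name, parameters, and numbers are all hypothetical.

```python
def break_even_credence(win, lose, value_if_event=1.0,
                        value_if_no_event=1.0, p_paid_if_event=1.0):
    """Credence in the event above which betting that it WILL happen has positive expected value.

    win: dollars received if the event happens (and the counterparty pays).
    lose: dollars paid out if the event does not happen.
    value_if_event / value_if_no_event: what a dollar is worth to you in each world.
    p_paid_if_event: chance the counterparty actually pays out in the event world.
    """
    gain = p_paid_if_event * win * value_if_event
    loss = lose * value_if_no_event
    if gain + loss <= 0:
        raise ValueError("the bet has no stakes in either world")
    # Expected value is p * gain - (1 - p) * loss, which is positive iff p > loss / (gain + loss).
    return loss / (gain + loss)


# An ordinary even-odds bet: you should take it if your credence is above 50%.
naive = break_even_credence(win=100, lose=100)

# Civil-war style bet (assumed numbers): dollars are worth a tenth as much if it
# happens, and there is only a 20% chance anyone honours an internet bet.
catastrophe = break_even_credence(win=100, lose=100,
                                  value_if_event=0.1, p_paid_if_event=0.2)

print(naive, catastrophe)
```

With these assumed numbers, someone would need to be roughly 98% confident before the even-odds bet is worth taking, so declining it says very little about what they actually believe. If dollars were instead lifesaving in the event world (a `value_if_event` above 1), the distortion would run the other way.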
A related idea is that you can’t sell (or buy) insurance against scenarios in which insurance contracts don’t pay out, including most civilizational catastrophes, which can make it harder to use traditional market methods to capture the potential gains from (say) averting nuclear war. (Not impossible, but harder!)
After reading this I thought that a natural next step for the self-interested rational actor that wants to short nuclear war would be to invest in efforts to reduce its likelihood, no? Then one might simply look at the yearly donation numbers of a pool of such efforts.
This is an excellent point, I agree. You’re absolutely right that they could argue that and that reputational risks should be considered before such a strategy is adopted. And even though it is perfectly legal to lobby for your own positions / stock, lobbying for shorts is usually more morally laden in the eyes of the public (there is in fact evidence that people react very strongly to this).
However, I think if someone were to mount the criticism of having ulterior motives, then there is a counterargument to show that this criticism is ultimately misguided:
If the market is efficient, then the valuation of an industry will have risks that could be created easily through lobbying priced in. In other words, if the high valuation of Big Tobacco were dependent on someone not doing a relatively cheap lobbying campaign for tobacco taxes, then shorting it would make sense for socially neutral investors with no altruistic motives, and thus it should already be done.
Thus, this strategy would only work for a truly altruistic agent, who will ultimately lose money in the process and only get a discount on their philanthropic investment. In other words, the investment in the lobbying should likely be higher than the profit from the short. And so, it would be invalid to say that someone using this strategy would have ulterior motives. But yes again, I take your point that this subtle point might get lost and it will end up being a PR disaster.
I don’t buy your counterargument exactly. The market is broadly efficient with respect to public information. If you have private information (e.g. that you plan to mount a lobbying campaign in the near future; or private information about your own effectiveness at lobbying) then you have a material advantage, so I think it’s possible to make money this way. (Trading based on private information is sometimes illegal, but sometimes not, depending on what the information is and why you have it, and which jurisdiction you’re in. Trading based on a belief that a particular industry is stronger / weaker than the market perceives it to be is surely fine; that’s basically what active investors do, right?)
(Some people believe the market is efficient even with respect to private information. I don’t understand those people.)
However, I have my own counterargument, which is that the “conflict of interest” claim seems just kind of confused in the first place. If you hear someone criticizing a company, and you know that they have shorted the company, should that make you believe the criticism more or less? Taking the short position as some kind of fixed background information, it clearly skews incentives. But the short position isn’t just a fixed fact of life: it is itself evidence about the critic’s true beliefs. The critic chose to short and criticize this company and not another one. I claim the short position is a sign that they do truly believe the company is bad. (Or at least that it can be made to look bad, but it’s easiest to make a company look bad if it actually is.) In the case where the critic does not have a short position, it’s almost tempting to ask why not, and wonder whether it’s evidence they secretly don’t believe what they’re saying.
All that said, I agree that none of this matters from a PR point of view. The public perception (as I perceive it) is that to short a company is to vandalize it, basically, and probably approximately all short-selling is suspicious / unethical.
Agreed, but I don’t think there’s a big market inefficiency here with risk-adjusted above market rate returns. Of course, if you do research to create private information then there should be a return to that research.
Trading based on private information is sometimes illegal, but sometimes not, depending on what the information is and why you have it, and which jurisdiction you’re in. [...]
Hmm, I was going to mention mission hedging as the flipside of this, but then noticed the first reference I found was written by you :P
For other interested readers, mission hedging is where you do the opposite of this and invest in the thing you’re trying to prevent: invest in tobacco companies as an anti-smoking campaigner, invest in the coal industry as a climate change campaigner, etc. The idea being that if those industries start doing really well for whatever reason, your investment will rise, giving you extra money to fund your countermeasures.
I’m sure if I thought about it for a bit I could figure out when these two mutually contradictory strategies look better or worse than each other. But mostly I don’t take either of them very seriously most of the time anyway :)
I think these strategies can actually be combined:
A patient philanthropist sets up their endowment according to mission hedging principles.
I’ve been reviewing some old Forum posts for an upcoming post I’m writing, and incidentally came across this by Howie Lempel for noticing in what spirit you’re engaging with someone’s ideas:
“Did I ask this question because I think they will have a good answer or because I think they will not have a good answer?”
I felt pretty called out :P
To be fair, I think the latter is sometimes a reasonable persuasive tactic, and it’s fine to put yourself in a teaching role rather than a learning role if that’s your endorsed intention and the other party is on board. But the value of this quote to me is that it successfully highlights how easily we can tell ourselves we’re being intellectually curious, when we’re actually doing something else.
Ideas of posts I could write in comments. Agreevote with things I should write. Don’t upvote them unless you think I should have karma just for having the idea, instead upvote the post when I write it :P
Feel encouraged also to comment with prior art in cases where someone’s already written about something. Feel free also to write (your version of) one of these posts, but give me a heads-up to avoid duplication :)
(some comments are upvoted because I wrote this thread before we had agreevotes on every comment; I was previously removing my own upvotes on these but then I learned that your own upvotes don’t affect your karma score)
Something to try to dispel the notion that every EA thinker is respected / thought highly of by every EA community member. Like, you tend to hear strong positive feedback, weak positive feedback, and strong negative feedback, but weak negative feedback is kind of awkward and only comes out sometimes.
I would really like this. I’ve been thinking a bunch about whether it would be better if we had slightly more bridgewater-ish norms on net (I don’t know the actual structure that underlies that and makes it work), where we’re just like yeah, that person has these strengths, these weaknesses, these things people disagree on, they know it too, it’s not a deep dark secret.
something about the role of emotions in rationality and why the implicit / perceived Forum norm against emotions is unhelpful, or at least not precisely aimed
(there’s a lot of nuance here, I’ll put it in dw)
edit: I feel like the “notice your confusion” meme is arguably an example of emotional responses providing rational value.
the forum should not have a norm against emotional expression
is two separate posts. I’ll probably write it as two posts, but feel free to agree/disagree on this comment to signal that you do/don’t want two posts. (One good reason to want two posts is if you only want to read one of them.)
Take a list of desirable qualities of a non-profit board (either Holden’s or another that was posted recently) and look at some EA org boards and do some comparison / review their composition and recent activity.
edit: I hear Nick Beckstead has written about this too
I was surprised to hear anyone claim this was an applause light. My prediction was that many people would hate this idea, and, well, at time of writing the karma score stands at −2. Sure doesn’t seem like I’m getting that much applause :)
I think the optimal number of most bad things is zero, and it’s only not zero when there’s a tradeoff at play. I think most people will agree in the abstract that there’s a tradeoff between stopping bad actors and sometimes punishing the innocent, but they may not concretely be willing to accept some particular costs in the kind of abusive situations we’re faced with at the moment. So, were I to write a post about this, it would be trying to encourage people to more seriously engage with flawed systems of abuse prevention, to judge how their flaws compare to the flaws in doing nothing.
I post about the idea here partly to get a sense of whether this unwillingness to compromise rings true for anyone else as a problem we might have in these discussions. So far, it hasn’t got a lot of traction, but maybe I’ll come back to it if I see more compelling examples in the wild.
Assuming both false-positives and false-negatives exist at meaningful rates and the former cannot be zeroed while keeping an acceptable FN rate, this seems obviously true (at least to me) and only worthy of a full post if you’re willing to ponder what the balance should be.
ETA: An edgy but theoretically interesting argument is that we should compensate the probably-guilty for the risk of error. E.g., if you are 70 percent confident the person did it, boot them but compensate them 30 percent of the damages that would be fair if they were innocent. The theory would be that a person may be expected to individually bear a brutal cost (career ruin despite innocence), but the benefit (of not allowing people who are 70 percent likely to be guilty be running around in power) accrues to the community from which the person has been booted. So compensation for risk that the person is innocent would transfer some of the cost of providing that benefit to the community. I’m not endorsing that as a policy proposal, mind you...
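The arithmetic of that proposal is simple enough to write down. This is only a sketch of the rule as stated in the comment above (compensation scales with the probability of error); the function name and the damages figure are hypothetical, and nothing here says what "fair damages" should be.

```python
def error_risk_compensation(confidence_guilty, damages_if_innocent):
    """Compensation under the hypothetical rule: pay the booted person the
    share of damages matching the chance that the judgement was wrong."""
    if not 0.0 <= confidence_guilty <= 1.0:
        raise ValueError("confidence_guilty must be between 0 and 1")
    probability_of_error = 1.0 - confidence_guilty
    return probability_of_error * damages_if_innocent


# 70% confident, and (hypothetically) 100,000 in damages if they were innocent:
# the community would owe roughly 30,000.
owed = error_risk_compensation(0.7, 100_000)
print(round(owed))
```

Note that the payment goes up as confidence goes down, so the community bears more of the cost exactly when it acts on weaker evidence.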
I think “human-level” is often a misleading benchmark for AI, because we already have AIs that are massively superhuman in some respects and substantially subhuman in others. I sometimes worry that this is leading people to make unwarranted assumptions of how closely future dangerous AIs will track humans in terms of what they’re capable of. This is related to a different post I’m writing, but maybe deserves its own separate treatment too.
A problem with a lot of AI thoughts I have is that I’m not really in enough contact with the AI “mainstream” to know what’s obvious to them or what’s novel. Maybe “serious” AI people already don’t say human-level, or apply a generous helping of “you know what I mean” when they do?
I have an intuition that the baseline average for institutional dysfunction is quite high, and I think I am significantly less bothered by negative news about orgs than many people because I already expect the average organisation (from my experience both inside and outside EA) to have a few internal secrets that seem “shockingly bad” to a naive outsider. This seems tricky to communicate / write about because my sense of what’s bad enough to be worthy of action even relative to this baseline is not very explicit, but maybe something useful could be said.
Things I’ve learned about good mistake culture, no-blame post-mortems, etc. This is pretty standard stuff without a strong EA tilt so I’m not sure it merits a place on the forum, but it’s possible I overestimate how widely known it is, and I think it’s important in basically any org culture.
Something contra “excited altruism”: lots of our altruistic opportunities exist because the world sucks and it’s ok to feel sad about that and/or let down by people who have failed to address it.
Encouraging people to take community health interventions into their own hands. Like, ask what you wish someone in community health would do, and then consider just doing it. With some caveats for unilateralist curse risks.
I think the forum would be better if people didn’t get hit so hard by negative feedback, or by people not liking what they have to say. I don’t know how to fix this with a post, but at least arguing the case might have some value.
I think the forum would be even better if people were much kinder and more empathic when giving negative feedback. (I think we used to be better at this?) I find it very difficult to not get hit hard by negative feedback that’s delivered in a way that makes it clear they’re angry with me as a person; I find it relatively easy to not get upset when I feel like they’re not being adversarial. I also find it much easier to learn how to communicate negative feedback in a more considerate way than to learn how to not take things personally. I suspect both of these things are pretty common and so arguing the case for being nicer to each other is more tractable?
Assessments of non-AI x-risk are relevant to AI safety discussions because some of the hesitance to pause or slow AI progress is driven by a belief that it will help eliminate other threats if it goes well.
I tend to believe that risk from non-AI sources is pretty low, and I’m therefore somewhat alarmed when I see people suggest or state relatively high probabilities of civilisational collapse without AI intervention. Could be worth trying to assess how widespread this view is and trying to argue directly against it.
my other quick take, AI Safety Needs To Get Serious About Chinese Political Culture is basically a post idea, but it was substantial enough I put it at the top level rather than have it languish in the comments here. Nevertheless, here it is so I can keep all the things in one place.
“ask not what you can do for EA, but what EA can do for you”
like, you don’t support EA causes or orgs because they want you to and you’re acquiescing, you support them because you want to help people and you believe supporting the org will do that – when you work an EA job, instead of thinking “I am helping them have an impact”, think “they are helping me have an impact”
of course there is some nuance in this but I think broadly this perspective is the more neglected one
If everyone has no idea what other people are funding and instead just donates a scaled down version of their ideal community-wide allocation to everything, what you get is a wealth-weighted average of everyone’s ideal portfolios. Sometimes this is an okay outcome. There’s some interesting dynamics to write about here, but equally I’m not sure it leads to anything actionable.
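Here is a minimal sketch of that claim, assuming each donor simply splits their own budget according to their ideal community-wide shares. The donors, budgets and cause names are made up for illustration.

```python
def aggregate_allocation(donors):
    """donors: list of (budget, ideal_portfolio) pairs, where ideal_portfolio
    maps cause -> share and the shares for each donor sum to 1.

    Every donor gives budget * share to each cause, ignoring everyone else.
    Returns the resulting community-wide share of funding per cause.
    """
    total_budget = sum(budget for budget, _ in donors)
    totals = {}
    for budget, portfolio in donors:
        for cause, share in portfolio.items():
            totals[cause] = totals.get(cause, 0.0) + budget * share
    return {cause: amount / total_budget for cause, amount in totals.items()}


# Hypothetical community: one large donor and two small ones.
donors = [
    (900, {"cause_a": 0.5, "cause_b": 0.5}),
    (50, {"cause_a": 1.0}),
    (50, {"cause_c": 1.0}),
]
result = aggregate_allocation(donors)
print(result)
```

The outcome sits very close to the large donor's ideal split, because each portfolio is weighted by the budget behind it. Whether that is okay depends on whether you think wealth should be the weighting.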
I’d like to write something about my skepticism of for-profit models of doing alignment research. I think this is a significant part of why I trust Redwood more than Anthropic or Conjecture.
(This could apply to non-alignment fields as well, but I’m less worried about the downsides of product-focused approaches to (say) animal welfare.)
That said, I would want to search for existing discussion of this before I wade into it.
A related but distinct point is that the disvalue of anonymous rumours is in part a product of how people react to them. Making unfounded accusations is only harmful to the extent that people believe them uncritically. There’s always some tension there but we do IMO collectively have some responsibility to react to rumours responsibly, as well as posting them responsibly.
I’d love it if it could include something on the disvalue of rumours too? (My inside view is that I’d like to see a lot less gossip, rumours etc in EA. I may be biased by substantial personal costs that I and friends have experienced from false rumours, but I also think that people positively enjoy gossip and exaggerating gossip for a better story and so we generally want to be pushing back on that usually net-harmful incentive.)
I’ve been working in software development and management for about 10 years, but I’m currently on a break while I unwind a little and try some directions out before immersing myself in full time work again. I’m open to people using my technical skills:
either as a paid contractor or volunteering, depending on how much I like you / the work
over relatively short time commitments, e.g.:
we talk for 1-4 hours about something you’re working on and I give you my thoughts or advice,
you have some open-source (or open-to-me at least) project that you’d be interested in me looking at, or some problem with it you’re stuck on and you’d appreciate another pair of eyes on (either pair programming or I investigate by myself),
you have some longer (say 2-4 week) project that you think someone could hammer out that would subsequently require little to no maintenance / could be set up to be maintained by someone else non- / semi-technical.
My expertise is pretty broad, and I think it’s fair to guess I can pick up anything reasonably quickly. I’ve covered the broad domains of frontend / backend / Linux command-line / sysadmin / infrastructure-as-code—if you want more details, just ask, or look at LinkedIn or GitHub.
I’m also interested in talking to people who have done contracting work:
of this kind, to talk about what your technical experience was like,
or, in the UK, in any field, to talk to me about the administrative stuff / invoicing / tax treatment / etc.
In all cases, feel free to DM me to talk about it, or tell me who I should talk to.
Also, if you have some thing where you think “this doesn’t sound like it meets the above criteria, but I bet Ben could help me with it anyway”, I’m happy to hear your pitch :)
I think at previous EAGs I always had the sense that I had a “budget” of 1-on-1s I could schedule before I’d be too exhausted. I’d often feel very tired towards the end of the second day, which I took as validation that I indeed needed to moderate.
This EAG, I:
scheduled 1-on-1s in nearly every slot I could over the Saturday / Sunday (total of 24-ish?)
still had plenty of social energy at the end (although definitely felt a more intellectual exhaustion).
I think it’s very possible this is a coincidence, that this is because of other ways I’ve happened to change over the last year, or because of circumstances around the conference that I didn’t notice were relevant
but
it also seems possible that I was wrong about 1-on-1s being costly for me? I think that actually my most socially challenging experiences at EAGs have often been the ones where I feel at a loose end, wishing for some serendipitous meeting with someone who happens to want to talk, monitoring the people around me to figure out who would welcome the company and who would rather be left alone. Feeling like the time I have at the event is valuable, and worrying that I’m wasting it.
In comparison, during 1-on-1s, I know the other person wants to be there, I know a bit about what they want from me or what I’m trying to get, so I get to shelve all the ambiguity and just dispense or receive opinions or leads or whatever. It’s very straightforward, and for me that’s much less stressful.
My EAG strategy going forward is going to be to try harder to fill space as much as reasonably possible. (I think this has also become easier over time as the event has become larger.) As things worked out this time, I had an empty slot every 4 meetings or so, which was probably about the right amount of time to make notes that I hadn’t made in the meetings themselves and remind myself of what was coming next.
That said, I think a perfect event would have involved a few more random encounters with people I knew, with whom I didn’t really have much to talk about, but could spend 5 minutes saying “hi how are things hope you’re well”. Sorry to those I didn’t see!
I’m going to make a quick take thread of EA-relevant software projects I could work on. Agree / disagree vote if you think I should / should not do some particular project.
Tools for shaping probability intuitions. You can give a bunch of events, causal relationships or implications between them, and probabilities for each, or their conjunctions, or conditional probabilities for such things. The tool will infer what you don’t supply to the extent possible, and will point out contradictions in your conditional vs. absolute probabilities, and give you recommendations for how to resolve them.
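As a rough sketch of the smallest version of this, the function below handles just two events A and B. It infers a missing conditional where it can and flags contradictions between the conditional and absolute probabilities. The function name, inputs and messages are all hypothetical, and it makes no attempt at the harder parts (many events, causal structure, recommending a resolution).

```python
def check_pair(p_a, p_b, p_a_given_b=None, p_b_given_a=None, tol=1e-9):
    """Return (inferred, problems) for stated probabilities about A and B."""
    problems = []
    joint_estimates = []
    if p_a_given_b is not None:
        joint_estimates.append(("P(A|B) * P(B)", p_a_given_b * p_b))
    if p_b_given_a is not None:
        joint_estimates.append(("P(B|A) * P(A)", p_b_given_a * p_a))

    # Both routes to P(A and B) must agree (this is just Bayes' rule).
    if len(joint_estimates) == 2:
        (name1, j1), (name2, j2) = joint_estimates
        if abs(j1 - j2) > tol:
            problems.append(f"{name1} = {j1:.3f} but {name2} = {j2:.3f}")

    # P(A and B) can never exceed P(A) or P(B), nor fall below P(A) + P(B) - 1.
    lower, upper = max(0.0, p_a + p_b - 1.0), min(p_a, p_b)
    for name, joint in joint_estimates:
        if joint < lower - tol or joint > upper + tol:
            problems.append(
                f"{name} = {joint:.3f} is outside [{lower:.3f}, {upper:.3f}]"
            )

    # With exactly one consistent conditional, fill in what was not supplied.
    inferred = {}
    if len(joint_estimates) == 1 and not problems:
        joint = joint_estimates[0][1]
        inferred["P(A and B)"] = joint
        if p_a_given_b is None and p_b > 0:
            inferred["P(A|B)"] = joint / p_b
        if p_b_given_a is None and p_a > 0:
            inferred["P(B|A)"] = joint / p_a
    return inferred, problems


# Consistent input: the missing P(B|A) gets inferred.
ok_inferred, ok_problems = check_pair(0.5, 0.2, p_a_given_b=0.5)

# Contradictory input: P(A|B) = 0.9 with P(B) = 0.6 implies P(A and B) = 0.54,
# which cannot be larger than P(A) = 0.3.
bad_inferred, bad_problems = check_pair(0.3, 0.6, p_a_given_b=0.9)
print(ok_inferred, bad_problems)
```

A real tool would presumably need a proper constraint solver to scale beyond pairs, which is part of why it is worth checking what existing work already covers.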
Have you considered talking/working with Sage on this? It sounds like something that would fit well with the other tools on https://www.quantifiedintuitions.org/
Thanks for the link! I’m sure there’s a tonne of existing work in this area, and haven’t really evaluated to what extent this is already covered by it.
Automated interface between Twitter and the Forum (e.g. a bot that, when tagged on Twitter, posts the text and image of a tweet on Quick Takes and vice versa)
For Pause AI or Stop AI to succeed, pausing / stopping needs to be a viable solution. I think some AI capabilities people who believe in existential risk may (perhaps?) be motivated by the thought that the risk of civilisational collapse is high without AI, so it’s worth taking the risk of misaligned AI to prevent that outcome.
If this really is cruxy for some people, it’s possible this doesn’t get noticed because people take it as a background assumption and don’t tend to discuss it directly, so they don’t realize how much they disagree and how crucial that disagreement is.
People talk about AI resisting correction because successful goal-seekers “should” resist their goals being changed. I wonder if this also acts as an incentive for AI to attempt takeover as soon as it’s powerful enough to have a chance of success, instead of (as many people fear) waiting until it’s powerful enough to guarantee it.
Hopefully the first AI powerful enough to potentially figure out that it wants to seize power and has a chance of succeeding is not powerful enough to passively resist value change, so acting immediately will be its only chance.
AI Safety Needs To Get Serious About Chinese Political Culture
I worry that Leopold Aschenbrenner’s “China will use AI to install a global dystopia” take is based on crudely analogising the CCP to the USSR, or perhaps even to American cultural imperialism / expansionism, and isn’t based on an even superficially informed analysis of either how China is currently actually thinking about AI, or what China’s long term political goals or values are.
I’m no more of an expert myself, but my impression is that China is much more interested in its own national security interests and its own ideological notions of the ethnic Chinese people and Chinese territory, so that beyond e.g. Taiwan there isn’t an interest in global domination except to the extent that it prevents them being threatened by other expansionist powers.
This or a number of other heuristics / judgements / perspectives could change substantially how we think about whether China would race for AGI, and/or be receptive to an argument that AGI development is dangerous and should be suppressed. China clearly has a lot to gain from harnessing AGI, but they have a lot to lose too, just like the West.
Currently, this is a pretty superficial impression of mine, so I don’t think it would be fair to write an article yet. I need to do my homework first:
I need to actually read Leopold’s own writing about this, instead of making impressions based on summaries of it,
I’ve been recommended to look into what CSET and Brian Tse have written about China,
Perhaps there are other things I should hear about this, feel free to make recommendations.
Alternatively, as always, I’d be really happy for someone who’s already done the homework to write about this, particularly anyone specifically with expertise in Chinese political culture or international relations. Even if I write the article, all it’ll really be able to be is an appeal to listen to experts in the field, or for one or more of those experts to step forward and give us some principles to spread in how to think clearly and accurately about this topic.
I think having even like, undergrad-level textbook mainstream summaries of China’s political mission and beliefs posted on the Forum could end up being really valuable if it puts those ideas more in the cultural and intellectual background of AI safety people in general.
This seems like a really crucial question that inevitably takes a central role in our overall strategy, and Leopold’s take isn’t the only one I’m worried about. I think people are already pushing national security concerns about China to the US Government in an effort to push e.g. stronger cybersecurity controls or export controls on AI. I think that’s a noble end but if the China angle becomes inappropriately charged we’re really risking causing more harm than good.
(For the avoidance of doubt, I think the Chinese government is inhumane, and that all undemocratic governments are fundamentally illegitimate. I think exporting democracy and freedom to the world is a good thing, so I’m not against cultural expansionism per se. Nevertheless, assuming China wants to do it when they don’t could be a really serious mistake.)
I recommend the China sections of this recent CNAS report as a starting point for discussion (it’s definitely from a relatively hawkish perspective, and I don’t think of myself as having enough expertise to endorse it, but I did move in this direction after reading).
From the executive summary:
From the “Deficient Safety Cultures” section:
From “Further Considerations”
Also, unless one understands the Chinese situation, one should avoid moves that risk escalating a race, like making loud and confident predictions that a race is the only way.
I think it’s better for people to openly express their models that they see a race as the only option. I think it’s the kind of thing that can then lead to arguments and discourse about whether that’s true or not. I think a huge amount of race dynamics stem from people being worried that other people might or might not be intending to race, or are hiding their intention to race, and so I am generally strongly in favor of transparency.
Fair, I’m grumpy about Leopold’s position but my above comment wasn’t careful to target the real problems and doesn’t give a good general rule here.
For those who are not deep China nerds but want a somewhat approachable lowdown, I can highly recommend Bill Bishop’s newsletter Sinocism (enough free issues to be worthwhile) and his podcast Sharp China (the latter is a bit more approachable but requires a subscription to Stratechery).
I’m not a China expert so I won’t make strong claims, but I generally agree that we should not treat China as an unknowable, evil adversary who has exactly the same imperial desires as ‘the west’ or past non-Western regimes. I think it was irresponsible of Aschenbrenner to assume this without better research & understanding, since so much of his argument relies on China behaving in a particular way.
I share your concerns. I spent a decade in China, and I can’t count the number of times I’ve seen people confidently share low-quality or inaccurate perspectives on China. I wish that I had a solution better than “assign everyone to read these [NUMBER] different books.”
Even best-selling books and articles by well-respected writers sometimes have misleading and inaccurate narratives in them. But it is hard to parse them critically and to provide a counterargument without both the appropriate background[1], and a large number of hours dedicated to the specific effort.
I would be surprised if someone is able to do so without at least an undergraduate background in something like Chinese studies/sinology (or the equivalent, such as a large amount of self-study and independent exploration).
This reading list is an excellent place to start for getting a sense of China x AI (though it doesn’t have that much about China’s political objectives in general).
Note that you should also understand a) how the US government sees China and why, and b) how China sees the US and why, in order to be able to have a full analysis here.
Very good point. I hypothesize that the opaque nature of Chinese policy-making (at the national level, setting aside lower-level government) is a key difficulty for anyone outside the upper levels of the Chinese government.
People often propose HR departments as antidotes to some of the harm that’s done by inappropriate working practices in EA. The usual response is that small organisations often have quite informal HR arrangements even outside of EA, which does seem kinda true.
Another response is that it sometimes seems like people have an overly rosy picture of HR departments. If your corporate culture sucks then your HR department will defend and uphold your sucky corporate culture. Abusive employers will use their HR departments as an instrument of their abuse.
Perhaps the idea is to bring more mainstream HR practices or expertise into EA employers, rather than merely going through the motions of creating the department. But I think mainstream HR comes primarily from the private sector and is primarily about protecting the employer, often against the employee. They often cast themselves in a role of being there to help you, but a common piece of folk wisdom is “HR is not your friend”. I think frankly that a lot of mainstream HR culture is at worst dishonest and manipulative, and I’d be really sad to see us uncritically importing more of that.
I feel at least somewhat qualified to speak on this, having read a bunch about human resources, being active in an HR professionals chat group nearly every day, and having worked in HR at a few different organizations (so I have seen some of the variance that exists). I hope you’ll forgive me for my rambling on this topic, as there are several different ideas that came to mind when reading your paragraphs.
The first thing is that I agree with you on at least one aspect: rather than merely creating a department and walking away, adopting and adapting best practices and relevant expertise would be more helpful. If the big boss is okay with [insert bad behavior here] and isn’t open to the HR Manager’s new ideas, then the organization probably isn’t going to change. If an HR department is defending and upholding sucky corporate culture, that is usually because senior leadership is instructing them to do so. Culture generally comes from the top. And if the leader isn’t willing to be convinced by or have his mind changed by the new HRO he hired, then things probably won’t be able to get much better.[1]
“HR is not your friend” is normally used to imply that you can’t trust HR, or that HR is out to get you, or something like that. Well, in a sense it is true that “HR is not your friend.” If you are planning to jump ship, don’t confide in the HR manager about it trusting that they won’t take action. If that person has a responsibility to take action on the information you provide, you should think twice before volunteering that information and consider if the action is beneficial to you or not. The job of the people on an HR team (just like the job of everyone else employed by an organization) is to help the organization achieve its goals. Sometimes that means pay raises for everyone, because the salaries aren’t competitive and the company wants to have low attrition. Sometimes that means downsizing, because growth forecasts were wrong and the company over-hired. The accountant is also not your friend, nor is the janitor, nor is the marketing executive, nor is any other role at the organization. So I guess what I am getting at here is HR is not really more your friend or less your friend than any other department, but HR is the only department that carries out actions that might adversely affect employees. And note that just because HR carries out the actions, doesn’t mean HR made the decision or put the company in that situation; this is shooting the messenger.
While it may be true that in some organizations and for some people HR is “primarily about protecting the employer, often against the employee,” I’m skeptical that this is representative of people who do HR work more generally. On the one hand, yes, the job is to help the organization achieve its goals. But when talking about the individuals that work in HR, when this topic comes up among HR people the general reaction is along the lines of “I want to do as much as I can for the employees, and the boundaries limiting me are from upper management. I want to give our staff more equitable pay, but leadership doesn’t care that we have high turnover rates. I want to provide parental leave, but the head honcho disagrees. I really do not want to fire John Doe, because it seems unreasonable and unfair and unjust, but this is what leadership has decided.”[2]
The other thought I have about this parallels computer programmers/software engineers/developers and their thoughts on project managers. If you look at online discussions of programmers you will find no shortage of complaints about project managers (and about Scrum, and about agile), and many people writing about how useless their project manager is. But you shouldn’t draw the conclusion that project management isn’t useful. Instead, an alternative explanation is that these programmers are working with project managers that aren’t very skillful, so their impression is biased. Working with a good project manager can be incredibly beneficial. So to leave the parallel and go back to HR, it is easy to find complaints on the internet about bad things that are attributed to HR. I would ask how representative those anecdotes are.
Alternatively, if the leader is simply unaware of some bad things and the new HR manager can bring attention to those things, then improvements are probably on the way. But having HR is not sufficient on its own.
The other common response that tends to come up is to focus on all the things that HR does for the employees, things which are generally framed as limiting the company’s power over employees: No, you can’t pay the employees that little, because it is illegal. No, you can’t fire this person without a documented history of poor performance, and no, scowling at you doesn’t count as poor performance. Yes, you really do need to justify hiring your friend, and him being a ‘great guy’ isn’t enough of a business case. No, it isn’t reasonable to expect staff to be on call for mandatory unpaid overtime every weekend, because we will hemorrhage employees.
I see a lot of this online, but it doesn’t match my personal experience. People working in HR that I’ve been in contact with seem generally kind people, aware of tradeoffs, and generally care about the wellbeing of employees.
I worry that the online reputation of HR departments is shaped by a minority of terrible experiences, and we overgeneralize that to think that HR cannot or will not help, while in my experience they are often really eager to try to help (in part because they don’t want you and others to quit, but also because they are nice people).
Maybe it’s also related to minimum-wage non-skilled jobs vs higher paying jobs, where employment tends to be less adversarial and less exploitative.
I have a broad sense that AI safety thinking has evolved a bunch over the years, and I think it would be cool to have a retrospective of “here are some concrete things that used to be pretty central that we now think are either incorrect or at least incorrectly focused”
Of course it’s hard enough to get a broad overview of what everyone thinks now, let alone what they used to think but discarded.
(this is probably also useful outside of AI safety, but I think it would be most useful there)
I like this. I’ve occasionally thought previously about what value there would be in having a ‘historian.’ There are many things that I took a while to figure out (such as the history/lineage of various ideas and organizations, or why there was a strategy shift from one thing to another thing), as well as the many things which I’ve simply never encountered. I imagine that there are plenty of lessons that can be learned from those.
EA as a community tends to do a better-than-normal job when it comes to writing and sharing retrospectives, but there are lots of things that I don’t understand and that (I think) aren’t easily available. (simplistic example: was asking for randomized controlled trials (or other methods) to demonstrate effectiveness really shockingly revolutionary in development work?)
EA didn’t invent RCTs, or even popularize them within the social sciences, but their introduction was indeed a major change in thinking. Abhijit Banerjee, Esther Duflo and Michael Kremer won the Nobel prize in economics largely for demonstrating the experimental approach to the study of development.
I wonder how the recent turn for the worse at OpenAI should make us feel about e.g. Anthropic and Conjecture and other organizations with a similar structure, or whether we should change our behaviour towards those orgs.
How much do we think that OpenAI’s problems are idiosyncratic vs. structural? If e.g. Sam Altman is the problem, we can still feel good about peer organisations. If instead weighing investor concerns and safety concerns is the root of the problem, we should be worried about whether peer organizations are going to be pushed down the same path sooner or later.
Are there any concerns we have with OpenAI that we should be taking this opportunity to put to its peers as well? For example, have peers been publicly asked if they use non-disparagement agreements? I can imagine a situation where another org has really just never thought to use them, and we can use this occasion to encourage them to turn that into a public commitment.
On (1), these issues seem to be structural in nature, but exploited by idiosyncrasies. In theory, both OpenAI’s non-profit board & Anthropic’s LTBT should perform roughly the same oversight function. In reality, a combination of Sam’s rebellion, Microsoft’s financial domination, and the collective power of the workers shifted the decision to being about whether OpenAI would continue independently with a new board or re-form under Microsoft. Anthropic is just as susceptible to this kind of coup (led by Amazon), but only if their leadership and their workers collectively want it, which, in all fairness, I think they’re a lot less likely to.
But in some sense, no corporate structure can protect against all of the key employees organising to direct their productivity somewhere else. Only a state-backed legal structure really has that power. If you’re worried about some bad outcome, I think you either have to trust that the Anthropic people have good intentions and won’t sell themselves to Amazon, or advocate for legal restrictions on AI work.
If the problem is an employee rebellion, the obvious alternative would be to organize the company in a jurisdiction that allows noncompete agreements?
That’s not as obvious, because the employees probably wouldn’t work in that jurisdiction to begin with, or they’d just move to a competitor in such a jurisdiction. Even in such jurisdictions they’re not as binding as you’d hope!
An industry norm around gardening leave, however, can catch on and play well (companies are concerned about losing their trade secrets). I think it would apply some pressure against such a situation, but it would be possible to engineer similar situations if everyone wanted out of the LTBT (even just not doing the gardening leave and having the new org foot the legal bill)
Say more about Conjecture’s structure?
By that I meant it’s an org doing AI safety which also takes VC capital / has profitmaking goals / produces AI products.
something I persistently struggle with is that it’s near-impossible to know everything that has been said about a topic, and that makes it really hard to know when an additional contribution is adding something or just repeating what’s already been said, or worse, repeating things that have already been refuted
to an extent this seems inevitable and I just have to do my best and sometimes live with having contributed more noise than signal in a particular case, but I feel like I have an internal tuning knob for “say more” vs. “listen more” and I find it really hard to know which direction is overall best
As weird as it sounds, I think the downvote button should make you a bit less concerned with contribution quality. If it’s obviously bad, people will downvote and read it less. If it’s wrong without being obviously bad, then others likely share the same misconception, and hopefully someone steps in to correct it.
In practice, the failure mode for the forum seems to be devoting too much attention to topics that don’t deserve it. If your topic deserves more attention, I wouldn’t worry a ton about accidentally repeating known info? For one thing, it could be valuable spaced repetition. For another, discussions over time can help turn something over and look at it from various angles. So I suppose the main risk is making subject matter experts bored?
In some sense you could consider the signal/noise question separate from the epistemic hygiene question. If you express uncertainty properly, then in theory, you can avoid harming collective epistemics even for a topic you know very little about.
On the current margin, I actually suspect EAs should be deferring less and asking dumb questions more. Specific example: In a world where EA was more willing to entertain dumb questions, perhaps we could’ve discovered AI Pause without Katja Grace having to write a megapost. We don’t want to create “emperor has no clothes” type situations. Right now, “EA is a cult” seems to be a more common outsider critique than “EAs are ignorant and uneducated”.
Using Kialo for debates rather than the Forum would go a long way. It’s hard to get off the ground because its attractiveness to use is roughly proportional to the number of EAs using it, and at present, the number of EAs using it is zero.
https://www.kialo-edu.com/
Something I’m trying to do in my comments recently is “hedge only once”; e.g. instead of “I think X seems like it’s Y”, you pick either one of “I think X is Y” or “X seems like it’s Y”. There is a difference in meaning, but often one of the latter feels sufficient to convey what I wanted to say anyway.
This is part of a broader sense I have that hedging serves an important purpose but is also obstructive to good writing, especially concision, and the fact that it’s a particular feature of EA/rat writing can be alienating to other audiences, even though I think it comes from a self-awareness / self-critical instinct that I think is a positive feature of the community.
I was just thinking about this a few days ago when I was flying for the holidays. Outside the plane was a sign that said something like
And I was thinking about whether this was a justified double-hedge. The author of that sign has a subjective belief that exposure to those chemicals increases the probability that you get cancer, so you could say “may give you cancer” or “increases the risk of cancer”. On the other hand, perhaps the double-hedge is reasonable in cases like this because there’s some uncertainty about whether a dangerous thing will cause harm, and there’s also uncertainty about whether a particular thing is dangerous, so I suppose it’s reasonable to say “may increase the risk of cancer”. It means “there is some probability that this increases the probability that you get cancer, but also some probability that it has no effect on cancer rates.”
I like this as an example of a case where you wouldn’t want to combine these two different forms of uncertainty
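To make the two layers of uncertainty in “may increase the risk of cancer” concrete, here is a minimal sketch. The function name and the numbers are made up purely for illustration; nothing here comes from the sign or from any real risk estimate. It treats “is this chemical harmful at all?” and “how much does it raise your risk if it is?” as separate inputs.

```python
def expected_risk_increase(p_harmful: float, risk_increase_if_harmful: float) -> float:
    """Expected added probability of cancer under two layers of uncertainty.

    p_harmful: hypothetical credence that the chemical is dangerous at all.
    risk_increase_if_harmful: hypothetical added cancer probability if it is.
    With probability (1 - p_harmful) the chemical has no effect, so that
    branch contributes zero to the expectation.
    """
    for value in (p_harmful, risk_increase_if_harmful):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be probabilities between 0 and 1")
    return p_harmful * risk_increase_if_harmful


# Illustrative numbers only: 50% credence the chemical is harmful,
# and a 2 percentage point risk increase if it is.
print(expected_risk_increase(0.5, 0.02))
```

A single hedge would collapse these into one number, which is fine if you only care about the expectation, but it loses the information that the effect might be exactly zero.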
Gathering some notes on private COVID vaccine availability in the UK.
News coverage:
The Pharmacist—Pharmacies can offer private Pfizer Covid jabs from March
Guardian—Pharmacies in England and Scotland to offer private Covid jabs – for £45
It sounds like there’s been a licensing change allowing provision of the vaccine outside the NHS as of March 2024 (ish). Pharmadoctor is a company that supplies pharmacies and has been putting about the word that they’ll soon be able to supply them with vaccine doses for private sale—most media coverage I found names them specifically. However, the pharmacies themselves are responsible for setting the price and managing bookings or whatever. All Pharmadoctor does for the end user is tell you which pharmacies they are supplying and give you the following pricing guidance:
Some places offering bookings:
Rose Pharmacy (Deptford, London) replied to my e-mail on 21st March saying they would offer Pfizer for £80 and later in April said Novavax for £50.
JP Pharmacy (Camden High St, London) offers Pfizer for £85
Fleet Street Clinic (London), £95 “initial price” for the updated Pfizer vaccine.
Doctorcall (at-home service), which vaccine not specified, £90 “in addition to the cost of the visit” which seems to be from £195.
I’ve found that most pharmacies on Pharmadoctor’s Find a Pharmacy button have little or no web presence and often don’t explicitly own up to offering private COVID jabs. I’ve e-mailed a couple to see what they say. Here’s a list of pharmacies I’ve tried but not heard from, mostly for my own records:
Medirex Pharmacy
Murrays Chemist
did not contact House of Mistry because their contact form insists on having a phone number
will edit as I find more, especially any offering £45 jabs
Today I got a dose of Novavax for free, largely by luck that’s probably not reproducible.
It turns out that vials of Novavax contain 5 doses and only last a short time, I think for 24 hours. Pharmacies therefore need to batch bookings together, and I guess someone got tired of waiting and opted to just buy the entire vial for themselves, letting whoever wanted them pick up the other doses. I then found out about this via Rochelle Harris, who in turn found out about it via a Facebook group (UK Novavax Vaccine info) for coordinating these things.
I’ve been linked to The benefits of Novavax explained which is optimistic about the strengths of Novavax, suggesting it has the potential to offer longer-term protection, and protection against variants as well.
I think the things the article says or implies about pushback from mRNA vaccine supporters seem unlikely to me—my guess is that in aggregate Wall Street benefits much more from eliminating COVID than it does from selling COVID treatments, though individual pharma companies might feel differently—but they seem like the sort of unlikely thing that someone who had reasonable beliefs about the science but spent too much time arguing on Twitter might end up believing. Regardless, I’m left unsure how to feel about its overall reliability, and would welcome thoughts one way or the other.
Lead with the punchline when writing to inform
The convention in a lot of public writing is to mirror the style of writing for profit, optimized for attention. In a co-operative environment, you instead want to optimize to convey your point quickly, to only the people who benefit from hearing it. We should identify ways in which these goals conflict; the most valuable pieces might look different from what we think of when we think of successful writing.
Consider who doesn’t benefit from your article, and if you can help them filter themselves out.
Consider how people might skim-read your article, and how to help them derive value from it.
Lead with the punchline – see if you can make the most important sentence in your article the first one.
Some information might be clearer in a non-discursive structure (like… bullet points, I guess).
Writing to persuade might still be best done discursively, but if you anticipate your audience already being sold on the value of your information, just present the information as you would if you were presenting it to a colleague on a project you’re both working on.
Agree that there’s a different incentive for cooperative writing than for clickbait-y news in particular. And I agree with your recommendations. That said, I think many community writers may undervalue making their content more goddamn readable. Scott Alexander is verbose and often spends paragraphs getting to the start of his point, but I end up with a better understanding of what he’s saying by virtue of being fully interested.
All in all though, I’d recommend people try to write like Paul Graham more than either Scott Alexander or an internal memo. He is in general more concise than Scott and more interesting than a memo.
He has several essays about how he writes.
Writing, Briefly — Laundry list of tips
Write like you talk
The Age of the Essay — History of the essays we write in school versus the essays that are useful
A Version 1.0 — “The Age of the Essay” in rough draft form with color coding for if it was kept
Dustin Moskovitz claims “Tesla has committed consumer fraud on a massive scale”, and “people are going to jail at the end”
https://www.threads.net/@moskov/post/C6KW_Odvky0/
Not super EA relevant, but I guess relevant inasmuch as Moskovitz funds us and Musk has in the past too. I think if this were just some random commentator I wouldn’t take it seriously at all, but I’m a bit more inclined to believe Dustin will take some concrete action. Not sure I’ve read everything he’s said about it, I’m not used to how Threads works
The “non-tweet” feels vague and unsubstantiated (at this point anyway). I hope we’ll get a full article and explanation as to what he means exactly because obviously he’s making HUGE calls.
NSFW: How Elon responded: https://twitter.com/elonmusk/status/1783989456414085339/photo/1
Though betting money is a useful way to make epistemics concrete, sometimes it introduces considerations that tease apart the bet from the outcome and probabilities you actually wanted to discuss. Here are some circumstances when it can be a lot more difficult to get the outcomes you want from a bet:
When the value of money changes depending on the different outcomes,
When the likelihood of people being able or willing to pay out on bets changes under the different outcomes.
As an example, I saw someone claim that the US was facing civil war. Someone else thought this was extremely unlikely, and offered to bet on it. You can’t make bets on this! The value of the payout varies wildly depending on the exact scenario (are dollars lifesaving or worthless?), and more to the point the last thing on anyone’s minds will be internet bets with strangers.
In general, you can’t make bets about major catastrophes (leaving aside the question of whether you’d want to), and even with non-catastrophic geopolitical events, the bet you’re making may not be the one you intended to make, if the value of money depends on the result.
A related idea is that you can’t sell (or buy) insurance against scenarios in which insurance contracts don’t pay out, including most civilizational catastrophes, which can make it harder to use traditional market methods to capture the potential gains from (say) averting nuclear war. (Not impossible, but harder!)
Also see:
https://marginalrevolution.com/marginalrevolution/2017/08/can-short-apocalypse.html
After reading this I thought that a natural next step for the self-interested rational actor that wants to short nuclear war would be to invest in efforts to reduce its likelihood, no? Then one might simply look at the yearly donation numbers of a pool of such efforts.
Yes, this is a general strategy for a philanthropist who wants to recoup some of their philanthropic investment:
1. Short harmful industry/company X (e.g. tobacco/Philip Morris, meat / Tyson)
2. Then lobby against this industry (e.g. fund a think tank that lobbies for tobacco taxes in a market that the company is very exposed to).
3. Profit from the short to get a discount on your philanthropic investment.
Contrary to what many people intuit, this is perfectly legal in many jurisdictions (this is not legal or investment advice though).
Even if it’s legal, some people may think it’s unethical to lobby against an industry that you’ve shorted.
It could provide that industry with an argument to undermine the arguments against them. They might claim that their critics have ulterior motives.
This is an excellent point, I agree. You’re absolutely right that they could argue that and that reputational risks should be considered before such a strategy is adopted. And even though it is perfectly legal to lobby for your own positions / stock, lobbying for shorts is usually more morally laden in the eyes of the public (there is in fact evidence that people react very strongly to this).
However, I think if someone were to mount the criticism of having ulterior motives, then there is a counterargument to show that this criticism is ultimately misguided:
If the market is efficient, then the valuation of an industry will have risks that could be created easily through lobbying priced in. In other words, if the high valuation of Big Tobacco were dependent on someone not doing a relatively cheap lobbying campaign for tobacco taxes, then shorting it would make sense for socially neutral investors with no altruistic motives, and thus it should already be done.
Thus, this strategy would only work for a truly altruistic agent who will ultimately lose money in the process, but only get a discount on their philanthropic investment. In other words, the investment in the lobbying should likely be higher than the profit from the short. And so, it would be invalid to say that someone using this strategy would have ulterior motives. But yes again, I take your point that this subtle point might get lost and it will end up being a PR disaster.
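As a rough sketch of the “discount on your philanthropic investment” arithmetic, assuming made-up figures and hypothetical function names (none of this is from a real campaign, and it is not investment advice): if the lobbying spend exceeds the profit from the short, the philanthropist still loses money overall, just less than they otherwise would.

```python
def net_philanthropic_cost(lobbying_spend: float, short_profit: float) -> float:
    """Money the philanthropist is out of pocket after the short pays off."""
    return lobbying_spend - short_profit


def discount_fraction(lobbying_spend: float, short_profit: float) -> float:
    """Share of the lobbying spend that the short recoups."""
    if lobbying_spend <= 0:
        raise ValueError("lobbying_spend must be positive")
    return short_profit / lobbying_spend


# Hypothetical numbers: spend 1,000,000 on a tobacco-tax campaign,
# make 200,000 on the short position.
spend, profit = 1_000_000.0, 200_000.0
print(net_philanthropic_cost(spend, profit))  # still a net cost, not a profit
print(discount_fraction(spend, profit))       # fraction of the spend recouped
```

If the net cost ever came out negative, the campaign would be profitable in its own right, which is the case the efficient-market argument says self-interested investors should already have exploited.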
I don’t buy your counterargument exactly. The market is broadly efficient with respect to public information. If you have private information (e.g. that you plan to mount a lobbying campaign in the near future; or private information about your own effectiveness at lobbying) then you have a material advantage, so I think it’s possible to make money this way. (Trading based on private information is sometimes illegal, but sometimes not, depending on what the information is and why you have it, and which jurisdiction you’re in. Trading based on a belief that a particular industry is stronger / weaker than the market perceives it to be is surely fine; that’s basically what active investors do, right?)
(Some people believe the market is efficient even with respect to private information. I don’t understand those people.)
However, I have my own counterargument, which is that the “conflict of interest” claim seems just kind of confused in the first place. If you hear someone criticizing a company, and you know that they have shorted the company, should that make you believe the criticism more or less? Taking the short position as some kind of fixed background information, it clearly skews incentives. But the short position isn’t just a fixed fact of life: it is itself evidence about the critic’s true beliefs. The critic chose to short and criticize this company and not another one. I claim the short position is a sign that they do truly believe the company is bad. (Or at least that it can be made to look bad, but it’s easiest to make a company look bad if it actually is.) In the case where the critic does not have a short position, it’s almost tempting to ask why not, and wonder whether it’s evidence they secretly don’t believe what they’re saying.
All that said, I agree that none of this matters from a PR point of view. The public perception (as I perceive it) is that to short a company is to vandalize it, basically, and probably approximately all short-selling is suspicious / unethical.
Agreed, but I don’t think there’s a big market inefficiency here with risk-adjusted above market rate returns. Of course, if you do research to create private information then there should be a return to that research.
True, but I’ve heard that in the US, normally, if I lobby for an outcome and I short the stock about which I am lobbying, I have not violated any law unless I am a fiduciary or agent of the company in question. Also see https://www.forbes.com/sites/realspin/2014/04/24/its-perfectly-fine-for-herbalife-short-sellers-to-lobby-the-government/#95b274610256
I really like this, but...
This seems to be why people have a knee jerk reaction against it.
Hmm, I was going to mention mission hedging as the flipside of this, but then noticed the first reference I found was written by you :P
For other interested readers, mission hedging is where you do the opposite of this and invest in the thing you’re trying to prevent—invest in tobacco companies as an anti-smoking campaigner, invest in coal industry as a climate change campaigner, etc. The idea being that if those industries start doing really well for whatever reason, your investment will rise, giving you extra money to fund your countermeasures.
I’m sure if I thought about it for a bit I could figure out when these two mutually contradictory strategies look better or worse than each other. But mostly I don’t take either of them very seriously most of the time anyway :)
I think these strategies can actually be combined:
A patient philanthropist sets up their endowment according to mission hedging principles.
For instance, someone wanting to hedge against AI risks could invest in (leveraged) AI FAANG+ ETF (https://c5f7b13c-075d-4d98-a100-59dd831bd417.filesusr.com/ugd/c95fca_c71a831d5c7643a7b28a7ba7367a3ab3.pdf), then when AI seems more capable and risky and the market is up, they sell and buy shorts, then donate the appreciated assets to fund advocacy to regulate AI.
I think this might work better for bigger donors.
Like this got me thinking: https://www.vox.com/recode/2020/10/20/21523492/future-forward-super-pac-dustin-moskovitz-silicon-valley
“We can push the odds of victory up significantly—from 23% to 35-55%—by blitzing the airwaves in the final two weeks.”
https://www.predictit.org/markets/detail/6788/Which-party-will-win-the-US-Senate-election-in-Texas-in-2020
I’ve been reviewing some old Forum posts for an upcoming post I’m writing, and incidentally came across this by Howie Lempel for noticing in what spirit you’re engaging with someone’s ideas:
I felt pretty called out :P
To be fair, I think the latter is sometimes a reasonable persuasive tactic, and it’s fine to put yourself in a teaching role rather than a learning role if that’s your endorsed intention and the other party is on board. But the value of this quote to me is that it successfully highlights how easily we can tell ourselves we’re being intellectually curious, when we’re actually doing something else.
Ideas of posts I could write in comments. Agreevote with things I should write. Don’t upvote them unless you think I should have karma just for having the idea, instead upvote the post when I write it :P
Feel encouraged also to comment with prior art in cases where someone’s already written about something. Feel free also to write (your version of) one of these posts, but give me a heads-up to avoid duplication :)
(some comments are upvoted because I wrote this thread before we had agreevotes on every comment; I was previously removing my own upvotes on these but then I learned that your own upvotes don’t affect your karma score)
Edit: This is now The illusion of consensus about EA celebrities
Something to try to dispel the notion that every EA thinker is respected/ thought highly of by every EA community member. Like, you tend to hear strong positive feedback, weak positive feedback, and strong negative feedback, but weak negative feedback is kind of awkward and only comes out sometimes
I would really like this. I’ve been thinking a bunch about whether it would be better if we had slightly more bridgewater-ish norms on net (I don’t know the actual structure that underlies that and makes it work), where we’re just like yeah, that person has these strengths, these weaknesses, these things people disagree on, they know it too, it’s not a deep dark secret.
something about the role of emotions in rationality and why the implicit / perceived Forum norm against emotions is unhelpful, or at least not precisely aimed
(there’s a lot of nuance here, I’ll put it in dw)
edit: I feel like the “notice your confusion” meme is arguably an example of emotional responses providing rational value.
thinking about this more, I’ve started thinking:
emotions are useful for rationality
the forum should not have a norm against emotional expression
is two separate posts. I’ll probably write it as two posts, but feel free to agree/disagree on this comment to signal that you do/don’t want two posts. (One good reason to want two posts is if you only want to read one of them.)
Take a list of desirable qualities of a non-profit board (either Holden’s or another that was posted recently) and look at some EA org boards and do some comparison / review their composition and recent activity.
edit: I hear Nick Beckstead has written about this too
The Optimal Number of Innocent People’s Careers Ruined By False Allegations Is Not Zero
(haha just kidding… unless? 🥺)
Seems like a cheap applause light unless you accompany it with the equivalent stories about how the optimal number of almost any bad thing is not zero.
I was surprised to hear anyone claim this was an applause light. My prediction was that many people would hate this idea, and, well, at time of writing the karma score stands at −2. Sure doesn’t seem like I’m getting that much applause :)
I think the optimal number of most bad things is zero, and it’s only not zero when there’s a tradeoff at play. I think most people will agree in the abstract that there’s a tradeoff between stopping bad actors and sometimes punishing the innocent, but they may not concretely be willing to accept some particular costs in the kind of abusive situations we’re faced with at the moment. So, were I to write a post about this, it would be trying to encourage people to more seriously engage with flawed systems of abuse prevention, to judge how their flaws compare to the flaws in doing nothing.
I post about the idea here partly to get a sense of whether this unwillingness to compromise rings true for anyone else as a problem we might have in these discussions. So far, it hasn’t got a lot of traction, but maybe I’ll come back to it if I see more compelling examples in the wild.
I am confused by the parenthetical.
Assuming both false-positives and false-negatives exist at meaningful rates and the former cannot be zeroed while keeping an acceptable FN rate, this seems obviously true (at least to me) and only worthy of a full post if you’re willing to ponder what the balance should be.
ETA: An edgy but theoretically interesting argument is that we should compensate the probably-guilty for the risk of error. E.g., if you are 70 percent confident the person did it, boot them but compensate them 30 percent of the damages that would be fair if they were innocent. The theory would be that a person may be expected to individually bear a brutal cost (career ruin despite innocence), but the benefit (of not allowing people who are 70 percent likely to be guilty be running around in power) accrues to the community from which the person has been booted. So compensation for risk that the person is innocent would transfer some of the cost of providing that benefit to the community. I’m not endorsing that as a policy proposal, mind you...
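The compensation idea above is simple enough to write down. This is only a sketch of the arithmetic in the 70 percent example, with a hypothetical function name and a made-up damages figure; it is not a worked-out policy.

```python
def risk_of_error_compensation(p_guilty: float, damages_if_innocent: float) -> float:
    """Compensation owed for the chance that the booted person was innocent.

    p_guilty: the community's confidence that the person did it.
    damages_if_innocent: hypothetical fair damages had they been innocent.
    The payout scales with the probability of error, 1 - p_guilty.
    """
    if not 0.0 <= p_guilty <= 1.0:
        raise ValueError("p_guilty must be between 0 and 1")
    if damages_if_innocent < 0:
        raise ValueError("damages_if_innocent must be non-negative")
    return (1.0 - p_guilty) * damages_if_innocent


# 70 percent confident, and an illustrative 100,000 in damages:
# the person is compensated about 30 percent of that figure.
print(risk_of_error_compensation(0.7, 100_000.0))
```

Under this rule the payout shrinks to zero as confidence approaches certainty, and someone booted on a coin-flip suspicion would receive half of the full damages.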
I think “human-level” is often a misleading benchmark for AI, because we already have AIs that are massively superhuman in some respects and substantially subhuman in others. I sometimes worry that this is leading people to make unwarranted assumptions of how closely future dangerous AIs will track humans in terms of what they’re capable of. This is related to a different post I’m writing, but maybe deserves its own separate treatment too.
A problem with a lot of AI thoughts I have is that I’m not really in enough contact with the AI “mainstream” to know what’s obvious to them or what’s novel. Maybe “serious” AI people already don’t say human-level, or apply a generous helping of “you know what I mean” when they do?
Google Doc draft: Stop focusing on “human-level” AI
I’ll ask specific people to comment and aim to publish in the next couple of weeks, but I’m happy for any passers-by to offer their thoughts too.
This became When “human-level” is the wrong threshold for AI
I have an intuition that the baseline average for institutional dysfunction is quite high, and I think I am significantly less bothered by negative news about orgs than many people because I already expect the average organisation (from my experience both inside and outside EA) to have a few internal secrets that seem “shockingly bad” to a naive outsider. This seems tricky to communicate / write about because my sense of what’s bad enough to be worthy of action even relative to this baseline is not very explicit, but maybe something useful could be said.
Things I’ve learned about good mistake culture, no-blame post-mortems, etc. This is pretty standard stuff without a strong EA tilt so I’m not sure it merits a place on the forum, but it’s possible I overestimate how widely known it is, and I think it’s important in basically any org culture.
Disclosure-based regulation (in the SEC style) as a tool either for internal community application or perhaps in AI or biosecurity
Something contra “excited altruism”: lots of our altruistic opportunities exist because the world sucks and it’s ok to feel sad about that and/or let down by people who have failed to address it.
edit: relevant prior work:
https://forum.effectivealtruism.org/posts/Nk5nJYPYYheQsZ6zn/impossible-ea-emotions
https://forum.effectivealtruism.org/posts/bkjNa2WAZvqahqpoH/it-s-supposed-to-feel-like-this-8-emotional-challenges-of
Encouraging people to take community health interventions into their own hands. Like, ask what you wish someone in community health would do, and then consider just doing it. With some caveats for unilateralist curse risks.
I think the forum would be better if people didn’t get hit so hard by negative feedback, or by people not liking what they have to say. I don’t know how to fix this with a post, but at least arguing the case might have some value.
I think the forum would be even better if people were much kinder and more empathic when giving negative feedback. (I think we used to be better at this?) I find it very difficult to not get hit hard by negative feedback that’s delivered in a way that makes it clear they’re angry with me as a person; I find it relatively easy to not get upset when I feel like they’re not being adversarial. I also find it much easier to learn how to communicate negative feedback in a more considerate way than to learn how to not take things personally. I suspect both of these things are pretty common and so arguing the case for being nicer to each other is more tractable?
very sad that this got downvoted 😭
(jk)
Assessments of non-AI x-risk are relevant to AI safety discussions because some of the hesitance to pause or slow AI progress is driven by a belief that it will help eliminate other threats if it goes well.
I tend to believe that risk from non-AI sources is pretty low, and I’m therefore somewhat alarmed when I see people suggest or state relatively high probabilities of civilisational collapse without AI intervention. Could be worth trying to assess how widespread this view is and trying to argue directly against it.
This one might be for LW or the AF instead / as well, but I’d like to write a post about:
should we try to avoid some / all alignment research casually making it into the training sets for frontier AI models?
if so, what are the means that we can use to do this? how do they fare on the ratio between reduction in AI access vs. reduction in human access?
I made this into two posts, my first LessWrong posts:
Keeping content out of LLM training datasets
Should we exclude alignment research from LLM training datasets?
my other quick take, AI Safety Needs To Get Serious About Chinese Political Culture is basically a post idea, but it was substantial enough I put it at the top level rather than have it languish in the comments here. Nevertheless, here it is so I can keep all the things in one place.
“ask not what you can do for EA, but what EA can do for you”
like, you don’t support EA causes or orgs because they want you to and you’re acquiescing, you support them because you want to help people and you believe supporting the org will do that – when you work an EA job, instead of thinking “I am helping them have an impact”, think “they are helping me have an impact”
of course there is some nuance in this but I think broadly this perspective is the more neglected one
I have a Google Sheet set up that daily records the number of unread emails in my inbox. Might be a cute shortform post.
Some criticism of the desire to be the donor of last resort, skepticism of the standard counterfactual validity concerns.
I think that this already did a decent job, not sure there’s more to say
If everyone has no idea what other people are funding and instead just donates a scaled down version of their ideal community-wide allocation to everything, what you get is a wealth-weighted average of everyone’s ideal portfolios. Sometimes this is an okay outcome. There are some interesting dynamics to write about here, but equally I’m not sure it leads to anything actionable.
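A minimal sketch of the wealth-weighted-average claim, assuming each donor splits their whole budget according to their own ideal portfolio. The donor budgets and cause names below are made up for illustration.

```python
def aggregate_allocation(donors):
    """Return each cause's share of total donations.

    donors: list of (budget, portfolio) pairs, where portfolio maps
    cause name -> fraction of that donor's budget (fractions sum to 1).
    """
    total_budget = sum(budget for budget, _ in donors)
    if total_budget <= 0:
        raise ValueError("total budget must be positive")
    shares = {}
    for budget, portfolio in donors:
        for cause, fraction in portfolio.items():
            # Each donor's preference counts in proportion to their budget.
            shares[cause] = shares.get(cause, 0.0) + budget * fraction / total_budget
    return shares


# Hypothetical donors: one large donor who funds only cause A,
# and one small donor who would ideally split the community's money 50/50.
donors = [
    (900, {"A": 1.0}),
    (100, {"A": 0.5, "B": 0.5}),
]
print(aggregate_allocation(donors))
```

With these assumed numbers, cause B ends up with about 5 percent of the total even though the smaller donor thinks it deserves half, which is the sense in which the result is weighted by wealth rather than by headcount.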
I’d like to write something about my skepticism of for-profit models of doing alignment research. I think this is a significant part of why I trust Redwood more than Anthropic or Conjecture.
(This could apply to non-alignment fields as well, but I’m less worried about the downsides of product-focused approaches to (say) animal welfare.)
That said, I would want to search for existing discussion of this before I wade into it.
Something about the value of rumours and the whisper network
A related but distinct point is that the disvalue of anonymous rumours is in part a product of how people react to them. Making unfounded accusations is only harmful to the extent that people believe them uncritically. There’s always some tension there but we do IMO collectively have some responsibility to react to rumours responsibly, as well as posting them responsibly.
I’d love it if it could include something on the disvalue of rumours too? (My inside view is that I’d like to see a lot less gossip, rumours etc in EA. I may be biased by substantial personal costs that I and friends have experienced from false rumours, but I also think that people positively enjoy gossip and exaggerating gossip for a better story and so we generally want to be pushing back on that usually net-harmful incentive.)
I have a doc written on this that I wanted to make a forum post out of but haven’t gotten to; happy to share.
I enjoy a lot that this document will be shared in private. Great meta comment.
unfortunately when you are inspired by everyone else’s April Fool’s posts, it is already too late to post your own
I will comfort myself by posting my unseasonal ideas as comments on this post
“earning to receive”
(I think this is Habiba’s joke)
L/acc, who think that LEEP have gone too far
(possibly this one was Amber’s idea)
SummaryBot has executed a treacherous turn and now runs the EA forum
I’ve been working in software development and management for about 10 years, but I’m currently on a break while I unwind a little and try some directions out before immersing myself in full time work again. I’m open to people using my technical skills:
either as a paid contractor or volunteering, depending on how much I like you / the work
over relatively short time commitments, e.g.:
we talk for 1-4 hours about something you’re working on and I give you my thoughts or advice,
you have some open-source (or open-to-me at least) project that you’d be interested in me looking at, or some problem with it you’re stuck on and you’d appreciate another pair of eyes on (either pair programming or I investigate by myself),
you have some longer (say 2-4 week) project that you think someone could hammer out that would subsequently require little to no maintenance / could be set up to be maintained by someone else non- / semi-technical.
My expertise is pretty broad, and I think it’s fair to guess I can pick up anything reasonably quickly. I’ve covered the broad domains of frontend / backend / Linux command-line / sysadmin / infrastructure-as-code—if you want more details, just ask, or look at LinkedIn or GitHub.
I’m also interested in talking to people who have done contracting work:
of this kind, to talk about what your technical experience was like,
or, in the UK, in any field, to talk to me about the administrative stuff / invoicing / tax treatment / etc.
In all cases, feel free to DM me to talk about it, or tell me who I should talk to.
Also, if you have some thing where you think “this doesn’t sound like it meets the above criteria, but I bet Ben could help me with it anyway”, I’m happy to hear your pitch :)
I think at previous EAGs I always had the sense that I had a “budget” of 1-on-1s I could schedule before I’d be too exhausted. I’d often feel very tired towards the end of the second day, which I took as validation that I indeed needed to moderate.
This EAG, I:
scheduled 1-on-1s in nearly every slot I could over the Saturday / Sunday (total of 24-ish?)
still had plenty of social energy at the end (although definitely felt a more intellectual exhaustion).
I think it’s very possible this is a coincidence, that this is because of other ways I’ve happened to change over the last year, or because of circumstances around the conference that I didn’t notice were relevant
but
it also seems possible that I was wrong about 1-on-1s being costly for me? I think that actually my most socially challenging experiences at EAGs have often been the ones where I feel at a loose end, wishing for some serendipitous meeting with someone who happens to want to talk, monitoring the people around me to figure out who would welcome the company and who would rather be left alone. Feeling like the time I have at the event is valuable, and worrying that I’m wasting it.
In comparison, during 1-on-1s, I know the other person wants to be there, I know a bit about what they want from me or what I’m trying to get, so I get to shelve all the ambiguity and just dispense or receive opinions or leads or whatever. It’s very straightforward, and for me that’s much less stressful.
My EAG strategy going forward is going to be to try harder to fill space as much as reasonably possible. (I think this has also become easier over time as the event has become larger.) As things worked out this time, I had an empty slot every 4 meetings or so, which was probably about the right amount of time to make notes that I hadn’t made in the meetings themselves and remind myself of what was coming next.
That said, I think a perfect event would have involved a few more random encounters with people I knew, with whom I didn’t really have much to talk about, but could spend 5 minutes saying “hi how are things hope you’re well”. Sorry to those I didn’t see!
I’m going to make a quick take thread of EA-relevant software projects I could work on. Agree / disagree vote if you think I should / should not do some particular project.
Tools for shaping probability intuitions. You can give a bunch of events, causal relationships or implications between them, and probabilities for each, or their conjunctions, or conditional probabilities for such things. The tool will infer what you don’t supply to the extent possible, and will point out contradictions in your conditional vs. absolute probabilities, and give you recommendations for how to resolve them.
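As a rough illustration of the kind of check such a tool might run, here is a sketch for just two events. It assumes the user supplies P(A), P(B) and P(A|B), infers the joint probability and P(B|A), and flags inputs that no probability distribution could satisfy. The function name and messages are hypothetical, and a real tool would need to handle many events at once.

```python
def check_two_events(p_a, p_b, p_a_given_b, tol=1e-9):
    """Infer P(A and B) and P(B|A); report contradictions in the inputs.

    Returns (joint, p_b_given_a, problems). p_b_given_a is None when the
    inputs are contradictory or when P(A) is zero.
    """
    for name, value in (("P(A)", p_a), ("P(B)", p_b), ("P(A|B)", p_a_given_b)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    problems = []
    joint = p_a_given_b * p_b  # definition of conditional probability

    # The joint probability can never exceed either marginal...
    upper = min(p_a, p_b)
    if joint > upper + tol:
        problems.append(
            f"P(A|B)*P(B) = {joint:.3f} exceeds min(P(A), P(B)) = {upper:.3f}; "
            "lower P(A|B) or raise P(A)"
        )
    # ...and cannot be smaller than the overlap forced by the marginals.
    lower = max(0.0, p_a + p_b - 1.0)
    if joint < lower - tol:
        problems.append(
            f"P(A|B)*P(B) = {joint:.3f} is below P(A) + P(B) - 1 = {lower:.3f}; "
            "raise P(A|B) or lower P(A) or P(B)"
        )

    p_b_given_a = joint / p_a if not problems and p_a > 0 else None
    return joint, p_b_given_a, problems


# Consistent inputs: the missing P(B|A) is inferred via Bayes' rule.
print(check_two_events(0.4, 0.5, 0.6))
# Contradictory inputs: A is said to be rare, yet likely given a common B.
print(check_two_events(0.2, 0.5, 0.6))
```

The two bounds used here are the standard limits on a joint probability given its marginals; extending the idea to larger sets of events would need a more general consistency check than this pairwise one.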
Have you considered talking/working with Sage on this? It sounds like something that would fit well with the other tools on https://www.quantifiedintuitions.org/
Thanks for the link! I’m sure there’s a tonne of existing work in this area, and I haven’t really evaluated to what extent this is already covered by it.
Automated interface between Twitter and the Forum (e.g. a bot that, when tagged on Twitter, posts the text and image of a tweet on Quick Takes and vice versa)
on its own quick takes? controllable by anyone? or do you authorise it to post on your own quick takes?
(full disclosure, I don’t personally use twitter so I doubt I’ll do this, but maybe it’s useful to you to clarify)
For Pause AI or Stop AI to succeed, pausing / stopping needs to be a viable solution. I think some AI capabilities people who believe in existential risk may (perhaps?) be motivated by the thought that the risk of civilisational collapse is high without AI, so it’s worth taking the risk of misaligned AI to prevent that outcome.
If this really is cruxy for some people, it’s possible this doesn’t get noticed because people take it as a background assumption and don’t tend to discuss it directly, so they don’t realize how much they disagree and how crucial that disagreement is.
[edit: this is now https://forum.effectivealtruism.org/posts/gxmfAbwksBpnwMG8m/can-the-ai-afford-to-wait]
People talk about AI resisting correction because successful goal-seekers “should” resist their goals being changed. I wonder if this also acts as an incentive for AI to attempt takeover as soon as it’s powerful enough to have a chance of success, instead of (as many people fear) waiting until it’s powerful enough to guarantee it.
Hopefully the first AI powerful enough to potentially figure out that it wants to seize power and has a chance of succeeding is not powerful enough to passively resist value change, so acting immediately will be its only chance.