Research associate at SecureBio, Research Affiliate at Kevin Esvelt’s MIT research group Sculpting Evolution, physician. Thinking about ways to safeguard the world from bio.
slg
New NAO preprint: Indoor air sampling for detection of viral nucleic acids
Thanks for flagging, fixed!
Comparing sampling strategies for early detection of stealth biothreats
Without saying much about the merits of various commenters’ arguments, I wanted to check if this is a rhetorical question:
Is anyone on this forum in a better position than the Secretary-General of the UN to analyze, for example, the impact of Israel’s actions on future, unrelated conflicts?
If so, this is an appeal to authority that isn’t very helpful in advancing this discussion. If it’s an actual question, never mind.
What’s the lore behind that update? This was before I followed EA community stuff.
Thanks for writing this up; I was skeptical about Scott’s strong take but didn’t take the time to check the links he provided as proof.
That’s a good pointer, thanks! I’ll drop the reference to Diggans and Leproust for now.
Thanks for the write-up. Just adding a note on how this distinction has practical implications for designing the databases of hazardous sequences that gene synthesis screening systems require.
With gene synthesis screening, companies want to stop bad actors from getting access to the physical DNA or RNA of potential pandemic pathogens. Now, let’s say researchers find the sequence of a novel pathogen that would likely spark a pandemic if released. Most would want this sequence to be added to synthesis screening databases. But some also want this database to be public. The information hazards involved in making such information publicly available could be large, especially if there is accompanying discussion of how exactly these sequences are dangerous.
Predicting Virus Relative Abundance in Wastewater
I skimmed it, and it looks good to me. Thanks for the work! A separate post on this would be cool.
I set a reminder! Also, let me know if you do end up updating it.
Is there an updated version of this? E.g., GDP numbers have changed.
Flagging that I approve of this post; I do believe that the relevant biosecurity actors within EA are thinking about this (though I’d love a more public write-up of this topic). Get in touch if you are thinking about this!
I’m excited that more people are looking into this area!
Flagging that I only read the intro and the conclusion, which might mean I missed something.
High-skilled immigration
From my current understanding, high-skilled immigration reform seems promising not so much because of the effects on the migrants (though they are positive) but mostly due to the effect on the destination country’s GDP and technological progress. The latter has sizeable positive spillover effects (that also accrue to poorer countries).
Advocacy for high-skilled immigration is less controversial and thus easier, which could make interventions in this area more valuable when compared to general immigration reform.
Then again, for the reasons above, more individuals are likely already working on improved high-skilled immigration.
Malengo
Also, have you chatted with Johannes Haushofer? He knows EA and recently started Malengo, which wants to facilitate educational migration from low-income countries. I’d assume he has thought about these topics a bunch.
Comment by Paul Christiano on LessWrong:
“RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.”
These three links are:
The first is Mysteries of mode collapse, which claims that RLHF (as well as OpenAI’s supervised fine-tuning on highly-rated responses) decreases entropy. This doesn’t seem particularly related to any of the claims in this paragraph, and I haven’t seen it explained why this is a bad thing. I asked on the post but did not get a response.
The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic’s models trained with RLHF have systematically different personalities than the pre-trained model. I’m not exactly sure what claims you are citing, but I think you are making some really wild leaps.
The third is Compendium of problems with RLHF, which primarily links to the previous 2 failures and then discusses theoretical limitations.
I think these are bad citations for the claim that methods are “not working well” or that current evidence points towards trouble.
The current problems you list—“unhelpful, untruthful, and inconsistent”—don’t seem like good examples to illustrate your point. These are mostly caused by models failing to correctly predict which responses a human would rate highly. That happens because models have limited capabilities, and it is rapidly improving as models get smarter. These are not the problems that most people in the community are worried about, and I think it’s misleading to say this is what was “theorized” in the past.
I think RLHF is obviously inadequate for aligning really powerful models, both because you cannot effectively constrain a deceptively aligned model and because human evaluators will eventually not be able to understand the consequences of proposed actions. And I think it is very plausible that large language models will pose serious catastrophic risks from misalignment before they are transformative (it seems very hard to tell). But I feel like this post isn’t engaging with the substance of those concerns or sensitive to the actual state of evidence about how severe the problem looks likely to be or how well existing mitigations might work.
This post reads like it wants to convince its readers that AGI is near and will spell doom, picking and presenting arguments in a biased way.
Even though many people on the Forum and LW (including myself) believe that AI Safety is very important and isn’t given enough attention by important actors, I don’t want us to lower our standards for good arguments in favor of more AI Safety.
Some parts of the post that I find lacking:
“We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.”
I don’t think more than 1⁄3 of ML researchers or engineers at DeepMind, OpenAI, or Anthropic would sign this statement.
“No one knows how to predict AI capabilities.”
Many people are trying though (Ajeya Cotra, EpochAI), and I think these efforts aren’t worthless. Maybe a different statement could be: “New AI capabilities appear discontinuously, and we have a hard time predicting such jumps. Given this larger uncertainty, we should worry more about unexpected and potentially dangerous capability increases”.
“RLHF and Fine-Tuning have not worked well so far.”
Setting aside whether RLHF scales (as linked, Jan Leike of OpenAI doesn’t think it does) and whether RLHF leads to deception: from my cursory reading and experience, ChatGPT shows substantially better behavior than Bing, which might be because the latter doesn’t use RLHF.
Overall, I do agree with the article and think that recent developments have been worrying. Still, if the goal of the article is to get independently-thinking individuals to consider working on AI Safety, I’d prefer less extreme arguments.
Thanks for writing this up. I just wanted to note that the OWID graph that appears while hovering over a hyperlink is neat! @JP Addison, or whoever created that, cool work.
Flagging that I’m only about 1⁄3 in.
Regarding this paragraph:
“An epistemically healthy community seems to be created by acquiring maximally-rational, intelligent, and knowledgeable individuals, with social considerations given second place. Unfortunately, the science does not bear this out. The quality of an epistemic community does not boil down to the de-biasing and training of individuals;[3] more important factors appear to be the community’s composition, its socio-economic structure, and its cultural norms.[4]”
When you say that the science doesn’t bear this out, you go on to cite footnotes in your original article. If you want to make this case, it might be better to either i) point to very specific ways in which the current qualities of EA lead to flawed conclusions, or ii) point to research that makes a similar claim.
Alright; I’ll do so later today!
One crux here might be what improved lives the most over the last three hundred years.
If you think economic growth has been the main driver of (human) well-being, then the mindset of people driving that growth is what the original post might have been hinting at. And I do agree with Richard that many of those people had something closer to master morality in their mind.