Is helpful/friendly :-) Loves to learn. Wants to solve neglected problems. See website for current progress.
Madhav Malhotra
Just to play devil’s advocate (without harmful intentions :-), what are the largest limitations or disclaimers that we should keep in mind regarding your results or methods?
Sorry if I missed this in your post, but how many of the policies you analysed were passed via referendum vs. by legislation? How many were at the state level vs. the US federal level vs. international?
@trevor1 Thank you for the detailed response!
RE: Crossposting to LessWrong
I’ve crossposted it now. If there are other forums relevant to cybersecurity topics in EA in particular, I’d appreciate suggestions :-)
RE: Personal Cybersecurity and IoT
Yes, I agree that the best way to improve cybersecurity with personal IoT devices is to avoid them. I’ll update the wording to be clearer about that.
Here’s a summary of the report from Claude-1 if someone’s looking for an ‘abstract’:
There are several common misconceptions about biological weapons that contribute to underestimating the threat they pose. These include seeing them as strategically irrational, not tactically useful, and too risky for countries to pursue.
In reality, biological weapons have served strategic goals for countries in the past like deterrence and intimidation. Their use could also provide tactical advantages in conflicts.
Countries have historically accepted substantial risks in pursuing dangerous weapons programs when they believe the strategic benefits outweigh the costs. The possibility of accidents and blowback would not necessarily deter such programs.
Decisions around biological weapons activities are not always top-down and known to all national leaders. Bureaucratic and individual interests can influence programs apart from formal policy.
International norms and laws alone are insufficient to deter or discover clandestine biological weapons work, given the lack of verification mechanisms. COVID has exposed existing vulnerabilities.
Dispelling these misconceptions is important for strengthening defenses against the real biological weapons threat, which the COVID-19 pandemic has shown remains serious despite decades of effort. More investment is needed.
“There are many other things that could have been done to prevent Russia’s unprovoked, illegal attack on Ukraine. Ukraine keeping nuclear weapons is not one of them.”
Could you explain your thinking more for those not familiar with the military strategy involved? What about having nuclear weapons makes an invasion more viable? Which specific alternatives would be more useful in preventing the attacks and why?
Context: I’m hoping to learn lessons in nuclear security that are transferable to AI safety and biosecurity.
Question: Would you have any case studies or advice to share on how regulatory capture and lobbying were mitigated in US nuclear security regulations and enforcement?
Are there any misconceptions, stereotypes, or tropes that you commonly see in academic literature around nuclear security or biosecurity that you could correct given your perspective inside government?
Could you share the top 3 constraints and benefits you had in improving global nuclear security while you were working for the US DoD compared to now, when you’re working as an academic?
Context: I’m hoping to find lessons from nuclear security that are transferable to the security of bioweapons and transformative AI.
Question: Are there specific reports you could recommend on preventing these nuclear security risks:
Insider threats (including corporate/foreign espionage)
Cyberattacks
Arms races
Illicit / black market proliferation
Fog of war
Any updates on how the event went? :-) Any cause priorities or research questions identified to mitigate existential cybersecurity risks?
A lot of people have gotten the message “Direct your career towards AI Safety!” from EA. Yet there seem to be far too few opportunities to get mentorship or a paying job in AI safety. (I say this having seen others’ comments on the forum and having personally applied to 5+ fellowships where there were 500-3000% more applicants than spots.)
What advice would you give to those feeling disenchanted by their inability to make progress in AI safety? How is 80K Hours working to better balance (though perhaps not perfectly) the supply of and demand for AI safety mentorship and jobs?
For what it’s worth, I run an EA university group outside the U.S. (at the University of Waterloo in Canada). I haven’t observed any of the points you mentioned in my experience with the group:
We don’t run intro to EA fellowships because we’re a smaller group. We’re not trying to convert more students to be ‘EA’. Instead, we focus on supporting whoever’s interested in working on EA-relevant projects (ex: a cheap air purifier, a donations advisory site, a cybersecurity algorithm, etc.), whether or not they identify with the EA movement.
Since we’re not trying to get people to become EA members, we’re not hosting any discussions where a group organiser could convince people to work on AI safety over all else.
No one’s getting paid here. We have grant money that we’ve used for things like hosting an AI governance hackathon, but that money goes toward marketing, catering, prizes, etc., not salaries.
Which university EA groups specifically did you talk to before proclaiming “University EA Groups Need Fixing”? Based only on what I read in your article, a more accurate title seems to be “Columbia EA Needs Fixing”
Out of curiosity @LondonGal, have you received any followups from HLI in response to your critique? I understand you might not be at liberty to share all details, so feel free to respond as you feel appropriate.
Context: I work as a remote developer in a government department.
Practices that help:
Show up at least 3 minutes early to every meeting. Change your clocks to run 3 minutes ahead if you can’t discipline yourself to do it. Shows commitment.
On a related note, take personal time to reflect before a meeting. Think of questions you want to ask or what you want to achieve, even if you’re not hosting the meeting and only spend 5 minutes on it.
Try scheduling a calendar reminder with an intention before the meeting. Ex: Say back what others said before you speak (active listening). Ex: Go out of your way to help. Ex: Red team ideas.
Create a physical calendar and cross off days until the end of a project. Creates urgency.
Move email communication to an organised form or tracker. Ex: When I have a bunch of bugs/features to write code for, I’ll ask people to put their comments in one centralised spreadsheet instead of keeping track of email threads.
Host events to build personal connections. Ex: Games lunches, making cards for someone who just had a baby, etc. Takes virtual relationships a lot further.
Ask for recurring feedback. Ex: in a weekly meeting. Forces people to actually reflect on how you’ve been doing instead of giving superficial answers impromptu. Also, normalises negative feedback as well as positive.
If you do get superficial responses (“X looks awesome!”), ask followups like: “Could you give me an example of what went well so that I know what to keep doing?”
It takes courage to share such detailed stories of goals not going right! Good on you for doing so :-)
It seems that two kinds of improvements within EA might be helpful to reduce the probability of other folks having similar experiences.
Proactively, we could adjust the incentives that are promoted (especially by high-visibility organisations like 80K Hours). Specifically, I think it would be helpful to:
Recommend that early-career folks try out university programs with internships/coops in the field they think they’d enjoy. This would help error-correct earlier rather than later.
Adjust the articles on high-visibility sites to focus less on finding the “most” impactful career path and more on finding one of many impactful career paths. I say this especially because sites like 80K Hours have gotten a lot more general traffic ever since they vastly increased their marketing. When you’re reaching a broader audience (especially for the first time), it’s not as essential to urgently direct someone to the exact right career path. A more reasonable goal might be to get them thinking about a few options. Then, those who want to refine their plan can be directed to more specialised resources within EA (ex: biosecurity → reading list).
To be more specific about what I mean by making content focus on “one of many impactful paths,” here are example rewrites of content from 80K Hours’ career reviews:
Original: “The highest-impact career for you is the one that allows you to make the biggest contribution to solving one of the world’s most pressing problems.”
Rewrite: The highest-impact career for you depends on your unique skills and motivations. Out of the careers that suit you, which ones increase your contributions to solving one of the world’s most pressing problems?
Original: “Below we list some other career paths that we don’t recommend as often or as highly as those above, but which can still often be top options for people we advise.”
Rewrite: Below, we list some career paths that we recommend less frequently than those above. However, they might specifically be a good fit for your unique preferences.
Original: “The lists are based on 10 years of research and experience advising people, and represent the careers it seems to us will be most impactful over the long run if you get started on them now — though of course we can’t be sure what the future holds.”
Rewrite: None, the ending clause on uncertainty is good :-)
Reactively, various efforts have been trying to improve mental health support within EA. I look forward to seeing continued progress in creating easily-accessible collections of resources!
Thank you for your thoughtful questions!
RE: “I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights?”
You’re correct that these techniques are useful for preventing models from being used in unintended ways when they run on untrusted devices! However, I think of the goal a bit more broadly: adding another layer of defence behind a cybersecure API (or another trusted execution environment) to prevent a model from being stolen and used in unintended ways.
These methods can be applied when model parameters are distributed on different devices (ex: on a self-driving car that downloads model parameters for low-latency inference). But they can also be applied when a model is deployed on an API hosted on a trusted server (ex: to reduce the damage caused by a breach).
RE: “without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine tuning?”
The four papers I presented don’t focus on allowing authorised parties to use AI models without accessing their weights. However, (Shevlane, 2022) recommends exactly this: implementing secure APIs instead of directly distributing model parameters whenever possible.
Instead, the papers I presented focused on preventing unauthorised parties from being able to use AI models that they illegitimately acquired. The content about fine-tuning was referring to tests to see if unauthorised parties could fine-tune stolen models back to original performance if they also stole some of the original data used to train the model.
RE: “As far as I can tell, the key problem with all of the methods you cover is that, at some point you have to have the decrypted weights in the memory of an untrusted device.” and “The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don’t understand their scheduling solution or TPMs well enough to know if that’s feasible, but I’m intuitively suspicious”
Your technical hypotheses about when a model’s unencrypted parameters are stored in memory are correct. I agree that the authors generally give vague explanations of how to keep the models’ keys secure.
Personally, I saw the presented techniques as mainly removing the easiest opportunities for misuse (ex: a sufficiently well-funded actor like a state or large company could plausibly bypass these techniques, whereas a rogue hacker group may lack the knowledge or resources to do so). This is a useful (but not complete) start, since it means that fewer parties, with more predictable incentives, would need to be regulated regarding their use of AI. That is preferable to the difficulty of regulating the use of a model like LLaMA (or a more advanced one) after it is publicly leaked.
RE: Given this, I don’t really understand how any of these papers improve over the solution of “just encrypt the weights when not in use”? I feel like there must be something I’m missing here.
You can think of the DeepLock paper as “just encrypt the weights when not in use.” Then, the AdvParams paper becomes: “be intelligent about which parameters you encrypt so that you don’t have to encrypt/decrypt every single parameter out of millions-billions”
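To make the DeepLock/AdvParams contrast concrete, here is a toy sketch in pure Python (all function names are hypothetical, not from either paper). Instead of encrypting every parameter, it locks only a handful of them by adding perturbations and keeps those perturbations as the secret key, so the key grows with the number of locked parameters rather than with model size. Choosing the indices and perturbation values at random is an assumption for illustration; as I understand it, AdvParams picks them adversarially so that the locked model’s accuracy drops as much as possible.

```python
import random

def lock_subset(weights, n_params, key_seed):
    """Toy AdvParams-style locking: perturb only n_params of the weights.

    Returns (locked_weights, key), where key maps index -> perturbation.
    Indices and perturbations are random here (an assumption); the real
    technique chooses them to maximally degrade accuracy.
    """
    rng = random.Random(key_seed)
    indices = rng.sample(range(len(weights)), n_params)
    key = {i: rng.uniform(-10.0, 10.0) for i in indices}
    locked = list(weights)
    for i, delta in key.items():
        locked[i] += delta
    return locked, key

def unlock_subset(locked, key):
    """Undo the perturbations; only the keyed indices are touched."""
    restored = list(locked)
    for i, delta in key.items():
        restored[i] -= delta
    return restored

# Usage: a 1,000-parameter "model" locked with a 5-entry key.
weights = [0.001 * i for i in range(1000)]
locked, key = lock_subset(weights, n_params=5, key_seed=42)
restored = unlock_subset(locked, key)
```

The design point is that decryption cost and key size scale with `n_params`, not with the millions to billions of parameters in the full model.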
In contrast, the preprocessed input paper has nothing to do with encrypting weights. Its aim is to make the possession of the parameters useless (whether encrypted or not), unless you can preprocess your input in the right way with the secret key.
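As a rough illustration of why possessing the parameters alone can be useless, here is a toy sketch that assumes the secret preprocessing step is a key-derived permutation of input features (the paper’s actual transformation may differ, and all names are hypothetical). Rather than training, it directly computes the weights a linear model would correspond to if trained on permuted inputs.

```python
import random

def make_permutation(n_features, key_seed):
    """Derive a secret feature permutation from the key (hypothetical scheme)."""
    perm = list(range(n_features))
    random.Random(key_seed).shuffle(perm)
    return perm

def preprocess(x, perm):
    """Secret preprocessing step: reorder the input features."""
    return [x[p] for p in perm]

def lock_weights(weights, perm):
    """Weights that match inputs passed through preprocess(x, perm)."""
    return [weights[p] for p in perm]

def linear_model(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

# Usage: the locked weights only behave correctly on preprocessed inputs.
weights = [1, 2, 3, 4, 5]
x = [1, 2, 3, 4, 5]
perm = make_permutation(len(x), key_seed=7)
locked = lock_weights(weights, perm)
with_key = linear_model(locked, preprocess(x, perm))  # 55, same as the unlocked model
without_key = linear_model(locked, x)  # differs unless perm happens to be the identity
```

Someone who steals `locked` but not the key feeds raw inputs and gets the wrong answer, whether or not the weights were ever encrypted.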
The hardware accelerated retraining paper is similar in that the model’s parameters are intended to be useless (encrypted or not) without the secret key and the hardware scheduling algorithm that determines which neurons get associated with which key. Here, the key is needed to flip the signs of the right weighted inputs at inference time.
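A toy sketch of the sign-flipping idea, under loud assumptions: one secret bit per neuron, drawn at random here, whereas the paper ties the neuron-to-key assignment to a hardware scheduling algorithm that this sketch does not model. Keyed neurons are stored with negated weights; with the right key, their weighted inputs are flipped back at inference time, and without it, ReLU zeroes them out. All names are hypothetical.

```python
import random

def make_key(n_neurons, key_seed):
    """One secret bit per neuron (random here; hardware-scheduled in the paper)."""
    rng = random.Random(key_seed)
    return [rng.randint(0, 1) for _ in range(n_neurons)]

def lock_layer(weight_rows, key):
    """Store each keyed neuron's weights negated."""
    return [[-w for w in row] if bit else list(row)
            for row, bit in zip(weight_rows, key)]

def forward(locked_rows, x, key=None):
    """ReLU layer; with the key, keyed neurons' weighted inputs are sign-flipped back."""
    outputs = []
    for j, row in enumerate(locked_rows):
        z = sum(w * xi for w, xi in zip(row, x))
        if key is not None and key[j]:
            z = -z
        outputs.append(max(0, z))
    return outputs

rows = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
key = make_key(len(rows), key_seed=3)
locked = lock_layer(rows, key)
print(forward(locked, x, key))  # [3, 7, 11, 15]
print(forward(locked, x))       # keyed neurons collapse to 0 under ReLU
```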
RE: Trusted Multiparty Computing
Yes, your analogy of thinking of the model weights as data contributed by the developer and the input data as data contributed by the end user is insightful. I certainly agree with (Shevlane, 2022) that we should aim for these kinds of trusted execution environments whenever possible.
However, this may not be possible for all use cases. (I’ve only been using one example: a self-driving car that lacks local trusted computing hardware for cost-efficiency reasons but cannot rely on remote servers with such hardware for latency reasons. There are many other real-world examples, however.) The other thing to note is that different solutions can be used in combination as “layers of defence.” (Ex: encrypt parameter snapshots from training that aren’t actively being used, while deploying the most updated parameter snapshot with trusted hardware, assuming this is possible for the use case being considered.)
RE: Model Stealing and Side-Channel Attacks
Yes, the current techniques have important limitations that still need to be fixed (including these attacks and basic fine-tuning, as I showed with some of the techniques above). There’s a long way to go in deploying AI algorithms securely :-) In some ways, we’re solving this problem at an unprecedented scale now that generalised models like ChatGPT have become useful to many actors without any fine-tuning. That said, (Shevlane, 2022) argues that the Google Cloud Computer Vision platform previously faced a similar problem.
Hi!
As I mentioned in the post, I planned to delete the database one month after posting, for privacy reasons. My apologies for the inconvenience :/
This is certainly a useful resource for those who live in areas without effective altruism groups nearby! Thank you for sharing :-)
Could you please share more details on which parts of the curriculum would be inaccessible to recent graduates? From the outline of the book alone, it’s hard to estimate the level of technical depth needed.
The UX has improved so much since the 2022 version of this :-) It feels concise, and scrolling to each new graph makes it interesting to learn each new thing. Kudos to whoever designed it this way!