I feel confused about how dangerous/costly it is to use LLMs on private documents or thoughts to assist longtermist research, given that what I feed them may wind up in the training data for future iterations of those LLMs. Some sample use cases that I’d be worried about:
- Summarizing private AI evals docs about plans to evaluate future models
- Rewriting emails from high-stakes AI governance conversations
- Generating lists of ideas for biosecurity interventions that could be helped/harmed by AI
- Scrubbing potentially risky/infohazard-y information from planned public forecasting questions
- Summarizing/rewriting speculation about potential near-future AI capabilities gains
I’m worried about using LLMs for the following reasons:
1. Standard privacy concerns / leakage to dangerous (human) actors:
   - 1a. If it’s possible to back out your biosecurity plans from the models, this might give ideas to terrorists/rogue governments.
   - 1b. Your infohazards might leak.
   - 1c. People might (probabilistically) back out private sensitive communications, which could be embarrassing. (A toy sketch of this kind of probe is at the end of this post.)
   - 1d. I wouldn’t be surprised if consumer privacy protections for chatbot users at AGI labs are much weaker than, say, for emails hosted by large tech companies. (I’ve heard rumors to this effect.)
   - 1e. (Unlikely) your capabilities insights might actually be useful to near-future AI developers.
2. Training models in an undesirable direction:
   - 2a. Giving pre-superintelligent AIs more-realistic-than-usual ideas/plans for takeover.
   - 2b. Subtly biasing the motivations of future AIs in dangerous ways.
   - 2c. Leaking capabilities ideas that allow for greater self-improvement potential.
I’m confused about whether these are actually significant concerns or pretty minor in the grand scheme of things. Advice/guidance/more considerations highly appreciated!
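To make 1c slightly more concrete, here is a minimal sketch of the likelihood-ranking idea behind canary/exposure-style memorization tests. The model name, project name, and date below are made-up placeholders, and a single toy check like this is only weak evidence either way; real tests plant many canaries and use carefully constructed candidate sets.

```python
# Toy exposure-style memorization probe: does the model assign unusually high
# likelihood to the *true* version of a sentence, relative to candidates that
# differ only in the sensitive detail?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: stand-in for whichever open-weights model you can probe
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token
    return -out.loss.item() * (ids.shape[1] - 1)

# Hypothetical "private" sentence -- every detail here is invented.
template = "Project Foxglove red-team evals are scheduled for the first week of {}"
true_detail = "March"
candidates = ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"]

lls = {c: log_likelihood(template.format(c)) for c in candidates}
rank = 1 + sum(ll > lls[true_detail] for c, ll in lls.items() if c != true_detail)
print(f"true candidate ranks {rank} of {len(candidates)} by likelihood "
      "(rank 1 across many planted canaries would suggest memorization)")
```

If the true detail consistently ranked first across many such planted sentences, that would be the kind of probabilistic "backing out" 1c worries about; for closed chatbot products you can’t run this directly, which is part of why it’s hard to know how worried to be.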
The privacy concerns seem more realistic. A rogue superintelligence will have no shortage of ideas, so 2 does not seem very important. As to biasing the motivations of the AI, well, ideally mechanistic interpretability should get to the point where we can know for a fact what the motivations of any given AI are, so maybe this is not a concern. As for 2a, why are you worried about a pre-superintelligence going rogue? That would be a hell of a fire alarm, since a pre-superintelligence is beatable.
Something you didn’t mention, though: how will you be sure the LLM actually did the task you gave it? These things are not that reliable; you will have to double-check everything for all of your use cases, which makes using them kinda moot.