I wonder how much of the interview/work stuff is duplicated between positions—if there’s a lot of overlap, then maybe it would be useful for someone to create the EA equivalent of TripleByte—run initial interviews/work projects with a third-party organization to evaluate quality, then pass candidates along to the most relevant EA jobs.
If we want to maximize flow-through effects to AI Alignment, we might want to deliberately steer the approach adopted for aligned recommender systems to one that is also designed to scale to more difficult problems/more advanced AI systems (like Iterated Amplification). Having an idea become standard in the world of recommender systems could significantly increase the amount of non-safety researcher effort put towards that idea. Solving the problem a bit earlier with a less scalable approach could close off this opportunity.
While fully understanding a user’s preferences and values requires more research, it seems like there are simpler things that could be done by existing recommender systems that would be a win for users, e.g. Facebook having a “turn off inflammatory political news” switch (or a list of 5-10 similar switches), where current knowledge would suffice to train a classification system.
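As a rough illustration, here is a minimal sketch of the kind of classifier such a switch could sit on top of (the posts, labels, and threshold below are invented placeholders; a real system would need far more data and evaluation):

```python
# Toy sketch: train a classifier to flag "inflammatory political" posts,
# then use it to filter a feed when the user's switch is on.
# All posts and labels here are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled training data (1 = inflammatory political, 0 = other).
posts = [
    "You won't BELIEVE what this politician did, share before it's deleted!",
    "Here are photos from our family trip to the lake.",
    "The other party is destroying the country, wake up!",
    "New recipe: slow-cooked vegetable stew.",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(posts, labels)

def filter_feed(feed, switch_on=True, threshold=0.5):
    """Drop posts the classifier scores at or above `threshold` when the user's
    'hide inflammatory political news' switch is on."""
    if not switch_on:
        return list(feed)
    scores = clf.predict_proba(feed)[:, 1]
    return [post for post, s in zip(feed, scores) if s < threshold]
```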
It could be the case that this is bottlenecked by the incentives of current companies, in that there isn’t a good revenue model for recommender systems other than advertising, and advertising creates the perverse incentive to keep users on your system as long as possible. Or it might be the case that most recommender systems are effectively monopolies on their respective content, and users will choose an aligned system over an unaligned one if options are available, but otherwise a monopoly faces no pressure to align their system.
In these cases, the bottleneck might be “start and scale one or more new organizations that build aligned recommender systems using current knowledge” rather than “do more research on how to produce more aligned recommender systems”.
Seems like the main argument here is: “The general public will eventually clue in to the stakes around ASI and AI safety and the best we can do is get in early in the debate, frame it as constructively as possible, and provide people with tools (petitions, campaigns) that will be an effective outlet for their concerns.”
One concern about this is that “getting in early in the debate” might move up the time that the debate happens or becomes serious, which could be harmful.
An alternative approach would be to simply build latent capacity—work on issues that are already in the political domain (I think basic income as a solution to technological unemployment is something that is already out there in Canada), but avoid raising new issues until other groups move into that space too. While you’re doing that, you could build latent capacity (skills, networks) and learn how to effectively advocate in spaces that don’t carry the same risks of prematurely politicizing AI-related issues. Then, when something related to AI becomes a clear goal for policy advocacy, you could move onto it at the right time.
Another way of thinking about this is that in an overdetermined environment it seems like there would be a point at which the impact of EA movement building will be “causing a person to join EA sooner” instead of “adding another person to EA” (which is the current basis for evaluating EA movement building impact), which would be much less valuable.
I think if you want people to think about the meta-level, you would be better off with a post that says “suppose you have an argument for abortion” or “suppose you believe this simple argument X for abortion is correct” (where X is obviously a strawman, and raised as a hypothetical), and asks “what ought you do based on assuming this belief is true”. There may be a less controversial topic to use in this case.
If you want to start an object-level discussion on abortion (which, if you believe this argument is true, it seems you ought to), it might be helpful to circulate the article you want to use to start the discussion to a few EAs with varying positions on the topic for feedback before posting, because it is on a topic likely to trigger political buttons.
What sort of feedback signals would we get if EA was currently falling into a meta-trap? What is the current state of those signals?
Maybe price in the cost of staff time spent on the fundraiser—that is, if everyone donates immediately, it takes $X to fill the fundraiser, but if everyone donates at the end, it takes $X + $Y, where $Y is the cost of additional staff time spent on the fundraiser.
If anyone is ever at a point where they are significantly discouraged by thoughts along these lines (as I’ve been at times), there’s an Effective Altruist self-help group where you can find other EAs to talk to about how you’re feeling (and it really does help!). The group is hidden, but if you message me, I can point you in the right direction (or you can find information about it on the sidebar of the Effective Altruist Facebook group).
One tool that I think would be quite useful is having some kind of website where you gather:
Situations: descriptions of decisions that people are facing, and their options
Outcomes: the option that they took, and how they felt about it after the fact
Then you could get a description of a decision that someone new is facing and automatically assemble a reference class for them of people with the most similar decisions and how they turned out. This could work without any ML, but using language modelling to cluster similar situations would help (a rough sketch of the retrieval step is below).
It would contain similar information to a review site, but hopefully it could aggregate by situation instead of by product used, and cover decisions that are not in the category of “pick a product to buy”.
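A rough sketch of what that retrieval step might look like, using simple text similarity rather than a full language model (the situation/outcome records below are invented placeholders):

```python
# Toy sketch: given a new decision description, retrieve the most similar
# previously recorded situations and their outcomes. Records are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

records = [
    {"situation": "Deciding whether to leave a stable job to start a company",
     "outcome": "Left the job; stressful first year but glad I did it."},
    {"situation": "Choosing between a PhD offer and an industry role",
     "outcome": "Took the industry role; missed research at first, happy now."},
    {"situation": "Considering moving cities to be closer to family",
     "outcome": "Moved; commute got worse but overall wellbeing improved."},
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([r["situation"] for r in records])

def similar_situations(query, k=2):
    """Return the k stored records whose situations are most similar to `query`."""
    scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
    ranked = sorted(zip(scores, records), key=lambda x: x[0], reverse=True)
    return [record for _, record in ranked[:k]]

print(similar_situations("Should I quit my job to do a startup?"))
```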
I agree with this. It seems like the world where Moral Circle Expansion is useful is the world where:
1. The creators of AI are philosophically sophisticated (or persuadable) enough to expand their moral circle if they are exposed to the right arguments or work is put into persuading them.
2. They are not philosophically sophisticated enough to realize the arguments for expanding the moral circle on their own (seems plausible).
3. They are not philosophically sophisticated enough to realize that they might want to consider a distribution of arguments that they could have faced and that could have persuaded them about what is morally right, and design AI with this in mind (i.e. CEV), or with the goal of achieving a period of reflection where they can sort out the sort of arguments that they would want to consider.
I think I’d prefer pushing on point 3, as it also encompasses a bunch of other potential philosophical mistakes that AI creators could make.
I don’t think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).
Why do you think this is the case? Do you think there is an alternative reflection process (either implemented by an AI, by a human society, or a combination of both) that could be defined that would reliably lead to wide moral circles? Do you have any thoughts on what it would look like?
If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn’t dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative, or at least a parallel path, to directly performing MCE.
Any thoughts on individual-level political de-polarization in the United States as a cause area? It seems important, because a functional US government helps with a lot of things, including x-risk. I don’t know whether there are tractable/neglected approaches in the space. It seems possible that interventions on individuals that are intended to reduce polarization and promote understanding of other perspectives, as opposed to pushing a particular viewpoint or trying to lobby politicians, could be neglected. http://web.stanford.edu/~dbroock/published%20paper%20PDFs/broockman_kalla_transphobia_canvassing_experiment.pdf seems like a useful study in this area (it seems possible that this approach could be used for issues on the other side of the political spectrum).
I wonder if there’s a large amount of impact to be had in people outside of the tail trying to enhance the effectiveness of people in the tail (these might look like being someone’s personal assistant or sidekick, introducing someone in the tail to someone cool outside of the EA movement, being a solid employee for someone who founds an EA startup, etc.)? Being able to improve impact of someone in the tail (even if you can’t quantify what you accomplished) might avert the social comparison aspect, as one would feel like they’d be able to at least take partial credit for the accomplishments of the EA superstars.
When considering working for a startup/company with significant positive externalities, would it be far off to estimate your share of impact as (estimate of total impact of the company vs. the world where it did not exist) * (equity share of company)? (A toy numeric illustration is sketched below.)
This seems easier to estimate than your impact on company as a whole, and matches up with something like the impact certificate model (equity share seems like the best estimate we would have of what impact certificate division might look like). It’s also possible that there are distortions in allocation of money that would lead to an underestimate of true impact.
On the downside, it doesn’t fully account for replaceability, and I’m not sure if it meshes with the assessment that “negative externalities don’t matter too much in most cases because someone else would take your job” that seems to be the typical EA position.
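A toy numeric illustration of that estimate (every number below is invented, just to show the arithmetic):

```python
# Toy illustration of the proposed share-of-impact estimate; all numbers are invented.
total_impact = 1_000_000   # estimated impact of "company exists" vs. "company never existed"
equity_share = 0.005       # your equity stake in the company (0.5%)

your_impact_estimate = total_impact * equity_share
print(your_impact_estimate)  # 5000.0 -> attribute roughly 0.5% of the company's impact to yourself
```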
A couple of examples I’ve run across: DataWind (http://en.wikipedia.org/wiki/DataWind), which is now at a more mature stage. I went to a talk by one of the founders recently. They made a really cheap tablet and internet services that work over 2G, which opens up a market among the large sections of India currently without internet access. I think they could end up being quite successful.
An early-stage example is EyeCheck (http://www.eyechecksolutions.com/), started by a couple of engineers out of undergrad. They’re developing a tool to improve diagnosis of vision problems to increase the efficiency of providing glasses (I think they’re starting by working with NGOs running vision camps).
I appreciate the point that they are competing for time (as I was only thinking of monopolies over content).
If the reason it isn’t used is that users don’t “trust that the system will give what they want given a single short description”, then part of the research agenda for aligned recommender systems is not just producing systems that are aligned, but systems whose users have a greater degree of justified trust that they are aligned (placing more emphasis on the user’s experience of interacting with the system). Some of this research could potentially take place with existing classification-based filters.
I’ve talked to Wyatt and David; afterwards, I am more optimistic that they’ll think about downside risks and be responsive to feedback on their plans. I wasn’t convinced that the plan laid out here is a useful direction, but we didn’t dig into it in enough depth for me to be certain.
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people that worked on trying to understand language model features in context, leading to the release of an open-source “transformer debugger” tool.
I resigned from OpenAI on February 15, 2024.