What do you think of the stability under self-modification example in this essay?
I haven’t taken the time to fully understand MIRI’s work. But my reading is that MIRI’s work is incremental without being empirical—like most people working in math & theoretical computer science, they are using proofs to advance their knowledge rather than randomized controlled trials. So this might meet the “tight feedback loops” criterion without meeting the “approached experimentally” criterion.
BTW, you might be interested in this comment of mine about important questions for which it’s hard to gather relevant experimental data.
Here are some related guesses of mine if anyone is interested:
The importance of the far future is so high that there’s nothing to do but bite the bullet and do the best we can to improve it.
MIRI represents a promising approach to improving the far future, but it shouldn’t be the only approach we investigate. For example, I would like to see an organization that attempted to forecast a broad variety of societal and technological trends, predict how they’ll interact, and identify the best spots to apply leverage.
The first thing to do is to improve our competency at predicting the future in general. The organization I describe could evolve out of a hedge fund that learned to generate superior returns through long-term trading, for instance. The approach to picking stocks that Charlie Munger, Warren Buffett’s partner, describes in Poor Charlie’s Almanack sounds like the sort of thing that might work for predicting other aspects of how the future will unfold. Munger reads a ton of books and uses a broad variety of mental frameworks to try to understand the assets he evaluates (more of a fox than a hedgehog).
(Interesting to note that the GiveWell founders are ex-employees of Bridgewater, one of the world’s top hedge funds.)
A meta-level approach to predictions: push for the legalization of prediction markets that would let us aggregate the views of many people and financially incentivize them to forecast accurately. There are likely problems with this approach, though, e.g. markets for unwanted events creating financial incentives for speculators to cause those unwanted events.
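To make the mechanism concrete, here’s a minimal sketch of Hanson’s logarithmic market scoring rule (LMSR), the automated market maker used by many prediction markets; the liquidity parameter and trade sizes below are made up for illustration:

```python
import math

B = 100.0  # liquidity parameter: larger B = deeper market, slower-moving prices

def cost(quantities):
    """LMSR cost function C(q) = B * ln(sum_i exp(q_i / B))."""
    return B * math.log(sum(math.exp(q / B) for q in quantities))

def prices(quantities):
    """Instantaneous price of each outcome; prices sum to 1 and read as probabilities."""
    z = sum(math.exp(q / B) for q in quantities)
    return [math.exp(q / B) / z for q in quantities]

def trade_cost(quantities, outcome, shares):
    """Amount a trader pays to buy `shares` of `outcome` at the current state."""
    after = list(quantities)
    after[outcome] += shares
    return cost(after) - cost(quantities)

# Two-outcome market: "event happens" vs. "event doesn't".
q = [0.0, 0.0]
print(prices(q))               # [0.5, 0.5]: no information yet
print(trade_cost(q, 0, 50.0))  # what it costs to push the market toward "happens"
q[0] += 50.0
print(prices(q))               # "happens" now trades above 0.5
```

The appealing property is that the prices always sum to 1, so at any moment they can be read off directly as the market’s aggregated probability estimate.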
When thinking about the far future, the best we may be able to do is identify specific key parameters that we think will have a positive impact on the future, and then use experimental approaches with tight feedback loops to measure whether we are nudging those parameters in the right direction. For example, maybe we think a world with fewer belligerent people is one that’s more likely to survive existential threats. We write a bot that uses sentiment analysis to measure the level of belligerence in online discussion (a toy version is sketched below). We observe that the legalization of marijuana in a particular US state causes a noticeable drop in the level of belligerence of people talking online. We sponsor campaigns to legalize marijuana in other states and notice more drops. Etc. This isn’t a serious suggestion, just an illustration; among other problems, legalizing marijuana in the US would leave countries like Iran and Russia looking even more belligerent by comparison.
(Cast in these terms, MIRI is valuable if “a taller stack of quality AI safety papers leads to a world that’s more likely to survive AGI threats”.)
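For concreteness, here’s a toy version of the belligerence-measuring bot; the word list, sample comments, and scoring rule are all hypothetical stand-ins for a real trained sentiment model and a real comment feed:

```python
# Toy belligerence index: score comments against a small lexicon.
# Everything here (words, data, scoring) is a made-up placeholder.

BELLIGERENT_WORDS = {"destroy", "enemy", "crush", "war", "hate", "fight"}

def belligerence_score(comment: str) -> float:
    """Fraction of a comment's words that appear in the belligerent lexicon."""
    words = comment.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in BELLIGERENT_WORDS for w in words) / len(words)

def daily_index(comments: list[str]) -> float:
    """Average belligerence across a day's sample of comments."""
    return sum(map(belligerence_score, comments)) / max(len(comments), 1)

# Compare the index before and after a policy change (made-up samples):
before = ["I will crush anyone who disagrees!", "This is war.", "Nice weather."]
after = ["Pretty chill thread today.", "I disagree, but fair point."]
print(f"before: {daily_index(before):.3f}, after: {daily_index(after):.3f}")
```

A real version would also need to control for confounds before attributing any drop in the index to a specific policy change, which is part of why the example above isn’t a serious suggestion.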
But maybe we think that even the truth value of a statement like “a world with fewer belligerent people is more likely to survive existential threats” is essentially a coin toss once you look far enough out. In that case, the best we can do might be to try to attain wealth and positions of power as a movement, while improving our prediction capabilities so they are at least a little better than everyone else’s. Maybe we’d be able to see Bad Stuff on the horizon before others were paying much attention and direct resources to avert it.
It might also be wise to develop a core competency in “averting disasters on the horizon”, whatever that might look like, e.g. practice actually nudging society to see which strategies work. The broad ability to nudge society is one that can be developed through experiment and tight feedback loops, and could be effective for lots of different things.
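As a sketch of what that experiment-and-feedback loop might look like mechanically, here’s an epsilon-greedy multi-armed bandit that shifts effort toward whichever nudge strategy measures best; the strategy names and effect sizes are invented:

```python
import random

STRATEGIES = ["op-eds", "grants", "lobbying"]  # hypothetical nudge strategies
EPSILON = 0.1  # fraction of trials spent exploring rather than exploiting

def run_trial(strategy: str) -> float:
    """Stand-in for a real-world measurement of a nudge's effect (invented numbers)."""
    true_effect = {"op-eds": 0.2, "grants": 0.5, "lobbying": 0.35}
    return random.gauss(true_effect[strategy], 0.1)

totals = {s: 0.0 for s in STRATEGIES}
counts = {s: 0 for s in STRATEGIES}

for _ in range(1000):
    if random.random() < EPSILON or not any(counts.values()):
        choice = random.choice(STRATEGIES)  # explore a random strategy
    else:
        # exploit: pick the strategy with the best average measured effect so far
        choice = max(STRATEGIES, key=lambda s: totals[s] / max(counts[s], 1))
    counts[choice] += 1
    totals[choice] += run_trial(choice)

for s in STRATEGIES:
    mean = totals[s] / max(counts[s], 1)
    print(f"{s}: tried {counts[s]} times, estimated effect {mean:.2f}")
```

The point isn’t this particular algorithm; it’s that “tight feedback loops” means having some measured outcome that reallocates effort automatically, rather than committing to one strategy up front.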
Related: Robin Hanson and Charles Twardy AMA on the SciCast tech forecasting project. Some correlates of forecasting success.

“financial incentives for speculators to cause those unwanted events.”
I’ve never understood this argument. There has always been a latent incentive to off CEOs or destroy infrastructure and trade on the resulting stock price swings. In practice this is very difficult to pull off. Prediction markets would be under more scrutiny and thus harder to game in this manner.
To take a step back, this objection is one that gets trotted out against prediction markets all the time but has already been addressed in the white papers on the topic.