Huh? I did not like the double-page style for the non-mobile pdf, as it required some manual rescaling on my PC.
And the mobile version has the main table cut between two pages in a pretty horrible way. I think I would have much preferred a single pdf in the mobile/single page style that is actually optimized for that style, rather than this.
Maybe I should have used the HTML version instead?
More detailed action points on safety from page 32: The Office for AI will coordinate cross-government processes to accurately assess long term AI safety and risks, which will include activities such as evaluating technical expertise in government and the value of research infrastructure. Given the speed at which AI developments are impacting our world, it is also critical that the government takes a more precise and timely approach to monitoring progress on AI, and the government will work to do so.
The government will support the safe and ethical development of these technologies as well as using powers through the National Security & Investment Act to mitigate risks arising from a small number of potentially concerning actors. At a strategic level, the National Resilience Strategy will review our approach to emerging technologies; the Ministry of Defence will set out the details of the approaches by which Defence AI is developed and used; the National AI R&I Programme’s emphasis on AI theory will support safety; and central government will work with the national security apparatus to consider narrow and more general AI as a top-level security issue.
I don’t think I get your argument for why the approximation should not depend on the downstream task. Could you elaborate? I am also a bit confused about the relationship between spread and resiliency: a larger spread of forecasts does not necessarily seem to imply weaker evidence. For a relatively rare event about which some forecasters could acquire insider information, a large spread might actually give you stronger evidence.
Imagine E is about the future enactment of a quite unusual government policy, and one of your forecasters is a high ranking government official. Then, if all of your forecasters are relatively well calibrated and have sufficient incentive to report their true beliefs, a 90% forecast for E by the government official and a 1% forecast by everyone else should likely shift your beliefs a lot more towards E than a 10% forecast by everyone.
This seems to connect to the concept of f-means: If the utility for an option is proportional to $f(p)$, then the expected utility of your mixture model is equal to the expected utility using the f-mean of the experts' probabilities $p_1$ and $p_2$, defined as $f^{-1}\left(\frac{f(p_1)+f(p_2)}{2}\right)$, as the $f$ in the utility calculation cancels out the $f^{-1}$. If I recall correctly, all aggregation functions that fulfill some technical conditions on a generalized mean can be written as an f-mean. In the first example, $f$ is just linear, such that the f-mean is the arithmetic mean. In the second example, $f$ is equal to the expected lifespan $\frac{1}{1-(1-p)}=\frac{1}{p}$, which yields the harmonic mean. As such, the geometric mean would correspond to the mixture model if and only if utility was logarithmic in $p$, as the geometric mean is the f-mean corresponding to the logarithm.
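A minimal sketch of the f-mean machinery in Python (the function names and the equal 50/50 expert weighting are my own choices for illustration, matching the two-expert mixture above):

```python
import math

def f_mean(probs, f, f_inv):
    """Aggregate probabilities via the f-mean: f_inv(mean of f(p))."""
    return f_inv(sum(f(p) for p in probs) / len(probs))

def log_odds(p):
    return math.log(p / (1 - p))

def inv_log_odds(y):
    return 1 / (1 + math.exp(-y))  # logistic function, inverts log_odds

ps = [0.9, 0.01]  # the insider-information example from above

# f linear -> arithmetic mean (the plain mixture model)
print(f_mean(ps, lambda p: p, lambda y: y))          # 0.455
# f(p) = 1/p (expected lifespan) -> harmonic mean
print(f_mean(ps, lambda p: 1 / p, lambda y: 1 / y))  # ~0.0198
# f(p) = log-odds -> geometric mean of odds
print(f_mean(ps, log_odds, inv_log_odds))            # ~0.232
```

Note how the geometric mean of odds stays much closer to the low forecasts than the arithmetic mean does, which is exactly the behaviour at issue in the insider-information example above.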
For a binary event with “true” probability $q$, the expected log-score for a forecast of $p$ is $q\log(p) - (1-q)\log(1-p) = \log\left(\frac{p^q}{(1-p)^{1-q}}\right)$, which equals $\log\left(\sqrt{\frac{p}{1-p}}\right) = 0.5\log\left(\frac{p}{1-p}\right)$ for $q=0.5$. So the geometric mean of odds would yield the correct utility for the log-score according to the mixture model, if all the events we forecast were essentially coin tosses (which seems like a less satisfying synthesis than I had hoped for).
Further questions that might be interesting to analyze from this point of view:
Is there some kind of approximate connection between the Brier score and the geometric mean of odds that could explain the empirical performance of the geometric mean on the Brier score? (There might very well not be anything, as the mixture model might not be the best way to think about aggregation).
What optimization target (under the mixture model) does extremization correspond to? Edit: As extremization is applied after the aggregation, it cannot be interpreted in terms of mixture models (if all forecasters give the same prediction, any f-mean has to return that value, but extremization yields a more extreme prediction).
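For concreteness, by extremization I mean the usual transformation on the aggregated odds $o = \frac{p}{1-p}$ with a parameter $d > 1$:

$$o_{\text{ext}} = o^{\,d}, \qquad d > 1.$$

If all forecasters report the same $p \neq 0.5$, every f-mean returns $p$ (and hence odds $o$), while extremization returns $o^d \neq o$, so it cannot be written as an f-mean.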
Note: After writing this, I noticed that UnexpectedValue’s comment on the top-level post essentially points to the same concept. I decided to still post this, as it seems more accessible than their technical paper while (probably) capturing the key insight.
Edit: Replaced “optimize” by “yield the correct utility for” in the third paragraph.
I wanted to flag that many PhD programs in Europe might require you to have a Master’s degree, or to essentially complete the coursework for a Master’s degree during your PhD (as seems to be the case in the US), depending on the kind of undergraduate degree you hold. Obviously, the arguments regarding funding might still partially hold in that case.
Do you have a specific definition of AI Safety in mind? From my (biased) point of view, it looks like large fractions of work that is explicitly branded “AI Safety” is done by people who are at least somewhat adjacent to the EA community. But this becomes a lot less true if you widen the definition to include all work that could be called “AI Safety” (so anything that could conceivably help with avoiding any kind of dangerous malfunction of AI systems, including small scale and easily fixable problems).
Relatedly, what is the likelihood that future iterations of the fellowship might be less US-centric, or include visa sponsorship?
The job posting states:
“All participants must be eligible to work in the United States and willing to live in Washington, DC, for the duration of their fellowship. We are not able to sponsor US employment visas for participants; US permanent residents (green card holders) are eligible to apply, but fellows who are not US citizens may be ineligible for placements that require a security clearance.”
So my impression would be that it would be pretty difficult to participate for non-US citizens who do not already live in the US.
https://en.wikipedia.org/wiki/Technological_transitions might be relevant.
The Geels book cited in the article (Geels, F.W., 2005. Technological transitions and system innovations. Cheltenham: Edward Elgar Publishing.) contains a number of interesting case studies, which I read a while ago, as well as a (I believe popular) framework for technological change, but I am not sure the framework is precise enough to be very predictive (and thus empirically validatable).
I don’t have any particular sources on this, but the economic literature on the effects of regulation might be quite relevant. In particular, I do remember attending a lecture arguing that limited liability played an important role for innovation during the industrial revolution.
Facebook has at least experimented with using deep reinforcement learning to adjust its notifications, according to https://arxiv.org/pdf/1811.00260.pdf. Depending on which exact features they used for the state space (i.e., whether these are causally connected to preferences), the trained agent would at least in theory have an incentive to change users’ preferences.
The fact that they use DQN rather than a bandit algorithm suggests that what they are doing involves at least some short-term planning, but the paper does not analyze the experiments in much detail, so it is unclear whether they could have used a myopic bandit algorithm instead. Either way, seeing this made me update quite a bit towards being more concerned about the effects of recommender systems on preferences.
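To illustrate the mechanism (a toy model with invented numbers, not anything from the paper): if engagement-optimized notifications gradually shift a user’s preferences towards clicking more, a myopic bandit never exploits this, while a far-sighted learner like DQN does.

```python
import numpy as np

# Toy MDP; all numbers invented for illustration.
# States: 0 = current preferences, 1 = shifted preferences.
# Actions: 0 = benign notification, 1 = engagement-optimized notification.
R = np.array([[0.5, 0.4],   # expected clicks in state 0
              [0.3, 0.9]])  # expected clicks in state 1
P = np.array([[0.0, 0.3],   # P[s, a]: probability of moving to state 1
              [0.0, 1.0]])  # bait notifications shift, then keep, preferences

def greedy_policy(gamma, iters=2000):
    """Value iteration; returns the greedy action per state."""
    V = np.zeros(2)
    Q = R.copy()
    for _ in range(iters):
        Q = R + gamma * (P * V[1] + (1 - P) * V[0])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

print(greedy_policy(gamma=0.0))   # myopic/bandit-like: [0 1]
print(greedy_policy(gamma=0.99))  # far-sighted: [1 1], i.e. shift preferences
```

With discount 0 the agent simply sends whichever notification clicks best right now; with discount 0.99 it accepts a lower immediate click rate in order to move the user into the high-engagement state, which is exactly the preference-shifting incentive described above.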
Depending on your intended audience, it might make sense to add more details for some of the proposals. For example, why is scenario planning a good idea compared to other methods of decision making? Is there a compelling story, or strong empirical evidence for its efficacy?
Some small nitpicks:
There seems to be a mistake here:
“Bostrom argues in The Fragile World Hypothesis that continuous technological development will increase systemic fragility, which can be a source of catastrophic or existential risk. In the Precipice, he estimates the chances of existential catastrophe within the next 100 years at one in six.”
I also find this passage a bit odd:
“One example of moral cluelessness is the repugnant conclusion, which assumes that by adding more people to the world, and proportionally staying above a given average in happiness, one can reach a state of minimal happiness for an infinitely large population.”
The repugnant conclusion might motivate someone to think about cluelessness, but it does not really seem to be an example of cluelessness (the question whether we should accept it might or might not be).
Most of the links to the papers seem to be broken.
So for the maximin we are minimizing over all joint distributions that are κ-close to our initial guess?
“One intuitive way to think about this might be considering circles of radius κ>0 centered around fixed points, representing your first guesses for your options, in the plane. As κ becomes very large, the intersection of the interiors of these circles will approach 100% of their interiors. The distance between the centres becomes small relative to their radii. Basically, you can’t tell the options apart anymore for huge κ. (I might edit this post with a picture...)”
If I can’t tell the options apart any more, how is the 1/n strategy better than just investing everything into a random option? Is it just about variance reduction? Or is the distance metric designed such that shifting the distributions into “bad territories” for more than one of the options requires more movement?
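Here is a toy version of my second guess (my own simplification, not necessarily the post’s actual setup): suppose the adversary can perturb the vector of expected returns within a single L2 ball of radius $\kappa$ shared across all options. By Cauchy-Schwarz, the worst perturbation for a portfolio $w$ is $-\kappa w/\lVert w\rVert_2$, so the robust value is $w \cdot r - \kappa \lVert w\rVert_2$, and for large $\kappa$ the portfolio minimizing $\lVert w\rVert_2$ on the simplex, i.e. 1/n, wins:

```python
import numpy as np

r = np.array([0.10, 0.08, 0.02])  # nominal expected returns (made up)

def robust_value(w, kappa):
    # min over ||delta||_2 <= kappa of w @ (r + delta) = w @ r - kappa * ||w||_2
    return w @ r - kappa * np.linalg.norm(w)

concentrated = np.array([1.0, 0.0, 0.0])  # all-in on the best-looking option
uniform = np.ones(3) / 3                  # the 1/n strategy

for kappa in [0.0, 0.05, 0.5]:
    print(kappa, robust_value(concentrated, kappa), robust_value(uniform, kappa))
# small kappa: the concentrated bet wins; large kappa: 1/n wins
```

So in this toy model it is not just variance reduction: because the perturbation budget is shared across options, a concentrated portfolio is maximally exposed to it, while 1/n minimizes the damage the adversary can do per unit of budget.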
I wrote up my understanding of Popper’s argument on the impossibility of predicting one’s own knowledge (Chapter 22 of The Open Universe) that came up in one of the comment threads. I am still a bit confused about it and would appreciate people pointing out my misunderstandings.

Consider a predictor:
A1: Given a sufficiently explicit prediction task, the predictor predicts correctly
A2: Given any such prediction task, the predictor takes time to predict and issue its reply (the task is only completed once the reply is issued).
T1: A1, A2 ⇒ Given a self-prediction task, the predictor can only produce a reply after (or at the same time as) the predicted event
T2: A1, A2 ⇒ The predictor cannot predict future growth in its own knowledge
A3: The predictor takes longer to produce a reply, the longer the reply is
A4: All replies consist of a description of a physical system and use the same (standard) language.
A1 establishes implicit knowledge of the predictor about the task. A2, A3 and A4 are there to account for the fact that the machine needs to make its prediction explicit.
A5: Now, consider two identical predictors, Tell and Told. At t=0, give Tell the task to predict Told’s state (including its physically issued reply) at t=1 from Told’s state at t=0. Give Told the task to predict a third predictor’s state (this seems to later be interpreted as Tell’s state) at t=1 from that predictor’s state at t=0 (such that Tell and Told will be in the exact same state at t=0).
If I understand correctly, this implies that Tell and Told will be in the same state all the time, as future states are just a function of the task and the initial state.
T3: If Told has not started issuing its reply at t=1, Tell won’t have completed its task at t=1
Argument: Tell must issue its reply to complete the task, but Tell has to go through the same states as Told in equal periods of time, so it cannot have started issuing its reply.
T4: If Told has completed its task at t=1, Tell will complete its task at t=1.
Argument: Tell and Told are identical machines
T5: Tell cannot predict its own future growth in knowledge
Argument: Completing the prediction would take until the knowledge is actually obtained.
A6: The description of the physical state of another description (for example, one written on a punch card) cannot be shorter than that other description.
T6: If Told has completed its task at t=1, Tell must have taken longer than Told to complete its task
Argument: Tell’s reply must be longer than Told’s, since it needs to describe Told’s reply (A6), and longer replies take longer to produce (A3).
T6 contradicts T4, so some of the assumptions must be wrong.
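In my own notation (writing $\ell(\cdot)$ for reply length and $t_{(\cdot)}$ for task-completion time), the contradiction can be compressed into one line: A6 makes Tell’s reply strictly longer than Told’s, since it has to contain a description of Told’s reply as part of Told’s state, and A3 converts the length gap into a time gap:

$$\ell(\text{Tell}) \overset{\text{A6}}{>} \ell(\text{Told}) \;\overset{\text{A3}}{\Longrightarrow}\; t_{\text{Tell}} > t_{\text{Told}} = 1,$$

whereas T4 requires $t_{\text{Tell}} = 1$.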
A5 and A1 seem to be the shakiest assumptions. If A1 fails, we cannot predict the future. If A5 fails, there is a problem with self-referential predictions.
This seems to establish too little, as it is only about deterministic predictions. Also, the argument does not seem to preclude partial predictions about certain aspects of the world’s state (for example, predictions that are not concerned with the other predictor’s physical output might go through). Less relevantly, the argument heavily relies on (pseudo) self-reference, and Popper distinguishes between explicit and implicit knowledge, where only explicit knowledge seems to be affected by the argument. It is not clear to me that making an explicit prediction about the future necessarily requires me to make all of the knowledge gains I have until then explicit (if we are talking about deterministic predictions of the whole world’s state, I might have to, though, especially if I predict state-by-state).
Then, even if all of my criticism were invalid and the argument went through, I don’t see how we could predict anything in the future at all (like the sun’s existence, or the coin flips that were discussed in other comments). Where is the qualitative difference between short- and long-term predictions? (I agree that there is a quantitative one, and it seems quite plausible that some longtermists are undervaluing it.)
I am also slightly discounting the proof, as it uses a lot of words that can be interpreted in different ways. It seems like it is often easier to overlook problems and implicit assumptions in that kind of proof as opposed to a more formal/symbolic proof.
Popper’s ideas seem to have interesting overlap with MIRI’s work.
They are, but I don’t think that the correlation is strong enough to invalidate my statement. P(sun will exist|AI risk is a big deal) seems quite large to me.
Obviously, this is not operationalized very well...
It seems like the proof critically hinges on assertion 2), which is not proven in your link. Can you point me to the pages of the book that contain the proof?
I agree that proofs are logical, but since we’re talking about probabilistic predictions, I’d be very skeptical of the relevance of a proof that does not involve mathematical reasoning.
I don’t think I buy the impossibility proof, as predicting future knowledge in a probabilistic manner is possible (most simply, I can predict that if I flip a coin now, there’s a 50/50 chance I’ll know the coin landed on heads/tails in a minute). I think there is some important true point behind your intuition that knowledge (especially of more complex forms than the outcome of a coin flip) is hard to predict. But I am almost certain you won’t be able to find a rigorous mathematical proof of this intuition, because reality is very fuzzy (in a mathematical sense, what exactly is the difference between the coin flip and knowledge about future technology?), so I’d be a lot more excited about other types of arguments (which will likely only support weaker claims).
Ok, makes sense. I think that our ability to make predictions about the future steeply declines with increasing time horizons, but I find it somewhat implausible that predictions would become entirely uncorrelated with what actually happens after a finite amount of time. And it does not seem to be the case that data supporting long-term predictions is impossible to come by: while it might be pretty hard to predict whether AI risk is going to be a big deal by whatever measure, I can still be fairly certain that the sun will exist in 1000 years, in part due to a lot of data collection and hypothesis testing done by physicists.
“The “immeasurability” of the future that Vaden has highlighted has nothing to do with the literal finiteness of the timeline of the universe. It has to do, rather, with the set of all possible futures (which is provably infinite). This set is immeasurable in the mathematical sense of lacking sufficient structure to be operated upon with a well-defined probability measure.”

This claim seems confused, as every nonempty set allows for the definition of a probability measure on it, and measures on function spaces exist (https://en.wikipedia.org/wiki/Dirac_measure, https://encyclopediaofmath.org/wiki/Wiener_measure). To obtain non-existence, further properties of the measure, such as translation-invariance, need to be required (https://aalexan3.math.ncsu.edu/articles/infdim_meas.pdf), and it is not obvious to me that we would necessarily require such properties.
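To spell out the first point (this is standard measure theory, using the Dirac measure linked above): for any nonempty set $\Omega$, pick a point $\omega \in \Omega$ and define, on the full power set,

$$\delta_\omega(A) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise} \end{cases} \qquad \text{for } A \subseteq \Omega.$$

This is a well-defined probability measure: $\delta_\omega(\Omega) = 1$, and for pairwise disjoint sets $A_1, A_2, \dots$, at most one of them contains $\omega$, so $\delta_\omega\left(\bigcup_n A_n\right) = \sum_n \delta_\omega(A_n)$. So the sheer size of the set of possible futures cannot by itself rule out a probability measure; only additional requirements on the measure can.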