[Closed] Hiring a mathematician to work on the learning-theoretic AI alignment agenda

Cross-posted from the AI alignment forum.

UPDATE: The position is now closed. My thanks to everyone who applied, and also to those who spread the word.

The Association for Long Term Existence and Resilience (ALTER) is a new charity, based in Israel, for promoting longtermist[1] causes. The director is David Manheim, and I (Vanessa Kosoy) am a member of the board. Thanks to a generous grant from the FTX Future Fund Regranting Program, we are recruiting a researcher to join me in working on the learning-theoretic research agenda[2]. The position is remote and suitable for candidates in most locations around the world.

Apply here.

Requirements

  • The candidate must have a track record in mathematical research, including proving non-trivial original theorems.

  • The typical candidate has a PhD in theoretical computer science, mathematics, or theoretical physics. However, we do not require the diploma. We do require the relevant knowledge and skills.

  • Background in one or several of the following fields is an advantage: statistical/computational learning theory, algorithmic information theory, computational complexity theory, functional analysis.

Job Description

The researcher is expected to make progress on open problems in the learning-theoretic agenda. They will have the freedom to choose any of those problems to work on, or to come up with their own research direction, as long as I deem the latter sufficiently important in terms of the agenda’s overarching goals. They are expected to achieve results with minimal or no guidance. They are also expected to write up their results for publication in academic venues (and/or informal venues such as the alignment forum), prepare technical presentations, et cetera. (That said, we rate researchers according to the estimated impact of their output on reducing AI risk, not according to standard academic publication metrics.)

Here are some open problems from the agenda, described very briefly:

  • Study the mathematical properties of the algorithmic information-theoretic definition of intelligence. Build and analyze formal models of value learning based on this concept.

  • Pursue any of the future research directions listed in the article on infra-Bayesian physicalism.

  • Continue the study of reinforcement learning with imperceptible rewards.

  • Develop a theory of quantilization in reinforcement learning (building on the corresponding control theory). A minimal illustration of quantilization appears after this list.

  • Study the overlap of algorithmic information theory and statistical learning theory.

  • Study infra-Bayesian logic in general, and its applications to infra-Bayesian reinforcement learning in particular.

  • Develop a theory of antitraining: preventing AI systems from learning particular domains while learning other domains.

  • Study the infra-Bayesian Turing reinforcement learning setting. This framework has applications to reflective reasoning and hierarchical modeling, among other things.

  • Develop a theory of reinforcement learning with traps, i.e., irreversible state transitions (a toy example of a trap appears after this list). Possible research directions include studying the computational complexity of Bayes-optimality for finite state policies (in order to avoid the NP-hardness for arbitrary policies) and bootstrapping from a safe baseline policy.
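
To give a flavor of the quantilization item, here is a minimal sketch of a discrete q-quantilizer in Python. The function and variable names are mine, and this toy version is only meant to convey the idea, not the control-theoretic formulation the agenda builds on: instead of maximizing utility outright, the policy samples from the highest-utility slice of a trusted base distribution whose total base-probability mass is q.

```python
import numpy as np

def quantilize(actions, base_probs, utility, q, rng=None):
    """Sample an action from the top-q fraction (by base-probability mass)
    of the base distribution, where actions are ranked by utility.

    Toy discrete illustration of a quantilizer, not the agenda's formulation.
    """
    rng = np.random.default_rng() if rng is None else rng
    base_probs = np.asarray(base_probs, dtype=float)
    utilities = np.array([utility(a) for a in actions], dtype=float)

    # Rank actions from highest to lowest utility.
    order = np.argsort(-utilities)

    # Keep the best actions until their cumulative base probability reaches q
    # (always keeping at least one), then renormalize and sample.
    cum = np.cumsum(base_probs[order])
    keep = order[: np.searchsorted(cum, q) + 1]
    probs = base_probs[keep] / base_probs[keep].sum()
    return actions[rng.choice(keep, p=probs)]

# Hypothetical usage: the base distribution imitates trusted behavior, and the
# rare but high-utility "risky_exploit" gets only a bounded share of probability.
actions = ["safe_plan_a", "safe_plan_b", "risky_exploit"]
base = [0.45, 0.45, 0.10]
utility = {"safe_plan_a": 1.0, "safe_plan_b": 0.9, "risky_exploit": 5.0}.get
print(quantilize(actions, base, utility, q=0.5))
```

The point of the construction is that an action can only be selected about 1/q times more often than the base distribution would select it, which limits how hard the policy can exploit errors in the utility function.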
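
To make the notion of a "trap" concrete, here is a hypothetical three-state MDP (states, actions, and numbers are purely illustrative) in which one transition is irreversible: an agent that explores naively risks ending up in a state it can never leave, which is why directions like bootstrapping from a safe baseline policy are of interest.

```python
# A toy MDP with a trap. From "start", the action "explore" may reach the
# rewarding state "good" or fall irreversibly into "trap"; "stay" is the
# safe baseline. All names and numbers here are illustrative only.
TRANSITIONS = {
    ("start", "stay"):    {"start": 1.0},
    ("start", "explore"): {"good": 0.5, "trap": 0.5},
    ("good", "stay"):     {"good": 1.0},
    ("good", "explore"):  {"good": 1.0},
    ("trap", "stay"):     {"trap": 1.0},
    ("trap", "explore"):  {"trap": 1.0},
}
REWARDS = {"start": 0.1, "good": 1.0, "trap": 0.0}

# Sanity check: once in "trap", no action leads anywhere else.
assert all(TRANSITIONS[("trap", a)] == {"trap": 1.0} for a in ("stay", "explore"))
```

Because a mistaken model cannot be corrected after an irreversible transition, exploration in such environments carries irrecoverable downside, which is the sense in which traps break the usual regret guarantees.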

Terms

The salary is between 60,000 USD/year and 180,000 USD/year, depending on the candidate’s prior track record. The work can be done from any location. Further details depend on the candidate’s country of residence.

  1. ^

    Personally, I don’t think the long-term future should override every other concern. And, I don’t consider existential risk from AI especially “long term” since it can plausibly materialize in my own lifetime. Hence, “longtermist” is better understood as “important even if you only care about the long-term future” rather than “important only if you care about the long-term future”.

  2. ^

    The linked article is not very up-to-date in terms of the open problems, but is still a good description of the overall philosophy and toolset.