Tangentially, this made me wonder whether the ppl running EAF/LW/etc are thinking about and “ready” wrt the risk of mass-produced BS from LLMs flooding online spaces, including potentially forums like these.
nora
We did not consider the discussion on specific research projects to be within the scope of this post. As mentioned in the beginning, we tried to cover as much as we could that would be relevant to other field builders and related audiences.
It primarily focusses on information we think might be relevant for other people and initiatives in this space. We also do not go into specific research outputs produced by fellows within the scope of this post.
There are a few reasons why it made sense to do it this way.
As discussed in other parts of this post, a lot of research output has not yet been published. Some teams did publicly share their work (as an example, one of the two teams that worked on “dynamical systems perspective on goal-oriented behavior and relativistic agency” posted their updates on the Alignment Forum: [1] and [2], which we hugely appreciate), some have submitted their manuscripts to academic venues, and several others have not yet published anything. The reasons vary: some teams are continuing the project and waiting to publish at a further level of maturity; some are working through (info) hazard considerations and sanity checks; some have preferences over the format of research output they want to pursue and are working towards that; and some projects were primarily directed at informing the mentor’s research, which may not involve an explicit public output.
From our end, while we might hold preferences for certain insights to flow outwards more efficiently, we also wanted to defer decisions about the form and content of research outputs to the shared judgement between fellows and their respective mentors.
Note that in some of the cases this absence of public communication till now is fairly justifiable, especially in the cases of promising projects that became long term collaborations.
(Fwiw, as we mention in this post, we have also gained a better understanding of how to facilitate outward communication without constraining such research autonomy, which we will take into account in future.)
There are also other reasons why detailed evaluation of projects is difficult to do based on partial outputs and mentor-specific inside-view motivations. In the light of all this, we did decide for this reflections post to be a high-level abstraction, and not include either a Research Showcase or a detailed Portfolio Evaluation. Based on what we understand right now, this seems like a reasonable decision.
At the same time, if a project evaluator or somebody in a related capacity wishes to take a look at a more detailed evaluation report, we’d be open to discussing that (under some info sharing constraints) and would be happy to hear from you at contact@pibbss.ai.
TLDR; PIBBSS is hiring for a full-time Project Manager who will be responsible for running the second iteration of the PIBBSS Summer Research Fellowship.
To apply, please complete this application form.
---
PIBBSS aims to facilitate knowledge transfer from fields studying intelligent behaviour in natural systems to AI safety and alignment.
The Project Manager will be supported by TJ and Nora (who ran the fellowship in 2022) to help transfer learnings from last year’s fellowship, and work alongside (and manage) 1-3 team members to help execute the program.
More information about the role here.
We accept and evaluate applications on a rolling basis. Note that we are looking to hire as soon as possible and no later than early October.
We’re happy to discuss this opportunity with any potential applicants. Feel free to contact me with any questions you might have at: fellowship@pibbss.ai.
What is the timeline of the Century Fellowship/application? Is there a time when applications will be closed?
Another thought in the genre “consequentialism+”: capabilitarianism à la Sen and Nussbaum (e.g. here (h/t TJ) for an intro or SEP) seems attractive to me (among other reasons) because I believe it makes a practically useful abstraction from “what we believe ultimately matters” to “what are the best levers to affect that which we believe ultimately matters”. (In this case, the suggestion would be that, while we might still think that some broad notion of utility is what ultimately matters morally, given the specific world we live in and its causal structure, focusing on improving people’s central capabilities (as listed for example in the post linked earlier) is an effective and robust way of promoting that good.)
And importantly, consequentialism-viewed-through-the-lens-of-capabilitarianism will equip you with some different intuitions in e.g. political philosophy than a more “straightforward” notion of consequentialism will (at least before you reach what I am suggesting here to be a new reflective equilibrium).
FWIW I would be a regular reader of Nuno’s monthly (or some other interval) forum digest. I also think that having a number of other people (potentially with complementary profiles) write digests could be valuable. Given the depth and breadth of EA/EA Forum these days, trying to find the “common denominator of relevance” in the form of a single digest will result in a digest that is of limited usefulness for most readers.
Some of the section ideas are great, in particular “underupvoted underdogs”.
PIBBSS Summer Research Fellowship—Q&A event
What? Q&A session with the fellowship organizers about the program and application process. You can submit your questions here.
For whom? For everyone curious about the fellowship and for those uncertain whether they should apply.
When? Wednesday 12th January, 7 pm GMT
Where? On Google Meet, add to your calendar
Somewhat related: What to do with people? https://forum.effectivealtruism.org/posts/oNY76m8DDWFiLo7nH/what-to-do-with-people
Context: (1) Motivations for fostering EA-relevant interdisciplinary research; (2) “domain scanning” and “epistemic translation” as a way of thinking about interdisciplinary research
List of fields/questions for interdisciplinary AI alignment research
The following list of fields and leading questions could be interesting for interdisciplinary AI alignment research. I started to compile this list to provide some anchorage for evaluating the value of interdisciplinary research for EA causes, specifically AI alignment.
Some comments on the list:
Some of these domains are likely already very much on the radar of some people; others are more speculative.
In some cases I have a decent idea of concrete lines of questioning that might be interesting; in other cases all I am doing is gesturing very broadly that “something here might be of interest”.
I don’t mean this list to be comprehensive or authoritative. On the contrary, this list is definitely skewed by domains I happened to have come across and found myself interested in.
While this list is specific to AI alignment (/safety/governance), I think the same rationale applies to other EA-relevant domains and I’d be excited for other people to compile similar lists relevant to their area of interest/expertise.
Very interested in hearing thoughts on the below!
Target domain: AI alignment/safety/governance
Evolutionary biology
Evolutionary biology seems to have a lot of potentially interesting things to say about AI alignment. Just a few examples include:
The relationship between environment, agent and evolutionary paths (which e.g. relates to the role of training environments)
Niche construction as an angle on embedded agency
The nature of intelligence
Linguistics and Philosophy of language
Lots of things that are relevant to understanding the nature and origin of (general) intelligence better.
Sub-domains, such as semiotics could, for example, have relevant insights on topics like delegation and interpretability.
Cognitive science and neuroscience
Examples include Minsky’s Society of Mind (“The power of intelligence stems from our vast diversity, not from any single, perfect principle”), Hawkins’s A Thousand Brains (the role of reference frames for general intelligence), Friston et al.’s Predictive Coding/Predictive Processing (in its most ambitious versions a near-universal theory of all things cognition, perception, comprehension and agency), and many more.
Information theory
Information theory is hardly news to the AI alignment idea space. However, there might still be value on the table from deeper dives or more out-of-the-ordinary applications of its insights. One example of this might be this paper on The Information Theory of Individuality.
Cybernetics/Control Systems
Cybernetics seems straightforwardly relevant to AI alignment. Personally, I’d love to have a piece of writing synthesising the most exciting intellectual developments under cybernetics done by someone with awareness of where the AI alignment field is at currently.
Complex systems studies
What does the study of complex systems have to say about robustness, interoperability, emergent alignment? It also offers insights into and methodology for approaching self-organization and collective intelligence which is interesting in particular in multi-multi scenarios.
Heterodox schools of economic thinking
These are schools of thought that are trying to reimagine the economy/capitalism and (political) organization, e.g. through decentralization and self-organization, by working on antitrust, by trying to understand potentially radical implications of digitalization for the fabric of the economy, etc. Complexity economics, for example, can help us understand the out-of-equilibrium dynamics that shape much of our economy and lives.
Political economy
An interesting framework for thinking about AI alignment as a socio-technical challenge. Particularly relevant from a multi-multi perspective, or for thinking along the lines of cooperative AI. Pointer: Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles
Political theory
The richness of the history of political thought is astonishing; the most obvious might be ideas related to social choice or principles of governance. (A dense yet high-quality overview is offered by the podcast series History Of Ideas.) The crux in making the depth of political thought available and relevant to AI alignment is formalization, which seems extremely undersupplied in current academia for very similar reasons as I’ve argued above.
Management and organizational theory, Institutional economics and Institutional design
Has things to say about e.g. interfaces (read this to get a gist for why I think interfaces are interesting for AI alignment); delegation (e.g. Organizations and Markets by Herbert Simon); and (potentially) the ontology of forms and (the relevant) agent boundaries (e.g. The secret to social forms has been in institutional economics all along?).
Talks for example about desiderata for institutions like robustness (e.g. here), or about how to understand and deal with institutional path-dependencies (e.g. here).
Below, I briefly discuss some motivating reasons, as I see them, to foster more interdisciplinary thought in EA. This includes ways EA’s current set of research topics might have emerged for suboptimal reasons.
More EA-relevant interdisciplinary research: why?
The ocean of knowledge is vast. But the knowledge commonly referenced within EA and longtermism represents only a tiny fraction of this ocean. I argue that EA’s knowledge tradition is skewed for reasons including but not limited to the epistemic merit of those bodies of knowledge. There are good reasons for EA to focus on certain areas:
Direct relevance (e.g. if you’re trying to do good, it seems clearly relevant to look into philosophy a bunch; if you’re trying to do good effectively, it seems clearly relevant to look into economics (among others) a bunch; if you came to think that existential risks are a big deal, it is clearly relevant to look into bioengineering, international relations, etc. a bunch; etc.)
Evidence of epistemic merit (e.g. physics has more evidence for epistemic merit than psychology, which in turn has more evidence for epistemic merit than astrology; in other words, beliefs gathered from different fields are likely to pay more/less rent, or are likely to be more/less explanatorily virtuous)
However, some of the reasons we’ve ended up with our current foci may not be as good:
The, in parts arbitrary, way academic disciplines have been carved up
Inferential distances between knowledge traditions that hamper the free diffusion of knowledge between disciplines and schools of thought
Having a skewed knowledge basis is problematic. There is a significant likelihood that we are missing out on insights or perspectives that might critically advance our undertaking. We don’t know what we don’t know. We have all the reasons to expect that we have blindspots.
***
I am interested in the potential value and challenges of interdisciplinary research.
Neglectedness
(Academic) incentives make it harder for transdisciplinary thought to flourish, resulting in what I expect to be an undersupply thereof. One way of thinking about why we would see an undersupply of interdisciplinary thought is in terms of “market inefficiencies”. For one, individual actors are incentivised (because it’s less risky) to work on topics that are already recognised as interesting by the community (“exploitation”), as opposed to venturing into new bodies of knowledge that might or might not prove insightful (“exploration”). What is “already recognized as valuable by the community”, however, will only in part be determined by epistemic considerations, and in another part be shaped by path-dependencies.
For two, “markets” are insufficiently liquid and thus tend to fail where we cannot easily specify what we want. I’d argue that this is the case for domain scanning and epistemic translation (DS/ET) work. This is generally true for intellectual work, but is likely even more true for DS/ET work due to the relatively siloed structure of academia, which adds additional “transaction costs” to attempts at communicating across disciplinary boundaries.
One way to reduce these inefficiencies is by improving the interfaces between the disciplines. “Domain scanning” and “epistemic translation” are precisely about creating such interfaces. Their purpose is to identify knowledge that is concretely relevant to a given target domain and make that knowledge accessible to thinkers entrenched in the “vocabulary” of that target domain. A useful interface between political philosophy and computer science, for example, might require a mathematical formalization of central ideas such as justice.
Challenges
At the same time, doing interdisciplinary research well is challenging. For example, interdisciplinary research can only be as valuable as a researcher’s ability to identify knowledge relevant to their target domain, or as a research community’s quality assurance/error correction mechanisms. Phenomena like citogenesis or motivatiogenesis are examples of manifestations of these difficulties.
There have been various attempts at overcoming these incentive barriers, for example the Santa Fe Institute, whose organizational structure completely disregards scientific disciplines; -ARPAs have a similar flavour; the field of cybernetics, which proposed an inherently transdisciplinary view on regulatory systems; or the recent surge in the literature on “mental models” (e.g. here or here).
A closer inspection of such examples (how far they were successful and how they went about it) might yield some interesting insights. I don’t have the capacity to properly pursue such case studies in the near future, but it’s definitely something on my list of potentially promising (side) projects.
If readers are aware of other examples of innovative approaches trying to solve this problem that might make for insightful case studies, I’d love to hear them.
The below provides definitions and explanations of “domain scanning” and “epistemic translation”, in an attempt to add further gears to how interdisciplinary research works.
Domain scanning and epistemic translation
I suggest understanding domain scanning and epistemic translation as a specific type of research that both plays (or ought to play) an important role as part of a larger research process and can be usefully pursued as “its own thing”.
Domain Scanning
By domain scanning, I mean the activity of searching through diverse bodies and traditions of knowledge with the goal of identifying insights, ontologies or methods relevant to another body of knowledge or to a research question (e.g. AI alignment, Longtermism, EA).
I call source domains those bodies of knowledge where insights are being drawn from. The body of knowledge that we are trying to inform through this approach is called the target domain. A target domain can be as broad as an entire field or subfield or as narrow as a specific research problem (in which case I often use the term target problem instead of target domain).
Domain scanning isn’t about comprehensively surveying the entire ocean of knowledge, but instead about selectively scouting for “bright spots”: domains that might importantly inform the target domain or problem.
An important rationale for domain scanning is the belief that model selection is a critical part of the research process. By model selection, I mean the way we choose to conceptualize a problem at a high-level of abstraction (as opposed to, say, working out the details given a certain model choice). In practice, however, this step often doesn’t happen at all because most research happens within a paradigm that is already “in the water”.
As an example, say an economist wants to think about a research question related to economic growth. They will think about how to model economic growth and will make choices according to the shape of their research problem. They might for example decide between using an endogenous growth or an exogenous growth model, and other modeling choices at a similar level of abstraction. However, those choices happen within an already comparably limited space of assumptions—in this case namely neoclassical economics. It’s at this higher level of abstraction that I think we’re often not sufficiently looking beyond a given paradigm. Like fish in the water.
Neoclassical economics, as an example, is based on assumptions such as agents being rational and homogeneous, and the economy being an equilibrium system. Those are, in fact, not straightforward assumptions to make, as heterodox economists have in recent years slowly been bringing to the attention of the field. Complexity economics, for example, drops the above-mentioned assumptions, which helps broaden our understanding of economics in ways I think are really important. Notably, complexity economics is inspired by the study of non-equilibrium systems from physics, and its conception of heterogeneous and boundedly rational agents comes from fields such as psychology and organizational studies.
Research within established paradigms is extremely useful a lot of the time and I am not suggesting that an economist who tackles their research question from a neoclassical angle is necessarily doing something wrong. However, this type of research can only ever make incremental progress. As a research community, I do think we have a strong interest in fostering, at a structural level, the quality of interdisciplinary transfer.
The role of model selection is particularly important in the case of pre-paradigmatic fields (examples include AI Safety or Complexity Science). In this case, your willingness to test different frameworks for conceiving of a given problem seems particularly valuable in expectation. Converging too early on one specific way of framing the problem risks locking in the burgeoning field too early. Pre-paradigmatic fields can often appear fairly chaotic, unorganized and unprincipled (“high entropy”). While this is sometimes evidence against the epistemic merit of a research community, I tend to want to abstain from holding this against emerging fields, because, since the variance of outcomes is higher, the potential upsides are higher too. (Of course, one’s overall judgement of the promise of an emerging paradigm will also depend on more than just this factor.)
Epistemic Translation
By epistemic translation, I mean the activity of rendering knowledge commensurable between different domains. In other words, epistemic translation refers to the intellectual work necessary to i) understand a body of knowledge, ii) identify its relevance for your target domain/problem, and iii) render relevant conceptual insights accessible to (the research community of) the target domain, often by integrating it.
Epistemic translation isn’t just about translating one vocabulary into another or merely sharing factual information. It’s about expanding the concept space of the target domain by integrating new conceptual insights and perspectives.
The world is complicated and we are at any one time working with fairly simple models of reality. By analogy, when I look at a three-dimensional cube, I can only see a part of the entire cube at any one time. By taking different perspectives on the same cube and putting these perspectives together (an exercise one might call “triangulating reality”), I can start to develop an increasingly accurate understanding of the cube. The box inversion hypothesis by Jan Kulveit is another, AI-alignment-specific example of what I’m thinking about.
I think something like this is true for understanding reality at large, albeit orders of magnitude more difficult than the cube example suggests. Domain scanning is about seeking new perspectives on your object of inquiry, and epistemic translation is required for integrating these numerous perspectives with one another in an epistemically faithful manner.
In the case of translation between technical and non-technical fields (say, translating central notions of political philosophy into game theoretic or CS language) the major obstacle to epistemic translation is formalization. A computer scientist might well be aware of, say, the depth of discourse on topics like justice or democracy. But that doesn’t yet mean that they can integrate this knowledge into their own research or engineering. Formalization is central to creating useful disciplinary interfaces, and close to no resources are spent on systematically speeding up this process.
Somewhere in between domain scanning and epistemic translation, we could talk about “prospecting” as the activity of providing epistemic updates on how valuable a certain source domain is likely to be. This involves some scanning and some translation work (therefore categorized as “in between the two”), and would serve the central function of a community mechanism for coordinating around what a community might want to pay attention to.
I think the “so that they become more predictable [to the recommender algorithm]” is crucial in Russell’s argument. IF human preferences are malleable in this way, and IF recommender algorithms are strong enough to detect that malleability, then the pressures towards the behaviour that Russell suggests are strong and we have a lot of reasons to expect it. I think the answer to both IFs is likely to be yes.
I agree something about influence is important. As a counterpoint, I think many manifestations of “having influence” don’t store well (e.g. the fact that at a given time, a relatively large number of EAs have an “influential role” (whatever that means exactly) is only weakly related to how many EAs will have an influential role in t+1 (say a generation later)).
Wrt accumulation, influence also seems less straightforward to grow when you compare it to e.g. money (and to a lesser extent to knowledge) which, thanks to interest rates, accumulates at a certain rate basically for free (without you having to do anything) and fairly robustly. I’m not saying that influence is clearly a worse investment than money when it comes to future impact potential, but that money is a pretty good and stable baseline that might not be as easy to beat as one might think at first sight. Also, I think approaches of using “influence” to store and accumulate impact potential will vary a lot on these dimensions, so we’d probably want to talk about such approaches in the concrete rather than the abstract.
> under your framework, community building is also an intervention for patient longtermism
+1 and also worth flagging that e.g. Philip Trammell explicitly says so too in his work on patient longtermism (though he clarifies that this is only true for specific types of community building).
This made me think of the way David Deutsch talks about knowledge creation, where knowledge manifests physically in e.g. the way a species is adapted to its niche. The process of natural selection that led to this adaptation is a process of “exploration” and “error correction” that accumulates knowledge. That degree of adaptation is the physical manifestation of knowledge. DNA is an important substrate of this process; however, I expect that DNA won’t be the most fruitful level of abstraction at which to think about the patient longtermist question.
Still, to explore this framework a bit more … Re accumulation, one potential implication is that we might want to pay attention to the “error correction” mechanism that is essential to knowledge accumulation. The scientific method is an example of this. We could try to improve the “machinery of science” that is based on this error correction logic, and we could try to apply this logic of error correction (more/better) to more areas beyond academia. Some examples here might be ways to make it easier to have constructive disagreements (e.g. adversarial collaborations, the Letter community, a hypothetical wiki that is structured in a way that shows the main disagreeing viewpoints on a topic, …) or more experimentation/evaluation/updating mechanisms, in particular in policy making. (Some areas, e.g. business or medicine, have figured out a bunch about how to do these sorts of things, but for various reasons these insights are not necessarily being applied as widely as they could be.)
Moral progress
I largely agree with your assessment that, and how, automation puts a lot of pressure on the fate of democracy (although, as you acknowledge, there are ways automation could strengthen democracy, and the way this will cash out sure seems like it’s subject to strong path dependency).
When we compare pre-industrial times to post-industrial times, it is not only our economy and our arsenal of technologies that is different. Within these ~200-300 years, humanity has also undergone meaningful intellectual and moral progress. This includes things like coming to think that women and people of colour are full members of society, or spelling out values such as freedom, self-realization, etc. If automation leads to power being concentrated in the hands of a small elite, this also means that the beliefs and values of this elite become more important.
Of course, if their moral ideals are in stark contrast with other interests, e.g. economic ones, we should expect that they will just throw most of these ideals overboard or engage in elaborate rationalizations to pretend they are still holding them up high. But if the conflict of interest remains relatively weak, I do think this might be a factor that plays a role.
What plausible outside views are there? How much to rely on which?
Here is another possible outside view one could take. Under this view, the question of how societies govern themselves is subject to evolutionary dynamics. (You allude to this a bit in one of your footnotes, when talking about economic determinism.) Different societies adopt different approaches, and societies with better approaches are more successful and become more dominant. Less successful societies either cease to exist or adopt the better approaches by imitation. Based on this view, we can identify “evolutionary pressures” and know some things about where these pressures are likely to steer us in the future. (Obviously we still don’t know exactly where this development leads us, but the space of possible developments is in fact constrained by these co-evolutionary dynamics.)
What specifically might “fitness” look like here? Taking a perspective as roughly outlined in this paper, we could posit that in order for a species to grow ever larger in scale, it requires (what in the paper is called) information processing capacity. Democracy (or government/the policy making apparatus at large) can be viewed as essentially such an information processing technology, and thus adaptive/fitness enhancing. Given the size and complexity of present day societies, it does look like the largely top-down information processing technology of an authoritarian regime would be less adaptive.
One can argue that democracy is a “successful adaptation” and thus is likely to stick around. Maybe this is true, but I think this argument is way harder to make than what I’ve offered above, and I’m not actually sure it stands. Reasons why this isn’t straightforward include that the evolutionary dynamics described above are not very pure (compared to “proper” Darwinian natural selection), and that the environmental conditions within which the process unfolds are changing drastically, which could for example mean that adaptations that were fitness enhancing in the past won’t be in the future.
The reason I do bring this argument up, however, is that I think it suggests that we shouldn’t pay much attention to the “regression to the mean” type arguments. I agree this is a prior to use, but I think we know enough about the territory that we shouldn’t rely much on it.
(I don’t necessarily think you do (though I don’t know). This is to say, I can see how you might get to the 4 in 5 prediction without invoking a “regression to the mean” type argument, but by solely looking at the arguments you have for example laid out in your section on automation.)
If democracy retreats, what will it be replaced by?
A lot of the time, people assume a natural dichotomy between democracy and authoritarian regimes. While this is certainly a useful shorthand when looking at history, I think it is likely to be misleading when thinking about the future.
This “false dichotomy” between democracy and authoritarian regimes often contrasts “my values and needs are adequately taken into account” (<> democracy) with “my values and needs basically don’t matter” (<> authoritarian regimes). By putting these things into the same bucket, we might overlook ways in which these connections might come apart.
For example, I might not inherently care about whether I will be able to directly or indirectly choose my political leader, but I definitely care about how well my values and needs will be taken into account in this process that steers my society into alternative futures.
Relatedly, discussions about democracy are often just as much about “democratic values” (e.g. liberty, equality, justice) as they are about “the process of choosing our own leaders”.
I’d be curious whether your prediction about whether democracy will still be around in one thousand years largely overlaps with your prediction about, say, “will an average person in a thousand years from now feel like their values and needs are adequately taken into account by whoever or whatever is making decisions about how their society is being governed?”. (Of course, other operationalizations might be interesting, too).
The latter is much harder to predict, and democracy as you defined it might be the correct way of approaching the latter question. That said, understanding more about how likely they are to come apart, and if so how, seems potentially interesting.
Thanks, I enjoyed reading this.
Here are a few thoughts; they aren’t meant as critiques of things you say, but simply thoughts triggered by, building on or attempting to complement your analysis.
I agree there are those things, but I am overall probably more pessimistic than you; I think there is a (significant) asymmetry towards pollution-y and not-truth-conducive content production here.
(That said, I am not too concerned overall either; I think the solution is to make it harder to make an account, e.g. by requiring some form of verification.)