Designing Artificial Wisdom: Decision Forecasting AI & Futarchy
Introduction
In this post I will describe one possible design for Artificial Wisdom (AW). This post can easily be read as a stand-alone piece; however, it is also part of a series on artificial wisdom. In essence:
Artificial Wisdom refers to artificial intelligence systems which substantially increase wisdom in the world. Wisdom may be defined as “thinking/planning which is good at avoiding large-scale errors,” or as “having good terminal goals and sub-goals.” By “strapping” wisdom to AI via AW as AI takes off, we may be able to generate enormous quantities of wisdom which could help us navigate Transformative AI and The Most Important Century wisely.
TL;DR
This AW design involves using advanced forecasting AI to help humans make better decisions. Such a decision forecasting system could help individuals, organizations, and governments achieve their values while maintaining important side constraints and minimizing negative side effects.
An important feature to include in such AW systems is the ability to accurately forecast even minuscule probabilities of actions increasing the likelihood of catastrophic risks. The system could refuse to answer or attempt to persuade the user against such actions, and analyses of such queries could be used to better understand the risks humanity faces and to formulate counter-strategies and defensive capabilities.
In addition to helping users select good strategies to achieve values or terminal goals, it is possible such systems could also learn to predict and help users understand what values and terminal goals will be satisfying once achieved.
Such technologies seem likely to be developed, but it is questionable whether this is a good thing, given potential dual-use applications such as use by misaligned AI agents. While it is good to use such capabilities wisely if they arise, more research is needed on whether differential technological development of such systems is desirable.
AI Forecasting & Prediction Markets
There has been some buzz lately about the fact that LLMs are now able to perform modestly to moderately well compared to human forecasters. This has even led Metaculus to host an AI bot forecasting benchmark tournament series with $120,000 in prizes.
I think things are just getting started, and as LLMs become increasingly good at forecasting, it may soon be possible to automate the work of decision markets, and perhaps even Futarchy.
Forecasting and prediction markets (which use markets to aggregate forecasts) are important because knowing what is likely to happen in the future lets us more wisely choose our present actions to achieve our desired goals. While it is uncertain whether prediction markets could yet help us choose our terminal goals wisely, it seems likely they could help us choose our sub-goals wisely, especially one type of prediction market in particular:
Decision Markets & Futarchy
Decision markets and Futarchy were invented by Robin Hanson. In decision markets, there are multiple separate prediction markets, one for each option. The example Robin Hanson always gives is a “fire the CEO” market: participants predict whether the stock price will go up or down if the CEO is fired, and whether it will go up or down if the CEO is not fired. Depending on which conditional market predicts a higher stock price, a decision can be made whether or not to fire the CEO.
Futarchy, in turn, can roughly be described as a society governed by decision markets. The phrase most associated with Futarchy is “vote on values, bet on beliefs.” First, the citizenry vote on values, or elect officials who define various measures of national well-being. Then policies are chosen by betting on decision markets that predict which policies are most likely to achieve those values or measures of well-being.
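To make the mechanism concrete, here is a minimal sketch in Python of the decision rule a decision market implements. The prices here are hypothetical stand-ins; a real market would aggregate them from many traders’ conditional bets on whatever welfare measure was voted on.

```python
# A minimal sketch of the decision rule behind a "fire the CEO" decision
# market. The conditional prices are hypothetical inputs; a real market
# would aggregate them from many traders' conditional bets.

def decide(conditional_prices: dict[str, float]) -> str:
    """Pick the option whose conditional market forecasts the best outcome.

    conditional_prices maps each option to the market's expected value of
    the chosen welfare measure (e.g. stock price) conditional on that option.
    """
    return max(conditional_prices, key=conditional_prices.get)

# Two conditional markets: expected stock price if the CEO is fired vs. kept.
prices = {"fire_ceo": 104.50, "keep_ceo": 98.75}
print(decide(prices))  # -> "fire_ceo": the market expects a higher price if fired
```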
So decision markets and Futarchy can help us make better decisions by using the power of forecasting and markets to choose the correct sub-goals that will lead us to achieve our terminal goals[1], our values.
Decision Forecasting AIs
I am not entirely sure why decision markets and Futarchy are not more popular[2], but in any case, one of the largest obstacles could soon be removed once AIs can predict as well as human forecasters. This would largely remove the human labor requirement, making it as easy to generate well-calibrated answers to forecasting questions as asking a question and pressing a button. Such systems will be especially appealing if there are superhuman AI forecasters, or many human-level AI forecasters with diverse strengths and weaknesses, so that a market-like system of AI forecasters could be collectively more reliable than any individual human-level AI forecaster, enabling superhumanly effective decision-making.
If AIs could rapidly and reliably predict which of several courses of action (sub-goals) would fulfill a set of values and achieve certain terminal goals[3] with a high degree of likelihood, and could explain their reasoning (bots being able to explain their reasoning is one of the requirements of the Metaculus tournament), then humans who use such AI forecasting systems could have a huge advantage in achieving their terminal goals reliably and without making mistakes. Hence such decision forecasting AIs are another path to artificial wisdom.
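As a rough illustration of how an ensemble of AI forecasters might support such decisions, here is a minimal sketch. The `Forecaster` interface and the toy lambdas are hypothetical stand-ins for calls to real forecasting models, and the unweighted mean stands in for a more sophisticated, market-like aggregation (e.g. weighting forecasters by track record).

```python
# A minimal sketch of ensemble decision forecasting: each candidate action is
# turned into a conditional question, several AI forecasters estimate the
# probability of success, and the highest-scoring action is recommended.

from statistics import mean
from typing import Callable

Forecaster = Callable[[str], float]  # question -> probability estimate

def best_action(actions: list[str], goal: str, forecasters: list[Forecaster]) -> str:
    """Return the action the ensemble rates most likely to achieve the goal."""
    scores = {}
    for action in actions:
        question = f"Conditional on taking '{action}', will we achieve: {goal}?"
        scores[action] = mean(f(question) for f in forecasters)  # unweighted pooling
    return max(scores, key=scores.get)

# Toy stand-ins for diverse AI forecasters (replace with real model calls);
# each returns a probability depending on which action the question names.
forecasters = [
    lambda q: 0.70 if "strategy A" in q else 0.40,
    lambda q: 0.60 if "strategy A" in q else 0.55,
]
print(best_action(["strategy A", "strategy B"], "double our impact by 2026", forecasters))
# -> "strategy A" (ensemble mean 0.65 vs. 0.475)
```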
There would of course still be the need to generate possible courses of action to choose between, though it seems likely AI could eventually help do this as well; indeed, such possibility generation was already mentioned in the previous workflows post, and forecasting AIs could be just one more element in the artificial wisdom workflow system.
Additional Features
It would be very useful if such AIs could learn to automatically generate new predictions that would be useful to us. Perhaps another system could generate a large number of candidate predictions, trying to guess which ones will be useful to users and tractable to predict; users could give feedback on which are actually useful, and the system could use this feedback as a training signal to learn to auto-generate progressively more useful predictions, as in the sketch below.
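Here is a minimal sketch of that feedback loop. Both `propose_questions` and `user_rates_usefulness` are hypothetical stand-ins: a real system would use a generative model conditioned on user context and actual user ratings, with the logged pairs serving as a fine-tuning or reward signal.

```python
# A minimal sketch of a feedback loop for auto-generating useful predictions.

import random

def propose_questions(n: int) -> list[str]:
    """Stand-in generator; a real system would condition on user context."""
    topics = ["hiring", "fundraising", "product launch", "policy change"]
    return [f"Will our {random.choice(topics)} goal be met this quarter?" for _ in range(n)]

def user_rates_usefulness(question: str) -> float:
    """Stand-in for real user feedback on a 0-1 usefulness scale."""
    return random.random()

# Log (question, usefulness) pairs to serve as a training signal, so the
# generator can learn to propose progressively more useful predictions.
training_signal = [(q, user_rates_usefulness(q)) for q in propose_questions(5)]
```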
Another feature that would seem very wise to include is the ability to accurately forecast even minuscule probabilities of a course of action increasing the likelihood of catastrophic risks. For example, perhaps someone is considering developing a new technology, and the system predicts that this specific individual pursuing this project, relative to the counterfactual, would lead to a 0.1% increase in the chance humanity ends in an existential catastrophe; furthermore, it estimates that at least 1,000 people are likely to take risks of similar magnitude within the time-frame of vulnerability to such risks.
Perhaps the system could refuse to answer and instead explain the above analysis to the user in a highly persuasive way, perhaps describing a vivid (info-hazard-free) story of how this pursuit could lead to the end of humanity, and then forward the query and analysis to the creators of the system to inform better understanding and estimates of the risk humanity is facing, as well as to help formulate counter-strategies and defensive capabilities.
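To illustrate, here is a minimal sketch of such a refusal-and-escalation gate. The threshold, function names, and risk estimate are all hypothetical; an actual system would require a forecaster capable of calibrated estimates at such tiny probabilities, which is itself a hard open problem.

```python
# A minimal sketch of a refusal-and-escalation gate for catastrophic risk.

XRISK_THRESHOLD = 1e-4  # hypothetical policy: refuse above a 0.01% estimated increase

def escalate_to_developers(action: str, risk: float) -> None:
    """Forward the query and analysis to the system's creators."""
    print(f"[escalation] action={action!r}, estimated x-risk increase={risk:.4%}")

def handle_query(action: str, estimated_xrisk_increase: float) -> str:
    if estimated_xrisk_increase > XRISK_THRESHOLD:
        escalate_to_developers(action, estimated_xrisk_increase)
        return ("Refused: pursuing this is estimated to raise the chance of "
                "existential catastrophe. Here is why you may wish to reconsider...")
    return "Forecast: ..."

# The 0.1% figure from the example above.
print(handle_query("develop risky new technology", 0.001))
```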
It is possible such systems could also learn to predict how satisfied we will be once we have achieved the goals or values (including the effects of sub-goals[1]) that we give as our terminal goals and values. Perhaps the artificially wise forecasting system could be designed to give us feedback on our goals and values: explaining why, in certain situations, it predicts we will be unsatisfied or sub-optimally satisfied with the results when we achieve them; giving highly intelligent premortem advice on how we might rethink what we are actually aiming for, so that we do not end up regretting getting what we thought we wanted; and perhaps suggesting alternative goals we might consider instead, explaining why it predicts these will give better results.
Decision Forecasting AIs at Scale
At scale, people could use such systems to get helpful insights and input to more wisely achieve their goals and values, as could nonprofits, companies, and governments.
There would of course be serious safety concerns with having such a system run the sub-goals of an entire government, as in Futarchy, and it would be absolutely essential to achieve AI alignment (more below) and extensively test the system first. But it is encouraging that Futarchy’s mechanism separates “voting on values” from “betting on beliefs.” These systems could be restricted to improving our ability to forecast and understand the consequences of various decisions, and to helping us determine the best paths to achieve our goals, while we remain in control of the values and terminal goals our decisions are working to achieve. Although, as mentioned, it could also be nice to have input on which values and terminal goals will actually be most satisfying.
It would be interesting to see what happens when individual humans, nonprofits, or companies use a scaled-down version of Futarchy to run a significant fraction of their decisions. This seems like a good first experiment for interested entities: see what works well and what bugs need to be worked out before moving on to larger experiments.
Again, this is another example of an AW system strapped to increases in AI capabilities, giving humans access to increasing power to predict the future and make wise decisions in parallel with base models becoming more powerful as AI takes off.
Dual-Use Concerns
Of course, such prediction systems could also be used to enhance AI agents, which would then be better able to predict the consequences of high-level strategies, make plans, and achieve the medium-term goals set for them, including anything we explicitly instruct them to adopt as a goal and any side constraints we give them.
This could hence be good news for enhancing the wisdom and usefulness of AI agents at a certain level of intelligence; however, it is highly dubious whether such forecasting abilities would have positive consequences as AI agents scale to general intelligence and superintelligence, due to the increasing risk of misalignment with potentially catastrophic failure modes.
Because of this, it seems advisable to be highly cautious when increasing AI forecasting capabilities. And because the technology is dual-use, the topic deserves more research to determine whether it warrants differential technological development or should instead be avoided, advocated against, or developed only under carefully controlled conditions.
That said, if this technology is developed, as it currently seems to be on course to be, it seems highly desirable to adapt it for human use, to make sure humans working to ensure a safe and positive long-term future are able to use and benefit from it to the fullest extent safely possible, and to advocate that it be developed to be as safe as possible.
It is encouraging to see the requirement in the Metaculus tournament that forecasting bots explicitly explain the reasoning behind their predictions, which increases interpretability. Yet as bots scale to become much more intelligent, a great deal more probing will be required to make sure such systems are safe and not subtly deceptive or misaligned.
I greatly appreciate feedback. If you want to learn more about AW, see the full list of posts at the Series on Artificial Wisdom homepage.
[1] To be clear, terminal goals are not fully separable from sub-goals, but rather, in a sense, include sub-goals. For example, if someone’s goal were to live a happy and virtuous life, the sub-goals of that terminal goal would themselves contain a large amount of the value being pursued. Furthermore, as discussed in the introductory piece in the series, it is essential that both terminal goals and sub-goals meet at least minimum acceptability requirements, or better yet are themselves good.
[2] I believe one reason decision markets and Futarchy are not more popular is that a large amount of human forecasting interest and participation is required to get such markets off the ground, and at present forecasting is relatively niche. One suggested reason prediction markets are not more popular is that they are neither good savings devices nor attractive to gamblers, and so do not attract sufficient capital to draw professional traders without market subsidization; the same article argues that regulation is not the primary obstacle. In the case of Futarchy, there is the additional massive obstacle that society as a whole would need to coordinate to transition to government by decision markets, or at least to test the idea on a smaller scale. One reason Robin Hanson mentions is that people in companies who could use such markets do not actually want to know the truth, due to office politics; for example, higher-ups possess decision-making power which such markets might take away from them.[4] If the norm were in place it would be a net benefit to everyone, but since it is not, it feels threatening. Another reason could be that the vast majority of the population has not heard of these markets, and most who have are uncertain about them and have not heard of them being successful in practice (since they have not been put into practice: a kind of “Matthew Effect”).
[3] Terminal goals which, as discussed in the first post, include both what you want and what you don’t want, hence minimizing negative side effects.
[4] There are some closely related business practices which achieve meritocratic decision-making through other means, such as Ray Dalio’s “believability-weighted decision making.”