Your definition seems to constrain ‘epistemic process’ to mere analytic tasks, and it seems to me that it’s a big leap from there to effective decision-making. For instance, I can imagine how LLMs could effectively produce resolvable, non-conditional questions and then answer them with relatively high accuracy. Yet there are three other tasks I’m more skeptical about: 1) generating conditional forecasting questions that encapsulate decision options; 2) making accurate probability judgements of those questions; and thus 3) the uptake of such forecasts into a ‘live’ decision process. This all seems more likely to work well in environments with discrete and replicable processes, some of which you mention, like insurance calculations. But these tasks seem potentially unsolvable by LLMs in more complex decision environments that require more ethical, political, and creative solutions. By ‘creative’ I mean solutions (e.g. conditional forecasting questions) that simply cannot be assembled from training data because the task is unique. What counts as ‘unique’ is perhaps an interesting discussion in itself. Nevertheless, this post helped me work through some of these questions—thanks for sharing! Curious if you have any reactions.
Thanks for the comment!
> “generating conditional forecasting questions that encapsulate decision options; 2) making accurate probability judgements of those questions”
This is a subset of what I referred to as “scorable functions”; conditional questions can be handled within such functions.
Humans currently have a hard time with these, but I’m optimistic that AIs could do at least about as well as humans. There’s a lot of training data available, and we could construct artificial scenarios for training and testing.
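To make this concrete, here’s a minimal sketch of how a conditional question might sit inside a scorable function. All the names and numbers here are illustrative, not from the post:

```python
from dataclasses import dataclass

@dataclass
class ConditionalForecast:
    decision: str       # e.g. "sign the trade agreement"
    outcome: str        # e.g. "exports grow more than 5% next year"
    probability: float  # P(outcome | decision)

def brier_score(forecast: ConditionalForecast, outcome_occurred: bool) -> float:
    """Standard Brier score: lower is better, 0.0 is perfect."""
    actual = 1.0 if outcome_occurred else 0.0
    return (forecast.probability - actual) ** 2

# Only the branch actually taken ever resolves; forecasts on untaken
# branches stay unscored, which is part of what makes conditionals hard.
taken = ConditionalForecast("sign the trade agreement",
                            "exports grow more than 5% next year", 0.45)
print(brier_score(taken, outcome_occurred=True))  # (0.45 - 1.0)**2 = 0.3025
```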
> By ‘creative’ I mean solutions (e.g. conditional forecasting questions) that simply cannot be assembled from training data because the task is unique.
I don’t have a good sense of what sorts of questions we’d expect humans to dominate AI systems on here. LLMs can come up with ideas, and LLM agents can search the web much as humans do.
Do you see any fundamental limitations of LLM agents, i.e. things humans can reliably do that they can’t? Maybe you could come up with a concrete metric or task where you’d expect LLMs to substantially underperform humans?
An anecdote: the US government has been trying to convince a foreign government to sign an agreement with the United States, but has been repeatedly stymied for two decades, under presidents from both parties. Let’s assume a forecast at that moment suggests a 10% chance the agreement will be signed within a year. A new ambassador designs a creative strategy that hasn’t been attempted before: though the agreement would require an executive signature, she decides instead to meet with every single member of parliament and tell them the United States would owe them if they came out publicly in favor of the deal. Fast forward a year, and the agreement is signed.
Another anecdote: the invention of the Apple computer.
Presumably you could use an LLM plus scaffolding to generate a range of options and compare conditional forecasts of their likelihood of success. But will it beat a human? I’m skeptical that an LLM is ever going to be able to “think” through the layers of contextual knowledge about a particular challenge (to say nothing of prioritizing the correct challenge in the first place) well enough to generate winning solutions.
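To be clear about what I mean by that pipeline, here is a toy sketch. Both helper functions are stand-ins for LLM calls, not any real API, and the numbers are made up:

```python
def generate_options(challenge: str) -> list[str]:
    # Stand-in for an LLM call that proposes candidate strategies.
    return ["lobby the executive directly",
            "meet every member of parliament individually",
            "run a public-opinion campaign"]

def forecast_success(challenge: str, option: str) -> float:
    # Stand-in for an LLM (or human) forecast of P(success | option).
    return {"lobby the executive directly": 0.10,
            "meet every member of parliament individually": 0.35,
            "run a public-opinion campaign": 0.20}.get(option, 0.05)

def best_option(challenge: str) -> tuple[str, float]:
    # Generate candidates, then rank them by their conditional forecast.
    options = generate_options(challenge)
    return max(((o, forecast_success(challenge, o)) for o in options),
               key=lambda pair: pair[1])

print(best_option("get the agreement signed"))
# ('meet every member of parliament individually', 0.35)
```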
Metric: give forecasters a slate of decision options, some generated by LLMs and some by humans, and see who wins.
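A rough sketch of how that comparison could be scored, with purely illustrative field names; the key point is that the forecaster only ever sees an option’s text, never its source:

```python
import random

def compare_sources(options: list[dict], forecast) -> dict[str, float]:
    """options: [{"text": ..., "source": "llm" or "human"}, ...]
    forecast: any callable mapping an option's text to P(success).
    Returns the mean forecasted success probability per source."""
    shuffled = options[:]
    random.shuffle(shuffled)  # blind the ordering so source can't be inferred
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for opt in shuffled:
        p = forecast(opt["text"])  # the forecaster never sees opt["source"]
        totals[opt["source"]] = totals.get(opt["source"], 0.0) + p
        counts[opt["source"]] = counts.get(opt["source"], 0) + 1
    return {src: totals[src] / counts[src] for src in totals}

slate = [{"text": "lobby the executive directly", "source": "human"},
         {"text": "meet every MP individually", "source": "llm"}]
print(compare_sources(slate, forecast=lambda t: 0.35 if "MP" in t else 0.10))
# e.g. {'human': 0.1, 'llm': 0.35} (key order varies with the shuffle)
```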
Another thought on metrics: calculate a “similarity score” between a decision option and previous attempts at solving similar challenges. Almost like a metric that captures “neglectedness” and “tractability”?
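And a toy version of that similarity idea, where `embed()` is a stand-in for any sentence-embedding model rather than a real one:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in: a real version would call a sentence-embedding model.
    return [float(ord(c)) for c in text.lower()[:16].ljust(16)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def neglectedness(option: str, previous_attempts: list[str]) -> float:
    """1 minus the max similarity to any prior attempt; higher = more novel."""
    if not previous_attempts:
        return 1.0
    return 1.0 - max(cosine(embed(option), embed(p)) for p in previous_attempts)

prior = ["lobby the executive directly", "offer a trade concession"]
print(neglectedness("meet every MP individually", prior))  # closer to 1 = newer
```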
I imagine that some forms of human invention will be difficult to beat for some time. But I think there’s a lot of more generic strategic work that could be automated, like what some hedge fund researchers do.
Today’s forecasting systems don’t even really try to come up with new ideas (they just forecast on existing ones), but they can still be useful.