Thanks for the comment!
> “generating conditional forecasting questions that encapsulate decision options; 2) making accurate probability judgements of those questions”
This is a subset of what I referred to as “scorable functions”. Conditional questions can be handled in functions.
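As a rough illustration of a conditional question handled as a function, here is a minimal sketch. The option names, the probabilities, and the choice of a log score are all assumptions made for the example, not part of any existing system.

```python
import math

def conditional_forecast(option: str) -> float:
    """Hypothetical forecaster: returns P(outcome | this decision option is chosen).

    The option names and probabilities are made up for illustration.
    """
    table = {"status_quo": 0.10, "new_strategy": 0.35}
    # Fallback probability for options the forecaster did not anticipate.
    return table.get(option, 0.05)

def log_score(forecast_fn, chosen_option: str, outcome: bool) -> float:
    """Score the function once the decision is made and the outcome is known.

    Only the branch that was actually chosen can be scored.
    Higher (closer to 0) is better.
    """
    p = forecast_fn(chosen_option)
    return math.log(p if outcome else 1.0 - p)
```

The forecaster submits the whole function up front; the score is computed later against whichever option was actually taken.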
Humans currently have a hard time with these. I’m optimistic that AIs could do at least about as well as humans. There’s a lot of training data, and there are many artificial situations we could come up with for training and testing.
> By ‘creative’ I mean solutions (e.g. conditional forecasting question) that simply cannot be assembled from training data because the task is unique.
I don’t have a clear idea of which sorts of questions we’d expect humans to dominate AI systems on here. LLMs can come up with ideas. LLM agents can search the web, just as humans do.
Do you see any fundamental limitations of LLM agents on tasks that humans can reliably do? Maybe you could come up with a concrete metric or task where you’d expect LLMs to substantially underperform humans?
An anecdote: the US government is trying to convince a foreign government to sign an agreement with the United States, but has been repeatedly stymied by presidents from both parties for two decades. Let’s assume a forecast at that moment suggests a 10% chance the agreement will be signed within a year. A creative new ambassador designs a strategy that hasn’t been attempted before. Though the agreement would require executive signature, she decides instead to meet with every single member of parliament and tell them the United States would owe them if they came out publicly in favor of the deal. Fast forward a year, and the agreement is signed.
Another anecdote: the invention of the Apple computer.
Presumably you could use LLM+scaffold to generate a range of options and compare conditional forecasts of their likelihood of success. But will it beat a human? I’m skeptical that an LLM is ever going to be able to “think” through the layers of contextual knowledge about a particular challenge (to say nothing of prioritizing the correct challenge in the first place) well enough to generate winning solutions.
Metric: give forecasters a slate of decision options (some generated by an LLM, some by humans) and see who wins.
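A minimal sketch of how that comparison might be tallied, assuming forecasters assign each option a conditional probability of success without seeing who wrote it. The `Option` type, its field names, and the idea of "winning" as the highest forecast are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Option:
    text: str
    source: str       # "llm" or "human"; hidden from forecasters while they judge
    p_success: float  # forecasters' aggregated P(success | this option is chosen)

def best_by_source(slate):
    """Return the highest-rated option from each source."""
    best = {}
    for opt in slate:
        if opt.source not in best or opt.p_success > best[opt.source].p_success:
            best[opt.source] = opt
    return best

def winner(slate):
    """Return the source ("llm" or "human") of the top-rated option on the slate."""
    return max(slate, key=lambda o: o.p_success).source
```

Repeating this over many challenges would give a win rate per source, which is closer to a usable metric than any single slate.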
Another thought on metrics: calculate a “similarity score” between a decision option and previous attempts at solving similar challenges. Almost like a metric that captures “neglectedness” and “tractability”?
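One very crude way to sketch such a score is word overlap (Jaccard similarity) between an option and each previous attempt, with novelty defined as one minus the closest match. This is only a stand-in: a serious version would presumably use text embeddings or human ratings, and the function names here are hypothetical.

```python
def tokens(text: str) -> set:
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a crude stand-in for semantic similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def novelty(option: str, previous_attempts: list) -> float:
    """1 minus the similarity to the closest previous attempt (1.0 if there are none)."""
    if not previous_attempts:
        return 1.0
    return 1.0 - max(jaccard(option, prev) for prev in previous_attempts)
```

A high novelty score only says the option is unlike what has been tried, so it speaks more to neglectedness than to tractability; the latter would still need a conditional forecast.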
I imagine that some forms of human invention will be difficult to beat for some time. But I think there’s a lot of more generic strategic work that could be automated. Like what some hedge fund researchers do.
Current forecasting systems don’t even really try to come up with new ideas (they just forecast on existing ones), but they can still be useful.