We’ve recently published a set of design sketches for tools for strategic awareness.
We think that near-term AI could help a wide variety of actors to have a more grounded and accurate perspective on their situation, and that this could be quite important:
- Tools for strategic awareness could make individuals more epistemically empowered and better able to make decisions in their own best interests.
- Better strategic awareness could help humanity to handle some of the big challenges that are heading towards us as we transition to more advanced AI systems.
We’re excited for people to build tools that help this happen, and hope that our design sketches will make this area more concrete, and inspire people to get started.
The (overly-)specific technologies we sketch out are:
- Ambient superforecasting — When people want to know something about the future, they can run a query like a Google search, and get back a superforecaster-level assessment of likelihoods.
- Scenario planning on tap — People can easily explore the likely implications of possible courses of action, summoning up coherent grounded narratives about possible futures, and diving seamlessly into analysis of the implications of different hypotheticals.
- Automated OSINT — Everyone has instant access to professional-grade political analysis; when someone does something self-serving, this will be transparent.
If you have ideas for how to implement these technologies, issues we may not have spotted, or visions for other tools in this space, we’d love to hear them.
This article was created by Forethought. Read the full article on our website.
exopriors.com/scry is something I’ve been building very much in these directions.
I’ve done the fairly novel, tricky thing of hardening a SQL+vector database to give the public and their research agents arbitrary read-only access over lots of important content. I’ve built robust ingestion pipelines where I can point my coding agents at any source and, increasingly, they can fully metabolize it, dress-right-dress, into the database without my oversight: all of arXiv, the EA Forum, thousands of Substacks and their comments, Hacker News, soon all of Reddit pre-2025.5, soon all of Bluesky, etc. The content is well indexed, so with the SQL query planner these corpora can be traversed much faster (and so much more exhaustively and with more nuance) than you could manage even if you had all the files locally on your file system and were, e.g., asking Claude Code to explore them.
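To give a flavor of what that access enables, here's the kind of read-only query an agent might run. The table and column names and the pgvector-style `<->` operator are illustrative assumptions for the example, not the actual schema:

```python
# Illustrative sketch only: table/column names and the pgvector-style `<->`
# distance operator are assumptions, not Scry's actual schema. The point:
# metadata filters and vector similarity compose in one indexed query that
# the SQL planner can execute efficiently.
query = """
SELECT source, title, chunk_text
FROM chunks
WHERE source IN ('arxiv', 'ea_forum')
  AND published_at >= '2024-01-01'
ORDER BY embedding <-> :query_embedding  -- nearest neighbor on the vector index
LIMIT 20;
"""
```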
I’ve been adding all sorts of features and documenting them in a prompt for agents to use, such as functions for meaningfully composing embedding vectors to navigate vibe-space. There’s also a knowledge emission protocol: caching provenanced LLM_model :: attribute_prompt :: entity_A :: entity_B :: ratio :: confidence tuples for community reuse, on which much more sophisticated collective cognitive work harnesses can be built (and I am building them).
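Schematically, each cached judgement could be represented like this; the field names follow the protocol description above, while the types and container are just illustrative:

```python
from dataclasses import dataclass

# Illustrative representation of one cached judgement tuple; field names
# follow the knowledge emission protocol above, types are assumptions.
@dataclass(frozen=True)
class Judgement:
    llm_model: str         # provenance: which model emitted the judgement
    attribute_prompt: str  # e.g. "canonicalness at the end of time"
    entity_a: str          # ID of the entity being judged
    entity_b: str          # ID of the entity it is compared against
    ratio: float           # how much more strongly entity_a exhibits the attribute than entity_b
    confidence: float      # the model's confidence in that ratio
```

The pairwise-ratio form is part of what makes reuse composable: judgements cached against shared reference entities can later be stitched into new rankings.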
See also the collective epistemics discussion, if you haven’t already; I suspect it might also be of interest to you!
Could you explain the community reuse thing again? I don’t understand the tuples, but is the idea that query responses (which yield something like document sets?) can be cached with some identifiers? This helps future users by...? (Thinking: it can serve as a tag to a reproducible/amendable/updateable query, it can save someone running the exact same query again, …)
The judgement-reuse functionality is not fully practical / suitably low-friction yet, but the idea is caching structured LLM judgements that agents can emit over the data. If we think with a real abundance mindset about how to prioritize papers, say, we can decide to rate a whole bunch of them by different attribute_prompts that hopefully capture the various qualities we care about, like “canonicalness at the end of time”. Researcher time is so valuable nowadays that it can be quite practical to rate hundreds of papers that way even if it costs like $.05 each with Opus 4.6. But we can cache those judgements for other people/agents to access, and the way we ensure the cached judgements are high quality for reuse is that Scry manages the provenanced LLM judgements itself.

For example, the API currently has a SOTA multi-objective cardinal reranker (prior elicitation has been my OCD obsession for years): when your Claude Code agent finds, say, 30 entities it wants to prioritize, it can decide to rate them by a weighted combination of arbitrary attributes and make an API call to launch that job, and Scry manages the orchestration of LLM judgements to ensure they are done as efficiently as possible in pursuit of an accurate multi-objective top-k ranking. These judgements can be made public with a simple parameter, and they’ll be reused when others rate by the same attribute_prompt. And soon, as agents get more agentic and I lower the friction for them, they’ll have a much easier time discovering existing judgements and building upon them.
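To gesture at the shape of that aggregation, here is a deliberately simplified sketch; the function and its signature are hypothetical, and the real reranker’s orchestration and elicitation are more sophisticated than a weighted sum:

```python
import math

# Hypothetical sketch: combine cached pairwise-ratio judgements into a
# weighted multi-objective top-k ranking. Assumes each entity has been
# judged against a shared reference entity, so log(ratio) acts as an
# additive per-attribute score.
def rank_top_k(entities, judgements, weights, k=10):
    """
    entities:   list of entity IDs
    judgements: dict of (attribute_prompt, entity_id) -> ratio vs. a shared reference
    weights:    dict of attribute_prompt -> weight
    """
    def score(entity):
        return sum(
            w * math.log(judgements[(attr, entity)])
            for attr, w in weights.items()
            if (attr, entity) in judgements
        )
    return sorted(entities, key=score, reverse=True)[:k]

# e.g. prioritize papers by a weighted blend of two attribute_prompts
# (the second prompt here is an invented example)
weights = {"canonicalness at the end of time": 0.7,
           "relevance to AI strategy": 0.3}
```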
And long-term this is an arguably canonical primitive for cognitive-work economies, where you can incentivize things like bounties for entities that satisfy certain functions of attribute_prompts, or for rank reversals. It also feels very interesting and high-leverage for keeping the world multipolar, as this is a natural way trillions of agents can cooperatively factorize cognitive work (I will probably think of better ways of propagating priors by the time trillions of agents are participating), but it is a minimum viable way to support a variety of agent communication needs and the tackling of many open-ended problems. Again, to be clear, that is: caching provenanced rater :: attribute_prompt :: entity_A :: entity_B :: ratio :: confidence judgements at scale in fast databases. It’s like the fastest way a lone small person/agent can become legible to others: they get Opus or some trusted LLMs to do judgements over new entities (e.g. an exciting idea they just uploaded, or a provenanced datapoint they discovered) across other attributes that people/agents have said they care about, and produce new updated rankings.

Helpful, thanks, I think I understand a little bit better now (still not yet sure what the specific tuple elements are doing)!
In case it’s inspiring or can provoke useful critique, here are some areas where I think compounding/reuse can be really useful in epistemic activities:
- claim decomposition of larger artefacts
- citation (and perhaps provenance) resolution
- clustering related claims and evidence
- collecting ‘topics’ (including perhaps differing perspectives on broad or narrow subjects)
- relating (especially, backlinking) claims which weaken/elaborate/refute other claims
- detecting load-bearing evidence or subclaims
- flagging scarcity of evidence or analysis
That looks ambitious and awesome! I haven’t looked deeply, but a few quick qs:
- what do the costs look like to get embeddings for all those docs? How are you making choices about which embedding models to use and things like that?
- do you have a qualitative (or quantitative?) sense of how well the semantic joins work for queries like the examples on the homepage?
- what’s your sense of how this compares to tools like Elicit?
Embeddings are about $20 per billion tokens with voyage-4-lite ($.02/M), and I’ve spent like $500. The model seemed strong on all the properties of a good embedding model at a viable price point, and Voyage 4 embeddings have an interesting property where voyage-4-lite embeddings are compatible with voyage-4-nano (computable locally), voyage-4, and voyage-4-large, for when I feel like upgrading. My chunking strategy is semantically aware (e.g. working to split on sentences, paragraphs, and common delimiters), with a target of 164 tokens per chunk and about 20% overlap.
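Roughly, the chunker does something like the following; this is an illustrative sketch rather than the production pipeline, and the crude 4-characters-per-token estimate stands in for the embedding model’s real tokenizer:

```python
import re

# Illustrative sketch of semantically-aware chunking: prefer sentence and
# paragraph boundaries, target ~164 tokens per chunk, keep ~20% overlap.
TARGET_TOKENS = 164
OVERLAP = 0.20

def rough_tokens(text: str) -> int:
    # Crude ~4 chars/token estimate; the real pipeline would use the
    # embedding model's tokenizer.
    return max(1, len(text) // 4)

def chunk(text: str) -> list[str]:
    # Candidate boundaries: sentence ends and blank-line paragraph breaks.
    pieces = re.split(r"(?<=[.!?])\s+|\n{2,}", text)
    chunks, current, fresh = [], [], False
    for piece in pieces:
        current.append(piece)
        fresh = True
        if sum(rough_tokens(p) for p in current) >= TARGET_TOKENS:
            chunks.append(" ".join(current))
            # Carry the tail of this chunk forward as ~20% overlap.
            keep, total = [], 0
            for p in reversed(current):
                keep.insert(0, p)
                total += rough_tokens(p)
                if total >= TARGET_TOKENS * OVERLAP:
                    break
            current, fresh = keep, False
    if current and fresh:
        chunks.append(" ".join(current))
    return chunks
```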
Searching across corpora absolutely works, as embeddings are just a function of text/tokens. The compositionality of embeddings works amazingly too (e.g. debias_vector(@guilt_axis, guilt_topic) searching for guilty vibes without overindexing on text that literally mentions “guilt”), although there are absolutely footguns, and intuition that should be built up for it (which I try to distill in prompts for agents; I also have a prompt designed to help teach the exploration of embedding space).
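One way to build intuition for debias_vector: picture it as something like vector rejection, subtracting out the component along the lexical axis so what remains points at the vibe rather than the literal term. A simplified gloss under that assumption (not necessarily the exact implementation):

```python
import numpy as np

# Simplified gloss, assuming debias_vector behaves like vector rejection:
# remove from `topic` its component along the lexical `axis`, so the result
# matches guilty vibes without overindexing on text that says "guilt".
def debias_vector(axis: np.ndarray, topic: np.ndarray) -> np.ndarray:
    axis_unit = axis / np.linalg.norm(axis)
    projection = np.dot(topic, axis_unit) * axis_unit
    debiased = topic - projection
    return debiased / np.linalg.norm(debiased)
```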
Like, this is basically a canonical research substrate: a well-indexed large corpus of high-leverage data with ML embeddings, queryable with SQL (Datalog would be better, but agents don’t have as much experience with it and implementations don’t have great support for embeddings). It really would be nice to get funding for this, to have a more abundance mindset to improve shipping velocity, and to get this substrate in front of more researchers and funders (e.g. Coefficient Giving, to help with grantmaking triage in the singularity).
As for the comparison to Elicit: Scry certainly offers users powers they couldn’t dream of having Elicit deliver without Elicit basically implementing the same thing, but Elicit of course has beautiful UIs which are friendlier to the human eye, and workflows researchers are more familiar with. Elicit should basically provide this functionality to its users, and Scry could afford to offer novel UIs for people, but I tend to be much more comfortable iterating on backend and API functionality than on UIs (which I do have taste for, but they take a lot of time).