The judgement reuse functionality is not fully practical/suitably low friction yet, but the idea is caching structured LLM judgements that agents can emit over the data. If we think with a real abundance mindset about how to prioritize papers, we can decide to rate a whole bunch of papers by different attribute_prompts that hopefully capture the various qualities we care about, like “canonicalness at the end of time”. Researcher time is so valuable nowadays that it can be quite practical to rate hundreds of papers that way even if it takes like $.05 each with Opus 4.6. And we can cache those judgements for other people/agents to access. The way we ensure quality judgements for reuse is that we manage the provenanced LLM judgements. Right now in the API, there is a SOTA multi-objective cardinal reranker (prior elicitation has been my OCD obsession for years): when your Claude Code agent finds like 30 entities it wants to prioritize, it can decide to rate them by a weighted combination of arbitrary attributes and make an API call to launch this job, and Scry manages the orchestration of LLM judgements to ensure they are done as efficiently as possible in pursuit of an accurate multi-objective top-k ranking. These judgements can be made public with a simple parameter, and they’ll be reused when others rate by the same attribute_prompt. And soon, as agents get more agentic and I lower the friction for them, they’ll have a much easier time discovering existing judgements and building upon them.
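To make that concrete, here is a minimal sketch of what the payload for such a rerank job might look like. To be clear, the field names, normalization, and `public` flag are my illustrative guesses, not Scry's actual schema:

```python
# Hypothetical payload builder for a multi-objective rerank job.
# All field names here are illustrative guesses, not Scry's real API.

def build_rerank_job(entity_ids, weighted_attributes, top_k=10, public=True):
    """Combine several attribute_prompts with weights into one job spec."""
    total = sum(weighted_attributes.values())
    return {
        "entities": entity_ids,
        "objectives": [
            {"attribute_prompt": prompt, "weight": w / total}  # normalize weights
            for prompt, w in weighted_attributes.items()
        ],
        "top_k": top_k,
        "public": public,  # publish judgements so others can reuse them
    }

job = build_rerank_job(
    entity_ids=[f"paper:{i}" for i in range(30)],
    weighted_attributes={
        "canonicalness at the end of time": 2.0,
        "methodological rigor": 1.0,
    },
    top_k=5,
)
```

The point of the `public` flag is the reuse story above: any judgement produced while serving this job can be cached and matched by attribute_prompt later.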
And long-term this is an arguably canonical primitive for cognitive work economies, where you can incentivize things like bounties for entities that satisfy certain functions of attribute_prompts, or for rank reversals. It also feels very interesting and high leverage for keeping the world multi-polar, as this is a natural way trillions of agents could cooperatively factorize cognitive work (I will probably think of better ways of propagating priors by the time trillions of agents are participating), but it is a minimum viable way to support a variety of agent communication needs and the tackling of many open-ended problems. To be clear, that is: caching provenanced rater :: attribute_prompt :: entity_A :: entity_B :: ratio :: confidence judgements at scale in fast databases. It’s maybe the fastest way a lone small person/agent can become Legible to others: they get Opus or some trusted LLMs to do judgements over new entities (e.g. an exciting idea they just uploaded, or a provenanced datapoint they discovered) across attributes that other people/agents have said they care about, and produce new updated rankings.
Helpful, thanks, I think I understand a little bit better now (still not yet sure what the specific tuple elements are doing)!
In case it’s inspiring or can provoke useful critique, here are some areas where I think compounding/reuse can be really useful in epistemic activities:
claim decomposition of larger artefacts
citation (and perhaps provenance) resolution
clustering related claims and evidence
collecting ‘topics’ (including perhaps differing perspectives on broad or narrow subjects)
relating (especially, backlinking) claims which weaken/elaborate/refute other claims
detecting load-bearing evidence or subclaims
flagging scarcity of evidence or analysis
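As a minimal sketch of one of these, the backlinking item might be modeled as typed, bidirectionally indexed edges between claims. The relation names are just the ones from my list; everything here is illustrative:

```python
from collections import defaultdict

# Sketch of backlinked claim relations (weakens/elaborates/refutes),
# using the relation names from the list above; purely illustrative.

RELATIONS = {"weakens", "elaborates", "refutes"}

forward: dict[str, list[tuple[str, str]]] = defaultdict(list)
backlinks: dict[str, list[tuple[str, str]]] = defaultdict(list)

def relate(source: str, relation: str, target: str) -> None:
    """Record that `source` stands in `relation` to `target`, indexed both ways."""
    assert relation in RELATIONS
    forward[source].append((relation, target))
    backlinks[target].append((relation, source))  # answers "what challenges this claim?"

relate("claim:B", "refutes", "claim:A")
relate("claim:C", "elaborates", "claim:A")
challengers = [src for rel, src in backlinks["claim:A"] if rel in {"weakens", "refutes"}]
```

The backlink index is what makes reuse compound: any new claim that weakens an old one immediately becomes discoverable from the old claim's side.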