Correct me if I’m wrong, but this is close to what I’ve termed the “distribution mismatch” problem (see this unpublished draft). Ofer pointed it out here, and it’s been the main problem that caused me headaches over the past months.
I’m not confident that the solutions I’ve come up with so far are sufficient, but there are five, and I want to use them in conjunction if at all possible:
Attributed Impact
“Attributed impact” is a construct that I designed to (1) track our intuitions for what impact is, but (2) exclude pathological cases.
The main problem as I see it is that the ex ante expected impact of some action can be neutral even if it is bound to turn out (ex post) either extremely positive or extremely negative. For hypothetical pure for-profit investors, that case is identical to that of a project that can turn out either extremely positive or neutral, because their losses are always capped at their investment.
Attributed impact is defined such that it can never exceed the ex ante expected impact as estimated by all market participants. If adopted, it will not penalize investors after the fact for investing in projects that turned out negative, but it will make projects that might turn out very negative unattractive to invest in, so that such investments are avoided in the first place.
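For concreteness, here is one minimal formalization that satisfies this cap. This is only my sketch, not the draft’s actual (more involved) definition: write $I_{\mathrm{post}}$ for the realized ex post impact and $\mathbb{E}_{\mathrm{ante}}[I]$ for the ex ante expectation aggregated across market participants. Since certificate prices cannot fall below zero, a negative value simply makes the certificate worthless rather than penalizing its holders further.

```latex
% A minimal sketch, assuming attributed impact is the ex post impact
% truncated at the ex ante expectation (not the draft's exact definition):
I_{\mathrm{attr}} \;=\; \min\bigl( I_{\mathrm{post}},\; \mathbb{E}_{\mathrm{ante}}[I] \bigr)
```

Under this toy version, a project with a 50% chance of +100 and a 50% chance of −100 has $\mathbb{E}_{\mathrm{ante}}[I] = 0$, so even the lucky positive outcome earns an attributed impact of at most 0, which is what makes such gambles unattractive ex ante.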
Attributed impact has some further features and benefits, so I hope it’ll be in the rational self-interest of every retro funder to adopt it (or something like it) for their impact evaluations. Issuers who want to appeal to retro funders will then have to make a case for why their impact is likely to be valuable in attributed-impact terms. Eventually, I hope, it’ll just become the natural Schelling point for how impact is valued.
It may of course fail to be adopted or fail in some harder-to-anticipate way, so I’m not fully confident in it.
Pot
We’re probably fine so long as the market is dominated by a retro funder with lots of capital, a strong altruistic concern for sentient life, and a sophisticated, thoughtful approach. Issuers and investors would try to predict the funding decisions of that funder. But as you noted, that may just not be the case.
The pot is designed to guard against the case where there is not enough capital in the hands of altruistic retro funders like that. It promises that an expert team will allocate all the funds from the pot to retro impact purchases of projects with lots of attributed impact just like any other aligned retro funder. But it also allows investors to pay into the pot, and makes transparent that (according to my current plans) their fraction of the windfall will be proportional to the product of their payment into the pot and their investment into the charity project.
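As a toy illustration of that allocation rule (hypothetical function and numbers; just my reading of the plan above):

```python
# Sketch of the pot's windfall allocation: each investor's share is
# proportional to (payment into the pot) * (investment into the project).

def pot_shares(pot_payments: dict, project_investments: dict) -> dict:
    """Return each investor's fraction of the pot's windfall."""
    investors = sorted(set(pot_payments) | set(project_investments))
    weights = {
        i: pot_payments.get(i, 0.0) * project_investments.get(i, 0.0)
        for i in investors
    }
    total = sum(weights.values())
    return {i: (w / total if total else 0.0) for i, w in weights.items()}

# An investor who pays twice as much into the pot *and* invests twice as
# much into the project gets four times the weight:
print(pot_shares({"a": 1.0, "b": 2.0}, {"a": 1.0, "b": 2.0}))
# {'a': 0.2, 'b': 0.8}
```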
Without further injections (though the pot could sell its own attributed impact), this mechanism will probably only rarely make investments profitable for investors, but when combined with the funding from other aligned retro funders it will likely retain some influence on the market regardless of what scale it grows to, even at scales where the market volume dwarfs the capital of all aligned retro funders.
Again, not a perfect solution, but my hope is to combine many imperfect solutions to maximize the safety.
Marketing
Marketing impact markets only to somewhat altruistic people and trying to keep everyone else unaware of them, at least at first, may help to give the market time to establish the pro-social Schelling points firmly before other people inevitably find out about it. I haven’t thought enough about how to achieve this.
I’m thinking I may’ve been wrong to want to create a profitable market (at first). Maybe the goal (at least for the start) should be to create a market that is very rarely profitable unless you count the social bottom line too. Sophisticated altruists would love it because they’d expend all their time and effort and money anyway, and this way they get some 50–80% of their resources back in the form of money. Non-altruists will avoid it.
At some much later date we may find ways to make it profitable for the best predictors, but by then the pro-social Schelling points will be solidly established in the community.
I’ll update my document to reflect this thought. Thanks!
Shorting
I want to make it easy to short impact certificates or “impact stock/derivatives.” This could make life difficult for charities if it incentivizes bad actors to short them and spread FUD about them, but the other effect is that hypothetical purely profit-oriented investors will be a bit less excited to invest into highly controversial projects with huge upsides and downsides because the price will be depressed a bit by short sellers.
Hedge tokens will make this easier as they maintain a −1x leverage by increasing the short size when the price drops and their collateral thus increases. This could make it a lot safer and easier for people with little time or financial experience to hold shorts on such charities.
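A simplified sketch of that rebalancing logic (my own toy model; real hedge-token implementations differ):

```python
# A -1x hedge token keeps its short notional equal to its collateral value,
# so a price drop (which increases collateral) forces it to enlarge the short.

def rebalance_short(collateral: float, price: float) -> float:
    """Short position size (in units) that restores -1x leverage."""
    return collateral / price

collateral = 1000.0                               # in a stable unit of account
price = 10.0
units_short = rebalance_short(collateral, price)  # 100 units short

# Price drops 20%: the short gains, collateral grows, short size increases.
new_price = 8.0
collateral += units_short * (price - new_price)   # +200 profit -> 1200
units_short = rebalance_short(collateral, new_price)  # now 150 units short
```

That last step is exactly where the crash problem mentioned below bites: enlarging the short means selling more into a falling market with little buy-side liquidity.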
Another benefit is that it may decrease volatility because it incentivizes others to hold and lend out their shares to the short-sellers (or hold opposing hedged positions on perpetual futures for funding) to generate passive income.
But I think hedge tokens don’t work very well because in case of crashes – the very moments short sellers are waiting for – the rebalancing seems to break down because there is so little buy-side liquidity on the markets at those moments.
Centralized Gallery
[Edit: I forgot about this one. Thanks Matt!]
There will be tons and tons of impact certificates at some point, so people will welcome any group that vets them and advertises only the best ones. I’m envisioning a group of sophisticated altruistic experts who maintain a centralized web2-like gallery of the most robustly positive impact certificates. If an issuer wants their certificate to be seen, they’ll try to conform to the requirements of the gallery experts.
I feel like this is among the weakest solutions on my list because it’s not self-reinforcing. Anyone can just create an alternative gallery that is all laissez-faire about inclusion, and rich retro funders with bespoke priorities who want to co-opt the market for their purposes also have the money to set one up and promote it.
Finally, the risk may be a bit mitigated in the case of AI by the fact that a lot of AI research is quite profitable already in classic financial market terms. Impact markets may then have lots of benefits while causing little harm beyond that which classic financial markets cause in any case.
I’d be curious how reassured you are by all these solutions (1) right now and (2) conditional on them actually being adopted by the market the way I hope.
I’ve been thinking that there are three ways forward for us: (1) Create these markets, (2) create some safe part of these markets in a very controlled, centralized way, or (3) try to help other efforts to create such markets to do so more safely or pivot to something else. I’m currently somewhere between 1 and 2, but I’ll fall back on 3 if I become disillusioned with my solution concepts.
The attributed impact idea is interesting. I think in its current form it has the following problems:
Speculators can’t perfectly predict the ex ante impact as it will be estimated by experts at some point in time in the future. If a speculator thinks there’s some chance that the estimated ex ante impact will end up being astronomically positive, we get another version of the “distribution mismatch” problem. (I.e. from the perspective of the speculator, the estimated ex ante impact ending up being astronomically negative is not a worse outcome than it ending up being zero).
Even if most speculators are 100% sure that the ex ante impact is astronomically negative, if some think it may be positive then the price of the certificates can still be high (similarly to the more general problem that MakoYass mentioned).
Suppose a project clearly has a positive ex ante impact, but it is failing mundanely, and will end up with zero ex post impact if nothing dramatic happens. The people leading the project, who own some of the project’s certificates, may then be incentivized to carry out some newly planned intervention that is risky and net-negative, but has a chance of ending up being astronomically beneficial, in order for the project to have a positive expected future ex post impact (due to the distribution mismatch problem).
Perhaps problems 1-2 can be alleviated by something like the human-judgement filter that MakoYass mentioned.
For alleviating problem 3, maybe instead of defining attributed impact based on the state of the world at only two points in time, it should depend on all the points in time in between. I.e. defining it as the minimum subjective expected change [...] as the experts would estimate it at time t, for any t between the two points in time in the original definition.
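One way to write this proposal down (my notation; $t_0$ and $t_1$ are the two points in time of the original definition, and $\mathbb{E}_t[\Delta]$ is the experts’ subjective expected change as estimated at time $t$):

```latex
I_{\mathrm{attr}} \;=\; \min_{t_0 \,\le\, t \,\le\, t_1} \mathbb{E}_t[\Delta]
```

Taking the minimum over the whole interval means a project whose expected impact dropped to near zero at some intermediate point cannot recover a high attributed impact through a late gamble.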
Though the problematic incentive above may remain to some extent because: if a newly planned, risky, net-negative intervention will be carried out and some speculators overestimate its value (for every t), that intervention can cause the certificate price to go up. Maybe this problem can be alleviated by using some stronger version of the “human-judgement filter” mechanism, maybe one that can cancel already-minted certificates at any point in time.
Whee, thanks!
Yeah, that feels like a continuous kind of failure. Like, you can reduce the risk from 50% to 1% and then to 0.1% but you can’t get it down to 0%. I want to throw all the other solutions at the problem as well, apart from Attributed Impact, and hope that the aggregate of all of them will reduce the risk sufficiently that impact markets will be robustly better than the status quo. This case depends a lot on the attitudes, sophistication, and transparency of the retro funders, so it’ll be useful for the retro funders to be smart and value-aligned and to have a clear public image.
In a way this is similar to the above. Instead of some number of speculators having some credence that the outcome might be extremely good, we get the same outcome if a small number of speculators have a sufficiently high credence that the outcome will be good.
This one is different. I think here the problem is that the issuers lied and had an incentive to lie. They could’ve also gone the Nikola route of promising something awesome, then quickly giving up on it but lying about it and keeping the money. What the issuers did is just something other than the actions that the impact certificate is about; the problem is just that the issuers are keeping that a secret. I don’t want to (and can’t) change Attributed Impact to prevent lying, though it is of course a big deal…
I feel like the first two don’t call for changes to Attributed Impact but for a particular simplicity and transparency on the part of the retro funders, right? Maybe they need to monitor the market, and if a project they consider bad attracts too much attention from speculators, they need to release a statement that they’re not excited about that type of project. Limiting particular retro funders to particular types of projects could also aid that transparency – e.g., a retro funder only for scientific papers or one only for vaccinating wild animals. They can then probably communicate what they are and aren’t interested in without having to write reams upon reams.
The third one is something where I see the responsibility as lying with auditors. In the short term speculators should probably only give their money to issuers whom they somehow trust, e.g., because of their reputation or because they’re friends. In the long run, there should be auditors who check databases of all audited impact certificates to confirm that the impact is not being double-issued and who have some standards for how clearly and convincingly an impact certificate is justified. Later in the process they should also confirm any claims of the issuer that the impact has happened.
I’ll make some changes to my document to reflect these learnings, but the auditor part still feels completely raw in my mind. There’s just the idea of a directory that they maintain and of their different types of certification, but I’d like to figure out how much they’ll likely need to charge and how to prevent them from colluding with bad issuers.
The “human judgment filter,” which I’ve been calling “curation” (unless there are differences?) is definitely going to be an important mechanism, but I think it’ll fall short in cases where unaligned people are good at marketing and can push their Safemoon-type charity even if no reputable impact certificate exchange will list it.
Yeah, that feels like a continuous kind of failure. Like, you can reduce the risk from 50% to 1% and then to 0.1% but you can’t get it down to 0%.
Suppose we want the certificates of a risky, net-negative project to have a price that is 10x lower than the price they would have on a naive impact market. Very roughly, it needs to be the case that the speculators have a credence of less than 10% that at least one relevant retro funder will evaluate the ex-ante impact to be high (positive). Due to the epistemic limitations of the speculators, that condition may not hold for net-negative projects even when the initial retro funders are very careful and transparent and no one else is allowed to become a retro funder. If the set of future retro funders is large and unknown and anyone can become a retro funder, the condition may rarely hold for net-negative projects.
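To see how fast that credence grows with the number of potential retro funders, here is a toy calculation (made-up numbers, and assuming independence between funders, which is a simplification):

```python
# Probability that at least one of n retro funders judges the ex-ante
# impact to be high, if each does so independently with probability p.
def p_at_least_one(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

for n in (1, 10, 100):
    print(n, round(p_at_least_one(0.02, n), 3))
# 1   0.02   -> safely under the 10% threshold above
# 10  0.183  -> already over it
# 100 0.867  -> nearly certain
```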
Thanks! I’ve noted and upvoted your comment. I’ll create a document for all the things we need to watch out for when it comes to attacks by issuers, investors, and funders, so we can monitor them in our experiments.
In this case I think a partial remedy is for retro funders to take the sort of active role in steering the market that I’ve been arguing for, where they notice when projects that they’re not excited about attract a lot of investment and then clarify their position.
But that does not solve the retro funder alignment problem that is part of your argument.
I’ll create a document for all the things we need to watch out for when it comes to attacks by issuers, investors, and funders, so we can monitor them in our experiments.
(I don’t think that potential “attacks” by issuers/investors/funders are the problem here.)
But that does not solve the retro funder alignment problem that is part of your argument.
I don’t think it’s an alignment issue here. The price of a certificate tracks the maximum amount of money that any future retro funder will be willing to pay for it. So even if 100% of the current retro funders say “we think project X has negative ex-ante impact”, speculators may still think it’s plausible that at some point there will be 100x retro funders in the market, and then at least one relevant retro funder will judge the ex-ante impact to be positive.
If you run a time-bounded experiment in which the set of retro funders is small and fixed, not observing this problem does not mean that it also won’t show up in a decentralized impact market in which the set of retro funders is unbounded.
The price of a certificate tracks the maximum amount of money that any future retro funder will be willing to pay for it
I get that. I call that retro funder alignment (actually Dony came up with the term :-)) in analogy with AI alignment, where it’s also not enough to just align one AI or all current AIs or some other subset of all AIs that’ll ever come into existence.
Our next experiment is actually not time-bounded but we’re the only buyers (retro funders), so the risk is masked again.
I wonder, though, when I play this through in my mind, I can’t quite see almost any investor investing anything but tiny amounts into a project on the promise that there might be at some point a retro funder for it. It’s a bit like name squatting of domains or Bitclout user names. People buy tens of thousands of them in the hopes of reselling a few of them at a profit, so they buy them only when they are still very, very cheap (or particularly promising). One place sold most of them at $50–100, so they must’ve bought them even cheaper. One can’t do a lot of harm (at the margin) with that amount of money.
Conversely, if an investor wants to bet a lot of money on a potential future unaligned retro funder, they need to be optimistic that the retro funding they’ll receive will be so massive that it makes up for all the time they had to stay invested. Maybe they’ll have to stay invested 5 or 20 years, and even then they only have a tiny, tiny chance that the unaligned retro funder, even conditional on showing up, will want to buy the impact of that particular project. Counterfactually they could’ve made a riskless 10–30% APY all the while. So it seems like a rare thing to happen.
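Rough arithmetic for this, using the figures from the paragraph above (all of them assumptions):

```python
# If a riskless alternative yields `apy`, capital locked up for `years`
# could counterfactually have grown by this factor:
def counterfactual_multiple(apy: float, years: int) -> float:
    return (1 + apy) ** years

# The eventual retro purchase must beat that multiple divided by the
# probability that the unaligned funder shows up and picks this project.
def required_payout_multiple(apy: float, years: int, p_funded: float) -> float:
    return counterfactual_multiple(apy, years) / p_funded

print(round(counterfactual_multiple(0.20, 5), 1))      # ~2.5x over 5 years
print(round(counterfactual_multiple(0.20, 20), 1))     # ~38.3x over 20 years
print(round(required_payout_multiple(0.20, 20, 0.01))) # ~3834x if p = 1%
```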
But I could see Safemoon type of things happening in more than extremely unlikely cases. Investors invest not because of any longterm promises of unaligned retro funders decades later but because they expect that other investors will invest because the other investors also expect other investors to invest, and so on. They’ll all try to buy in before most others buy in and then sell quickly before all others sell, so they’ll just create a lot of volatility and redistribute assets rather randomly. That seems really pointless, and some of the investors may suffer significant losses, but it doesn’t seem catastrophic for the world. People will probably also learn from it for a year or so, so it can only happen about once a year.
Or can you think of places where this happens in established markets? Penny stocks, yield farming platforms? In both cases the investors seem to be either small and unsophisticated, with little effect on the world, or sophisticated and very quickly in and out, also with little effect on the world.
I think the concern here is not about “unaligned retro funders” who consciously decide to do harmful things. It doesn’t take malicious intent to misjudge whether a certain effort is ex-ante beneficial or harmful in expectation.
I wonder, though, when I play this through in my mind, I can’t quite see almost any investor investing anything but tiny amounts into a project on the promise that there might be at some point a retro funder for it.
Suppose investors were able to buy impact certificates of organizations like OpenAI, Anthropic, Conjecture, EcoHealth Alliance etc. These are plausibly very high-impact organizations. Out of 100 aligned retro funders, some may judge some of these organizations to be ex-ante net-positive. And it’s plausible that some of these organizations will end up being extremely beneficial. So it’s plausible that some retro funders (and thus also investors) would pay a lot for the certificates of such orgs.
Okay, but if you’re not actually talking about “malicious” retro funders (a category in which I would include actions that are not typically considered malicious today, such as defecting against minority or nonhuman interests), the difference between a world with and without impact markets becomes very subtle and ambiguous in my mind.
Like, I would guess that Anthropic and Conjecture are probably good, though I know little about them. I would guess that early OpenAI was very bad and current OpenAI is probably bad. But I feel great uncertainty over all of that. And I’m not even taking all considerations into account that I’m aware of because we still don’t have a model of how they interact. I don’t see a way in which impact markets could systematically prevent (as opposed to somewhat reduce) investment mistakes that today not even funders as sophisticated as Open Phil can predict.
Currently, all these groups receive a lot of funding from the altruistic funders directly. In a world with impact markets, the money would first come from investors. Not much would change at all. In fact I see most benefits here in the incentive alignment with employees.
In my models, each investor makes fewer grants than funders currently do because they specialize more and are more picky. My math doesn’t work out (it doesn’t show that they can plausibly make a profit) if they’re similarly picky as or less picky than current funders.
So I could see a drop in sophistication as relatively unskilled investors enter the market. But then they’d have to improve or get filtered out within a few years as they lose their capital to more sophisticated investors.
Relatively speaking, I think I’m more concerned about the problem you pointed out where retro funders get scammed by issuers who use p-hacking-inspired tricks to make their certificates seem valuable when they are not. Sophisticated retro funders can probably address that about as well as top journals can, which is already not perfect, but more naive retro funders and investors may fall for it.
One new thing that we’re doing to address this is to encourage people to write exposés of malicious certificates and sell their impact. Eventually of course I also want people to be able to short issuer stock.
Okay, but if you’re not actually talking about “malicious” retro funders (a category in which I would include actions that are not typically considered malicious today, such as defecting against minority or nonhuman interests), the difference between a world with and without impact markets becomes very subtle and ambiguous in my mind.
I think it depends on the extent to which the (future) retro funders take into account the ex-ante impact, and evaluate it without an upward bias even if they already know that the project ended up being extremely beneficial.
Yes, that’ll be important!
This case depends a lot on the attitudes, sophistication, and transparency of the retro funders, so it’ll be useful for the retro funders to be smart and value-aligned and to have a clear public image.
Agreed. Is there a way to control the set of retro funders? (If anyone who wants to act as a retro funder can do so, the set of retro funders may end up including well-meaning-but-not-that-careful actors).
In a way this is similar to the above.
The cause is different, though: in (1) the failure is caused by a version of the distribution mismatch problem; in (2) the failure is analogous to the unilateralist’s curse.
I think here the problem is that the issuers lied and had an incentive to lie.
In the failure mode I had in mind, the issuer does not lie or keep anything secret at any point. Only after realizing that the project is failing (and that the ex post impact will be zero if nothing dramatic happens) is a new intervention planned and carried out. That intervention is risky and net-negative, but has a chance of being extremely beneficial, and thus (due to the distribution mismatch problem) makes the certificate price go up.
What the issuers did is just something other than the actions that the impact certificate is about
In that case, are the retro funders supposed to not buy the certificates? (Even if both the ex ante and ex post impacts are very beneficial?) [EDIT: for example, let’s say that the certificates are for creating some AI safety org, and that org carries out a newly planned, risky AI safety intervention.]
The “human judgment filter,” which I’ve been calling “curation” (unless there are differences?) is definitely going to be an important mechanism, but I think it’ll fall short in cases where unaligned people are good at marketing and can push their Safemoon-type charity even if no reputable impact certificate exchange will list it.
(I’m not familiar with the details of potential implementations, but...) if profit-motivated speculators can make a profit by trading certificates that are not listed on a particular portal (and everyone can create a competing portal) then most profit-motivated speculators may end up using a competing portal that lists all the tradable certificates.
Agreed. Is there a way to control the set of retro funders? (If anyone who wants to act as a retro funder can do so, the set of retro funders may end up including well-meaning-but-not-that-careful actors).
That’s a big worry of mine and the reason that I came up with the pot. So long as there are enough good retro funders, things are still sort of okayish, as most issuers and speculators will be more interested in catering to the good retro funders with all the capital. But if the bad retro funders become too many or have too much capital, it becomes tricky. The pot is set up such that the capital it has scales in proportion to the activity on the market, so that it’ll always have roughly the same relative influence on the market. Sadly, it’s not huge, but it’s the best I’ve come up with so far.
The other solution is targeted marketing – making it so that the nice and thoughtful retro funders become interested in the market and not the reckless ones.
In the failure mode I had in mind, the issuer does not lie or keep anything secret at any point. Only after realizing that the project is failing (and that the ex post impact will be zero if nothing dramatic happens) is a new intervention planned and carried out. That intervention is risky and net-negative, but has a chance of being extremely beneficial, and thus (due to the distribution mismatch problem) makes the certificate price go up.
Impact certificates are required to be very specific. (Something we want to socially enforce, e.g., by having auditors refuse vague ones and retro funders avoid them.) So say an issuer issues an impact certificate for “I will distribute 1000 copies of the attached Vegan Outreach leaflet at Barbican Station in London between April 1 and August 1, 2022. [Insert many more details.]” They go on to do exactly that and poll a few people about their behavior change. But in July, as they hand out the last few leaflets, they start to realize that leafletting is fairly ineffective. They are disappointed, and so blow up a slaughterhouse instead and are transparent about it. The retro funder will read about that and be like, sure, you blew up a slaughterhouse, but that’s not what this impact certificate is about. And if they issue a new impact certificate for the action, they have to consider all the risks of killing humans, killing nonhumans, going to prison, hurting the reputation of the movement, etc., which will make everyone hesitant to invest because of the low ex ante expected impact.
Just saw your edit: One impact certificate for a whole org is much too vague imo. Impact certificates should be really well defined, and the actions and strategy of an org can change, as in your example. Orgs should rather sell all their activities as individual impact certificates. I envision them like the products that a company sells. E.g., if Nokia produces toilet paper, then the rolls or batches of toilet paper are the impact certificates and Nokia is the org that sells them.
Orgs can then have their own classic securities whose price they can control through buy-backs from the profits of impact certificate sales. Or I also envision perpetual futures that track an index of the market cap of all impact certificates issued by the same org.
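A minimal sketch of such an index (hypothetical names and numbers, just to make the idea concrete):

```python
# Index an org-level perpetual future could track: the combined market cap
# of all impact certificates issued by one org.

def org_impact_index(certificates: list[dict]) -> float:
    """Sum of price * outstanding supply over the org's certificates."""
    return sum(c["price"] * c["supply"] for c in certificates)

org_certs = [
    {"price": 2.0, "supply": 1000},  # e.g. one batch of toilet paper
    {"price": 5.0, "supply": 200},
]
print(org_impact_index(org_certs))   # 3000.0
```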
Impact certificates as small as rolls of toilet paper are probably a bit unwieldy, but AMF, for example, has individual net distributions that are planned thoroughly in advance, so those would be suitable. I’d rather err on the side of requiring more verifiability and clarity than on the side of allowing everyone to create impact certificates for anything they do, because with vague interventions it’ll become more and more difficult over time to disentangle whether impact has been (or should be considered to have been) double-sold.
(I’m not familiar with the details of potential implementations, but...) if profit-motivated speculators can make a profit by trading certificates that are not listed on a particular portal (and everyone can create a competing portal) then most profit-motivated speculators may end up using a competing portal that lists all the tradable certificates.
Yep, also a big worry of mine. I had hoped to establish the pot better by making it mandatory to put a little fee into it and participate in the bet. But that would’ve just increased the incentives for people who don’t like the pot jury to build their own, more laissez-faire platform without a pot.
My fallback plan if I decide that impact markets are too dangerous is to stay in touch with all the people working on similar projects to warn them of the dangers too.
expected impact of some action can be neutral even if it is bound to turn out (ex post) either extremely positive or extremely negative
I would recommend biting the decision theoretic bullet that this is not a problem. If you feel that negative outcomes are worse than positive outcomes of equal quantity, then adjust your units, they’re miscalibrated.
Pot
So would The Pot be like, an organization devoted especially to promoting integrity in the market? I’m not sure I can see why it would hold together.
Maybe the goal (at least for the start) should be to create a market that is very rarely profitable unless you count the social bottom line too.
My Venture Granters design becomes relevant again. Investors just get paid a salary. Their career capital (ability to allocate funding) is measured in a play-currency. Selfish investors don’t apply. Unselfish investors are invited in and nurtured.
Finally, the risk may be a bit mitigated in the case of AI by the fact that a lot of AI research is quite profitable already in classic financial market terms
I think people will be shocked at how impactful public work in software will be. The most important, foundational, productivity-affecting software all generally has basically no funding in a private system. Examples include Web Browsers, Programming Languages, and UI toolkits. The progress in those areas transpires astoundingly slowly in proportion to how many people need it. As soon as you start measuring its impact and rewarding work in proportion to it, the world will change a lot.
I’m not sure how much of this applies to AI, though.
We (the Impact Certificate market convocation) just had a call, and we talked about this a bit, and we realize that most of this seems to crux on the question of whether there are any missing or underdeveloped public goods in AI. It seems like there might not be. It’s mysterious why there is so much open sharing in the sector, but the dominant private players do seem to be sharing everything they’d need to share in order to avoid the coordination problems that might have impeded progress.
Centralized Gallery
I’m pretty hopeful at this point that we’ll be able to establish a standard of auditing against work with potential long-term negative externalities.
Most certs will need some form of auditing. We already know from the carbon market that double spending and fraud can be an enormous problem, and funders should expect the same in other sectors; buying impact certs with no audit signatures shouldn’t really be a thing. If we can make No Untracked Longterm Negative Externalities (NULNE) audits one of the default signatures that show a nasty red cross logo on the cert until one has been acquired, that could establish a healthy culture of use.
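As a toy sketch of what “default signatures with a warning until acquired” could look like (the NULNE name is from the comment above; everything else here is made up):

```python
# Toy display rule: a certificate shows a warning badge until it carries
# all default audit signatures.
DEFAULT_REQUIRED_AUDITS = {"NULNE"}  # No Untracked Longterm Negative Externalities

def display_badge(audit_signatures: set) -> str:
    missing = DEFAULT_REQUIRED_AUDITS - audit_signatures
    if missing:
        return "RED CROSS: missing audits " + ", ".join(sorted(missing))
    return "OK: all default audits present"

print(display_badge(set()))       # RED CROSS: missing audits NULNE
print(display_badge({"NULNE"}))   # OK: all default audits present
```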
I would recommend biting the decision theoretic bullet that this is not a problem. If you feel that negative outcomes are worse than positive outcomes of equal quantity, then adjust your units, they’re miscalibrated.
I’m on board with that, and the section that you’re quoting seems to express that. Or am I misunderstanding what you’re referring to? (The quoted section basically says that, e.g., +100 utility with 50% probability and −100 utility with 50% probability cancel out to 0 utility in expectation. So the positive and the negative side are weighed equally and the units are the same.)
Generally, your critique here is also my critique of the conflict between prioritarianism and classic utilitarianism (or some formulations of those).
So would The Pot be like, an organization devoted especially to promoting integrity in the market? I’m not sure I can see why it would hold together.
Yeah, that’s how I imagine it. You mean it would just have a limited life expectancy like any company or charity? That makes sense. Maybe we could try to push to automate it and create several alternative implementations of it. Being able to pay people would also be great. Any profit that it could use to pay staff would detract from its influence, but that’s also a tradeoff one could make.
Oh, another idea of mine was to use Augur markets. But I don’t know enough about Augur markets yet to tell if there are difficulties there.
My Venture Granters design becomes relevant again. Investors just get paid a salary. Their career capital (ability to allocate funding) is measured in a play-currency. Selfish investors don’t apply. Unselfish investors are invited in and nurtured.
I still need to read it, but it’s on my reading list! Getting investments from selfish investors is a large part of my motivation. I’m happy to delay that to test all the mechanisms in a safe environment, but I’d like it to be the goal eventually when we deem it to be safe.
We (the Impact Certificate market convocation) just had a call, and we talked about this a bit, and we realize that most of this seems to crux on the question of whether there are any missing or underdeveloped public goods in AI.
Yeah, it would be interesting to get the opinions of anyone else who is reading this.
So the way I understand this question is that there may be retro funders who reward free and open source software projects that have been useful. Lots of investors will be very quick and smart about ferreting out the long tail of the tens of thousands of tiny libraries that hold big systems like GPT-3 together. Say, maybe the training data for GPT-3 is extracted by custom software that relies on cchardet to detect the encoding of the websites it downloads if it’s not declared, misdeclared, or ambiguously declared. That influx of funding to these tiny projects will speed up their development and will supercharge them to the point where they can do their job a lot better and speed up development processes by 2–10x or so.
Attributed impact, the pot, and aligned retro funders would first need to become aware of this (or similar hidden risks), and would then decide that software projects like that are risky and need to make a strong case for why they’re differentially more useful for safety or other net-positive work than for enhancing AI capabilities. But the risk is sufficiently hidden that this is the sort of thing where another unaligned funder with a lot of money might come in and skew the market in the direction of their goals.
The assumptions, as I see them, are:
Small software projects can be sufficiently risky.
The influence of unaligned funders who disregard this risk is great.
The unaligned funders cannot be reasoned with and either reject attributed impact or argue that small software projects are not risky.
They overwhelm the influence of the pot.
They overwhelm the influence of all other retro funders that software projects might cater to.
If we can make No Untracked Longterm Negative Externalities (NULNE) audits one of the default signatures that show a nasty red cross logo on the cert until one has been acquired, that could establish a healthy culture of use.
Yeah, that sounds sensible. Or make it impossible to display them anywhere in the first place without audit?
Thank you for the great comment!
Correct me if I’m wrong, but this is close to what I’ve termed the “distribution mismatch” problem (see this unpublished draft). Ofer pointed it out here, and it’s been the main problem that caused me headaches over the past months.
I’m not confident that the solutions I’ve come up with so far are sufficient, but there are five, and I want to use them in conjunction if at all possible:
Attributed Impact
“Attributed impact” is a construct that I designed to (1) track our intuitions for what impact is, but (2) exclude pathological cases.
The main problem as I see it is that the ex ante expected impact of some action can be neutral even if it is bound to turn out (ex post) either extremely positive or extremely negative. For hypothetical pure for-profit investors that case is identical to that of a project that can turn out either extremely positive or neutral because their losses are always capped at their investment.
Attributed impact is defined such that it can ever exceed the ex ante expected impact as estimated by all market participants. If adopted, it’ll not penalize investors later for investing in projects that turned out negative but it’ll make investments into projects that might turn out very negative unattractive to avoid the investments in the first place.
Attributed impact has some more features and benefits, so that I hope that it’ll be in the rational self-interest of every retro funder to adopt it (or something like it) for their impact evaluations. Issuers who want to appeal to retro funders will then have to make a case why their impact is likely to be valuable in attributed-impact terms. Eventually, I hope, it’ll just become the natural Schelling point for how impact is valued.
It may of course fail to be adopted or fail in some harder-to-anticipate way, so I’m not fully confident in it.
Pot
We’re probably fine so long as the market is dominated by a retro funder with lots of capital, a strong altruistic concern for sentient life, and a sophisticated, thoughtful approach. Issuers and investors would try to predict the funding decisions of that funder. But as you noted, that may just not be the case.
The pot is designed to guard against the case where there is not enough capital in the hands of altruistic retro funders like that. It promises that an expert team will allocate all the funds from the pot to retro impact purchases of projects with lots of attributed impact just like any other aligned retro funder. But it also allows investors to pay into the pot, and makes transparent that (according to my current plans) their fraction of the windfall will be proportional to the product of their payment into the pot and their investment into the charity project.
Without injections (but the pot could sell its own attributed impact), this mechanism will probably only rarely make investments profitable for investors, but when combined with the funding from other aligned retro funders it will likely retain some influence on the market regardless of what scale it grows to, so even at scales where the market volume dwarfs the capital of all aliged retro funders.
Again, not a perfect solution, but my hope is to combine many imperfect solutions to maximize the safety.
Marketing
Marketing impact markets only to somewhat altruistic people and trying to keep everyone else unaware of them, at least at first, may help to give the market time to establishe the pro-social Schelling points firmly before other people inevitably find out about it. I haven’t thought enough about how to achieve this.
I’m thinking I may’ve been wrong to want to create a profitable market (at first). Maybe the goal (at least for the start) should be to create a market that is very rarely profitable unless you count the social bottom line too. Sophisticated altruists would love it because they’d expend all their time and effort and money anyway, and this way they get some 50–80% of their resources back in the form of money. Non-altruists will avoid it.
At some much later date we may find ways to make it profitable for the best predictors, but by then the pro-social Schelling points will be solidly established in the community.
I’ll update my document to reflect this thought. Thanks!
Shorting
I want to make it easy to short impact certificates or “impact stock/derivatives.” This could make life difficult for charities if it incentivizes bad actors to short them and spread FUD about them, but the other effect is that hypothetical purely profit-oriented investors will be a bit less excited to invest into highly controversial projects with huge upsides and downsides because the price will be depressed a bit by short sellers.
Hedge tokens will make this easier as they maintain a −1x leverage by increasing the short size when the price drops and their collateral thus increases. This could make it a lot safer and easier for people with little time or financial experience to hold shorts on such charities.
Another benefit is that it may decrease volatility because it incentivizes others to hold and lend out their shares to the short-sellers (or hold opposing hedged positions on perpetual futures for funding) to generate passive income.
But I think hedge tokens don’t work very well because in case of crashes – the very moments short sellers are waiting for – the rebalancing seems to break down because there is so little buy-side liquidity on the markets at those moments.
Centralized Gallery
[Edit: I forgot about this one. Thanks Matt!]
There will be tons and tons of impact certificates at some point, so people will welcome any group that vets them and advertises to them only the best. I’m envisining a group of sophisticated altruistic experts who maintain a centralized web2-like gallery of the most robustly positive impact certificates. If an issuer wants their certificate to be seen, so they’ll try to conform to the requirements of the gallery experts.
I feel like this is among the weakest solutions on my list because it’s not self-reinforcing. Anyone can just create an alternative gallery that is all laissez-faire about inclusion, and rich retro funders with bespoke priorities who want to co-opt the market for their purposes also have the money to set one up and promote it.
Finally, the risk may be a bit mitigated in the case of AI by the fact that a lot of AI research is quite profitable already in classic financial market terms. Impact markets may then have lots of benefits while causing little harm beyond that which classic financial markets cause in any case.
I’d be curious how reassured you are by all these solutions (1) right now and (2) conditional on them actually being adopted by the market the way I hope.
I’ve been thinking that there are three ways forward for us: (1) Create these markets, (2) create some safe part of these markets in a very controlled, centralized way, or (3) try to help other efforts to create such markets to do so more safely or pivot to something else. I’m currently somewhere between 1 and 2, but I’ll fall back on 3 if I become disillusioned with my solution concepts.
The attributed impact idea is interesting. I think in its current form it has the following problems:
Speculators can’t perfectly predict the ex ante impact as it will be estimated by experts at some point in time in the future. If a speculator thinks there’s some chance that the estimated ex ante impact will end up being astronomically positive, we get another version of the “distribution mismatch” problem. (I.e. from the perspective of the speculator, the estimated ex ante impact ending up being astronomically negative is not a worse outcome than it ending up being zero).
Even if most speculators are 100% sure that the ex ante impact is astronomically negative, if some think it may be positive then the price of the certificates can still be high (similarly to the more general problem that MakoYass mentioned).
Suppose a project clearly has a positive ex ante impact, but it is failing mundanely, and will end up with zero ex post impact if nothing dramatic happens. The people leading the project, who own some of the project’s certificates, may then be incentivized to carry out some newly planned intervention that is risky and net-negative, but has a chance of ending up being astronomically beneficial, in order for the project to have a positive expected future ex post impact (due to the distribution mismatch problem).
Perhaps problems 1-2 can be alleviated by something like the human-judgement filter that MakoYass mentioned.
For alleviating problem 3, maybe instead of defining attributed impact based on the state of the world in only two points in time, it should depend on all the points in time in between. I.e. defining it as the minimum subjective expected change [...] as the experts would estimate it in time t, for any t between the two points in time in the original definition.
Though the problematic incentive above may remain to some extent because: if a newly planned, risky, net-negative intervention will be carried out and some speculators will overestimate its value (for every t), that interventions can cause the certificate price to go up. Maybe this problem can be alleviated by using some stronger version of the “human-judgement filter” mechanism, maybe one that can cancel already-minted certificates at any point in time.
Whee, thanks!
Yeah, that feels like a continuous kind of failure. Like, you can reduce the risk from 50% to 1% and then to 0.1% but you can’t get it down to 0%. I want to throw all the other solutions at the problem as well, apart from Attributed Impact, and hope that the aggregate of all of them will reduce the risk sufficiently that impact markets will be robustly better than the status quo. This case depends a lot on the attitudes, sophistication, and transparency of the retro funders, so it’ll be useful for the retro funders to be smart and value-aligned and to have a clear public image.
In a way this is similar to the above. Instead of some number of speculators having some credence that the outcome might be extremely good, we get the same outcome if a small number of speculators have a sufficiently high credence that the outcome will be good.
This one is different. I think here the problem is that the issuers lied and had an incentive to lie. They could’ve also gone the Nikola route of promising something awesome, then quickly giving up on it but lying about it and keeping the money. What the issuers did is just something other than the actions that the impact certificate are about; the problem is just that the issuers are keeping that a secret. I don’t want to (or can) change Attributed Impact to prevent lying, though it is of course a big deal…
I feel like the first two don’t call for changes to Attributed Impact but for a particular simplicity and transparency on the part of the retro funders, right? Maybe they need to monitor the market, and if a project they consider bad attracts too much attention from speculators, they need to release a statement that they’re not excited about that type of project. Limiting particular retro funders to particular types of projects could also aid that transparency – e.g., a retro funder only for scientific papers or one only for vaccinating wild animals. They can then probably communicate what they are and aren’t interested in without having to write reams upon reams.
The third one is something where I see the responsibility with auditors. In the short term speculators should probably only give their money to issuers who they somehow trust, e.g., because of their reputation or because they’re friends. In the long run, there should be auditors who check databases of all audited impact certificates to confirm that the impact is not being double-issued and who have some standards for how clearly and convincingly an impact certificate is justified. Later in the process they should also confirm any claims of the issuer that the impact has happened.
I’ll make some changes to my document to reflect these learnings, but the auditor part still feels completely raw in my mind. There’s just the idea of a directory that they maintain and of their different types of certification, but I’d like to figure out how much they’ll likely need to charge and how to prevent them from colluding with bad issuers.
The “human judgment filter,” which I’ve been calling “curation” (unless there are differences?) is definitely going to be an important mechanism, but I think it’ll fall short in cases where unaligned people are good at marketing and can push their Safemoon-type charity even if no reputable impact certificate exchange will list it.
Suppose we want the certificates of a risky, net-negative project to have a price that is lower by 10x than the price they would have on a naive impact market. Very roughly, it needs to be the case that the speculators have a credence of less than 10% that at least one relevant retro funder will evaluate the ex-ante impact to be high (positive). Due to the epistemic limitations of the speculators, that condition may not fulfill for net-negative projects even when the initial retro funders are very careful and transparent and no one else is allowed to become a retro funder. If the set of future retro funders is large and unknown and anyone can become a retro funder, the condition may rarely fulfill for net-negative projects.
Thanks! I’ve noted and upvoted your comment. I’ll create a document for all the things we need to watch out for when it comes to attacks by issuers, investors, and funders, so we can monitor them in our experiments.
In this case I think a partial remedy is for retro funders to take the sort of active role in the steering of the market that I’ve been arguing for where they notice when projects get a lot of investments that they’re not excited about and clarify their position.
But that does not solve the retro funder alignment that is part of your argument.
(I don’t think that potential “attacks” by issuers/investors/funders are the problem here.)
I don’t think it’s an alignment issue here. The price of a certificate tracks the maximum amount of money that any future retro funder will be willing to pay for it. So even if 100% of the current retro funders say “we think project X has negative ex-ante impact”, speculators may still think it’s plausible that at some point there will be 100x retro funders in the market, and then at least one relevant retro funder will judge the ex-ante impact to be positive.
If you run a time-bounded experiment in which the set of retro funders is small and fixed, not observing this problem does not mean that it also won’t show up in a decentralized impact market in which the set of retro funders in unbounded.
I get that. I call that retro funder alignment (actually Dony came up with the term :-)) in analogy with AI alignment, where it’s also not enough to just align one AI or all current AIs or some other subset of all AIs that’ll ever come into existence.
Our next experiment is actually not time-bounded but we’re the only buyers (retro funders), so the risk is masked again.
I wonder, though, when I play this through in my mind, I can’t quite see almost any investor investing anything but tiny amounts into a project on the promise that there might be at some point a retro funder for it. It’s a bit like name squatting of domains or Bitclout user names. People buy ten thousands of them in the hopes of reselling a few of them at a profit, so they buy them only when they are still very very cheap (or particularly promising). One place sold most of them at $50–100, so they must’ve bought them even cheaper. One can’t do a lot of harm (at the margin) with that amount of money.
Conversely, if an investor wants to bet a lot of money on a potential future unaligned retro funder, they need to be optimistic that the retro funding they’ll receive will be so massive that it makes up for all the time they had to stay invested. Maybe they’ll have to stay invested 5 years or 20 years, and even then only have a tiny, tiny chance that the unaligned retro funder, even conditional on showing up, will want to buy the impact of that particular project. Counterfactually they could’ve made a riskless 10–30% APY all the while. So it seems like a a rare thing to happen.
But I could see Safemoon type of things happening in more than extremely unlikely cases. Investors invest not because of any longterm promises of unaligned retro funders decades later but because they expect that other investors will invest because the other investors also expect other investors to invest, and so on. They’ll all try to buy in before most others buy in and then sell quickly before all others sell, so they’ll just create a lot of volatility and redistribute assets rather randomly. That seems really pointless, and some of the investors may suffer significant losses, but it doesn’t seem catastrophic for the world. People will probably also learn from it for a year or so, so it can only happen about once a year.
Or can you think of places where this happens in established markets? Penny stocks, yield farming platforms? In both cases the investors either seem small, unsophisticated, and having little effect on the world, or sophisticated and very quickly in and out, also with little effect on the world.
I think the concern here is not about “unaligned retro funders” who consciously decide to do harmful things. It doesn’t take malicious intent to misjudge whether a certain effort is ex-ante beneficial or harmful in expectation.
Suppose investors were able to buy impact certificates of organizations like OpenAI, Anthropic, Conjecture, EcoHealth Alliance etc. These are plausibly very high-impact organizations. Out of 100 aligned retro funders, some may judge some of these organizations to be ex-ante net-positive. And it’s plausible that some of these organizations will end up being extremely beneficial. So it’s plausible that some retro funders (and thus also investors) would pay a lot for the certificates of such orgs.
Okay, but if you’re not actually talking about “malicious” retro funders (a category in which I would include actions that are not typically considered malicious today, such as defecting against minority or nonhuman interests), the difference between a world with and without impact markets becomes very subtle and ambiguous in my mind.
Like, I would guess that Anthropic and Conjecture are probably good, though I know little about them. I would guess that early OpenAI was very bad and current OpenAI is probably bad. But I feel great uncertainty over all of that. And I’m not even taking all considerations into account that I’m aware of because we still don’t have a model of how they interact. I don’t see a way in which impact markets could systematically prevent (as opposed to somewhat reduce) investment mistakes that today not even funders as sophisticated as Open Phil can predict.
Currently, all these groups receive a lot of funding from the altruistic funders directly. In a world with impact markets, the money would first come from investors. Not much would change at all. In fact I see most benefits here in the incentive alignment with employees.
In my models, each investor makes fewer grants than funders currently do because they specialize more and are more picky. My math doesn’t work out, doesn’t show that they can plausibly make a profit, if they’re similarly or less picky than current funders.
So I could see a drop in sophistication as relatively unskilled investors enter the market. But then they’d have to improve or get filtered out within a few years as they lose their capital to more sophisticated investors.
Relatively speaking, I think I’m more concerned about the problem you pointed out where retro funders get scammed by issuers who use p-hacking-inspired tricks to make their certificates seem valuable when they are not. Sophisticated retro funders can probably address that about as well as top journals can, which is already not perfect, but more naive retro funders and investors may fall for it.
One new thing that we’re doing to address this is to encourage people to write exposés of malicious certificates and sell their impact. Eventually of course I also want people to be able to short issuer stock.
I think it depends on the extent to which the (future) retro funders take into account the ex-ante impact, and evaluate it without an upward bias even if they already know that the project ended up being extremely beneficial.
Yes, that’ll be important!
Agreed. Is there a way to control the set of retro funders? (If anyone who wants to act as a retro funder can do so, the set of retro funders may end up including well-meaning-but-not-that-careful actors).
The cause is different though, in (1) the failure is caused by a version of the distribution mismatch problem, in (2) the failure is analogous to the unilateralist’s curse.
In the failure mode I had in mind the issuer does not lie or keeps something a secret at any point. Only after realizing that the project is failing (and that the ex post impact will be zero if nothing dramatic happens), a new intervention is planned and carried out. That intervention is risky and net-negative, but has a chance to be extremely beneficial, and thus (due to the distribution mismatch problem) makes the certificate price go up.
In that case, are the retro funders supposed to not buy the certificates? (Even if both the ex ante and ex post impacts are very beneficial?) [EDIT: for example, let’s say that the certificates are for creating some AI safety org, and that org carries out a newly planned, risky AI safety intervention.]
(I’m not familiar with the details of potential implementations, but...) if profit-motivated speculators can make a profit by trading certificates that are not listed on a particular portal (and everyone can create a competing portal) then most profit-motivated speculators may end up using a competing portal that lists all the tradable certificates.
That’s a big worry of mine and the reason that I came up with the pot. So long as there are enough good retro funders, things are still sort of okayish, as most issuers and speculators will be more interested in catering to the good retro funders with all the capital. But if the bad retro funders become too numerous or too well capitalized, it gets tricky. The pot is set up such that its capital scales in proportion to the activity on the market, so it’ll always have roughly the same relative influence on the market. Sadly, that influence isn’t huge, but it’s the best I’ve come up with so far.
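As a minimal sketch of that scale-invariance claim (the inflow fraction is entirely made up, and I’m assuming pot inflows track market volume):

```python
# Assume the pot's inflows are a roughly constant fraction of market volume
# (hypothetical 1%), whether through voluntary payments or fees:
inflow_fraction = 0.01

for market_volume in (1e5, 1e7, 1e9):
    pot_capital = inflow_fraction * market_volume
    print(f"volume={market_volume:.0e}  pot={pot_capital:.0e}  "
          f"relative influence={pot_capital / market_volume:.0%}")
# The pot's relative influence stays at 1% at every market scale.
```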
The other solution is targeted marketing – making it so that the nice and thoughtful retro funders become interested in the market and not the reckless ones.
Impact certificates are required to be very specific. (Something we want to socially enforce, e.g., by having auditors refuse vague ones and retro funders avoid them.) So say an issuer issues an impact certificate for “I will distribute 1000 copies of the attached Vegan Outreach leaflet at Barbican Station in London between April 1 and August 1, 2022. [Insert many more details.]” They go on to do exactly that and poll a few people about their behavior change. But in July, as they hand out the last few leaflets, they start to realize that leafleting is fairly ineffective. They are disappointed, and so they blow up a slaughterhouse instead and are transparent about it. The retro funder will read about that and be like, sure, you blew up a slaughterhouse, but that’s not what this impact certificate is about. And if the issuer issues a new impact certificate for that action, they have to consider all the risks of killing humans, killing nonhumans, going to prison, hurting the reputation of the movement, etc., which will make everyone hesitant to invest because of the low ex ante expected impact.
Just saw your edit: One impact certificate for a whole org is much too vague, imo. Impact certificates should be really well defined, and the actions and strategy of an org can change, as in your example. Orgs should instead sell all their activities as individual impact certificates. I envision them like the products that a company sells. E.g., if Nokia produces toilet paper, then the rolls or batches of toilet paper are the impact certificates and Nokia is the org that sells them.
Orgs can then have their own classic securities whose price they can control through buy-backs from the profits of impact certificate sales. Or I also envision perpetual futures that track an index of the market cap of all impact certificates issued by the same org.
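As a minimal sketch of such an index (all names and numbers are hypothetical), the per-org index could simply sum the market caps of every certificate the org has issued:

```python
from collections import defaultdict

# Hypothetical certificates: (issuing org, price per unit, units outstanding)
certificates = [
    ("AMF", 12.0, 1000),    # e.g., one planned net distribution
    ("AMF", 15.0, 800),
    ("Nokia", 0.5, 50000),  # e.g., one batch of toilet paper rolls
]

# Per-org index: total market cap of all the org's impact certificates.
index: defaultdict[str, float] = defaultdict(float)
for org, price, units in certificates:
    index[org] += price * units

print(dict(index))  # {'AMF': 24000.0, 'Nokia': 25000.0}
# A perpetual future on index["AMF"] would give traders exposure to the org's
# aggregate impact output without holding the individual certificates.
```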
Impact certificates as small as rolls of toilet paper are probably a bit unwieldy, but AMF, for example, has individual net distributions that are planned thoroughly in advance, so those would be suitable. I’d rather err on the side of requiring more verifiability and clarity than on the side of allowing everyone to create impact certificates for anything they do, because with vague interventions it’ll become more and more difficult over time to disentangle whether impact has been (or should be considered to have been) double-sold.
Yep, also a big worry of mine. I had hoped to entrench the pot better by making it mandatory to pay a little fee into it and participate in the bet. But that would’ve just increased the incentive for people who don’t like the pot jury to build their own, more laissez-faire platform without a pot.
My fallback plan if I decide that impact markets are too dangerous is to stay in touch with all the people working on similar projects to warn them of the dangers too.
I agree that requiring certificates to specify very specific interventions seems to alleviate (3).
I would recommend biting the decision-theoretic bullet that this is not a problem. If you feel that negative outcomes are worse than positive outcomes of equal magnitude, then adjust your units; they’re miscalibrated.
So would The Pot be like, an organization devoted especially to promoting integrity in the market? I’m not sure I can see why it would hold together.
My Venture Granters design becomes relevant again. Investors just get paid a salary. Their career capital (ability to allocate funding) is measured in a play-currency. Selfish investors don’t apply. Unselfish investors are invited in and nurtured.
I think people will be shocked at how impactful public work on software will be. The most important, foundational, productivity-affecting software generally has basically no funding under a private system; examples include web browsers, programming languages, and UI toolkits. Progress in those areas is astoundingly slow in proportion to how many people need it. As soon as you start measuring its impact and rewarding work in proportion to it, the world will change a lot.
I’m not sure how much of this applies to AI, though.
We (the Impact Certificate market convocation) just had a call and talked about this a bit, and we realized that most of this cruxes on the question of whether there are any missing or underdeveloped public goods in AI. It seems like there might not be. It’s mysterious why there is so much open sharing in the sector, but the dominant private players do seem to be sharing everything they’d need to share in order to avoid the coordination problems that might otherwise have impeded progress.
I’m pretty hopeful at this point that we’ll be able to establish a standard of auditing against work with potential long-term negative externalities.
Most certs will need some form of auditing. We already know from the carbon market that double spending and fraud can be an enormous problem, and funders should expect the same in other sectors; buying impact certs with no audit signatures shouldn’t really be a thing. If we can make No Untracked Longterm Negative Externalities (NULNE) audits one of the default signatures, with a nasty red-cross logo shown on the cert until one has been acquired, that could establish a healthy culture of use.
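A minimal sketch of how a portal could enforce that default (the data model and the NULNE signature format are hypothetical):

```python
from dataclasses import dataclass, field

REQUIRED_AUDITS = {"NULNE"}  # No Untracked Longterm Negative Externalities

@dataclass
class ImpactCert:
    title: str
    audit_signatures: set[str] = field(default_factory=set)

def display_badge(cert: ImpactCert) -> str:
    """Show the warning badge until all default audit signatures are present."""
    missing = REQUIRED_AUDITS - cert.audit_signatures
    return "OK" if not missing else f"RED CROSS: missing audits {sorted(missing)}"

cert = ImpactCert(title="1000 Vegan Outreach leaflets, Barbican Station")
print(display_badge(cert))           # RED CROSS: missing audits ['NULNE']
cert.audit_signatures.add("NULNE")   # auditor signs off
print(display_badge(cert))           # OK
```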
I’m on board with that, and the section that you’re quoting seems to express that. Or am I misunderstanding what you’re referring to? (The quoted section basically says that, e.g., +100 utility with 50% probability and −100 utility with 50% probability cancel out to 0 utility in expectation. So the positive and the negative side are weighed equally, and the units are the same.)
Generally, this critique of yours is also my critique of the conflict between prioritarianism and classic utilitarianism (or some formulations of them).
Yeah, that’s how I imagine it. You mean it would just have a limited life expectancy, like any company or charity? That makes sense. Maybe we could push to automate it and create several alternative implementations of it. Being able to pay people would also be great. Any profit that it used to pay staff would detract from its influence, but that’s also a tradeoff one could make.
Oh, another idea of mine was to use Augur markets. But I don’t know enough about Augur markets yet to tell if there are difficulties there.
I still need to read it, but it’s on my reading list! Getting investments from selfish investors is a large part of my motivation. I’m happy to delay that to test all the mechanisms in a safe environment, but I’d like it to be the goal eventually when we deem it to be safe.
Yeah, it would be interesting to get opinions of anyone else who is reading this.
So the way I understand this question: there may be retro funders who reward free and open-source software projects that have been useful. Lots of investors will be very quick and smart about ferreting out the long tail of tens of thousands of tiny libraries that hold big systems like GPT-3 together. Say the training data for GPT-3 is extracted by custom software that relies on cchardet to detect the encoding of the websites it downloads whenever the encoding is undeclared, misdeclared, or ambiguously declared. That influx of funding to these tiny projects will speed up their development and supercharge them to the point where they can do their job a lot better and speed up development processes by 2–10x or so.
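(For readers who don’t know it: cchardet is a real Python library that guesses the character encoding of a byte string. A minimal usage example, assuming the package is installed:)

```python
import cchardet  # fast character-encoding detector

raw = "Überraschung".encode("iso-8859-1")  # bytes with no declared encoding
guess = cchardet.detect(raw)               # {'encoding': ..., 'confidence': ...}
print(guess)

# Detection is a statistical guess, especially on short inputs, so check it:
if guess["encoding"] is not None:
    print(raw.decode(guess["encoding"]))
```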
Attributed impact, the pot, and aligned retro funders would first need to become aware of this (or similar hidden risks) and would then decide that software projects like that are risky and need to make a strong case for why they’re differentially more useful for safety or other net-positive work than for enhancing AI capabilities. But the risk is sufficiently hidden that this is the sort of thing where another unaligned funder with a lot of money might come in and skew the market in the direction of their goals.
The assumptions, as I see them, are:
Small software projects can be sufficiently risky.
The influence of unaligned funders who disregard this risk is great.
The unaligned funders cannot be reasoned with and either reject attributed impact or argue that small software projects are not risky.
They overwhelm the influence of the pot.
They overwhelm the influence of all other retro funders that software projects might cater to.
Yeah, that sounds sensible. Or make it impossible to display them anywhere in the first place without an audit?