Consider the problem of being automated away in a period of human history with explosive growth, and having to subsist on one’s capital. Property rights are respected, but there is no financial assistance from governments or AGI corporations.
How much wealth does one need to have to survive, ideally indefinitely?
Finding: If you lose your job at the start of the singularity, with monthly spending of $1k, you need ~$71k of capital in total. This number doesn’t look very sensitive to losing one’s job slightly later.
At the moment, the world economy is growing at a pace that leads to doublings in GWP every 20 years, and has done so steadily since ~1960. Explosive growth might instead be hyperbolic (continuing the trend we’ve seen through human history so far), with the economy first doubling in 20 years, then in 10, then in 5, then in 2.5 years, then in 15 months, and so on. I’ll assume that the smallest doubling time is 1 year. We can then define the doubling sequence:
initial_doubling_time=20 # years
final_doubling_time=1 # year

# Monthly growth factors corresponding to the initial & final doubling times.
initial_growth_rate=2^(1/(initial_doubling_time*12))
final_growth_rate=2^(1/(final_doubling_time*12))

function generate_growth_rate_array(months::Int)
    growth_rate_array = zeros(Float64, months)
    growth_rate_step = (final_growth_rate - initial_growth_rate) / (months - 1)
    current_growth_rate = initial_growth_rate
    for i in 1:months
        growth_rate_array[i] = current_growth_rate
        current_growth_rate += growth_rate_step
    end
    return growth_rate_array
end

# Ramp up over roughly the sum of the doubling times (20+10+5+2.5+1.25 years),
# then stay at the final growth rate for the remainder of the 60 years.
months=12*ceil(Int, 20+10+5+2.5+1.25+final_doubling_time)
economic_growth_rate=generate_growth_rate_array(months)
economic_growth_rate=cat(economic_growth_rate, repeat([final_growth_rate], 60*12-months), dims=1)
And we can then write a very simple model of monthly spending to figure out how our capital develops.
capital=float.(1:250_000) # try every integer starting capital up to $250k at once
monthly_spending=1000 # if we really tighten our belts

for growth_rate in economic_growth_rate
    capital .*= growth_rate # in-place update also avoids Julia's global soft-scope issue
    capital .-= monthly_spending
end
capital now contains the capital we end up with after 60 years. To find the minimum amount of starting capital we need so as not to run out, we find the index of the entry closest to zero:
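The index-finding code didn’t make it into the post; a minimal sketch (toy numbers stand in for the real capital vector computed above, and break_even is just an illustrative name):

```julia
# Toy stand-in for the capital vector produced by the loop above:
capital = [-500.0, -120.0, 30.0, 800.0]

# The starting capitals were consecutive integers starting at 1, so the
# index of the final value closest to zero is also the break-even
# starting amount in dollars.
break_even = argmin(abs.(capital))
```

With the real vector from the loop, this index is the ~$71k figure quoted above.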
So, under these requirements, starting out with more than $71k should be fine.
But maybe we’ll only lose our job somewhat into the singularity! We can simulate that as losing our job when the doubling time has fallen to 15 years:
initial_doubling_time=15
initial_growth_rate=2^(1/(initial_doubling_time*12))
years=12*ceil(Int, 10+5+2.5+1.25+final_doubling_time)
economic_growth_rate = generate_growth_rate_array(years)
economic_growth_rate=cat(economic_growth_rate, repeat([final_growth_rate], 60*12-size(economic_growth_rate)[1]), dims=1)
capital=float.(1:250_000)
monthly_spending=1000 # if we really tighten our belts

for growth_rate in economic_growth_rate
    capital .*= growth_rate
    capital .-= monthly_spending
end
The amount of initially required capital doesn’t change by that much:
If agents are usually suffering, then negative-sum games might be good via destructive warfare that removes negentropy (leaving less of it to run suffering agents), which is one consideration in cooperation-focused agendas for suffering-focused longtermism.
Assuming that interventions have log-normally distributed impact, compromising on interventions for the sake of public perception is not worth it unless it brings in exponentially more people.
This assumes that the only benefit of public perception is that it brings in more people. In many instances, better perceptions could also mean various interventions working better (such as if an intervention depends on societal buy-in).
epistemic status: I saw the best minds of my generation
Limply balancing mile-deep cocktail glasses and clamoring for laughable Thielbucks on the cramped terraces of Berkeley,
who invite you to the luring bay, promising to pay for your entire being while elegantly concealing the scarlet light evaporating from their tearducts and nostrils,
who down 3¾ bottles of Huel™ and sprint to their next fellowship or retreat or meetup or workshop, in the pore-showingly lit hallways with whiteboards and whiteboards and whiteboards and whiteboards,
who want to crown random nerds with aureal sigmas fished from the manifold crevices of self-denying deities,
who write epics of institutionalized autists bent on terraforming these overpopulated hypothetical hells,
pointing out dynamic inconsistency in the comments and melting hectoton steel-marbles around civilizational recipes,
who improve their aim yeeting philosophical tomes at unsuspecting passersby and ascend urban lampposts to loudly deduce their mechanical liturgies,
who cut down oaks and poppies in the parks to make oil and space for those yet to come and who hunker down in fortified coalmines in Montana and the Ruhrgebiet, propelled into the rock by an Ordite reactor, a Bostromite dynamo,
who excitedly pour & plaster tabulated exponents into webscripted canisters just below the teal-tinted ceiling,
who can name the 11 AI forecasting organisations and the 4 factors of successful nonprofits and the 7 noble ways of becoming more agentic and might even find Rwanda on a map not made in Silicon Valley,
contemplating hemispherectomies to purify their nascent idealism on the verge of a hope-ash London dawn,
who catch a feral heart in the garden behind the A100 rack and save it into a transparent domicile, injecting it with 17000 volts to illuminate all the last battery cages equally,
who empty out their pockets with uncountable glaring utilons onto innocent climate activists, promising to make them happy neutron stars one day,
Microscopically examining the abyssal monstrosities their oracles conjure up out of the lurching empty chaos,
who fever towards silver tendrils bashing open their skulls and eating up their brains and imaginations, losslessly swallowed into that ellipsoid strange matter gut pulsing out there between the holes
epistemic status: Borderline schizopost, not sure I’ll be able to elaborate much better on this, but posting anyway, since people always write that one should post on the forum. Feel free to argue against. But: Don’t let this be the only thing you read that I’ve written.
Effective Altruism is a Pareto Frontier of Truth and Power
In order to be effective in the world one needs to coordinate (exchange evidence, enact plans in groups, find shared descriptions of the world) and interact with hostile entities (people who lie, people who want to steal your resources, subsystems of otherwise aligned people who want to do those things, engage in public relations or zero-sum conflict). Handling those often requires trading off truth for “power” on the margin, e.g. by nudging members to “just accept” conclusions believed to be a basis for effective action (since making elaborate arguments common knowledge is costly, and agreement converges slowly, to within ε only after O(1/ε²) bits of evidence-sharing), by misrepresenting beliefs to other actors to make them more favorable towards effective altruism, or by choosing easily communicable Schelling categories that minmax utility to the lowest-bounded agents.
On the one side of the Pareto frontier one would have an even more akrasia-plagued version of the rationality community, with excellent epistemics but universally hated; on the other, the attendees of this party.
Members of effective altruism seem not to be explicitly aware of this tradeoff, or tension, between truth-seeking and effectiveness/power (maybe for power-related reasons?), or at least they don’t talk about it, even though it appears to be relevant.
In general, the thinking that has come out of LessWrong in the last couple of years strongly suggests that while (for ideal agents) there’s no such tension in individual rationality (because true beliefs are convergently instrumental), this does not hold for groups of humans (and maybe also not for groups of bounded agents in general, although some people believe that strong coordination is easy for highly capable bounded agents).
There is also the countervailing effect where having more truth leads to more power, for instance by realizing that in some particular case the EMH is false.
This sequence is spam and should be deleted.
Bad Information Hazard Disclosure for Effective Altruists (Bad I.D.E.A)
Epistemic effort: Four hours of armchair thinking, and two hours of discussion. No literature review, and the equations are intended as pointers rather than anything near conclusive.
Currently, the status quo in information sharing is that a suboptimally large number of information hazards is likely being shared. In order to decrease infohazard sharing, we have modeled out a potential system for achieving that goal. As with all issues related to information hazards, we strongly discourage unilateral action. Below you find a rough outline of such a possible system and descriptions of its downsides. We furthermore currently believe that for the described system, in the domain of biosecurity the disadvantages likely outweigh the advantages (which is why we called it Bad IDEA), while in the domain of AI capabilities research the advantages outweigh the disadvantages (due to suboptimal sharing norms such as “publishing your infohazard on arXiv”). It’s worth noting that there are potentially many more downside risks that neither author thought of.
Note: We considered using the term sociohazard/outfohazard/exfohazard, but decided against it for reasons of understandability.
Current Situation
Few incentives not to publish dangerous information
Based on previously known examples
We’d like a system to incentivize people not to publish infohazards
Model
Researcher discovers infohazard
Researcher writes up description of infohazard (longer is better)
Researcher computes cryptographic hash of infohazard
Researcher sends hash of description to IDEA
Bad IDEA stores hash
Two possibilities:
Infohazard gets published
Researcher sends in description of infohazard
Bad IDEA computes the cryptographic hash and compares the two
Bad IDEA estimates the badness of the infohazard
Researcher gets rewarded according to the reward function
Bad IDEA deletes the hash and the description of the infohazard from their database
Researcher wants intermediate payout
All of the above steps, except Bad IDEA doesn’t delete the hash from the database, but does delete the description of the infohazard
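The hash steps above amount to a commit-reveal scheme. A minimal sketch (the description string is a placeholder; SHA is a Julia standard library):

```julia
using SHA  # Julia standard library

# Commit phase: the researcher sends only the hash of the write-up.
description = "detailed write-up of the hypothetical infohazard"
commitment = bytes2hex(sha256(description))

# Reveal phase (at publication or cash-in): Bad IDEA recomputes the hash
# from the submitted description and checks it against the stored one.
verify(desc, stored) = bytes2hex(sha256(desc)) == stored
```

Any change to the description changes the hash, so the stored commitment proves priority without Bad IDEA having to hold the hazardous text itself.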
Reward Function Desiderata
We know it’ll be Goodharted, but we can at least try.
Reward function is dependent on
Danger of the infohazard (d)
Reward higher if danger is higher
Time between discovery & cashin (t)
Reward higher if time between discovery & cashin is longer
The number of people who found it (n)
Lower payout if more people found it, to discourage sharing of the idea
Reward being total_payout/discoveries? Or something that increases total_payout as a function of independent discoveries?
Latter case would make sense, since that indicates the idea is “easier” to find (or at least the counterfactual probability of discovery is higher)
Counterfactual probability of it being discovered (p)
This is really hard to estimate
If counterfactual discovery probability is high, we want to reward higher than if it’s low.
How difficult the idea is to discover
Ideas that are “very” difficult to discover would be given less than “easy” ideas. This could potentially discourage strong efforts to research new information hazards
Individual payout for a researcher then is
f(d, t, n, p) = d⋅t/n + p⋅√d
An alternative version actively punishes searching for infohazards:
f(d, t, n, p) = d⋅t/n − (1−p)⋅√d
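Read as f(d, t, n, p) = d⋅t/n ± …√d, the two variants can be written down directly (the functional forms are pointers, as stated in the epistemic-effort note, not anything conclusive):

```julia
# d = danger, t = time between discovery and cash-in,
# n = number of independent discoverers,
# p = counterfactual probability of discovery.
payout(d, t, n, p) = d * t / n + p * sqrt(d)

# Alternative: actively punishes searching for infohazards
# when the counterfactual discovery probability is low.
payout_punishing(d, t, n, p) = d * t / n - (1 - p) * sqrt(d)
```

Both reward danger and sitting on the hazard longer, and split the pot among independent discoverers; e.g. with d=4, t=2, n=2, p=0.5 the first variant pays 5 and the second pays 3.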
Advantages
Reduces the base rate at which information hazard discoveries turn into publications
Disadvantages
Incentive for people to research & create infohazards
Might be counteracted by the right reward function which incorporates counterfactual discovery probability
Gently sharing existence of IDEA with trusted actors
Bad IDEA observers might remember the infohazard
Repository for malign actors to go & recruit from
This could (unrealistically) be solved by
Real-world amnestics
AI systems trained to estimate badness
Em spurs estimating badness
Estimating the danger of an information hazard is quite difficult
Could be overcome through rough estimates of how many people would be capable of engaging with the idea to do harm + how damaging it would be
Estimating differences between ideas is difficult
Attracting attention to the concept of infohazards
Argument against longtermism:
Longtermism seems to rely on zero discount rates for the value of future lives. But per moral uncertainty, we probably have a probability distribution over discount rates. This probability distribution is very likely skewed towards positive discount rates (there are many more plausible reasons why future lives are worth less than current lives, but very few (none?) why they should be worth more, ceteris paribus).
Therefore, expected discount rate is positive, and longtermism loses some of its bite.
Possible counterarguments
Discount rates are not part of moral uncertainty but of a different kind of normativity (decision-theoretic?), over which we ought not to have uncertainty
Equally plausible reasons for positive as for negative discount rates (although I don’t know which ones?)
Complete certainty in 0 discount rate (seems way overconfident imho)
Main inspiration from the chapter on practical implications of moral uncertainty from MacAskill, Bykvist & Ord 2020. I remember them discussing very similar implications, but not this one – why?
If there is a non-trivial possibility that a zero discount rate is correct, then the case with a zero discount rate dominates expected value calculations. See https://scholar.harvard.edu/files/weitzman/files/why_far-distant_future.pdf
You’re right. I had been thinking only about the mean on the distribution over discount rates, not the number of affected beings. Thanks :-)
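Weitzman’s point from the comment above can be illustrated with a toy mixture over discount rates (the numbers are made up):

```julia
# Probability mass over annual discount rates, including some mass on r = 0.
rates = [0.0, 0.01, 0.05]
probs = [0.1, 0.6, 0.3]

# Expected discount factor applied to value T years out.
expected_factor(T) = sum(p * exp(-r * T) for (r, p) in zip(rates, probs))

# For the far future only the r = 0 branch contributes:
# expected_factor(10_000) ≈ 0.1, so far-future value is not discounted away.
```

The expected *rate* is positive, but the expected *discount factor* is dominated in the limit by the zero-rate scenario, which is why the far future still matters in expectation.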