“Partial” might work instead of “non-absolute,” but I still favor the latter even though it’s bulkier. I like that “non-absolute” points to a challenge that arises when our predictive powers are nonzero, even if they are very slim indeed. By contrast, “partial” feels more aligned with the everyday problem of reasoning under uncertainty.
One of the challenges is that “absolute cluelessness” is a precise claim: beyond some threshold of impact scale or time, we can never have any ability to predict the overall moral consequences of any action.
By contrast, the practical problem is not as precise a claim, except perhaps as a denial of “absolute cluelessness.”
After thinking about it for a while, I suggest “problem of non-absolute cluelessness.” After all, isn’t it the idea that we are not clueless about the long term future, and therefore that we have a responsibility to predict and shape it for the good, that is the source of the problem? If we were absolutely clueless, then we would not have that responsibility and would not face that problem.
So I might vote for “absolutely clueless” and “non-absolutely clueless” to describe the state of being, and the “problem of absolute cluelessness” and “problem of non-absolute cluelessness” to describe the respective philosophical problems.
This reminds me of a conversation I had with John Wentworth on LessWrong, exploring the idea that establishing a scientific field is a capital investment for efficient knowledge extraction. Also of a piece of writing I just completed there on expected value calculations, outlining some of the challenges in acting strategically to diminish our uncertainty.
One interesting thing to consider is how to control such a capital investment, once it is made. Institutions have a way of defending themselves. Decades ago, people launched the field of AI research. Now, it’s questionable whether humanity can ever gain sufficient control over it to steer toward safe AI. It seems that instead, “AI safety” had to be created as a new field, one that seeks to impose itself on the world of AI research partly from the outside.
It’s hard enough to create and grow a network of researchers. To become a researcher at all, you have to be unusually smart and independent-minded, and willing to brave the skepticism of people who don’t understand what you do even a fraction as well as you do yourself. You have to know how to plow through to an achievement that will clearly stand out to others as an accomplishment, and persuade them to keep sustaining your funding. That’s the sort of person who becomes a scientist. Anybody with those characteristics is a hot commodity.
How do you convince a whole lot of people with that sort of mindset to work toward a new goal? That might be one measure of a “good research product” for a nascent field. If it’s good enough to convince more scientists, especially more powerful scientists, that your research question is worth additional money and labor relative to whatever else they could fund or work on, you’ve succeeded. That’s an adversarial contest. After all, you have to fight to get and keep their attention, and then to persuade them. And these are some very intelligent, high-status people. They absolutely have better things to do, and they’re at least as bright as you are.
All these projects seem beneficial. I hadn’t heard of any of them, so thanks for pointing them out. It’s useful to frame this as “research on research,” in that it’s subject to the same challenges with reproducibility, and with aligning empirical data with theoretical predictions to develop a paradigm, as in any other field of science. Hence, I support the work, while being skeptical of whether such interventions will be useful and potent enough to make a positive change.
The reason I brought this up is that the conversation on improving the productivity of science seems to focus almost exclusively on problems with publishing and reproducibility, while neglecting the skill-building and internal-knowledge aspects of scientific research. Scientists seem to get a feel through their interactions with their colleagues for who is trustworthy and capable, and who is not. Without taking into account the sociology of science, it’s hard to know whether measures taken to address problems with publishing and reproducibility will be focusing on the mechanisms by which progress can best be accelerated.
Honest, hardworking academic STEM PIs seem to struggle with money and labor shortages. Why isn’t there more money flowing into academic scientific research? Why aren’t more people becoming scientists?
The lack of money in STEM academia seems to me a consequence of politics. Why is there political reluctance to fund academic science at higher levels? Is academia to blame for part of this reluctance, or is the reason purely external to academia? I don’t know the answers to these questions, but they seem important to address.
Why don’t more people strive to become academic STEM scientists? Partly, industry draws them away with better pay. Part of the fault lies in our school system, although I really don’t know what exactly we should change. And part of the fault is probably in our cultural attitudes toward STEM.
Many of the pro-reproducibility measures seem to assume that the fastest road to better science is to make more efficient use of what we already have. I would also like to see us figure out a way to produce more labor and capital in this industry. To be clear, I mean that I would like to see fewer people going into non-STEM fields—I am personally comfortable with viewing people’s decision to go into many non-STEM fields as a form of failure to achieve their potential. That failure isn’t necessarily their fault. It might be the fault of how we’ve set up our school, governance, cultural or economic system.
Indoor CO2 concentrations and cognitive function: A critical review (2020)
“In a subset of studies that meet objective criteria for strength and consistency, pure CO2 at a concentration common in indoor environments was only found to affect high-level decision-making measured by the Strategic Management Simulation battery in non-specialized populations, while lower ventilation and accumulation of indoor pollutants, including CO2, could reduce the speed of various functions but leave accuracy unaffected.”
I haven’t been especially impressed by claims that normal indoor CO2 levels are impairing cognitive function to any extent worth worrying about. Crack a window, I guess?
it could be a lot more valuable if reporting were more rigorous and transparent
Rigor and transparency are good things. What would we have to do to get more of them, and what would the tradeoffs be?
Do I understand your comment correctly that you think that in your field that the purpose of publishing is mainly to communicate to the public, and that publications are not very important for communicating within the field to other researchers or towards end users in the industry?
No, the purpose of publishing is not mainly to communicate to the public. After all, very few members of the public read scientific literature. The truth-seeking or engineering achievement the lab is aiming for is one thing. The experiments they run to get closer are another. And the descriptions of those experiments are a third thing. That third thing is what you get from the paper.
I find it useful at this early stage in my career because it helps me find labs doing work that’s of interest to me. Grantmakers and universities find them useful to decide who to give money to or who to hire. Publications show your work in a way that a letter of reference or a line on a resume just can’t. Fellow researchers find them useful to see who’s trying what approach to the phenomena of interest. Sometimes, an experiment and its writeup are so persuasive that they actually persuade somebody that the universe works differently than they’d thought.
As you read more literature and speak with more scientists, you start to develop more of a sense of skepticism and of importance. What is the paper choosing to highlight, and what is it leaving out? Is the justification for this research really compelling, or is this just a hasty grab at a publication? Should I be impressed by this result?
It would be nice for the reader if papers were a crystal-clear guide for a novice to the field. Instead, you need a decent amount of sophistication with the field to know what to make of it all. Conversations with researchers can help a lot. Read their work and then ask if you can have 20 minutes of their time; they’ll often be happy to answer your questions.
And yes, fields do seem to go down dead ends from time to time. My guess is it’s some sort of self-reinforcing selection for biased, corrupt, gullible scientists who’ve come to depend on a cycle of hype-building to get the next grant. Homophily attracts more people of the same stripe, and the field gets confused.
Tissue engineering is an example. 20-30 years ago, the scientists in that field hyped up the idea that we were chugging toward tissue-engineered solid organs. Didn’t pan out, at least not yet. And when I look at tissue engineering papers today, I fear the same thing might repeat itself. Now we have bioprinters and iPSCs to amuse ourselves with. On the other hand, maybe that’ll be enough to do the trick? Hard to know. Keep your skeptical hat on.
My experience talking with scientists and reading science in the regenerative medicine field has shifted my opinion against this critique somewhat. Published papers are not the fundamental unit of science. Most labs are 2 years ahead of whatever they’ve published. There’s a lot of knowledge within the team that is not in the papers they put out.
Developing a field is a process of investment not in creating papers, but in creating skilled workers using a new array of developing technologies and techniques. The paper is a way of stimulating conversation and a loose measure of that productivity. But just because the papers aren’t good doesn’t mean there’s no useful learning going on, or that science is progressing in a wasteful manner. It’s just less legible to the public.
For example, I read and discussed with the authors a paper on a bioprinting experiment. They produced a one centimeter cube of human tissue via extrusion bioprinting. The materials and methods aren’t rigorously controllable enough for reproducibility. They use decellularized pig hearts from the local butcher (what’s it been eating, what were its genetics, how was it raised?), and an involved manual process to process and extrude the materials.
Several scientists in the field have cautioned me against assuming that figures in published data are reproducible. Yet does that mean the field is worthless? Not at all. New bioprinting methods continue to be developed. The limits of achievement continue to expand. Humanity is developing a cadre of bioengineers who know how to work with this stuff and sometimes go on to found companies with their refined techniques.
It’s the ability to create skilled workers in new manufacturing and measurement techniques, skilled thinkers in some line of theory, that is an important product of science. Reproducibility is important, but that’s what you get after a lot of preliminary work to figure out how to work with the materials and equipment and ideas.
Looking forward to hearing about those vetting constraints! Thanks for keeping the conversation going :)
Imagine we can divide up the global economy into natural clusters. We’ll refer to each cluster as a “Global Project.” Each Global Project consists of people and their ideas, material resources, institutional governance, money, incentive structures, and perhaps other factors.
Some Global Projects seem “bad” on the whole. They might have directly harmful goals, irresponsible risk management, poor governance, or many other failings. Others seem “good” on net. This is not in terms of expected value for the world, but in terms of the intrinsic properties of the GP that will produce that value.
It might be reasonable to assume that Global Project quality is normally distributed. One point of possible difference is the center of that distribution. Are most Global Projects of bad quality, neutral, or good quality?
We might make a further assumption that the expected value of a Global Project follows a power law, such that projects of extremely low or high quality produce disproportionately more value (or more harm). Perhaps, if Q is project quality and V is value, V = Q^N. But we might disagree on the details of this power law.
One possibility is that in fact, it’s easier to destroy the world than to improve the world. We might model this with two power laws, one for Q > 0 and one for Q < 0, like so:
V = Q^3, Q >= 0
V = Q^7, Q < 0
In this case, whether or not progress is good will depend on the details of our assumptions about both the project quality distribution and the power law for expected value:
The size of N, and whether or not the power law is uniform or differs for projects of various qualities. Intuitively, “is it easier for a powerful project to improve or destroy the world, and how much easier?”
How many standard deviations away from zero the project quality distribution is centered, and in which direction. Intuitively, “are most projects good or bad, and how much?”
In this case, whether or not average expected value across many simulations of such a model is positive or negative can hinge on small alterations of the variables. For example, if we set N = 7 for bad projects and N = 3 for good projects, but we assume that the average project quality is +0.6 standard deviations from zero, then average expected value is mildly negative. At project quality +0.7 standard deviations from zero, the average expected value is mildly positive.
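To make the setup concrete, here is a minimal sketch of one way such a simulation could be written. It is not the code behind the numbers above. It assumes that each simulated world contains a single project, that quality is drawn from a normal distribution with a standard deviation of 1, and that the function names (`project_value`, `simulate_worlds`, `average_value`) are purely illustrative. Because the Q^7 tail is so heavy, the sample average is noisy and depends on the seed and the number of worlds, so the sign near the threshold may not match the figures quoted above.

```python
import random
import statistics

GOOD_EXPONENT = 3  # N for Q >= 0, as in the text
BAD_EXPONENT = 7   # N for Q < 0, as in the text


def project_value(quality: float) -> float:
    """Piecewise power law: V = Q^3 for Q >= 0, V = Q^7 for Q < 0."""
    exponent = GOOD_EXPONENT if quality >= 0 else BAD_EXPONENT
    return quality ** exponent


def simulate_worlds(center: float, n_worlds: int = 100_000, seed: int = 0) -> list[float]:
    """Return one value per simulated world.

    Assumption (not from the original comment): each world holds exactly one
    project, with quality drawn from Normal(center, 1).
    """
    rng = random.Random(seed)
    return [project_value(rng.gauss(center, 1.0)) for _ in range(n_worlds)]


def average_value(center: float, n_worlds: int = 100_000, seed: int = 0) -> float:
    """Average value across simulated worlds for a given distribution center."""
    return statistics.fmean(simulate_worlds(center, n_worlds, seed))


if __name__ == "__main__":
    # Compare the two centers discussed in the text.
    for center in (0.6, 0.7):
        worlds = simulate_worlds(center)
        print(
            f"center={center:+.2f}  "
            f"average value={statistics.fmean(worlds):.3f}  "
            f"worst world={min(worlds):.1f}"
        )
```

Printing the worst world alongside the average shows how much of the result is driven by a handful of catastrophic draws, which is the reason small shifts in the center can flip the sign.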
Here’s what an X-risk “we should slow down” perspective might look like. Each plotted point is a simulated “world.” In this case, the simulation produces negative average EV across simulated worlds.
And here is what a Progress Studies “we should speed up” perspective might look like, with positive average EV.
The joke is that it’s really hard to tell these two simulations apart. In fact, I generated the second graph by altering the center point of the project quality distribution 0.01 standard deviations to the right relative to the first graph. In both cases, a lot of the expected value is lost to a few worlds in which things go cataclysmically wrong.
One way to approach a double crux would be for adherents of the two sides to specify, in the spirit of “if it’s worth doing, it’s worth doing with made up statistics,” their assumptions about the power law and project quality distribution, then argue about that. In practice, though, I think both sides understand that we don’t have any realistic way of saying what those numbers ought to be. Since the details matter on this question, it seems to me that it would be valuable to find common ground.
For example, I’m sure that PS advocates would agree that there are some targeted risk-reduction efforts that might be good investments, along with a larger class of progress-stimulating interventions. Likewise, I’m sure that XR advocates would agree that there are some targeted tech-stimulus projects that might be X-risk “security factors.” Maybe the conversation doesn’t need to be about whether “more progress” or “less progress” is desirable, but about the technical details of how we can manage risk while stimulating growth.
Yeah, I am worried we may be talking past each other somewhat. My takeaway from the grantmaker quotes from FHI/OpenPhil was that they don’t feel they have room to grow in terms of determining the expected value of the projects they’re looking at. Very prepared to change my mind on this; I’m literally just going from the quotes in the context of the post to which they were responding.
Given that assumption (that grantmakers are already doing the best they can at determining EV of projects), then I think my three categories do carve nature at the joints. But if we abandon that assumption and assume that grantmakers could improve their evaluation process, and might discover that they’ve been neglecting to fund some high-EV projects, then that would be a useful thing for them to discover.
Your previous comment seemed to me to focus on demand and supply and note that they’ll pretty much always not be in perfect equilibrium, and say “None of those problems indicate that something is wrong”, without noting that the thing that’s wrong is animals suffering, people dying of malaria, the long-term future being at risk, etc.
In the context of the EA forum, I don’t think it’s necessary to specify that these are problems. To state it another way, there are three conditions that could exist (let’s say in a given year):
(1) Grantmakers run out of money and aren’t able to fund all high-quality EA projects.
(2) Grantmakers have extra money, and don’t have enough high-quality EA projects to spend it on.
(3) Grantmakers have exactly enough money to fund all high-quality EA projects.
None of these situations indicate that something is wrong with the definition of “high quality EA project” that grantmakers are using. In situation (1), they are blessed with an abundance of opportunities, and the bottleneck to do even more good is funding. In situation (2), they are blessed with an abundance of cash, and the bottleneck to do even more good is the supply of high-quality projects. In situation (3), they have two bottlenecks, and would need both additional cash and additional projects in order to do more good.
No matter how many problems exist in the world (suffering, death, X-risk), some bottleneck or another will always exist. So the simple fact that grantmakers happen to be in situation (2) does not indicate that they are doing something wrong, or making a mistake. It merely indicates that this is the present bottleneck they’re facing.
For the rest, I’d say that there’s a difference between “willingness to work” and “likelihood of success.” We’re interested in the reasons for EA project supply inelasticity. Why aren’t grantmakers finding high-expected-value projects when they have money to spend?
One possibility is that projects and teams to work on them aren’t motivated to do so by the monetary and non-monetary rewards on the table. Perhaps if this were addressed, we’d see an increase in supply.
An alternative possibility is that high-quality ideas/teams are rare right now, and can’t be had at any price grantmakers are willing or able to pay.
In particular, I think it implies the only relevant type of “demand” is that coming from funders etc., whereas I’d want to frame this in terms of ways the world could be improved.
My position is that “demand” is a word for “what people will pay you for.” EA exists for a couple reasons:
Some object-level problems are global externalities, and even governments face a free rider problem. Others are temporal externalities, and the present time is “free riding” on the future. Still others are problems of oppression, where morally-relevant beings are exploited in a way that exposes them to suffering. Free-rider problems by their nature do not generate enough demand for people to do high-quality work to solve them, relative to the expected utility of the work. This is the problem EA tackled in earlier times, when funding was the bottleneck.
Even when there is demand for high-quality work on these issues, supply is inelastic. Offering to pay a lot more money doesn’t generate much additional supply. This is the problem we’re exploring here.
The underlying root cause is lack of self-interested demand for work on these problems, which we are trying to subsidize to correct for the shortcoming.
I can see how you might interpret it that way. I’m rhetorically comfortable with the phrasing here in the informal context of this blog post. There’s a “You can...” implied in the positive statements here (i.e. “You can take 15 years and become a domain expert”). Sticking that into each sentence would add flab.
There is a real question about whether or not the average person (and especially the average non-native English speaker) would understand this. I’m open to argument that one should always be precisely literal in their statements online, to prioritize avoiding confusion over smoothing the prosody.
Thanks for that context, John. Given that value prop, companies might use a TB-like service under two constraints:
They are bottlenecked by having too few applicants. In this case, they have excess interviewing capacity, or more jobs than applicants. They hope that by investigating more applicants through TB, they can find someone outstanding.
Their internal headhunting process has an inferior quality distribution relative to the candidates they get through TB. In this case, they believe that TB can provide them with a better class of applicants than their own job search mechanisms can identify. In effect, they are outsourcing their headhunting for a particular job category.
Given that EA orgs seem primarily to lack specific forms of domain expertise, as well as well-defined project ideas/teams, what would an EA Triplebyte have to achieve?
They’d need to be able to interface with EA orgs and identify the specific forms of domain expertise that are required. Then they’d need to be able to go out and recruit those experts, who might never have heard of EA, and get them interested in the job. They’d be an interface to the expertise these orgs require. Push a button, get an expert.
That seems plausible. Triplebyte evokes the image of a huge recruiting service meant to fill cubicles with basically-competent programmers who are pre-screened for the in-house technical interview. Not to find unusually specific skills for particular kinds of specialist jobs, which it seems is what EA requires at this time.
That sort of headhunting job could be done by just one person. Their job would be to do a whole lot of cold-calling, getting meetings with important people, doing the legwork that EA orgs don’t have time for. Need five minutes of a Senator’s time? Looking to pull together a conference of immunologists to discuss biosafety issues from an EA perspective? That’s the sort of thing this sort of org would strive to make more convenient for EA orgs.
As they gained experience, they would also be able to help EA orgs anticipate what sort of projects the domain experts they’d depend upon would be likely to spring for. I imagine that some EA orgs must periodically come up with, say, ideas that would require some significant scientific input. Some of those ideas might be more attractive to the scientists than others. If an org like this existed, it might be able to tell those EA orgs which ones the scientists are likely to spring for.
That does seem like the kind of job that could productively exist at the intersection of EA orgs. They’d need to understand EA concepts and the relationships between institutions well enough to speak “on behalf of the movement,” while gaining a similar understanding of domains like the scientific, political, business, philanthropic, or military establishment of particular countries.
An EA diplomat.
Great thoughts, ishaan. Thanks for your contributions here. Some of these thoughts connect with MichaelA’s comments above. In general, they touch on the question of whether or not there are things we can productively discover or say about the needs of EA orgs and the capabilities of applicants that would reduce the size of the “zone of uncertainty.”
This is why I tried to convey some of the recent statements by people working at major EA orgs on what they perceive as major bottlenecks in the project pipeline and hiring process.
One key challenge is triangulation. How do we get the right information to the right person? 80000 Hours has solved a piece of this admirably, by making themselves into a go-to resource on thinking through career selection from an EA point of view.
This is a comment section on a modestly popular blog post, which will vanish from view in a few days. What would it take to get the information that people like you, MichaelA, and many others have, compile it into a continually maintained resource, and get it into the hands of the people who need it? Is that knowledge durable enough to be worth compiling, general enough to be worth broadcasting, and EA-specific enough not to be available elsewhere?
I’m primarily interested here in making statements that are durably true. In this case, I believe that EA grantmakers will always need to have a bar, and that as long as we have a compelling message, there will consequently always be some people failing to clear it who are stuck in the “zone of uncertainty.”
With this post, I’m not trying to tell them what they should do. Instead, I am trying to articulate a framework for understanding this situation, so that the inchoate frustration that might otherwise result can be (hopefully) transmuted into understanding. I’m very concerned about the people who might feel like “bycatch” of the movement, caught in a net, dragged along, distressed, and not sure what to do.
That kind of situation can produce anger at the powers that be, which is a valid emotion. However, when the “powers that be” are leaders in a small movement that the angry person actually believes in, it could be more productive to at least come to a systemic understanding of the situation that gives context to that emotion. Being in a line that doesn’t seem to be moving very fast is frustrating, but it’s a very different experience if you feel like the speed at which it’s moving is understandable given the circumstances.
Good thoughts. I think this problem decomposes into three factors:
Should there be a bar, or should all EA projects get funded in order of priority until the money runs out?
If there’s a bar, where should it be set, and why?
After the bar is set, when should grantmakers re-examine its underlying reasoning to see if it still makes sense under present circumstances?
My argument holds that we should have a bar, is agnostic on how high the bar should be, and assumes that the bar is immobile for the purposes of the reader.
At some point, I may give consideration to where and how we set the bar. I think that’s an interesting question both for grant makers and people launching projects. A healthy movement would strive for some clarity and consensus. If neophytes could more rapidly gain skill in self-evaluation relative to the standards of the “EA grantmaker’s bar,” without killing the buzz, it could help them make more confident choices about “looping out and back” or persevering within the movement.
For the purposes of this comment section, though, I’m not ready to develop my stance on it. Hope you’ll consider expanding your thoughts in a larger post!
I agree, I should have added “or a safe career/fallback option” to that.
My sense is that Triplebyte focuses on “can this person think like an engineer” and “which specific math/programming skills do they have, and how strong are they?” Then companies do a second round of interviews where they evaluate Triplebyte candidates for company culture. Triplebyte handles the general, companies handle the idiosyncratic.
It just seems to me that Triplebyte is powered by a mature industry that’s had decades of time and massive amounts of money invested into articulating its own needs and interests. Whereas I don’t think EA is old or big or wealthy enough to have a sharp sense of exactly what the stable needs are.
For a sense of scale, there are almost 4 million programmers in the USA. Triplebyte launched just 5 years ago. It took millions of people working as programmers to generate adequate demand and capacity for that service to be successful.
All in all, my guess is that what we’re missing is charismatic founder-types. The kind of people who can take one of the problems on our long lists of cause areas, turn it into a real plan, pull together funding and a team (of underutilized people), and make it go.
Figuring out how to teach that skill, or replace it with some other foundation mechanism, would of course be great. It’s necessary. Otherwise, we’re kind of just cannibalizing one highly-capable project to create another. Which is pretty much what we do when we try to attract strong outside talent and “convert” them to EA.
Part of the reason I haven’t spent more time trying to found something right off the bat is that I thought EA could benefit more if I developed a skillset in technology. But another reason is that I just don’t have the slack. I think to found something, you need significant savings and a clear sense of what to do if it fails, such that you can afford to take years of your life, potentially, without a real income.
Most neophytes don’t have that kind of slack. That’s why I especially lean on the side of “if it hurts, don’t do it.”
I don’t have any negativity toward the encouragement to try things and be audacious. At the same time, there’s a massive amount of hype and exploitative stuff in the entrepreneurship world: the “Think of the guy who wrote WinZip! He made millions of dollars, and you can do it too!” line that business gurus use to suck people into their self-help sites, YouTube channels, and so on.
The EA movement had some low-hanging fruit to pick early on. It’s obviously a huge win for us to have great resources like 80k, or significant organizations like OpenPhil. Some of these were founded by world-class experts (Peter Singer) and billionaires, but some (80k) were founded by young, audacious people not too far out of grad school. But those needs, it seems to me, are filled. The world’s pretty rich. It’s easier to address a funding shortfall or an information shortfall than to get concrete, useful direct work done.
Likewise in the business world, it’s easier to find money for a project and outline the general principles of how to run a good business, than to actually develop and successfully market a valuable new product. There’s plenty of money out there, and not a ton of obvious choices to spend it on. Silicon Valley’s looking for unicorns. We’re looking for unicorns too. There aren’t many unicorns.
I think that the “EA establishment’s” responsibility to neophytes is to tell them frankly that there’s a very high bar, it’s there for a reason, and for your own sake, don’t hurt yourself over and over by failing to clear it. Go make yourself big and strong somewhere else, then come back here and show us what you can do. Tell people it’s hard, and invite them back when they’re ready for that kind of challenge.
Triplebyte’s value proposition to its clients (the companies who pay for its services) is an improved technical interview process. They claim to offer tests that achieve three forms of value:
More predictive of success-linked technical prowess
Convenient (since companies don’t have to run the technical interviews themselves)
If there’s room for an “EA Triplebyte,” that would suggest that EA orgs have at least one of those three problems.
So it seems like your first step would be to look in-depth at the ways EA orgs assess technical research skills.
Are they looking at the same sorts of skills? Are their tests any good? Are the tests time-consuming and burdensome for EA orgs? Alternatively, do many EA orgs pass up on needed hires because they don’t have the short-term capacity to evaluate them?
Then you’d need to consider what alternative tests would be a better measurement of technical research prowess, and how to show that they are better predictive of success than present technical interviews.
It would also be important to determine the scale of the problem. Eyeballing this list, there are maybe 75 EA-related organizations. How many hires do they make per month? How often does their search fail for lack of qualified candidates? How many hours do they spend on technical interviews each time? Will you be testing not for EA-specific but for general research capacity (massively broadening your market, but also increasing the challenge of addressing all their needs)?
Finally, you’d need to roll that up into a convenient, trustworthy and reliable package that clients are excited to use instead of their current approach.
This seems like a massive amount of work, demanding a strong team, adequate funding and prior interest by EA orgs, and long-term commitment. It also sounds like it might be really valuable if done well.
Figuring out how to give the right advice to the right person is a hard challenge. That’s why I framed skilling up outside EA as being a good alternative to “banging your head against the wall indefinitely.” I think the link I added to the bottom of this post addresses the “many paths” component.
The main goal of my post, though, is to talk about why there’s a bar (hurdle rate) in the first place. And, if readers are persuaded of its necessity, to suggest what to do if you’ve become convinced that you can’t surpass it at this stage in your journey.
It would be helpful to find a test to distinguish EAs who should keep trying from those who should exit, skill up, and return later. Probably one-on-one mentorship, coupled with data on what sorts of things EA orgs look for in an applicant, and the distribution of applicant quality, would be the way to devise such a test.
A team capable of executing a high-quality project to create such a test would (if I were an EA fund) definitely be worthy of a grant!