Ingredients for creating disruptive research teams
This post tries to answer the question of what qualities make some research teams more effective than others. I was particularly interested in learning more about “disruptive” research teams, i.e. research teams that have an outsized impact on (1) the research landscape itself (e.g. by paving the way for new fields or establishing a new paradigm), and/or (2) society at large (e.g. by shaping technology or policy). However, I expect the conclusions to be somewhat relevant for all research teams.
Research seems to have become increasingly important within the effective altruism community. In the past few years, GPI was founded, FHI started growing significantly, and Open Phil is expanding its research capacity. Will MacAskill even called effective altruism a “research program”. From this perspective, we should be interested both in creating new fields of research, or at least substantially influencing existing ones, and in impacting society.
I did some of the research presented here as part of my work at the Berlin-based Effective Altruism Foundation (EAF), a research group and grantmaker dedicated to preventing suffering in the long-term future. Thanks to Jonas Vollmer, Jan Dirk Capelle, Max Daniel, and Alfredo Parra for valuable comments on an early draft of this post.
I looked at the two most comprehensive and rigorous academic studies on productive research teams I could find after a shallow review of the available literature (one literature review, Bland & Ruffin (1992), and one meta-analysis, Hülsheger, Anderson & Salgado (2009)). Unfortunately, I could not find similarly comprehensive studies of disruptive research teams in particular. I complemented this with seven case studies of research teams I picked based on my own non-systematic judgment that they have been particularly disruptive. These are the RAND Corporation, the Santa Fe Institute, the Palo Alto Research Center (PARC), Bell Labs, Skunk Works, the Los Alamos Laboratory, and the partnership of Kahneman & Tversky.
The following are my key findings based on this research:
Particularly disruptive research teams always seem to contain a significant number of excellent researchers and even those who are not brilliant are very capable. Teams seem to benefit from cognitive diversity but not demographic diversity.
Disruptive research teams seem to benefit from a purposeful vision that describes the kind of change they want to effect in the world. While more concrete goals are probably helpful, they seem difficult to set in this context.
Leaders likely have an outsized impact on how productive and disruptive a research group is. In almost all cases, relevant research expertise seems to be required for such a role. For some teams, a second administrative leadership role seems to be helpful for securing resources and managing external relationships.
Research teams seem more likely to realize their full disruptive potential if the researchers do not have to do anything but research and have easy access to all the resources they need.
Individual researchers in disruptive teams seem to thrive when given a large degree of autonomy, i.e., when they’re allowed to pursue projects and collaborations as they see fit. Instead of imposing metrics or incentives, it seems to work best to give them considerable freedom to work outside of usual incentive structures.
To facilitate internal communications outside of formal structures, teams seem to benefit from shared spaces that allow for these exchanges to occur. Establishing a shared physical space that encourages interaction seems to be most important.
Psychological safety, i.e., the feeling that voicing controversial ideas or dissent will not cause abandonment or loss of status, seems to be an important factor for making interactions between researchers particularly fruitful. It’s not clear to me how exactly this can be achieved.
Disruptive research teams seem to be fairly small, probably such that team members still know each other sufficiently well for them to feel comfortable voicing controversial ideas and dissent. I’d suspect this to be fewer than 15 people, but would not be very surprised if this number was around 100 after all.
Having and executing an impactful theory of change for how research findings translate into real-world impact seems to be important.
High-quality communication with external stakeholders seems to matter. High-intensity in-person exchange with people working on similar problems seems to be most valuable.
Once salaries reach market rate, praise, recognition, and perceiving one’s work to have an impact seem to be more relevant as rewards.
After discussing the key findings, I go on to list learnings for my own organization and review the considered evidence in more detail.
When talking about “teams”, I’m referring to a somewhat independent functional unit with distinct leadership, vision, and research direction. So such a team might well exist within a larger (research) organization. For instance, a university would not count as a research team whereas a particular lab would.
Academic literature on disruptive research teams
After some searching on Google Scholar, I found there to be little academic research on disruptive research teams, let alone comprehensive reviews. There is a recent paper by Wu, Wang & Evans which presented solid evidence that disruptive teams tend to be smaller. This is in line with my impression of similar previous research, mainly in the field of disruptive innovation. However, they don’t cite other work on disruptive research teams.
Faced with this lack of research on disruptive research teams, I broadened the scope to high-performance research groups in general. Performance is operationalized either with subjective ratings or with metrics like paper and patent counts, which are sometimes adjusted for impact. So it might still capture some of the disruptive nature I was interested in. I set out to find the most relevant meta-studies in the field to save time and focus on the most robust results. I first looked at the studies Max Dalton covers in his literature review and expanded from there by looking at the references of these papers. I also performed a few searches on Google Scholar to make sure I hadn’t missed anything. Ultimately, I settled on focusing on two studies:
Bland & Ruffin (1992): Characteristics of a Productive Research Environment. Literature Review (“BR” from now on). They looked at teams across sectors and fields. Based on a literature search for articles on productive research teams in relevant journals between 1963 and 1990, they included ~80 studies in their review. The resulting list of relevant factors was iteratively compiled from scratch (see relevant section for more details).
Hülsheger, Anderson & Salgado (2009): Team-Level Predictors of Innovation at Work. A Comprehensive Meta-Analysis Spanning Three Decades of Research (“HAS” from now on). They investigated innovation in the workplace at the team level, pre-selecting 15 variables as potential factors. Based on a literature search, they ran a meta-analysis of 104 independent studies from the 30 years before April 2007 (N = 50,096; see relevant section for more details).
While BR is fairly old, Bill Dunn from the Oxford Learning Institute claims that their findings have held up since then. HAS is more recent, but they looked at innovation for teams in general (in work contexts) instead of dedicated research teams. This makes the study less applicable for the purpose of this investigation. I still included it because it’s very comprehensive and I expect the lessons to be transferable to a significant extent.
Even after doing this work, I still don’t feel very well versed in this field of study and might well have missed something. I have the general sense that studying groups is hard. Many constructs seem fuzzy to me and it’s difficult to run experiments in this context. So this field does not seem particularly reliable to me.
Case studies of disruptive research teams
I included case studies for several reasons. The academic literature I could find was not concerned with disruptive teams in particular. With case studies, I could pick out teams whose research specifically either pioneered fields or profoundly changed society through their work. There might be differences between the very best groups and merely good groups. Meta-analyses, in particular, are at risk of overemphasizing aspects which are easy to quantify.
There are downsides to relying on case studies which the reader should be aware of:
Very small sample size, i.e. anecdotal evidence;
There is no control group, so it’s hard to tell which factors are causal, or at least independent (might be the case for academic studies as well);
It’s hard to determine and isolate the relevant factors, especially since the most important characteristics might not lend themselves to a good narrative.
My selection was not very systematic and mostly involved asking people which research groups they thought have been influential, looking into how new research fields had been pioneered, and reading books which seemed to focus on this topic. Organizing Genius by Bennis and Biederman, in particular, was a helpful starting point. This approach likely selected for teams with particularly great (and visible) societal effects. This struck me as acceptable, given that I was looking for teams with a profound influence on society. It likely excludes those with less traceable effects.
Ultimately, I ended up looking into the following seven groups: the RAND Corporation, the Santa Fe Institute, Xerox PARC, Bell Labs, Lockheed’s Skunk Works, the Los Alamos Laboratory, and the collaboration of Kahneman & Tversky.
RAND Corporation: Founded in 1948, they started out as a national security think tank. Their work directly shaped US nuclear strategy and their research pioneered fields like systems analysis and rational choice theory.
Santa Fe Institute: Founded in 1984, they’re an independent research institute which pioneered the multidisciplinary study of complex adaptive systems.
Palo Alto Research Center (PARC): Started by Xerox in 1970, they pushed personal distributed computing into the form we know it today in the course of fewer than ten years, mainly by developing the Xerox Alto. I focused on the Computer Science Lab (CSL) in particular.
Bell Labs: Bell Labs served as the research lab of AT&T during their monopoly years as the sole telephone network provider, mainly between 1925 and 1982. Among their many innovations: the transistor, information theory, the laser, Unix, C, C++. I focus on the basic science division of Bell Labs. However, I’m torn whether to count them as a single team. While the different units of the division differed substantially in their approaches and focus areas, they were still united by the vision of Bell Labs.
Skunk Works: Founded in 1943 by Kelly Johnson as the special projects division of Lockheed, they developed extremely advanced aircraft over decades that have provided the US with crucial strategic advantages (mainly via intelligence gathering and stealth).
Los Alamos Laboratory (Project Y): This secret lab was established by the Manhattan Project to design and build the first atomic bombs. They finished three bombs within just 30 months, developing two distinct designs. I’m conflicted whether to count the Los Alamos Laboratory as a single team or not. They did have a singular goal, leadership, and vision. However, the two bomb designs were developed and built by distinct teams.
Kahneman & Tversky: These two Israeli psychologists successfully collaborated over several decades and through their work challenged the rational choice model of human behavior, kickstarting behavioral economics among other paradigms.
This selection is to some extent idiosyncratic, but I think there is a good case for the inclusion of each group in terms of accomplishments. It’s more likely that I overlooked groups which deserve to be included. Other groups I considered but didn’t include are the Cowles Foundation, the Institute for Advanced Study, the MIT Lincoln Laboratory, the Cavendish Laboratory, the Institut des hautes études scientifiques, and the MIT Media Lab. The main reason for their exclusion was the lack of available sources on the inner workings of these institutions. It’s entirely possible that there are great primary sources, interviews, or similar distributed resources that one might be able to use to expand on this post.
I didn’t integrate these two strands very systematically. I drew out the relevant lessons from each part and tried to synthesize them. I indicate what resources my conclusions are based on.
In this section, I list the ingredients for disruptive research teams that I believe matter based on the research I’ve done. I don’t discuss in detail factors which strike me as irrelevant. While there might be additional ingredients which I haven’t identified, I do think I cover the most important levers. Instead, I believe that vague categories are the most likely flaw of my analysis since they might allow each reader to simply fill them with their own preconceived ideas of what they mean. I have attempted to make each item as concrete as possible, but this is very hard when aggregating evidence from diverse sources, case studies in particular. I considered adding subjective credences for each section, but I concluded that doing so would have led readers to assume more precision than is actually there, given the fuzzy nature of many of the factors.
I do not make the strong claim that all particularly disruptive research teams combine all of the ingredients I list to a very large extent. Rather, I make the weaker claim that the best disruptive research teams likely combine many of these ingredients to a significant degree. I also discuss their relative importance where I think it matters. Further, I don’t claim that a research team with these ingredients will be among the best disruptive teams. I think it’s likely that outside factors like luck and timing play a significant role.
This list is compiled from the perspective of trying to create disruptive research teams as opposed to trying to identify disruptive research teams. I believe that these tasks are subtly different, and require slightly different instruments. So whenever possible I tried to identify the underlying factors, as opposed to the surface level appearances. For instance, BR list “distinctive culture” as an important factor and all teams I studied did seem to share a special atmosphere, reminiscent of early-stage start-ups and similar to the description HAS provide for “task orientation”: They were all fairly exclusive and isolated groups, sometimes even secret, which bred the sense of being on a special and important mission. Often, they developed their own idiosyncratic terms and rituals. They all shared extremely high standards for their work, a commitment to having the best idea prevail, a joint sense of ownership for the entire project, and an openness to counterintuitive and weird ideas. The teams also seemed to have enjoyed themselves immensely despite the extremely hard work. However, I think that’s largely an epiphenomenon of all the other factors being in place. So I decided not to list it.
The factors I do list almost certainly interact with each other. Where I have a particular reason to believe that they do so in some specific way, I have tried to point this out throughout the text.
You can compare my list of ingredients to the list drawn up by BR (see relevant section for more details) and the list distilled by Bennis and Biederman in Organizing Genius (see appendix). There are significant similarities.
Particularly disruptive research teams always seem to contain a significant number of excellent researchers and even those who are not brilliant are still extraordinarily capable. Teams seem to benefit from cognitive diversity but the evidence for demographic diversity is limited.
In the case study accounts, the individual capability of the researchers was emphasized again and again. In the words of Bob Taylor (PARC): “Never hire ‘good’ people because ten good people together can’t do what a single great one can do.” Leaders of these groups sought to recruit the best people in their field and were often themselves very capable individuals. At a time when air travel was very expensive, they often flew around the country to persuade particular individuals to join their team. RAND, the Santa Fe Institute, Bell Labs, and PARC cultivated relationships with the best departments in the country and organized events to scout talent. This emphasis on hiring the best is also mentioned in BR (as part of “Concentration on recruitment and selection”). HAS did not investigate this factor, so they can’t tell us much here. I’m aware that there is some scholarly debate on whether group intelligence is mainly determined by factors governing the interaction of team members or individual intelligence. On my very shallow reading of the evidence, I could go either way. What stood out most to me is that the experimental tasks used in these studies (footnote 4) seem fairly easy and not representative of the challenges faced by disruptive research teams. So I think they do not give a lot of reason to update my prior judgment that individual intelligence matters a lot once tasks become sufficiently hard. Given this and the evidence from the case studies, I’m very confident that even perfect group interaction cannot make up for individual capability below a certain level. It’s not clear to me where this level is exactly, but I have a hard time believing it’s below the 80th percentile of Ph.D. holders in a field like physics; presumably much higher for less rigorous fields.
I have not looked in detail at the evidence on what exact capabilities make for an excellent member of disruptive research teams. Based on my impressions from reading about these groups, high general mental ability seems to matter a lot. Other qualities might be extraordinary curiosity and willingness to collaborate. Bennis claims they tend to be “deep generalists” as opposed to specialists. This fits my impression from the case studies. Often, they seem to be young and optimistic.
There is some evidence that cognitive diversity across team members matters for disruptive teams. HAS find a modest correlation between job-relevant diversity and team-level innovation (ρ=.240, 95% CI=[.044, .436]). BR seem to agree that this is a beneficial factor. I also had this impression from the Santa Fe Institute in particular. I’m uncertain with regard to the other teams though. So overall I’m not very confident in this conclusion. Demographic diversity does not seem to be very important. Neither HAS nor BR find a positive relationship. The case studies support the opposite conclusion if anything. However, one has to bear in mind the structural disadvantages faced by women and minorities at the time, in particular in the professions and environments that these groups operated in. So I’m not drawing any conclusions from this fact.
Disruptive research teams seem to benefit from a clear vision that describes the kind of change they want to effect in the world. My best guess is that it’s not enough to put bright people in a room together; they need a joint purpose, and likely they ended up in that same room because of that.
At the highest level, a shared purposeful vision gives direction to the group. Here are a few example visions from the case studies:
“Find out how to achieve and maintain US supremacy by implementing an appropriate nuclear strategy.” (RAND)
“Build the office of the future.” (PARC)
“Make the telecommunications network better or cheaper.” (Bell Labs)
Since visions tend to be very abstract, they can be complemented by more concrete goals. However, I suspect that this is likely more difficult for disruptive teams focused on making new discoveries.
The evidence strongly suggests that a clear vision catalyzes innovation. Both BR and HAS find that vision is important for productive research teams. HAS find the strongest correlation of all factors they looked at (ρ=.493, 95% CI=[.355, .631]), with the main effect on the level of the team, not the individual. Every research team I looked at also had a strong vision. Almost all of them had anecdotes about the effects of the vision on individual team members, both before and after joining the team. It also fits my intuition about effective teams in general. The Institute for Advanced Study (IAS) might be a counterexample. They literally just put the brightest minds of the generation into the same building, and they still seem to have a lot of insights. I vaguely remember the lack of shared vision being put forward as a criticism of the IAS, but I could not find the source for this.
A vision seems to serve multiple functions. First, it seems to coordinate the efforts of everybody on the team. While each member works very autonomously, the vision should serve as the yardstick for deciding which questions or projects are worthwhile. Second, it instills a sense of purpose in each team member, i.e. it should make clear to everybody why the work and “sacrifice” is worth it. Third, and relatedly, it serves as a recruitment tool, attracting those who want to make that vision a reality.
I’m uncertain what exactly makes for a compelling vision. My impression is that prosocial is better than not prosocial, concrete is better than vague, and urgency helps. I’d expect the extensive literature on setting a vision and mission from a nonprofit or business perspective to be at least somewhat helpful for answering this question.
When it comes to concrete organization-wide goals, the evidence is mixed. HAS find a modest correlation for goal interdependence, the degree to which team members have to rely on each other for achieving their goals (ρ=.276, 95% CI=[.118, .434]). BR mention this when discussing the factor “clear goals that serve a coordinating function”. The case studies give a mixed impression: Skunk Works and Los Alamos Laboratory clearly would have scored very high on this dimension. PARC, RAND, and the Santa Fe Institute might have done so for some of their projects. I find this hard to answer for Bell Labs and Kahneman & Tversky. Overall, the evidence seems to suggest to me that this type of goal-setting is beneficial when feasible but not as important as a vision, and it might simply not be possible for blue-skies research. Engineering challenges are probably particularly well suited for this. Perhaps group projects with very concrete output goals (and deadlines), e.g., drafting a comprehensive research agenda, can still capture some of these benefits.
Leadership likely has an outsized impact on how productive and disruptive a research group is. BR list leadership as an important factor, pointing out that leaders have a large influence on many other group factors, including many listed here. So to the extent that these factors matter and are a function of leadership, leadership matters. This makes a lot of intuitive sense and is echoed in the case studies. The leaders received a lot of attention and were credited to a significant extent with the founding and success of the group. I should note, however, that individual stories obviously lend themselves particularly well to the narrative style of the case study accounts. HAS did not investigate this factor, so they don’t provide any evidence either way. My conclusion that leaders exert significant influence is driven to a large extent by its intuitive plausibility, with the evidence supporting this view.
Leaders need to be able to shape the factors listed here in a positive way. According to BR, the best leaders have a research background, even if they don’t do a lot of research themselves in their leadership role. This seems to have several advantages: earning them the necessary respect from the team; allowing them to give substantive support to their research staff; utilizing their research network for recruitment and collaborations; articulating a compelling vision. This fits with the case studies I looked at. Bob Taylor at PARC was the only leader who did not have any research experience. While he still seems to have been steeped in the research, I read repeatedly that his lack of research experience did hold him back.
I’m very uncertain about other beneficial qualities or skills. Some of the case studies stressed the importance of leadership when it comes to mediating conflict. Bob Taylor at PARC apparently was very good at this. He would often encourage team members to move from what he called “Type 1 disagreements” to “Type 2 disagreements”. In the latter, each side passes the Ideological Turing Test of the other side. At the Los Alamos Laboratory, Robert Oppenheimer successfully mediated many interpersonal conflicts which were jeopardizing the success of the mission. My impression is that such conflicts might be more common in disruptive research teams because they tend to attract individuals who are disagreeable and the groups themselves tend to be high-intensity environments. The importance of this seems to depend on the specific team in question to a large extent.
For groups which are part of some larger organization, in particular one which is not obviously aligned with the goals of the group, a second administrative leader can apparently be important. Their role is to continuously defend the relative independence of the research team from the larger organization, secure resources, and fend off the creeping bureaucracy. They build connections necessary for disseminating the research. This is solely based on some of the case studies. Examples are Leslie Groves at the Manhattan Project, Jerry Elkind at PARC, and Frank Collbohm (together with Curtis LeMay) at the RAND Corporation. Overall, this strikes me as a useful division of responsibility in some cases.
It seems very, very plausible that a research team is more likely to realize its full disruptive potential if the researchers do not have to do anything but research. So it’s important that the resources necessary for any research projects are abundant and easily accessible. That includes technical and operational support staff as well as learning opportunities. Researchers should be freed from trivial or bureaucratic tasks as much as possible. That being said, all of this probably won’t transform a mediocre group into an excellent one.
This makes a lot of prima facie sense and is backed up by the evidence as well as my impression of academics complaining about having to spend time on bureaucracy, teaching, writing grant applications and reports. BR list “accessible resources” and “assertive participative governance” as important factors, the latter referring to researchers having significant say over the rules that govern them. HAS find a moderate to strong correlation for “support for innovation” which includes both material and immaterial support (ρ=.470, 95% CI=[.407, .533]). It was also a theme of the case studies I looked at: while the environment that the groups worked in was often not very comfortable, they still had everything they needed or sought out ingenious ways of getting what they needed. RAND decided to open their office 24/7 to accommodate different working schedules. The team at PARC built their own PDP-10 when Xerox prevented them from buying one. Leslie Groves worked tirelessly to get all the equipment that the team in Los Alamos needed, including expensive early computers. It’s perhaps worth noting that Feynman has criticized this kind of freedom in the context of the Institute for Advanced Study (IAS). However, I don’t give this much weight without having looked at the IAS in more detail. There is also the general notion that constraints stimulate creativity but the overall evidence seems to point in the other direction.
Autonomy & self-organization
Within the constraints of the vision (and organizational goals), individual researchers in disruptive groups seem to thrive when given a large degree of autonomy. That means there is little formal organizational structure and they’re allowed to work and collaborate in a largely self-directed manner. Instead of having metrics or incentives, it seems to work best to encourage researchers to set their own goals and to give them considerable freedom to work outside of usual incentive structures like “publish or perish” or quick commercialization.
Collaboration seems to happen organically as opposed to being imposed from the top. So researchers are usually not restricted to work on particular projects, but given the freedom to flock to the most interesting and important work. Correspondingly, the hierarchy of the team is often flat, with all researchers reporting to the leader of the team. This seems to be most easily implemented by sufficiently small teams (see section on small team size).
BR list “decentralized organization” and “assertive participative governance” as important factors. The former refers to flat hierarchies and self-organization through peer interaction. The latter refers to researchers being involved in organizational decision-making. “Decentralized organization” in particular seems to support my characterization. Out of the concepts investigated by HAS, “task orientation” comes closest to what I have in mind. Among other things, it is supposed to measure “task reflexivity[,] which refers to the process in which the team reflects upon the team’s objectives, strategies, and procedures”. However, this is only tangentially related and task orientation includes a few other constructs, so I only consider this very, very weak evidence.
At first glance, the case studies paint an ambiguous picture. The Santa Fe Institute and PARC were non-hierarchical and self-organized in the way I described above. I’m uncertain about RAND. Researchers seem to have had a lot of freedom to accept and reject assignments and could work on things they generally thought interesting. There was some department structure from what I can tell. Overall, my impression is that they had a large degree of autonomy. Bell Labs, Skunk Works, and Los Alamos Laboratory had fixed teams and hierarchies. However, Bell Labs and Los Alamos Laboratory both had over 3,000 employees, making some kind of hierarchy inevitable. Still, Bell Labs famously gave lots of freedom to its basic science teams to pursue research as they saw fit. They also encouraged cross-project collaboration from what I can tell. So viewed at the right level, Bell Labs still exemplified this attribute to some extent. Los Alamos Laboratory and Skunk Works were both engineering projects with very tangible goals that required a lot of coordination to be achieved. My best guess is that a hierarchical structure makes more sense in such cases. Flat hierarchies also seem to prevent status conflicts between members, something that Los Alamos Laboratory and Bell Labs often struggled with. One account of the Santa Fe Institute claims that one downside of this approach is that responsibility is sometimes diffused, which can lead to delays or suboptimal processes. Overall, my assessment is that disruptive research teams tend to do better with flat hierarchies and a lot of autonomy if they are sufficiently small.
Metrics, goal-setting, and incentives
I did not find much evidence on the merit of metrics and individual goal-setting. According to BR, goals are useful and best set by researchers themselves in line with the vision and in collaboration with leadership and peers. However, they also point out that excessive autonomy can be detrimental, for junior researchers in particular. They describe this as a state in which the organization and its leadership provide no organizational goals or research direction. No case study account mentioned metrics or said anything on goal-setting. It’s noteworthy, however, that PARC leadership demanded exemption from commercialization pressures for at least five years from Xerox. They argued that short-term incentives would prevent work on more disruptive innovations. Instead, they wanted the lab to be evaluated after ten years. My impression is that academic tenure serves a similar function. It exempts academics from the pressure to conform and deliver results quickly (“publish or perish”) such that they are free to work on ideas that are controversial or take a lot of time to develop. After briefly looking for more evidence, the only source on this question I could find seems to suggest that metrics or incentives are not suitable for basic research. In some cases, individual goals or milestones might still make sense though. This also summarizes my all-things-considered view fairly well.
Spaces for interaction
Despite the importance of autonomy, there is good evidence that internal communication is beneficial for productive research teams. BR list access to human resources and “frequent communication”, which includes internal and external communication, as important factors. HAS also find a moderate correlation for “internal communication” (ρ=.358, 95% CI=[.228, .488]). I got a similar impression from the case studies. Interactions seem to have been important for improving ideas, if not for having them (also see section on small team size). The teams spent a lot of time together talking about their research, also in informal settings. It’s hard to tell how helpful this really was, but it seems plausible.
I’m not sure what the right kinds of interactions are. Since collaboration and exchange are not imposed (see section on autonomy & self-organization), the right interactions are supposed to emerge organically anyway from what I can tell. So groups need to create spaces that allow for these exchanges to occur. The most basic ones are a shared physical space and shared “psychological spaces”. I’m less convinced of policies designed to force interactions but they might be helpful in some cases.
Shared physical space
It seems to be important that researchers share the same physical space. Many of the groups I looked at put emphasis on having their group in the same location, even designing the office space such that researchers were more likely to interact with one another (e.g. RAND, Bell Labs, Santa Fe Institute). BR also find that physical proximity likely plays a role. I want to note that all this evidence comes from a time when electronic communication was far less advanced. However, I do think that physical proximity has benefits that are extremely hard to recreate electronically, serendipitous meetings and exchanges for instance.
Shared “psychological spaces”
In addition to physical proximity, there are also several hints that point toward the benefits of shared “psychological spaces”, for lack of a better term. BR emphasize interactions with “conceptually close” peers, i.e., people who inhabit a similar intellectual space. HAS find that goal interdependence, which refers to the degree that team members rely on each other for achieving their goals, also correlates moderately with innovation (ρ=.276, 95% CI=[.118, .434]). They hypothesize that this interdependence leads to engagement, collaboration, and mutual feedback, which in turn stimulates idea generation. Overall, I find it plausible that interactions are more likely if researchers are intellectually or psychologically entangled in some way. However, I’d expect this to follow naturally to a large extent simply from having a shared vision.
There are also examples of concrete policies for facilitating interactions: Bell Labs had an “open door” policy, i.e., it was expected of even the most senior researchers to engage with others if they had questions or required their expertise. Gertner points to several instances where this led to ideas that otherwise would not have happened. PARC had their famous “Dealer Meeting” each Tuesday morning. Bob Taylor would start with 15 minutes of housekeeping, followed by 45 minutes led by either an internal researcher or a guest. They could use this time however they saw fit, hence the name. Usually, somebody would present an idea and invite feedback. This was the only mandatory meeting at PARC and the accounts I read claim that this provided space for fruitful idea exchange and feedback. Overall, the evidence remains anecdotal. Such policies are probably useful in some cases and, therefore, worth experimenting with.
Psychological safety
Psychological safety describes the feeling that voicing controversial ideas or dissent will not cause abandonment or loss of status. This seems to be an important factor for making interactions between researchers particularly fruitful. Unfortunately, it’s not clear to me how exactly it is created.
There are several findings that gesture in this direction. BR seem to include this in their construct of “positive group climate” which they list as an important factor. HAS investigated “participative safety”, which includes “intragroup safety” (very closely related to psychological safety) as one of two subcomponents, and find a small but significant correlation, which does not seem to generalize well across contexts though (ρ = .148, 80% Credibility Interval = [–.113, .410], 95% Confidence Interval = [.080, .216]). They don’t have data on intragroup safety specifically. HAS also investigated cohesion which they describe as “interpersonal attraction, task commitment, and group pride”. They hypothesize that this attachment to the group also contributes to the psychological safety necessary for innovation. For this construct, they find a moderate correlation with innovation (ρ=.307, 95% CI=[.179, .435]). So the academic evidence seems to support psychological safety as an important factor. This is echoed in the case studies which often described the groups as very tight-knit and marked by deep feelings of mutual respect (e.g. RAND, PARC, Kahneman & Tversky, Los Alamos Laboratory).
This open and safe space can also lead to conflict though. Some of the groups I looked at seem to have had very harsh intellectual atmospheres. Ideas would be publicly eviscerated and subjected to grueling criticism (e.g., at PARC and RAND). However, HAS seem to suggest that neither task conflict (ρ=.067, 95% CI=[–.134, .268]) nor relationship conflict (ρ=–.092, 95% CI=[–.252, .068]) correlates with innovation. So it’s not clear to me what role this kind of conflict plays and how to reconcile it with psychological safety since it seems to point in the opposite direction. One explanation could be that such conflicts are helpful in teams which exhibit deep mutual respect amongst their members and destructive in all other teams. Even if psychological safety and conflict cannot be reconciled, I find the evidence for psychological safety more compelling.
However, it’s not clear to me how this shared psychological state is created. I should note that I have not looked at independent evidence on this question. Based on the above evidence, group pride and public knowledge of everybody’s commitment to the shared vision might matter. I also find it intuitively plausible that psychological safety requires the self-esteem of the members not to be bound up with group membership. Otherwise, conflict might trigger insecurity and prevent members from speaking up. The case studies might also be instructive: At PARC, everybody was involved in the hiring process. Applicants had to give a talk in front of the entire team and hiring decisions were made near-unanimously. This might have created a filter that only people who had the respect of everybody could pass. There is probably also a role for leadership: At Los Alamos Laboratory, Oppenheimer is credited with making everybody feel like their work was vital to the success of the joint endeavor. At Skunk Works, Johnson, their leader, gave everybody who joined the sense that they had been handpicked and were the best person for the job, instilling in them a sense of excellence.
Small team size
I’m conflicted about team size. Different factors seem to push in opposite directions. I weakly believe that the ideal team size for disruptive research teams is such that team members still know each other sufficiently well for them to feel comfortable voicing controversial ideas and dissent. I’d suspect this to be less than 15 people, but would not be very surprised if this number was around 100 after all.
The evidence on this seems to be somewhat conflicting. BR find that research productivity tends to increase with the size of the group. HAS find that team size correlates positively with team innovation (ρ=.259, 95% CI=[.157, .360]) and weakly negatively with individual innovation (ρ=–.101, 95% CI=[–.253, .051]), the overall correlation being weakly positive (ρ=.172, 95% CI=[.078, .266]). However, neither looked at disruptive teams in particular. As I mentioned earlier, Wu, Wang & Evans present solid evidence that disruptive teams tend to be smaller. They find this at every level, i.e., teams of nine tend to be more disruptive than teams of ten, teams of two more so than teams of three, and individual researchers more so than teams of two. However, their notion of “team” refers to the number of co-authors on an article or on a patent, which is different from how I have conceptualized it so far. So they don’t capture the influence that a team, more broadly considered, might have on individual researchers. The other factors I have looked at in this post seem to suggest that there are benefits to interaction and exchange when done right.
I’m uncertain what conclusions to draw from the case studies. Members of PARC, RAND (in the early days), and the Santa Fe Institute (in the early days) seem to have cherished the intimate atmosphere of being part of a small team. The small size seems to allow for a higher level of intimacy which could be important for fruitful interactions. This was certainly also the case for Kahneman & Tversky. Skunk Works also tried to keep their team really small, in part to reduce bureaucratic overhead. Bell Labs and the Los Alamos Laboratory were much larger (>3,000 employees), and both managed to be incredibly innovative at the same time. However, as I mentioned before, it’s not clear that one should count these large organizations as single teams, Bell Labs in particular. Their units were much smaller, two to five people from what I could tell based on the sources that I read.
Overall, I would tentatively conclude that smaller teams tend to be more disruptive than larger teams. This is mainly driven by the Wu et al. study which I would weigh more heavily than BR and HAS, given that they’re more focused on disruption. However, I do not think that this evidence warrants the conclusion that researchers should work on their own without a team or that teams of two are better than teams of three. The other factors I have looked at seem to suggest that there are benefits to being part of the right team. However, I suspect these returns probably decrease at some point before turning negative when the team gets too big as indicated by the case studies. My rough model of this is the following: researchers are more likely to have disruptive ideas on their own, but benefit from the exchange with others up to the point where team size prevents the level of intimacy required for fruitful exchange and cumulative skepticism toward disruptive ideas prevents people from voicing or pursuing them. I don’t know where this point lies. Dunbar’s number might give an informed upper bound but I’d intuitively expect it to be much lower. My best guess is around 15, but that’s really just a wild guess.
Impactful theory of change
For research to have an impact on the world, it has to influence people outside of the team. Since this does not happen automatically, it’s important to develop a theory for how this will come about, a theory of change, and to execute it well. As Steve Jobs put it: “Real artists ship.” Having an innovative idea or product is not sufficient if you are unable to bring it to market, i.e., shipping it. I find it plausible that the theory of change acts as a force multiplier for the research, i.e., it may reduce its impact to 0, amplify its impact hundredfold (compared to other potential theories of change), or even turn its impact negative in some cases. Following this model, one should try to find the theory of change with the highest expected value.
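The force-multiplier model can be made concrete with a small expected-value calculation. The following is only a toy sketch: the two theories of change, their outcome probabilities, and their multipliers are hypothetical numbers chosen for illustration, not estimates from any of the sources or case studies.

```python
# Toy sketch of the "force multiplier" model. All names and numbers below are
# hypothetical and serve only to illustrate the arithmetic.

def expected_impact(research_value, outcomes):
    """Expected impact = research value times the expected multiplier.

    outcomes is a list of (probability, multiplier) pairs whose
    probabilities must sum to 1. Multipliers may be zero or negative.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return research_value * sum(p * m for p, m in outcomes)


# Hypothetical theories of change for the same piece of research.
theories = {
    # Mostly ignored (multiplier 0), occasionally noticed (multiplier 1).
    "publish and hope": [(0.9, 0.0), (0.1, 1.0)],
    # Often ignored, sometimes strongly amplified, sometimes backfires.
    "targeted briefings": [(0.5, 0.0), (0.4, 10.0), (0.1, -5.0)],
}

# Pick the theory of change with the highest expected impact.
best = max(theories, key=lambda name: expected_impact(1.0, theories[name]))

for name, outcomes in theories.items():
    print(name, expected_impact(1.0, outcomes))
print("highest expected value:", best)
```

In this made-up example, the riskier option wins despite its chance of a negative outcome because its upside dominates the expectation. With different probabilities the ranking could easily flip, which is why the estimates matter more than the formula.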
For research groups, a theory of change determines what change they want to bring about: changing the paradigm in their field, improving policy, helping develop a technology? It means figuring out who they need to influence to effect that change: other academics in their field, policy-makers, companies? It means figuring out how their research will have the best chance of reaching and influencing this group: publishing academic papers in journals, giving talks at conferences, networking, sending accessible summaries to the relevant individuals with an offer to meet? Gesturing vaguely toward the marketplace of ideas is not a theory of change (see below for some examples of how the groups from the case studies succeeded or failed to bring about change or read this article by Aaron Swartz).
For some situations, there exist theory of change templates. For instance, for an academic group trying to influence their field, the most prestigious journals and conferences of the field are probably their best bet. This does not require elaborate planning. I should note, however, that for disruptive research even this case might not be as straightforward since the old paradigm usually does not go quietly. For many situations, it is not as clear-cut to begin with and it probably makes sense to develop a clear theory of change. For research teams dedicated to effective altruism, this is probably even more important since they presumably care more about changing the world than other research groups.
I have a strong intuition that this matters a lot. It seems obvious to me that how disruptive a particular insight is depends a lot on who learns about it and in what way. This can differ greatly between different theories of change and their execution. The evidence is mixed. Neither BR nor HAS mention this as an important factor: HAS did not investigate it in the first place, and since BR did not start out with any hypotheses, it’s unclear why they didn’t include it. The case studies, however, paint a very clear picture. From the start, RAND wrote their reports as contractors for the US government, the military in particular. Their insights reached the right people and carried significant weight. The Santa Fe Institute purposefully set itself up as a visiting institution so that visiting scholars would spread the ideas of the SFI at their home institutions upon their return. Xerox famously failed to exploit the breakthrough technologies developed by PARC and was ultimately scooped by Apple and Microsoft. Bell Labs had an established development pipeline from basic science to development to manufacturing which translated their scientific insights into mass-produced components of the telecommunications network. Skunk Works and the Los Alamos Laboratory had clear products they were working toward on behalf of the US government. Kahneman & Tversky seem to have thought strategically about the journals in which to publish, how to frame their discoveries, and how to reach people outside their narrow field. So overall, I remain convinced that this matters.
External input and feedback
High-quality communication with external stakeholders seems to matter. Based on the case studies, high-intensity in-person exchange with people working on similar problems seems to be most valuable.
BR list communication with external stakeholders as an important factor. HAS find a moderate to strong correlation (ρ=.475, 95% CI=[.380, .570]). Unfortunately, neither makes it very clear what kind of communication at which frequency is desirable. The closest we get is BR favoring communication with peers at other schools or institutions (as opposed to colleagues from other departments at the same institution). The case studies provide more detail: PARC, for instance, regularly invited guests to host Dealer Meetings (see section on spaces for interaction). Despite being very secretive, RAND and the Los Alamos Laboratory had a roster of prominent advisors and consultants. They seem to have visited regularly to help with specific challenges. The Santa Fe Institute is a full-blown visiting institution with rotating scholars. Bell Labs’ open door policy facilitated personal communication between departments. Most of these seem to point toward short high-intensity in-person interactions with people working on similar problems. However, it’s plausible that other forms are simply not reported because they’re less noteworthy. Other forms of communication (e.g., video chat, voice memos, real-time chat) were probably also less accessible during the relevant times. In contrast to the other teams, Skunk Works and Kahneman & Tversky seem to have been very insular. I’m very uncertain about the latter though.
The evidence from the case studies broadly fits my model of why and how much communication with external stakeholders is valuable but I’m still not confident that this is the right view. According to this model, there is a balance to be struck between maintaining an atmosphere for counterintuitive ideas to thrive and receiving input and feedback from the outside (also see section on small team size). Put differently, too much exposure to outside perspectives can smother new ideas while too little might stifle creativity or leave existing ideas unconstrained or unrefined. My intuition is that regular high-intensity exchange balances this trade-off best. I also suspect that outside perspectives become less valuable to the extent that there is fast, unambiguous, real-world feedback: regardless of what other people thought, the planes developed by Skunk Works either took off or they didn’t; the atomic bombs either exploded or they didn’t; the transistors either worked or they didn’t.
There are also non-epistemic factors for why interaction with outside stakeholders might matter. Since they likely improve the network of the team, they may help increase the influence of one’s ideas (see section on impactful theory of change) and aid recruitment and collaboration.
Salary does not seem to matter much beyond market rate. Instead, immaterial rewards like praise and perceiving one’s work as impactful seem to matter more. BR support this view. This also fits with my impression from the case studies and my prior judgment. People who signed up for these teams were rarely interested in making a lot of money but strongly motivated to make a difference as part of a great group. They often describe this time as the most fulfilling of their entire lives, hinting at what HAS call “cohesion”. I’d expect non-monetary rewards to be a strong correlate of a purposeful vision (and culture), and, therefore, a dependent factor one cannot easily “manufacture”.
Learnings for the Effective Altruism Foundation
Based on the findings above, these are the most important takeaways for our research team at the Foundational Research Institute (FRI) as I see them:
We should continue to apply a high bar for hiring researchers. Since research on risks of astronomical suffering is still in its infancy, we cannot provide a lot of guidance and researchers must be capable of pursuing relevant research autonomously and with a strong degree of self-imposed rigor. We should continue to involve all of our research staff in the hiring process, especially for research positions.
Currently, we have staff who either excel at leadership or at research but nobody who combines both skill sets. We would likely benefit significantly from such an addition to our team. Since hiring somebody like that strikes me as particularly difficult, we should probably focus on existing staff gaining more experience in one domain or the other. This has made me more confident that it was a good idea for several researchers at FRI to seek external mentorship by collaborating with other researchers, visiting other EA research organizations, and enrolling in Ph.D. programs.
We should continue to provide our research staff with as much freedom and operational support as possible. I have also become slightly less convinced that seeking academic positions or affiliations short of tenure is a good idea because the academic system seems particularly ill-suited for disruptive research, given the prevailing incentive structure and amount of obligations outside of research. Since this mainly confirms a view I already held before, it is not a strong update.
Currently, many of our researchers work remotely, which seems to have higher costs than I previously thought. As a consequence, I have become more convinced that we should try to create a research office geared toward the needs of our research staff. This would make it more likely that they are willing to relocate and reap the benefits from a shared physical space.
We should invest more time into creating psychological safety for our research staff. I’m not yet sure how to best proceed here. This might involve some shallow research into the academic literature on this topic, asking other research groups about their experiences with this, speaking to our researchers about their needs in this area, or experimenting with different formats for retreats.
It was worth it to invest time into developing a theory of change, i.e., thinking about how exactly our research would lead to real-world changes when it comes to AI designs and deployment. We should consider refining our existing work in this area. I might write a dissemination framework for our research to make sure our research insights reach the right audience. This would likely inform further steps as well.
Organizing research workshops with other organizations focused on similar questions is worth it. We should also look into other formats of high-intensity in-person interaction.
In-depth look at the evidence
Bland & Ruffin (1992): Characteristics of a Productive Research Environment. Literature Review
They set out to answer the question: What environment makes research groups produce more work of higher quality? So the study is focused on group level attributes, not the characteristics of individual researchers. It’s not clear what definition of “research group” they used. They included groups from different research fields and different sectors (e.g. industry and academia).
They conducted a literature search for articles in relevant journals between 1963 and 1990 and only included sources which they rated as “good” or “excellent” based on their methodology. It’s not clear how many they excluded, but it seems like they ultimately looked at ~80 studies.
Since it’s a literature review and not a meta-analysis, they did not synthesize the studies quantitatively, but reviewed the sources and iteratively compiled a list of relevant factors. They did not start out with a list of hypothesized factors but created one from scratch.
They identify 12 group-level characteristics of high-performance research groups. Below I provide a summary:
1. Clear goals that serve a coordinating function
Organizational goals make it clear how each person can contribute with their work. They provide necessary constraints within which the individuals can autonomously and creatively pursue what they believe to be the most important questions. Personal goals, in turn, are set with organizational goals in mind and in collaboration with the leader and peers. Excessive autonomy (lack of organizational goals and collaborative goal-setting) is particularly bad for new researchers and requires strong personal motivation for researchers to maintain high productivity.
2. Research emphasis
The organization that the group is embedded in should value research.
3. Distinctive culture
In the article, it’s left vague what they mean by this. It seems to be tied up with a shared sense of purpose or mission and values, e.g. academic freedom. They emphasize that such a culture has to be proactively maintained through leadership, role modeling, and rituals. It’s probably hard to generalize since the group’s culture should be distinctive.
4. Positive group climate
This seems to be characterized by open discussions of disagreements and openness to new ideas which are debated on merit as opposed to status. They hint at mutual respect, positive relationships, and supportive behavior as requirements for that.
5. Assertive participative governance
In the article, it’s left vague what they mean by this, but they emphasize that this seems to be a particularly consistent finding. After consulting some more sources, this seems to refer to researchers being involved in decision-making on goals and policies, being allowed to self-organize to a significant extent, and encouraging new ideas and suggestions on governance.
6. Decentralized organization
There should be very few levels of hierarchy. Coordination should be achieved through clear organizational goals, participative governance, and peer interactions.
7. Frequent communication
This refers to peer-to-peer interaction within the research group and with researchers at other groups. They describe multiple modes of exchange (including informal ones), but the focus should be on the research. One study reports that the most productive researchers dedicate between 25% and 40% of their time to communication.
8. Accessible resources, particularly human
Human resources are most important. This applies in particular to able, productive, and “conceptually close” peers. They can give useful feedback, suggest ideas, and set useful norms. Easy access is best maintained via physical proximity. Sufficient support staff and funding are also important so that the research staff can focus on their core work. Interestingly, they report that the perception of accessible resources matters more than actually accessible resources.
9. Sufficient size, age, and diversity of the research group
Productivity tends to increase with the size of the group. Cognitive/disciplinary diversity seems to be a positive factor.
10. Appropriate rewards
Once salaries reach market rate, praise, recognition, and perceiving one’s work to have an impact seem to be more relevant reward factors which encourage productivity.
11. Concentration on recruitment and selection
In productive groups, everybody invests time into recruitment and hiring (e.g. suggesting potential hires, investing time into vetting). They focus on hiring the best in the field.
12. Leadership with research expertise and skill in both initiating appropriate organizational structure and using participatory management practices
They report that the leader should be (perceived as) a highly skilled scientist. That makes it easier for them to support their staff, facilitate exchange with other groups, and recruit from their network. It also fosters the necessary respect from the group that makes leadership possible in the first place. Leadership is probably a particularly critical factor as its quality influences all other factors listed.
Strategies for establishing and maintaining these characteristics
They suggest several policies/actions:
Housing group members in close proximity;
Providing mechanisms for frequent communication within and between groups;
Selecting accomplished researchers as leaders;
Training people in participative governance, since this does not come easy and people often misjudge the extent to which they practice this;
Proactively attending to and monitoring the “soft” aspects;
Calling on active and successful researchers to help with recruitment.
Hülsheger, Anderson & Salgado (2009): Team-Level Predictors of Innovation at Work. A Comprehensive Meta-Analysis Spanning Three Decades of Research
They investigated innovation in the workplace at the team level. They understand innovation to refer to idea generation as well as subsequent implementation. Innovation was investigated both at the level of the individual as well as the team. Measurements of innovation they included are self-assessment, independent assessment (e.g. supervisor rating, peer rating, expert rating), and objective measurement (e.g. patent count). They pre-selected 15 team-level variables as possible contributors to innovation (see table below). These are almost exclusively assessed via self-report. They did not consider variables at the level of the individual. They acknowledge the potential for bias when it comes to self-report.
After a literature search revealed ~500 potentially relevant studies looking at these variables, they included 104 independent studies spanning 30 years, all from before April 2007 (N = 50,096). Then they conducted a quantitative meta-analysis.
They report the following for each variable: k = number of studies; N = total sample size for all studies combined, based on number of individual participants for all analyses except for team innovation, where it is based on number of teams; r = sample size weighted average observed correlation; S²r = sample size weighted observed variance of correlations; ρ = average corrected correlation (corrected for sampling and measurement error in the predictor and criterion); SDρ = standard deviation of ρ; % VE = variance accounted for by artifacts; 80% CV = 10% lower and 90% upper limits of 80% credibility interval; 95% CI = 2.5% lower and 97.5% upper limits of 95% confidence interval.
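To make the difference between the two intervals more tangible, here is a minimal Python sketch. It assumes the usual convention for this kind of meta-analysis, namely that the 80% credibility interval is ρ ± z·SDρ and the 95% confidence interval is ρ ± z·SE under a normal approximation. I have not verified that HAS computed their intervals exactly this way, so the backed-out values are rough. The function names are my own, and the three example rows use figures quoted in this post.

```python
from statistics import NormalDist

# (rho, 80% credibility interval, 95% confidence interval) as quoted in this post.
REPORTED = {
    "support for innovation": (0.470, (0.222, 0.717), (0.407, 0.533)),
    "team size": (0.172, (-0.101, 0.444), (0.078, 0.266)),
    "task conflict": (0.067, (-0.394, 0.527), (-0.134, 0.268)),
}

# Two-sided normal quantiles (assumption: intervals are rho +/- z * spread).
Z80 = NormalDist().inv_cdf(0.90)   # roughly 1.28
Z95 = NormalDist().inv_cdf(0.975)  # roughly 1.96


def implied_sd_rho(cv):
    """Back out SD_rho from an 80% credibility interval."""
    low, high = cv
    return (high - low) / (2 * Z80)


def implied_se(ci):
    """Back out the standard error of mean rho from a 95% confidence interval."""
    low, high = ci
    return (high - low) / (2 * Z95)


def excludes_zero(interval):
    low, high = interval
    return low > 0 or high < 0


def describe(rho, cv, ci):
    return {
        "rho": rho,
        "sd_rho": round(implied_sd_rho(cv), 3),
        "se": round(implied_se(ci), 3),
        # Confidence interval: how precisely the average correlation is estimated.
        "significant": excludes_zero(ci),
        # Credibility interval: how much the true correlation varies across settings.
        "generalizes": excludes_zero(cv),
    }


for name, (rho, cv, ci) in REPORTED.items():
    print(name, describe(rho, cv, ci))
```

Under these assumptions, team size comes out as significant but not generalizing: the average correlation is estimated to be positive, but the spread across studies is wide enough that it could plausibly be zero or negative in some settings.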
Vision
ρ = .493, 80% CV = [.130, .856], 95% CI = [.355, .631] (k = 17, N = 4,638)
This construct refers to a common understanding of team goals and a high commitment to these objectives.
It seems to have a considerably larger effect on innovation at the team level than the individual level.
External communication
ρ = .475, 80% CV = [.326, .624], 95% CI = [.380, .570] (k = 7, N = 2,719)
This construct seems to refer both to communication with members of the same organization but a different team, as well as with people outside the organization.
It’s not clear to me which type of behavior is being rated as particularly positive.
Support for innovation
ρ = .470, 80% CV = [.222, .717], 95% CI = [.407, .533] (k = 39, N = 15,604)
This construct refers to material as well as immaterial support for innovation. Material: equipment, funding, support staff. Immaterial: recognition, norms, openness to new ideas. Unfortunately, the findings do not differentiate between different types of support.
It seems to have a considerably larger effect on innovation at the team level than the individual level.
Task orientation
ρ = .415, 80% CV = [.050, .780], 95% CI = [.280, .550] (k = 18, N = 4,688)
To describe this construct, it’s worth quoting the relevant section of the paper in full here: “Task orientation, which has also been called climate for excellence, describes “a shared concern with excellence of quality of task performance in relation to shared vision or outcomes” (West, 1990, p. 313). Teams high on this dimension are striving for the highest standards of performance achievable. This is evidenced by mutual monitoring and feedback and by regular appraisals of ideas and performance. Task orientation subsumes the subconstruct task reflexivity, which refers to the process in which the team reflects upon the team’s objectives, strategies, and procedures, and evaluates each other’s work to improve team effectiveness and outcomes. This in turn is meant to lead to the exploration of opposing opinions and the consideration of alternatives and thereby to improve the quality of decisions and ideas. In a similar vein, the theory of team adaptation considers mutual performance monitoring and feedback, as well as critical reflection on team goals, to be important functions of plan execution, which is one of the processes involved in adaptive and innovative team performance. Further, Shalley (2002) pointed out that task orientation is equivalent to intrinsic motivation, which has been stressed as a prerequisite for creativity at the individual level of analysis.”
It seems to have a considerably larger effect on innovation at the team level than the individual level.
Internal communication
ρ = .358, 80% CV = [.064, .652], 95% CI = [.228, .488] (k = 13, N = 3,356)
This construct refers to “regular, high-quality communication”. However, ultimately this merely reflects a high rating of this variable by team members, and it’s not clear what led them to give it.
Cohesion
ρ = .307, 80% CV = [.042, .573], 95% CI = [.179, .435] (k = 11, N = 3,588)
“Cohesion refers to the commitment of team members to their work team and their desire to maintain group membership.” This is supposed to create the required psychological safety which allows individuals to take the risk of suggesting new ideas.
ρ = .276, 80% CV = [.070, .482], 95% CI = [.118, .434] (k = 5, N = 1,174)
This refers to the degree to which team members have to rely on others for achieving their goals.
ρ = .172, 80% CV = [–.101, .444], 95% CI = [.078, .266] (k = 28, N = 1,835)
While the correlation is significant, this finding does not seem to generalize well across contexts.
Looking at team-level and individual-level innovation separately, they find a more positive relationship with team innovation ([.157, .360]), reaching the generalizability threshold they had set, and a slightly negative correlation with individual innovation.
ρ = .155, 80% CV = [–.220, .530], 95% CI = [.004, .306] (k = 15, N = 5,243)
This construct refers to the diversity of team members on job-relevant attributes, “such as function, profession, education, tenure, knowledge, skills, or expertise”. This is supposed to track diversity of cognitive resources.
While the correlation is significant, this finding does not seem to generalize well across contexts. It shows a higher correlation (.240) for team-level innovation alone.
ρ = .148, 80% CV = [–.113, .410], 95% CI = [.080, .216] (k = 37, N = 23,146)
This construct subsumes two separate components: participation in decision making and intragroup safety. The former is self-explanatory. The latter refers to a perceived atmosphere of trust and mutual support within the team.
While the correlation is significant, this finding does not seem to generalize well across contexts. The findings don’t differentiate between the two subcomponents.
ρ = .067, 80% CV = [–.394, .527], 95% CI = [–.134, .268] (k = 13, N = 2,841)
This construct refers to task-related disagreements such as substantive differences in viewpoints, ideas, and opinions.
The correlation is not significant.
ρ = .040, 80% CV = [–.193, .274], 95% CI = [–.157, .237] (k = 4, N = 977)
This construct refers to the degree to which team members have to rely on others for carrying out their tasks. I’m uncertain how exactly this differs from goal interdependence.
The correlation is not significant.
ρ = .020, 80% CV = [–.309, .349], 95% CI = [–.143, .183] (k = 10, N = 4,262)
The correlation is not significant.
ρ = –.092, 80% CV = [–.325, .141], 95% CI = [–.252, .068] (k = 6, N = 1,304)
This construct refers to conflict stemming from interpersonal disagreements, causing negative emotions.
The correlation is not significant.
ρ = –.133, 80% CV = [–.468, .203], 95% CI = [–.318, .052] (k = 8, N = 3,634)
This construct refers to the diversity of team members on non-task-related attributes, “such as age, gender, or ethnicity”.
The correlation is not significant.
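The labels attached to the estimates above ("significant", "does not seem to generalize well", "not significant") follow a simple decision rule, which can be sketched in Python. This is only an illustration under two assumptions: that significance is read off from whether the 95% confidence interval excludes zero, and that generalizability is read off from whether the 80% credibility interval excludes zero (the latter is the rule quoted from the paper in the footnotes). The names `Estimate`, `excludes_zero`, and `classify` are hypothetical; the numbers are copied from three of the estimates listed above.

```python
from typing import NamedTuple, Tuple


class Estimate(NamedTuple):
    """One meta-analytic estimate as reported in the post."""
    rho: float                  # mean corrected correlation
    cv80: Tuple[float, float]   # 80% credibility interval
    ci95: Tuple[float, float]   # 95% confidence interval


def excludes_zero(interval: Tuple[float, float]) -> bool:
    """True if the interval lies entirely above or entirely below zero."""
    low, high = interval
    return low > 0 or high < 0


def classify(est: Estimate) -> str:
    """Apply the assumed reading rule: CI for significance, CV for generalizability."""
    if not excludes_zero(est.ci95):
        return "not significant"
    if excludes_zero(est.cv80):
        return "significant and generalizes"
    return "significant, but does not generalize well"


# Three estimates copied from the list above (values as reported in the post).
ESTIMATES = [
    Estimate(0.358, (0.064, 0.652), (0.228, 0.488)),
    Estimate(0.172, (-0.101, 0.444), (0.078, 0.266)),
    Estimate(0.067, (-0.394, 0.527), (-0.134, 0.268)),
]

for est in ESTIMATES:
    print(f"rho = {est.rho:+.3f}: {classify(est)}")
```

Keeping the two checks separate mirrors the distinction drawn in the text: a correlation can be reliably different from zero on average while still varying enough across settings that it should not be assumed to hold in any particular one.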
RAND Corporation
Main source: Abella: Soldiers of Reason
Brief summary: RAND was founded in 1948 as a contract research organization for the US Air Force. Since then they have developed very close ties to the entire US national security infrastructure. Their analyses and reports have significantly shaped US defense policy during the Cold War and the Vietnam War. By now their research portfolio also includes areas unrelated to national security and they provide services for non-government organizations. Most information is from the first three decades of RAND.
What they achieved: RAND had a profound influence on the early nuclear strategy of the US and made large academic contributions as a byproduct of their work. However, the ultimate value of this work is contested. Abella argues that they promoted a callous attitude toward civilian casualties, had a misguided view of the Soviet Union, and produced analyses that were ignorant of human factors. I don’t have the expertise to evaluate the ultimate merit of Abella’s claims. RAND’s inclusion here is warranted on the basis of its large influence.
They pioneered the field of nuclear strategy and had an enormous influence on the US strategy during the early years of the Cold War. From what I can tell they still exert considerable influence on national security matters.
They developed systems analysis and made large contributions to rational choice theory. Arrow discovered his impossibility theorem through his work at RAND. All major early game theorists (e.g. von Neumann, Nash, Schelling) worked with RAND at one point or another, further developing the field. RAND also seems to have made important contributions in Artificial Intelligence, mainly by Newell, Shaw, and Simon (Minsky from MIT was a RAND consultant). They also seem to have pushed the frontiers of early computing.
Thirty-two recipients of the Nobel Prize, primarily in the fields of economics and physics, have been associated with RAND at some point in their career (including as advisors and contractors).
What I learned about disruptive research teams:
Early RAND had a clear vision, fueled by a shared and fairly unquestioned ideology: find out how to achieve and maintain US supremacy via implementing an appropriate nuclear strategy.
They tried hard to find and involve the best scientists available. In the beginning, they hosted conferences and summer schools in the relevant fields to find and attract talented researchers. They hired across a vast range of disciplines.
The secret nature of their work reinforced a very strong degree of confidence and “specialness” in this already tight-knit group of young and brilliant scientists.
The role as a government contractor provided a very clear path to enormous impact for any output, despite very few publications.
Researchers at RAND displayed a lot of independent thinking and the intellectual atmosphere was fairly harsh. Abella describes so-called “murder board” meetings to try to poke holes in new ideas and projects. Still, work was collaborative and documents were shared liberally for comments.
RAND offices were designed to force researchers to interact (though the source didn’t say how). They also regularly met outside of work in informal settings.
RAND offices were kept open day and night to accommodate different schedules and long hours.
From what I’ve read, it’s unclear to me what role leadership at RAND played. The relationship between General Curtis LeMay and Frank Collbohm, Founder and 20-year president of RAND, was certainly crucial for its founding. It also secured RAND’s privileged position and access. I had a much harder time judging whether there was significant research leadership. As far as I can tell there was no single research visionary. Still, people like John Davis Williams, Albert Wohlstetter, and Herman Kahn certainly shaped the research practices and priorities to a significant extent. It appears they did so mainly by force of personality and perceived excellence, even without being formally in charge.
Santa Fe Institute
Brief summary: The Santa Fe Institute (SFI) is an independent, nonprofit theoretical research institute dedicated to the multidisciplinary study of complex adaptive systems, including physical, computational, biological, and social systems. It was founded in 1984 by George Cowan and a number of scientists also affiliated with the Los Alamos National Laboratory (which had grown out of the Manhattan Project). All of them shared skepticism about the reductionism present in many scientific disciplines at the time. It has very few permanent research staff but hosts a great number of visiting scholars, workshops, and summer schools. I looked mainly at the first few years of work at SFI.
What they achieved: The SFI pioneered the study of complex adaptive systems which has since evolved into a major scientific field. The two accounts I read agree that this would have happened regardless, but that the SFI sped up this development significantly. Individual achievements of the SFI are harder to isolate and attribute since it relies so heavily on visiting scholars, and its main contribution is arguably fusing similar ideas from different individuals into a field of inquiry.
What I learned about disruptive research teams:
While they do not have a very narrow vision, their research focus is still very clear.
Cowan, the main driver behind the founding of the SFI: “You have to persuade very good people that this is an important thing to do. And by the way, I’m not talking about a democracy. I’m talking about the top one-half of one percent. An elite.” The initial group of founders included several Nobel laureates (Murray Gell-Mann, Kenneth Arrow, Phil Anderson). Waldrop gives the following reasons for this focus on top talent: (1) “Big names” were important to legitimize an early and controversial field. (2) Interdisciplinary work requires high raw intelligence and scientific rigor. (3) Work in a new field requires the ability to ask good questions without much guidance (what effective altruists tend to refer to as “disentanglement”). In addition, they require researchers to be open-minded and willing to collaborate.
The SFI does not have any formal hierarchy. Research groups and projects are entirely self-organized. As a drawback, decision-making is sometimes delayed or relevant individuals are left out of the process because nobody feels responsible.
The offices are deliberately designed to facilitate interaction between researchers (e.g. meeting areas are large, inviting, and furnished with movable couches, tables, and chairs). They assign each researcher to an office with somebody from a different discipline or project to facilitate learning, exchange, and the generation of new ideas.
Having many visiting scholars seems to have two main effects: (1) there is a larger pool of ideas, methods, and questions to draw from, which prevents one clear research agenda from forming (which has upsides and downsides); (2) visitors spread ideas from the SFI to their home institutions.
There is some indication that there’s a special atmosphere associated with knowing every researcher present at the institute because it creates the required intimacy and psychological safety to voice ideas freely.
Palo Alto Research Center (PARC) at Xerox
Brief summary: The Palo Alto Research Center (PARC) is a research group founded by Xerox in 1970 to bring the company into the emerging computing market. Its main contributions to the field were made during the 1970s, after which many of its most important scientists moved on to other companies. During that time it had three labs, the two most prominent being the Computer Science Lab (de facto led mainly by Bob Taylor) and the System Science Lab (where Alan Kay provided the main vision). This is also the time I focused on.
What they achieved: They fully developed personal distributed computing as we know it today, mainly in the form of the Xerox Alto (including computer-generated bitmap graphics and the WYSIWYG text editor Bravo), many aspects of the modern graphical user interface (GUI), and Ethernet. They also pioneered laser printing and popularized object-oriented programming via Smalltalk. It’s worth noting that a number of relevant ideas and technologies already existed, but PARC managed to fuse them together into a coherent whole. In the end, Xerox and PARC failed to bring a lot of these innovations to market (with the exception of the laser printer).
What I learned about disruptive research teams (Dominic Cummings also has a post about this and ARPA-style funding):
Note: Most of this is based on the Computer Science Lab at PARC.
At PARC, both the Computer Science Lab (CSL) and the System Science Lab (SSL), the so-called Learning Research Group (LRG) in particular, had a strong vision articulated by the respective de facto leaders (Bob Taylor and Alan Kay). One was “the offices of the future”, basically distributed personal computing, and the other one was “developing computing into something which kids can use” (I’m paraphrasing here). Taylor and Kay didn’t impose these visions but hired people who (broadly) shared them, or could at least be inspired to share them. While not everybody was fully on board in the beginning, the leaders continually communicated and explained their vision (e.g. by sharing relevant literature), winning everybody over eventually.
They only hired the best. Taylor famously said: “Never hire ‘good’ people because ten good people together can’t do what a single great one can do.” He leveraged his network from his time as a funder of computer research at ARPA. Everybody on the team was involved in hiring. Team members could suggest people to consider for hiring. Applicants had to give a presentation in front of the entire team, after which they got subjected to grueling Q&A. Everybody had a say in hiring decisions which had to be made near-unanimously. Therefore, the recruitment process selected for people everybody got along with and wanted to collaborate with. Collaboration was the norm.
It’s possible to have a leader who is not themselves an excellent researcher (Taylor). He saw his role mainly as recruiting, managing conflict, making disagreement productive by expecting people to be able to pass the Ideological Turing Test for the other side, and protecting the project from encroaching bureaucracy. He managed “in and down”. He seems to have been bad at managing “up and out”, i.e. handling relationships within Xerox or with external partners. It’s worth noting that many people from PARC are still a bit stumped as to how exactly Taylor achieved this. What’s clear is that he enjoyed immense respect from everybody on his team.
They used what they developed. This seems hard in many cases, but it was a feature of a lot of early computing research.
Xerox provided an abundance of funding and resources. They also gave them a lot of autonomy and leeway. When Xerox didn’t allow them to buy a PDP-10 from a competitor, the people at CSL just built a clone themselves.
CSL had a flat hierarchy. Everybody reported directly to Taylor. This may have prevented unhealthy status dynamics among people with a lot of ego. Relatedly, researchers were given a lot of autonomy (within the constraints of the vision) and they were largely self-organized. They could choose which project to work on such that people flocked to the most interesting projects.
Only the so-called Dealer Meeting was mandatory at CSL. This was a weekly 60-minute all-hands in-person meeting. The first 15 minutes were dedicated to housekeeping by Taylor. The remaining 45 minutes were dedicated to a presentation by a team member or guest. They had full control over how to use that time (hence the name). The intellectual atmosphere seems to have been fairly aggressive. For instance, it was common (and accepted) that some researchers would shout “bullshit” and lecture the speaker on why they were wrong.
“Real artists ship.” (Steve Jobs), i.e. true innovators should not stop with an idea, but make sure that their idea has a tangible impact in the real world. In contrast, leadership at CSL, at PARC, and at Xerox failed to capitalize on what they had developed. To a significant extent, this seems to have been due to a culture of arrogance, lack of a good connector between PARC and Xerox, as well as lack of expertise at Xerox when it came to computing.
They picked their location in Palo Alto strategically based on where they thought the greatest computer scientists at the time would want to live.
Waldrop argues that part of the reason that PARC was successful was the timing: it brought together a group of great people who had grown up in an extremely fertile computing culture and community (fostered mainly by ARPA grants). Kay: “In the history of art, the most powerful work in any genre is done not by the adults who invent it, but by the first generation who grow up in it. Think of perspective painting during the Renaissance. Well, we were that generation. We were the kids who’d had our PhDs paid for by ARPA.”
Bell Labs
Brief summary: Bell Labs served as the research lab of AT&T during their monopoly years as the sole telephone network provider, mainly between 1925 and 1982. They were charged with developing new technologies to improve the (tele)communications network operated by AT&T. From the beginning, they had a strong commitment to basic research.
What they achieved: Research conducted at Bell Labs can be rightfully credited with ushering in the information age as we know it today. Similar developments would probably have been made in any case, but my impression is that Bell Labs did speed up these innovations considerably. They did pioneering work in solid-state physics, culminating in the development of the transistor, a crucial technology for digital computing. Shannon developed information theory while at Bell Labs and did groundbreaking work on cryptography. They also developed radio astronomy, the laser, and the charge-coupled device (CCD), contributed to fiber-optic technology, and built the first photovoltaic cells. They contributed to early computing: designing the Unix operating system and the programming languages C, C++, and S. They are responsible for sending the first telecommunications satellite into orbit and developing cellular telephony. Nine Nobel Prizes have been awarded for work completed at Bell Laboratories. The Turing Award (also known as the Nobel Prize of computing) has been won three times by Bell Labs researchers.
What I learned about disruptive research teams:
They had a very clear vision: making the telecommunications network better or cheaper.
They aggressively tried to hire the best scientists from university departments around the country, e.g. by regularly sending recruiters to campuses. One manager even wrote once per year to a scientist he wanted to hire, asking him to join Bell Labs. He eventually succeeded.
Within the constraints of this vision, they gave researchers (doing basic science) extraordinary amounts of freedom. They could pursue their research autonomously and without many managerial directives.
They had a clear development pipeline: the basic research division was tasked with blue skies research in potentially relevant areas. The applied research division took insights from basic research inside and outside of Bell Labs and tried to work out how to make them work for AT&T. The development division took those ideas and turned them into products which could be mass-produced. The manufacturing division manufactured.
Bell Labs placed great emphasis on the exchange of ideas, deliberately designing their building to force researchers to interact with one another. Many inventions and technologies were the results of collaboration.
They also prized learning: there were study groups and seminars to enable researchers to learn relevant material. There was also an explicit open door policy, i.e. it was encouraged to approach other researchers with questions and ideas if they had relevant expertise.
Skunk Works (Lockheed)
Brief summary: Skunk Works is the informal name of the special project division of Lockheed Martin. It was founded in 1943 by Kelly Johnson, who later handed over the reins to Ben Rich. It operates as a contractor to the US Air Force and the CIA.
What they achieved: Over many decades they developed extremely advanced and original military aircraft that provided the US with crucial strategic advantages during the Cold War, primarily via improved observation capabilities. They’re famous for extremely lean and fast development and production. In 1943, they built America’s first jet-propelled military aircraft, the P-80 Shooting Star. In 1955, they built the U-2 spy plane for the CIA, intended to conduct observation missions over the Soviet Union. It was able to fly at altitudes which made it impossible to reach by Soviet interception fighter planes. Its overflights led to the discovery of missiles in Cuba and were responsible for dispelling both the “bomber gap” myth and the “missile gap” myth. After U-2 missions became impossible, the Skunk Works designed the SR-71 Blackbird during the 1960s, a Mach-3+ aircraft that flew at such high speeds and altitudes that it was again impossible to intercept. The engineering challenges for this plane were considerable. In 1976/77 they developed the first stealth aircraft, which seems to have provided a considerable strategic advantage during the first Gulf War.
What I learned about disruptive research teams:
Skunk Works employees were recruited from the best Lockheed engineers. Johnson kept the team very small but instilled a sense of excellence in every new employee who then tried to live up to this expectation.
Johnson, in particular, is described as a visionary and great leader who was an excellent engineer with extremely high standards and strong convictions. He tried hard to fight both Lockheed and military bureaucracy and didn’t require internal reports except for very high-stakes matters.
The contract nature of their work provided a clear and impactful theory of change for their innovations. Good relationships with external stakeholders were crucial for that.
The secrecy of their work bred a sense of brotherhood.
They worked in very close quarters: there was little separation between designers, engineers, and builders. They tried to involve each division during the whole process to create useful feedback loops.
The fast and unambiguous feedback of engineering considerably speeds up research.
Los Alamos Laboratory (Project Y)
Rhodes: The Making of the Atomic Bomb (Part 2 & 3)
Brief summary: The Los Alamos Laboratory, also known as Project Y, was a secret laboratory located in Los Alamos, New Mexico. It was established by the Manhattan Project during World War II to design and build the first atomic bombs. I focused on the time of Robert Oppenheimer’s tenure as its first director, from 1943 to December 1945.
What they achieved: They built the first atomic bombs within just 30 months. During this time, they developed two functional and distinct bomb designs: (1) “Little Boy” was a gun-type fission weapon using enriched uranium. (2) “Fat Man” was an implosion-type fission weapon using plutonium. Concurrently, with far fewer resources, they did early research on the so-called “Super”, a fusion bomb. After the war, the continuation of this research would lead to the development of hydrogen bombs.
What I learned about disruptive research teams:
The Los Alamos Primer states very clearly: “The object of the project is to produce a practical military weapon in the form of a bomb in which the energy is released by a fast-neutron chain reaction in one or more of the materials known to show nuclear fission.” Everybody considered this an extremely important undertaking, given the specter of Nazi Germany developing atomic weapons first. Many of the scientists had Jewish ancestry, and some had even fled Europe. The perceived importance was evidenced by the fact that people did whatever was needed to get the job done, be it mundane or even dangerous tasks.
Oppenheimer invested significant effort into persuading the best scientists he could find to join the project. The assembled group included Nobel laureates (Enrico Fermi, Isidor Isaac Rabi) as well as several future Nobel laureates (Felix Bloch, Emilio Segrè, Richard Feynman, Hans Bethe, Luis Walter Alvarez, John Hasbrouck Van Vleck). John von Neumann consulted on the project.
Leslie Groves, Oppenheimer’s superior, is usually described as an extremely capable administrator. He managed “up” and “out”, kept himself out of the science, and pushed the entire Manhattan Project forward by making quick decisions and making sure all of the different parts came together. Oppenheimer himself also proved to be a very capable leader, even though he had not managed such a large group of people before. He combined strong technical skills with excellent social skills. He knew many people on the team extremely well and managed conflicts between even the most difficult members very well. He was a role model for many, signaled to everybody that they mattered to the success of the project, and people drew strength from his moral authority, dedication, and confidence. People worked hard so as not to disappoint him.
Oppenheimer had to push Groves for centralization of bomb development and free exchange of ideas within Project Y because Groves had serious security concerns. However, this seems to have been a crucial step for the development of the weapons by speeding up feedback loops and increasing coordination between different strands of the project. To accommodate the security concerns, the lab was isolated in the New Mexican desert.
Project Y was hierarchically organized, which did cause problems at times because members felt mistreated or their egos were bruised. However, with a staff size of over 3,500 people in 1945, some kind of hierarchy seems necessary.
The secret nature, isolation, and importance of the project created strong bonds between all the team members. They spent most of their free time together.
Leslie Groves worked tirelessly to provide all the resources necessary to lead the bomb project to success. This notably did not include comfortable living conditions for the people working on the project.
The fast and unambiguous feedback of engineering considerably speeds up research.
Kahneman & Tversky
Brief summary: Kahneman and Tversky are two Israeli psychologists who successfully collaborated over several decades.
What they achieved: They successfully challenged the rational choice model of human behavior in psychology and economics through their work on heuristics, biases, and prospect theory. They kickstarted behavioral economics through their influence on Richard Thaler and are responsible for similar developments in other fields such as medicine, sports, and business. Kahneman received the Nobel Prize in economics for their work.
What I learned about disruptive research teams:
Kahneman and Tversky were both very intelligent, but each brought a unique perspective and skill to their partnership: Kahneman was good at noticing problems and confusions and was unconstrained by existing paradigms. Tversky was well-versed in mathematics and had an extremely fast mind. Fun fact: in 1978, Stanford University decided to make Tversky an offer for a lifetime appointment within eight hours of learning he was looking for a job.
They had great respect for one another.
They sought out opportunities which freed them up from university duties (e.g. Oregon Research Institute).
They proactively thought about how to make their research have an impact (e.g. trying to design curricula, doing work for the military, developing persuasive framings, branching out into biases in medical science).
Bennis: Characteristics of “Great Groups”
In Organizing Genius, Bennis distills fifteen characteristics of what he calls “Great Groups”. He looked at the Disney Company, PARC, the Bill Clinton election campaign of ’92, Skunk Works (Lockheed), Black Mountain College, and Los Alamos Laboratory (the “Manhattan Project”). This does not only include research teams but is still somewhat informative. These are my notes from the last chapter in which he summarizes the lessons he draws from them.
Greatness starts with superb people.
It’s not about good people, but great ones. The difference can be large and as Bob Taylor said: “You can’t pile up enough good people to make a great one.”
The ideal people are deep generalists, not narrow specialists, i.e. they know enough to employ the tools of various disciplines, but are unconstrained by the conventions of those disciplines. They should be concerned with solving problems first and foremost.
The excellence of each member leads everybody to stretch themselves and perform to their maximum because they want to belong.
Great Groups and great leaders create each other.
I have a hard time parsing what he means here.
Every Great Group has a strong leader.
He describes the best leader as a “pragmatic dreamer with an original but attainable vision”. This serves as an important beacon for recruitment and focuses the work of the team.
They make decisions quickly, deal decisively with problems, and inspire confidence in the face of problems and setbacks.
They are not creators, but curators, i.e. they direct the excellent work of others and choose the best solutions and ideas.
They earn the respect of the team through their integrity and commitment.
The leaders of Great Groups love talent and know where to find it.
The leader recruits people better than themselves and often utilizes their excellent networks. Ideally, recruitment is not focused on a narrow discipline or background.
Great Groups are full of talented people who can work together.
Members don’t have to be friendly or social, but at least willing to collaborate to achieve the shared goal. This requires at least mutual respect.
Great Groups think they are on a mission from God.
The groups believe that they’re doing something extremely meaningful and monumental, made concrete in a clear and strong vision. This allows for considerable sacrifice.
Communicating this sense is an important tool for recruitment.
Every Great Group is an island—but an island with a bridge to the mainland.
The groups are often physically isolated from the rest of the world, and even invent their own language, traditions, and rituals.
Despite bare surroundings, they still have a lot of fun, even exhibiting giddiness.
Great Groups see themselves as winning underdogs.
They often contrast themselves with some kind of established competitor.
Great Groups always have an enemy.
Feelings of war and competition can tap into additional resources.
People in Great Groups have blinders on.
During the time of the Great Group, members usually care for little else. The mission is all they want to talk about, even outside of work.
Great Groups are optimistic, not realistic.
Members are often young people who haven’t tested or seen their limits yet. This enables them to have unbounded optimism and enthusiasm.
The difficulty of the challenge adds to the joy.
In Great Groups, the right person has the right job.
In such groups, members should be given roles which they love, such that they can excel in whatever they’re doing.
Such talented people are usually not interchangeable and must be given their niche to thrive.
The leaders of Great Groups give them what they need and free them from the rest.
Members need a worthy challenge, colleagues who stimulate and challenge them, focus on what’s important, and the right tools.
They should not have to deal with trivial or bureaucratic tasks, and usually don’t need a fancy environment.
Great Groups need a great flow of information within the team.
The leader must relieve individual stress and resolve conflicts between members.
Each member needs a lot of autonomy to achieve their goals.
Great Groups ship.
Such successful collaborations are dreams with deadlines, i.e. they usually want to create something concrete.
Great work is its own reward.
Note that this is my loose definition. There is significant scholarly debate on what an appropriate and rigorous operationalization of the term should look like. ↩︎
Thanks to Max Daniel for bringing this group to my attention. This report in the 1999 edition of the Notices Of The American Mathematical Society paints a picture very similar to the other groups I investigated. However, I did not find more in-depth accounts. ↩︎
What they call “task orientation” might gesture in this direction, but the connection is thin. ↩︎
Woolley et al. (2010) find that the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group are most predictive for group intelligence. In a replication attempt, Bates & Gupta (2017) find that individual intelligence of the team members accounts for most of the variance in group intelligence (80%). Woolley et al. have almost twice the number of participants, but the replication is probably less likely to be the result of publication bias. ↩︎
To my knowledge, Bennis coined this term to refer to people who have acquired considerable expertise in multiple domains and can comfortably apply the tools of these disciplines whenever appropriate. He argues that this allows them to be open to new findings and approaches because they’re less constrained by the received wisdom of their disciplines, while still being able to apply the best tools they offer to solve the novel problems they face. ↩︎
In this section, I will only list the mean correlation as well as the 95% confidence interval. If you’re interested in further information, please refer to the relevant parts of the next section. ↩︎
There is a case to be made that tribal psychology can be harnessed for this by casting an outside group as an enemy to be defeated (see Appendix). While this might work in some instances, I do not think this is a wise strategy in the long term. ↩︎
They analyzed more than 65 million papers, patents, and software products from 1954 to 2014, looking at the relationship between the number of authors and disruptiveness. They don’t provide a correlation coefficient but the findings seem unambiguous, given their notion of disruptiveness. They conclude: “In summary, we report a universal and previously undocumented pattern that systematically differentiates the contributions of small and large teams in the creation of scientific papers, technology patents and software products. Small teams disrupt science and technology by exploring and amplifying promising ideas from older and less-popular work. Large teams develop recent successes, by solving acknowledged problems and refining common designs.” ↩︎
Their factor “frequent communication” includes communication with external stakeholders. ↩︎
From the paper: “Credibility intervals indicate whether the corrected correlation can be generalized or whether it is situation specific (i.e., whether it varies between different organizational settings). Thus, this interval conveys information on the variability of individual correlations. From the width of the credibility interval, it can be inferred whether moderators are operating. In the case of both positive and negative mean corrected correlations (ρ), generalizability can be inferred if the credibility interval does not include zero.” ↩︎
I omitted most references for readability. ↩︎