Why and how to assess expertise
After my previous hippy dippy post, I figured I’d balance the scale with one that is more content-heavy. The following post is about a skill that will be important for many effective altruists to learn: expertise assessment. The thoughts below are from over a hundred hours of things like designing EA Outreach’s hiring process, interviewing lots of job candidates, reviewing 1000+ EA Global applications, and creating the Pareto Fellowship evaluation pipeline.
Why learn how to assess expertise?
Improving the world will involve drawing upon many knowledge and skill domains. Barring development of the skill-learning chair from the Matrix, your capacity is too limited to master all of these domains.
Thus, you must rely upon experts.
However, there are challenges to expert identification. Firstly, there are many who masquerade as experts. This masquerading may be deliberate—e.g., probably most fortune tellers—or due to poor self-evaluation—e.g., many college sophomores. Secondly, there are many experts who are not fully aware of their expertise. This will often be true for “intuitive” experts, e.g., experts at charisma or experts at implicitly modeling other people. Thirdly, there are entire domains in which the base-rate of true expertise in the domain’s phenomena, even amongst highly credentialed people, is quite low. From both my personal experience in a number of labs, and from coming across lots of findings like this one, I can tell you that cognitive science is one such field. This will also be the case in many other social sciences, financial forecasting, consulting, etc. This means that someone’s job title or the letters “PhD” in someone’s email signature are often (usually?) not sufficient markers of expertise. (This is probably the most common trap for those seeking expert consultation.)
Thus, you must learn how to assess expertise.
Here are some circumstances when having expertise assessment tools is incredibly useful:
You are asking someone for important information (e.g., as when Open Philanthropy Project consults policy experts for its US Policy investigations).
You are figuring out where to donate (e.g., you want to see whether a given organization actually has the ability to do the world-improvement activities that they are intending to do).
You are evaluating a potential new hire (e.g., you want to see whether a candidate for a marketing position is in fact good at marketing).
The paradox of expertise assessment
A sociologist friend of mine told me a story about a Chinese emperor named Qin Shi Huang. As the story goes, Qin Shi Huang sought to live forever. So, he recruited a multitude of advisors to find the hidden secret to immortality. Unfortunately, the men he recruited had no ability to assess which magicians or alchemists might be experts in immortality elixirs. Furthermore, the emperor had no ability to assess which advisors might be experts in assessing experts in immortality elixirs. (Note even further that there are no true experts in the relevant domains here. This may remind you of fields such as technological forecasting.) In the end, as these things tended to go, a lot of advisors got executed.
This has a lot in common with Meno’s Paradox. Here is Meno’s paradox rephrased as a paradox of expertise assessment:
If you are not an expert in domain D, it is impossible to independently assess whether someone is an expert in D.
You are not an expert in D.
Therefore, it’s impossible for you to independently assess whether someone is an expert in D.
This argument fails. In particular, premise 1 is probably false:
If there exist DGM—domain-general markers that tell you whether someone may be an expert—then it is possible to independently assess whether someone is an expert in D, even if you are not an expert in D.
There exist DGM.
Therefore it is possible to independently assess whether someone is an expert in D, even if you are not an expert in D.
Below, I will describe some domain-general markers for assessing expertise across domains.
I’ve combined them into a couple of “back-pocket” methods—ones you can use to evaluate people you meet at conferences or dinners, or people who you deliberately seek out. They are definitely not exhaustive, but they should cover most cases. I’ve listed the first back-pocket method below. The second back-pocket method I’d rather not spread widely, since it is much more game-able. If you’d be interested in it, feel free to email me at tyler@centreforeffectivealtruism.org. I also have a much more heavy-duty spreadsheet-based method if you have a particularly high-stakes expertise assessment task (e.g., you’re choosing which global poverty team to join for the next several years, or which animal welfare organization to give an enormous amount of money to).
Necessary conditions for expertise: the P-I-F-T method
The claim: each of the four criteria below is a necessary (but, importantly, not sufficient) condition for expertise. You may find them to be pretty obvious, but ask yourself: do I explicitly assess for these things while engaging experts? If not, you may run the risk of, e.g., hiring the wrong people or acting on faulty information.
To gain fluency in assessing each of the following conditions, I recommend the following process:
- Write down a list of examples where it is critical for you to trust either your own or someone else’s knowledge or skills. These are examples where either you are the expert or you are counting on an expert.
- Think of ways in which you can apply the criteria below to the expert.
Furthermore, each of the examples below ends with a question to test your intuitions about expertise. Please try to answer each question before looking at the answer.
P: Processing of relevant information
Is the person performing detailed mental operations upon relevant data, beyond the ones which non-experts perform? Is this the sort of processing which would plausibly yield expertise in the relevant domain?
Examples
{You want to invite a philosophy expert to speak at your conference.}
Brad and Will encounter an argument. Brad, the analytical philosophy novice, assesses whether it feels intuitively true. Will, the analytical philosophy expert, not only assesses whether the argument feels intuitively true, but also assesses its conceptual clarity and hidden premises. Who do you think has the better marker of expertise? Why?
--
Answer: Brad lacks a robust process; Will does not. Invite Will.
--
{You want to learn persuasion.}
Steve and Hilary must persuade Rochelle the French bureaucrat to expedite their visa process. Both Steve and Hilary note that Rochelle is wearing delightful chartreuse earrings. Steve, the persuasion novice, runs the mental process of noting that Rochelle is a person, and that people generally respond well to friendliness. Hilary, the persuasion expert, notes that Rochelle’s chartreuse earrings are the only flamboyant clothing items that anyone in the office is wearing. Based on a large sample size of processed experience now stored in her system 1, she guesses that the chartreuse earrings could signify that Rochelle wants to distinguish herself from her fellow bureaucrats—to show that she is not just another bureaucrat. Based on this, Hilary gambles on saying things that show she appreciates Rochelle’s uniqueness—e.g., “You’re the friendliest-looking embassy employee I’ve ever met!” Who do you think has the better marker of expertise? Why?
--
Answer: Steve lacks a robust process; Hilary does not. Learn from Hilary.
--
{You are a foundation program officer deciding who to give a grant to.}
Martha and Leanne are both published neuroscientists who study the lateral geniculate nucleus. You have already read their grant proposals, and they seem to be of comparable quality. Even though you are not an expert in geniculate nuclei, let alone lateral ones, you must choose one person to give a grant to. Thus, you must decide who is the potentially more revolutionary scientist. Both Martha and Leanne seem to have broad knowledge of most published work on the topic. They appear to be equally intelligent. However, Leanne takes the novel approach of gem-mining dynamical models in physics and epidemiology that seem to describe similar phenomena in the lateral geniculate nucleus. She also spends a lot of time devising thought experiments and free-associating around tricky questions. These methods seem a bit unusual, but in your grant-making experience, you’ve found that—all else equal—scientists who use unusual methods tend to produce more innovative work. Who do you think has the better marker of expertise? Why?
--
Answer: You bet on Leanne over Martha. In this case, both Martha and Leanne probably have robust processes. However, Martha lacks a process that yields revolutionary expertise; Leanne more likely does not. You probably made the right bet.
--
Ways of assessing
I. Ask questions which will reveal the mental processes of experts, such as:
a. “Before you fix a computer, what’s your general diagnostic process like?” (For a computer repair specialist)
b. “The constructs of confidence and self-efficacy seem very similar. How do you tell the difference between them?” (For an expert in social psychology)
c. “How does the quality of Richard Dawkins’s work compare to that of other evolutionary biologists? Do you have any critiques?” (For an expert in evolutionary biology)
d. “Let’s say I want to stage a magic trick in an extremely crowded room. How would I do that? What about in a room with very loud music?” (For an expert party magician)
II. Find out whether they’ve been part of a job, program, or mentorship that would have equipped them with special mental processes.
I: Interaction with relevant information sources
Is the person regularly interfacing with relevant data? Is this the sort of data that an expert would plausibly engage? Note that merely encountering relevant sources is not sufficient. The expert needs to have paid attention to these sources, as the first example will illustrate.
Examples
{You want to hire a graphic designer.} Bob and Maria are walking down a city street. Bob, a graphic design novice, pays no attention to the signs and advertisements along the side of the street, even though they are within his field of vision. Maria, an expert, pays full attention to these things. She notes the lack of spatial alignment amongst elements in the dry cleaner’s sign. As she passes a Louis Vuitton ad at the bus stop, she ogles the beautiful ball serifs of the Bauer Bodoni bold italic typeface (incidentally, my favorite font!). Color schemes, geometry, and visual flows all jump out at her as objects unto themselves. Who do you think has the better marker of expertise? Why?
--
Answer: Bob encounters relevant data, but does not engage it; Maria both encounters and engages relevant data. Hire Maria.
--
{You want to learn how to fundraise.}
Cassandra is a quantitative finance expert but a novice at fundraising. Jake is an expert at fundraising. Jake is constantly immersing himself in fundraising case studies, talking to other experts, and meeting with funders. Cassandra, on the other hand, interfaces with sources like mathematical models of markets. Who do you think has the better marker of expertise? Why?
--
Answer: Cassandra does not interface with sufficiently relevant information sources; Jake does. Consult Jake.
--
{You want to improve the effectiveness of your team.}
Brian is a Princeton academic who claims to be an expert in team effectiveness. The evidence: he has analyzed 1000 small family businesses and has been published multiple times in Science. Miranda does not claim to be an expert in team effectiveness, but several people have suggested that she might be. The evidence: she is the rare type of venture capitalist who founded a successful startup, ran a large company, and now sits on nonprofit boards and invests in companies of all sizes (and has a winning track record doing so). Who do you think has the better marker of expertise? Why?
--
Answer: Unless your organization is a small family business, Brian has probably not interfaced with relevant information sources. Miranda, on the other hand, has engaged a wide variety of organizations. There is a good chance that her ideas about team effectiveness are higher quality, since she will likely have abstracted organization-general lessons from a more diverse sample. This is a more difficult case than the other two, but if faced with a decision between the two, I would consult Miranda instead of Brian.
--
Ways of assessing
I. Ask questions which will reveal what sorts of information they engage, such as:
a. “Tell me about individual cases in your management experience.” (For a manager you might hire)
b. “What sorts of things do you pay attention to when you’re at an event?” (For an event director)
c. “Roughly how many pieces do you edit in an average month?” (For an editor)
d. “Which papers would you recommend reading to understand the cutting edge in hyperbolic geometry?” (For an expert in hyperbolic geometry)
II. Find out whether they’ve been part of a job, program, or mentorship that would have given them strong samples of relevant information.
III. See how fluently they can generate examples of phenomena in the domain. The more examples they can generate, the better.
F: Feedback with relevant metrics
Does the person have (or have they had) feedback loops that help them accurately calibrate whether they are increasing their expertise or making accurate judgements?
In domains where reality does not give good feedback, experts need a set of well-honed heuristics or proxy feedback methods to correct their output if their results are to be reliably good (this goes for, e.g., philosophy, sociology, long-term prediction). In domains where reality can give good feedback, they don’t necessarily need well-honed heuristics or proxy feedback methods (e.g., massage, auto repair, swordfighting, etc.). All else equal, superior feedback loops have the following attributes (idealized versions below; a rough scoring sketch follows the list):
- Speed (you learn about discrepancies between current and desired output quickly after taking an action, so you can course-correct)
- Frequency (the feedback loop happens frequently, giving you more samples to calibrate on)
- Validity (the feedback loop is helping you get closer to the output you actually care about)
- Reliability (the feedback loop consistently returns similar discrepancies in response to you taking similar actions)
- Detail (the feedback loop gives you a large amount of information about the difference between current and desired output)
- Saliency (the feedback loop delivers attentionally or motivationally salient feedback)
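To make the comparison concrete, here is a minimal sketch of how one might turn these six attributes into a crude comparative score. Only the attribute names come from the list above; the 0-3 scale, the idea of summing, and the example numbers are invented illustrations, not part of the method itself:

```python
# Illustrative sketch: rate a feedback loop 0-3 on each of the six
# attributes above and sum. Ratings and examples are invented.

ATTRIBUTES = ["speed", "frequency", "validity", "reliability", "detail", "saliency"]

def score_feedback_loop(ratings):
    """Sum 0-3 ratings (0 = absent, 3 = idealized) across the six attributes."""
    missing = set(ATTRIBUTES) - set(ratings)
    if missing:
        raise ValueError("missing ratings for: %s" % sorted(missing))
    return sum(ratings[a] for a in ATTRIBUTES)

# Reality gives a masseuse fast, frequent, salient feedback; it gives a
# long-term forecaster slow, rare feedback, so proxies must fill the gap.
masseuse = {"speed": 3, "frequency": 3, "validity": 2,
            "reliability": 2, "detail": 2, "saliency": 3}
forecaster = {"speed": 0, "frequency": 0, "validity": 2,
              "reliability": 1, "detail": 1, "saliency": 1}

print(score_feedback_loop(masseuse))    # 15
print(score_feedback_loop(forecaster))  # 5
```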
Examples
{You want to predict technology timelines} Julie and Kate both claim to be experts in technological forecasting. When you ask Julie how she calibrates her predictions, she replies, “Mainly, I just have a sense for these sorts of things. But I also do things like monitor Google Trends, read lots of articles on technology, and ask lots of people what they think will happen. I’ve been doing this for 20 years.” She then points to a number of successful predictions she’s made. When you ask Kate the same question, she replies, “Well, in the short term, it’s been shown that linear models of technological progress are the best, so I tend to use those to calibrate on the timespan of 1-3 years. If I make longer-term predictions, I try to tell as many stories as possible for how those predictions may be false. Then I try to make careful arguments that rule out these stories. Furthermore, I always check whether my predictions diverge substantially from those of other technological forecasters. If they do, I try to figure out why. I’ve also identified a number of technological forecasters who have consistently good track records, and I study their methods, evidence, and predictions carefully. Finally, whenever one of my predictions turns out to be false, I spend about a week figuring out whether there is any general principle to be learned to guard against being wrong in the future.” Who do you think has the better marker of expertise? Why?
--
Answer: Technological forecasting is a domain in which reality doesn’t provide strong feedback, so you need proxy feedback methods. Julie lacks good proxy feedback methods, while Kate has relatively decent ones. Barring special information about Julie, Kate’s predictions are likely to be more reliable, all else equal.
--
{You want to choose a piano teacher}
Both Ned and Megan are piano teachers. Of the two, Ned is a much better pianist, having won many awards and played at Carnegie Hall many times. You ask both Ned and Megan how they can tell whether their teaching is working for a given student. Ned replies that he simply looks at the outcomes: if a student practices under him for several years, they become much better. “Basically, I show them how to play scales and pieces well, and then I check in about once every other week to make sure they are practicing the drills I showed them.” Megan replies with a detailed set of ways she can track rate of progress and adjust her teaching accordingly. “For example, I know whether a student has ‘chunked’ a given chord through the following method: I stand behind the piano, quickly turn around a piece of paper with a chord on it, and time how many milliseconds it takes for the student to react and play the chord. Also, each week I ask them to honestly report on whether the chord still feels like a series of notes or whether it feels more like ‘one note.’ The latter indicates that the chord has become a ‘gestalt’ in the student’s mind. Another example: whenever a student makes an error while playing a piece, I mark the corresponding area in the sheet music. Eventually, I can tell what types of errors a student generally makes by analyzing the darkest areas on various pieces—the places with the most pen marks.” Megan continues to tell you similar examples. Who do you think has the better marker of expertise? Why?
--
Answer: In this case, while Ned may be the better pianist, he may not be the greater expert at teaching piano. He seems to lack relevant feedback loops that tell him whether his teaching is succeeding. While he notes that his students improve over time, he is not entertaining the possibility that they would have improved over time even without his intervention.
--
{You want to hire a manager}
Both Todd and Greg have applied for a manager position at your organization. You ask each of them about their process for monitoring the rate at which their teams make progress on goals. Todd: “I have everyone on a system where I can monitor the number of Pomodoros each person is completing. If certain team members are lagging behind in their number of Pomodoros, I give them a pep talk, after which the number tends to go back up.” Greg: “I have each team member set daily subgoals. Then I look at two things: (a) whether these subgoals tend to align with the broader goals and (b) whether they are achieving the subgoals they set for themselves. If a team member is lagging behind on (a) or (b), I give them a pep talk, after which they tend to perform better.” Who do you think has the better marker of expertise? Why?
--
Answer: In this case, both Todd and Greg have decent feedback loops. However, Todd’s feedback loop is more likely to fall victim to Goodhart’s law. In other words, though his method might be high in reliability, the measure (Pomodoro-maximization) might accidentally become the target, even though the intended target is goal completion. Greg’s feedback loop is higher in validity, in that it more tightly measures the target he actually cares about.
--
Ways of assessing
I. Ask questions which will reveal the details of their feedback loops (and whether they have them), such as:
a. “Let’s say I’m already a proficient coder, but I want to learn how to code at the level of a master. What sorts of problems might I practice on to move from proficiency to mastery? Are there any textbooks I should read?” (For a software engineer)
b. “In what ways do people typically stumble when they try to improve at data analysis?” (For a data analyst)
c. “How do you tell whether a marketing campaign is working?” (For a professional marketer)
d. “Can you tell me a bit about how you learn?”
II. Find out whether they’ve been part of a job, program, or mentorship that would have given them strong feedback loops.
III. Sometimes, people with tacit expertise will not be able to articulate their feedback loops. Analyze whether reality provides robust feedback in their domain. For example, a bike-rider might not be able to describe the feedback loops through which they learned bike-riding. However, reality automatically provides feedback in the domain by causing novice bike-riders to fall over, until they accumulate enough procedural knowledge to balance on two wheels.
T: Time spent on the above
This one is the most straightforward of all the necessary conditions for expertise. (Thus, I won’t go into much detail.) Simply: An expert needs to have spent enough time processing and interacting with the relevant data with robust feedback loops.
Ask: Has this expert put a plausibly sufficient amount of time into learning or using the skill in order to gain expertise?
For some skills, like using a spoon, there is a short latency between beginnerhood and expertise. For others, like having well-calibrated political views, there is quite a long latency. Accordingly, you can probably trust the average claim about spoon-use and should be suspicious of the average claim about politics.
There you have it: the PIFT method for assessing basic conditions for expertise.
One easy way to remember the method: “If the person claims to be an expert and is not, say, ‘pift!’”
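For readers who think in code, here is a minimal sketch of the underlying model: each PIFT condition is necessary, so a single failure rules a candidate out, while passing all four only means they have not been ruled out. The condition names come from the post; the function and data structure are hypothetical illustrations:

```python
# Sketch of PIFT as a conjunction of necessary conditions. Failing any one
# condition rules out expertise; passing all four does NOT establish it
# (the conditions are necessary, not sufficient).

PIFT = ("processing", "interaction", "feedback", "time")

def pift_check(assessments):
    """assessments maps each PIFT condition to True (passed) or False."""
    failed = [c for c in PIFT if not assessments.get(c, False)]
    if failed:
        return "Say 'pift!': failed %s" % failed
    return "Not ruled out; gather further (e.g., domain-specific) evidence"

print(pift_check({"processing": True, "interaction": True,
                  "feedback": False, "time": True}))
# Say 'pift!': failed ['feedback']
```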
If it seems that I have misidentified or failed to identify a necessary condition for expertise, please let me know!
Great post—identifying experts and, in particular, comparing expertise between similar candidates is exceptionally difficult; using even a rough model seems likely to greatly improve our ability to undertake this task.
While it seems possible to make some progress on the problem of independently assessing expertise, I want to stress that we should still expect to fail if we proceed entirely independently, without consulting a domain expert. Great! Now we have a simpler problem: how do we identify the best domain expert who can help us build a framework for assessing candidates?
Tyler’s model seems somewhat helpful here, and adding the components from John’s model improves it again. My prior approach was a simpler one, but shares some characteristics. I usually look for evidence of exceptional accomplishments that are rare or unprecedented, and ignore most examples of accomplishments which are difficult or competitive but common. Peer recognition is also a good barometer, more so if you ask people who are field insiders but have a merely casual acquaintance with the person in question. In the case of picking an expert who can help me identify predictors of expertise in their field, I’m less concerned with my ability to rate and compare their level of expertise with other top-level experts, as it’s fairly low cost to seek out the opinions of multiple experts.
When we were considering hiring a digital marketer, I sought input from 4 people who I will call experts; doing so dramatically improved my ability to pick the best candidates from the pool. I tested my predictions against the experts by rating the applications of the top 5 candidates myself, then getting a domain expert to rank them, comparing our scores and watching them work. Watching the expert evaluate other candidates helped me pick out further elements which were not in their original verbal model. This part seems quite different from Tyler’s approach, as it is about identifying domain-specific expertise rather than searching for domain-general predictors of expertise, but it seems important to mention. I worry that neglecting to seek out domain-specific predictors would lead to a poorer outcome.
I also want to tease apart the question of attaining domain level expertise versus having a good process for generating expertise. I imagine that it is possible for those who have a good process (these people would, I imagine, score well using Tyler’s model) to become experts more quickly. I imagine there is another class of experts who have decades of experience, rich implicit models and impressive achievements, but who would struggle to present concise, detailed answers if you asked them to share their wisdom. I suspect that quiet observation of such a person in their work environment, rather than asking them questions, would yield a better measure of their level of expertise, but this requires considerable skill on the part of the observer.
I’d love to think about this more, looking forward to trying on your framework and playing around with it.
The method of observing experts and turning their heuristics into a simple scale is well supported in the forecasting literature (don’t have a quick cite handy unfortunately).
Some related research.
This as well.
Right, I should have mentioned this. Your job is much, much easier if you can identify a solid “seed” expert in the domain with a few caveats:
If the seed expert becomes your primary input to expertise identification, you should be confident that their expertise checks are good. I’m tempted to think that the skill of domain-specific expertise identification correlates strongly with expertise in that domain, but not perfectly. This will be especially true in fields where there are lots of persuaders who have learned how to mimic signs of expertise.
Keep domain-specific expertise base-rates in mind, as mentioned above. In domains where the expertise base-rate is low (e.g., sociology), you will need to run many more expertise checks on the seed expert than usual, and will have a harder time finding a passable expert in the first place.
In fields where results are not easily verifiable (e.g., sociology again), it will be more difficult to identify a seed expert. Also, these seed experts will often have a hard time identifying revolutionary forms of expertise, since they might look like crackpots. (As opposed to, say, math, where there are cases of people who prima facie look like crackpots being nonetheless hired as professors, since their results are reliably verifiable.)
In fields with high variance, you may be able to find a passable seed expert who cannot consistently identify experts who are much, much better than they are.
In fields with poorly networked knowledge, seed experts will be much less helpful. I can imagine this being the case for fields like massage therapy, where I expect there to be fewer journals and conferences.
Indeed: tacit experts. The way I assess this now is basically by looking at indirect signs around the potential tacit expert (e.g., achievements are a good one, as is evidence of their having made costly tradeoffs in the past to develop their expertise (a weaker sign)). If anyone develops tools for directly assessing tacit experts, please let me know.
I’d also be very interested if anyone has ideas for how to learn the skills of tacit experts, once you’ve identified them.
One idea for learning the skills of tacit experts that I found works is to copy their behaviors regarding the domain, without necessarily understanding the reasons behind their behaviors.
It sounds strange to us as people who are very intellectually oriented and seek to understand the reasons why something works. I know it did to me when I first tried to do it. Moreover, there is a danger of copying behaviors that are incidental and do not lead to the desired outcome. Still, given that tacit experts often don’t know themselves why they do well at what they do, simply copying their behaviors seems to work.
What domains have you found this to work in?
One domain is social behavior. Emulating the social behavior of people who have high charisma has proved beneficial for me in improving my own charisma, even if the people with high charisma could not explain their own charisma.
Ah! This sounds like a great feedback mechanism for one’s expert assessment abilities. I’m going to steal this. =)
+1. You definitely want to use more signs than the ones I mentioned above to be confident that you have identified sufficient markers of expertise. The ones listed above are only intended to be necessary markers. A good way of generating markers beyond the necessary ones: think about a few people who you can confidently say are experts. What do they have in common? (Please send me any cool markers you’ve come up with! My own list has over 30 now, and it doesn’t seem like the ceiling has been hit.)
First of all, thanks a lot for spending the time to turn this into a model and polishing it enough to be shared.
Issue: It seems like the model might have trouble filtering people who have detailed but wrong models. I encounter this a lot in the nutrition literature, where very detailed and technical models with complex evidence from combinations of in vitro, animal, and some human studies compete against outcome-measuring RCTs. As near as I can tell, an expert with a detailed but wrong model can potentially get by three of the four filters: P, I, and T. They will have a harder time with F, but my current guess is that the vast majority of experts fail F, because that is where you have loaded most of the epistemic rigor. Consider how rarely (if ever) you have heard a response like the example given for F from real-life researchers. You might say “all is well, the vast majority fail and the ones left are highly reliable.” It seems to me, however, that we must rely on the lower-quality evidence from people failing the F filter all the time, simply because in the vast majority of cases there is little to no evidence really passing muster and yet we must make a decision anyway.
Side note: in my estimation The Cambridge Handbook of Expertise would lend support for most of the “work” here being done by F, as opportunities for rapid, measurable feedback is one of the core predictors of performance they point to.
Potential improvement: Rather than a binary pass/fail for experts, we would like a metric that grades the material they present. Even crude metrics outperform estimates that do not use metrics, according to the forecasting literature. Cochrane’s metric for risk of bias, for example, is simply a list of 5 common sources of bias which the reviewer grades as low, high, or unclear, with a short summary of the reasoning. A very simple example would be rating each of the PIFT criteria similarly. This gives some path forward for improvement over time as well: checking whether a low or high score in a particular dimension actually predicts subsequent expert performance.
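A minimal sketch of what that might look like, with each PIFT criterion graded Cochrane-style (low/high/unclear plus a short rationale). The grade labels follow the comment's description of Cochrane's risk-of-bias metric; the data structure and example grades are invented:

```python
# Sketch: grade each PIFT criterion instead of a binary pass/fail,
# mirroring Cochrane's low/high/unclear grades with a short summary of
# the reasoning. The example grades below are invented.

from dataclasses import dataclass

GRADES = ("low", "unclear", "high")  # strength of evidence for the criterion

@dataclass
class CriterionGrade:
    criterion: str  # "P", "I", "F", or "T"
    grade: str      # one of GRADES
    reasoning: str  # short summary, as in Cochrane's risk-of-bias tool

    def __post_init__(self):
        if self.grade not in GRADES:
            raise ValueError("grade must be one of %s" % (GRADES,))

profile = [
    CriterionGrade("P", "high", "described a detailed diagnostic process"),
    CriterionGrade("I", "high", "daily contact with primary data"),
    CriterionGrade("F", "unclear", "could not articulate proxy feedback"),
    CriterionGrade("T", "low", "two months in the domain"),
]

for g in profile:
    print("%s: %-8s (%s)" % (g.criterion, g.grade, g.reasoning))
```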
I hope you interpret detailed feedback as a +1 and not too punishing. I am greatly encouraged by seeing work on what I consider core areas of improving the quality of EA research.
How worthwhile do you think it would be for someone to read the handbook?
I think a skim/outline is worthwhile. It includes lots of object level data which isn’t a great use of time.
Agreed. I tried to make it binary for the sake of generating good examples, but the world is much more messy. In the spreadsheet version I use, I try to assign each marker a rating from “none” to “high.”
100%. The model above is only good for assessing necessary conditions, not sufficient ones. I.e., someone can pass all four conditions above and still not be an expert.
Good post—I’m glad to see discussion of this topic. Here’s an alternative methodology that takes a more “black box” approach:
Accomplishments—If someone is able to do something that others find difficult, this is evidence of expertise. Examples: If someone wins a chess tournament, this is evidence of chess expertise. If someone makes correct economic forecasts, this is evidence of economics expertise. (Notably, writing a popular book may mostly indicate expertise in writing popular books—I’ve heard of credentialed people who wrote popular books that were said to be misrepresentations according to field insiders.) Surprisingly, track records of meaningful accomplishments are often ignored in judging expertise.
Ability × Time Studying Quality Sources—Given the existence of general intelligence (one of the better-replicated areas of psychology?), and other factors predicting general effectiveness, expertise at any intellectual task is evidence of the ability to acquire expertise at other intellectual tasks, given study time. Say I’ve worked with both Person A and Person B on a software development team, and I’m more impressed by the software Person A writes than the software Person B writes (see Accomplishments). If I know that both Person A and Person B spent a year getting a master’s degree in a particular math subfield, and they have a disagreement about some aspect of that subfield, I’m more inclined to trust Person A than Person B.
Recommendations/Transitive Judgements of Expertise—Once you’ve established someone as an expert, you can use their judgements on expertise as evidence about who else is an expert. This can be done recursively to expand your body of recognized experts. For example, Physicist A was part of the team that developed nuclear bombs (see Accomplishments). Physicist A is a professor at The University of X, and Physicist B graduated with a PhD from The University of X. Physicist A was on Physicist B’s doctoral committee and approved Physicist B’s PhD. Physicist C scored high on their GREs and landed a spot studying under Physicist B, eventually obtaining their doctorate (see Ability × Time Studying Quality Sources). Thus the development of the nuclear bomb, plus this chain of recommendations, causes me to believe that Physicist C is a physics expert.
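The recursive step in this chain can be pictured as trust propagating through an endorsement graph. Here is an illustrative sketch: the names echo the physicist example above, while the propagate_trust function and the per-hop decay factor are invented assumptions (recommendations plausibly weaken with each hop from an accomplishment-verified seed):

```python
# Sketch: spread expertise judgements from accomplishment-verified "seed"
# experts along endorsement edges. The 0.8 decay per hop is an invented
# stand-in for recommendations weakening with distance from the seed.

from collections import deque

endorsements = {  # endorser -> people they vouch for
    "Physicist A": ["Physicist B"],
    "Physicist B": ["Physicist C"],
}

def propagate_trust(seeds, decay=0.8):
    """Breadth-first propagation of trust scores from seed experts."""
    trust = dict(seeds)
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        for endorsed in endorsements.get(person, []):
            score = trust[person] * decay
            if score > trust.get(endorsed, 0.0):
                trust[endorsed] = score
                queue.append(endorsed)
    return trust

print(propagate_trust({"Physicist A": 1.0}))
# {'Physicist A': 1.0, 'Physicist B': 0.8, 'Physicist C': 0.64...}
```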
This suggests a heuristic for determining the reliability of degrees in different academic fields: Check to see whether the field has tangible external accomplishments. The fact that physicists managed to invent nuclear bombs suggests to me that physicists have “real expertise”. I don’t know of a comparable achievement on the part of evolutionary psychologists, so I’m less sure evolutionary psychologists have “real expertise”. Although if they seem like intelligent people who have spent a long time thinking carefully about the topic, working from accurate & representative data, I will probably listen to them anyway (see Ability × Time Studying Quality Sources).
Recommendations become less trustworthy if you suspect the person making the recommendation is dishonest or has a conflict of interest. But this applies to listening to expert advice in general: There’s always the risk that a bona fide expert will lead you astray because they don’t like you and want to see you fail, they are more concerned with appearing socially desirable than telling the truth, or they are just having an off day. Most academically certified experts are certified by a group of people, so at that point you start looking at the possibility of bad departmental incentives and other group thinking failures.
I do think university degrees have decent predictive power in distinguishing expertise—universities are incentivized to correctly certify experts in order to maintain their brand, and universities often flaunt the accomplishments of their faculty & graduates (e.g. “We have X Nobel Prize winners on the faculty”) in order to build that brand.
More links:
http://lesswrong.com/lw/9xs/feed_the_spinoff_heuristic/ - to invert this idea, in order to find someone who has expertise, try to figure out who would have an incentive to make themselves an expert?
http://lesswrong.com/lw/4ba/some_heuristics_for_evaluating_the_soundness_of/
http://lesswrong.com/lw/28i/what_is_bunk/
http://lesswrong.com/lw/eck/how_to_tell_apart_science_from_pseudoscience_in_a/
Eliezer offers some thoughts about identifying correct contrarians in this essay.
“Check to see whether the field has tangible external accomplishments.”
This is a good one. I think you can decently hone your expertise assessment by taking an outside view which incorporates base-rates of strong expertise in the field amongst average practitioners, as well as the variance. (Say that five times fast.) For example (a toy Bayesian treatment follows the list):
Forecasters: very low baserate, high variance
Doctors: high baserate, low-medium variance
Normal car repairpeople: medium baserate, low-medium variance (In this case, there is a more salient and practical ceiling to expertise. While a boxer might continuously improve her ability to box until she wins all possible matches (a really high ceiling), a repairperson can’t make a car dramatically “more repaired” than others. Though I suppose she might improve her speed at the process.)
Users of forks, people who walk, people who can recognize faces: high baserate, low variance
Mealsquares founders: enormously high baserate, extremely low variance =)
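One way to use such base rates quantitatively (my own toy extension, not from the thread): treat the field's base rate as a prior and your marker checks as a noisy test, then apply Bayes' rule. All numbers below are invented for illustration:

```python
# Toy Bayes update: how much should passing your marker checks move you,
# given the field's base rate of true expertise? All numbers are invented.

def posterior_expert(base_rate, sensitivity, false_pass):
    """P(expert | passed markers): sensitivity = P(pass | expert),
    false_pass = P(pass | non-expert)."""
    p_pass = sensitivity * base_rate + false_pass * (1 - base_rate)
    return sensitivity * base_rate / p_pass

# The same marker check lands very differently in different fields:
print(posterior_expert(base_rate=0.05, sensitivity=0.9, false_pass=0.2))
# ~0.19 (low-base-rate field, e.g., forecasters)
print(posterior_expert(base_rate=0.60, sensitivity=0.9, false_pass=0.2))
# ~0.87 (high-base-rate field, e.g., doctors)
```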
I’m cross-posting this excerpt from Thinking Fast and Slow that’s relevant to the question of whether expertise is even possible in a given field. It seems in some cases you are better off using a statistical model.
“the measure (Pomodoro-maximization) might accidentally become the target, even though the intended target is goal completion.”
Nonsense.
Indeed. Utility is merely a vague proxy for pomodoros completed.
I’m glad you think it’s nonsense, since—in some strange state of affairs—a certain unnamed person has been crushing on the communal Pom sheet lately. =P
The ability to judge others’ competence is incredibly important for organisation effectiveness, and seems to have been quite neglected, e.g. in the rationalism community. I think one important heuristic is to:
a) Identify well-known biases (e.g. people seem to be biased in favour of attractive people).
b) Systematically try to notice whether you might have fallen prey to these biases, e.g., when recruiting. (This is obviously non-trivial, but one might try to come up with techniques which facilitate it. Getting input from others on one’s biases could be one effective if somewhat sensitive technique.)
c) If so, adjust your judgment of the competence of that person downwards or upwards (depending on whether you’re positively or negatively biased).
Yup, this is an important thing to keep in the background of expert assessment.
Very happy to see an article on this, since I think EA will have to rely a lot on assessing expertise in domains we don’t know very well.
I think it’s outside the scope of this article, and seems pretty hard to do, but I would be interested in how this breakdown stacks up against empirical data (maybe we have something from the forecasting and expertise literature?), and also, in general, in seeing a bit more justification for why you chose this specific set of markers to look out for.
But in general, I am happy to see concrete instructional posts on important topics on the forum.
Agree with this question.
In general, you’ve set yourself up for us to give you a hard time on this article, since it’s putting us in a frame of mind to question expertise, and even suggesting some tools for analysing that. But if we try to use the tools on you for the question of expertise in assessing expertise, it looks like you’re okay on ‘P’ and that we don’t have enough evidence on the rest.
Well-observed! Here’s my guess on where I rank on the various conditions above:
P—Process: Medium. I think my explicit process is still fairly decent, but my implicit processes still need work. E.g., I might perform well at identifying an expert if you gave me a decent amount of time to check markers with my framework, but I’m not fluent enough in my explicit models to do expertise assessments on the fly very well, Sherlock Holmes-style.
I—Interaction: Medium. I’ve spent dozens of hours interacting with expertise assessment tasks, as mentioned in the article. However, for much of this interaction with the data, I did not have strong explicit models (I only developed the expert assessment framework last month). Since my interaction with the data was not very model-guided for the majority of the time, it’s likely that I often didn’t pay attention to the right features of the data. So I may have been rather like Bob above.
F—Feedback: Low. Since I’ve only had well-developed explicit models for about a month, I have gotten only minor feedback on my predictive power. I have run a few predictive exercises—they went well, but the n is still small. My primary feedback method has been to generate lots of examples of people I am confident have expertise and check whether each marker can be found in all the examples. I also did the opposite: generate lots of examples of people I am confident lack expertise, and check whether each marker is absent from all the examples. I also used normal proxy methods that one can apply to check the robustness of theories without knowing much about them (e.g., are there logical contradictions?). I used a couple of other methods (e.g., running simulations and checking whether my system 1 yielded error signals), but I’d need to write a full-length article about them for these to make sense. For now, I will just say that they were weak feedback processes, but useful ones. Overall, I looked for correlation between the various feedback methods.
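The primary feedback method described in this paragraph (checking whether each candidate marker is present in all known-expert examples and absent from all known-non-expert examples) can be sketched as a simple filter. The specific markers and example data below are placeholders, not the author's actual list:

```python
# Sketch of the marker-validation procedure described above: keep a
# candidate marker only if every known expert has it and no known
# non-expert does. Markers and example data are placeholders.

def validate_markers(markers, experts, non_experts):
    """markers: dict of name -> predicate(person); returns surviving names."""
    return [
        name for name, has_marker in markers.items()
        if all(has_marker(p) for p in experts)
        and not any(has_marker(p) for p in non_experts)
    ]

markers = {
    "robust_process": lambda p: p.get("process_score", 0) >= 2,
    "phd":            lambda p: p.get("phd", False),
}
experts = [{"process_score": 3, "phd": True},
           {"process_score": 2, "phd": False}]
non_experts = [{"process_score": 0, "phd": True}]

print(validate_markers(markers, experts, non_experts))
# ['robust_process']  (a PhD alone does not survive the filter, per the post)
```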
T—Time: Low-medium. I have probably spent more time training in specifically domain-general expertise assessment than most people in the world. But this is not saying much, since domain-general expertise assessment is not a thriving or even recognized field, as far as I can tell. Also, I have spent only a small amount of time on the skill relative to the amount of training required to become skilled in domains falling into a similar reference class (e.g., I think expertise assessment could be its own scientific discipline, and people spend years gaining sufficient expertise in scientific disciplines).
Is there any online tool you know of that does an initial screening to narrow the pool invited for a deeper dive into expertise? Also, expertise is domain-specific: I can be an expert at writing music but not at singing, so my expertise in music generally may be low while my expertise in lyric writing is high. So is anyone aware of a high-level screening tool?
Since no one else brought this up yet, I thought it might be relevant to consider the Wisdom of Crowds here.
According to that book, in general the evaluations/recommendations of the best experts are no better than, and often worse than, those of a group containing one mid-level expert and several non-experts. So it might be worthwhile to compare the recommendation of an expert you think is great to the recommendation of a group containing an expert you think is merely good. Just an idea, but something worth playing around with, I think.
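A toy simulation of the effect the book describes (independent errors partially cancelling when estimates are averaged). The error sizes are invented, and real-world gains depend on the errors actually being independent:

```python
# Toy wisdom-of-crowds illustration: averaging many noisy, independent
# estimates often beats one lower-noise expert. Error sizes are invented.

import random

random.seed(0)
TRUTH, TRIALS = 100.0, 10_000

expert_err = crowd_err = 0.0
for _ in range(TRIALS):
    expert = random.gauss(TRUTH, 5)                      # one strong expert
    crowd = [random.gauss(TRUTH, 12) for _ in range(9)]  # nine non-experts
    crowd.append(random.gauss(TRUTH, 8))                 # one mid-level expert
    expert_err += abs(expert - TRUTH)
    crowd_err += abs(sum(crowd) / len(crowd) - TRUTH)

print("expert mean error: %.2f" % (expert_err / TRIALS))  # ~4.0
print("crowd mean error:  %.2f" % (crowd_err / TRIALS))   # ~2.9
```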