Jan_Kulveit comments on Request for comments: EA Projects evaluation platform

Jan_Kulveit 20 Mar 2019 23:26 UTC
10 points
To make the discussions more useful, I’ll try to briefly recapitulate parts of the discussions and conversations I had about this topic in private or via comments in the draft version. (I’m often coalescing several views into a more general claim.)
There seems to be some disagreement about how rigorous and structured the evaluations should be. You can imagine a scale where on one side you have just unstructured discussion on the forum, and on the opposite side you have “due diligence”: multiple evaluators writing detailed reviews, a panel of forecasters, and so on.
My argument is: unstructured discussion on the forum is something we already have, and often the feedback project ideas get is just a few bits from voting, plus a few quick comments. Also, the prevailing sentiment of comments is sometimes at odds with expert views or models likely used by funders, which may cause some bad surprises. That is too “light”. The process proposed here is closer to the “heavy” end of the scale. My reasoning is that it seems easier to tune the “rigour” parameter down than up, and trying it on a small batch has higher learning value.
Another possible point of discussion is whether the evaluation system would work better if it was tied to some source of funding. My general intuition is this would create more complex incentives, but generally I don’t know and I’m looking for comments.
Some people expressed uncertainty about whether there is a need for such a system. Some doubt it because they believe that there aren’t many good project ideas or projects (especially unfunded ones). Others doubt it because they feel proposed projects are almost all good, there are almost no dangerous project ideas, and even small funders can choose easily. I don’t have good data, but I would hope that having largely public evaluations could at least help everyone to be better calibrated. Also, when comparing the “EA startup ecosystem” with the normal startup ecosystem, it seems we are often lacking what is provided by lead investors, incubators, or mentors.
- Habryka [Deactivated] 20 Mar 2019 23:48 UTC
  19 points
  (Here are some of the comments that I left on the draft version of this proposal that I was sent, split out over multiple comments to allow independent voting):
  [Compared to an open setup where any reviewer can leave feedback on any project in an open setting like a forum thread] An individual reviewer and a board is much less likely to notice problems with a proposal, and a one-way publishing setup is much more likely to cause people to implement bad projects than a setup where people are actively trying to coordinate work on a proposal in a consolidated thread. In your setup the information is published in at least two locations, and the proposal is finalized and no longer subject to additional input from evaluators, which seems like it would very likely cause people to just take an idea and run with it, without consulting with others whether they are a good fit for the project.
  Setting up an EA Forum thread with good moderation would take a lot less than 20 hours.
  You are also planning to spend over 40 hours on evaluating projects for the first phase, which is quite costly, and you are talking about hiring people part-time. [...]
  From my perspective, having this be in the open makes it a lot easier for me and other funders in the space to evaluate whether the process is going well, whether it is useful, or whether it is actively clogging up the EA funding and evaluation space. Doing this in distinct stages, and with most of the process being opaque, makes it much harder to figure out the costs of this, and the broader impact it has on the EA community, moving the expected value of this into the net-negative.
  I also think that we already have far too many random EA websites that are trying to do a specialized thing, and not doing it super well. The whole thing being on a separate website will introduce a lot of trivial inconveniences into the process that could be avoided by just having all of it directly on the EA Forum.
  To put this into perspective, you are requesting a total of about 75 to 160 hours of volunteer labor for just this first round of evaluations, not counting the input from the panel, which will presumably have to include already very busy people who can judge proposals. That in itself is almost a full month of work, and in doing this you are depleting the limited goodwill of effective altruists, and the even more limited resource of the busiest people’s time to help with initiatives like this.
  - Jan_Kulveit 21 Mar 2019 0:07 UTC
    1 point
    As I’ve already explained in the draft, I’m still very confused by what
    “An individual reviewer and a board is much less likely to notice problems with a proposal than a broad discussion with many people contributing would …”
    should imply for the proposal. Do you suggest that steps 1b, 1d, and 1e are useless or harmful, and that having just the forum discussion is superior?
    “The time of evaluators is definitely definitely definitely not free, and if you treat them as free then you end up exactly in the kind of situation that everyone is complaining about. Please respect those people’s time.”
    Generally I think this is a quite strange misrepresentation of how I value people’s time and attention. Also, I’m not sure whether you assume the time people spend arguing on fora is basically free or does not count because it is unstructured.
    “From my perspective, having this be in the open makes it a lot easier for me and other funders in the space to evaluate whether the process is going well, whether it is useful, or whether it is actively clogging up the EA funding and evaluation space. Doing this in distinct stages, and with most of the process being opaque, makes it much harder to figure out the costs of this, and the broader impact it has on the EA community, moving the expected value of this into the net-negative.”
    Generally almost all of the process is open, so I don’t see what should be changed. If the complaint is that the process has stages instead of unstructured discussion, and that this makes it less understandable for you, I don’t see why.
    - Habryka [Deactivated] 21 Mar 2019 0:44 UTC
      20 points
      My overall sense of this is that I can imagine this process working out, but the first round of this should ideally just be run by you and some friends of yours, and should not require 100+ hours of volunteer time. My expectation is that after you spend 10 hours trying to actually follow this process, with just one or two projects, on your own or with some friends, you will find that large parts of it won’t work as you expected and that the process you designed is far too rigid to produce useful evaluations.
    - Habryka [Deactivated] 21 Mar 2019 0:33 UTC
      13 points
      “As I’ve already explained in the draft, I’m still very confused by what [...] should imply for the proposal. Do you suggest that steps 1b, 1d, and 1e are useless or harmful, and that having just the forum discussion is superior?”
      I am suggesting that they are probably mostly superfluous, but more importantly, I am suggesting that a process that tries to separate the public discussion into a single stage, that is timeboxed at only a week, will prevent most of the value of public discussion, because there will be value from repeated back and forth at multiple stages in this process, and in particular value from integrating the step of finding a team for a project with the process of evaluating a proposal.
      To give you an example, I expect that someone will have an idea for a project that is somewhat complicated, and will write an application trying their best to explain it. I expect that for the majority of projects the evaluators will misunderstand what the project is about (something I repeatedly experienced for project proposals on the LTF-Fund), and will then spend 2-5 hours writing a negative evaluation of a project that nobody thought was a good idea. The original person who proposed the project will then comment during the public discussion stage and try to clarify their idea, but since this process currently assigns most of the evaluators’ and board members’ time to the evaluation stage, there won’t be any real way in which they can cause the evaluators to reevaluate the proposal, since the whole process is done in batches and the evaluators have only so many hours set aside (and they have already spent 2-5 hours writing an evaluation of the proposal).
      On the other hand, if the evaluators are expected to instead participate mostly in a back-and-forth discussion over the course of a week, or maybe multiple weeks, then I think most likely the evaluators would comment with some initial negative impressions of the project which would probably be written in 5-10 minutes. The person writing the proposal would respond and clarify, and then the evaluator would ask multiple clarifying questions until they have a good sense of the proposal. Ideally, the person putting in the proposal would also be the person interested in working on it, and so this back-and-forth would also allow the evaluator to determine whether this person is a good fit for the project, and allow other people to volunteer their time to participate and help with the project. The thread itself would serve as the location for other people to find interesting projects to work on, and to get up to speed on who is working on what projects.
      ---
      I also think that assigning two evaluators to each project is a lot worse than assigning evaluators in general and allowing them to chime in when they have pre-existing models for projects. I expect that if they don’t have pre-existing models in the domain that a project is in, an evaluator will find it almost impossible to write anything useful about that project, without spending many hours building basic expertise in that domain. This again suggests a setup where you have an open pool of proposals, and a group of evaluators who freely choose which projects to comment on, instead of being assigned individual projects.
      - Jan_Kulveit 21 Mar 2019 1:12 UTC
        4 points
        I don’t understand why you assume the proposal is intended as something very rigid, where e.g. if we find the proposed project is hard to understand, nobody would ask for clarification, or why you assume the 2-5h is some dogma. The back-and-forth exchange could also add up to 2-5h.
        With assigning two evaluators to each project you are just assuming the evaluators would have no say in what to work on, which is nowhere in the proposal.
        Sorry but can you for a moment imagine also some good interpretation of the proposed schema, instead of just weak-manning every other paragraph?
        - Habryka [Deactivated] 21 Mar 2019 1:39 UTC
        18 points
        I am sorry for appearing to be weak-manning you. I think you are trying to solve a bunch of important problems that I also think are really important to work on, which is probably why I care so much about solving them properly and have so many detailed opinions about how to solve them. While I do think we have strong differences in opinion on this specific proposal, we probably both agree on a really large fraction of important issues in this domain, and I don’t want to discourage you from working in this domain, even if I do think this specific proposal is a bad idea.
        Back to the object level: I think as I understand the process, the stages have to necessarily be very rigid because they require the coordination of 5+ volunteers, a board, and a set of researchers in the community, each of which will have a narrow set of responsibilities like writing a single evaluation or having meetings that need to happen at a specific point in time.
        I think coordinating that number of people naturally gives rise to very rigid structures (I think even coordinating a group of 5 full-time staff is hard, and the amount of structure goes up drastically as individuals can spend less time), and your post explicitly says that step 1c is the step in which you expect back-and-forth with the person who proposed the project, making me think that you do not expect back-and-forth before that stage. And if you do expect back-and-forth before that stage, then I think it’s important that you figure out a way to make that as easy as possible, and given the difficulty of coordinating large numbers of people, I think if you don’t explicitly plan for making it easy, it won’t happen and won’t be easy.
        - Jan_Kulveit 21 Mar 2019 2:07 UTC
        3 points
        I don’t see why continuous coordination of a team of about 6 people on Slack would be very rigid, or why people would have very narrow responsibilities.
        For the panel, having some defined meeting and evaluating several projects at once seems time and energy conserving, especially when compared to the same set of people watching the forum often, being manipulated by karma, being in a way forced to reply to many bad comments, etc.
    - Habryka [Deactivated] 21 Mar 2019 0:38 UTC
      12 points
      “Generally almost all of the process is open, so I don’t see what should be changed. If the complaint is that the process has stages instead of unstructured discussion, and that this makes it less understandable for you, I don’t see why.”
      One part of the process that is not open is the way the evaluators are writing their evaluations, which is, as I understand it, where the majority of person-time is being spent. It also seems that all the evaluations are going to be published in one big batch, so that feedback on the evaluation process could not be acted on until the next complete grant round, which is presumably multiple months into the future.
      The other part of the process that is not open is these two stages:
      “1d. A panel will rate the proposal, utilizing the information gathered in phases b. and c., highlighting which part of the analysis they consider particularly important. (90m / project)
      1e. In case of disagreement among the panel, the question will get escalated and discussed with some of the more senior people in the field.”
      I expect the time of the panel, as well as the time of the more senior people in the field are the most valuable resources that could be wasted by this process, and the current process gives very little insight into whether that time is well-spent or not. In a simple public forum setup, it would be easy to see whether the overall process is working, and whether the contributions of top people are making a significant difference.
      - Jan_Kulveit 21 Mar 2019 0:50 UTC
        2 points
        With the first part, I’m not sure what you would imagine as the alternative. Having access to the evaluators’ Google Drive so you can count how much time they spent writing? The time estimate is an estimate of how much time it can take for volunteer evaluators. If all you need is on the order of 5 minutes, you are either really fast or not explaining your decisions.
        I expect much more expert time will be wasted in the forum discussions you propose.
        - Habryka [Deactivated] 21 Mar 2019 1:27 UTC
        8 points
        I think in a forum discussion, it’s relatively easy to see how much someone is participating in the discussion, and to get a sense of how much time they spent on stuff. I am not super confident that less time would be wasted in the forum discussions I am proposing, but I am confident that I and others would notice if lots of people’s time was wasted, which is something I am not at all confident about for your proposal and which strongly limits the downside for the forum case.
        - Jan_Kulveit 21 Mar 2019 1:56 UTC
        3 points
        On the contrary: on Slack, it is relatively easy to see the upper bound of attention spent. On the forum, you should look not just at the time spent writing comments, but also at the time and attention of people not posting. I would be quite interested in how much time, for example, CEA+FHI+GPI employees spend reading the forum, in aggregate (I guess you can technically count this).
        - Habryka [Deactivated] 21 Mar 2019 2:04 UTC
        4 points
        *nods* I do agree that you, as the person organizing the project, will have some sense of how much time has been spent, but I think it won’t be super easy for you to communicate that knowledge, and it won’t by default help other people get better at estimating the time spent on things like this. It also requires everyone watching to trust you to accurately report those numbers, which I do think I do, but I don’t think everyone necessarily has reason to.
        I do think on Slack you also have to take into account the time of all the people not posting, and while I do think that there will be more time spent just reading and not writing on the EA Forum, I generally think the time spent reading is usually worth it for people individually (and importantly people are under no commitment to read things on the EA Forum, whereas the volunteers involved here would have a commitment to their role, making it more likely that it will turn out to be net-negative for them, though I recognize that there are some caveats where sometimes there are controversial topics that cause a lot of people to pay attention to make sure that nothing explodes).
    - Habryka [Deactivated] 21 Mar 2019 0:41 UTC
      1 point
      Nevermind.
- Habryka [Deactivated] 21 Mar 2019 0:10 UTC
  15 points
  To respond more concretely to the “due diligence” vs. unstructured discussion section, which I think refers to some discussion Jan and I had on the Google Doc he shared:
  I think the thing I would like to see is something that is just a bit closer towards structured discussion than what we currently have on the forum. I think there doesn’t currently exist anything like an “EA Forum Project discussion thread” and in particular not one that has any kind of process like
  “One suggestion for a project per top-level comment. If you are interested in working on a project, I will edit the top-level post to reflect that you are interested in working on it. If you want to leave a comment anonymously, please use this form.”
  I think adding a tiny bit of process like this will cause there to be valuable discussion, will actually be better at causing good projects to be funded and for teams to start working on it, and is much less effort to set up than the process you are proposing here.
  I am also worried that this process, even though it is already 7 stages long and involves at least 10 people, covers less than half of the actual pipeline towards causing people to work on projects. I know that you want to explicitly separate the evaluation of projects from the evaluation of teams working on those projects, but I don’t think you can easily do that.
  I think 90% of the time whether a project is good or bad depends on the team that wants to work on it, which is something that you see strongly reflected in the startup and investment world. It’s extremely rare for a VC to fund or even evaluate a project without knowing what team is working on it, and I think you will find that, for any evaluation that doesn’t include matching teams with projects, that missing part will quickly block any progress.
  - Aaron Gertler 🔸 21 Mar 2019 7:10 UTC
    16 points
    I share Habryka’s concern for the complexity of the project; each step clearly has a useful purpose, but it’s still the case that adding more steps to a process will tend to make it harder to finish that process in a reasonable amount of time. I think this system could work, but I also like the idea of running a quick, informal test of a simpler system to see what happens.
    Habryka, if you create the “discussion thread” you’ve referenced here, I will commit to leaving at least one comment on every project idea; this seems like a really good way to test the capabilities of the Forum as a place where projects can be evaluated.
    (It would be nice if participants shared a Google Doc or something similar for each of their ideas, since leaving in-line comments is much better than writing a long comment with many different points, but I’m not sure about the best way to turn “comments on a doc” into something that’s also visible on the Forum.)
    - Habryka [Deactivated] 21 Mar 2019 22:03 UTC
      10 points
      I am currently quite busy, so only 50% on me finding the time to do it, but I will seriously look into making the time for this. I am also happy to chat to anyone else who wants to do this, and help out both with setting it up, and to participate in the thread.
    - Jan_Kulveit 21 Mar 2019 22:55 UTC
      7 points
      FWIW, part of my motivation for the design was:
      1. there may be projects, mostly in long-term, x-risk, meta- and outreach spaces, which are very negative, but not in an obvious way
      2. there may be ideas, mostly in long-term and x-risk, which are infohazards
      The problem with 1. is that most of the EV can be caused by just one project with a large negative impact, where the downside is not easy to notice.
      It seems to me standard startup thinking does not apply here, because startups generally cannot go far below zero.
      I also do not trust an arbitrary set of forum users to handle this well.
      Overall I believe the very lightweight unstructured processes are trading some gain in speed and convenience in most cases for some decreased robustness in worst cases.
      In general I would feel much better if the simple system you want to try would avoid projects in long-term, x-risk, meta-, outreach, localization, and “searching for cause X” areas.
- Habryka [Deactivated] 20 Mar 2019 23:40 UTC
  13 points
  Here are some of the comments that I left on the draft version of this proposal that I was sent (split out over multiple comments to allow independent voting):
  I continue to think that just having an open discussion thread, with reviewers participating in the discussion with optional private threads, will result in a lot more good than this.
  Based on my experience with the LTF-Fund, I expect 90% of the time there will be one specific person who you need a 5 minute judgement from to judge a project, much more than you need a 2-5h evaluation. This makes an open setup where all evaluators can see all applications and provide input on the things they are particularly suited to contribute to a lot more valuable than an assignment process.
  A simple invite-only, or fully open, discussion is also much easier to test than a more elaborate evaluation system, and I think you are overestimating the risk from infohazards and PR risk after some initial screening.
  I do think it is important to allow reviewers to be completely anonymous when participating in the discussion.
  After some more discussion:
  My perspective is more that the simple intervention of:
  “Create an EA Forum projects thread, with some incentive for people to leave reviews of projects”
  should be tried before you do something as complicated as this. I agree that the resulting incentives can be messy, but I expect we will get a lot more data and information on what is important than we would by spending 20-50 hours of competent-person time on producing reviews plus setting up a vetting process, plus setting up a website, plus setting up a panel, plus setting up an infohazard policy before we try the obvious solution to the problem that takes 5 hours to implement.
  [...]
  I am pretty excited about someone just trying to create and moderate a good EA Forum thread, and it seems pretty plausible to me that the LTF fund would be open to putting something in the $20k ballpark into incentives for that
  - Jan_Kulveit 21 Mar 2019 0:52 UTC
    7 points
    I would be curious about your model of why the open discussion we currently have does not work well, like here, where user nonzerosum proposed a project and the post was heavily downvoted (at some point to negative karma) without substantial discussion of the problems. I don’t think the fact that I read the post after three days and wrote some basic critical argument is good evidence for the claim that “an individual reviewer and a board is much less likely to notice problems with a proposal than a broad discussion with many people contributing would”.
    Also when you are making these two claims
    “Setting up an EA Forum thread with good moderation would take a lot less than 20 hours.”
    ...
    “I am pretty excited about someone just trying to create and moderate a good EA Forum thread, and it seems pretty plausible to me that the LTF fund would be open to putting something in the $20k ballpark into incentives for that”
    at the same time I would guess it probably needs more explanation from you or other LTF managers.
    Generally I’m in favour of solutions which are quite likely to work as opposed to solutions which look cheap but are IMO likely worse.
    I also don’t see how complex discussion on the forum with the high quality reviews you imagine would cost 5 hours. Unless, of course, the time and attention of the people who are posting and commenting on the forum does not count. If this is the case, I strongly disagree. The forum is actually quite costly in terms of time, attention, and also emotional impacts on people trying to participate.
    - Habryka [Deactivated] 21 Mar 2019 1:23 UTC
      33 points
      “I also don’t see how complex discussion on the forum with the high quality reviews you imagine would cost 5 hours.”
      I think an initial version of the process, in which you, plus maybe one or two close collaborators, would play the role of evaluators and participate in an EA Forum thread, would take less than 5 hours to set up and less than 15 hours to actually execute and write reviews for, and I think it would give you significant evidence about what kind of evaluations will be valuable and what the current bottlenecks in this space are.
      “I would be curious about your model of why the open discussion we currently have does not work well, like here, where user nonzerosum proposed a project and the post was heavily downvoted (at some point to negative karma) without substantial discussion of the problems.”
      I think that post is actually a good example of why a multi-stage process like this will cause a lot of problems. I think the best thing for nonzerosum to do would have been to create a short comment or post, maybe two to three paragraphs, in which he explained the basic idea of a donor list. At this point, he would not have been super invested in it, and I think if he had posted only a short document, people would have reacted with openness and told him that there has been a pretty long history of people trying to make lots of EA donor coordination platforms, and that there are significant unilateralist’s-curse-like problems. I think the downvotes and negative reaction came primarily from people perceiving him to be prematurely charging ahead with a project.
      I do think you need some additional incentive for people to actually write up their thoughts in addition to just voting on stuff, which is why a volunteer evaluator group, or maybe some kind of financial incentive, or maybe just some kind of modifications to the forum software (which I recognize is not something you can easily do but which I have affordances for), is a good idea. But I do think you want to be very hesitant to batch the reviews too much, because as I mentioned elsewhere in the thread, there is a lot of value from fast feedback loops in this evaluation process, as well as allowing experts in different domains to chime in with their thoughts.
      And we did see exactly that. I think the best comments (next to yours) on that post are Ben West’s and Aaron Gertler’s, which were both written relatively soon after the post was written (and I think would have been written even if you hadn’t written yours) and concisely explained the problems with the proposal. I don’t think a delay of 2-3 days is that bad, and overall I think nonzerosum successfully received the feedback that the project needed. I do think I would like to ensure that people proposing projects feel less punished for doing so, but I think that can easily be achieved by establishing a space in which there is common knowledge that a lot of proposals will be bad and have problems, and that a proposal being posted in that space does not mean that everyone has to be scared that someone will rush ahead with that proposal and potentially cause a lot of damage.
      If I understood your setup correctly, it would have potentially taken multiple weeks for nonzerosum to get feedback on their proposal, and the response would have come in the form of an evaluation that took multiple hours to write, which I don’t think would have benefited anyone in this situation.