Overall thoughts
Thanks, I found this post interesting.
I don’t know what I think about the reasonableness of these specific evaluations, about how useful this sort of evaluation approach is, or about whether I’d like to see more of this sort of thing in future and exactly what form it should take. (To be clear, I literally just mean “I don’t know”, rather than meaning “I think this all sucks, but I’m being polite.”) But I think it’s plausible that this or something like it would be very valuable and should be scaled up substantially, so I think exploring the idea at least a bit is definitely worthwhile in expectation.
I’d be interested to hear roughly how long this whole process took you (or how long it took minus writing the actual post, or something)? This seems relevant to how worthwhile and scalable this sort of thing is.
(Of course, the process may become much faster as the people doing it become more experienced, better tools or templates for it are built, etc. But it may also become slower if one aims for more rigour / less pulling things out of thin air. In any case, I think how long this early attempt took should give at least a rough idea.)
I also had a bunch of reactions that aren’t especially important since they’re focused on specific points about each evaluation, rather than on the basic methods and how this sort of analysis can be useful. I’ll split them into separate comments.
Recently Nuño asked me to do similar (but shallower) forecasting for ~150 project ideas. It took me about 5 hours. I think I could have done the evaluation faster, but I left ~paragraph-long comments on roughly ⅓ to ½ of the projects and sentence-long comments on most of the others; I didn’t do any advanced modeling or guesstimating.
Maybe an afternoon for the initial version, and then two weeks of occasional tweaks. Say 10h to 30h in total? I imagine that if one wanted to scale this, one could get it to 30 mins to an hour for each estimate.
That seems promisingly fast to me, given that this was an early attempt and could probably be sped up (holding quality/rigour constant) by experience, tools, templates, etc. So that updates me a bit further towards enthusiasm about this general idea.
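For a rough sense of scale, the figures quoted in this thread can be turned into per-item rates. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and applying the 30 to 60 minute rate to a batch of 150 items is a hypothetical comparison of my own rather than something anyone above proposed.

```python
def minutes_per_item(total_hours, n_items):
    """Average minutes spent per item, given total hours and item count."""
    return total_hours * 60 / n_items


def total_hours_needed(n_items, minutes_each):
    """Total hours needed to cover n_items at a fixed per-item rate."""
    return n_items * minutes_each / 60


# Shallow pass from the thread: ~150 project ideas in about 5 hours.
shallow_rate = minutes_per_item(5, 150)

# Hypothetical: the same 150 items at the deeper 30 to 60 minute rate.
deep_low = total_hours_needed(150, 30)
deep_high = total_hours_needed(150, 60)

print(f"shallow pass: {shallow_rate:.1f} min per item")
print(f"deeper pass over 150 items: {deep_low:.0f} to {deep_high:.0f} hours")
```

So the shallow pass works out to about 2 minutes per item, while the deeper rate would put a batch of the same size at roughly 75 to 150 hours.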
I’d also note that the larger goals are to scale in non-human ways. If we have a bunch of examples, we could:
1) Open this up to a prediction-market style setup, with a mix of volunteers and possibly inexpensive hires.
2) As we get samples, some people could use data analysis to make simple algorithms to estimate the value of many more documents.
3) We could later use ML and similar to scale this further.
So even if each item were rather time-costly right now, this might be an important step for later. If we can’t do this even with a lot of work, that would be a significant blocker.
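To make step 2 a bit more concrete, here is a minimal sketch of what a "simple algorithm" could look like: fit a one-feature least-squares line on the documents that already have human evaluations, then use it to estimate the rest. Everything specific here is an assumption for illustration: the single numeric feature, the function names, and the toy numbers are all placeholders, not real evaluation data or an actual proposed method.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance; cannot fit a slope")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return a, b


def predict(model, x):
    """Estimate the value of an unevaluated document from its feature."""
    a, b = model
    return a + b * x


# Placeholder numbers only: a hypothetical feature of each evaluated
# document, paired with the value a human evaluator assigned to it.
evaluated_feature = [1.0, 2.0, 3.0, 4.0]
evaluated_value = [2.0, 4.0, 6.0, 8.0]

model = fit_line(evaluated_feature, evaluated_value)

# Documents nobody has evaluated yet, described by the same feature.
unevaluated_feature = [5.0, 10.0]
estimates = [predict(model, x) for x in unevaluated_feature]
print(estimates)
```

A real version would presumably need more features, uncertainty estimates, and a check against held-out human evaluations, but the shape of the pipeline (evaluate a sample by hand, fit, extrapolate) would be similar.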
https://www.lesswrong.com/posts/kMmNdHpQPcnJgnAQF/prediction-augmented-evaluation-systems