Hi Michael, sorry for coming back a bit late. I was flying out of SF.
For costs, I'm going to stand strongly by my number here. In fact, I think it should be $26k. I treated everyone the same and counted the value of their time, at their suggested rate, for the amount of time they were doing the work. This affected everyone, and I think it is a much more accurate way to measure things. It affected AI Species and Doom Debates quite severely as well, more so than you and others. I briefly touched on this in the post, but I'll expand here. The goal of this exercise isn't to measure money spent vs. output, but rather cost effectiveness per resource put in. If time is put in unpaid, this should be accounted for, since it isn't going to be unpaid forever. Otherwise, cost effectiveness can be "gamed": you can increase your numbers for a time by not paying yourself. Even if you never planned to take funding, you could spend your time doing other things, so there is still a tradeoff. It's natural/normal for projects to be unpaid at the start and apply for funding later, and for a few months of work to go unpaid. For example, I did this work unpaid, but if I am going to continue to do CEA for longtermist stuff, I will expect to be paid eventually.
In your case, your first TikTok was made on June 15, and given that you post ~1/day, I assume you basically made the short on the same day. Given that I ran your calculations on Sept 9–10, that's 13 weeks. In your Manifund post, you are asking for $2k/week, so I take that to be your actual cost of doing the work. I'm not simply measuring your "work done on the grant" and accepting the time you did for free beforehand.
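For concreteness, here is the imputed-cost arithmetic as a minimal sketch (the 13 weeks and the $2k/week rate are the figures above; the rest is just multiplication):

```python
# Impute the cost of unpaid time at the creator's own asking rate.
weeks_worked = 13        # June 15 to Sept 9-10, roughly 13 weeks
rate_per_week = 2_000    # $2k/week, taken from the Manifund ask
imputed_cost = weeks_worked * rate_per_week
print(imputed_cost)      # 26000 -> the $26k figure used above
```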
2. I'm happy to take data corrections. Undoubtedly, I have made some mistakes, since not everyone responded to me, data is a bit messy, etc.
A) For the FLI podcast, I ran numbers for the whole channel, not just the podcast. That means their viewer minutes are calculated over the whole channel. They haven't gotten back to me yet, so I hope to update their numbers when they do. I agree that their metrics have a wide error bar.
B) I was in the process of coming up with better estimates of viewer minutes based on video/podcast length, but I stopped because people were responding to me, and I thought it better to just use accurate numbers. I stand by this decision, though I acknowledge the tradeoff.
C) If a video has "inflated" views due to paid advertising, that's fine. It shows up in the cost part of cost-effectiveness. For example, Cognitive Revolution boosts their videos with advertising, and that's part of their costs. I don't think it's a problem that some viewers are paid for; maybe they see a video they otherwise wouldn't have. That's fine. I also acknowledge that others may feel differently about how this translates to impact. That said, no, this won't reduce QAVM/$ to simply the cost of ads. Ads just don't work very well without organic views.
3. For Rational Animations showing up low on the list, the primary reason is that they spend a boatload of money; nobody else comes close. I'm not saying that's bad. It's just a fact. They spend more than everyone else combined. Since I am dividing by dollars, they get a lower VM/$ and thus QAVM/$.
If you wish, you can simply look at VM/$. They score low here too (8th, same as adjusted).
As for giving Robert Miles a high ranking, this came about because Austin really thought Dwarkesh was an AI safety YouTuber, and so I asked ~50 people different variants of the question "Who is your favourite AI safety creator?", "Which AI safety YouTubers did/do you watch?", etc. It's hard to overstate this; Robert Miles was the first person EVERYONE mentioned. I found this surprising since, well, his view counts don't bear that out. Furthermore, 3 people told me that they work in AI safety because of his videos. I think there is a good case that his adjustment factors should be far HIGHER, not lower.
4. Regarding your weights: I encourage people (and did so in the post) to give their own weights to channels. For this exercise, I watched a lot of AI safety content to get a sense of what is out there. My quality weights were based on this (and on discussions with Austin and others). I encourage you to consider each weight separately. Austin added the "total" quality factor at the end; I kind of didn't want it, since I thought it could lead to this. For audience quality, I looked at things like TikTok viewership vs. YouTube/Apple Podcasts. For message fidelity, respectfully, you're just posting clips of other podcasts and such, and this just doesn't do a great job of getting a message across. For fidelity of message, everyone but Rob Miles got <0.5, since I am comparing to Rob; with a different reference video, I would get different results. For Qm, your number is very similar to others', but even still, I found the message to not be the best.
Again, most of the quality factor is being driven by audience quality, and yes, shorts just have a far lower audience quality.
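For readers who want to plug in their own weights, here is a minimal sketch of how I think about the adjustment, assuming the three quality factors simply multiply into the viewer-minute total (the multiplicative form and the example numbers are illustrative, not necessarily the exact spreadsheet formula or anyone's actual scores):

```python
def qavm_per_dollar(viewer_minutes, cost, q_audience, q_fidelity, q_message):
    """Quality-adjusted viewer minutes per dollar, assuming the quality
    factors combine multiplicatively (an illustrative simplification)."""
    return viewer_minutes * q_audience * q_fidelity * q_message / cost

# Placeholder inputs only; swap in your own weights for any channel.
print(qavm_per_dollar(viewer_minutes=1_000_000, cost=26_000,
                      q_audience=0.3, q_fidelity=0.5, q_message=0.8))
```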
On the data errors, as expressed above, I don't think I made data errors. I get the sense, while reading this, that you feel I was "out to get you" or something and was being willfully biased. I want to assure you that this wasn't the case. Lots of different creators have issues with how I decided to do this analysis, and in general, they wanted the analysis to be done in a way that would give them better numbers. I think that's partly human nature, and also that they likely made their content with their own assumptions in mind. In the end, I settled on this process as what I (and Austin) found to be the most reasonable, taking everything we learned into account. I am not saying my analysis is the be-all and end-all, or that it should dictate where Open Phil money goes tomorrow until further analysis is done.
I hope that explains/answers all your points. I am happy to engage further.
1) Feel free to use $26k. My main issue was that you didn't ask me for my viewer minutes for TikTok (EDIT: didn't follow up to make sure I gave you the viewer minutes for TikTok) and instead used a number that is off by a factor of 10. Please use a correct number in future analysis. For June 15 – Sep 10, that's 4,150,000 minutes, meaning a VM/$ of 160 instead of 18 (details here).
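(A quick arithmetic check on that figure, using the corrected minutes and the $26k cost above; the rounding is mine:)

```python
# Corrected TikTok viewer minutes for June 15 - Sep 10, over the $26k imputed cost.
viewer_minutes = 4_150_000
cost = 26_000
print(round(viewer_minutes / cost))  # ~160 VM/$, vs. the 18 in the original table
```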
A) Your screenshots of Google Sheets say "FLI podcast", but you ran your script on the entire channel. And you say that the budget is $500k. Can you confirm what you're trying to measure here? The entire video work of FLI? Just the podcast? If you're trying to get the entire channel, is the budget really $500k for the entire thing? I'm confused.
B) If you use accurate numbers for some things and estimates for others, I'd make sure to communicate explicitly which ones are which. Even then, when you compare estimates and real numbers, there's a risk that your estimates are off by a huge factor (as happened with my TikTok numbers), which makes me question the value of the comparisons.
C) Let me try to be clearer regarding paid advertising:
If some of the watchtime estimates you got from people are (views * 33% of length), and they pay $X per view (the fixed cost of ads on YouTube), then the VM/$ will be: [nb_views * 33% * length / total_cost] = [nb_views * 33% * length] / [nb_views * X] = [33% * length / X]. Which is why I say it's basically the cost of ads. (Note: I didn't include organic views above because I'm assuming they're negligible compared to the inorganic ones. If you want examples of videos where I see mostly inorganic views, I'll send them by DM.)
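Here is a minimal sketch of that cancellation, with made-up numbers; the 33%-of-length estimate and the $X-per-view cost are the assumptions stated above:

```python
# If watch time is estimated as views * 33% of length and every view costs $X in ads,
# views cancel out and VM/$ collapses to (0.33 * length) / X, independent of reach.
length_minutes = 10     # hypothetical video length
cost_per_view = 0.05    # hypothetical $X paid per view
for views in (10_000, 100_000, 1_000_000):
    vm_per_dollar = views * 0.33 * length_minutes / (views * cost_per_view)
    print(views, round(vm_per_dollar, 1))  # prints 66.0 for every view count
```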
For the cases where you got actual watchtime numbers instead of multiplying the length by a constant or using a script (say, someone tells you they have Y hours total on their channel), or where the ads lead to real organic views, your reasoning around ads makes sense, though I'd still argue that in terms of impact the engagement is low / pretty disastrous in some cases, and does not translate to things we care about (like people taking action).
3. I think the questions "Who is your favourite AI safety creator?" or "Which AI safety YouTubers did/do you watch?" are heavily biased towards Robert Miles, as he is (and has basically been for the past 8 years) the only "AI Safety YouTuber" (i.e., making purely talking-head videos about AI safety; in comparison, RA is a team). So I think that based on these questions it's quite likely he'd be mentioned, though I agree that 50 people saying his name first is important data that needs to be taken into account.
That said, I'm trying to wrap my head around how to go from your definition of "quality of audience" to "Robert Miles was chosen by 50 different people to be their favorite YouTuber, as the first person mentioned". My interpretation is that you're saying: 1) you've spoken to 50 people who work in AI safety, 2) they all mentioned Rob as the canonical YouTuber, so "therefore" 3) Rob has the highest quality audience? (cf. you wrote in the OP, "This led me to make the 'audience quality' category and rate his audience much higher.")
My model for how this claim could be true is that 1) you asked 50 people whom you thought were all a "high quality" audience, 2) they all mentioned Rob and nobody else (or rarely anyone else), so 3) you inferred "high quality audience → watches Rob", and therefore 4) also inferred "watches Rob → high quality"?
4. Regarding weights, also respectfully, I did indeed look at them individually. You can check my analysis for what I think the TikTok individual weights should be here. For YouTube, see here. Regarding your points:
I have posted in my analysis of TikTok a bunch of datapoints that you probably don't have, showing that my audience is mostly older, high-income people from richer countries, which is unusually good for TikTok. That is why I put 3 instead of your 2.
"You're just posting clips of other podcasts and such and this just doesn't do a great job of getting a message across" → the clips that end up making up the majority of viewer minutes are actually quite high fidelity, since they're quite long (2–4 minutes) and get the message across more crisply than the average podcast minute. Anyway, once you look at my TikTok analysis you'll see that I ended up dividing everything by 2 so that the max-fidelity TikTok has 0.5 (same as Cognitive Revolution), which means my number is Qf=0.45 at the end (instead of your 0.1), just to be coherent with the rest of your numbers.
Qm: that's subjective, but FWIW I myself only assign 0.75 to my TikTok, not 1 (see analysis).
"Again, most of the quality factor is being done by audience quality and yes, shorts just have a far lower audience quality." → Again, respectfully, from looking at your tables I think this is false. You rank the fidelity of TikTok as 0.1, which is 5x less than 4 other channels. No other channel except my content (TikTok & YT) has less than 0.3. In comparison, if you set aside Rob's row, the audience quality varies only by 3x between my Qa for TikTok and the rest. So no, the quality factor is not mainly driven by audience quality.
On 1, with your permission, I'd like to share a screenshot of me asking you in DMs, directly, for viewer minutes. You gave me views, so I multiplied by the average TikTok length and by a factor for % watched.
On A, yes, the FLI Podcast was perhaps the data point I did the most estimating for, for a variety of reasons I explained before.
On B, I think you can, in fact, tell which are and aren't estimates, though I do understand how it's not clear. We considered ways of doing this without being messy. I'll try to make it clearer.
On C, how much you pay for a view is not a constant, though. It depends a lot on organic views. And I think boosting videos is a sensible strategy, since you put $ into both production costs (time, equipment, etc.) and advertisement. Figuring out how to spend that money efficiently is important.
On 3, many other people were mentioned. In fact, I found a couple of creators this way. But yes, it was extremely striking, which suggested that this was a very important factor in the analysis. I want to stress that I do, in fact, think this matters a lot. When Austin and I were discussing and making comparisons, we thought his quality numbers should be much higher; in fact, we toned them down, though maybe we shouldn't have.
To give clarity, I didn't seek out people who work in AI safety. Here's what I did, to the best of my recollection.
Over the course of 3 days, I asked anyone I saw in Mox who seemed friendly enough, as well as people at Taco Tuesday, and sent a few DMs to acquaintances. The DMs went to people who work in AI safety, but there were only 4 of those. So ~46 responses came from people hanging out around Mox and Taco Tuesday.
I will grant that this lends itself to an SF/AI safety bias. Now, Rob Miles' audience comes heavily from Computerphile and the like, whose audience is largely young people interested in STEM who like to grapple with interesting academic-y problems in their spare time (outside of school). In other words, this is an audience that we care a lot about reaching. It's hard to overstate the possible variance in audience "quality". For example, Jane Street pays millions in advertising to get itself seen in front of potential traders on channels like Stand-up Maths or the Dwarkesh podcast. These channels don't actually get that many views compared to others, but they clearly have a very high "audience quality", based on how much trading firms are willing to pay to advertise there. We actually thought a decent, though imperfect, metric for audience quality would just be a person's income compared to the world average of ~$12k. This meant the average American would have an audience quality of 7. Austin and I thought this might be a bit too controversial and doesn't capture exactly what we mean (we care more about attracting a poor MIT CS student than a mid-level real estate developer in Miami), but it's a decent approximation.
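(As a quick illustration of that income heuristic; the ~$84k figure for average American income is just a placeholder chosen to match the ~7x above, not a number from the analysis:)

```python
# Income-ratio heuristic for audience quality (an approximation we considered, not the final metric).
world_avg_income = 12_000   # ~$12k world average, from above
us_avg_income = 84_000      # placeholder US figure, chosen to match the ~7x above
print(round(us_avg_income / world_avg_income))  # 7 -> "audience quality of 7" for the average American
```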
Audience quality is roughly something like "the people we care most about reaching," and thus "people who can go into technical AI safety work" seems very important.
Rob wasn't the only one mentioned; the next most popular were Cognitive Revolution and AI in Context (people often said "Aric"), since I asked them to name anyone they listen to / would consider an AI safety YouTuber, etc.
On 4, I greatly encourage people to input their own weights; I specifically put that in the doc, and part of the reason for doing this project was to get people talking about cost effectiveness in AI safety.
On my bias: like all human beings, I'm flawed and have biases, but I did my best to look at the data objectively in what I thought was the best way possible. I appreciate that you talked to others regarding my intentions.
I'll happily link to my comments on Manifund (1, 2, 3) that you may be referring to, so people can see the full comments, and perhaps emphasize some points I wrote.
I want to quickly note that it's a bit unfair for me to call out only you specifically on this; rather, this is a thing I find with many AI safety projects. Your project just came up high on Manifund when I logged on for other reasons, and I saw donations from people I respect.
FWIW, I don't want to single you out; I have this kind of critique of many, many people doing AI safety work, but this just seems like a striking example of it.
I didn't mean my comments to say "you should return this money". There are lots of grants/spending in EA ecosystems that I consider wasteful, ineffective, etc. And again, apologies for singling you out over a gripe I have with EA funding.
Many people can tell you that I have a problem with the free-spending, lavish, and often wasteful side of longtermist EA. I think I made it pretty clear that I was using this RFP as an example because other regrantors gave to it.
This project with Austin was already planned before you posted your RFP on Manifund (I can provide proof if you'd like).
I wasn't playing around with the weights to make you come out lower. I assure you, my bias is usually against projects I perceive to be "free-spending".
I think it's good/natural to try to create separation between evaluators/projects, though.
For context, you asked me for data for something you were planning (at the time) to publish day-of. There's no way to get the watchtime easily on TikTok (which is why I had to add things up manually on a computer), and I was not on my laptop, so I couldn't do it when you messaged me. You didn't follow up to clarify that watchtime was actually the key metric in your system and that you actually needed that number.
Good to know that the 50 people were 4 AI safety people and 46 people who hang out at Mox and Taco Tuesday. I understand you're trying to reach the MIT graduate working in AI who might somehow transition to AI safety work at a lab / Constellation. I know that Dwarkesh & Nathan are quite popular with that crowd, and I have a lot of respect for what Aric (& co) did, so the data you collected makes a lot of sense to me. I think I can start to understand why you gave a lower score to Rational Animations or other stuff like AIRN.
I'm now modeling you as trying to answer something like "how do we cost-effectively feed AI safety ideas to the kind of people who walk in at Taco Tuesday, who have the potential to be good AI safety researchers". Given that, I can now understand better how you ended up giving higher scores to Cognitive Revolution and Robert Miles.