Thank you both for doing this, I appreciate the effort in trying to get some estimates.
However, I would like to flag that your viewer minute numbers for my short-form content are off by an order of magnitude. And I’ve done 4 full weeks on the Manifund grant, so it’s 4 * $2k = $8k, not $24k.
Plugging these numbers in (google sheets here) I get a QAVM/$ of 389 instead of the 18 you have listed.
Other data corrections:
You say that the FLI podcast has 17M views and cost $500k. However, 17M is the number of views on the overall FLI channel. If you look at the FLI podcast playlist and just add up the numbers, you get something closer to 1M (calculation here). I’m assuming the $500k comes from the podcast existing for 5 years and costing ~$100k / year? If so, this does not include the three most-viewed videos (which probably required a large budget) about slaughterbots and nuclear war (~12M of the 17M views). So really the google sheets should say 1M views, and the viewer minutes should be updated accordingly. (Unless they managed to produce all the slaughterbots / nuclear war content with that $500k budget.)
Now, regarding watch time, saying that podcasts have a 33% watch time is, I think, overly optimistic. To give you some idea, in my case a good 12m video with 40k views has ~40% watch time. And my most viewed podcasts average 12% watch time. So I’d say you’re probably off by a factor of 3 for podcast viewer minutes.
Finally, for a couple of these podcasts the views are inflated because the podcasts are promoted via paid advertising. Some creators are quite open about it, so you can ask them directly and they’ll tell you. If you really want to know whether the views are inflated, one way to determine that is to look at the likes / views ratio. For instance, if a podcast has 20,000 views but 50 likes, the ratio is 0.25%. This is 20x to 40x too low (cf. here) for a non-promoted YouTube video with the same number of views. And if you look closely at a couple of the podcasts you listed, you’ll find exactly that. There is no problem with doing that (since it probably helps growth, and it’s good to spend money to get more eyeballs), but if you’re spending money to get views that are non-organic and viewers don’t end up liking your video / engaging (and thus not driving organic reach), then your QAVM/$ is basically the cost of ads on YouTube.
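To make the heuristic concrete, here is a minimal sketch. The 5% “organic baseline” is only what the 20x-40x gap above implies, not a measured constant, and the function names are mine:

```python
def like_view_ratio(views: int, likes: int) -> float:
    """Likes per view, as a percentage."""
    return 100 * likes / views

def looks_promoted(views: int, likes: int, organic_baseline_pct: float = 5.0) -> bool:
    """Flag a video whose like/view ratio sits an order of magnitude below
    what organic videos of a similar size typically get (assumed baseline)."""
    return like_view_ratio(views, likes) < organic_baseline_pct / 10

# The example above: 20,000 views, 50 likes -> 0.25%, flagged as likely promoted
print(like_view_ratio(20_000, 50))   # 0.25
print(looks_promoted(20_000, 50))    # True
```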
If we now look at youtube explainers:
You end up ranking Rational Animations at 8th position in QAVM/$, despite it having the most views. I find this quite surprising because in my opinion RA is the highest production-value channel on the list, on par with AI in context. One factor is that you rate RA’s quality of audience as 4, compared to 12 for Robert Miles. I understand this is because you had lots of conversations where Robert Miles’ name came up and people said he had influenced them.
However, I think Robert Miles’ name comes up first in people’s minds primarily because he was the earliest AI safety YouTuber. He is also one of the only creators on the list who shows his face in his videos. So it’s not surprising to me that many people in the community learned about AI Safety through him. The 3x score difference (in quality of audience) with RA seems too high.
Regarding your weights, you place my TikTok and Youtube channels at 0.1 and 0.2 in quality respectively, which I find surprising, especially for Youtube:
My second most watched video is an interview with Robert Miles, so it would be hard to argue that my content is 60x lower quality than Robert Miles’ own videos.
Similarly, for Cognitive Revolution, we were supposed to cross-post my Evan Hubinger interview to his platform, and ended up crossposting my Owain interview instead, so can we really say that his content is 12x higher quality if (in some cases) Nathan would be happy for the content to be literally the same?
Overall, I’m a bit disappointed by your data errors given that I replied to you by DM saying that your first draft missed a lot of important factors & data, and suggested helping you / delaying publication, which you refused.
Update: I’ve now estimated the quality of my content to be Q = 6 * 0.45 * 1 = 2.7 for my long-form Youtube, and Q = 3 * 0.45 * 0.75 ≈ 1.0 for TikTok. See details here for Youtube, and here for TikTok. Using these updated weights (see “Michael’s weights” here) I get this final table.
Hi Michael, sorry for coming back a bit late. I was flying out of SF.
For costs, I’m going to stand strongly by my number here. In fact, I think it should be $26k. I treated everyone the same and counted the value of their time, at their suggested rate, for the amount of time they were doing the work. This affected everyone, and I think it is a much more accurate way to measure things. It affected AI Species and Doom Debates quite severely as well, more so than you and others. I briefly touched on this in the post, but I’ll expand here. The goal of this exercise isn’t to measure money spent vs. output, but rather cost-effectiveness per resource put in. If time is put in unpaid, this should be accounted for, since it isn’t going to be unpaid forever. Otherwise, there will be “gamed” cost-effectiveness, where you can increase your numbers for a time by not paying yourself. Even if you never planned to take funding, you could spend your time doing other things, and thus there is still a tradeoff. It’s natural/normal for projects to be unpaid at the start and apply for funding later, and for a few months of work to go unpaid. For example, I did this work unpaid, but if I am going to continue to do CEA for longtermist stuff, I will expect to be paid eventually.
In your case, your first TikTok was made on June 15, and given that you post ~1/day, I assume that you basically made the short on the same day. Given I made your calculations on Sept 9/10, that’s 13 weeks. In your Manifund post, you are asking for $2k/week, and thus I take that to be your actual cost of doing the work. I’m not simply measuring your “work done on the grant” and accepting the time you did for free beforehand.
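For what it’s worth, a quick sketch of that arithmetic (the year below is arbitrary; only the June 15 to Sept 9/10 span matters):

```python
from datetime import date

weeks = (date(2025, 9, 10) - date(2025, 6, 15)).days / 7  # ~12.4, rounded up to 13
cost = 13 * 2_000                                          # $2k/week, from the Manifund ask
print(round(weeks, 1), cost)                               # 12.4 26000
```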
2. I’m happy to take data corrections. Undoubtedly, I have made some mistakes since not everyone responded to me, data is a bit messy, etc.
A) For the FLI podcast, I ran numbers for the whole channel, not just the podcast. That means their viewer minutes are calculated over the whole channel. They haven’t gotten back to me yet so I hope to update their numbers when they do. I agree that their metrics have a wide error bar.
B) I was in the process of coming up with better estimates of viewer minutes based on video/podcast length but I stopped because people were responding to me, and I thought it better to just use accurate numbers. I stand by this decision, though I acknowledge the tradeoff.
C) If a video has “inflated” views due to paid advertising, that’s fine. It shows up in the cost part of cost-effectiveness. For example, Cognitive Revolution does boost their videos with advertising, and that’s part of their costs. I don’t think it’s a problem that some viewers are paid for; maybe they see a video they otherwise wouldn’t have. That’s fine. I also acknowledge that others may feel differently about how this translates to impact. That said, no, this won’t reduce QAVM/$ to simply the cost of ads. Ads just don’t work very well without organic views.
3. For Rational Animations showing up low on the list, the primary reason for this is that they spend a boatload of money; nobody else comes close. I’m not saying that’s bad. It’s just a fact. They spend more than everyone else combined. Since I am dividing by dollars, they get a lower VM/$ and thus QAVM/$.
If you wish, you can simply look at VM/$. They score low here too (8th, same as adjusted).
As for giving Robert Miles a high ranking, this came about because Austin really thought Dwarkesh was an AI safety YouTuber, and so I asked ~50 people different variants of the question “who is your favourite AI safety creator”, “Which AI safety YouTubers did/do you watch”, etc. It’s hard to overstate this; Robert Miles was the first person EVERYONE mentioned. I found this surprising since, well, his view counts don’t bear that out. Furthermore, 3 people told me that they work in AI safety because of his videos. I think there is a good case that his adjustment factors should be far HIGHER, not lower.
4. Regarding your weights. I encourage people (and did so in the post) to give their own weights to channels. For this exercise, I watched a lot of AI safety content to get a sense of what is out there. My quality weights were based on this (and discussions with Austin and others). I encourage you to consider each weight separately. Austin added the “total” quality factor at the end; I kinda didn’t want it since I thought it could lead to this. For audience quality, I looked at things like TikTok viewership vs. YouTube/Apple Podcasts. For message fidelity, respectfully, you’re just posting clips of other podcasts and such and this just doesn’t do a great job of getting a message across. On that factor, everyone but Rob Miles got <0.5 since I am comparing to Rob; with a different reference video, I would get different results. For Qm, your number is very similar to others’, but even so, I found the message not to be the best.
Again, most of the quality factor is being done by audience quality and yes, shorts just have a far lower audience quality.
On the data errors, as expressed above, I don’t think I made data errors. I get the sense, while reading this, that you feel I was “out to get you” or something and was being willfully biased. I want to assure you that this wasn’t the case. Lots of different creators have issues with how I decided to do this analysis, and in general, they wanted the analysis to be done in a way that would give them better numbers. I think that’s partly human nature, and also that they likely made their content with their own assumptions in mind. In the end, I settled on this process as what I (and Austin) found to be the most reasonable, taking everything we learned into account. I am not saying my analysis is the be-all and end-all, or that it should dictate where Open Phil money goes tomorrow before further analysis is done.
I hope that explains/answers all your points. I am happy to engage further.
1) Feel free to use $26k. My main issue was that you didn’t ask me for my viewer minutes for TikTok (EDIT: didn’t follow up to make sure I give you the viewer minutes for TikTok) and instead used a number that is off by a factor of 10. Please use a correct number in future analysis. For June 15 - Sep 10, that’s 4,150,000 minutes, meaning a VM/$ of 160 instead of 18 (details here).
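A minimal sketch of that calculation (viewer minutes from my TikTok analytics, cost per your $26k figure):

```python
viewer_minutes = 4_150_000   # TikTok, June 15 - Sep 10
cost = 26_000                # 13 weeks * $2k/week
print(viewer_minutes / cost) # ~160 VM/$
```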
A) Your screenshots of google sheets say “FLI podcast”, but you ran your script on the entire channel. And you say that the budget is $500k. Can you confirm what you’re trying to measure here? The entire video work of FLI? Just the podcast? If you’re trying to get the entire channel, is the budget really $500k for the entire thing? I’m confused.
B) If you use accurate numbers for some things and estimates for others, I’d make sure to communicate explicitly about which ones are which. Even then, when you compare estimates and real numbers there’s a risk that your estimates are off by a huge factor (as happened with my TikTok numbers), which makes me question the value of the comparisons.
C) Let me try to be clearer regarding paid advertising:
If some of the watchtime estimates you got from people are (views * 33% of length), and they pay $X per view (fixed cost of ads on YouTube), then the VM/$ will be: [nb_views * 33% * length] / total_cost = [nb_views * 33% * length] / [nb_views * X] = 33% * length / X. Which is why I say it’s basically the cost of ads. (Note: I didn’t include organic views above because I’m assuming they’re negligible compared to the inorganic ones. If you want examples of videos where I see mostly inorganic views, I’ll send them by DM.)
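A toy numerical check of that cancellation (all inputs made up for illustration):

```python
# Hypothetical fully-promoted episode: every view is paid at X = $0.05,
# episode length 60 minutes, 33% average watch fraction.
X, length_min, watch_frac = 0.05, 60, 0.33

for nb_views in (10_000, 100_000, 1_000_000):
    viewer_minutes = nb_views * watch_frac * length_min
    total_cost = nb_views * X                      # organic views assumed negligible
    print(nb_views, viewer_minutes / total_cost)   # always 0.33 * 60 / 0.05 = 396
```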
For the cases where you got the actual watchtime numbers instead of multiplying the length by a constant or using a script (say, someone tells you they have Y hours of total watch time on their channel), or where the ads lead to real organic views, your reasoning around ads makes sense, though I’d still argue that in terms of impact the engagement is low / pretty disastrous in some cases, and does not translate to things we care about (like people taking action).
3. I think the questions “who is your favourite AI safety creator” or “Which AI safety YouTubers did/do you watch” are heavily biased towards Robert Miles, as he is (and has basically been for the past 8 years) the only “AI Safety Youtuber” (i.e. making purely talking-head videos about AI Safety; in comparison, RA is a team). So I think based on these questions it’s quite likely he’d be mentioned, though I agree 50 people saying his name first is important data that needs to be taken into account.
That said, I’m trying to wrap my head around how to go from your definition of “quality of audience” to “Robert Miles was chosen by 50 different people to be their favorite youtuber, as the first person mentioned”. My interpretation is that you’re saying: 1) you’ve spoken to 50 people who work in AI Safety, 2) they all mentioned Rob as the canonical Youtuber, so “therefore” A) Rob has the highest quality audience? (cf. you wrote in OP “This led me to make the “audience quality” category and rate his audience much higher.”)
My model for how this claim could be true is that 1) you asked 50 people whom you thought were all a “high quality” audience, 2) they all mentioned Rob and rarely anyone else, so 3) you inferred “high quality audience ⇒ watches Rob” and therefore 4) also inferred “watches Rob ⇒ high quality”?
4. Regarding weights, also respectfully, I did indeed look at them individually. You can check my analysis for what I think the TikTok individual weights should be here. For Youtube see here. Regarding your points:
In my TikTok analysis I have posted a bunch of datapoints that you probably don’t have, showing that my audience is mostly older, high-income people from richer countries, which is unusually good for TikTok. That’s why I put 3 instead of your 2.
“you’re just posting clips of other podcasts and such and this just doesn’t do a great job of getting a message across” → the clips that end up making up the majority of viewer minutes are actually quite high-fidelity, since they’re quite long (2-4m) and get the message across more crisply than the average podcast minute. Anyway, once you look at my TikTok analysis you’ll see that I ended up dividing everything by 2 so that the max-fidelity TikTok gets 0.5 (same as Cognitive Revolution), which means my final number is Qf = 0.45 (instead of your 0.1), just to be coherent with the rest of your numbers.
Qm: that’s subjective, but FWIW I myself only give my TikTok an alignment of 0.75, not 1 (see analysis).
“Again, most of the quality factor is being done by audience quality and yes, shorts just have a far lower audience quality.” → again, respectfully, from looking at your tables I think this is false. You rank the fidelity of my TikTok as 0.1, which is 5x lower than 4 other channels. No channel except my content (TikTok & YT) has less than 0.3. In comparison, if you ignore Rob’s row, the audience quality varies only by 3x between my Qa for TikTok and the rest. So no, the quality factor is not mainly driven by audience quality.
On 1, with your permission, I’d ask if I could share a screenshot of me asking you in DMs, directly, for viewer minutes. You gave me views, and thus I multiplied them by the average TikTok length and by a factor for % watched.
On A, yes, the FLI Podcast was perhaps the data point I did the most estimating for a variety of reasons I explained before.
On B, I think you can, in fact, tell which are and aren’t estimates, though I do understand how it’s not clear. We considered ways of doing this without being messy. I’ll try to make it clearer.
On C, how much you pay for a view is not a constant though. It depends a lot on organic views. And I think boosting videos is a sensible strategy, since you put $ into both production costs (time, equipment, etc.) and advertising. Figuring out how to spend that money efficiently is important.
On 3, many other people were mentioned. In fact, I found a couple of creators this way. But yes, it was extremely striking and thus suggested that this was a very important factor in the analysis. I want to stress that I do, in fact, think this matters a lot. When Austin and I were speaking and relying on comparisons, we thought his quality numbers should be much higher; in fact, we toned them down, though maybe we shouldn’t have.
To give clarity, I didn’t seek people out who worked in AI safety. Here’s what I did to the best of my recollection.
Over the course of 3 days, I asked anyone I saw in Mox who seemed friendly enough, as well as people at Taco Tuesday, and sent a few DMs to acquaintances. The DMs I sent were to people who work in AI safety, but there were only 4. So ~46 came from people hanging out around Mox and Taco Tuesday.
I will grant that this lends to an SF/AI safety bias. Now, Rob Miles’ audience comes heavily from Computerphile and such, whose audience is largely young people interested in STEM who like to grapple with interesting academic-y problems in their spare time (outside of school). In other words, this is an audience that we care a lot about reaching. It’s hard to overstate the possible variance in audience “quality”. For example, Jane Street pays millions to get itself seen in front of potential traders on channels like Stand-Up Maths or the Dwarkesh podcast. These channels don’t actually get that many views compared to others, but they have a very high “audience quality”, clearly, based on how much trading firms are willing to pay to advertise there. We actually thought a decent, though imperfect, metric for audience quality would just be a person’s income compared to the world average of ~$12k. This meant the average American would have an audience quality of 7. Austin and I thought this might be a bit too controversial and doesn’t capture exactly what we mean (we care about attracting a poor MIT CS student more than a mid-level real estate developer in Miami), but it’s a decent approximation.
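As a sketch, that rejected proxy would have just been the ratio below; the $84k input is only 7 × $12k backed out from the figure above, not a sourced number:

```python
WORLD_AVG_INCOME = 12_000  # rough world average used above

def audience_quality_proxy(avg_audience_income: float) -> float:
    """Illustrative (and ultimately rejected) proxy: income relative to world average."""
    return avg_audience_income / WORLD_AVG_INCOME

print(audience_quality_proxy(84_000))  # 7.0, the 'average American' figure above
```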
Audience quality is roughly something like “the people we care most about reaching,” and thus “people who can go into work on technical AI safety” seems very important.
Rob wasn’t the only one mentioned; the next most popular were Cognitive Revolution and AI in context (people often said “Aric”), since I asked them to just name anyone they listen to / would consider an AI safety youtuber, etc.
On 4, I greatly encourage people to input their own weights; I specifically put that in the doc, and part of the reason for doing this project was to get people to talk about cost-effectiveness in AI safety.
On my bias: Like all human beings, I’m flawed and have biases, but I did my best to just objectively look at the data in what I thought was the best way possible. I appreciate that you talked to others regarding my intentions.
I’ll happily link to my comments on Manifund (1, 2, 3) that you may be referring to, so people can see the full comments, and perhaps emphasize some points I wrote.
I want to quickly note that it’s a bit unfair for me to specifically call you out on this alone; rather, it’s a thing I find with many AI safety projects. It just came up high on Manifund when I logged on for other reasons and I saw donations from people I respect.
FWIW, I don’t want to single you out, I have this kind of critique of many, many people doing AI safety work but this just seems like a striking example of it.
I didn’t mean my comments to say “you should return this money”. I consider lots of grants/spending in the EA ecosystem to be wasteful, ineffective, etc. And again, apologies for singling you out on a gripe I have with EA funding.
Many people can tell you that I have a problem with the lavish and often wasteful spending on the longtermist side of EA. I think I made it pretty clear that I was using this RFP as an example because other regrantors gave to it.
This project with Austin was planned to happen before you posted your RFP on Manifund (I can provide proof if you’d like).
I wasn’t playing around with the weights to make you come out lower. I assure you, my bias is usually against projects I perceive to be “free-spending”.
I think it’s good/natural to try to create separation between evaluators/projects though.
For context, you asked me for data for something you were planning (at the time) to publish the same day. There’s no way to get watchtime easily on TikTok (which is why I had to manually add things up on a computer), and I was not on my laptop, so I couldn’t do it when you messaged me. You didn’t follow up to clarify that watchtime was actually the key metric in your system and that you actually needed that number.
Good to know that the 50 people were 4 AI Safety people and 46 people who hang out at Mox and Taco Tuesday. I understand you’re trying to reach the MIT graduate working in AI who might somehow transition to AI Safety work at a lab / Constellation. I know that Dwarkesh & Nathan are quite popular with that crowd, and I have a lot of respect for what Aric (& co) did, so the data you collected makes a lot of sense to me. I think I can start to understand why you gave a lower score to Rational Animations or other stuff like AIRN.
I’m now modeling you as trying to answer something like “how do we cost-effectively feed AI Safety ideas to the kind of people who walk in at Taco Tuesday, who have the potential to be good AI Safety researchers”. Given that, I can now understand better how you ended up giving some higher score to Cognitive Revolution and Robert Miles.
Your points seem pretty fair to me. In particular, I agree that putting your videos at 0.2 seems pretty unreasonable and out of line with the other channels—I would have guessed that you’re sufficiently niche that a lot of your viewers are already interested in AI Safety! TikTok I expect is pretty awful, so 0.1 might be reasonable there.
Agreed that the quality of audience is definitely higher for my (niche) AI Safety content on Youtube, and I’d expect Q to be higher for (longform) Youtube than Tiktok.
In particular, I estimate Q(The Inside View Youtube) = 2.7, instead of 0.2, with (Qa, Qf, Qm) = (6, 0.45, 1), though I acknowledge that Qm is (by definition) the most subjective.
To make this easier to read & reply to, I’ll post my analysis for Q(The Inside View Tiktok) in another comment, which I’ll link to when it’s up. EDIT: link for TikTok analysis here.
The Inside View (Youtube) - Qa = 6
In light of @Drew Spartz’s comment (saying one way to quantify the quality of audience would be to look at the CPM [1]), I’ve compiled my CPM Youtube data and my average Playback-based CPM is $14.8, which according to this website [2] would put my CPM above the 97.5th percentile in the UK, and close to the 97.5th percentile in the US.
Now, this is more anecdotal evidence than data-based, but I’ve met quite a few people over the years (from programs like MATS, or working at AI Safety orgs) who’ve told me they discovered AI Safety from my Inside View podcast. And I expect the SB-1047 documentary to have attracted a niche audience interested in AI regulation.
Given the above, I think it would make sense to have the Qa(Youtube) be between 6 (same as other technical podcasts) and 12 (Robert Miles). For the sake of giving a concrete number, I’ll say 6 to be on par with other podcasts like FLI and CR.
The Inside View (Youtube) - Qf = 0.45
In the paragraph below I’ll say Qf_M for the Qf that Marcus assigns to other creators.
For the fidelity of message, I think it’s a bit of a mixed bag here. As I said previously, I expect the podcasts that Nathan would be willing to crosspost to be on par with his channel’s quality, so in that sense I’d say the fidelity of message for these technical episodes (Owain Evans, Evan Hubinger) is on par with CR (Qf_M = 0.5). Some of my non-technical interviews are probably closer to discussions we could find on Doom Debates (Qf_M = 0.4), though there are fewer of them. My SB-1047 documentary is probably similar in fidelity of message to AI in context (Qf_M = 0.5), and this fictional scenario is very similar to Drew’s content (Qf_M = 0.5). I’ve also posted video explainers that range from low effort (Qf around 0.4?) to very high effort (Qf around 0.5?).
Given all of the above, I’d say the Qf for the entire channel is probably around 0.45.
The Inside View (Youtube) - Qm = 1
As you say, for the alignment of message, this is probably the most subjective. I think by definition the content I post is the message that aligns the most with my values (at least for my Youtube content) so I’d say 1 here.
The Inside View (Youtube) - Q = 2.7
Multiplying these numbers I get Q = 2.7. Doing a sanity check, this is about the same as Cognitive Revolution, which doesn’t seem crazy given we’ve interviewed similar people & the cross-post argument I made before.
(Obviously if I was to modify all of these Qa, Qf, Qm numbers for all channels I’d probably end up with different quality comparisons).
[1] CPM means Cost Per Mille. In YT Studio it’s defined as “How much advertisers pay every thousand times your Watch Page content is viewed with ads.”
[2] I haven’t done extended research here and expect I’d probably get different results looking at different websites. This one was the first one I found on google so not cherry-picked.
This comment is answering “TikTok I expect is pretty awful, so 0.1 might be reasonable there”. For my previous estimate on the quality of my Youtube long-form stuff, see this comment.
tl;dr: I now estimate the quality of my TikTok content to be Q = 0.75 * 0.45 * 3 = 1
The Inside View (TikTok) - Alignment = 0.75 & Fidelity = 0.45
To estimate fidelity of message (Qf) and alignment of message (Qm) in a systematic way, I compiled my top 10 best-performing TikToks and ranked their individual Qf and Qm (see the tab called “TikTok Qa & Qf” here, which contains the reasoning for each individual number).
Update Sep 14: I’ve realized that my numbers about fidelity used 1 as the maximum, but now that I’ve looked at Marcus’ weights for other stuff, I think I should use 0.5 because that’s the number he gives to a podcast like Cognitive Revolution, and I don’t want to claim that a long tiktok clip is more high-fidelity than the average Cognitive Revolution podcast. So I divided everything by 2 so my maximum fidelity is now 0.5 to match Marcus’ other weights.
Then, by doing a minute-adjusted weighted average of the Qfs and Qms, I get:
Qf(The Inside View TikTok) = 0.45
Qm(The Inside View TikTok) = 0.75
What this means:
Since I’m editing clips, the message is already high-fidelity (it comes from the source, most of the time). The question is whether people get a high-fidelity long explanation, or something short but potentially compressed. When weighting things by minute we end up with 0.9 on my original scale (0.45 after the division by 2 above), meaning that most of the watchtime-minutes come from the high-fidelity content.
I am not always fully aligned with the clips that I post, but I am mostly aligned with them.
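For clarity, the minute-weighted average above is just the following; the per-video numbers here are placeholders for illustration, the real ones are in the sheet:

```python
# (viewer_minutes, Qf, Qm) for each top-10 TikTok; placeholder values only
videos = [
    (500_000, 0.50, 0.80),
    (300_000, 0.50, 0.70),
    (100_000, 0.30, 0.90),
]

total_minutes = sum(m for m, _, _ in videos)
qf = sum(m * f for m, f, _ in videos) / total_minutes
qm = sum(m * a for m, _, a in videos) / total_minutes
print(round(qf, 2), round(qm, 2))  # 0.48 0.78 with these placeholder values
```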
The Inside View (TikTok) - Quality of Audience = 3
I believe the original reasoning for Qa = 2 is that people watching short-form by default would be young and / or have short attention spans, and therefore be less of a high-quality audience.
However, most of my high-performing TikTok clips (which represent most of the watch time) are quite long (2m-3m30s), which makes me think the kind of audience that watches these until the end is not that different from Youtube’s.
On top of that, my audience skews towards the US (33%) and other high-income countries (more than half are in the US / Australia / UK, etc.), with 88% of my audience being over 25 and 61% above 35. (Data here.)
Therefore, in terms of quality of audience, I don’t see why the audience would be worse in quality than people who watch AI Species / AI Risk Network.
Which is why I’m estimating: Qa(The Inside View TikTok) = 3.
Conclusion
If we multiply these three numbers we get Q = 0.75 * 0.45 * 3 = 1
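Putting my two estimates side by side (same numbers as above):

```python
def quality(qa: float, qf: float, qm: float) -> float:
    return qa * qf * qm

print(quality(6, 0.45, 1.0))    # Youtube: 2.7
print(quality(3, 0.45, 0.75))   # TikTok: ~1.0 (1.0125)
```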
I struggle to imagine Qf 0.9 being reasonable for anything on TikTok. My understanding of TikTok is that most viewers will be idly scrolling through their feed, watch your thing for a bit as part of this endless stream, then continue; and even if they decide to stop for a while and get interested, it would take them long enough to switch out of endless-scrolling mode that they don’t properly engage with large chunks of the video. Is that a correct model, or do you think that e.g. most of your viewer minutes come from people who stop and engage properly?
Update: after looking at Marcus’ weights, I ended up dividing all the intermediary values of Qf I had by 2, so that it matches with Marcus’ weights where Cognitive Revolution = 0.5. Dividing by 2 caps the best tiktok-minute to the average Cognitive Revolution minute. Neel was correct to claim that 0.9 was way too high.
===
My model is that most of the viewer minutes come from people who watch the whole thing, and some decent fraction end up following, which means they’ll end up engaging more with AI-Safety-related content in the future as I post more.
Looking at my most viewed TikTok:
TikTok says 15.5% of viewers (i.e. 0.155 * 1,400,000 = 217,000) watched the entire thing, and most people who watch the first half end up watching until the end (retention is 18% at the halfway point, and 10% at the end).
And then, assuming the 11k who followed came from those 217,000 who watched the whole thing, that’s 11,000 / 217,000 ≈ 5% of the people who finished the video who end up deciding to see more stuff like that in the future.
So yes, I’d say that if a significant fraction (15.5%) watch the full thing, and 0.155 * 0.05 ≈ 0.8% of the total end up following, I think that’s “engaging properly”.
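The arithmetic, written out (numbers from the TikTok analytics above):

```python
views = 1_400_000
finish_rate = 0.155            # fraction who watched the entire video
followers_gained = 11_000

finished = views * finish_rate                        # ~217,000 people
follow_given_finished = followers_gained / finished   # ~5% of finishers follow
follow_overall = finish_rate * follow_given_finished  # ~0.8% of all viewers

print(round(finished), round(100 * follow_given_finished, 1), round(100 * follow_overall, 2))
# 217000 5.1 0.79
```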
And most importantly, most of the viewer-minutes on TikTok do come from these long videos that are 1-4 minutes long (especially ones that are > 2 minutes long):
The short / low-fidelity takes that are 10-20s long don’t get picked up by the new TikTok algorithm and don’t get many views, so they didn’t end up in that “TikTok Qa & Qf” sheet of top 10 videos (and the ones that did don’t really contribute much to the total minutes, so not to the final Qf).
To show that the Eric Schmidt example above is not cherry-picked, here is a google doc with similar screenshots of stats for the top 10 videos that I use to compute Qf. Of these 10 videos, 6 are more than 1m long, and 4 are more than 2 minutes long. The precise distribution is:
0m-1m: 4 videos
1m-2m: 2 videos
2m-3m: 2 videos
3m-4m: 2 videos
Happy for others to come up with different numbers / models for this, or play with my model through the “TikTok Qa & Qf” sheet here, using different intermediary numbers.
Update: as I said at the top, I was actually wrong to have initially said Qf=0.9 given the other values. I now claim that Qf should be closer to 0.45. Neel was right to make that comment.
Thanks for the thoughtful replies, here and elsewhere!
Very much appreciate data corrections! I think medium-to-long term, our goal is to have this info in some kind of database where anyone can suggest data corrections or upload their own numbers, like Wikipedia or Github
Tentatively, I think paid advertising is reasonable to include. Maybe more creators should be buying ads! So long as you’re getting exposure in a cost-effective way and reaching equivalently-good people, I think “spend money/effort to create content” and “spend money/effort to distribute content” are both very reasonable interventions
I don’t have strong takes on quality weightings—Marcus is much more of a video junkie than me, and has spent the last couple weeks with these videos playing constantly, so I’ll let him weigh in. But broadly I do expect people to have very different takes on quality—I’m not expecting people to agree on quality, but rather want people to have the chance to put down their own estimates. (I’m curious to see your takes on all the other channels too!)
Sorry if we didn’t include your feedback in the first post—I think the nature of this project is that waiting for feedback from everyone is going to delay our output by too much, and we’re aiming to post often and wait for corrections in public, mostly because we’re extremely bandwidth constrained (running this on something like 0.25 FTE between Marcus and me)
I would be pretty shocked if paid ads reach equivalently good people, given that these are not people who have chosen to watch the video, and may have very little interest
Thank you both for doing this, I appreciate the effort in trying to get some estimates.
However, I would like to flag that your viewer minute numbers for my short-form content are off by an order of magnitude. And I’ve done 4 full weeks on the Manifund grant, so it’s 4 * $2k = $8k, not $24k.
Plugging these numbers in (google sheets here) I get a QAVM/$ of 389 instead of the 18 you have listed.
Other data corrections:
You say that the FLI podcast has 17M views and costed $500k. However, 17M is the amount of views on the overall FLI channel. If you look at the FLI podcast playlist and just add up the numbers you get something closer to 1M (calculation here). I’m assuming the $500k come from the podcast existing for 5 years and costing ~$100k / year? If so this does not include the three most viewed videos (that probably required a large budget) about slaughter bots and nuclear war (~12M of the 17M views). So really the google sheets should say 1M views, and the viewer minutes should be updated accordingly. (Except if they managed to produce all the slaughter bots / nuclear war stuff with a $500k budget).
Now, regarding watch time, saying that podcasts have a 33% watch time is I think overly optimistic. To give you some idea, in my case a good 12m video with 40k views has ~40% watchtime. And my most viewed podcasts average 12% of watchtime. So I’d say you’re probably off by a factor of 3 for podcast viewer minutes.
Finally, for a couple of these podcasts the views are inflated because the podcasts are promoted via paid advertising. Some creators are quite open about it so you can ask them directly and they’ll tell you. If you really want to know if the views are inflated one way to determine that is to look at the likes / view ratio. For instance, if a podcast has 20,000 views but 50 likes, the ratio is 0.25%. This is 20x to 40x too low (cf. here) for a non-promoted youtube with the same amount views. And if you look closely at a couple of these podcasts you listed you’ll find exactly that. There is no problem in doing that (since it probably helps growth, and it’s good to spend money to have more eyeballs), but if you’re spending money to get views that are non-organic and viewers don’t end up liking your video / engaging (and thus not driving organic reach) then your QAVM/$ is basically the cost of ads on youtube.
If we now look at youtube explainers:
You end up ranking Rational Animations at 8th position in QAVM/$, despite it having the most views. I find this quite surprising because in my opinion RA is the highest-production channel on the list, on par with AI in context. One factor is that you rate RA’s quality of audience as 4, compared to 12 for Robert Miles. I understand this is because you had lots of conversations where Robert Miles name came up and people said he had influenced them.
However, I think Robert Miles’ name comes up first in in people’s minds primarily because he was the earliest AI safety YouTuber. He is also one of the only channels on the list that has his face in videos. So it’s not surprising to me that many people in the community learned about AI Safety through him. The 3x score difference (in quality of audience) with RA seems too high.
Regarding your weights, you place both my TikTok and Youtube channel at 0.1 and 0.2 in quality, which I find surprising, especially Youtube:
My second most watched video is an interview with Robert Miles, so it would be hard to argue that my content is 60x lower quality than Robert Miles’ own videos.
Similarly, for Cognitive Revolution, we were supposed to cross-post my Evan Hubinger interview to his platform, and ended up crossposting my Owain interview instead, so can we really say that his content is 12x higher quality if (in some cases) Nathan would be happy for the content to be literally the same?
Overall, I’m a bit disappointed by your data errors given that I replied to you by DM saying that your first draft missed a lot of important factors & data, and suggested helping you / delaying publication, which you refused.
Update: I’ve now estimated the quality for my long-form youtube content to be Q= 6*0.45*1 = 2.7 for Youtube, and
Q=3*0.9*0.75=2.0Q = 3*0.45*0.75 = 1.0 for TikTok. See details here for Youtube, and here for TikTok. Using these updated weights (see “Michael’s weights” here) I get this final table.Hi Michael, sorry for coming back a bit late. I was flying out of SF
For costs, I’m going to stand strongly by my number here. In fact, I think it should be $26k. I treated everyone the same and counted the value of their time, at their suggested rate, for the amount of time they were doing their work. This affected everyone, and I think it is a much more accurate way to measure things. This affected AI Species and Doom Debates quite severely as well, more so than you as well as others. I briefly touched on this in the post, but I’ll expand here. The goal of this exercise isn’t to measure money spent vs output, but rather cost effectiveness per resource put in. If time is put in unpaid, this should be accounted for since it isn’t going to be unpaid forever. Otherwise, there will be “gamed” cost effectiveness, where you can increase your numbers for a time by not paying yourself. Even if you never planned to take funding, you could spend your time doing other things, and thus there is still a tradeoff. It’s natural/normal for projects at the start to be unpaid and apply for funding later, and for a few months of work to go unpaid. For example, I did this work unpaid, but if I am going to continue to do CEA for longtermist stuff, I will expect to be paid eventually.
In your case, your first TikTok was made on June 15, and given that you post ~1/day, I assume that you basically made the short on the same day. Given I made your calculations on Sept 9⁄10, that’s 13 weeks. In your Manifund post, you are asking for $2k/week, and thus I take that to be your actual cost of doing work. I’m not simply measuring your “work done on the grant” and just accepting the time you did for free beforehand.
2. I’m happy to take data corrections. Undoubtedly, I have made some mistakes since not everyone responded to me, data is a bit messy, etc.
A) For the FLI podcast, I ran numbers for the whole channel, not just the podcast. That means their viewer minutes are calculated over the whole channel. They haven’t gotten back to me yet so I hope to update their numbers when they do. I agree that their metrics have a wide error bar.
B) I was in the process of coming up with better estimates of viewer minutes based on video/podcast length but I stopped because people were responding to me, and I thought it better to just use accurate numbers. I stand by this decision, though I acknowledge the tradeoff.
C) If a video has “inflated” views due to paid advertising, that’s fine. It shows up in the cost part of cost-effectiveness. For example, Cognitive Revolution, who does boosts their videos/advertising, that’s part of their costs. I don’t think its a problem that some viewers are paid for, maybe they see a video they otherwise wouldn’t have. That’s fine. I also acknowledge that others may feel differently about how this translates to impact. That said, no, this won’t reduce QAVM/$ to simply the cost of ads. Ads just don’t work very well without organic views.
3. For Rational Animations showing up low on the list, the primary reason for this is that they spend a boatload of money; nobody else comes close. I’m not saying that’s bad. Its just a fact. They spend more than everyone else combined. Since, I am dividing by dollars, they get a lower VM/$ and thus QAVM/$.
If you wish, you can simply look at VM/$. They score low here too (8th, same as adjusted).
As for giving Robert Miles a high ranking, this came about because Austin really thought Dwarkesh was an AI safety YouTuber, and so I asked ~50 people different variants of the question “who is your favourite AI safety creator”, “Which AI safety YouTubers did/do you watch”, etc. It’s hard to overstate this; Robert Miles was the first person EVERYONE mentioned. I found this surprising since, well, his view counts don’t bear that out. Furthermore, 3 people told me that they work in AI safety because of his videos. I think there is a good case that his adjustment factors should be far HIGHER, not lower.
4. Regarding your weights. I encourage people (and did so in the post) to give their own weights to channels. For this exercise, I watched a lot of AI safety content to get a sense of what is out there. My quality weights were based on this (and discussions with Austin and others). I encourage you to consider each weight separately. Austin added the “total” quality factor at the end, I kinda didn’t want it since I thought it could lead to this. For audience quality, I looked at things like TikTok viewership vs. YouTube/Apple Podcasts. For message fidelity, respectfully, you’re just posting clips of other podcasts and such and this just doesn’t do a great job of getting a message across. For fidelity of message, everyone but Rob Miles got <0.5 since I am comparing to Rob. With a different reference video, I would get different results. For Qm, your number is very similar to others but even still, I found the message to not be the best.
Again, most of the quality factor is being done by audience quality and yes, shorts just have a far lower audience quality.
On the data errors, as expressed above, I don’t think I made data errors. I get the sense, while reading this, that you feel I was “out to get you” or something and was being willfully biased. I want to assure you that this wasn’t the case. Lots of different creators have issues with how I decided to do this analysis, and in general, they wanted the analysis to be done in a way that would give them better numbers. I think that’s human nature, partially, and also that they likely made their content with their assumptions in mind. In the end, I settled on this process as what I (and Austin) found to be the most reasonable, taken everything we learned into account. I am not saying my analysis is the be-all end-all and should dictate where Open Phil money goes tomorrow until further analysis is done.
I hope that explains/answers all your points. I am happy to engage further.
1) Feel free to use $26k. My main issue was that you didn’t
ask me for my viewer minutes for TikTok(EDIT: didn’t follow up to make sure I give you the viewer minutes for TikTok) and instead used a number that is off by a factor of 10. Please use a correct number in future analysis. For June 15 - Sep 10, that’s 4,150,000 minutes, meaning a VM/$ of 160 instead of 18 (details here).A) Your screenshots of google sheets say “FLI podcast”, but you ran your script on the entire channel. And you say that the budget is $500k. Can you confirm what you’re trying to measure here? The entire video work of FLI? Just the podcast? If you’re trying to get the entire channel, is the budget really $500k for the entire thing? I’m confused.
B) If you use accurate numbers for some things and estimate for others, I’d make sure to communicate explicitly about which ones are which. Even then, when you then compare estimates and real numbers there’s a risk that your estimates are off by a a huge factor (has happened with my TikTok numbers), which makes me question the value of the comparisons.
C) Let me try to be clearer regarding paid advertising:
If some of the watchtime estimates you got from people are (views * 33% of length), and they pay $X per view (fixed cost of ads on youtube), then the VM/$ will be: [nb_views * (33% length) / total_cost] = [ nb_views * 33% length] / [nb_views * X] = [33% length / X]. Which is why I mean it’s basically the cost of ads. (Note: I didn’t include the organic views above because I’m assuming they’re negligible compared to the inorganic ones. If you want me to give examples of videos where I see mostly inorganic views, I’ll send you by DM).
For the cases where you got the actual watchtime numbers instead of multiplying the length by a constant or using a script (say, someone tells you they have Y amount of hours total on their channel), or the ads lead to real organic views, your reasoning around ads makes sense, though I’d still argue that in terms of impact the engagement is the low / pretty disastrous in some cases, and does not translate to things we care about (like people taking action).
3. I think the questions “who is your favourite AI safety creator” or “Which AI safety YouTubers did/do you watch” are heavily biased towards Robert Miles, as he is (and has basically been for the past 8 years) the only “AI Safety Youtuber” (like making purely talking head videos about AI Safety, in comparison, RA is a team). So I think based on these questions it’s quite likely he’d be mentioned, though I agree 50 people saying his name first is important data that needs to be taken into account.
That said, I’m trying to wrap my head around how to go from your definition of “quality of audience” to “Robert Miles was chosen by 50 different people to be their favorite youtuber, as the first person mentioned”. My interpretation is that you’re saying: 1) you’ve spoken to 50 people who are people who work in AI Safety 2) they all mentioned Rob as the canonical Youtuber, so “therefore” A) Rob has the highest quality audience? (cf. you wrote in OP “This led me to make the “audience quality” category and rate his audience much higher.”)
My model for how this claim could be true that 1) you asked 50 people who you all thought were “high quality” audience 2) they all mentioned rob and nobody else (or rarely nobody else), so 3) you inferred “high quality audience ⇒ watches Rob” and therefore 4) also inferred “watches Rob ⇒ high quality”?
4. Regarding weights, also respectfully, I did indeed look at them individually. You can check my analysis for what I think the TikTok individual weights should be here. For Youtube see here. Regarding your points:
I have posted in my analysis of tiktok a bunch of datapoints that you probably don’t have about the fact that my audience is mostly older high-income people from richer countries, which is unusually good for TikTok. Which is why I put 3 instead of your 2.
“you’re just posting clips of other podcasts and such and this just doesn’t do a great job of getting a message across” → the clips that end up making the majority of viewer minutes are actually quite high fidelity since they’re quite long (2-4m long) and get the message more crisply than the average podcast minute. Anyway, once you look at my TikTok analysis you’ll see that I ended up dividing everything by 2 to have the max fidelity tiktok have 0.5 (same as Cognitive Revolution), which means my number is Qf=0.45 at the end (instead of your 0.1) to just be coherent with the rest of your numbers.
Qm: that’s subjective but FWIW I myself only align to 0.75 to my TikTok and not 1 (see analysis)
“Again, most of the quality factor is being done by audience quality and yes, shorts just have a far lower audience quality.” --> again, respectfully, from looking at your tables I think this is false. You rank the fidelity of TikTok as 0.1, which is 5x less than 4 other channels. No other channels except my content (TikTok & YT) has less 0.3. In comparison, if you forget about rob’s row, the audience quality varies only by 3x between my Qa for TikTok and the rest. So no, the quality factor is not mainly done by audience quality.
On 1, with your permission, I’d ask if I could share a screenshot of me asking you in DMs, directly, for viewer minutes. You gave me views, and thus I multiplied the average TikTok length and by a factor for % watched.
On A, yes, the FLI Podcast was perhaps the data point I did the most estimating for a variety of reasons I explained before.
On B, I think you can, in fact, find which are and aren’t estimates though I do understand how it’s not clear. We considered ways of doing this without being messy. Ill try to make it more clear.
On C, how much you pay for a view is not a constant though. It depends a lot on organic views. And I think boosting videos is a sensible strategy since you put $ into both production costs (time, equipment, etc.) and advertisement. FIguring out how to spend that money efficiently is important.
On 3, many other people were mentioned. In fact, I found a couple of creators this way. But yes, it was extremely striking and thus suggested that this was a very important factor in the analysis. I want to stress that I do in fact, think that this matters a lot. When Austin and I were speaking and relying on comparisons, we thought his quality numbers should be much higher in fact, we toned it down though maybe we shouldn’t have.
To give clarity, I didn’t seek people out who worked in AI safety. Here’s what I did to the best of my recollection.
Over the course of 3 days, I asked anyone I saw in Mox who seemed friendly enough, as well as Taco Tuesday, and sent a few DMs to acquaintances. The DMs I sent were to people who work in AI safety, but there were only 4. So ~46 came from people hanging out around Mox and Taco Tuesday.
I will grant that this lends to an SF/AI safety bias. Now, Rob Miles’ audience comes heavily from Computerphile and such whose audience is largely young people interested in STEM who like to grapple with interesting academic-y problems in their spare time (outside of school). In other words, this is an audience that we care a lot about reaching. It’s hard to overstate the possible variance in audience “quality”. For example, Jane Street pays millions to advertisers to get itself seen in front of potential traders on channels like Stand-Up Maths or the Dwarkesh podcast. These channels don’t actually get that many views compared to others but they have a very high “audience quality”, clearly, based on how much trading firms are willing to pay to advertise there. We actually thought a decent, though imperfect, metric for audience quality would just be a person’s income compared to the world average of ~12k. This meant the average american would have an audience quality of 7. Austin and I thought this might be a bit too controversial and doesn’t capture exaxctly what we mean (we care about attracking a poor MIT CS student more than a mid-level real estate developer in Miami) but it’s a decent approximation.
Audience quality is roughly something like “the people we care most about reaching,” and thus “people who can go into work on technical AI safety” seems very important.
Rob wasn’t the only one mentioned, the next most popular were Cognitive Revolution and AI in context (people often said “Aric”) since I asked them to just name anyone they listen to/would consider an AI safety youtuber, etc.
On 4, I greatly encourage people to input their own weights, I specifically put that in the doc and part of the reason for doing this project was to get people to talk about cost effectiveness in AI safety.
On my bias:
Like all human beings, I’m flawed and have biases, but I did my best to just objectively look at data in what I thought the best way possible. I appreciate that you talked to others regarding my intentions.
I’ll happily link to my comments on Manifund 1 2 3 you may be referring to for people to see the full comments and perhaps emphasize some points I wrote
Many people can tell you that I have a problem with the free-spending, lavish and often wasteful spending in the longtermist side of EA. I think I made it pretty clear that I was using this RFP as an example because other regrantors gave to it.
This project with Austin was planned to happen before you posted your RFP on Manifund (I can provide proof if you’d like).
I wasn’t playing around with the weights to make you come out lower. I assure you, my bias is usually against projects I perceive to be “free-spending”.
I think it’s good/natural to try to create separation between evaluators/projects though.
For context, you asked me for data for something you were planning (at the time) to publish day-off. There’s no way to get the watchtime easily on TikTok (which is why I had to do manual addition of things on a computer) and I was not on my laptop, so couldn’t do it when you messaged me. You didn’t follow up to clarify that watchtime was actually the key metric in your system and you actually needed that number.
Good to know that the 50 people were 4 Safety people and 46 people who hang at Mox and Taco Tuesday. I understand you’re trying to reach the MIT-graduate working in AI who might somehow transition to AI Safety work at a lab / constellation. I know that Dwarkesh & Nathan are quite popular with that crowd, and I have a lot of respect for what Aric (& co) did, so the data you collected make a lot of sense to me. I think I can start to understand why you gave a lower score to Rational Animations or other stuff like AIRN.
I’m now modeling you as trying to answer something like “how do we cost-effectively feed AI Safety ideas to the kind of people who walk in at Taco Tuesday, who have the potential to be good AI Safety researchers”. Given that, I can now understand better how you ended up giving some higher score to Cognitive Revolution and Robert Miles.
Your points seem pretty fair to me. In particular, I agree that putting your videos at 0.2 seems pretty unreasonable and out of line with the other channels—I would have guessed that you’re sufficiently niche that a lot of your viewers are already interested in AI Safety! TikTok I expect is pretty awful, so 0.1 might be reasonable there
Agreed that the quality of audience is definitely higher for my (niche) AI Safety content on Youtube, and I’d expect Q to be higher for (longform) Youtube than Tiktok.
In particular, I estimate Q(The Inside View Youtube) = 2.7, instead of 0.2, with (Qa, Qf, Qm) = (6, 0.45, 1), though I acknowledge that Qm is (by definition) the most subjective.
To make this easier to read & reply to, I’ll post my analysis for Q(The Inside View Tiktok) in another comment, which I’ll link to when it’s up. EDIT: link for TikTok analysis here.
The Inside View (Youtube) - Qa = 6
In light of @Drew Spartz’s comment (saying one way to quantify the quality of audience would be to look at the CPM [1]), I’ve compiled my CPM Youtube data and my average Playback-based CPM is $14.8, which according to this website [2] would put my CPM above the 97.5 percentile in the UK, and close to the 97.5 percentile in the US.
Now, this is more anecdotal evidence than data-based, but I’ve met quite a few people over the years (from programs like MATS, or working at AI Safety orgs) who’ve told me they discovered AI Safety from my Inside View podcast. And I expect the SB-1047 documentary to have attracted a niche audience interested in AI regulation.
Given the above, I think it would make sense to have the Qa(Youtube) be between 6 (same as other technical podcasts) and 12 (Robert Miles). For the sake of giving a concrete number, I’ll say 6 to be on par with other podcasts like FLI and CR.
The Inside View (Youtube) - Qf = 0.45
In the paragraph below I’ll say Qf_M for the Qf that Marcus assigns to other creators.
For the fidelity of message, I think it’s a bit of a mixed bag here. As I said previously, I expect the podcasts that Nathan would be willing to crosspost to be on par with his channel’s quality, so in that sense I’d say the fidelity of message for these technical episodes (Owain Evans, Evan Hubinger) to be on par with CR (Qf_M = 0.5). Some of my non-technical interviews are probably closer to discussions we could find on Doom Debates (Qf_M = 0.4), though there are less of them. My SB-1047 documentary is probably similar in fidelity of message to AI in context (Qf_M = 0.5), and this fictional scenario is very similar to Drew’s content (Qf_M = 0.5). I’ve also posted video explainers that range from low effort (Qf around 0.4?) to very high effort (Qf around 0.5?).
Given all of the above, I’d say the Qf for the entire channel is probably around 0.45.
The Inside View (Youtube) - Qm = 1
As you say, for the alignment of message, this is probably the most subjective. I think by definition the content I post is the message that aligns the most with my values (at least for my Youtube content) so I’d say 1 here.
The Inside View (Youtube) - Q = 2.7
Multiplying these numbers I get Q = 2.7. As a sanity check, this is about the same as Cognitive Revolution, which doesn’t seem crazy given we’ve interviewed similar people and the cross-posting argument I made above.
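As a tiny worked example of the multiplication I’m doing here (nothing beyond the three numbers already stated above):

```python
# Minimal sketch: overall quality as the product of audience quality,
# fidelity of message, and alignment of message.
def quality(qa: float, qf: float, qm: float) -> float:
    return qa * qf * qm

# Values I'm claiming above for The Inside View (YouTube).
print(round(quality(qa=6, qf=0.45, qm=1), 2))  # 2.7
```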
(Obviously, if I were to modify all of these Qa, Qf, Qm numbers for all channels, I’d probably end up with different quality comparisons.)
CPM means Cost Per Mille. In YT Studio it’s defined as “How much advertisers pay every thousand times your Watch Page content is viewed with ads.”
I haven’t done extended research here and expect I’d probably get different results looking at different websites. This one was the first one I found on Google, so it’s not cherry-picked.
I answered Michael directly on the parent. Hopefully, that gives some colour.
This comment is answering “TikTok I expect is pretty awful, so 0.1 might be reasonable there”. For my previous estimate on the quality of my Youtube long-form stuff, see this comment.
tl;dr: I now estimate the quality of my TikTok content to be Q = 0.75 * 0.45 * 3 ≈ 1
The Inside View (TikTok) - Alignment = 0.75 & Fidelity = 0.45
To estimate fidelity of message (Qf) and alignment of message (Qm) in a systematic way, I compiled my top 10 best-performing TikToks and rated each one’s individual Qf and Qm (see the tab called “TikTok Qa & Qf” here, which contains the reasoning behind each number).
Update Sep 14: I’ve realized that my fidelity numbers used 1 as the maximum, but now that I’ve looked at Marcus’ weights for other channels, I think I should use 0.5, because that’s the number he gives to a podcast like Cognitive Revolution, and I don’t want to claim that a long TikTok clip is higher-fidelity than the average Cognitive Revolution podcast. So I divided everything by 2; my maximum fidelity is now 0.5, matching Marcus’ other weights.
Then, taking a minute-weighted average of those per-video Qfs and Qms, I get:
Qf(The Inside View TikTok) = 0.45
Qm(The Inside View TikTok) = 0.75
What this means:
Since I’m editing clips, the message is already high-fidelity (it comes from the source, most of the time). The question is whether people get a long, high-fidelity explanation, or something short and potentially compressed. When weighting by minutes we end up with 0.45 (0.9 before the Sep 14 rescaling), meaning that most of the watch-time minutes come from the high-fidelity content.
I am not always fully aligned with the clips that I post, but I am mostly aligned with them.
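For transparency on what “minute-adjusted weighted average” means here, below is a minimal sketch of the computation. The per-video minutes, Qf, and Qm values are hypothetical placeholders (the real ones live in the “TikTok Qa & Qf” sheet), chosen only to illustrate the mechanics:

```python
# Minimal sketch of a minute-weighted average of per-video Qf and Qm.
# Each tuple is (viewer minutes, Qf, Qm) for one TikTok; these numbers
# are hypothetical placeholders, not my real analytics.
videos = [
    (120_000, 0.50, 0.80),
    (90_000, 0.50, 0.70),
    (40_000, 0.35, 0.80),
    (15_000, 0.25, 0.60),
]

total_minutes = sum(minutes for minutes, _, _ in videos)
qf_avg = sum(minutes * qf for minutes, qf, _ in videos) / total_minutes
qm_avg = sum(minutes * qm for minutes, _, qm in videos) / total_minutes
print(round(qf_avg, 2), round(qm_avg, 2))  # ~0.46 and ~0.75 with these placeholders
```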
The Inside View (TikTok) - Quality of Audience = 3
I believe the original reasoning for Qa = 2 is that people watching short-form content are, by default, young and/or have short attention spans, and are therefore a lower-quality audience.
However, most of my high-performing TikTok clips (which represent most of the watch time) are quite long (2m to 3m30s), which makes me think the kind of audience that watches these until the end is not that different from my YouTube audience.
On top of that, my audience a) skews towards the US (33%) and other high-income countries (more than half are in the US / Australia / UK, etc.), and b) is 88% over 25, with 61% above 35. (Data here.)
Therefore, I don’t see why this audience would be lower quality than people who watch AI Species / AI Risk Network.
Which is why I’m estimating: Qa(The Inside View TikTok) = 3.
Conclusion
If we multiply these three numbers we get Q = 0.75 * 0.45 * 3 ≈ 1
I struggle to imagine Qf = 0.9 being reasonable for anything on TikTok. My understanding of TikTok is that most viewers idly scroll through their feed, watch your thing for a bit as part of that endless stream, then continue; and even if they decide to stop for a while and get interested, it would still take them long enough to switch out of endless-scrolling mode that they don’t properly engage with large chunks of the video. Is that a correct model, or do you think that e.g. most of your viewer minutes come from people who stop and engage properly?
Update: after looking at Marcus’ weights, I ended up dividing all my intermediate Qf values by 2, so that they match Marcus’ weights, where Cognitive Revolution = 0.5. Dividing by 2 caps the best TikTok minute at the average Cognitive Revolution minute. Neel was correct that 0.9 was way too high.
===
My model is that most of the viewer minutes come from people who watch the whole thing, and some decent fraction end up following, which means they’ll engage more with AI-Safety-related content in the future as I post more.
Looking at my most viewed TikTok:
TikTok says 15.5% of viewers (i.e. 0.155 * 1,400,000 = 217,000) watched the entire thing, and most people who watch the first half end up watching until the end (retention is 18% at the halfway point and 10% at the end).
And then, assuming the 11k who followed came from those 217,000 who watched the whole thing, that’s 11,000 / 217,000 ≈ 5% of the people who finished the video deciding to see more content like this in the future.
So yes, I’d say that if a significant fraction (15.5%) watch the full thing, and 0.155 * 0.05 ≈ 0.8% of the total end up following, that’s “engaging properly”.
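To make that arithmetic easy to check, here is the same calculation written out. The only assumption beyond the numbers quoted from TikTok analytics is (as above) that all 11k follows came from people who finished the video:

```python
# Engagement arithmetic for the most viewed TikTok, using the numbers
# quoted above from TikTok analytics.
views = 1_400_000
finished_share = 0.155        # fraction who watched the entire video
followers_gained = 11_000     # assumption: all follows came from finishers

finished = views * finished_share                          # ~217,000 viewers
follow_rate_among_finishers = followers_gained / finished  # ~5%
follow_rate_overall = followers_gained / views             # ~0.8% of all viewers

print(int(finished), round(follow_rate_among_finishers, 3), round(follow_rate_overall, 4))
```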
And most importantly, most of the viewer-minutes on TikTok do come from these long videos that are 1-4 minutes long (especially ones that are > 2 minutes long):
The short / low-fidelity takes that are 10-20s long don’t get picked up by the new TikTok algorithm and don’t get many views, so they didn’t end up in that “TikTok Qa & Qf” sheet of top 10 videos (and the ones that did didn’t contribute much to the total minutes, and thus to the final Qf).
To show that the Eric Schmidt example above is not cherry-picked, here is a Google Doc with similar screenshots of the stats for the top 10 videos I use to compute Qf. Of these 10 videos, 6 are more than 1 minute long, and 4 are more than 2 minutes long. The precise distribution is:
0m-1m: 4 videos
1m-2m: 2 videos
2m-3m: 2 videos
3m-4m: 2 videos
Happy for others to come up with different numbers / models for this, or play with my model through the “TikTok Qa & Qf” sheet here, using different intermediary numbers.
Update: as I said at the top, I was wrong to have initially said Qf = 0.9 given the other values. I now claim that Qf should be closer to 0.45. Neel was right to make that comment.
Thanks for the thoughtful replies, here and elsewhere!
Very much appreciate the data corrections! I think that, medium-to-long term, our goal is to have this info in some kind of database where anyone can suggest data corrections or upload their own numbers, like Wikipedia or GitHub.
Tentatively, I think paid advertising is reasonable to include. Maybe more creators should be buying ads! So long as you’re getting exposure in a cost-effective way and reaching equivalently good people, I think “spend money/effort to create content” and “spend money/effort to distribute content” are both very reasonable interventions.
I don’t have strong takes on quality weightings—Marcus is much more of a video junkie than me, and has spent the last couple weeks with these videos playing constantly, so I’ll let him weigh in. But broadly I do expect people to have very different takes on quality—I’m not expecting people to agree on quality, but rather want people to have the chance to put down their own estimates. (I’m curious to see your takes on all the other channels too!)
Sorry if we didn’t include your feedback in the first post—I think the nature of this project is that waiting for feedback from everyone would delay our output too much, and we’re aiming to post often and wait for corrections in public, mostly because we’re extremely bandwidth-constrained (running this on something like 0.25 FTE between Marcus and me).
I would be pretty shocked if paid ads reached equivalently good people, given that these are not people who have chosen to watch the video, and they may have very little interest.
Oh definitely! I agree that by default, paid ads reach lower-quality & less-engaged audiences, and the question would be how much to adjust that by.
(Though paid ads might work better for the goal of reaching new people, i.e. increasing the total number of people who have heard of core AI safety ideas.)