Your points seem pretty fair to me. In particular, I agree that putting your videos at 0.2 seems pretty unreasonable and out of line with the other channels—I would have guessed that you’re sufficiently niche that a lot of your viewers are already interested in AI Safety! TikTok I expect is pretty awful, so 0.1 might be reasonable there
Agreed that the quality of audience is definitely higher for my (niche) AI Safety content on Youtube, and I’d expect Q to be higher for (longform) Youtube than TikTok.
In particular, I estimate Q(The Inside View Youtube) = 2.7, instead of 0.2, with (Qa, Qf, Qm) = (6, 0.45, 1), though I acknowledge that Qm is (by definition) the most subjective.
To make this easier to read & reply to, I’ll post my analysis for Q(The Inside View TikTok) in another comment, which I’ll link to when it’s up. EDIT: link for TikTok analysis here.
The Inside View (Youtube) - Qa = 6
In light of @Drew Spartz’s comment (saying one way to quantify the quality of audience would be to look at the CPM [1]), I’ve compiled my Youtube CPM data: my average playback-based CPM is $14.8, which according to this website [2] would put it above the 97.5th percentile in the UK, and close to the 97.5th percentile in the US.
Now, this is more anecdotal evidence than data-based, but I’ve met quite a few people over the years (from programs like MATS, or working at AI Safety orgs) who’ve told me they discovered AI Safety from my Inside View podcast. And I expect the SB-1047 documentary to have attracted a niche audience interested in AI regulation.
Given the above, I think it would make sense to have the Qa(Youtube) be between 6 (same as other technical podcasts) and 12 (Robert Miles). For the sake of giving a concrete number, I’ll say 6 to be on par with other podcasts like FLI and CR.
The Inside View (Youtube) - Qf = 0.45
In the paragraph below I’ll say Qf_M for the Qf that Marcus assigns to other creators.
For the fidelity of message, I think it’s a bit of a mixed bag here. As I said previously, I expect the podcasts that Nathan would be willing to crosspost to be on par with his channel’s quality, so in that sense I’d say the fidelity of message for these technical episodes (Owain Evans, Evan Hubinger) is on par with CR (Qf_M = 0.5). Some of my non-technical interviews are probably closer to discussions we could find on Doom Debates (Qf_M = 0.4), though there are fewer of them. My SB-1047 documentary is probably similar in fidelity of message to AI in context (Qf_M = 0.5), and this fictional scenario is very similar to Drew’s content (Qf_M = 0.5). I’ve also posted video explainers that range from low effort (Qf around 0.4?) to very high effort (Qf around 0.5?).
Given all of the above, I’d say the Qf for the entire channel is probably around 0.45.
The Inside View (Youtube) - Qm = 1
As you say, for the alignment of message, this is probably the most subjective. I think by definition the content I post is the message that aligns the most with my values (at least for my Youtube content) so I’d say 1 here.
The Inside View (Youtube) - Q = 2.7
Multiplying these numbers I get Q = 2.7. Doing a sanity check, this seems about the same as Cognitive Revolution, which doesn’t seem crazy given we’ve interviewed similar people & the cross-post arguments I’ve said before.
(Obviously, if I were to modify all of these Qa, Qf, Qm numbers for all channels, I’d probably end up with different quality comparisons.)
[1] CPM means Cost Per Mille. In YT Studio it’s defined as “How much advertisers pay every thousand times your Watch Page content is viewed with ads.”
[2] I haven’t done extensive research here and expect I’d probably get different results looking at different websites. This one was the first one I found on Google, so it is not cherry-picked.
I answered Michael directly on the parent. Hopefully, that gives some colour.
This comment is answering “TikTok I expect is pretty awful, so 0.1 might be reasonable there”. For my previous estimate on the quality of my Youtube long-form stuff, see this comment.
tl;dr: I now estimate the quality of my TikTok content to be Q = 0.75 * 0.45 * 3 ≈ 1
The Inside View (TikTok) - Alignment = 0.75 & Fidelity = 0.45
To estimate fidelity of message (Qf) and alignment of message (Qm) in a systematic way, I compiled my 10 best-performing TikToks and rated their individual Qf and Qm (see tab called “TikTok Qa & Qf” here, which contains the reasoning for each individual number).
Update Sep 14: I’ve realized that my numbers about fidelity used 1 as the maximum, but now that I’ve looked at Marcus’ weights for other stuff, I think I should use 0.5 because that’s the number he gives to a podcast like Cognitive Revolution, and I don’t want to claim that a long tiktok clip is more high-fidelity than the average Cognitive Revolution podcast. So I divided everything by 2 so my maximum fidelity is now 0.5 to match Marcus’ other weights.
Then, by doing a minute-adjusted weighted average of the Qfs and Qms I get:
Qf(The Inside View TikTok) = 0.45
Qm(The Inside View TikTok) = 0.75
What this means:
Since I’m editing clips, the message is already high-fidelity (comes from the source, most of the time). The question is whether people will get a high-fidelity long explanation, or something short but potentially compressed. When weighting things by minute we end up with 0.45 (0.9 before the Sep 14 rescaling above), close to the 0.5 maximum, meaning that most of the watchtime-minutes come from the high-fidelity content.
I am not always fully aligned with the clips that I post, but I am mostly aligned with them.
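For anyone who wants to reproduce this kind of minute-weighted average without opening the sheet, here is a minimal Python sketch. The per-clip scores and watch-time minutes are hypothetical placeholders (not the spreadsheet values), the function name is made up for illustration, and it assumes "minute-adjusted" means weighting each clip by its total watch-time minutes. The 0.5 factor is the rescaling described in the Sep 14 update.

```python
def minute_weighted_average(scores, minutes):
    """Average of per-clip scores, weighted by each clip's watch-time minutes."""
    if len(scores) != len(minutes):
        raise ValueError("scores and minutes must have the same length")
    total = sum(minutes)
    if total <= 0:
        raise ValueError("total watch time must be positive")
    return sum(s * m for s, m in zip(scores, minutes)) / total


# HYPOTHETICAL per-clip data, for illustration only.
raw_qf = [1.0, 0.9, 0.4]                     # fidelity on the original 0-1 scale
watch_minutes = [500_000, 300_000, 20_000]   # total minutes watched per clip

# Rescale so the best clip is capped at 0.5 (the Qf Marcus gives Cognitive Revolution).
RESCALE = 0.5
rescaled_qf = [RESCALE * s for s in raw_qf]

channel_qf = minute_weighted_average(rescaled_qf, watch_minutes)
print(f"Minute-weighted Qf: {channel_qf:.2f}")
```

Because the long clips carry most of the minutes, the short low-fidelity clip barely moves the result, which is the effect described above.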
The Inside View (TikTok) - Quality of Audience = 3
I believe the original reasoning for Qa = 2 is that people watching short-form by default would be young and / or have short attention spans, and therefore be less of a high-quality audience.
However, most of my high-performing TikTok clips (that represent most of the watch time) are quite long (2m-3m30s long), which makes me think the kind of audience who watches these until the end is not that different from the Youtube audience.
On top of that, my audience a) skews towards the US (33%) and other high-income countries (more than half are in the US / Australia / UK etc.) and b) skews older, with 88% being over 25 and 61% being above 35. (Data here).
Therefore, in terms of quality of audience, I don’t see why the audience would be worse in quality than people who watch AI Species / AI Risk Network.
Which is why I’m estimating: Qa(The Inside View TikTok) = 3.
Conclusion
If we multiply these three numbers we get Q = 0.75 * 0.45 * 3 ≈ 1
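As a quick check of the arithmetic, here is a minimal Python sketch that multiplies the three sub-scores for both channels, using only the numbers stated in this thread. The function and variable names are illustrative, not part of Marcus’ model.

```python
def quality(qa: float, qf: float, qm: float) -> float:
    """Q as the product of audience quality, fidelity, and alignment."""
    return qa * qf * qm


# (Qa, Qf, Qm) as estimated in this thread.
channels = {
    "The Inside View (Youtube)": (6, 0.45, 1),
    "The Inside View (TikTok)": (3, 0.45, 0.75),
}

for name, (qa, qf, qm) in channels.items():
    print(f"{name}: Q = {quality(qa, qf, qm):.4f}")
```

This prints 2.7000 for Youtube and 1.0125 for TikTok, so the TikTok figure of 1 is a rounded value.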
I struggle to imagine Qf 0.9 being reasonable for anything on TikTok. My understanding of TikTok is that most viewers will be idly scrolling through their feed, watch your thing for a bit as part of this endless stream, then continue, and even if they decide to stop for a while and get interested, they still would take long enough to switch out of the endless scrolling mode to not properly engage with large chunks of the video. Is that a correct model, or do you think that eg most of your viewer minutes come from people who stop and engage properly?
Update: after looking at Marcus’ weights, I ended up dividing all the intermediary values of Qf I had by 2, so that it matches with Marcus’ weights where Cognitive Revolution = 0.5. Dividing by 2 caps the best tiktok-minute to the average Cognitive Revolution minute. Neel was correct to claim that 0.9 was way too high.
===
My model is that most of the viewer minutes come from people who watch the whole thing, and some decent fraction end up following, which means they’ll end up engaging more with AI-Safety-related content in the future as I post more.
Looking at my most viewed TikTok:
TikTok says 15.5% of viewers (i.e. 0.155 * 1400000 = 217000) watched the entire thing, and most people who watch the first half end up watching until the end (retention is 18% at the halfway point, and 10% at the end).
And then assuming the 11k who followed came from those 217000 who watched the whole thing, we can say that’s 11000/217000 = 5% of the people who finished the video that end up deciding to see more stuff like that in the future.
So yes: given that a significant fraction (15.5%) watch the full thing, and 0.155 * 0.05 ≈ 0.8% of all viewers end up following, I’d say that counts as “engaging properly”.
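The funnel above can be recomputed in a few lines of Python. This is only a sketch using the figures quoted in this comment (1.4M views, 15.5% completion, 11k new followers), and it keeps the same simplifying assumption that every new follower came from someone who finished the video. Variable names are illustrative.

```python
views = 1_400_000          # total views of the most viewed TikTok
completion_rate = 0.155    # share of viewers who watched the entire video
new_followers = 11_000     # followers attributed to this video

# Assumption carried over from the text: all followers are finishers.
finishers = views * completion_rate
follow_rate_among_finishers = new_followers / finishers
follow_rate_overall = new_followers / views

print(f"Finishers: {finishers:,.0f}")
print(f"Follow rate among finishers: {follow_rate_among_finishers:.1%}")
print(f"Follow rate among all viewers: {follow_rate_overall:.2%}")
```

Dividing followers by total views directly avoids compounding the rounding of the intermediate 5%, and gives roughly 0.8% of all viewers.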
And most importantly, most of the viewer-minutes on TikTok do come from these long videos that are 1-4 minutes long (especially ones that are > 2 minutes long).
The short / low-fidelity takes that are 10-20s long don’t get picked up by the new TikTok algorithm and don’t get many views, so they didn’t end up in that “TikTok Qa & Qf” sheet of top 10 videos (and the ones that did barely contributed to the total minutes, and so to the final Qf).
To show that the Eric Schmidt example above is not cherry-picked, here is a Google Doc with similar screenshots of stats for the top 10 videos that I use to compute Qf. Of these 10 videos, 6 are more than 1 minute long, and 4 are more than 2 minutes long. The precise distribution is:
0m-1m: 4 videos
1m-2m: 2 videos
2m-3m: 2 videos
3m-4m: 2 videos
Happy for others to come up with different numbers / models for this, or play with my model through the “TikTok Qa & Qf” sheet here, using different intermediary numbers.
Update: as I said at the top, I was actually wrong to have initially said Qf=0.9 given the other values. I now claim that Qf should be closer to 0.45. Neel was right to make that comment.