Extensions Read Aloud: A Text to Speech Voice Reader
Read aloud the current web-page article with one click, using text to speech (TTS). Supports 40+ languages.
Read Aloud uses text-to-speech (TTS) technology to convert webpage text to audio. It works on a variety of websites, including news sites, blogs, fan fiction, publications, textbooks, school and class websites, and online university course materials.
For more details, here is some logic about “fair use”, copying and pasting a forum post (heh).
The law limits the rights of copyright holders. A judge will make a fair use determination using the four factors listed in the law on fair use:
1. Character of use, 2. Nature of work, 3. Amount used, 4. Effect on market
4. Effect on market. This is the most important factor.
I do not see how your use could possibly affect the market based on the information you have given me. My gut feeling, based on what you have told me, is that 3 of the 4 factors, including the important one, work in your favor.
Personally, I think it is foolish to worry about [can they take legal action]. I believe that what you are doing is completely legal, and I believe that any legal action they take will fail.
So I doubt that presenting a whole book is fair use.
Are you getting author’s consent before turning their work into a podcast?
It’s a bot so it does it automatically. We figured since the vast majority of people will want to have their content read by more people that it made more sense to by default convert and let people opt out by filling out this form.
If someone deletes their original post, do you auto-remove it from the podcast as well? That would seem important to me.
Good idea! I’ll add that to the list of things to do.
I do think that there’s an interesting fuzzy boundary here between “derivative work” and “interpretative tool”.
e.g. with the framing “turn it into a podcast” I feel kind of uncomfortable and gut-level wish I was consulted on that happening to any of my posts.
But here’s another framing: it’s pretty easy to imagine a near-future world where anyone who wants can have a browser extension which will read things to them at this quality level rather than having visual fonts. If I ask “am I in favour of people having access to that browser extension?”, I’m a fairly unambiguous yes. And then the current project can be seen as selectively providing early access to that technology. And that seems … pretty fine?
This actually makes me more favourable to the version with automated rather than human readers. Human readers would make it seem more like a derivative work, whereas the automation makes the current thing seem closer to an interpretative tool.
As far as I understand, text to speech browser extensions do exist, but I haven’t tested their quality relative to this one.
Gears level info about TTS:
I feel like I am writing an Elon Musk tweet but in case anyone is interested:
As the post mentions, basically the speech quality comes from using the newer TTS models. From the sound, I think this is using Amazon Polly, the voice “Matthew, Male”.
I am an imposter, but I know this because I “made an app” for myself that does general TTS for local docs and websites.
The relevant code that produces the voices is two lines long. I did not check but I would be surprised if there was not a browser extension already.
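For anyone curious what "two lines" might look like, here is a minimal sketch assuming the AWS `boto3` SDK and Amazon Polly's `synthesize_speech` call. The helper name `synthesize_speech_mp3` and the choice of the neural engine are my own assumptions for illustration, not necessarily what the project uses.

```python
def synthesize_speech_mp3(polly_client, text, voice_id="Matthew"):
    """Return MP3 bytes for `text` from an Amazon Polly client.

    `polly_client` is expected to be `boto3.client("polly")`; it is passed in
    so the function can be exercised without AWS credentials.
    """
    # The two lines that matter: request the audio, then read the stream.
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice_id,   # "Matthew" is the voice guessed in this comment
        Engine="neural",    # assumption: the newer neural voices
    )
    return response["AudioStream"].read()


# Usage (needs boto3 installed and AWS credentials configured):
#   import boto3
#   audio = synthesize_speech_mp3(boto3.client("polly"), "Hello, forum.")
#   with open("comment.mp3", "wb") as f:
#       f.write(audio)
```

Everything else in a real deployment (accounts, payment, feeds) sits around this call, which is where I would expect most of the cost to go.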
If EAs think that a browser extension would be valuable, or want really any of the permutations of forum/comment or other services, my guess is that a working quality project and full deployment could be made for $30,000 or maybe as little as $3,000 (the crux is operational like handling payment, accounts, as well as interpretation of the value of project management)
If there is interest I think we can just put this in EAIF or the future fund or something.
Hear the audio version of this comment in “Matthew” voice.
Hear the audio version of this comment in “Kevin” voice.
You’re right that it’s Matthew!
And they do already exist, they just have the limitations I mentioned in the blog (glitchy, no playlists, not multi-platform, etc). Here are the ones I use for various purposes:
For articles on desktop, Natural Reader
For ebooks on Android, Evie
For articles on Android, @Voice
That is not how copyright works, Kat!
If we had to ask each person before converting their text to audio, it just wouldn’t feasibly happen.
The way we thought about it was that >99.9% of people will be thrilled to have their writing read by more people. For the <0.01% who won’t, we made it easy for them to opt out, either for a particular article or for all their work. This way the whole community and hundreds of EAs can have access to great content and the <0.01% can also keep their writing in only written format.
This way everybody wins. :) I think the utilitarian case for it is strong.
Especially once it’s more known, people will know that if their post gets enough upvotes it’ll be converted, so they can request to not have it converted beforehand if they want. The potential harm is small and mitigated and the potential upside is huge.
These responses do seem curiously blithe about the question of whether or not this is legal.
FWIW I think I endorse Kat’s reasoning here. I don’t think it matters if it is illegal if I’m correct in suspecting that the only people who could bring a copyright claim are the authors, and assuming the authors are happy with the system being used. This is analogous to the way it is illegal, by violating minimum wage laws, to do work for your own company without paying yourself, but the only person who has standing to sue you is AFAIK yourself.
Not a lawyer, not claiming to know the legal details of these cases, but I think this standing thing is real and an appropriate way to handle
I’ve seen this reasoning a lot, where EA organisations assume they won’t get sued because the only people whose data they’re illegally using are other EAs, and as someone whose data has been misused with this reasoning, I don’t love it!
This is a reason to fix the system! My point is that it reduces to “make all the authors happy with how you are doing things”, there is not some spooky extra thing having to do with illegality
TBC I do not endorse using people’s content in a way they aren’t happy with, but I would still have that same belief if it wasn’t illegal at all to do so.
And there are various things one could probably do to make it not illegal but still messed up and the wrong thing to do! Like make it mandatory to check a box saying you waive your copyright for audio on a thing before you post on the forum. I think if, like some of the tech companies, you made this box really little and hard to find, most people would not change their posting behavior very much, and would now be totally legal (by assumption).
but it would still be a bad thing to do.
My more detailed response is:
I am generally pro more audio content, and sceptical of the goodness of current copyright laws.
<0.01% seems overconfident to me.
Once something is up on the internet, it’s up forever. Taking it down post-facto doesn’t actually undo the damage.
You only need one person to sue you for things to go quite badly wrong.
As such, flagrantly violating the law on a fairly large scale (and the scale is an important part of the pitch here) seems like a dubious idea. Especially if you also go on public record in a way that suggests you know it’s illegal and don’t care.
<0.01% is definitely overconfident given at that point I had already expressed misgivings and we do not have 10,000 authors on the EA Forum.
(I’m not against my writing being podcastified in principle but I want to check out any podcast services who broadcast my work in advance to decide if I’m happy to be associated with them. I’m strongly against someone else making that decision for me.)
I think this isn’t actually correct – I think it depends a lot on the type of content, how likely it is to get mirrored, the data format, etc. E.g. the old Leverage Research website is basically unavailable now (except for the front page I think), despite being text (which gets mirrored a lot more).
Whether it actually goes ‘badly wrong’ depends on the type of lawsuit, the severity of the violation, the PR effects, etc. It’s probably good to err on the side of not violating any laws, and worth looking into it a bit before doing it.
I otherwise agree with your points!
I’m focusing on the prudential angle here, but I’d also be quite sympathetic to an author who was mad that their work was used in this way without their permission, even if it was later taken down.
How would it not be a copyright violation? Seems better to require consent, presumably via a checkbox or an email. Consent also could improve the average quality a bit. Although then the question is whether the EAF/LW/AF designers can be bothered including that kind of email, or checkbox+API, etc as a feature.
It seems like an extremely good use of time for EA forum designers (and others) to implement this. I think this is a fantastic project and I really hope this goes ahead. I also think it’s valuable for the EA community to take fewer start-up style shortcuts as the movement gets older and more well-known. I think the risk of giving the impression of being low-integrity or messy is high enough that regardless of whether copyright laws are good or not, it’s worth following them.
Possibly, it is enough to just have a disclaimer like “by submitting, you agree to have this turned into an audio format” to satisfy copyright laws?
Seems like a good idea if it were easy
Very minor point, but 100%-99.9% = 0.1%
Which part isn’t feasible? If you have the skill and capacity on the team to write something which will scrape forum posts, check if they have karma above a threshold, convert them to speech, and post them as audio, it seems more likely than not that you’d have the skill/capacity to edit the tool such that it DMs the authors of the posts it wants to scrape, asking them to reply “Yes” or “OK”, and then only uploads posts where the author did respond to the DM with such a reply.
If you think that most authors wouldn’t reply, so this would make the tool much less useful, then this seems like a different claim. Especially if you’re only doing recent posts, it seems somewhat unlikely that the authors will not see their DMs, which would mean that a lack of reply is not that unlikely to indicate a preference against inclusion.
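The opt-in flow described above can be sketched roughly as follows. This is only an illustration: the `Post` type, the karma threshold of 50, the `replies` mapping (post ID to the author's DM reply) and the `deleted` flag are all hypothetical, and the actual scraping, DM-sending and text-to-speech steps are left out.

```python
from dataclasses import dataclass

KARMA_THRESHOLD = 50          # hypothetical cutoff, not the project's real one
AFFIRMATIVE = {"yes", "ok"}   # replies that count as consent


@dataclass
class Post:
    post_id: str
    author: str
    karma: int
    deleted: bool = False     # lets deleted posts drop out automatically


def is_affirmative(reply):
    """True only for an explicit "Yes"/"OK" (case and trailing punctuation ignored)."""
    if reply is None:
        return False
    return reply.strip().strip(".!").lower() in AFFIRMATIVE


def posts_to_convert(posts, replies, threshold=KARMA_THRESHOLD):
    """Keep posts that clear the karma bar AND whose author replied yes.

    A missing reply is treated as "no", which is the opt-in behaviour
    being proposed here.
    """
    return [
        p for p in posts
        if not p.deleted
        and p.karma >= threshold
        and is_affirmative(replies.get(p.post_id))
    ]
```

The only change relative to an opt-out bot is the last condition in the filter; whether that costs too many posts depends on how often authors actually answer their DMs.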
If I were to receive such messages, I would likely fail to respond (unintentionally) at least 20% of the time.
This seems roughly consistent with “somewhat unlikely”, I expect the fraction is similar for me.
(Note that I strong upvoted because I think your concern is really about fairness, control and privacy, which copyright law doesn’t directly support.)
I think this is how copyright law effectively works.
Basically, considerations such as “fair use”, low damages/profit, and the fact that the copyright process is initiated by the owner mean that it’s at least a grey area, and in this case it’s very possible they are safe.
For more details, here is some logic about “fair use”, copying and pasting a forum post (heh).
It’s an obscure, poorly formatted forum post from 2005, but I think the content is correct. See the criteria mentioned here in this Stanford page.
Maybe another way of seeing this is that the poster is unlikely to suffer damages and there’s not much profit for the company.
I think catehall or other lawyers have been helpful, please stomp this post if this is wrong.
I don’t think this analysis is right. The character of use may be educational, but on the other hand, I’m not sure you’re transforming it—you’re simply reproducing the text as audio. The nature of work goes beyond presenting public domain things like facts and ideas, by quoting the text as a whole. The amount used is maximal. The effect on market is substantial, in that it prevents the author from selling their writing as audio.
As for precedents. Well… Righthaven v. Hoehn established that it is OK to present a full editorial article in the setting of a noncommercial online post that discusses it. But here you’re just presenting the work in its entirety, and not using it for comment. Google Books scanning was deemed legal. But that was because they don’t represent the whole book. Another relevant example, although it was settled out of court, is that Audible bought rights to some books, and then got sued for allowing the text to be read aloud by TTS AI. If you don’t even own the text, then having the text read is I think a larger infringement.
So I doubt that narrating a whole post is fair use—rather it looks like a copyright infringement.
I could be totally wrong, it’s honestly ridiculous for me to give IP/tort advice.
However, I think the issue at stake is EA forum posts being turned into podcasts. But for EA forum posts, there’s no standard commercial profit from making posts (besides that sweet, sweet clout, which if anything might be increased by podcasting it?). So I don’t get it.
After checking, I’m confused about Righthaven v. Hoehn. Righthaven had some commercial interest in their content (if malicious/rentseeking). Yet it was dismissed on lack of standing. So I guess... this seems orthogonal to this issue?