However, we remain concerned that in the case of a dispute, we would be accused of creating fake screen recordings/archives.
If there is a third-party service that is trusted by the community that could verify the accuracy of our screen recordings/archives prior to us showing reviews to charities, we’d be much more open to the idea of showing reviews to charities before releasing them. Please let us know if you’re aware of one.
https://web.archive.org/ seems good enough to me in most cases?
I think it doesn’t do so well for Google Spreadsheets or videos, though:
For example, this archive of ACE’s CEA of ALI only includes the first tab of the sheet. I tried seeing if I could archive the other tabs via their separate links, but it just redirects me here, where only the one tab is archived. Maybe there’s a way around this, or a better archiving service.
AFAIK, videos are not archived at all, either.
I’ve also used https://archive.ph/. There’s a browser plugin for it, Archive Page, but I often get nginx errors when I try to archive pages with it. You can archive specific tabs by the tab links, e.g. here, but then it doesn’t let you scroll through the sheet, which means you won’t be able to access cells you’d have to scroll to see. It also doesn’t show how cells are calculated.
You could just save the whole sheet and upload it somewhere with a timestamp, and save archives anyway. Maybe there are better options than web.archive.org and archive.ph.
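A minimal sketch of the "save the whole sheet with a timestamp" idea, assuming the sheet has already been downloaded to a local file (an .xlsx export typically keeps cell formulas as well as values). The function name `fingerprint_file` and the record layout are made up for illustration. Note that a locally generated hash and timestamp only carry weight if the digest is shared with or published by someone else at the time it is made.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def fingerprint_file(path: str) -> dict:
    """Return a SHA-256 digest and a UTC timestamp for a saved copy of a file.

    Hypothetical helper: the field names below are illustrative only.
    """
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        # Any later change to the file, however small, changes this digest.
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        # When the fingerprint was recorded, according to the local clock.
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

Calling `fingerprint_file("sheet_export.xlsx")` (a placeholder file name) gives a small record that could be posted alongside the archive links, so anyone holding the same file can recompute the digest and compare.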
Hi Michael, thanks for the reply.
That archive service is great; we use it all the time. From our understanding, though, it is actually possible to manipulate web archives.
Didn’t they already address this specific vulnerability with the measures described on that page?
The aforementioned page states that they took action “to mitigate these attacks,” so from our understanding it is still possible to do.
Also, the organization that completed the study still cautions users who rely on the Wayback Machine (the archive platform that was manipulated).[1]
https://rewritinghistory.cs.washington.edu/index.html See section “I rely on Wayback Machine—what should I do?”
The original information is still archived. My understanding is that those attacks just inject other data that changes what is shown to the user, but as they mention, it’s easily detectable and the original information can still be recovered.
A bigger risk would be that the organization asks the archive to delete their data, but that would look very suspicious, and you could use multiple archives (e.g. https://archive.is/).
Thank you for your reply and technical insights, Lorenzo.
To clarify, we are actually not that concerned about archived documents being manipulated. From what we understand, this is extremely rare.
What we are quite concerned about is that we will be falsely accused of manipulating archives, and the charity accusing us will be given the benefit of the doubt. They could cite articles like the one we cited earlier, and most people do not have the technical expertise to evaluate disputes over archive integrity.
I think that is extremely unlikely: they have a lot to lose as soon as it’s confirmed that the archived data is not manipulated.
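Also, from the page you cite: “we emphasize that these attacks can in most cases be launched only by the owners of particular domains.”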
So they would need to claim that you took control of a relevant domain as well.
But even if something like that happened, you could show that the archive has not been tampered with (e.g. by linking the exact resource containing the information, or mentioning the “about this capture” tool that was added by the web archive to mitigate this).
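As a rough illustration of "linking the exact resource": the Wayback Machine, to the best of my knowledge, accepts an `id_` suffix on the capture timestamp that returns the capture as originally stored, without the replay banner or rewritten links. The sketch below only builds the two URLs and makes no network requests. `wayback_urls` is a hypothetical helper, and the timestamp and page URL in the usage note are placeholders.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_urls(original_url: str, timestamp: str) -> dict:
    """Build replay and raw-capture URLs for one Wayback Machine capture.

    `timestamp` is the 14-digit capture time (YYYYMMDDhhmmss) that appears
    in the archive URL. The `id_` form is assumed to serve the stored bytes
    unmodified.
    """
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp must be 14 digits: YYYYMMDDhhmmss")
    return {
        # Normal view, with the archive's banner and rewritten links.
        "replay": f"{WAYBACK_PREFIX}/{timestamp}/{original_url}",
        # Raw view of the same capture.
        "raw": f"{WAYBACK_PREFIX}/{timestamp}id_/{original_url}",
    }
```

For example, `wayback_urls("https://example.org/report", "20230101000000")` returns both links, which could be cited side by side in a dispute.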
Not just that, I expect charities to have a lot to lose just from the fight alone, for better or worse. Getting into fights about your integrity generally has negative effects on your reputation and fundraising capacity.
Thanks for the tool! It seems very useful.
We think our team still has some disagreements with you over how effective disinformation campaigns can be (especially when the disinformation is technical and the audience is mostly non-technical). That being said, we really appreciate your insights—you’ve made some great points.
I think the typical member of the EA community has more than enough technical skill to understand evidence that a web page has been edited to be different from an archived page, if pointed to both copies from a reliable source.
My first impression is that these techniques are pretty obscure and technical, and charities would not think to use them or know how to by default. In fact, sharing them here might make it more likely that charities use them (an infohazard).
EDIT: But maybe if motivated and strategic enough, they would find them through online search.