AI Welfare Debate Week retrospective

I wrote this retrospective to be shared internally in CEA, but in the spirit of more open communication, I’m sharing it here as well. Note that this is a review of the event considered as a product, not a summary or review of the posts from the week.

If you have any questions, or any additional feedback, that’d be appreciated! I’ll be running another debate week soon, and feedback has already been very helpful in preparing for it.

Also, feedback on the retro itself is appreciated. I’d ideally like to pre-register my retros and just have to fill in the graphs and conclusions once the event actually happens, so suggestions for data we should measure or questions I should be asking would be very helpful for making better retro templates.

How successful was the event?

In my OKRs (Objectives and Key Results- AKA, my goals for the event), I wanted this event to:

  • Have 50 participants, with “participant” being anyone taking an event-related action such as voting, commenting, or posting.

    • We did an order of magnitude better than 50. Over 558 people voted during the week, and 27 authors wrote or co-wrote at least one post.

  • Change people’s minds. I wanted the equivalent of 25 people changing their minds by 25% of the debate slider.

    • We did twice as well as I hoped here- 53 unique users made at least one mind change of 0.25 delta (representing 25% of the slider) or more.
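As a rough illustration of how a count like this could be computed, here is a minimal sketch. It assumes a hypothetical vote log of (user, slider position) pairs in chronological order, with positions normalised to the range 0 to 1, and it measures each delta against the user's first vote. The function name and data layout are illustrative assumptions, not the Forum's actual implementation.

```python
def users_with_mind_change(votes, threshold=0.25):
    """Return the set of users whose vote moved at least `threshold`
    away from their own first vote.

    `votes` is a hypothetical log: an iterable of (user_id, position)
    pairs in chronological order, with position between 0.0 and 1.0.
    """
    first_vote = {}   # user_id -> position of that user's initial vote
    changed = set()   # users with at least one qualifying mind change
    for user, position in votes:
        if user not in first_vote:
            # A first vote has no delta, so it never counts as a mind change.
            first_vote[user] = position
        elif abs(position - first_vote[user]) >= threshold:
            changed.add(user)
    return changed


# Example with made-up data: only "a" moves by 25% of the slider or more.
example_votes = [("a", 0.5), ("b", 0.25), ("a", 0.75), ("b", 0.375), ("c", 1.0)]
qualifying_users = users_with_mind_change(example_votes)
```

Counting unique users (a set) rather than individual vote changes matches the way the result above is phrased: one user who changes their vote several times is still one mind changed.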

Therefore, on our explicit goals, this event was successful 🎊. But how successful was it based on our other, non-KR goals and hopes?

Some other goals that we had for the event, either in the ideation phase or while it was ongoing, were:

  • Create more good content on a particularly important issue to EAs.

    • Successful.

  • Increase engagement.

    • Seems unsuccessful.

  • Bring in some new users.

    • Not noticeably successful.

  • Increase messaging.

    • Not noticeably successful.

In the next four sections, I examine each of these goals in turn.

More good content

We had 28 posts with the debate week tag, with 7 being at or above 50 karma. Of the 7, all but one (JWS’s thoughtful critique of the debate’s framing) were from authors I had directly spoken to or messaged about the event.

Compared to Draft Amnesty Week (which led to posts from 42 authors, and 10 posts over 50 karma), this isn’t that many. However, I think we should count these posts as ex ante more valuable because of their focus on a specific topic.

Ex post, it’s hard to assess how valuable the posts were. None of the posts had very high karma (i.e. the highest was 77). However, I did curate one of the posts, and a couple of others were considered for curation. I would be interested to hear takes from readers about how valuable the posts were: did any of them change your mind, lead to a collaboration, or cause you to think more about the topic?

Engagement

How much engagement did the event get?

In total, debate week posts got 127 hours of engagement during the debate week (or 11.6% of total engagement), and 181 hours from July 1-14 (debate week and the week after), 7.5% of that fortnight’s engagement hours.

Did it increase total daily hours of engagement?

Note: Discussion of Manifest controversies happened in June, and led to higher engagement hours per day in the build up to the event. Important dates: June 17: 244 comments, June 18: 349 comments, June 20: 33 comments, June 25: 38 comments

It doesn’t look as if the debate week meaningfully increased daily engagement. The average daily engagement for the week after the event is actually higher, although the third day of the event (July 3rd, the day I mentioned in the EA Digest that the event was ongoing) remains the day with the highest hours of engagement between July 1st and the date I’m writing this, August 21st.

Did it get us new users?

Not very noticeably, but slightly. We got a peak of new users on the third day of debate week:

And the average daily new users during debate week was (very marginally) higher than that of any other week between June 24 and today (August 22):

Did it increase messaging?

I don’t think so.

Debate week had a lower average of new convos per day (and some of them would have been me discussing posts with authors), and a slightly higher average of messages per user, than the next two weeks. It’d be cool if we could bump this metric up next time.

Below is all relevant messaging data, with debate week marked in green. Debate week doesn’t stand out.

How successful were our new features?

For this event we debuted several new features: the debate week banner, the “most influential posts” score, and the in-post debate slider.

The debate-week banner on the frontpage

The frontpage banner

Overall- I think this was very successful. It got a lot of great feedback during the event, as well as many more votes than I expected.

Distribution of votes

We got votes throughout the week, but more towards the start of the week. We also had many more first votes than vote changes.

Below you can see the graph, with yellow cells representing the largest delta score (change on the debate slider from initial vote) and purple the smallest (generally representing a first vote). I’ve marked the peaks that came organically and from the Digest.

Graph showing votes per hour, and colour-coding to show the delta-score of each vote

Votes on the debate were quite nicely distributed (although see the next section on feedback for concerns on visibility). I’ve put together a hacky chart below to show the distribution:

Histogram lined up with the debate slider to show the final vote distribution.

Feedback on the debate week banner

On the Forum:

  • Lots of discussion about visualising the votes here. Basic takeaway is that with so many votes, it became difficult to see a) the distribution b) all the individual voters.

From CEA slack:

  • A[1] enjoyed the liveliness of debate week, and mentioned that voting is a great way for users who don’t post or comment to interact publicly.

  • A and B would like more ways to interact with and visualise data on the banner. Specifically, A likes the idea of a lower-bar way to engage (a tweet-length explanation of your view, for instance) and B wants to be able to click on someone’s vote to see what they have written or DM them.

    • I like the idea of adding a little speech bubble which appears on hover, so that people can make a statement along with their vote.

Most influential posts

This didn’t work out as planned. I hoped it would sort out the posts that most changed users’ minds, allowing us to find the most informative posts from the week. However, only 30 users cited posts when they changed their mind, and the second-highest mind-changing post is my announcement post (which shouldn’t have ranked at all).

For the next debate week, I’d vote for cutting this feature, or changing it to count every mind-change as the same number of points, no matter how large the mind-change was (in this case, that’d lead to a more rational leaderboard).
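To make the difference between the two scoring options concrete, here is a minimal sketch. It assumes, purely for illustration, that each cited mind change is recorded as a (post, delta) pair and that the current score sums the size of the deltas; the actual Forum formula may differ, and the post names and numbers below are made up.

```python
from collections import Counter


def rank_posts(mind_changes, flat=False):
    """Rank posts by how much they changed minds.

    `mind_changes` is a hypothetical iterable of (post_id, delta) pairs,
    one per cited mind change. With flat=False each change scores the
    size of its delta; with flat=True every change scores one point.
    """
    scores = Counter()
    for post_id, delta in mind_changes:
        scores[post_id] += 1 if flat else abs(delta)
    # most_common() sorts posts from highest to lowest score.
    return [post_id for post_id, _ in scores.most_common()]


# Made-up data: one large swing citing one post, three small ones citing another.
changes = [
    ("announcement", 0.9),
    ("post_a", 0.25),
    ("post_a", 0.25),
    ("post_a", 0.25),
]
weighted_ranking = rank_posts(changes)          # "announcement" ranks first
flat_ranking = rank_posts(changes, flat=True)   # "post_a" ranks first
```

With so few citations, a single large swing can dominate a magnitude-weighted leaderboard, whereas flat counting rewards the post that the most people cited.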

The in-post debate slider

Thanks to a suggestion from @EdoArad, we had a debate slider in posts. If you changed your vote while on a post, the resulting delta score automatically cited the post you were reading. We unfortunately don’t have stats which show how often this was used as opposed to the frontpage slider.

Other feedback

There was a lot of feedback clustered around the framing of the debate question, which I responded to in this quicktake. General takeaway: make sure the next debate question is very specific, and nudge people to give feedback on it before it is locked in.

People were generally excited about another debate week. Here is Nathan Young’s question post asking for people’s ideas. Users also discussed how frequent debate weeks should be (this and other comments pointed to every 6-8 weeks being a good start).

Feature suggestions:

  • Being able to cite comments as changing your mind.

  • From Nathan, via DMs:

    • being able to insert your own debate sliders in posts.

    • letting people vote on the next debate week question.

Takeaways for next time:

  • We should make some changes to our debate-week specific features:

    • The next debate question has to be either unambiguous and empirical, or clear and values-based. With EAs, we should probably go for the former (the latter would likely become empirical anyway).

    • We should go back to the drawing board with the debate slider, to make sure that it can visualise the distribution of hundreds of votes at once.

    • We should either remove or reform the “most influential posts” ranking to avoid weird rankings and encourage people to use it.

  • We can try to increase engagement:

    • We could take A and B’s suggestions on board:

      • To increase the number of people contributing to the debate: a quicker and easier way to engage than writing a comment or post (such as a tweet-length summary attached to your avatar as a hover-over speech bubble).

      • To seed more messages: An option to message users that disagree with you.

    • Perhaps we would easily see more engagement if we discussed a more popular topic, or one that more people have takes on. Next time I’ll go a bit less niche.

    • People wanted to tweet about the debate week- can we make something shareable? For example, a graphic which tells you stats like how popular your opinion was, who agreed with you the most, and how many people changed their mind because of your post. If people wanted to share something like this, it could be a neat way to get more people who rarely check the Forum to come and check out the event.

Impacts and mentions beyond the Forum

Thanks for reading! If you have more feedback about this event, positive or negative, please comment it below or dm me.

  1. ^

    Anonymised just to speed up posting.