ThomasCederborg

Karma: 13

My research focus is Alignment Target Analysis (ATA). I noticed that the most recently published version of CEV (Parliamentarian CEV, or PCEV) gives a large amount of extra influence to people who intrinsically value hurting other individuals. For Yudkowsky’s description of the issue, search the CEV Arbital page for "ADDED 2023".

The fact that no one noticed this issue for over a decade shows that ATA is difficult. If PCEV had been successfully implemented, the outcome would have been massively worse than extinction. I think this illustrates that scenarios where someone successfully hits a bad alignment target pose a serious risk. I also think it illustrates that ATA can reduce these risks (noticing the issue reduced the probability of PCEV being successfully implemented). More ATA is needed because PCEV is not the only bad alignment target that might end up being implemented. However, ATA is very neglected: not a single research project is dedicated to it. In other words, I am doing ATA because it is a tractable and neglected way of reducing these risks.

So far, the only place I have discussed these issues publicly is LessWrong, but I have not been very successful in finding people who are interested in working on ATA. So from now on I will also post here.

I am currently looking for collaborators. I am also looking for a grant or a position that would allow me to focus entirely on ATA for an extended period of time. Please don’t hesitate to get in touch if you are curious and would like to chat, or if you have any feedback, comments, or questions. You can, for example, PM me here, PM me on LW, or email me at thomascederborgsemail@gmail.com (that really is my email address; it’s a Gavagai / Word and Object joke from my grad student days).

My background is physics as an undergrad and then AI research. Links to some papers: P1, P2, P3, P4, P5, P6, P7, P8 (none of them has any connection to any form of deep learning).

ThomasCederborg Oct 11, 2024, 1:19 AM
on: A Pivotal Act AI might not buy a lot of time
I changed the title in response to a comment by johnswentworth on LessWrong about terminology. The original title was: A Pivotal Act AI might not buy a lot of time
Here is a LW comment that discusses the title change in a bit more detail (in brief: while it is true that a Pivotal Act AI might not buy a lot of time, it was a mistake to use that statement as the title of this post).

Shutting down all competing AI projects might not buy a lot of time due to Internal Time Pressure

ThomasCederborg Oct 3, 2024, 12:05 AM

6 points

1 comment, 12 min read

The case for more Alignment Target Analysis (ATA)

Chi Sep 20, 2024, 1:14 AM

23 points

0 comments

A necessary Membrane formalism feature

ThomasCederborg Sep 10, 2024, 9:03 PM

1 point

0 comments, 11 min read

A short summary of what I have been posting about on LessWrong

ThomasCederborg Sep 10, 2024, 12:26 PM

3 points

0 comments, 2 min read
