Large Language Models

Tag. Last edit: 24 Nov 2023 16:38 UTC by Toby Tremlett🔹

This topic is for posts discussing Large Language Models (LLMs) -- for example, the GPT models produced by OpenAI.

Related Entries


AI safety | Artificial intelligence | AI governance | AI forecasting

LLMs are weirder than you think

Derek Shiller20 Nov 2024 13:39 UTC
64 points
3 comments22 min readEA link

The Decreasing Value of Chain of Thought in Prompting

Matrice Jacobine🔸🏳️‍⚧️8 Jun 2025 15:11 UTC
5 points
0 comments1 min readEA link
(papers.ssrn.com)

Introducing Senti—Animal Ethics AI Assistant

Animal_Ethics9 May 2024 7:33 UTC
41 points
2 comments2 min readEA link

Tentative practical tips for using chatbots in research

Erich_Grunewald 🔸29 Mar 2023 15:01 UTC
48 points
7 comments5 min readEA link

Impact of Quantization on Small Language Models (SLMs) for Multilingual Mathematical Reasoning Tasks

Angie Paola Giraldo7 May 2025 21:48 UTC
11 points
0 comments14 min readEA link

My Current Claims and Cruxes on LLM Forecasting & Epistemics

Ozzie Gooen26 Jun 2024 0:40 UTC
47 points
7 comments24 min readEA link

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

evhub12 Jan 2024 19:51 UTC
65 points
0 comments3 min readEA link
(arxiv.org)

The Animal Welfare Case for Open Access: Breaking Barriers to Scientific Knowledge and Enhancing LLM Training

Wladimir J. Alonso23 Nov 2024 13:07 UTC
32 points
2 comments3 min readEA link

Introducing Squiggle AI

Ozzie Gooen3 Jan 2025 17:53 UTC
84 points
13 comments8 min readEA link

Briefly how I’ve updated since ChatGPT

rime25 Apr 2023 19:39 UTC
29 points
8 comments2 min readEA link
(www.lesswrong.com)

ChatGPT not so clever or not so artificial as hyped to be?

Haris Shekeris2 Mar 2023 6:16 UTC
−7 points
2 comments1 min readEA link

Problem-solving tasks in Graph Theory for language models

Bruno López Orozco1 Oct 2024 12:36 UTC
21 points
1 comment9 min readEA link

The case for more ambitious language model evals

Jozdien30 Jan 2024 9:24 UTC
7 points
0 comments5 min readEA link

A short conversation I had with Google Gemini on the dangers of unregulated LLM API use, while mildly drunk in an airport.

EvanMcCormick17 Dec 2024 12:25 UTC
1 point
0 comments8 min readEA link

Claude 3.5 Sonnet

Zach Stein-Perlman20 Jun 2024 18:00 UTC
31 points
0 comments1 min readEA link
(www.anthropic.com)

Announcing RoastMyPost: LLMs Eval Blog Posts and More

Ozzie Gooen17 Dec 2025 18:09 UTC
109 points
13 comments5 min readEA link

New Artificial Intelligence quiz: can you beat ChatGPT?

AndreFerretti3 Mar 2023 15:46 UTC
29 points
3 comments1 min readEA link

Frozen skills aren’t general intelligence

Yarrow Bouchard 🔸8 Nov 2025 23:27 UTC
8 points
29 comments11 min readEA link

On the future of language models

Owen Cotton-Barratt20 Dec 2023 16:58 UTC
125 points
3 comments36 min readEA link

LLM-Secured Systems: A General-Purpose Tool For Structured Transparency

Ozzie Gooen18 Jun 2024 0:20 UTC
37 points
1 comment21 min readEA link

Life of GPT

Odd anon8 Nov 2023 22:31 UTC
−1 points
0 comments5 min readEA link

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine🔸🏳️‍⚧️12 Feb 2025 9:15 UTC
13 points
0 comments1 min readEA link
(www.emergent-values.ai)

Pros and Cons of boycotting paid Chat GPT

NickLaing18 Mar 2023 8:50 UTC
14 points
11 comments2 min readEA link

Discussing AI-Human Collaboration Through Fiction: The Story of Laika and GPT-∞

Laika27 Jul 2023 6:04 UTC
1 point
0 comments1 min readEA link

Epoch AI’s top 10 Data Insights and Gradient Updates of 2025

Vasco Grilo🔸7 Jan 2026 17:30 UTC
25 points
0 comments5 min readEA link
(epoch.ai)

[Question] What am I missing re. open-source LLM’s?

another-anon-do-gooder4 Dec 2023 4:48 UTC
1 point
2 comments1 min readEA link

[Question] Finding ‘pivotal questions’ from 80k podcast transcripts, suggestions, LLM approaches/ Is there already an “80k chatbot”?

david_reinstein8 Jan 2025 17:16 UTC
10 points
2 comments1 min readEA link

AI scaling myths

Noah Varley🔸27 Jun 2024 20:29 UTC
30 points
0 comments1 min readEA link
(open.substack.com)

LLMs as a Planning Overhang

Larks14 Jul 2024 4:57 UTC
49 points
3 comments2 min readEA link

Opinion Fuzzing: A Proposal for Reducing & Exploring Variance in LLM Judgments Via Sampling

Ozzie Gooen19 Dec 2025 21:40 UTC
20 points
0 comments5 min readEA link

LLMs cannot usefully be moral patients

LGS2 Jul 2024 4:43 UTC
35 points
24 comments4 min readEA link

Dwarkesh Patel’s thoughts on AI progress (Dec 2025)

Vasco Grilo🔸1 Feb 2026 9:28 UTC
27 points
2 comments8 min readEA link
(www.dwarkesh.com)

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Matrice Jacobine🔸🏳️‍⚧️12 May 2025 15:20 UTC
14 points
1 comment1 min readEA link
(www.arxiv.org)

Animal ethics in ChatGPT and Claude

Elijah Whipple16 Jan 2024 21:38 UTC
49 points
2 comments9 min readEA link

LLMs won’t lead to AGI—Francois Chollet

tobycrisford 🔸11 Jun 2024 20:19 UTC
40 points
23 comments1 min readEA link
(www.youtube.com)

Forecasting With LLMs—An Open and Promising Research Direction

Marcel212 Mar 2024 4:23 UTC
13 points
0 comments4 min readEA link

In favor of an AI-powered translation button on the EA Forum

Alix Pham6 Jun 2024 20:29 UTC
49 points
4 comments1 min readEA link

On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI

JWS 🔸15 Jun 2024 20:24 UTC
74 points
49 comments17 min readEA link

Worrisome Trends for Digital Mind Evaluations

Derek Shiller20 Feb 2025 15:35 UTC
79 points
10 comments8 min readEA link

Unsolved research problems on the road to AGI

Yarrow Bouchard 🔸22 Nov 2025 22:39 UTC
11 points
13 comments7 min readEA link

Roboticist Rodney Brooks on generative AI hype

Yarrow Bouchard 🔸4 Dec 2025 5:45 UTC
14 points
0 comments2 min readEA link
(rodneybrooks.com)

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery17 Apr 2024 21:09 UTC
21 points
4 comments3 min readEA link
(tiny.cc)

Scaling of AI training runs will slow down after GPT-5

Maxime Riché 🔸26 Apr 2024 16:06 UTC
10 points
2 comments3 min readEA link

The Prospect of an AI Winter

Erich_Grunewald 🔸27 Mar 2023 20:55 UTC
56 points
13 comments15 min readEA link
(www.erichgrunewald.com)

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

Lizka26 Jan 2024 20:10 UTC
108 points
17 comments3 min readEA link
(www.rand.org)

The Intentional Stance, LLMs Edition

Eleni_A1 May 2024 15:22 UTC
8 points
2 comments8 min readEA link

Open Phil releases RFPs on LLM Benchmarks and Forecasting

Lawrence Chan11 Nov 2023 3:01 UTC
12 points
0 comments2 min readEA link
(www.openphilanthropy.org)

How to quickly set up Claude as a chat bot for online fellowships and courses

Jamie_Harris22 Jul 2023 7:53 UTC
38 points
10 comments4 min readEA link

Possible OpenAI’s Q* breakthrough and DeepMind’s AlphaGo-type systems plus LLMs

Burnydelic23 Nov 2023 7:02 UTC
13 points
4 comments2 min readEA link

Knowledge, Reasoning, and Superintelligence

Owen Cotton-Barratt26 Mar 2025 23:28 UTC
21 points
3 comments7 min readEA link
(strangecities.substack.com)

EA Explorer GPT: A New Tool to Explore Effective Altruism

Vlad_Tislenko12 Nov 2023 15:36 UTC
12 points
1 comment1 min readEA link

Digital Consciousness Model Results and Key Takeaways

arvomm23 Jan 2026 14:14 UTC
86 points
8 comments6 min readEA link

Benchmark Scores = General Capability + Claudiness

Vasco Grilo🔸25 Nov 2025 17:58 UTC
19 points
0 comments4 min readEA link
(epochai.substack.com)

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david_reinstein24 Sep 2024 20:19 UTC
8 points
1 comment2 min readEA link

[Question] How would a language model become goal-directed?

David M16 Jul 2022 14:50 UTC
113 points
20 comments1 min readEA link

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Matrice Jacobine🔸🏳️‍⚧️24 Apr 2025 14:11 UTC
10 points
0 comments1 min readEA link
(limit-of-rlvr.github.io)

The Dissolution of AI Safety

Roko12 Dec 2024 10:46 UTC
−7 points
0 comments1 min readEA link
(www.transhumanaxiology.com)

[Question] I’m interviewing the author of ‘Not Born Yesterday’ — Hugo Mercier. He argues people are less gullible and more savvy than you think. What should I ask him?

Robert_Wiblin17 Nov 2023 17:43 UTC
16 points
3 comments1 min readEA link

“This might be the first large-scale application of AI technology to geopolitics.. 4o, o3 high, Gemini 2.5 pro, Claude 3.7, Grok all give the same answer to the question on how to impose tariffs easily.”

Matrice Jacobine🔸🏳️‍⚧️3 Apr 2025 10:50 UTC
3 points
0 comments1 min readEA link
(x.com)

We are in a New Paradigm of AI Progress—OpenAI’s o3 model makes huge gains on the toughest AI benchmarks in the world

Garrison22 Dec 2024 21:45 UTC
26 points
0 comments4 min readEA link
(garrisonlovely.substack.com)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

Ozzie Gooen28 Oct 2024 21:37 UTC
11 points
3 comments15 min readEA link

Simulating a possible alignment solution in GPT2-medium using Archetypal Transfer Learning

Miguel2 May 2023 16:23 UTC
4 points
0 comments18 min readEA link

INTELLECT-1 Release: The First Globally Trained 10B Parameter Model

Matrice Jacobine🔸🏳️‍⚧️29 Nov 2024 23:03 UTC
2 points
1 comment1 min readEA link
(www.primeintellect.ai)

How much is 1.8 million years of work?

rosehadshar16 Aug 2024 12:35 UTC
21 points
3 comments2 min readEA link

Jailbreaking Claude 4 and Other Frontier Language Models

James-Sullivan15 Jun 2025 1:01 UTC
6 points
0 comments3 min readEA link
(open.substack.com)

Google DeepMind releases Gemini

Yarrow Bouchard 🔸6 Dec 2023 17:39 UTC
21 points
7 comments1 min readEA link
(deepmind.google)

Open Problems and Fundamental Limitations of RLHF

stecas17 Aug 2023 16:50 UTC
5 points
0 comments2 min readEA link
(arxiv.org)

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish16 Mar 2023 23:11 UTC
76 points
10 comments3 min readEA link

Claude vs GPT

Maxwell Tabarrok14 Mar 2024 12:44 UTC
14 points
1 comment2 min readEA link
(www.maximum-progress.com)

LLM chatbots have ~half of the kinds of “consciousness” that humans believe in. Humans should avoid going crazy about that.

Andrew Critch22 Nov 2024 3:26 UTC
11 points
3 comments5 min readEA link

Cancelling GPT subscription

adekcz20 May 2024 16:19 UTC
26 points
14 comments3 min readEA link

Inference Scaling and the Log-x Chart

Toby_Ord2 Feb 2026 8:43 UTC
25 points
2 comments9 min readEA link
(www.tobyord.com)

GPTs are Predictors, not Imitators

EliezerYudkowsky8 Apr 2023 19:59 UTC
74 points
12 comments3 min readEA link

Cognitive Stress Testing Gemini 2.5 Pro: Empirical Findings from Recursive Prompting

Tyler Williams23 Jul 2025 22:37 UTC
1 point
0 comments2 min readEA link

[Question] Is DeepSeek-R1 already better than o3 when inference costs are held constant?

Magnus Vinding24 Jan 2025 15:29 UTC
33 points
2 comments1 min readEA link

Ideas for Next-Generation Writing Platforms, using LLMs

Ozzie Gooen4 Jun 2024 18:40 UTC
17 points
0 comments2 min readEA link

ChatGPT is capable of cognitive empathy!

Miquel Banchs-Piqué (prev. mikbp)30 Mar 2023 20:42 UTC
3 points
0 comments1 min readEA link
(nonzero.substack.com)

[Question] If an AI financial bubble popped, how much would that change your mind about near-term AGI?

Yarrow Bouchard 🔸21 Oct 2025 22:39 UTC
19 points
6 comments2 min readEA link

GPT5 won’t be what kills us all

DPiepgrass28 Sep 2024 17:11 UTC
3 points
3 comments1 min readEA link
(dpiepgrass.medium.com)

LLMs Outperform Experts on Challenging Biology Benchmarks

ljusten14 May 2025 16:09 UTC
24 points
1 comment1 min readEA link
(substack.com)

BenchMoral: A benchmarking to assess the moral sensitivity of large language models (LLMs) in Spanish.

Flor Betzabeth Ampa Flores30 Apr 2025 21:26 UTC
1 point
0 comments18 min readEA link

What is scaffolding?

Vishakha Agrawal27 Mar 2025 9:40 UTC
3 points
0 comments2 min readEA link
(aisafety.info)

ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text

Milan Weibel🔹4 Jan 2023 22:10 UTC
6 points
0 comments4 min readEA link
(www.lesswrong.com)

Automated Evaluation of LLMs for Math Benchmark.

CisnerosA30 Oct 2025 20:28 UTC
3 points
0 comments5 min readEA link

We are on an exponential curve—Claude Sonnet 4.5

MountainPath29 Sep 2025 20:12 UTC
−7 points
1 comment1 min readEA link

Feasibility of training and inferring advanced large language models (LLMs) in data centers in Mexico and Brazil.

Tatiana Sandoval2 May 2025 13:42 UTC
15 points
1 comment24 min readEA link

Large Language Models Pass the Turing Test

Matrice Jacobine🔸🏳️‍⚧️2 Apr 2025 5:41 UTC
11 points
6 comments1 min readEA link
(arxiv.org)

How to Catch a ChatGPT Cheat: 7 Practical Tips

Marshall27 Dec 2022 16:09 UTC
8 points
3 comments4 min readEA link

An Empirical Demonstration of a New AI Catastrophic Risk Factor: Metaprogrammatic Hijacking

Hiyagann27 Jun 2025 13:38 UTC
5 points
0 comments1 min readEA link

What is “wireheading”?

Vishakha Agrawal17 Dec 2024 17:59 UTC
1 point
0 comments1 min readEA link
(aisafety.info)

How LLMs Work, in the Style of The Economist

utilistrutil22 Apr 2024 19:06 UTC
17 points
0 comments2 min readEA link

Favorite Recent LLM Prompts & Tips?

Ozzie Gooen18 Mar 2025 4:25 UTC
33 points
13 comments1 min readEA link

Straightforwardly eliciting probabilities from GPT-3

NunoSempere9 Feb 2023 19:25 UTC
41 points
5 comments4 min readEA link

Frontier LLM Race/Sex Exchange Rates

Arjun Panickssery19 Oct 2025 18:36 UTC
25 points
1 comment3 min readEA link
(arctotherium.substack.com)

Re: Anthropic Chinese Cyber-Attack. How Do We Protect Open-source Models?

Mayowa Osibodu3 Jan 2026 22:14 UTC
16 points
6 comments6 min readEA link

Share your requests for ChatGPT

Kate Tran5 Dec 2022 18:43 UTC
8 points
5 comments1 min readEA link

[Question] How independent is the research coming out of OpenAI’s preparedness team?

Earthling10 Feb 2024 16:59 UTC
18 points
0 comments1 min readEA link

Exploring Tacit Linked Premises with GPT

RomeoStevens24 Mar 2023 22:50 UTC
5 points
0 comments3 min readEA link

How to influence the AI trainer workforce

Singer Robin25 Aug 2025 5:21 UTC
3 points
0 comments2 min readEA link

François Chollet on why LLMs won’t scale to AGI

Yarrow Bouchard 🔸15 Apr 2025 23:01 UTC
6 points
2 comments1 min readEA link
(www.youtube.com)

[Question] Could AI-generated content help think-tanks & research orgs become more effective?

Justin Olive10 Jan 2023 22:58 UTC
13 points
0 comments2 min readEA link

Still no strong evidence that LLMs increase bioterrorism risk

freedomandutility2 Nov 2023 21:23 UTC
58 points
9 comments1 min readEA link

AIs Are Expert-Level at Many Virology Skills

Center for AI Safety2 May 2025 16:07 UTC
22 points
0 comments1 min readEA link

Performance of Large Language Models (LLMs) in Complex Analysis: A Benchmark of Mathematical Competence and its Role in Decision Making.

Jaime Esteban Montenegro Barón6 May 2025 21:08 UTC
1 point
0 comments23 min readEA link

Energy-Based Transformers are Scalable Learners and Thinkers

Matrice Jacobine🔸🏳️‍⚧️8 Jul 2025 13:44 UTC
8 points
0 comments1 min readEA link
(energy-based-transformers.github.io)

“Long-Termism” vs. “Existential Risk”

Scott Alexander6 Apr 2022 21:41 UTC
532 points
81 comments3 min readEA link

Is ChatGPT (quietly) changing how we do EA — and should we be worried or optimistic?

charlesr16 Jun 2025 8:27 UTC
24 points
7 comments1 min readEA link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

sjp7 Nov 2023 18:00 UTC
10 points
0 comments2 min readEA link
(arxiv.org)

ChatGPT & The EthiSizer Game(s)

Velikovsky_of_Newcastle24 May 2023 20:12 UTC
1 point
0 comments40 min readEA link

Have your timelines changed as a result of ChatGPT?

Chris Leong5 Dec 2022 15:03 UTC
30 points
18 comments1 min readEA link

The Scaling Paradox

Toby_Ord30 Jan 2026 13:34 UTC
40 points
0 comments8 min readEA link
(www.tobyord.com)

[Question] State of LLM-powered prioritization research

Itamar Menuhin-Gruman18 Nov 2025 14:26 UTC
3 points
4 comments1 min readEA link

Performance comparison of Large Language Models (LLMs) in code generation and application of best practices in frontend web development

Diana V. Guaiña A.1 May 2025 14:57 UTC
5 points
0 comments24 min readEA link

Who owns AI-generated content?

Johan S Daniel7 Dec 2022 3:03 UTC
−2 points
0 comments2 min readEA link

Scale, schlep, and systems

Ajeya10 Oct 2023 16:59 UTC
59 points
3 comments6 min readEA link

Was Releasing Claude-3 Net-Negative

Logan Riggs27 Mar 2024 17:41 UTC
12 points
1 comment4 min readEA link

[Linkpost] On the Origins of Algorithmic Progress in AI

alexfogelson9 Jan 2026 19:46 UTC
25 points
0 comments1 min readEA link
(open.substack.com)

Without Alignment, Is Longtermism (and Thus, EA) Just Noise?

Krimsey17 Oct 2025 20:05 UTC
3 points
1 comment3 min readEA link

The Answer Is in the Question: Prompt Engineering in the Age of AI

Rodo30 May 2025 18:11 UTC
1 point
0 comments4 min readEA link

Is Text Watermarking a lost cause?

Egor Timatkov1 Oct 2024 13:07 UTC
7 points
0 comments10 min readEA link

o3

Zach Stein-Perlman20 Dec 2024 21:00 UTC
84 points
9 comments1 min readEA link

How to get ChatGPT to really thoroughly research something

Kat Woods 🔶 ⏸️15 Aug 2025 12:54 UTC
13 points
3 comments1 min readEA link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt16 Mar 2025 8:06 UTC
8 points
0 comments1 min readEA link

How do AI agents work together when they can’t trust each other?

James-Sullivan6 Jun 2025 3:24 UTC
4 points
1 comment8 min readEA link
(open.substack.com)

Worrisome misunderstanding of the core issues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC
4 points
3 comments4 min readEA link

Comparison of LLM scalability and performance between the U.S. and China based on benchmark

Ivanna_alvarado12 Oct 2024 21:51 UTC
8 points
0 comments34 min readEA link

Case study: LLM guardrails failing across sessions in a mental health crisis context

Arunas1 Sep 2025 14:11 UTC
14 points
4 comments4 min readEA link

Beyond Meta: Large Concept Models Will Win

Anthony Repetto30 Dec 2024 0:57 UTC
3 points
0 comments3 min readEA link

The Extreme Inefficiency of RL for Frontier Models

Toby_Ord2 Feb 2026 8:44 UTC
21 points
0 comments8 min readEA link
(www.tobyord.com)

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data

Center for AI Safety16 May 2024 14:26 UTC
14 points
0 comments6 min readEA link
(newsletter.safe.ai)

ACS is hiring: why work here and why not

Jan_Kulveit23 Oct 2025 9:38 UTC
39 points
4 comments2 min readEA link

AGI by 2032 is extremely unlikely

Yarrow Bouchard 🔸16 Oct 2025 22:50 UTC
24 points
44 comments7 min readEA link

ChatGPT bug leaked users’ conversation histories

Ian Turner27 Mar 2023 0:17 UTC
15 points
2 comments1 min readEA link
(www.bbc.com)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

So8res24 Nov 2023 17:37 UTC
38 points
1 comment5 min readEA link

ECHO Framework: Structured Debiasing for AI & Human Analysis

Karl Moon7 Jul 2025 14:32 UTC
1 point
0 comments4 min readEA link

New experimental paper on LLM welfare

LeonardDung11 Sep 2025 8:05 UTC
13 points
0 comments1 min readEA link

“Successful language model evals” by Jason Wei

Arjun Panickssery25 May 2024 9:34 UTC
11 points
0 comments1 min readEA link
(www.jasonwei.net)

LLMs as Trusted Mediators – A Path Beyond Coordination Problems?

Johan Falk8 Jan 2026 16:09 UTC
4 points
0 comments6 min readEA link

Alignment Faking in Large Language Models

Ryan Greenblatt18 Dec 2024 17:19 UTC
143 points
9 comments10 min readEA link

Project Proposal Looking for Feedback: Making Policy Impacts Transparent — A Reasoning Model for Trade, Jobs, and Prices

Echo Huang7 May 2025 17:14 UTC
17 points
4 comments7 min readEA link

Summary: Introspective Capabilities in LLMs (Robert Long)

rileyharris2 Jul 2024 18:08 UTC
11 points
1 comment4 min readEA link

Γαμινγκ the Algorithms: Large Language Models as Mirrors

Haris Shekeris1 Apr 2023 2:14 UTC
5 points
3 comments4 min readEA link

Language models resemble more than just language cortex, show neuroscientists

Mordechai Rorvig13 Jan 2026 18:26 UTC
1 point
0 comments1 min readEA link
(www.foommagazine.org)

Risk Alignment in Agentic AI Systems

Hayley Clatterbuck1 Oct 2024 22:51 UTC
32 points
1 comment3 min readEA link
(static1.squarespace.com)

Digest: three papers that have shaped my understanding of the potential for consciousness in AI systems

rileyharris21 Aug 2024 15:09 UTC
5 points
0 comments1 min readEA link