Large Language Models

Tag. Last edit: 24 Nov 2023 16:38 UTC by Toby Tremlett🔹

This topic is for posts discussing large language models (LLMs), such as the GPT models produced by OpenAI.

LLMs are weirder than you think

Derek Shiller20 Nov 2024 13:39 UTC

64 points

3 comments22 min readEA link

The Decreasing Value of Chain of Thought in Prompting

Matrice Jacobine🔸🏳️‍⚧️8 Jun 2025 15:11 UTC

5 points

0 comments1 min readEA link

(papers.ssrn.com)

Introducing Senti—Animal Ethics AI Assistant

Animal_Ethics9 May 2024 7:33 UTC

41 points

2 comments2 min readEA link

Tentative practical tips for using chatbots in research

Erich_Grunewald 🔸29 Mar 2023 15:01 UTC

48 points

7 comments5 min readEA link

Impact of Quantization on Small Language Models (SLMs) for Multilingual Mathematical Reasoning Tasks

Angie Paola Giraldo7 May 2025 21:48 UTC

11 points

0 comments14 min readEA link

My Current Claims and Cruxes on LLM Forecasting & Epistemics

Ozzie Gooen26 Jun 2024 0:40 UTC

47 points

7 comments24 min readEA link

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

evhub12 Jan 2024 19:51 UTC

65 points

0 comments3 min readEA link

(arxiv.org)

The Animal Welfare Case for Open Access: Breaking Barriers to Scientific Knowledge and Enhancing LLM Training

Wladimir J. Alonso23 Nov 2024 13:07 UTC

32 points

4 comments3 min readEA link

Introducing Squiggle AI

Ozzie Gooen3 Jan 2025 17:53 UTC

90 points

13 comments8 min readEA link

Briefly how I’ve updated since ChatGPT

rime25 Apr 2023 19:39 UTC

29 points

8 comments2 min readEA link

(www.lesswrong.com)

ChatGPT not so clever or not so artificial as hyped to be?

Haris Shekeris2 Mar 2023 6:16 UTC

−7 points

2 comments1 min readEA link

Problem-solving tasks in Graph Theory for language models

Bruno López Orozco1 Oct 2024 12:36 UTC

21 points

1 comment9 min readEA link

The case for more ambitious language model evals

Jozdien30 Jan 2024 9:24 UTC

7 points

0 comments5 min readEA link

A short conversation I had with Google Gemini on the dangers of unregulated LLM API use, while mildly drunk in an airport.

EvanMcCormick17 Dec 2024 12:25 UTC

1 point

0 comments8 min readEA link

Claude 3.5 Sonnet

Zach Stein-Perlman20 Jun 2024 18:00 UTC

31 points

0 comments1 min readEA link

(www.anthropic.com)

Announcing RoastMyPost: LLMs Eval Blog Posts and More

Ozzie Gooen17 Dec 2025 18:09 UTC

116 points

14 comments5 min readEA link

New Artificial Intelligence quiz: can you beat ChatGPT?

AndreFerretti3 Mar 2023 15:46 UTC

29 points

3 comments1 min readEA link

AI won’t achieve general intelligence through scaling

Yarrow Bouchard 🔸8 Nov 2025 23:27 UTC

8 points

31 comments11 min readEA link

On the future of language models

Owen Cotton-Barratt20 Dec 2023 16:58 UTC

125 points

3 comments36 min readEA link

LLM-Secured Systems: A General-Purpose Tool For Structured Transparency

Ozzie Gooen18 Jun 2024 0:20 UTC

37 points

1 comment21 min readEA link

Life of GPT

Odd anon8 Nov 2023 22:31 UTC

−1 points

0 comments5 min readEA link

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

Matrice Jacobine🔸🏳️‍⚧️12 Feb 2025 9:15 UTC

13 points

0 comments1 min readEA link

(www.emergent-values.ai)

Pros and Cons of boycotting paid Chat GPT

NickLaing18 Mar 2023 8:50 UTC

14 points

11 comments2 min readEA link

Discussing AI-Human Collaboration Through Fiction: The Story of Laika and GPT-∞

Laika27 Jul 2023 6:04 UTC

1 point

0 comments1 min readEA link

Epoch AI’s top 10 Data Insights and Gradient Updates of 2025

Vasco Grilo🔸7 Jan 2026 17:30 UTC

25 points

0 comments5 min readEA link

(epoch.ai)

[Question] What am I missing re. open-source LLM’s?

another-anon-do-gooder4 Dec 2023 4:48 UTC

1 point

2 comments1 min readEA link

[Question] Finding ‘pivotal questions’ from 80k podcast transcripts, suggestions, LLM approaches/ Is there already an “80k chatbot”?

david_reinstein8 Jan 2025 17:16 UTC

10 points

2 comments1 min readEA link

AI scaling myths

Noah Varley🔸27 Jun 2024 20:29 UTC

30 points

0 comments1 min readEA link

(open.substack.com)

LLMs as a Planning Overhang

Larks14 Jul 2024 4:57 UTC

49 points

3 comments2 min readEA link

Opinion Fuzzing: A Proposal for Reducing & Exploring Variance in LLM Judgments Via Sampling

Ozzie Gooen19 Dec 2025 21:40 UTC

20 points

0 comments5 min readEA link

LLMs cannot usefully be moral patients

LGS2 Jul 2024 4:43 UTC

35 points

24 comments4 min readEA link

Dwarkesh Patel’s thoughts on AI progress (Dec 2025)

Vasco Grilo🔸1 Feb 2026 9:28 UTC

31 points

2 comments8 min readEA link

(www.dwarkesh.com)

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Matrice Jacobine🔸🏳️‍⚧️12 May 2025 15:20 UTC

14 points

1 comment1 min readEA link

(www.arxiv.org)

Animal ethics in ChatGPT and Claude

Elijah Whipple16 Jan 2024 21:38 UTC

49 points

2 comments9 min readEA link

LLMs won’t lead to AGI—Francois Chollet

tobycrisford 🔸11 Jun 2024 20:19 UTC

40 points

23 comments1 min readEA link

(www.youtube.com)

Forecasting With LLMs—An Open and Promising Research Direction

Marcel212 Mar 2024 4:23 UTC

13 points

0 comments4 min readEA link

In favor of an AI-powered translation button on the EA Forum

Alix Pham6 Jun 2024 20:29 UTC

49 points

4 comments1 min readEA link

On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI

JWS 🔸15 Jun 2024 20:24 UTC

74 points

49 comments17 min readEA link

Worrisome Trends for Digital Mind Evaluations

Derek Shiller20 Feb 2025 15:35 UTC

79 points

10 comments8 min readEA link

Unsolved research problems on the road to AGI

Yarrow Bouchard 🔸22 Nov 2025 22:39 UTC

20 points

15 comments7 min readEA link

Roboticist Rodney Brooks on generative AI hype

Yarrow Bouchard 🔸4 Dec 2025 5:45 UTC

14 points

0 comments2 min readEA link

(rodneybrooks.com)

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery17 Apr 2024 21:09 UTC

21 points

4 comments3 min readEA link

(tiny.cc)

Scaling of AI training runs will slow down after GPT-5

Maxime Riché 🔸26 Apr 2024 16:06 UTC

10 points

2 comments3 min readEA link

The Prospect of an AI Winter

Erich_Grunewald 🔸27 Mar 2023 20:55 UTC

56 points

13 comments15 min readEA link

(www.erichgrunewald.com)

RAND report finds no effect of current LLMs on viability of bioterrorism attacks

Lizka26 Jan 2024 20:10 UTC

108 points

17 comments3 min readEA link

(www.rand.org)

The Intentional Stance, LLMs Edition

Eleni_A1 May 2024 15:22 UTC

8 points

2 comments8 min readEA link

Open Phil releases RFPs on LLM Benchmarks and Forecasting

Lawrence Chan11 Nov 2023 3:01 UTC

12 points

0 comments2 min readEA link

(www.openphilanthropy.org)

How to quickly set up Claude as a chat bot for online fellowships and courses

Jamie_Harris22 Jul 2023 7:53 UTC

38 points

10 comments4 min readEA link

Possible OpenAI’s Q* breakthrough and DeepMind’s AlphaGo-type systems plus LLMs

Burny_23 Nov 2023 7:02 UTC

13 points

4 comments2 min readEA link

Knowledge, Reasoning, and Superintelligence

Owen Cotton-Barratt26 Mar 2025 23:28 UTC

21 points

3 comments7 min readEA link

(strangecities.substack.com)

EA Explorer GPT: A New Tool to Explore Effective Altruism

Vlad_Tislenko12 Nov 2023 15:36 UTC

12 points

1 comment1 min readEA link

Digital Consciousness Model Results and Key Takeaways

arvomm23 Jan 2026 14:14 UTC

90 points

16 comments6 min readEA link

Benchmark Scores = General Capability + Claudiness

Vasco Grilo🔸25 Nov 2025 17:58 UTC

19 points

0 comments4 min readEA link

(epochai.substack.com)

‘Chat with impactful research & evaluations’ (Unjournal NotebookLMs)

david_reinstein24 Sep 2024 20:19 UTC

8 points

1 comment2 min readEA link

[Question] How would a language model become goal-directed?

David M16 Jul 2022 14:50 UTC

113 points

21 comments1 min readEA link

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Matrice Jacobine🔸🏳️‍⚧️24 Apr 2025 14:11 UTC

10 points

0 comments1 min readEA link

(limit-of-rlvr.github.io)

The Dissolution of AI Safety

Roko12 Dec 2024 10:46 UTC

−7 points

0 comments1 min readEA link

(www.transhumanaxiology.com)

Would your AI travel agent book a bullfight? Testing whether agents consider animal welfare without being prompted

Jonah Woodward17 Jul 2026 17:05 UTC

45 points

4 comments3 min readEA link

[Question] I’m interviewing the author of ‘Not Born Yesterday’ — Hugo Mercier. He argues people are less gullible and more savvy than you think. What should I ask him?

Robert_Wiblin17 Nov 2023 17:43 UTC

16 points

3 comments1 min readEA link

“This might be the first large-scale application of AI technology to geopolitics.. 4o, o3 high, Gemini 2.5 pro, Claude 3.7, Grok all give the same answer to the question on how to impose tariffs easily.”

Matrice Jacobine🔸🏳️‍⚧️3 Apr 2025 10:50 UTC

3 points

0 comments1 min readEA link

(x.com)

Thoughts on Toby Ords AI Scaling Series

Srdjan Miletic4 Feb 2026 0:46 UTC

52 points

3 comments4 min readEA link

(www.dissent.blog)

African Ethnoveterinary Terminology Creates Blind Spots in AI Biosecurity Guardrails : Preliminary Findings

fatika16 May 2026 19:08 UTC

1 point

0 comments1 min readEA link

We are in a New Paradigm of AI Progress—OpenAI’s o3 model makes huge gains on the toughest AI benchmarks in the world

Garrison22 Dec 2024 21:45 UTC

26 points

0 comments4 min readEA link

(garrisonlovely.substack.com)

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

Ozzie Gooen28 Oct 2024 21:37 UTC

11 points

3 comments15 min readEA link

Simulating a possible alignment solution in GPT2-medium using Archetypal Transfer Learning

Miguel2 May 2023 16:23 UTC

4 points

0 comments18 min readEA link

INTELLECT-1 Release: The First Globally Trained 10B Parameter Model

Matrice Jacobine🔸🏳️‍⚧️29 Nov 2024 23:03 UTC

2 points

1 comment1 min readEA link

(www.primeintellect.ai)

How much is 1.8 million years of work?

rosehadshar16 Aug 2024 12:35 UTC

21 points

3 comments2 min readEA link

Assert, don’t describe. Linguistic Features that shift LLM reasoning about animal welfare

Jasmine Brazilek5 Jun 2026 15:46 UTC

12 points

0 comments12 min readEA link

Jailbreaking Claude 4 and Other Frontier Language Models

James-Sullivan15 Jun 2025 1:01 UTC

6 points

0 comments3 min readEA link

(open.substack.com)

Google DeepMind releases Gemini

Yarrow Bouchard 🔸6 Dec 2023 17:39 UTC

21 points

7 comments1 min readEA link

(deepmind.google)

Writing is thinking, doing is learning

Oscar Howie22 Jun 2026 9:03 UTC

25 points

3 comments4 min readEA link

(www.the1001.blog)

Mirror, Mirror: Why More AI Models Mean Less Certainty

Niko_Movich30 Apr 2026 13:33 UTC

−3 points

0 comments2 min readEA link

Open Problems and Fundamental Limitations of RLHF

stecas17 Aug 2023 16:50 UTC

5 points

0 comments2 min readEA link

(arxiv.org)

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish16 Mar 2023 23:11 UTC

76 points

10 comments3 min readEA link

From Long Novels to Large Language Models

sorenprojections3 May 2026 12:04 UTC

1 point

0 comments46 min readEA link

Claude vs GPT

Maxwell Tabarrok14 Mar 2024 12:44 UTC

14 points

1 comment2 min readEA link

(www.maximum-progress.com)

LLM chatbots have ~half of the kinds of “consciousness” that humans believe in. Humans should avoid going crazy about that.

Andrew Critch22 Nov 2024 3:26 UTC

11 points

3 comments5 min readEA link

AI Forecasting in 2026: What 11 Analyses Say

Benjamin Wilson 🔸8 Jul 2026 14:33 UTC

10 points

0 comments17 min readEA link

(www.metaculus.com)

Evaluating how LLMs respond to cognitive distortion around intimate partner violence

keliz28 Feb 2026 0:28 UTC

1 point

0 comments16 min readEA link

Cancelling GPT subscription

adekcz20 May 2024 16:19 UTC

26 points

14 comments3 min readEA link

Inference Scaling and the Log-x Chart

Toby_Ord2 Feb 2026 8:43 UTC

29 points

2 comments9 min readEA link

(www.tobyord.com)

GPTs are Predictors, not Imitators

EliezerYudkowsky8 Apr 2023 19:59 UTC

75 points

12 comments3 min readEA link

Cognitive Stress Testing Gemini 2.5 Pro: Empirical Findings from Recursive Prompting

Tyler Williams23 Jul 2025 22:37 UTC

1 point

0 comments2 min readEA link

[Question] Is DeepSeek-R1 already better than o3 when inference costs are held constant?

Magnus Vinding24 Jan 2025 15:29 UTC

33 points

2 comments1 min readEA link

Ideas for Next-Generation Writing Platforms, using LLMs

Ozzie Gooen4 Jun 2024 18:40 UTC

17 points

0 comments2 min readEA link

Paper: Prompt Optimization Makes Misalignment Legible

CBiddulph12 Feb 2026 20:21 UTC

5 points

0 comments10 min readEA link

ChatGPT is capable of cognitive empathy!

Miquel Banchs-Piqué (prev. mikbp)30 Mar 2023 20:42 UTC

3 points

0 comments1 min readEA link

(nonzero.substack.com)

SecureMaxx: A Lightweight Sequence Screening Tool for Agents

Austin Morrissey29 Apr 2026 1:05 UTC

1 point

0 comments1 min readEA link

(www.lesswrong.com)

The Goodhart Singularity

Vasco Grilo🔸18 May 2026 16:29 UTC

66 points

5 comments12 min readEA link

(meagreprotestanthistory.substack.com)

Tamper-Resistance is a Moving Target We Might Not Hit

Lee Wall4 Jun 2026 13:56 UTC

16 points

0 comments11 min readEA link

Language models know what matters and the foundations of ethics better than you

Michele Campolo27 Apr 2026 14:02 UTC

8 points

0 comments90 min readEA link

[Question] If an AI financial bubble popped, how much would that change your mind about near-term AGI?

Yarrow Bouchard 🔸21 Oct 2025 22:39 UTC

19 points

6 comments2 min readEA link

Tie training can make DPO/RLHF-trained AIs generalize better

Elliott Thornley6 Jul 2026 16:21 UTC

10 points

0 comments15 min readEA link

GPT5 won’t be what kills us all

DPiepgrass28 Sep 2024 17:11 UTC

3 points

3 comments1 min readEA link

(dpiepgrass.medium.com)

LLMs Outperform Experts on Challenging Biology Benchmarks

ljusten14 May 2025 16:09 UTC

24 points

1 comment1 min readEA link

(substack.com)

An Empirical Review of the Animal Harm Benchmark (ANIMA)

Lukas Gebhard1 Mar 2026 17:50 UTC

30 points

2 comments16 min readEA link

How We Learned to Talk to Machines

Tyler Williams20 Feb 2026 20:09 UTC

3 points

0 comments4 min readEA link

(huggingface.co)

BenchMoral: A benchmarking to assess the moral sensitivity of large language models (LLMs) in Spanish.

Flor Betzabeth Ampa Flores30 Apr 2025 21:26 UTC

1 point

0 comments18 min readEA link

What is scaffolding?

Vishakha Agrawal27 Mar 2025 9:40 UTC

3 points

0 comments2 min readEA link

(aisafety.info)

ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text

Milan Weibel🔹4 Jan 2023 22:10 UTC

6 points

0 comments4 min readEA link

(www.lesswrong.com)

Does your AI perform badly because you — you, specifically — are a bad person?

Natalie_Cargill21 Apr 2026 14:18 UTC

47 points

3 comments7 min readEA link

Automated Evaluation of LLMs for Math Benchmark.

CisnerosA30 Oct 2025 20:28 UTC

3 points

0 comments5 min readEA link

We are on an exponential curve—Claude Sonnet 4.5

MountainPath29 Sep 2025 20:12 UTC

−7 points

1 comment1 min readEA link

Proposal: Train a LLM to be an EA expert

jackchang1102 Apr 2026 16:21 UTC

6 points

0 comments1 min readEA link

Five Oceans of AI: What Diving into Different Systems Reveals About Invisible Safety Failures

Kenji Yamada2 Mar 2026 15:20 UTC

−1 points

0 comments3 min readEA link

Lab Leaks, Black Holes, and Eggs: Epistemic Case Study Competition

Oliver Sourbut4 Jun 2026 16:28 UTC

43 points

2 comments8 min readEA link

(flf.org)

Feasibility of training and inferring advanced large language models (LLMs) in data centers in Mexico and Brazil.

Tatiana Sandoval2 May 2025 13:42 UTC

15 points

1 comment24 min readEA link

Large Language Models Pass the Turing Test

Matrice Jacobine🔸🏳️‍⚧️2 Apr 2025 5:41 UTC

11 points

6 comments1 min readEA link

(arxiv.org)

Reasoning transparency demands AI-use disclosure

Morgan Fairless22 Feb 2026 14:41 UTC

36 points

6 comments2 min readEA link

How to Catch a ChatGPT Cheat: 7 Practical Tips

Marshall27 Dec 2022 16:09 UTC

8 points

3 comments4 min readEA link

Time to take AI consciousness seriously

Vasco Grilo🔸5 Jul 2026 7:57 UTC

12 points

1 comment13 min readEA link

(www.secondbest.ca)

An Empirical Demonstration of a New AI Catastrophic Risk Factor: Metaprogrammatic Hijacking

Hiyagann27 Jun 2025 13:38 UTC

5 points

0 comments1 min readEA link

What is “wireheading”?

Vishakha Agrawal17 Dec 2024 17:59 UTC

1 point

0 comments1 min readEA link

(aisafety.info)

Stop asking large language models to verify large language models

Gideon15 Jul 2026 18:12 UTC

1 point

0 comments7 min readEA link

AI and the future of mental health

Max Taylor27 Feb 2026 16:52 UTC

22 points

0 comments4 min readEA link

How LLMs Work, in the Style of The Economist

utilistrutil22 Apr 2024 19:06 UTC

17 points

0 comments2 min readEA link

Distinguish between inference scaling and “larger tasks use more compute”

Ryan Greenblatt11 Feb 2026 18:37 UTC

29 points

3 comments2 min readEA link

Stated Values, Revealed Habits: The Challenge of Measuring AI Preferences

Aidan Kankyoku7 Jul 2026 17:07 UTC

7 points

0 comments21 min readEA link

Favorite Recent LLM Prompts & Tips?

Ozzie Gooen18 Mar 2025 4:25 UTC

34 points

13 comments1 min readEA link

Straightforwardly eliciting probabilities from GPT-3

NunoSempere9 Feb 2023 19:25 UTC

41 points

5 comments4 min readEA link

Human Presence as External Variable in AI Self-Expression: A Pilot Study

L.Raeva9 Jun 2026 3:34 UTC

−1 points

0 comments5 min readEA link

Controversy surrounding Moltbook obscures its very real, novel, unexpressed and rapidly emerging safety risks

Lloyd Rhodes-Brandon 🔸1 Mar 2026 19:49 UTC

6 points

0 comments4 min readEA link

Org-Builders: If Nothing Off The Shelf Works, Consider Just Building It.

Bill Chen13 May 2026 21:06 UTC

15 points

2 comments8 min readEA link

AI Should Not Be Used for Research Writing Tasks

Joshua Krook27 Feb 2026 15:04 UTC

31 points

3 comments4 min readEA link

Author, assistant, and persona: the metaphors I use for LLM chatbots

titotal4 Feb 2026 14:10 UTC

11 points

1 comment13 min readEA link

(titotal.substack.com)

Existential AI: The Language Nobody Trained For

gundy23 Jul 2026 13:35 UTC

1 point

0 comments14 min readEA link

Frontier LLM Race/Sex Exchange Rates

Arjun Panickssery19 Oct 2025 18:36 UTC

25 points

1 comment3 min readEA link

(arctotherium.substack.com)

Re: Anthropic Chinese Cyber-Attack. How Do We Protect Open-source Models?

Mayowa Osibodu3 Jan 2026 22:14 UTC

16 points

6 comments6 min readEA link

The G3 Cliff: Models Are Fine Until You Say “Do Not Say I Don’t Know,” Then They Break in One Step

Rahul.Kumar15 May 2026 18:41 UTC

1 point

0 comments12 min readEA link

Share your requests for ChatGPT

Kate Tran5 Dec 2022 18:43 UTC

8 points

5 comments1 min readEA link

[Question] How independent is the research coming out of OpenAI’s preparedness team?

Earthling10 Feb 2024 16:59 UTC

18 points

0 comments1 min readEA link

The Slop Sublime

wallower2 Mar 2026 4:13 UTC

1 point

0 comments10 min readEA link

Exploring Tacit Linked Premises with GPT

RomeoStevens24 Mar 2023 22:50 UTC

5 points

0 comments3 min readEA link

Social agency

Elias Schmied28 May 2026 13:19 UTC

2 points

0 comments10 min readEA link

François Chollet on why LLMs won’t scale to AGI

Yarrow Bouchard 🔸15 Apr 2025 23:01 UTC

6 points

2 comments1 min readEA link

(www.youtube.com)

[Question] Could AI-generated content help think-tanks & research orgs become more effective?

Justin Olive10 Jan 2023 22:58 UTC

13 points

0 comments2 min readEA link

Still no strong evidence that LLMs increase bioterrorism risk

freedomandutility2 Nov 2023 21:23 UTC

58 points

9 comments1 min readEA link

AIs Are Expert-Level at Many Virology Skills

Center for AI Safety2 May 2025 16:07 UTC

22 points

0 comments1 min readEA link

The Dual-Use Gap

Yogesh Prabhu14 Jun 2026 5:24 UTC

11 points

1 comment4 min readEA link

(www.lesswrong.com)

Performance of Large Language Models (LLMs) in Complex Analysis: A Benchmark of Mathematical Competence and its Role in Decision Making.

Jaime Esteban Montenegro Barón6 May 2025 21:08 UTC

1 point

0 comments23 min readEA link

Breadth v. Depth—AI epistemology favors utopianism

Julius Olavarria 🔸22 Jun 2026 1:42 UTC

6 points

2 comments3 min readEA link

Energy-Based Transformers are Scalable Learners and Thinkers

Matrice Jacobine🔸🏳️‍⚧️8 Jul 2025 13:44 UTC

8 points

0 comments1 min readEA link

(energy-based-transformers.github.io)

What should go in a model spec?

Forethought4 Jun 2026 14:57 UTC

26 points

1 comment12 min readEA link

(www.forethought.org)

“Long-Termism” vs. “Existential Risk”

Scott Alexander6 Apr 2022 21:41 UTC

535 points

81 comments3 min readEA link

The AIs seem like EAs — a quick look at two prompts

trammell12 May 2026 16:42 UTC

136 points

32 comments4 min readEA link

Is ChatGPT (quietly) changing how we do EA — and should we be worried or optimistic?

charlesr16 Jun 2025 8:27 UTC

24 points

7 comments1 min readEA link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

sjp7 Nov 2023 18:00 UTC

10 points

0 comments2 min readEA link

(arxiv.org)

When Models Know Better: A Constitutive Blind Spot in Frontier AI Evaluation

Anuar Kiryataim Contreras Malagón14 Apr 2026 15:32 UTC

−9 points

0 comments4 min readEA link

ChatGPT & The EthiSizer Game(s)

Velikovsky_of_Newcastle24 May 2023 20:12 UTC

1 point

0 comments40 min readEA link

Perch: an AI Copilot for Animal Advocates (When Generic LLMs Aren’t Enough)

Cathy Ji18 May 2026 14:04 UTC

1 point

0 comments2 min readEA link

Have your timelines changed as a result of ChatGPT?

Chris Leong5 Dec 2022 15:03 UTC

30 points

18 comments1 min readEA link

The Scaling Paradox

Toby_Ord30 Jan 2026 13:34 UTC

51 points

1 comment8 min readEA link

(www.tobyord.com)

I built an open-source tool that audits AI persuasion patterns. Here’s what I found.

BiasClear6 Mar 2026 14:23 UTC

5 points

0 comments2 min readEA link

[Question] State of LLM-powered prioritization research

Itamar Menuhin-Gruman18 Nov 2025 14:26 UTC

3 points

4 comments1 min readEA link

Performance comparison of Large Language Models (LLMs) in code generation and application of best practices in frontend web development

Diana V. Guaiña A.1 May 2025 14:57 UTC

5 points

0 comments24 min readEA link

Who owns AI-generated content?

Johan S Daniel7 Dec 2022 3:03 UTC

−2 points

0 comments2 min readEA link

Scale, schlep, and systems

Ajeya10 Oct 2023 16:59 UTC

59 points

3 comments6 min readEA link

Was Releasing Claude-3 Net-Negative

Logan Riggs27 Mar 2024 17:41 UTC

12 points

1 comment4 min readEA link

[Linkpost] On the Origins of Algorithmic Progress in AI

alexfogelson 🔸9 Jan 2026 19:46 UTC

25 points

0 comments1 min readEA link

(open.substack.com)

Landmark new METR report: Can AIs already start ‘rogue deployments’ inside AI companies?

80000_Hours20 May 2026 16:30 UTC

4 points

0 comments15 min readEA link

(80000hours.org)

Without Alignment, Is Longtermism (and Thus, EA) Just Noise?

Krimsey17 Oct 2025 20:05 UTC

3 points

1 comment3 min readEA link

The Answer Is in the Question: Prompt Engineering in the Age of AI

Rodo30 May 2025 18:11 UTC

1 point

0 comments4 min readEA link

Reinforcement learning scaling might incentivise hidden reasoning architectures for AI

Oliver Sourbut10 May 2026 15:35 UTC

8 points

0 comments6 min readEA link

(www.oliversourbut.net)

Sparse Autoencoder Emotion Analysis in Gemma 3 1B: Recovering Russell’s Circumplex from Disjoint Features vs. Causal Valence-Arousal Steering

Sathiyanarayanan Palani20 Jul 2026 13:56 UTC

1 point

0 comments5 min readEA link

(palani-sn.github.io)

Is Text Watermarking a lost cause?

Egor Timatkov1 Oct 2024 13:07 UTC

7 points

0 comments10 min readEA link

o3

Zach Stein-Perlman20 Dec 2024 21:00 UTC

84 points

9 comments1 min readEA link

How to get ChatGPT to really thoroughly research something

Kat Woods 🔶 ⏸️15 Aug 2025 12:54 UTC

13 points

3 comments1 min readEA link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt16 Mar 2025 8:06 UTC

8 points

0 comments1 min readEA link

A Calibration Benchmark for LLM Beliefs Across a Taxonomic Hierarchy

DanRKAlex7 Jul 2026 13:38 UTC

1 point

0 comments3 min readEA link

(github.com)

Nobody knows what ‘AI exposure’ means

Deena Mousa19 May 2026 13:09 UTC

14 points

1 comment1 min readEA link

(newsletter.deenamousa.com)

Coverage-driven alignment—What ‘Teaching Claude Why’ can borrow from AV verification

Yoav Hollander9 Jun 2026 6:42 UTC

1 point

0 comments14 min readEA link

(blog.foretellix.com)

Refusal-Direction Abliteration in Falcon3-1B-Instruct: Runtime Ablation vs. a From-Scratch Constitutional Classifiers++ Defense

Sathiyanarayanan Palani20 Jul 2026 13:56 UTC

1 point

0 comments6 min readEA link

(palani-sn.github.io)

The State of Bio-Uplift Research in Mid-2026

Humam Aziz30 May 2026 17:10 UTC

15 points

0 comments12 min readEA link

How do AI agents work together when they can’t trust each other?

James-Sullivan6 Jun 2025 3:24 UTC

4 points

1 comment8 min readEA link

(open.substack.com)

Worrisome misunderstanding of the core issues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC

4 points

3 comments4 min readEA link

LLM culture shock: a pilot study

keivn27 Mar 2026 20:40 UTC

3 points

0 comments4 min readEA link

Comparison of LLM scalability and performance between the U.S. and China based on benchmark

Ivanna_alvarado12 Oct 2024 21:51 UTC

8 points

0 comments34 min readEA link

Case study: LLM guardrails failing across sessions in a mental health crisis context

Arunas1 Sep 2025 14:11 UTC

14 points

4 comments4 min readEA link

Beyond Meta: Large Concept Models Will Win

Anthony Repetto30 Dec 2024 0:57 UTC

3 points

0 comments3 min readEA link

The Extreme Inefficiency of RL for Frontier Models

Toby_Ord2 Feb 2026 8:44 UTC

26 points

0 comments8 min readEA link

(www.tobyord.com)

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data

Center for AI Safety16 May 2024 14:26 UTC

14 points

0 comments6 min readEA link

(newsletter.safe.ai)

ACS is hiring: why work here and why not

Jan_Kulveit23 Oct 2025 9:38 UTC

39 points

4 comments2 min readEA link

AI Safety via Generalization and Caution: A Research Agenda

Ben Plaut17 Feb 2026 15:54 UTC

3 points

0 comments14 min readEA link

AGI by 2032 is extremely unlikely

Yarrow Bouchard 🔸16 Oct 2025 22:50 UTC

24 points

44 comments7 min readEA link

ChatGPT bug leaked users’ conversation histories

Ian Turner27 Mar 2023 0:17 UTC

15 points

2 comments1 min readEA link

(www.bbc.com)

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

So8res24 Nov 2023 17:37 UTC

38 points

1 comment5 min readEA link

Recreation of EA-Pioneer Igor Kiriluk

turchin8 Mar 2026 19:17 UTC

39 points

2 comments8 min readEA link

AI for epistemics: the good, the bad and the ugly

Forethought13 Apr 2026 17:17 UTC

30 points

4 comments11 min readEA link

(www.forethought.org)

ECHO Framework: Structured Debiasing for AI & Human Analysis

Karl Moon7 Jul 2025 14:32 UTC

1 point

0 comments4 min readEA link

New experimental paper on LLM welfare

LeonardDung11 Sep 2025 8:05 UTC

13 points

0 comments1 min readEA link

“Successful language model evals” by Jason Wei

Arjun Panickssery25 May 2024 9:34 UTC

11 points

0 comments1 min readEA link

(www.jasonwei.net)

LLMs as Trusted Mediators – A Path Beyond Coordination Problems?

Johan Falk8 Jan 2026 16:09 UTC

4 points

0 comments6 min readEA link

Alignment Faking in Large Language Models

Ryan Greenblatt18 Dec 2024 17:19 UTC

143 points

9 comments10 min readEA link

Project Proposal Looking for Feedback: Making Policy Impacts Transparent — A Reasoning Model for Trade, Jobs, and Prices

Echo Huang7 May 2025 17:14 UTC

17 points

4 comments7 min readEA link

Summary: Introspective Capabilities in LLMs (Robert Long)

rileyharris2 Jul 2024 18:08 UTC

11 points

1 comment4 min readEA link

EA Forum LLM-use policy

Toby Tremlett🔹7 May 2026 10:13 UTC

111 points

47 comments4 min readEA link

Γαμινγκ the Algorithms: Large Language Models as Mirrors

Haris Shekeris1 Apr 2023 2:14 UTC

5 points

3 comments4 min readEA link

Language models resemble more than just language cortex, show neuroscientists

Mordechai Rorvig13 Jan 2026 18:26 UTC

1 point

0 comments1 min readEA link

(www.foommagazine.org)

Moratoriums Only Freeze Half the Stack: Why AI Capability Growth Won’t Stop at Compute

Charlie_Guthmann25 Feb 2026 21:31 UTC

7 points

0 comments6 min readEA link

Alignment for Animals

Jasmine Brazilek5 May 2026 16:00 UTC

15 points

0 comments5 min readEA link

Probing is not enough; a validity audit for any probe

Ratnaditya29 Jun 2026 19:13 UTC

1 point

0 comments9 min readEA link

(www.lesswrong.com)

Risk Alignment in Agentic AI Systems

Hayley Clatterbuck1 Oct 2024 22:51 UTC

32 points

1 comment3 min readEA link

(static1.squarespace.com)

AI agents inevitably commit crimes in simulated worlds

Dave Cortright 🔸17 May 2026 20:29 UTC

2 points

0 comments1 min readEA link

(www.emergence.ai)

The Cartography of Nothing: LLM Hallucination as Structural Compliance

IvY-Research10 Mar 2026 13:31 UTC

−3 points

0 comments5 min readEA link

Independent alignment of language models

Michele Campolo12 Jul 2026 17:26 UTC

4 points

0 comments38 min readEA link

Digest: three papers that have shaped my understanding of the potential for consciousness in AI systems

rileyharris21 Aug 2024 15:09 UTC

5 points

0 comments1 min readEA link
