AI Ruined My Year

Why Does AI Lie, and What Can We Do About It?

We Were Right! Real Inner Misalignment

Intro to AI Safety, Remastered

Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think...

The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

Quantilizers: AI That Doesn't Try Too Hard

Sharing the Benefits of AI: The Windfall Clause

10 Reasons to Ignore AI Safety

9 Examples of Specification Gaming

Training AI Without Writing A Reward Function, with Reward Modelling

AI That Doesn't Try Too Hard - Maximizers and Satisficers

Is AI Safety a Pascal's Mugging?

A Response to Steven Pinker on AI

How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification

Why Not Just: Think of AGI Like a Corporation?

Safe Exploration: Concrete Problems in AI Safety Part 6

Friend or Foe? AI Safety Gridworlds extra bit

AI Safety Gridworlds

Experts' Predictions about the Future of AI

Why Would AI Want to do Bad Things? Instrumental Convergence

Superintelligence Mod for Civilization V

Intelligence and Stupidity: The Orthogonality Thesis

Scalable Supervision: Concrete Problems in AI Safety Part 5

AI Safety at EAGlobal2017 Conference

AI learns to Create ̵K̵Z̵F̵ ̵V̵i̵d̵e̵o̵s̵ Cat Pictures: Papers in Two Minutes #1

What can AGI do? I/O and Speed

What Can We Do About Reward Hacking?: Concrete Problems in AI Safety Part 4

Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5

The other "Killer Robot Arms Race" Elon Musk should worry about