If Artificial General Intelligence has an okay outcome, what will be the reason? — Hacks like RLHF-ing self-disempowerment into frontier models work long enough to develop better alignment methods, which in turn work long enough to ... etc; we keep ahead of 'alignment escape velocity'

Question

Accepted Answer

On Manifold Markets, the answer "Hacks like RLHF-ing self-disempowerment into frontier models work long enough to develop better alignment methods, which in turn work long enough to ... etc; we keep ahead of 'alignment escape velocity'" to the question "If Artificial General Intelligence has an okay outcome, what will be the reason?" has a probability of 1.5%.

Single Platform Data