LWMap/Coherence Arguments Do Not Imply Goal Directed Behavior

08 Mar 2022 - 17 Jun 2023
    • Coherence arguments, in the Rationalist context, are demonstrations that certain ways of thinking are provably optimal under certain assumptions. The details are complicated, but the upshot is that any decision problem can be cast in terms of maximizing a utility function.
    • One of the most pleasing things about probability and expected utility theory is that there are many coherence arguments that suggest that these are the “correct” ways to reason. If you deviate from what the theory prescribes, then you must be executing a dominated strategy. There must be some other strategy that never does any worse than your strategy, but does strictly better than your strategy with certainty in at least one situation.
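    • As a concrete illustration of the dominated-strategy claim (my own toy sketch, not part of the quoted argument), consider the classic money pump: an agent with cyclic preferences A > B > C > A will pay a small fee for every trade up to something it prefers, and so can be walked in a circle, ending with its original item and strictly less money. The preference table, fee, and trade loop below are illustrative assumptions.

```python
# Toy money pump: a hypothetical agent with cyclic preferences A > B > C > A.
# It pays FEE for every trade up to something it prefers, so walking it
# around the cycle returns it to its starting item strictly poorer --
# i.e. its strategy is "dominated" in the coherence-theorem sense.

FEE = 1
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}  # cyclic!
offer = {"A": "C", "C": "B", "B": "A"}  # always offer what the agent prefers to its current item

def money_pump(item, money, trades=9):
    for _ in range(trades):
        offered = offer[item]
        if prefers.get((offered, item)):        # agent prefers what's offered,
            item, money = offered, money - FEE  # so it trades and pays the fee
    return item, money

print(money_pump("A", 100))  # -> ('A', 91): same item back, 9 units poorer
```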
    • The problem is that the assumptions of this argument are false for the only actual intelligences we know about (humans). The theory assumes preferences that are self-consistent and stable over time; human preferences are notoriously inconsistent and do change over time. Rationalism and decision theory describe an idealization of mind, not actual minds.
    • Humans are subject to inconsistent judgements (e.g. in the classic Trolley problem). Human desires are often at war with each other (e.g. in cases of akrasia or addiction). George Ainslie's work on hyperbolic discounting treats such internal conflict as the basic structure of the will; a sketch of the preference reversal his model predicts follows below.
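    • The sketch below uses illustrative numbers (not Ainslie's own data) to show that reversal: far in advance, the larger-later reward is preferred; as the smaller-sooner reward draws near, the ranking flips. This is exactly the kind of time-varying preference the coherence theorems assume away.

```python
# Hyperbolic discounting (illustrative numbers only):
# value = amount / (1 + k * delay).  Far in advance the larger-later
# reward dominates; as the smaller-sooner reward gets close, the
# ranking reverses -- a time-varying preference that an exponential
# discounter (or any fixed-utility maximizer) would never show.

def hyperbolic_value(amount, delay_days, k=0.1):
    return amount / (1 + k * delay_days)

SMALL, LARGE = 50, 100   # smaller-sooner vs. larger-later reward
GAP = 14                 # the large reward arrives 14 days after the small one

for days_until_small in (30, 0):
    v_small = hyperbolic_value(SMALL, days_until_small)
    v_large = hyperbolic_value(LARGE, days_until_small + GAP)
    choice = "larger-later" if v_large > v_small else "smaller-sooner"
    print(f"{days_until_small:2d} days out: prefers {choice} "
          f"(small={v_small:.1f}, large={v_large:.1f})")

# 30 days out: prefers larger-later   (small=12.5, large=18.5)
#  0 days out: prefers smaller-sooner (small=50.0, large=41.7)
```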
    • The Rationalist counter to this, I think, is to say that humans are imperfectly rational due to the accidents of evolution, but AIs, being designed and untroubled by the complexity of biology, will be able to achieve something closer to theoretical rationality. Since this is provably better than what humans do, humans are potentially in deep trouble. Hence Rationalists have taken on the dual task of making humans more rational and figuring out how to constrain AIs so they won't kill us.