Motivation 3

I practiced a certain meditation technique in which I did my best to rouse a goal system that was more selfless than the one I have by default. In Crystal Society terms, I worked on replacing my Safety and Face with a Sacrifice and Heart. As a result, I suffered less.

This leads me into a concern regarding Coherent Extrapolated Volition, which, as I understand it, is the current candidate for “goal system least likely to be catastrophic if we successfully programmed it into a superintelligent AI.” Having thought about it, I don’t think it’s a real concern, but I want to preserve my thought process about it in case it’s useful, perhaps to monastics thinking about AI safety, or to AI safety people thinking about monastics thinking about AI safety.

The concern is basically: is CEV just going to extrapolate selfish desire that perpetuates suffering?

The natural response is “no, because you don’t want that.”

So the actual concern is more subtle than that. I’ll do my best here to articulate a hypothesis that I assign moderate confidence to, but that is hard to understand: merely getting what I want leads to suffering.

It is, of course, textbook Buddhism.

That’s not why I believe it.

You’ll have to take my word for this, but I didn’t type that because I was thinking about the Four Noble Truths and wanted to figure out how to put the second one into words. I typed that because I was thinking about how to put into words a strange phenomenon I’d noticed in direct experience. Once I typed it, I could then look at it and say “oh, that’s just the second noble truth.” But I actually didn’t notice that until I’d typed it.

(Maybe you think that I’ve been so indoctrinated into Buddhism that it’s become subconscious, that its memes have woven themselves into my deep mind and are biasing my experience at a level below thought. I have no real response to that, aside from saying that I still don’t feel at all comfortable with Buddhism on any level, don’t identify as Buddhist, and haven’t done any intentional study of Buddhist texts. So maybe it’s true, but I don’t see a clear mechanism for it.)

So the reason I believe that hypothesis is that I’ve tried again and again to falsify it experimentally, and have not been successful. I reliably suffer after getting what I want (and I’m confident I could come up with 10 examples on the spot). I reliably don’t suffer after letting go of what I want (again, confident I could come up with 10). It rarely or never happens that I get what I want and don’t suffer (number of observed cases where this has clearly happened: 0, though these cases tend to be hazy; there are far more where I’m not sure). And I can’t even imagine a mechanism by which I would suffer from letting go of what I want. (Strawman mechanism: your desires are produced by evolution to keep you in homeostasis, and being outside of homeostasis is suffering. Response: in my experience, no, it isn’t. Being outside of homeostasis and wanting to be in homeostasis is suffering. Goto 10.)

The thing is, I wouldn’t have come to this conclusion by thinking about it. No matter how much I thought about it, I wouldn’t have discovered it. I don’t think it even would’ve occurred to me unless I’d looked carefully at the results of lots of experiments of this sort, and I don’t know where I would’ve gotten the idea to do that investigation unless someone had suggested that I do it.

But of course the AI computing CEV would quickly realize that people want things that align with actual experience, so it/we would start experimenting, and to the extent that this phenomenon is real, it/we would realize it. So there’s no concern here. So, great, CEV wins out against this particular uninformed, hand-wavy investigation.