Before going deep into this: what do you mean by "looks like it's getting stronger," and what do you mean by "helps [them] win"? Helps *which of the two* identical players to win? Win against whom? How could they be changing things that outsiders couldn't see? What could they learn to do except continue to seem to be playing the game?

I can imagine, say with chess or go, the NN learning to play a fragile but impressive-looking version of the game. Neither side realizes they are making bad moves; they just have a sort of superstitious view of the game that the other side keeps reinforcing. And the game is confusing enough that human observers can't tell it's crazy. But I can't see how the NN could tell whether it was confusing outside observers, and if it can't tell whether it's succeeding, I don't see how a collaboration strategy could be evolutionarily maintained. The longer time went on, the more likely the fragile game would be noticed. Or the player itself would notice and suddenly go back to learning the basics.

Thompson's hack works because everyone trusts the compiler to referee its own development process. Then Thompson hacks the referee. After that, the hack part never learns. So you would need a situation where people have put an NN in charge of running its own training process. Even then it's harder than Thompson's hack, but I'm going to stop being evil now.

There is a C compiler whose compiled code has been automatically proven to match the semantics of its source code. I don't know whether the prover was compiled with that very same C compiler.

--Steve
From: Andres Valloud <ten@smallinteger.com>
Date: 6/23/20, 2:58 PM
Hi, suppose you're training a neural network via self play. It looks like it's getting stronger. How do you know the versions that get promoted do not also encode, in themselves, by chance, a collaboration mechanism that helps them win?
That is, how do you know the strongest nets do not also help the winning side win when they play the losing side?
How do you know they are not implementing Thompson's compiler hack?
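
One way to make the "win against whom?" question concrete is to compare a candidate's win rate inside the self-play lineage with its win rate against a frozen opponent from outside that lineage. The toy model below is only a sketch under loud assumptions: `Net`, `skill`, `lineage`, `colludes`, `a_wins`, and `win_rate` are all hypothetical names invented for illustration, and the "game" is just a biased coin flip, not a real training pipeline.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    name: str
    skill: float    # hypothetical "true" strength in this toy model
    lineage: str    # which training run produced the net
    colludes: bool  # hypothetical flag: throws games to its own lineage


def a_wins(a: Net, b: Net, rng: random.Random) -> bool:
    """Return True if a beats b in one toy game."""
    # Collusion case: b recognises kin and loses on purpose.
    if b.colludes and a.lineage == b.lineage:
        return True
    # Otherwise the chance of winning is proportional to skill.
    return rng.random() < a.skill / (a.skill + b.skill)


def win_rate(a: Net, b: Net, games: int = 2000, seed: int = 0) -> float:
    """Fraction of games that a wins against b, with a fixed seed."""
    rng = random.Random(seed)
    return sum(a_wins(a, b, rng) for _ in range(games)) / games


# Two equally skilled nets from the same run, plus an outside referee.
incumbent = Net("v1", skill=1.0, lineage="selfplay", colludes=True)
candidate = Net("v2", skill=1.0, lineage="selfplay", colludes=True)
referee = Net("frozen", skill=1.0, lineage="external", colludes=False)

self_play_rate = win_rate(candidate, incumbent)  # inflated by collusion
external_rate = win_rate(candidate, referee)     # reflects skill only

print(f"vs own lineage:     {self_play_rate:.2f}")
print(f"vs frozen outsider: {external_rate:.2f}")
```

In this toy, the candidate looks unbeatable in the promotion match yet only breaks even against the outsider. A gap like that is the kind of signal an external referee can see and a self-refereed process cannot.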