Re: [math-fun] AI training
LCZero then changes strategy itself to take advantage of Stockfish's inferred weakness.
It would be equally plausible to tell the story that lc0 sees Stockfish as moving into what Stockfish perceives as a stronger position. After all, that's what it's supposed to do. I remember a review of a game, DeepMind vs Stockfish I think, where a locked position was reached, and then over the course of 10 or so moves that the reviewer perceived as "shuffling", the locked position collapsed into a clear advantage. I think this is a better explanation: making an exchange that's 1% in your favor, 10 times over. The evaluation of those 1%s is key. There is a potential hazard with any training regimen: training into a zone that doesn't represent the broader reality.
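
To put a rough number on the "1%, 10 times" idea, here is a small sketch. It assumes the small edges compound multiplicatively, which is only one way to read the remark, and the function name compound_edge is made up for illustration. It also shows why the evaluation matters: if each exchange is misjudged and is really 1% against you, the same 10 moves leave you worse off.

```python
def compound_edge(gain_per_exchange: float, exchanges: int) -> float:
    """Total multiplicative edge after repeating one small gain.

    Assumption: each exchange scales the position's value by
    (1 + gain_per_exchange), so the effects compound.
    """
    return (1.0 + gain_per_exchange) ** exchanges


# Ten exchanges that are each truly 1% in your favor.
well_judged = compound_edge(0.01, 10)

# Ten exchanges believed to be favorable but actually 1% against you.
misjudged = compound_edge(-0.01, 10)

print(f"well judged: {well_judged:.4f}")
print(f"misjudged:   {misjudged:.4f}")
```

Under that assumption the well-judged sequence ends up a little over 10% ahead and the misjudged one a little under 10% behind, so a small error in the sign of the evaluation flips the whole outcome.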
participants (1)
-
Dave Dyer