knuckleheads 4 hours ago
I'm not a chess engine guy, but I've talked to some, and, from what I recall, there is a very interesting difference between an engine like Leela Chess Zero (lc0) and Stockfish. Stockfish internally calculates in centipawns while lc0 calculates in WDL (win/draw/loss probabilities). Stockfish has a model they use that converts their centipawn calculation to WDL, but it's not _really_ the WDL of the position, it's just their estimate of it according to a probabilistic model. The same applies in reverse to lc0. Why I find this interesting is that it shows how they come from different generations, with Stockfish representing the old deterministic style with deep search, and lc0 being directly inspired by AlphaZero and the new generation of engines based on neural nets. Stockfish has by now adopted the best of both worlds (deep search with a small neural net) and is the better for it, but I still think the developers of both engines banter over who is really producing the True WDL numbers for a given position.
For my part, I find that WDL is more amenable to interpretation. Being up 5 pawns worth of material sort of makes sense, but being told you have a 95% chance of winning makes more sense to me at first blush.
n_e 3 hours ago
> but I still think the developers of both engines banter over who is really producing the True WDL numbers for a given position
In fact, Stockfish's WDL is very rudimentary: it is a function of the centipawn evaluation of the position and the value of the remaining material.
See https://github.com/official-stockfish/Stockfish/blob/a6d055d...
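To make "a function of the centipawn evaluation and the remaining material" concrete, here is a minimal sketch of that kind of mapping. It is not the code at the link above: the logistic shape is an assumption, and the function name, the 0..78 material scale and every coefficient are made up for illustration.

```python
import math

def wdl_from_centipawns(cp: float, material: int) -> tuple[float, float, float]:
    """Map a centipawn score to (win, draw, loss) for the side to move.

    HYPOTHETICAL model: `a` is the score at which the win chance reaches 50%
    and `b` controls how steep the curve is. Both drift with the remaining
    material. All numbers below are invented, not taken from any engine.
    """
    m = max(0, min(material, 78)) / 78.0   # assumed material scale, 0..1
    a = 250.0 + 100.0 * m                  # made-up 50% point, in centipawns
    b = 60.0 + 40.0 * m                    # made-up spread, in centipawns
    win = 1.0 / (1.0 + math.exp((a - cp) / b))
    loss = 1.0 / (1.0 + math.exp((a + cp) / b))   # same curve, seen from the other side
    draw = 1.0 - win - loss                # never negative as long as a > 0
    return win, draw, loss

if __name__ == "__main__":
    for cp in (0, 100, 350, 600):
        w, d, l = wdl_from_centipawns(cp, material=78)
        print(f"{cp:>4} cp -> W {w:.3f}  D {d:.3f}  L {l:.3f}")
```

The point is only that the output is a fixed curve over two inputs, so two positions with the same score and material always get the same WDL, however different they are to play.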
tarentel 3 hours ago
To your last point, the centipawns thing doesn't make a whole lot of sense from an interpretation perspective because it is so shallow. WDL can give you much more insight into how tame or chaotic things are. A 1 pawn evaluated advantage with a 95% chance to win is wildly different from a similar evaluation and a 50% chance to win. The first position likely has an obvious tactic that leads to a win, the latter may require perfect play for 15 moves that only a computer can calculate.
Also, from a computer perspective, an advantage of >= 1 pawn is usually sufficient for a computer to win 100% of the time, so it's not really interesting and says very little about whether a person could win 100% of the time.
deklesen 3 hours ago
FWIW, as an avid chess player, I find the "up 5 pawns" has more intuitive signal.
ramses0 6 hours ago
"""under perfect play all chess games be a the same single one outcome of the following (we just currently don’t know which one, “A” playing the white pieces):
Mr. A says, “I resign” or Mr. B says, “I resign” or Mr. A says, “I offer a draw,” and Mr. B replies, “I accept.” That is, under perfect play, each chess position is either a forced win, forced draw, or forced loss. The domain of a perfect chess position evaluation function is these three cases as symbols."""
There's an interesting point I've heard of in Backgammon, somewhat related to this statement. Modern Backgammon offers "the doubling cube" as a play option. https://en.wikipedia.org/wiki/Backgammon#Doubling_cube
...basically if you think you're going to win (aka: you have a 200 centi-pawn advantage), you can offer the doubling cube to your opponent (doubling the stakes of losing). If you're playing to win $5, and halfway through you think "yep, 90% chance I'm going to win this one...", you push the doubling cube to 2x (aka: $10 consequence), and kind of like poker your opponent has to evaluate whether it's "worth it" for them to stay in the game.
You might imagine a "2xELO penalty" where White takes a Queen with a Pawn, and then offers "2x, or I'm gonna beat 'ya!". If Black says "Naaah, you just activated my trap card!" and then either accepts "2x" or pushes back at "4x", then it becomes a little more like poker... you think you can beat me, then prove it!
Not that I'm suggesting changing the rules of Chess, but overall I'm really fascinated by the concept of formalized semi-out-of-band risk-taking to potentially end games early.
qsort 6 hours ago
The doubling cube works well in Backgammon because it is a rare example of a popular game with randomness, without hidden information (every information set contains exactly one node of the decision tree, if you want to get extremely technical,) and, critically, with "different endings" (normal win, gammon, backgammon.) Doubling decisions are especially interesting because while they're always objective (it could never be the case that perfect players disagree on the correct move, that requires nontrivial information sets,) it could be the case that:
- it's correct for a player to double and for the other to accept;
- it's correct for a player to double and for the other not to accept;
- the position is "too good to double," because the equity from the probability of a double or triple game exceeds the advantage you'd get from a double;
- all of the above being influenced by the match score, e.g. if I'm 3 points away from winning and you're 5 points away from winning, I could make different decisions than if it were the opposite.
Chess has none of these, so the doubling cube would be exclusively a psychological power play, something like "it's theoretically drawn but I don't think you can defend it," which is not a great game dynamic.
In general, transplanting the doubling mechanic without a similarly rich context doesn't tend to work well.
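The take/pass and "too good to double" cases can be worked through with a little arithmetic. This is a rough sketch under heavy simplifications that are mine, not part of the rules: money play (no match score), no further redoubles, backgammons ignored, and in the last function the opponent is assumed to pass if doubled. The function names are hypothetical.

```python
def ev_take(p_win: float, stake: float = 1.0) -> float:
    """Expected result for the doubled player if they accept (stakes now 2x).

    Assumes no gammons and no later redoubles.
    """
    return p_win * 2 * stake - (1.0 - p_win) * 2 * stake

def ev_pass(stake: float = 1.0) -> float:
    """Declining the double simply loses the current stake."""
    return -stake

def should_take(p_win: float) -> bool:
    """Accept when playing on at 2x is no worse than giving up 1x."""
    return ev_take(p_win) >= ev_pass()

def too_good_to_double(p_single: float, p_gammon: float, p_loss: float) -> bool:
    """Compare playing on undoubled against cashing exactly 1 point.

    Assumes the opponent would pass a double, so doubling is worth 1 point.
    Playing on wins 1 for a plain win, 2 for a gammon, and loses 1 otherwise.
    """
    ev_play_on = p_single * 1 + p_gammon * 2 - p_loss * 1
    return ev_play_on > 1.0

if __name__ == "__main__":
    for p in (0.20, 0.25, 0.30):
        print(f"doubled player wins {p:.0%}: take = {should_take(p)}")
    print(too_good_to_double(p_single=0.35, p_gammon=0.60, p_loss=0.05))
```

Solving 2(2p - 1) >= -1 gives p >= 25%, which is why a position can be a correct double and a correct take at the same time. None of this has a counterpart in chess, where there is only one kind of win.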
jmount op 6 hours ago
This is an important point. Thank you.
Games like backgammon (that have betting and the doubling cube to continue), Go (which is calculated in stones), and bridge (again having points) have more natural intermediate scoring systems than chess.
In my opinion the "winner takes all" aspect of chess is similar to what makes analyzing voting systems difficult. In a non game context: Aspnes, Beigel, Furst, and Rudich had some amazing work on how all or nothing calculation really changes things: https://www.cs.yale.edu/homes/aspnes/papers/stoc91voting.pdf .
ramses0 5 hours ago
For a while I really dug in to multiple player (and teams-of-players) ELO calculations. I got into an argument with my friend about whether second place was any better than last place... specifically in poker, but applicable to multiple games (imagine chinese checkers [race to finish], or carcassonne/ticket-to-ride [semi-hidden scoring until the end]).
His POV was that "if you don't win, you lose" and my POV was "second place is better than last place". His response was: "if I play poker to get first place it's wildly different than playing for second or third place [and I may end up in last place wildly more often due to risk % or bad beats]"
I've been more used to "climbing" type performance games (ie: last place => mid-field => second place => first place) and in my gut I wanted my ELO to reflect that (top-half players are better than bottom-half players), however his very valid point was that different games have different payout matrices (eg: poker is often "top-3 payout", and first may be 10x second or third).
I think in my mind I've settled on EV-payout for multiplayer games should match the "game payout", and that maybe my gut is telling me the difference between "Casual ELO" (aka: top-half > bottom-half), and "Competitive ELO" (aka: only the winner gets paid).
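Here is a small sketch of the two scoring ideas, just to show how far apart they put second place. This is not a standard Elo extension: both function names and the example payout table are hypothetical, and the only thing borrowed from regular Elo is that a result is expressed as a score between 0 and 1.

```python
def casual_score(place: int, n_players: int) -> float:
    """'Climbing' score: 1.0 for first place, 0.0 for last, linear in between."""
    return (n_players - place) / (n_players - 1)

def payout_score(place: int, payouts: list[float]) -> float:
    """Score proportional to prize money, scaled so first place gets 1.0.

    `payouts` lists the prizes from first place down; unpaid places score 0.
    """
    prize = payouts[place - 1] if place <= len(payouts) else 0.0
    return prize / payouts[0]

if __name__ == "__main__":
    payouts = [10.0, 1.0, 1.0]   # made-up "top-3 payout, first is 10x" table
    for place in range(1, 7):
        print(place,
              round(casual_score(place, n_players=6), 2),
              round(payout_score(place, payouts), 2))
```

Either number could stand in for the actual score S in the usual update R' = R + K(S - E). With the payout version, second place scores 0.1 and is barely distinguishable from last, which is my friend's point. With the casual version it scores 0.8.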
hyperpape 6 hours ago
Go is also winner take all. It's psychologically satisfying to have a big win, in the same way that it's psychologically satisfying to achieve a brilliant checkmate, but in any ordinary game or tournament (outside of certain gambling setups), a win by 1/2 point is the same as a win by 20+ points.
aidenn0 3 hours ago
Yes and no. One could say this of any game with points where the margin of victory doesn't affect long-term outcomes (e.g. most ball games).
A win by 1/2 point versus a win by 20 points suggests a very different relative skill between the two players. Similarly the custom of the stronger player playing white without komi suggests that the point differential matters.
kuboble an hour ago
Not necessarily. In go you often calculate the score and come up with a conclusion that by playing proper moves you will lose by a small margin.
So instead you launch a desperate maneuver in the hope of turning the game around, accepting that you may lose by 30 points instead.
paulddraper 7 hours ago
Surprised it wasn't mentioned until the very end, but since chess is deterministic, there is no objective probability.
Every position is objectively plus infinity, minus infinity, or zero.
The “advantage” is an engine-specific notion that helps prune search paths.
Some chess engines don’t even evaluate an advantage.
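Chess is far too big to solve this way, but the same idea can be shown on tic-tac-toe, where exhaustive search is cheap. A minimal sketch: the board encoding and function names are my own, and +1 / -1 / 0 stand in for plus infinity, minus infinity and zero.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board: str):
    """Return 'X' or 'O' if that side has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def solve(board: str, to_move: str) -> int:
    """Exact value for the side to move: +1 forced win, 0 draw, -1 forced loss."""
    if winner(board) is not None:
        return -1          # the previous mover just completed a line
    if "." not in board:
        return 0           # full board, nobody won
    other = "O" if to_move == "X" else "X"
    best = -1
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + to_move + board[i + 1:]
            best = max(best, -solve(child, other))   # negamax: their value, negated
            if best == 1:
                break      # a forced win cannot be improved on
    return best

if __name__ == "__main__":
    print(solve("." * 9, "X"))   # the empty board is a draw, so this prints 0
```

There is no probability anywhere in this. Every position gets exactly one of three values, and anything finer-grained has to come from a model of imperfect players.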
kuboble 6 hours ago
There are also objective measures for finer-grained position evaluation.
For winning/drawn positions: "What is the smallest program that can guarantee your side to win/draw" probably adding some time constraint.
paulddraper 2 hours ago
Measuring the size of a model that produces a win?
Theoretically valid, but that's not going to be a very useful/doable measure.
kuboble an hour ago
No, but in practice centipawns reported by the imperfect engine are good.
But I want to point out that in theory there is also something more than pure win/lose/draw with perfect play.
jmount op 6 hours ago
That is a neat variation.
im3w1l 5 hours ago
I think program size is probably not a good measure since any heuristic you can put in could be discovered at runtime with a metaheuristic that searches for good heuristics. Time and memory make more sense.
monktastic1 2 hours ago
Yes, this is a huge omission, because it means that as engines improve, the stated advantage becomes increasingly meaningless to humans (which is the opposite of what we may intuitively expect).
What I really want to know as a player is how easy it will be for me to win from this position against someone of my opponent's strength, which is admittedly a very hard thing to define, let alone compute.
TZubiri 5 hours ago
Not only is it mentioned, the article also notes that the point was made as early as 1950, by none other than Claude Shannon:
>""under perfect play all chess games be a the same single one outcome of the following (we just currently don’t know which one, “A” playing the white pieces): Mr. A says, “I resign” or Mr. B says, “I resign” or Mr. A says, “I offer a draw,” and Mr. B replies, “I accept"
throwaway198846 5 hours ago
fractional tempi sounds fascinating