2006-07-30

I have done a little programming lately. I stumbled across a web page that extolled the virtues of Huffman coding as a method for compressing chess positions while dismissing move list methods. I did some analysis using the positions from Reinfeld's tactics books and proved to myself that a move list method outperforms everything else on average:

In the graph, PL is a simple Piece List, PLAC is a Piece List with a form of run length encoding, SLHC is a Square List with Huffman Coding, and MLFC is a Move List with Frequency Coding. For SLHC and MLFC, the lighter colored regions show the range of encoding sizes for each piece count band and the darker lines show the average encoding size. MLFC has a wider range than SLHC, but the average encoding size for MLFC is smaller than SLHC for every piece count. PL* is better at lower piece counts due to smaller overhead.

I am not going to release the text for the analysis yet. I suspect that the nature of the test positions is distorting the frequency analysis in certain areas. Huffman coding suggests that a king on g1/g8 deserves a single bit! Reinfeld's books date from 1955. I should test my MLFC format against games from the last fifty years.