Posted on behalf of Eugenie Samuel Reich.
An Irish mathematician has used a complex algorithm and millions of hours of supercomputing time to solve an important open problem in the mathematics of Sudoku, the game popularized in Japan that involves filling out a 9X9 grid of squares with the numbers 1-9 according to certain rules.
Gary McGuire of University College Dublin shows in a proof posted online on 1 January that the minimum number of clues – or starting digits – needed to complete a puzzle is 17 (see sample puzzle, pictured, from McGuire’s paper), as puzzles with 16 clues or fewer do not have a unique solution. In comparison, most newspaper puzzles have around 25 clues, with the difficulty of the puzzle decreasing as more clues are given.
The emerging consensus among mathematicians at a conference in Boston on 7 January was that McGuire’s proof is likely valid and an important advance in the growing field of Sudoku maths.
“The approach is reasonable and it’s plausible. I’d say the attitude is one of cautious optimism,” says Jason Rosenhouse, a mathematician at James Madison University in Harrisonburg, Virginia, and the co-author of a newly released book on the maths of Sudoku.
The rules of Sudoku require puzzlers to fill out a 9X9 grid with the numbers 1-9 in such a way that no digit is repeated within the same column, row, or one of the nine 3X3 subgrids. The clues are the numbers that are filled in to begin with, and enthusiasts have long observed that although there are a small number of puzzles with 17 clues, no one has been able to come up with a valid 16-clue puzzle.
That led to the conjecture that 16-clue puzzles with unique solutions simply do not exist. One way to demonstrate that would be to check every 16-clue puzzle in every possible completed grid, but that would take too much computing time. So McGuire simplified the problem by designing a so-called “hitting set algorithm”. The idea behind this was to search for what he calls unavoidable sets: arrangements of numbers within the completed puzzle that are interchangeable and so could result in multiple solutions. To prevent the unavoidable sets from causing multiple solutions, the clues must overlap, or “hit”, every unavoidable set. Once the unavoidable sets have been found, it is a much smaller, although still non-trivial, computing task to show that there are no 16-clue puzzles that can hit them all.
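As a rough illustration of the hitting-set idea described above, here is a minimal Python sketch. It is not McGuire's program: the function name `find_hitting_set`, the cell numbering and the toy "unavoidable sets" are all assumptions made up for this example. It simply asks whether some choice of at most k cells touches every set in a given collection, by picking a set that has not yet been hit and trying each of its cells in turn.

```python
from typing import FrozenSet, List, Optional, Set


def find_hitting_set(
    sets: List[FrozenSet[int]],
    k: int,
    chosen: FrozenSet[int] = frozenset(),
) -> Optional[Set[int]]:
    """Return at most k more cells that, with `chosen`, hit every set; else None."""
    # Look for a set that none of the chosen cells touches yet.
    unhit = next((s for s in sets if not (s & chosen)), None)
    if unhit is None:
        return set(chosen)  # every set is already hit
    if k == 0:
        return None  # a set is still unhit and no clues are left
    # Any valid solution must contain at least one cell of `unhit`,
    # so it is enough to branch over the cells of that one set.
    for cell in sorted(unhit):
        result = find_hitting_set(sets, k - 1, chosen | {cell})
        if result is not None:
            return result
    return None


# Hypothetical toy data: three disjoint "unavoidable sets" of cell indices.
toy_sets = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

print(find_hitting_set(toy_sets, 2))  # None: two clues cannot hit three disjoint sets
print(find_hitting_set(toy_sets, 3))  # a set of three cells, one from each toy set
```

In this toy case two clues are never enough, because the three sets share no cells. The search in the proof makes the analogous argument at a vastly larger scale: if no 16 cells can hit all the unavoidable sets of a completed grid, that grid cannot contain a valid 16-clue puzzle.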
Having spent two years testing the algorithm, McGuire and his team used about 7 million CPU hours at the Irish Centre for High-End Computing in Dublin, searching through possible grids with the hitting set algorithm. “The only realistic way to do it was the brute force approach,” says Gordon Royle, a mathematician at the University of Western Australia in Perth who had been working on the problem of counting 17-clue puzzles using different algorithms. “It’s a challenging problem that inspires people to push computing and mathematical techniques to the limit. It’s like climbing the highest mountain.”
A consequence of the approach taken is that it will take some time for others to get enough computing time to check the proof, says Laura Taalman, also a mathematician at James Madison University, who co-authored the book Taking Sudoku Seriously: The Math Behind the World’s Most Popular Pencil Puzzle with Rosenhouse. Taalman comments that the book, which came out last week, is already outdated: it says the problem remains open and that whoever solves it will be a “rock star.”
McGuire says his approach may pay off in other ways. The idea of hitting sets that he developed for the proof has been used in papers on gene sequencing analysis and cellular networks, and he looks forward to seeing if his algorithm can be usefully adapted by other researchers. “Hopefully this will stimulate more interest,” he says.
But he says that, ironically, as he dedicated more of his time to the maths of the conundrum, he spent less time enjoying the puzzle. “I still find it a nice way to relax now and then but to be honest I prefer doing the crossword,” he says.
Correction: The amount of supercomputing power needed to solve the problem has been corrected from 700 million CPU hours to 7 million CPU hours.