Close

Math, SQL, and Chess – Lessons from the 8 Queens Puzzle

I might (eventually) post various implementations of the 8 Queens problem; but that solution is not what this article is about. Instead I want to focus on the mathematical nature of the problem, how it relates to SQL, and how a little bit of math-think can help identify problems of intractability which translate into performance issues in queries.

First – a definition. The 8 Queens problem is a chess puzzle where the player tries to arrange 8 queens on an otherwise empty standard board such that no two queens attack each other.

With that definition let’s do a little math breakdown with successive refinement.

There are 8 pieces and 64 places to put them. So that’s 64*64*64*64*64*64*64*64 possible piece positions or 281,474,976,710,656 (281 Trillion) that’s a lot. Probably more than we want to try to process. Even at a million positions per second it would take almost 9 years to finish.
Can we make it smaller?

Yes! we can only put one piece per square. Each queen placed takes a square away from the next. So it’s not really 64 to the eighth power; but more like 64*63*62*61*60*59*58*57 or 178,462,987,637,760 (178 Trillion) possibilities. Removing over a hundred trillion is definitely helpful but that’s still over 5 years to process them all at a rate of one million per second.
Can we make it smaller?

Yes! The previous numbers consider each queen to be distinct; but for the purposes of this puzzle they are not. A piece is a piece. So there are a lot of permutations that are effectively identical. There is a handy formula for calculating these types of combinations. Choosing 8 squares from a population of 64 is represented as: C(64,8) = 64!/((64-8)!*8!) = 4,426,165,368 (4 Billion.) That’s a huge improvement. Now it would only take a little over an hour at a million positions per second to evaluate them all.
Can we make it smaller?

Yes! To this point all we’ve been doing is looking at putting queens anywhere on the board; but we know from the rules of chess that we can’t just put the queens anywhere. There can only be one queen per rank (row.) With 8 queens and 8 ranks, that’s one rank per queen. So 8*8*8*8*8*8*8*8=16,777,216 (16 Milllion.) Similarly though, we also can only have one queen per file (column.) So we have 8 squares in the first row to choose from for the first queen, and then in second row we have one column no longer allowed, thus 7 squares to choose from in the second row for the next queen. In the third row we have two columns not allowed so 6 are left, and so on. Each succeeding queen has one column fewer to work with as the previous queens claim them. We don’t need to know which column was chosen in each row. It is sufficient to know that a queen choosing any column in a row means that column can’t be used in any other rows. No matter which column is chosen, we know each queen will claim one, meaning the following will have one fewer. Thus the number of positions counting distinct row and column choices has been reduced to 8*7*6*5*4*3*2*1=40320 (8 factorial or 8!). Forty-thousand is a much more manageable number. I wouldn’t want to, but that’s small enough to be conceivably solvable by hand. Definitely something that a computer should be able to iterate through. Since the board is a square, we will get the same number if iterate through 8 columns, claiming 1 row each time.
Can we make it smaller?

Yes, but here it gets trickier since the number of positions eliminated on diagonals varies depending on which square is occupied. The calculations above all worked regardless of the squares chosen for each queen. To continue eliminating choices here, we would need to start examining some of the specific prior choices. So, at this point I’ll cease the math breakdown. I’ll call the 10 orders of magnitude reduction sufficient and move on.

How does this pertain to SQL? How does the math of a chess puzzle translate to query performance? First, you can look at this problem as an 8-way join. Each queen selects legal squares from a table. If you look at just one queen in isolation it has 64 choices. With a little where clause to narrow down rows, we can can say it has 8 choices. Each join acts as a multiplier. In the worst case, a Cartesian join, we will get the large numbers seen above. If the joins are structured so choices are eliminated where the queens would attack each other, the results start to collapse to the smaller numbers we see above.

When faced with a query that seems to take forever, or the plan has some outrageous estimated cardinality, look at your conditions. Are you allowing illegal values to be included in your multiplication? Are your joins not specific enough, causing greater multiplicative effect than is appropriate?

The 8 queens problem, when compared to modern data sets is trivial in terms of data volume. Even if we implement it as the worst possible option an 8 way join of 64 unfiltered choices. That’s an 8-way join of a 64-row tables! When viewed that way, it’s almost ludicrously small, and yet, the multiplicative nature of joins meant the result set exploded into an intractably large pool to evaluate.

Looking again at real-world problems with real data sets in the thousands, millions, and billions of rows – unbounded or insufficiently bounded joins will quickly grow into impossibly large results. When you encounter this it could simply be a matter of a typo or similar small error allowing joins to pair too many rows in each pass. It may also be a requirements/problem description failure.

I ran into such a problem several years ago in a market analysis problem. Business analysts came to me with a set of queries that were taking too long. They came to me for some SQL tuning suggestions, thinking it was just a matter of some syntax limitations in their query design.

After looking at their queries, asking some questions, and checking out the source data I ran a few calculations and determined the problem was they were attempting to process over 5 Quintillion permutations of prices, timings, positions, and other factors. That is, if they somehow were able to generate the full result set, it would be over 5,000,000,000,000,000,000 rows. Even if I magically tuned the query so it could crank out a trillion rows per second it would still take almost 2 months to get the results.

So, instead of trying to rewrite their query as is… I went back and informed them there had to be pieces of the story missing; because the data analysis rules were unusable in there current form. In other words…
Can we make it smaller?

It took several meetings across multiple departments, data owners, and subject matter experts to get the answers needed; but in the end we reduced the problem from quintillions to millions of permutations. There were still a lot of complicated business rules and math to be applied so the total processing time was still between 2 and 3 hours for the full analysis; but it was a usable framework. Using materialized views of partial results allowed for adjustment to some of the inputs and much faster recalculation.

The same type of logic can be applied to smaller, more reasonable queries and tasks. It doesn’t even need to be SQL. It could be iteration in pl/sql or in some other language. If you have loops within loops within loops, either explicitly in your code or inherited by invocation of subroutines that themselves have iteration. When the multiplication yields unworkable results – you will likely have to rethink the problem. It’s not a matter of throwing more hardware at it, or just tweaking the syntax a little.

So, again, this was never really about chess, but using the 8 queens problem gave me a concrete example to reference and compare to. I hear many people say they never use math; but they probably use more than they think they do. Also, these scenarios did not immediately presented themselves as a math problem. But, if you’re open to viewing a problem through your mathematical lens – you can see a little multiplication and division allows you to determine scope of a problem and approach it accordingly.

1 thought on “Math, SQL, and Chess – Lessons from the 8 Queens Puzzle

  1. Hello Sean,

    Your blog on Oracle is interesting and well written

    Please see Helmut Simonis tutorial (Constraint Programming and Partial Search)
    http://www.eclipseclp.org/ELearning/nqueen/handout.pdf

    On our side, we now handle 30×30 sudokus.
    The live integration with SQL (or no-sql) Database is an interesting subject. For example, if we want to store all solutions of any puzzle problem.

Comments are closed.