Thread: Searchable chess positions in a Postgress DB

Searchable chess positions in a Postgress DB

From

Sidney Cadot

Date:

11 April 2012, 04:16:15

Dear all,

As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.

I have written code to extract these positions, and now I want to put
them into a Postgres database. Specifically, I want to do this in a
way that allows *fast* lookups of positions, e.g. "give me all
positions that have a White King on c4 and either a Black Bishop or
White Knight on f7".

Currently, my "Positions" table looks like this:

      Column       |  Type   | Modifiers
-------------------+---------+-----------
 gameindex         | integer | not null
 plyindex          | integer | not null
 pseudofenboard    | text    | not null
 fenside           | text    | not null
 fencastling       | text    | not null
 fenenpassant      | text    | not null
 possiblemovecount | integer | not null
 isincheck         | boolean | not null
Indexes:
    "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
Foreign-key constraints:
    "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
games(gameindex)

The "PseudoFenBoard" field currently holds a string describing the
position. For example, the starting position of chess looks like this:

"rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"

This design allows me to formulate the kind of positional queries that
I want (by using regular expression matching), but executing them will
involve a slow, linear traversal of the 400M table rows, which is not
desirable.

I am toying around with the ugly idea to make a "Positions" table that
has a single field for each of the squares, e.g.

CREATE TABLE Position2 (
    GameIndex INTEGER NOT NULL,
    PlyIndex  INTEGER NOT NULL,
    a1        "char"  NOT NULL,
    a2        "char"  NOT NULL,
                                    -- (60 fields defs omitted)
    h7        "char"  NOT NULL,
    h8        "char"  NOT NULL
);

This would allow the creation of indices on each of the 64 fields
separately, which should help to achieve near-instantaneous position
query performance, especially after gathering proper statistics for
all the field-specific indices.

I realize that this design is quite ugly, so I would be interested to
hear if there are nicer alternatives that can perform equally well.

Also, above I use the 1-byte "char" type. Is this the only type in
PostGres that is guaranteed to be just a single byte, or are there
better alternatives? A 13-state enum would be best (listing the 6
white pieces, 6 black pieces, and 'empty' states for every square on
the board) but as I understand from the documentation, enums always up
take 4 bytes per entry.

Any ideas for improvement would be greatly appreciated.

Re: Searchable chess positions in a Postgress DB

From

Alban Hertroys

Date:

11 April 2012, 06:14:27

On 11 April 2012 09:15, Sidney Cadot <sidney@jigsaw.nl> wrote:
> Dear all,
>
> As a hobby project, I am toying around with a database containing
> about 5 million chess games. On average, these games have about 80
> positions (~ 40 moves by both black and white), which means there are
> about 400 million chess positions in there.
>
> I have written code to extract these positions, and now I want to put
> them into a Postgres database. Specifically, I want to do this in a
> way that allows *fast* lookups of positions, e.g. "give me all
> positions that have a White King on c4 and either a Black Bishop or
> White Knight on f7".
>
> Currently, my "Positions" table looks like this:
>
>      Column       |  Type   | Modifiers
> -------------------+---------+-----------
>  gameindex         | integer | not null
>  plyindex          | integer | not null
>  pseudofenboard    | text    | not null
>  fenside           | text    | not null
>  fencastling       | text    | not null
>  fenenpassant      | text    | not null
>  possiblemovecount | integer | not null
>  isincheck         | boolean | not null
> Indexes:
>    "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
> Foreign-key constraints:
>    "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
> games(gameindex)
>
> The "PseudoFenBoard" field currently holds a string describing the
> position. For example, the starting position of chess looks like this:
>
> "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"
>
> This design allows me to formulate the kind of positional queries that
> I want (by using regular expression matching), but executing them will
> involve a slow, linear traversal of the 400M table rows, which is not
> desirable.
>
> I am toying around with the ugly idea to make a "Positions" table that
> has a single field for each of the squares, e.g.
>
> CREATE TABLE Position2 (
>    GameIndex INTEGER NOT NULL,
>    PlyIndex  INTEGER NOT NULL,
>    a1        "char"  NOT NULL,
>    a2        "char"  NOT NULL,
>                                    -- (60 fields defs omitted)
>    h7        "char"  NOT NULL,
>    h8        "char"  NOT NULL
> );
>
> This would allow the creation of indices on each of the 64 fields
> separately, which should help to achieve near-instantaneous position
> query performance, especially after gathering proper statistics for
> all the field-specific indices.
>
> I realize that this design is quite ugly, so I would be interested to
> hear if there are nicer alternatives that can perform equally well.
>
> Also, above I use the 1-byte "char" type. Is this the only type in
> PostGres that is guaranteed to be just a single byte, or are there
> better alternatives?

No, you're using the unlimited length "char" type. You probably mean
to use either char(1) or varchar(1).

It wouldn't hurt to make those chars a foreign key to a chess-piece
table, so that you both define what a certain code means on the
chess-board and constrain which codes are available.

You could possibly take that a bit further by finding a constraint
that limits the number of each piece on the board (exactly 1 white
king, up to 2 white towers, up to 8 white pawns, etc), but I have no
ready idea of how you could achieve that...

> A 13-state enum would be best (listing the 6
> white pieces, 6 black pieces, and 'empty' states for every square on
> the board) but as I understand from the documentation, enums always up
> take 4 bytes per entry.

> Any ideas for improvement would be greatly appreciated.

Well, since a chess-board is a matrix, it seems to make sense to
describe it as a two-dimensional array of varchar(1)'s. I don't know
off the top of my head what the storage requirements for that are
though.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: Searchable chess positions in a Postgress DB

From

Gianni Ciolli

Date:

11 April 2012, 06:33:13

On Wed, Apr 11, 2012 at 09:15:59AM +0200, Sidney Cadot wrote:
> Dear all,
>
> As a hobby project, I am toying around with a database containing
> about 5 million chess games. On average, these games have about 80
> positions (~ 40 moves by both black and white), which means there are
> about 400 million chess positions in there.

Sounds very interesting!

> I am toying around with the ugly idea to make a "Positions" table that
> has a single field for each of the squares, e.g.
>
> CREATE TABLE Position2 (
>     GameIndex INTEGER NOT NULL,
>     PlyIndex  INTEGER NOT NULL,
>     a1        "char"  NOT NULL,
>     a2        "char"  NOT NULL,
>                                     -- (60 fields defs omitted)
>     h7        "char"  NOT NULL,
>     h8        "char"  NOT NULL
> );
>
> This would allow the creation of indices on each of the 64 fields
> separately, which should help to achieve near-instantaneous position
> query performance, especially after gathering proper statistics for
> all the field-specific indices.
>
> I realize that this design is quite ugly, so I would be interested to
> hear if there are nicer alternatives that can perform equally well.

You could instead create 64 partial indexes on the specific position:

  CREATE INDEX i_a1 ON Positions (substring(PseudoFenBoard FROM 1 FOR 1))
    WHERE substring(PseudoFenBoard FROM 1 FOR 1) != ' ';

  CREATE INDEX i_a2 ON Positions (substring(PseudoFenBoard FROM 2 FOR 1))
    WHERE substring(PseudoFenBoard FROM 2 FOR 1) != ' ';

  ...

so that (a) you don't have to split PseudoFenBoard in more than 1
column, and (b) the indexes are smaller, because they don't index rows
having that position empty (I am assuming that you never ask for games
where a certain position is empty; if you do, then you need to remove
the WHERE clause).

> Also, above I use the 1-byte "char" type. Is this the only type in
> PostGres that is guaranteed to be just a single byte, or are there
> better alternatives? A 13-state enum would be best (listing the 6
> white pieces, 6 black pieces, and 'empty' states for every square on
> the board) but as I understand from the documentation, enums always up
> take 4 bytes per entry.

I think using the 1-byte char is a fairly good choice; you could pack
up your structure in a smaller bit string, but then you add complexity
elsewhere and it might be desirable to keep things simple for now.

Regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it

Re: Searchable chess positions in a Postgress DB

From

Bèrto ëd Sèra

Date:

11 April 2012, 06:43:46

Hi!

As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.

What happens if you think of it in terms of "space of phases", so that we define a single "position" as a record, and build a game as a path from the common starting position to the end game position? This would, among other things, tell you what games passed through exactly the same position. You'd need some sort of soundex algorithm to find out similar positions, through a ranking function. Once you have that you could analyse different players' strategic thought.

It would probably also pretty much compress the volume of your data. It would grow a lot in the beginning, but once most common positions are mapped the growth should slow down a lot.

Besides, I'd use a point type to build the chessboard, rather than a traditional chess Letter/Number notation.

Bèrto

On 11 April 2012 08:15, Sidney Cadot <sidney@jigsaw.nl> wrote:

Dear all,

As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.

I have written code to extract these positions, and now I want to put
them into a Postgres database. Specifically, I want to do this in a
way that allows *fast* lookups of positions, e.g. "give me all
positions that have a White King on c4 and either a Black Bishop or
White Knight on f7".

Currently, my "Positions" table looks like this:

Column | Type | Modifiers
-------------------+---------+-----------
gameindex | integer | not null
plyindex | integer | not null
pseudofenboard | text | not null
fenside | text | not null
fencastling | text | not null
fenenpassant | text | not null
possiblemovecount | integer | not null
isincheck | boolean | not null
Indexes:
"positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
Foreign-key constraints:
"positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
games(gameindex)

The "PseudoFenBoard" field currently holds a string describing the
position. For example, the starting position of chess looks like this:

"rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"

This design allows me to formulate the kind of positional queries that
I want (by using regular expression matching), but executing them will
involve a slow, linear traversal of the 400M table rows, which is not
desirable.

I am toying around with the ugly idea to make a "Positions" table that
has a single field for each of the squares, e.g.

CREATE TABLE Position2 (
GameIndex INTEGER NOT NULL,
PlyIndex INTEGER NOT NULL,
a1 "char" NOT NULL,
a2 "char" NOT NULL,
-- (60 fields defs omitted)
h7 "char" NOT NULL,
h8 "char" NOT NULL
);

This would allow the creation of indices on each of the 64 fields
separately, which should help to achieve near-instantaneous position
query performance, especially after gathering proper statistics for
all the field-specific indices.

I realize that this design is quite ugly, so I would be interested to
hear if there are nicer alternatives that can perform equally well.

Also, above I use the 1-byte "char" type. Is this the only type in
PostGres that is guaranteed to be just a single byte, or are there
better alternatives? A 13-state enum would be best (listing the 6
white pieces, 6 black pieces, and 'empty' states for every square on
the board) but as I understand from the documentation, enums always up
take 4 bytes per entry.

Any ideas for improvement would be greatly appreciated.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.

Re: Searchable chess positions in a Postgress DB

From

Ondrej Ivanič

Date:

11 April 2012, 07:02:09

Hi,

On 11 April 2012 17:15, Sidney Cadot <sidney@jigsaw.nl> wrote:
> I have written code to extract these positions, and now I want to put
> them into a Postgres database. Specifically, I want to do this in a
> way that allows *fast* lookups of positions, e.g. "give me all
> positions that have a White King on c4 and either a Black Bishop or
> White Knight on f7".

I would try to use single table with 16 columns like:
white_pawn char(2)[] -- like {'c1', 'd3', ... }, max 8 elements
white_rook char(2)[] -- max 2 elements
white_bishop char(2)[] -- max 2 elements
white_knight char(2)[] -- max 2 elements
white_queen char(2)
white_king char(2)
black_pawn_1 char(2)[]
...
black_king char(2)

and each column; char(2) and char(2)[] should have btree and GiST
index respectively. The query should looks like this:
select * from positions where white_king = 'c4' and (white_bishop &&
ARRAY['f7'] or white_knight && ARRAY['f7'])

Another alternative might be to use hstore (and GiST index):
http://www.postgresql.org/docs/9.1/static/hstore.html

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)

Re: Searchable chess positions in a Postgress DB

From

Gavin Flower

Date:

11 April 2012, 09:50:44

On 11/04/12 19:15, Sidney Cadot wrote:

Dear all,

As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.

I have written code to extract these positions, and now I want to put
them into a Postgres database. Specifically, I want to do this in a
way that allows *fast* lookups of positions, e.g. "give me all
positions that have a White King on c4 and either a Black Bishop or
White Knight on f7".

The "PseudoFenBoard" field currently holds a string describing the
position. For example, the starting position of chess looks like this:

"rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"

This design allows me to formulate the kind of positional queries that
I want (by using regular expression matching), but executing them will
involve a slow, linear traversal of the 400M table rows, which is not
desirable.

I am toying around with the ugly idea to make a "Positions" table that
has a single field for each of the squares, e.g.

CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL
);

This would allow the creation of indices on each of the 64 fields
separately, which should help to achieve near-instantaneous position
query performance, especially after gathering proper statistics for
all the field-specific indices.

I realize that this design is quite ugly, so I would be interested to
hear if there are nicer alternatives that can perform equally well.

Also, above I use the 1-byte "char" type. Is this the only type in
PostGres that is guaranteed to be just a single byte, or are there
better alternatives? A 13-state enum would be best (listing the 6
white pieces, 6 black pieces, and 'empty' states for every square on
the board) but as I understand from the documentation, enums always up
take 4 bytes per entry.

Any ideas for improvement would be greatly appreciated.

How aboutsomething like the following (game and postion would have more fields in practice, like comments and where played)?

DROP TABLE IF EXISTS game CASCADE;

CREATE TABLE game
(
    id int PRIMARY KEY,
    name_white text,
    name_black text,
    played timestamptz
);

CREATE TABLE position
(
   id int PRIMARY KEY,
   game_id int REFERENCES game (id),
   ply int
);

CREATE TABLE piece
(
    id int PRIMARY KEY,
    position_id int REFERENCES position (id),
    rank char, -- 1...8 from white's perspective
    file char, -- a...h
    white boolean,
    type char -- P.R,N,B,K,Q
);

CREATE UNIQUE INDEX square ON piece (rank, file, type, white);

SELECT
   p.position_id
FROM
    piece p
WHERE
   (     p.white
       AND p.type = 'K'
       AND p.file = 'c'
       AND p.rank = '4'
   )
   AND
   (
       ((NOT p.white AND p.type = 'B') OR (p.white AND p.type = 'K'))
       AND p.file = 'f'
       AND p.rank = '7'
   );

Cheers,
Gavin

Re: Searchable chess positions in a Postgress DB

From

Cédric Villemain

Date:

11 April 2012, 11:09:24

Le mercredi 11 avril 2012 09:15:59, Sidney Cadot a écrit :
> Dear all,
>
> As a hobby project, I am toying around with a database containing
> about 5 million chess games. On average, these games have about 80
> positions (~ 40 moves by both black and white), which means there are
> about 400 million chess positions in there.
>
> I have written code to extract these positions, and now I want to put
> them into a Postgres database. Specifically, I want to do this in a
> way that allows *fast* lookups of positions, e.g. "give me all
> positions that have a White King on c4 and either a Black Bishop or
> White Knight on f7".
>
> Currently, my "Positions" table looks like this:
>
>       Column       |  Type   | Modifiers
> -------------------+---------+-----------
>  gameindex         | integer | not null
>  plyindex          | integer | not null
>  pseudofenboard    | text    | not null
>  fenside           | text    | not null
>  fencastling       | text    | not null
>  fenenpassant      | text    | not null
>  possiblemovecount | integer | not null
>  isincheck         | boolean | not null
> Indexes:
>     "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
> Foreign-key constraints:
>     "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
> games(gameindex)
>
> The "PseudoFenBoard" field currently holds a string describing the
> position. For example, the starting position of chess looks like this:
>
> "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"
>
> This design allows me to formulate the kind of positional queries that
> I want (by using regular expression matching), but executing them will
> involve a slow, linear traversal of the 400M table rows, which is not
> desirable.
>
> I am toying around with the ugly idea to make a "Positions" table that
> has a single field for each of the squares, e.g.
>
> CREATE TABLE Position2 (
>     GameIndex INTEGER NOT NULL,
>     PlyIndex  INTEGER NOT NULL,
>     a1        "char"  NOT NULL,
>     a2        "char"  NOT NULL,
>                                     -- (60 fields defs omitted)
>     h7        "char"  NOT NULL,
>     h8        "char"  NOT NULL
> );
>
> This would allow the creation of indices on each of the 64 fields
> separately, which should help to achieve near-instantaneous position
> query performance, especially after gathering proper statistics for
> all the field-specific indices.
>
> I realize that this design is quite ugly, so I would be interested to
> hear if there are nicer alternatives that can perform equally well.
>
> Also, above I use the 1-byte "char" type. Is this the only type in
> PostGres that is guaranteed to be just a single byte, or are there
> better alternatives? A 13-state enum would be best (listing the 6
> white pieces, 6 black pieces, and 'empty' states for every square on
> the board) but as I understand from the documentation, enums always up
> take 4 bytes per entry.
>
> Any ideas for improvement would be greatly appreciated.

I'll go test with BloomFiltering and multiple columns.
http://www.sai.msu.su/~megera/wiki/bloom

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Searchable chess positions in a Postgress DB

From

Merlin Moncure

Date:

12 April 2012, 10:55:42

2012/4/11 Ondrej Ivanič <ondrej.ivanic@gmail.com>:
> Hi,
>
> On 11 April 2012 17:15, Sidney Cadot <sidney@jigsaw.nl> wrote:
>> I have written code to extract these positions, and now I want to put
>> them into a Postgres database. Specifically, I want to do this in a
>> way that allows *fast* lookups of positions, e.g. "give me all
>> positions that have a White King on c4 and either a Black Bishop or
>> White Knight on f7".
>
> I would try to use single table with 16 columns like:
> white_pawn char(2)[] -- like {'c1', 'd3', ... }, max 8 elements
> white_rook char(2)[] -- max 2 elements
> white_bishop char(2)[] -- max 2 elements
> white_knight char(2)[] -- max 2 elements
> white_queen char(2)
> white_king char(2)
> black_pawn_1 char(2)[]
> ...
> black_king char(2)
>
> and each column; char(2) and char(2)[] should have btree and GiST
> index respectively. The query should looks like this:
> select * from positions where white_king = 'c4' and (white_bishop &&
> ARRAY['f7'] or white_knight && ARRAY['f7'])
>
> Another alternative might be to use hstore (and GiST index):
> http://www.postgresql.org/docs/9.1/static/hstore.html

yeah -- if you want fast searching of positions (or even games) using
phrases you should immediately be thinking GIST.  GIST can optimize
quals such as  'A contains B' or 'A overlaps B'.   This is a
non-trival but interesting project and I highly encourage you to give
it a go if you're so inclined.  Before banging on the schema, I'd
start thinking about to organize the position into a type such that
you can engineer GIST operations.

merlin

Re: Searchable chess positions in a Postgress DB

From

Gavin Flower

Date:

12 April 2012, 20:27:48

On 11/04/12 21:24, Gavin Flower wrote:

On 11/04/12 19:15, Sidney Cadot wrote:

Dear all,

The "PseudoFenBoard" field currently holds a string describing the
position. For example, the starting position of chess looks like this:

"rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"

I am toying around with the ugly idea to make a "Positions" table that
has a single field for each of the squares, e.g.

CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL
);

I realize that this design is quite ugly, so I would be interested to
hear if there are nicer alternatives that can perform equally well.

Any ideas for improvement would be greatly appreciated.

How aboutsomething like the following (game and postion would have more fields in practice, like comments and where played)?

DROP TABLE IF EXISTS game CASCADE;

CREATE TABLE game
(
    id int PRIMARY KEY,
    name_white text,
    name_black text,
    played timestamptz
);

CREATE TABLE position
(
   id int PRIMARY KEY,
   game_id int REFERENCES game (id),
   ply int
);

CREATE UNIQUE INDEX square ON piece (rank, file, type, white);

Cheers,
Gavin

There was a blatantly obvious flaw in the above query: the pices checked, should belong to the same position!

That I only discovered the flaw when I mentally checked the SQL on the way to work the folowing day.

The following, hopefully, fixes the problem

SELECT
    p1.position_id
FROM
    piece AS p1 JOIN piece AS p2 USING (position_id)
WHERE
    (
        p1.white
        AND p1.type = 'K'
        AND p1.file = 'c'
        AND p1.rank = '4'
    )
    AND
    (
        ((NOT p2.white AND p2.type = 'B') OR (p2.white AND p2.type = 'K'))
        AND p2.file = 'f'
        AND p2.rank = '7'
    );

Re: Searchable chess positions in a Postgress DB

From

Michael Nolan

Date:

12 April 2012, 21:07:54

On Thu, Apr 12, 2012 at 4:50 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

On 11/04/12 21:24, Gavin Flower wrote:

On 11/04/12 19:15, Sidney Cadot wrote:

Dear all,

As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.

If you haven't done so already, you should read through the literature on chess and computers. I'm quite a few years out of date, but there's been a lot of research into efficient ways to store and search chess positions, and some of it may have dealt with SQL database structures.
--
Mike Nolan