Thread: Searchable chess positions in a Postgress DB
Dear all, As a hobby project, I am toying around with a database containing about 5 million chess games. On average, these games have about 80 positions (~ 40 moves by both black and white), which means there are about 400 million chess positions in there. I have written code to extract these positions, and now I want to put them into a Postgres database. Specifically, I want to do this in a way that allows *fast* lookups of positions, e.g. "give me all positions that have a White King on c4 and either a Black Bishop or White Knight on f7". Currently, my "Positions" table looks like this: Column | Type | Modifiers -------------------+---------+----------- gameindex | integer | not null plyindex | integer | not null pseudofenboard | text | not null fenside | text | not null fencastling | text | not null fenenpassant | text | not null possiblemovecount | integer | not null isincheck | boolean | not null Indexes: "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) Foreign-key constraints: "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES games(gameindex) The "PseudoFenBoard" field currently holds a string describing the position. For example, the starting position of chess looks like this: "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" This design allows me to formulate the kind of positional queries that I want (by using regular expression matching), but executing them will involve a slow, linear traversal of the 400M table rows, which is not desirable. I am toying around with the ugly idea to make a "Positions" table that has a single field for each of the squares, e.g. CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL ); This would allow the creation of indices on each of the 64 fields separately, which should help to achieve near-instantaneous position query performance, especially after gathering proper statistics for all the field-specific indices. I realize that this design is quite ugly, so I would be interested to hear if there are nicer alternatives that can perform equally well. Also, above I use the 1-byte "char" type. Is this the only type in PostGres that is guaranteed to be just a single byte, or are there better alternatives? A 13-state enum would be best (listing the 6 white pieces, 6 black pieces, and 'empty' states for every square on the board) but as I understand from the documentation, enums always up take 4 bytes per entry. Any ideas for improvement would be greatly appreciated.
On 11 April 2012 09:15, Sidney Cadot <sidney@jigsaw.nl> wrote: > Dear all, > > As a hobby project, I am toying around with a database containing > about 5 million chess games. On average, these games have about 80 > positions (~ 40 moves by both black and white), which means there are > about 400 million chess positions in there. > > I have written code to extract these positions, and now I want to put > them into a Postgres database. Specifically, I want to do this in a > way that allows *fast* lookups of positions, e.g. "give me all > positions that have a White King on c4 and either a Black Bishop or > White Knight on f7". > > Currently, my "Positions" table looks like this: > > Column | Type | Modifiers > -------------------+---------+----------- > gameindex | integer | not null > plyindex | integer | not null > pseudofenboard | text | not null > fenside | text | not null > fencastling | text | not null > fenenpassant | text | not null > possiblemovecount | integer | not null > isincheck | boolean | not null > Indexes: > "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) > Foreign-key constraints: > "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES > games(gameindex) > > The "PseudoFenBoard" field currently holds a string describing the > position. For example, the starting position of chess looks like this: > > "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" > > This design allows me to formulate the kind of positional queries that > I want (by using regular expression matching), but executing them will > involve a slow, linear traversal of the 400M table rows, which is not > desirable. > > I am toying around with the ugly idea to make a "Positions" table that > has a single field for each of the squares, e.g. > > CREATE TABLE Position2 ( > GameIndex INTEGER NOT NULL, > PlyIndex INTEGER NOT NULL, > a1 "char" NOT NULL, > a2 "char" NOT NULL, > -- (60 fields defs omitted) > h7 "char" NOT NULL, > h8 "char" NOT NULL > ); > > This would allow the creation of indices on each of the 64 fields > separately, which should help to achieve near-instantaneous position > query performance, especially after gathering proper statistics for > all the field-specific indices. > > I realize that this design is quite ugly, so I would be interested to > hear if there are nicer alternatives that can perform equally well. > > Also, above I use the 1-byte "char" type. Is this the only type in > PostGres that is guaranteed to be just a single byte, or are there > better alternatives? No, you're using the unlimited length "char" type. You probably mean to use either char(1) or varchar(1). It wouldn't hurt to make those chars a foreign key to a chess-piece table, so that you both define what a certain code means on the chess-board and constrain which codes are available. You could possibly take that a bit further by finding a constraint that limits the number of each piece on the board (exactly 1 white king, up to 2 white towers, up to 8 white pawns, etc), but I have no ready idea of how you could achieve that... > A 13-state enum would be best (listing the 6 > white pieces, 6 black pieces, and 'empty' states for every square on > the board) but as I understand from the documentation, enums always up > take 4 bytes per entry. > Any ideas for improvement would be greatly appreciated. Well, since a chess-board is a matrix, it seems to make sense to describe it as a two-dimensional array of varchar(1)'s. I don't know off the top of my head what the storage requirements for that are though. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
On Wed, Apr 11, 2012 at 09:15:59AM +0200, Sidney Cadot wrote: > Dear all, > > As a hobby project, I am toying around with a database containing > about 5 million chess games. On average, these games have about 80 > positions (~ 40 moves by both black and white), which means there are > about 400 million chess positions in there. Sounds very interesting! > I am toying around with the ugly idea to make a "Positions" table that > has a single field for each of the squares, e.g. > > CREATE TABLE Position2 ( > GameIndex INTEGER NOT NULL, > PlyIndex INTEGER NOT NULL, > a1 "char" NOT NULL, > a2 "char" NOT NULL, > -- (60 fields defs omitted) > h7 "char" NOT NULL, > h8 "char" NOT NULL > ); > > This would allow the creation of indices on each of the 64 fields > separately, which should help to achieve near-instantaneous position > query performance, especially after gathering proper statistics for > all the field-specific indices. > > I realize that this design is quite ugly, so I would be interested to > hear if there are nicer alternatives that can perform equally well. You could instead create 64 partial indexes on the specific position: CREATE INDEX i_a1 ON Positions (substring(PseudoFenBoard FROM 1 FOR 1)) WHERE substring(PseudoFenBoard FROM 1 FOR 1) != ' '; CREATE INDEX i_a2 ON Positions (substring(PseudoFenBoard FROM 2 FOR 1)) WHERE substring(PseudoFenBoard FROM 2 FOR 1) != ' '; ... so that (a) you don't have to split PseudoFenBoard in more than 1 column, and (b) the indexes are smaller, because they don't index rows having that position empty (I am assuming that you never ask for games where a certain position is empty; if you do, then you need to remove the WHERE clause). > Also, above I use the 1-byte "char" type. Is this the only type in > PostGres that is guaranteed to be just a single byte, or are there > better alternatives? A 13-state enum would be best (listing the 6 > white pieces, 6 black pieces, and 'empty' states for every square on > the board) but as I understand from the documentation, enums always up > take 4 bytes per entry. I think using the 1-byte char is a fairly good choice; you could pack up your structure in a smaller bit string, but then you add complexity elsewhere and it might be desirable to keep things simple for now. Regards, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.ciolli@2ndquadrant.it | www.2ndquadrant.it
Hi!
As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.
Besides, I'd use a point type to build the chessboard, rather than a traditional chess Letter/Number notation.
Dear all,
As a hobby project, I am toying around with a database containing
about 5 million chess games. On average, these games have about 80
positions (~ 40 moves by both black and white), which means there are
about 400 million chess positions in there.
I have written code to extract these positions, and now I want to put
them into a Postgres database. Specifically, I want to do this in a
way that allows *fast* lookups of positions, e.g. "give me all
positions that have a White King on c4 and either a Black Bishop or
White Knight on f7".
Currently, my "Positions" table looks like this:
Column | Type | Modifiers
-------------------+---------+-----------
gameindex | integer | not null
plyindex | integer | not null
pseudofenboard | text | not null
fenside | text | not null
fencastling | text | not null
fenenpassant | text | not null
possiblemovecount | integer | not null
isincheck | boolean | not null
Indexes:
"positions_pkey" PRIMARY KEY, btree (gameindex, plyindex)
Foreign-key constraints:
"positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES
games(gameindex)
The "PseudoFenBoard" field currently holds a string describing the
position. For example, the starting position of chess looks like this:
"rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR"
This design allows me to formulate the kind of positional queries that
I want (by using regular expression matching), but executing them will
involve a slow, linear traversal of the 400M table rows, which is not
desirable.
I am toying around with the ugly idea to make a "Positions" table that
has a single field for each of the squares, e.g.
CREATE TABLE Position2 (
GameIndex INTEGER NOT NULL,
PlyIndex INTEGER NOT NULL,
a1 "char" NOT NULL,
a2 "char" NOT NULL,
-- (60 fields defs omitted)
h7 "char" NOT NULL,
h8 "char" NOT NULL
);
This would allow the creation of indices on each of the 64 fields
separately, which should help to achieve near-instantaneous position
query performance, especially after gathering proper statistics for
all the field-specific indices.
I realize that this design is quite ugly, so I would be interested to
hear if there are nicer alternatives that can perform equally well.
Also, above I use the 1-byte "char" type. Is this the only type in
PostGres that is guaranteed to be just a single byte, or are there
better alternatives? A 13-state enum would be best (listing the 6
white pieces, 6 black pieces, and 'empty' states for every square on
the board) but as I understand from the documentation, enums always up
take 4 bytes per entry.
Any ideas for improvement would be greatly appreciated.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
==============================
If Pac-Man had affected us as kids, we'd all be running around in a darkened room munching pills and listening to repetitive music.
Hi, On 11 April 2012 17:15, Sidney Cadot <sidney@jigsaw.nl> wrote: > I have written code to extract these positions, and now I want to put > them into a Postgres database. Specifically, I want to do this in a > way that allows *fast* lookups of positions, e.g. "give me all > positions that have a White King on c4 and either a Black Bishop or > White Knight on f7". I would try to use single table with 16 columns like: white_pawn char(2)[] -- like {'c1', 'd3', ... }, max 8 elements white_rook char(2)[] -- max 2 elements white_bishop char(2)[] -- max 2 elements white_knight char(2)[] -- max 2 elements white_queen char(2) white_king char(2) black_pawn_1 char(2)[] ... black_king char(2) and each column; char(2) and char(2)[] should have btree and GiST index respectively. The query should looks like this: select * from positions where white_king = 'c4' and (white_bishop && ARRAY['f7'] or white_knight && ARRAY['f7']) Another alternative might be to use hstore (and GiST index): http://www.postgresql.org/docs/9.1/static/hstore.html -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
Dear all, As a hobby project, I am toying around with a database containing about 5 million chess games. On average, these games have about 80 positions (~ 40 moves by both black and white), which means there are about 400 million chess positions in there. I have written code to extract these positions, and now I want to put them into a Postgres database. Specifically, I want to do this in a way that allows *fast* lookups of positions, e.g. "give me all positions that have a White King on c4 and either a Black Bishop or White Knight on f7". Currently, my "Positions" table looks like this: Column | Type | Modifiers -------------------+---------+-----------gameindex | integer | not nullplyindex | integer | not nullpseudofenboard | text | not nullfenside | text | not nullfencastling | text | not nullfenenpassant | text | not nullpossiblemovecount | integer | not nullisincheck | boolean | not null Indexes: "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) Foreign-key constraints: "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES games(gameindex) The "PseudoFenBoard" field currently holds a string describing the position. For example, the starting position of chess looks like this: "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" This design allows me to formulate the kind of positional queries that I want (by using regular expression matching), but executing them will involve a slow, linear traversal of the 400M table rows, which is not desirable. I am toying around with the ugly idea to make a "Positions" table that has a single field for each of the squares, e.g. CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL ); This would allow the creation of indices on each of the 64 fields separately, which should help to achieve near-instantaneous position query performance, especially after gathering proper statistics for all the field-specific indices. I realize that this design is quite ugly, so I would be interested to hear if there are nicer alternatives that can perform equally well. Also, above I use the 1-byte "char" type. Is this the only type in PostGres that is guaranteed to be just a single byte, or are there better alternatives? A 13-state enum would be best (listing the 6 white pieces, 6 black pieces, and 'empty' states for every square on the board) but as I understand from the documentation, enums always up take 4 bytes per entry. Any ideas for improvement would be greatly appreciated.
How aboutsomething like the following (game and postion would have more fields in practice, like comments and where played)?
DROP TABLE IF EXISTS game CASCADE;
CREATE TABLE game
(
id int PRIMARY KEY,
name_white text,
name_black text,
played timestamptz
);
CREATE TABLE position
(
id int PRIMARY KEY,
game_id int REFERENCES game (id),
ply int
);
CREATE TABLE piece
(
id int PRIMARY KEY,
position_id int REFERENCES position (id),
rank char, -- 1...8 from white's perspective
file char, -- a...h
white boolean,
type char -- P.R,N,B,K,Q
);
CREATE UNIQUE INDEX square ON piece (rank, file, type, white);
SELECT
p.position_id
FROM
piece p
WHERE
( p.white
AND p.type = 'K'
AND p.file = 'c'
AND p.rank = '4'
)
AND
(
((NOT p.white AND p.type = 'B') OR (p.white AND p.type = 'K'))
AND p.file = 'f'
AND p.rank = '7'
);
Cheers,
Gavin
Le mercredi 11 avril 2012 09:15:59, Sidney Cadot a écrit : > Dear all, > > As a hobby project, I am toying around with a database containing > about 5 million chess games. On average, these games have about 80 > positions (~ 40 moves by both black and white), which means there are > about 400 million chess positions in there. > > I have written code to extract these positions, and now I want to put > them into a Postgres database. Specifically, I want to do this in a > way that allows *fast* lookups of positions, e.g. "give me all > positions that have a White King on c4 and either a Black Bishop or > White Knight on f7". > > Currently, my "Positions" table looks like this: > > Column | Type | Modifiers > -------------------+---------+----------- > gameindex | integer | not null > plyindex | integer | not null > pseudofenboard | text | not null > fenside | text | not null > fencastling | text | not null > fenenpassant | text | not null > possiblemovecount | integer | not null > isincheck | boolean | not null > Indexes: > "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) > Foreign-key constraints: > "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES > games(gameindex) > > The "PseudoFenBoard" field currently holds a string describing the > position. For example, the starting position of chess looks like this: > > "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" > > This design allows me to formulate the kind of positional queries that > I want (by using regular expression matching), but executing them will > involve a slow, linear traversal of the 400M table rows, which is not > desirable. > > I am toying around with the ugly idea to make a "Positions" table that > has a single field for each of the squares, e.g. > > CREATE TABLE Position2 ( > GameIndex INTEGER NOT NULL, > PlyIndex INTEGER NOT NULL, > a1 "char" NOT NULL, > a2 "char" NOT NULL, > -- (60 fields defs omitted) > h7 "char" NOT NULL, > h8 "char" NOT NULL > ); > > This would allow the creation of indices on each of the 64 fields > separately, which should help to achieve near-instantaneous position > query performance, especially after gathering proper statistics for > all the field-specific indices. > > I realize that this design is quite ugly, so I would be interested to > hear if there are nicer alternatives that can perform equally well. > > Also, above I use the 1-byte "char" type. Is this the only type in > PostGres that is guaranteed to be just a single byte, or are there > better alternatives? A 13-state enum would be best (listing the 6 > white pieces, 6 black pieces, and 'empty' states for every square on > the board) but as I understand from the documentation, enums always up > take 4 bytes per entry. > > Any ideas for improvement would be greatly appreciated. I'll go test with BloomFiltering and multiple columns. http://www.sai.msu.su/~megera/wiki/bloom -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
2012/4/11 Ondrej Ivanič <ondrej.ivanic@gmail.com>: > Hi, > > On 11 April 2012 17:15, Sidney Cadot <sidney@jigsaw.nl> wrote: >> I have written code to extract these positions, and now I want to put >> them into a Postgres database. Specifically, I want to do this in a >> way that allows *fast* lookups of positions, e.g. "give me all >> positions that have a White King on c4 and either a Black Bishop or >> White Knight on f7". > > I would try to use single table with 16 columns like: > white_pawn char(2)[] -- like {'c1', 'd3', ... }, max 8 elements > white_rook char(2)[] -- max 2 elements > white_bishop char(2)[] -- max 2 elements > white_knight char(2)[] -- max 2 elements > white_queen char(2) > white_king char(2) > black_pawn_1 char(2)[] > ... > black_king char(2) > > and each column; char(2) and char(2)[] should have btree and GiST > index respectively. The query should looks like this: > select * from positions where white_king = 'c4' and (white_bishop && > ARRAY['f7'] or white_knight && ARRAY['f7']) > > Another alternative might be to use hstore (and GiST index): > http://www.postgresql.org/docs/9.1/static/hstore.html yeah -- if you want fast searching of positions (or even games) using phrases you should immediately be thinking GIST. GIST can optimize quals such as 'A contains B' or 'A overlaps B'. This is a non-trival but interesting project and I highly encourage you to give it a go if you're so inclined. Before banging on the schema, I'd start thinking about to organize the position into a type such that you can engineer GIST operations. merlin
On 11/04/12 19:15, Sidney Cadot wrote:There was a blatantly obvious flaw in the above query: the pices checked, should belong to the same position!Dear all, As a hobby project, I am toying around with a database containing about 5 million chess games. On average, these games have about 80 positions (~ 40 moves by both black and white), which means there are about 400 million chess positions in there. I have written code to extract these positions, and now I want to put them into a Postgres database. Specifically, I want to do this in a way that allows *fast* lookups of positions, e.g. "give me all positions that have a White King on c4 and either a Black Bishop or White Knight on f7". Currently, my "Positions" table looks like this: Column | Type | Modifiers -------------------+---------+-----------gameindex | integer | not nullplyindex | integer | not nullpseudofenboard | text | not nullfenside | text | not nullfencastling | text | not nullfenenpassant | text | not nullpossiblemovecount | integer | not nullisincheck | boolean | not null Indexes: "positions_pkey" PRIMARY KEY, btree (gameindex, plyindex) Foreign-key constraints: "positions_gameindex_fkey" FOREIGN KEY (gameindex) REFERENCES games(gameindex) The "PseudoFenBoard" field currently holds a string describing the position. For example, the starting position of chess looks like this: "rnbqkbnr/pppppppp/________/________/________/________/PPPPPPPP/RNBQKBNR" This design allows me to formulate the kind of positional queries that I want (by using regular expression matching), but executing them will involve a slow, linear traversal of the 400M table rows, which is not desirable. I am toying around with the ugly idea to make a "Positions" table that has a single field for each of the squares, e.g. CREATE TABLE Position2 ( GameIndex INTEGER NOT NULL, PlyIndex INTEGER NOT NULL, a1 "char" NOT NULL, a2 "char" NOT NULL, -- (60 fields defs omitted) h7 "char" NOT NULL, h8 "char" NOT NULL ); This would allow the creation of indices on each of the 64 fields separately, which should help to achieve near-instantaneous position query performance, especially after gathering proper statistics for all the field-specific indices. I realize that this design is quite ugly, so I would be interested to hear if there are nicer alternatives that can perform equally well. Also, above I use the 1-byte "char" type. Is this the only type in PostGres that is guaranteed to be just a single byte, or are there better alternatives? A 13-state enum would be best (listing the 6 white pieces, 6 black pieces, and 'empty' states for every square on the board) but as I understand from the documentation, enums always up take 4 bytes per entry. Any ideas for improvement would be greatly appreciated.How aboutsomething like the following (game and postion would have more fields in practice, like comments and where played)?
DROP TABLE IF EXISTS game CASCADE;
CREATE TABLE game
(
id int PRIMARY KEY,
name_white text,
name_black text,
played timestamptz
);
CREATE TABLE position
(
id int PRIMARY KEY,
game_id int REFERENCES game (id),
ply int
);
CREATE TABLE piece
(
id int PRIMARY KEY,
position_id int REFERENCES position (id),
rank char, -- 1...8 from white's perspective
file char, -- a...h
white boolean,
type char -- P.R,N,B,K,Q
);
CREATE UNIQUE INDEX square ON piece (rank, file, type, white);
SELECT
p.position_id
FROM
piece p
WHERE
( p.white
AND p.type = 'K'
AND p.file = 'c'
AND p.rank = '4'
)
AND
(
((NOT p.white AND p.type = 'B') OR (p.white AND p.type = 'K'))
AND p.file = 'f'
AND p.rank = '7'
);
Cheers,
Gavin
That I only discovered the flaw when I mentally checked the SQL on the way to work the folowing day.
The following, hopefully, fixes the problem
SELECT
p1.position_id
FROM
piece AS p1 JOIN piece AS p2 USING (position_id)
WHERE
(
p1.white
AND p1.type = 'K'
AND p1.file = 'c'
AND p1.rank = '4'
)
AND
(
((NOT p2.white AND p2.type = 'B') OR (p2.white AND p2.type = 'K'))
AND p2.file = 'f'
AND p2.rank = '7'
);
On 11/04/12 21:24, Gavin Flower wrote:On 11/04/12 19:15, Sidney Cadot wrote:Dear all, As a hobby project, I am toying around with a database containing about 5 million chess games. On average, these games have about 80 positions (~ 40 moves by both black and white), which means there are about 400 million chess positions in there.
--
Mike Nolan