Thread: Slaves show different results for query

Slaves show different results for query

From

Musall Maik

Date:

20 May 2015, 07:11:20

Hi,

I have a strange case where a SELECT for a primary key returns 0 rows on one slave, while it returns the correct 1 row
onanother slave and on the master. It does however return that row on all slaves when queried with LIKE and trailing or
leadingwildcard. 

psql version is 9.3.5, because that's what comes with Ubuntu 14.04 LTS
Master runs Ubuntu 14.04 LTS
Slave 1 runs also Ubuntu 14.04 LTS
Slave 2 runs Mac OS X 10.7, pgsql installed via homebrew

Both slaves are configured with streaming replication, and I've been using that setup for years, starting with psql
9.1,with no problems so far. Suspecting some weird problem, I already re-initialized slave 2 with a fresh backup and
startedreplication from the beginning, so the database is fresh from a master copy, and is verified to be current. 

2015-05-19 20:53:43.937 CEST LOG:  entering standby mode
2015-05-19 20:53:43.974 CEST LOG:  redo starts at 31/3F000028
2015-05-19 20:53:45.522 CEST LOG:  consistent recovery state reached at 31/40CCE6E8
2015-05-19 20:53:45.523 CEST LOG:  database system is ready to accept read only connections
2015-05-19 20:53:45.604 CEST LOG:  started streaming WAL from primary at 31/41000000 on timeline 1


So here's the query.

    SELECT * FROM MyTable WHERE email = 'foo@example.com';

This returns 1 row on master and slave 1, but 0 on slave 2, while this query:

    SELECT * FROM MyTable WHERE email LIKE 'foo@example.com%';

or this one

    SELECT * FROM MyTable WHERE email LIKE '%foo@example.com';

returns the correct 1 row on all three systems. Note that this works with the wildcard on either end, or also somewhere
inthe middle, doesn't matter. Note: "email" is the primary key on this table. 

This behaviour is the same with any address to be queried, and is also the same on a similar second table. This does
NOToccur on any other table, which all have integer primary keys. There is also no problem when I select for other
attributeson these tables. 

Does anyone have a hint?

Thanks
Maik

Re: Slaves show different results for query

From

Melvin Davidson

Date:

20 May 2015, 12:26:09

First, are your postgresql.conf options the same on master and slave2 with regard to memory allocation and all Planner Method Configuration options?

Next, is it possible you have a corrupted index on slave2.?

I would suggest verifying there is no difference in the query plan between master and slave2.

IE: EXPLAIN SELECT * FROM MyTable WHERE email = 'foo@example.com';

Check your indexes with:

SELECT n.nspname as schema,
       i.relname as table,
       i.indexrelname as index,
       i.idx_scan,
       i.idx_tup_read,
       i.idx_tup_fetch,
       CASE WHEN idx.indisprimary
            THEN 'pkey'
            WHEN idx.indisunique
            THEN 'uidx'
            ELSE 'idx'
            END AS type,
       pg_get_indexdef(idx.indexrelid),
       CASE WHEN idx.indisvalid
            THEN 'valid'
            ELSE 'INVALID'
            END as istatus,
       pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname)) as size_in_bytes,
       pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname))) as size
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE i.relname = 'MyTable'
ORDER BY 1, 2, 3;

On Wed, May 20, 2015 at 2:52 AM, Musall Maik <lists@musall.de> wrote:

Hi,

I have a strange case where a SELECT for a primary key returns 0 rows on one slave, while it returns the correct 1 row on another slave and on the master. It does however return that row on all slaves when queried with LIKE and trailing or leading wildcard.

psql version is 9.3.5, because that's what comes with Ubuntu 14.04 LTS
Master runs Ubuntu 14.04 LTS
Slave 1 runs also Ubuntu 14.04 LTS
Slave 2 runs Mac OS X 10.7, pgsql installed via homebrew

Both slaves are configured with streaming replication, and I've been using that setup for years, starting with psql 9.1, with no problems so far. Suspecting some weird problem, I already re-initialized slave 2 with a fresh backup and started replication from the beginning, so the database is fresh from a master copy, and is verified to be current.

2015-05-19 20:53:43.937 CEST LOG: entering standby mode
2015-05-19 20:53:43.974 CEST LOG: redo starts at 31/3F000028
2015-05-19 20:53:45.522 CEST LOG: consistent recovery state reached at 31/40CCE6E8
2015-05-19 20:53:45.523 CEST LOG: database system is ready to accept read only connections
2015-05-19 20:53:45.604 CEST LOG: started streaming WAL from primary at 31/41000000 on timeline 1

So here's the query.

SELECT * FROM MyTable WHERE email = 'foo@example.com';

This returns 1 row on master and slave 1, but 0 on slave 2, while this query:

SELECT * FROM MyTable WHERE email LIKE 'foo@example.com%';

or this one

SELECT * FROM MyTable WHERE email LIKE '%foo@example.com';

returns the correct 1 row on all three systems. Note that this works with the wildcard on either end, or also somewhere in the middle, doesn't matter. Note: "email" is the primary key on this table.

This behaviour is the same with any address to be queried, and is also the same on a similar second table. This does NOT occur on any other table, which all have integer primary keys. There is also no problem when I select for other attributes on these tables.

Does anyone have a hint?

Thanks
Maik

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Re: Slaves show different results for query

From

Musall Maik

Date:

20 May 2015, 12:59:44

Hi Melvin,

thanks for the response.

Am 20.05.2015 um 14:26 schrieb Melvin Davidson <melvin6925@gmail.com>:

First, are your postgresql.conf options the same on master and slave2 with regard to memory allocation and all Planner Method Configuration options?

slave2 has less shared_buffers (256m vs 2048m), temp_buffers (16m vs 128m), work_mem (8m vs 16m) and maintenance_work_mem (8m vs 16m) due to hardware constraints. All other settings are the same and mostly default.

Next, is it possible you have a corrupted index on slave2.?

Unlikely, as I replaced the slave2 db with a fresh backup from master yesterday to rule this out.

I would suggest verifying there is no difference in the query plan between master and slave2.
IE: EXPLAIN SELECT * FROM MyTable WHERE email = 'foo@example.com';

All three (master, slave1, slave2) use the same plan:

Index Scan using mytable_pk on mytable (cost=0.42..8.44 rows=1 width=205)

Index Cond: ((email)::text = 'foo@example.com'::text)

Check your indexes with:

SELECT n.nspname as schema,
       i.relname as table,
       i.indexrelname as index,
       i.idx_scan,
       i.idx_tup_read,
       i.idx_tup_fetch,
       CASE WHEN idx.indisprimary
            THEN 'pkey'
            WHEN idx.indisunique
            THEN 'uidx'
            ELSE 'idx'
            END AS type,
       pg_get_indexdef(idx.indexrelid),
       CASE WHEN idx.indisvalid
            THEN 'valid'
            ELSE 'INVALID'
            END as istatus,
       pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname)) as size_in_bytes,
       pg_size_pretty(pg_relation_size(quote_ident(n.nspname)|| '.' || quote_ident(i.relname))) as size
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE i.relname = 'MyTable'
ORDER BY 1, 2, 3;

master:

--------+---------+-----------------------------+----------+--------------+---------------+------+-----------------------------------------------------------------------------------+---------+---------------+--------

public | mytable | mytable_emailaddressref_idx | 6 | 11 | 4 | idx | CREATE INDEX mytable_emailaddressref_idx ON mytable USING btree (emailaddressref) | valid | 123609088 | 118 MB

public | mytable | mytable_pk | 1291541 | 1305655 | 1291371 | pkey | CREATE UNIQUE INDEX mytable_pk ON mytable USING btree (email) | valid | 123609088 | 118 MB

public | mytable | mytable_syncstatus_idx | 3710 | 2250428 | 0 | idx | CREATE INDEX mytable_syncstatus_idx ON mytable USING btree (syncstatus) | valid | 123609088 | 118 MB

public | mytable | mytable_userref_idx | 0 | 0 | 0 | idx | CREATE INDEX mytable_userref_idx ON mytable USING btree (userref) | valid | 123609088 | 118 MB

slave1:

public | mytable | mytable_emailaddressref_idx | 0 | 0 | 0 | idx | CREATE INDEX mytable_emailaddressref_idx ON mytable USING btree (emailaddressref) | valid | 123609088 | 118 MB

public | mytable | mytable_pk | 3 | 13 | 3 | pkey | CREATE UNIQUE INDEX mytable_pk ON mytable USING btree (email) | valid | 123609088 | 118 MB

public | mytable | mytable_syncstatus_idx | 0 | 0 | 0 | idx | CREATE INDEX mytable_syncstatus_idx ON mytable USING btree (syncstatus) | valid | 123609088 | 118 MB

public | mytable | mytable_userref_idx | 0 | 0 | 0 | idx | CREATE INDEX mytable_userref_idx ON mytable USING btree (userref) | valid | 123609088 | 118 MB

slave2:

public | mytable | mytable_emailaddressref_idx | 1 | 3 | 1 | idx | CREATE INDEX mytable_emailaddressref_idx ON mytable USING btree (emailaddressref) | valid | 123609088 | 118 MB

public | mytable | mytable_pk | 15 | 0 | 0 | pkey | CREATE UNIQUE INDEX mytable_pk ON mytable USING btree (email) | valid | 123609088 | 118 MB

public | mytable | mytable_syncstatus_idx | 0 | 0 | 0 | idx | CREATE INDEX mytable_syncstatus_idx ON mytable USING btree (syncstatus) | valid | 123609088 | 118 MB

public | mytable | mytable_userref_idx | 0 | 0 | 0 | idx | CREATE INDEX mytable_userref_idx ON mytable USING btree (userref) | valid | 123609088 | 118 MB

Now there are a few different numbers, although I wasn't really successful trying to quickly read up what that means. And I can understand that different stats there can lead to different performance, but I think whatever the stats are and the execution plan is, this must not lead to a false result, right?

Besides, the scales look similar in slave1 and slave2, although the query results are different.

Maik

On Wed, May 20, 2015 at 2:52 AM, Musall Maik <lists@musall.de> wrote:
Hi,

I have a strange case where a SELECT for a primary key returns 0 rows on one slave, while it returns the correct 1 row on another slave and on the master. It does however return that row on all slaves when queried with LIKE and trailing or leading wildcard.

psql version is 9.3.5, because that's what comes with Ubuntu 14.04 LTS
Master runs Ubuntu 14.04 LTS
Slave 1 runs also Ubuntu 14.04 LTS
Slave 2 runs Mac OS X 10.7, pgsql installed via homebrew

Both slaves are configured with streaming replication, and I've been using that setup for years, starting with psql 9.1, with no problems so far. Suspecting some weird problem, I already re-initialized slave 2 with a fresh backup and started replication from the beginning, so the database is fresh from a master copy, and is verified to be current.

2015-05-19 20:53:43.937 CEST LOG: entering standby mode
2015-05-19 20:53:43.974 CEST LOG: redo starts at 31/3F000028
2015-05-19 20:53:45.522 CEST LOG: consistent recovery state reached at 31/40CCE6E8
2015-05-19 20:53:45.523 CEST LOG: database system is ready to accept read only connections
2015-05-19 20:53:45.604 CEST LOG: started streaming WAL from primary at 31/41000000 on timeline 1

So here's the query.

SELECT * FROM MyTable WHERE email = 'foo@example.com';

This returns 1 row on master and slave 1, but 0 on slave 2, while this query:

SELECT * FROM MyTable WHERE email LIKE 'foo@example.com%';

or this one

SELECT * FROM MyTable WHERE email LIKE '%foo@example.com';

returns the correct 1 row on all three systems. Note that this works with the wildcard on either end, or also somewhere in the middle, doesn't matter. Note: "email" is the primary key on this table.

This behaviour is the same with any address to be queried, and is also the same on a similar second table. This does NOT occur on any other table, which all have integer primary keys. There is also no problem when I select for other attributes on these tables.

Does anyone have a hint?

Thanks
Maik

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Re: Slaves show different results for query

From

Adrian Klaver

Date:

20 May 2015, 13:17:49

On 05/19/2015 11:52 PM, Musall Maik wrote:
> Hi,
>
> I have a strange case where a SELECT for a primary key returns 0 rows on one slave, while it returns the correct 1
rowon another slave and on the master. It does however return that row on all slaves when queried with LIKE and
trailingor leading wildcard. 
>
> psql version is 9.3.5, because that's what comes with Ubuntu 14.04 LTS
> Master runs Ubuntu 14.04 LTS
> Slave 1 runs also Ubuntu 14.04 LTS
> Slave 2 runs Mac OS X 10.7, pgsql installed via homebrew
>
> Both slaves are configured with streaming replication, and I've been using that setup for years, starting with psql
9.1,with no problems so far. Suspecting some weird problem, I already re-initialized slave 2 with a fresh backup and
startedreplication from the beginning, so the database is fresh from a master copy, and is verified to be current. 
>
> 2015-05-19 20:53:43.937 CEST LOG:  entering standby mode
> 2015-05-19 20:53:43.974 CEST LOG:  redo starts at 31/3F000028
> 2015-05-19 20:53:45.522 CEST LOG:  consistent recovery state reached at 31/40CCE6E8
> 2015-05-19 20:53:45.523 CEST LOG:  database system is ready to accept read only connections
> 2015-05-19 20:53:45.604 CEST LOG:  started streaming WAL from primary at 31/41000000 on timeline 1
>
>
> So here's the query.
>
>     SELECT * FROM MyTable WHERE email = 'foo@example.com';
>
> This returns 1 row on master and slave 1, but 0 on slave 2, while this query:
>
>     SELECT * FROM MyTable WHERE email LIKE 'foo@example.com%';
>
> or this one
>
>     SELECT * FROM MyTable WHERE email LIKE '%foo@example.com';
>
> returns the correct 1 row on all three systems. Note that this works with the wildcard on either end, or also
somewherein the middle, doesn't matter. Note: "email" is the primary key on this table. 
>
> This behaviour is the same with any address to be queried, and is also the same on a similar second table. This does
NOToccur on any other table, which all have integer primary keys. There is also no problem when I select for other
attributeson these tables. 
>
> Does anyone have a hint?

What are the encodings on the various machines and in the databases?

>
> Thanks
> Maik
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Slaves show different results for query

From

Musall Maik

Date:

20 May 2015, 13:34:58

> Am 20.05.2015 um 15:17 schrieb Adrian Klaver <adrian.klaver@aklaver.com>:
>
> On 05/19/2015 11:52 PM, Musall Maik wrote:
>> Hi,
>>
>> I have a strange case where a SELECT for a primary key returns 0 rows on one slave, while it returns the correct 1
rowon another slave and on the master. It does however return that row on all slaves when queried with LIKE and
trailingor leading wildcard. 
>>
>> […]

>> Does anyone have a hint?
>
> What are the encodings on the various machines and in the databases?

All encodings UTF8, all collate and ctype en_US.UTF-8. What do you mean by "machines" exactly? This is not dependent on
shellenvironment or something, I get this also via JDBC. 

Maik

Re: Slaves show different results for query

From

Adrian Klaver

Date:

20 May 2015, 13:43:25

On 05/20/2015 06:34 AM, Musall Maik wrote:
>
>> Am 20.05.2015 um 15:17 schrieb Adrian Klaver <adrian.klaver@aklaver.com>:
>>
>> On 05/19/2015 11:52 PM, Musall Maik wrote:
>>> Hi,
>>>
>>> I have a strange case where a SELECT for a primary key returns 0 rows on one slave, while it returns the correct 1
rowon another slave and on the master. It does however return that row on all slaves when queried with LIKE and
trailingor leading wildcard. 
>>>
>>> […]
>
>>> Does anyone have a hint?
>>
>> What are the encodings on the various machines and in the databases?
>
> All encodings UTF8, all collate and ctype en_US.UTF-8. What do you mean by "machines" exactly? This is not dependent
onshell environment or something, 

Actually it is:

http://www.postgresql.org/docs/9.4/interactive/locale.html

"Locale support refers to an application respecting cultural preferences
regarding alphabets, sorting, number formatting, etc. PostgreSQL uses
the standard ISO C and POSIX locale facilities provided by the server
operating system. For additional information refer to the documentation
of your system."

The reason I ask is that the machine you are having problems with has OS
X. Over the years I have seen quite a few reports on this list of OS X
locale/encoding issues.

I get this also via JDBC.
>
> Maik
>
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Slaves show different results for query

From

Tom Lane

Date:

20 May 2015, 14:05:25

Adrian Klaver <adrian.klaver@aklaver.com> writes:
> The reason I ask is that the machine you are having problems with has OS
> X. Over the years I have seen quite a few reports on this list of OS X
> locale/encoding issues.

Yes.  Here's the problem: OS X UTF8 locales (other than C) don't sort the
same as UTF8 locales on Linux.  Because of this, the index created by the
master is effectively corrupt from the standpoint of the OS X slave; it's
not in the correct sort order.  It might sometimes find the right results
anyway, but often not.

You might be able to get away with the described configuration if you
recreate the database using C locale, but personally I wouldn't risk it.
Masters and slaves in a WAL-shipping replication configuration should be
on substantially identical platforms, else you're just asking for trouble.

            regards, tom lane

Re: Slaves show different results for query

From

Musall Maik

Date:

20 May 2015, 14:10:01

Hi Tom,

that's very useful information. Very cool to get such results so quickly here.

Thanks
Maik


> Am 20.05.2015 um 16:05 schrieb Tom Lane <tgl@sss.pgh.pa.us>:
>
> Adrian Klaver <adrian.klaver@aklaver.com> writes:
>> The reason I ask is that the machine you are having problems with has OS
>> X. Over the years I have seen quite a few reports on this list of OS X
>> locale/encoding issues.
>
> Yes.  Here's the problem: OS X UTF8 locales (other than C) don't sort the
> same as UTF8 locales on Linux.  Because of this, the index created by the
> master is effectively corrupt from the standpoint of the OS X slave; it's
> not in the correct sort order.  It might sometimes find the right results
> anyway, but often not.
>
> You might be able to get away with the described configuration if you
> recreate the database using C locale, but personally I wouldn't risk it.
> Masters and slaves in a WAL-shipping replication configuration should be
> on substantially identical platforms, else you're just asking for trouble.
>
>             regards, tom lane
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general