Re: BUG #14020: row_number() over(partition by order by) - weird behavior - Mailing list pgsql-bugs
| From | Boyko Yordanov | 
|---|---|
| Subject | Re: BUG #14020: row_number() over(partition by order by) - weird behavior | 
| Date | |
| Msg-id | 926E6EDE-8959-4209-B3F0-E894987FD4E0@gmail.com Whole thread Raw  | 
		
| In response to | Re: BUG #14020: row_number() over(partition by order by) - weird behavior ("David G. Johnston" <david.g.johnston@gmail.com>) | 
| Responses | 
                	
            		Re: BUG #14020: row_number() over(partition by order by) - weird behavior
            		
            		 | 
		
| List | pgsql-bugs | 
Hi and thanks for your time on this.
You haven't proven to us that a single row in offers_testing cannot =
match more than one row in offers_past_data.  Assuming a 1-to-many =
situation the update count for offers_past_data can definitely be more =
than the number of rows returned by the sub-query.
It is a one-to-one relationship between the tables as there is a primary =
key on (id, feed) on both tables (which I missed to point out):
Indexes:
    "offers_past_data_id_feed" PRIMARY KEY, btree (id, feed)
Indexes:
    "offers_testing_id_feed" PRIMARY KEY, btree (id, feed)
I assume that this guarantees that a single grossprice change in =
offers_testing where product =3D 2 translates to up to (count(id,feed) =
where product =3D 2) position updates in both offers_testing and =
offers_past_data.
Adding "returning *" to the questionable query, it seems to update rows =
that are not related to product 2 (and on my opinion should not have =
changed positions).
Also, "ORDER BY grossprice" seems inadequate.  The potential for =
duplicates here - which would then make the assignment of row numbers =
within the product partition random - is non-zero and is a quite likely =
source of your problem - along with the probable one-to-many =
relationship between offers_testing and offers_past_data.
Dismissing the one-to-many relationship suggestion as it isn't the case.
Your point on duplicate grossprices is valid, but I believe that if I =
update a single grossprice, even in the case of duplicate grossprices, =
this should not translate in more position updates than the rows in the =
modified product partition. And in offers_testing there are no more than =
148 rows per product partition:
db=3D# select max(partition_count) from (select count(*) over (partition =
by product) as partition_count from offers_testing) sq;
 max
-----
 148
(1 row)
And yet the update query updates 28k records for some reason, most of =
which are outside the modified product partition.
Boyko
--
Boyko
2016-03-15 6:00 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com =
<mailto:david.g.johnston@gmail.com>>:
On Mon, Mar 14, 2016 at 1:43 PM, <b.yordanov2@gmail.com =
<mailto:b.yordanov2@gmail.com>> wrote:
db=3D# update offers_past_data a set position =3D b.position from =
(select id,
feed, row_number() over(partition by product order by grossprice asc) as
position from offers_testing) b where a.id <http://a.id/> =3D b.id =
<http://b.id/> and a.feed =3D b.feed and
a.position <> b.position;
UPDATE 0
=E2=80=8BUpdating offers_past_data
=E2=80=8B=20
This should update every row in offers_past_data when its =E2=80=9Cpositio=
n=E2=80=9D
changes. In the example above no changes were introduced since the last =
run
so nothing is updated (expected).
db=3D# select count(*) from offers_testing where product =3D 2;
 count
-------
    99
(1 row)
So there are 99 offers for product 2.
=E2=80=8BCounting offers_testing=E2=80=8B
Getting a single offer:
db=3D# select id,grossprice from offers_testing where product =3D 2 =
limit 1;
   id    | grossprice
---------+------------
 4127918 |    5000.00
(1 row)
=E2=80=8BCounting offers_testing=E2=80=8B
Updating its grossprice:
db=3D# update offers_testing set grossprice =3D 20 where id =3D 4127918;
UPDATE 1
=E2=80=8BUpdating offers_testing=E2=80=8B
Now when executing the first query again I expect that no more than 99 =
rows
get updated in offers_past_data since this is the maximum amount of
positions that would be affected by offer 4127918 grossprice change.
You haven't proven to us that a single row in offers_testing cannot =
match more than one row in offers_past_data.  Assuming a 1-to-many =
situation the update count for offers_past_data can definitely be more =
than the number of rows returned by the sub-query.
=E2=80=8B=E2=80=8B
db=3D# update offers_past_data a set position =3D b.position from =
(select id,
feed, row_number() over(partition by product order by grossprice asc) as
position from offers_testing) b where a.id <http://a.id/> =3D b.id =
<http://b.id/> and a.feed =3D b.feed and
a.position <> b.position;
UPDATE 104
104 rows get updated.
Executing the same query again a few minutes later (no changes meanwhile =
in
either table):
db=3D# update offers_past_data a set position =3D b.position from =
(select id,
feed, row_number() over(partition by product order by grossprice asc) as
position from offers_testing) b where a.id <http://a.id/> =3D b.id =
<http://b.id/> and a.feed =3D b.feed and
a.position <> b.position;
UPDATE 28058
This time it updates 28058 rows.
This is a test environment and nothing reads or writes to these tables.
Is this a bug or am I missing something obvious?
=E2=80=8BIts likely data related, not a bug.
Using the "UPDATE ... RETURNING *" form should provide good insight.  =
Specifically, look for all rows having the same (id, feed) pair.
Also, "ORDER BY grossprice" seems inadequate.  The potential for =
duplicates here - which would then make the assignment of row numbers =
within the product partition random - is non-zero and is a quite likely =
source of your problem - along with the probable one-to-many =
relationship between offers_testing and offers_past_data.
David J.
=E2=80=8B=20
		
	pgsql-bugs by date: