Re: Using ctid column changes plan drastically - Mailing list pgsql-performance
From | Thomas Kellerer |
---|---|
Subject | Re: Using ctid column changes plan drastically |
Date | |
Msg-id | juo9lb$4hm$1@dough.gmane.org Whole thread Raw |
In response to | Re: Using ctid column changes plan drastically (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Using ctid column changes plan drastically
|
List | pgsql-performance |
Tom Lane, 24.07.2012 19:12: > Well, it would only help if you're running a PG version that's new > enough to recognize the NOT EXISTS as an anti-join; and even then, > it's possible that joining on a tid column forecloses enough plan > types that you don't get any real benefit. But I'm just guessing. > Can you show exactly what you tried and what EXPLAIN ANALYZE results > you got? > I am using 9.1.4 (as I said in my initial post). I finally found a solution that runs fine: DELETE FROM dupes a WHERE EXISTS (SELECT 1 FROM dupes b WHERE b.first_name = a.first_name AND b.last_name = a.last_name AND b.ctid > a.ctid); The execution plan for this is: Delete on public.dupes a (cost=14575.95..16978.87 rows=25000 width=12) (actual time=2419.334..2419.334 rows=0 loops=1) Buffers: shared hit=18029 -> Merge Semi Join (cost=14575.95..16978.87 rows=25000 width=12) (actual time=2043.674..2392.707 rows=17097 loops=1) Output: a.ctid, b.ctid Merge Cond: ((a.first_name = b.first_name) AND (a.last_name = b.last_name)) Join Filter: (b.ctid > a.ctid) Buffers: shared hit=930 -> Sort (cost=7287.98..7475.48 rows=75000 width=20) (actual time=1024.195..1030.051 rows=75000 loops=1) Output: a.ctid, a.first_name, a.last_name Sort Key: a.first_name, a.last_name Sort Method: quicksort Memory: 8870kB Buffers: shared hit=465 -> Seq Scan on public.dupes a (cost=0.00..1215.00 rows=75000 width=20) (actual time=0.025..23.234 rows=75000loops=1) Output: a.ctid, a.first_name, a.last_name Buffers: shared hit=465 -> Sort (cost=7287.98..7475.48 rows=75000 width=20) (actual time=1019.148..1028.483 rows=105841 loops=1) Output: b.ctid, b.first_name, b.last_name Sort Key: b.first_name, b.last_name Sort Method: quicksort Memory: 8870kB Buffers: shared hit=465 -> Seq Scan on public.dupes b (cost=0.00..1215.00 rows=75000 width=20) (actual time=0.017..19.133 rows=75000loops=1) Output: b.ctid, b.first_name, b.last_name Buffers: shared hit=465 Total runtime: 2420.953 ms Which is a lot better than the plan using "WHERE ctid NOT IN (.....)": Delete on public.dupes (cost=1777.50..4925055.00 rows=37500 width=6) (actual time=582515.094..582515.094 rows=0 loops=1) Buffers: shared hit=18027 -> Seq Scan on public.dupes (cost=1777.50..4925055.00 rows=37500 width=6) (actual time=1038.164..582332.927 rows=17097loops=1) Output: dupes.ctid Filter: (NOT (SubPlan 1)) Buffers: shared hit=930 SubPlan 1 -> Materialize (cost=1777.50..1890.00 rows=7500 width=20) (actual time=0.001..2.283 rows=35552 loops=75000) Output: (min(b.ctid)), b.first_name, b.last_name Buffers: shared hit=465 -> HashAggregate (cost=1777.50..1852.50 rows=7500 width=20) (actual time=90.964..120.228 rows=57903 loops=1) Output: min(b.ctid), b.first_name, b.last_name Buffers: shared hit=465 -> Seq Scan on public.dupes b (cost=0.00..1215.00 rows=75000 width=20) (actual time=0.008..25.515rows=75000 loops=1) Output: b.ctid, b.first_name, b.last_name Buffers: shared hit=465 Total runtime: 582517.711 ms Using "WHERE id NOT IN (...)" is the fastest way: Delete on public.dupes (cost=1871.25..3273.75 rows=37500 width=6) (actual time=187.949..187.949 rows=0 loops=1) Buffers: shared hit=18490 -> Seq Scan on public.dupes (cost=1871.25..3273.75 rows=37500 width=6) (actual time=125.351..171.108 rows=17097 loops=1) Output: dupes.ctid Filter: (NOT (hashed SubPlan 1)) Buffers: shared hit=930 SubPlan 1 -> HashAggregate (cost=1777.50..1852.50 rows=7500 width=18) (actual time=73.131..93.421 rows=57903 loops=1) Output: min(b.id), b.first_name, b.last_name Buffers: shared hit=465 -> Seq Scan on public.dupes b (cost=0.00..1215.00 rows=75000 width=18) (actual time=0.004..8.515 rows=75000loops=1) Output: b.id, b.first_name, b.last_name Buffers: shared hit=465 Total runtime: 189.222 ms Regards Thomas
pgsql-performance by date: