Select with grouping plan question - Mailing list pgsql-performance
From | Brad Might |
---|---|
Subject | Select with grouping plan question |
Date | |
Msg-id | E387E2E9622FDD408359F98BF183879E222A6A@dc1.storediq.com Whole thread Raw |
Responses |
Re: Select with grouping plan question
|
List | pgsql-performance |
This seems to me to be an expensive plan and I'm wondering if there's a way to improve it or a better way to do what I'm trying to do here (get a count of distinct values for each record_id and map that value to the entity type) entity_type_id_mapping is 56 rows volume_node_entity_data_values is approx 500,000,000 rows vq_record_id has approx 11,000,000 different values vq_entity_type is a value in entity_type_id_mapping.entity_type I thought that the idx_vq_entities_1 index would allow an ordered scan of the table. I created it based pon the sort key given in the explain statement. Thanks in advance. Table "data_schema.volume_queue_entities" Column | Type | Modifiers -----------------+-------------------+---------------------------------- -----------------+-------------------+------------- vq_record_id | bigint | default currval('seq_vq_fsmd_auto'::regclass) vq_entity_type | character varying | vq_entity_value | character varying | Indexes: "idx_vq_entities_1" btree (vq_record_id, vq_entity_type, vq_entity_value) Table "volume_8.entity_type_id_mapping" Column | Type | Modifiers -------------+-------------------+-------------------------------------- -------------+-------------------+-------------------- entity_id | integer | default nextval('volume_8.entity_id_sequence'::regclass) entity_type | character varying | explain insert into volume_8.volume_node_entity_data_values (vs_volume_id, vs_latest_node_synthetic_id, vs_base_entity_id, vs_value, vs_value_count, vs_base_entity_revision_id) select 8, vq_record_id, entity_id , vq_entity_value, count(vq_entity_value),1 from data_schema.volume_queue_entities qe, volume_8.entity_type_id_mapping emap where qe.vq_entity_type = emap.entity_type group by vq_record_id, vq_entity_type, vq_entity_value, entity_id ; ------------------------------------------------------------------------ ---------------------------------------- Subquery Scan "*SELECT*" (cost=184879640.90..210689876.26 rows=543373376 width=60) -> GroupAggregate (cost=184879640.90..199822408.74 rows=543373376 width=37) -> Sort (cost=184879640.90..186238074.34 rows=543373376 width=37) Sort Key: qe.vq_record_id, qe.vq_entity_type, qe.vq_entity_value, emap.entity_id -> Hash Join (cost=1.70..18234833.10 rows=543373376 width=37) Hash Cond: (("outer".vq_entity_type)::text = ("inner".entity_type)::text) -> Seq Scan on volume_queue_entities qe (cost=0.00..10084230.76 rows=543373376 width=33) -> Hash (cost=1.56..1.56 rows=56 width=16) -> Seq Scan on entity_type_id_mapping emap (cost=0.00..1.56 rows=56 width=16) (9 rows)
pgsql-performance by date: