Thread: Too few rows expected by Planner on partitioned tables
Hello,
A description of what you are trying to achieve and what results you expect:
Our database is growing by about 2.5 million rows per table per day (2 tables at the moment). Because of that, we decided to partition the data, especially since we are pre-aggregating the data for weeks, months, quarters and years. Every aggregation is stored in a separate partition:
Days: ..._yYYYYmMMd (base data)
Weeks: ..._yYYYYmMMw (aggregated all weeks of the month)
Months: ..._yYYYYmMM (aggregated month)
etc.
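For illustration, attaching one such partition could look roughly like this (a sketch with made-up values for the location_statistics table defined below; the real DDL is generated by the create_partition_tables function shown further down):

CREATE TABLE location_statistics_y2019m03d
    PARTITION OF location_statistics
    FOR VALUES IN ('[2019-03-01,2019-03-02)', '[2019-03-02,2019-03-03)');  -- one daterange per day of the month, truncated here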
Our problem is that the planner always predicts one row to be returned, although only part of the primary key is queried. This quickly makes performance unworkable - a query involving only a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
PostgreSQL version number you are running:

postgres=# SELECT version();
                                                      version
------------------------------------------------------------------------------------------------------------------
 PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
(1 row)

postgres=# SELECT name, current_setting(name), source
postgres-# FROM pg_settings
postgres-# WHERE source NOT IN ('default', 'override');
              name              |             current_setting             |        source
--------------------------------+-----------------------------------------+----------------------
 application_name               | psql                                    | client
 checkpoint_completion_target   | 0.9                                     | configuration file
 client_encoding                | UTF8                                    | client
 cluster_name                   | 12/main                                 | configuration file
 DateStyle                      | ISO, DMY                                | configuration file
 default_text_search_config     | pg_catalog.german                       | configuration file
 dynamic_shared_memory_type     | posix                                   | configuration file
 effective_cache_size           | 6GB                                     | configuration file
 effective_io_concurrency       | 200                                     | configuration file
 enable_partitionwise_aggregate | on                                      | configuration file
 enable_partitionwise_join      | on                                      | configuration file
 external_pid_file              | /var/run/postgresql/12-main.pid         | configuration file
 lc_messages                    | de_DE.UTF-8                             | configuration file
 lc_monetary                    | de_DE.UTF-8                             | configuration file
 lc_numeric                     | de_DE.UTF-8                             | configuration file
 lc_time                        | de_DE.UTF-8                             | configuration file
 listen_addresses               | *                                       | configuration file
 log_line_prefix                | %m [%p] %q%u@%d                         | configuration file
 log_timezone                   | Etc/UTC                                 | configuration file
 maintenance_work_mem           | 512MB                                   | configuration file
 max_connections                | 300                                     | configuration file
 max_parallel_workers           | 2                                       | configuration file
 max_stack_depth                | 2MB                                     | environment variable
 max_wal_size                   | 2GB                                     | configuration file
 max_worker_processes           | 2                                       | configuration file
 min_wal_size                   | 256MB                                   | configuration file
 port                           | 5432                                    | configuration file
 random_page_cost               | 1.1                                     | configuration file
 shared_buffers                 | 2GB                                     | configuration file
 ssl                            | on                                      | configuration file
 ssl_cert_file                  | /etc/ssl/certs/ssl-cert-snakeoil.pem    | configuration file
 ssl_key_file                   | /etc/ssl/private/ssl-cert-snakeoil.key  | configuration file
 stats_temp_directory           | /var/run/postgresql/12-main.pg_stat_tmp | configuration file
 temp_buffers                   | 256MB                                   | configuration file
 TimeZone                       | Etc/UTC                                 | configuration file
 unix_socket_directories        | /var/run/postgresql                     | configuration file
 work_mem                       | 128MB                                   | configuration file
(37 rows)

Operating system and version: Linux dev 5.4.44-2-pve #1 SMP PVE 5.4.44-2 (Wed, 01 Jul 2020 16:37:57 +0200) x86_64 GNU/Linux
On a quad core virtualized machine with SSD storage and 16GB RAM.
What program you're using to connect to PostgreSQL: psql and IntelliJ
I'm trying to gather as much information as possible and focus on just one of the two tables (the problem persists in both, though):
-------------------------------------------------------------------------------------------------------
Structure:

CREATE TABLE location_statistics
(
daterange daterange NOT NULL,
spatial_feature_id INTEGER,
visitor_profile_id INTEGER,
activity_type_combination_id INTEGER,
activity_chain_id INTEGER NOT NULL,
visitors REAL,
dwell_time INTEGER,
travel_time INTEGER,
n INTEGER NOT NULL DEFAULT 1,
PRIMARY KEY ( daterange, spatial_feature_id, visitor_profile_id, activity_type_combination_id,
activity_chain_id),
FOREIGN KEY (daterange) REFERENCES dateranges (daterange) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (spatial_feature_id) REFERENCES spatial_features (id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (visitor_profile_id) REFERENCES visitor_profiles (id) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (activity_type_combination_id) REFERENCES activity_type_combinations (id) ON DELETE RESTRICT ON UPDATE CASCADE,
FOREIGN KEY (activity_chain_id) REFERENCES activity_chains (id) ON DELETE CASCADE ON UPDATE CASCADE
) PARTITION BY LIST (daterange);
-------------------------------------------------------------------------------------------------------
Creation of partitions:

CREATE OR REPLACE FUNCTION create_partition_tables(additional_dates TEXT[] = NULL)
RETURNS VOID
VOLATILE
LANGUAGE plpgsql
AS
$$
DECLARE
new_partition RECORD;
BEGIN
FOR new_partition IN
(
SELECT for_values_str,
master_table,
partition_name
FROM resolve_existing_partitions((additional_dates))
WHERE NOT existing
)
LOOP
EXECUTE ' CREATE TABLE '
|| new_partition.partition_name
|| ' PARTITION OF '
|| new_partition.master_table
|| ' FOR VALUES IN (' || new_partition.for_values_str || ')';
RAISE NOTICE 'Partition % for % created',new_partition.partition_name, new_partition.master_table;
END LOOP;
END
$$;
-------------------------------------------------------------------------------------------------------
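For reference, the function can simply be called without arguments, since additional_dates defaults to NULL; the exact text format it expects otherwise is defined by resolve_existing_partitions(), which is not shown here:

SELECT create_partition_tables();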
Size of table:
SELECT schemaname,relname,n_live_tup
FROM pg_stat_user_tables
where relname like 'location_statistics_y2019m03%'
ORDER BY n_live_tup DESC;
    schemaname     |            relname             | n_live_tup
-------------------+--------------------------------+------------
 mobility_insights | location_statistics_y2019m03d |   23569853
 mobility_insights | location_statistics_y2019m03w |   19264373
 mobility_insights | location_statistics_y2019m03  |   18105295
-------------------------------------------------------------------------------------------------------
select * from pg_stats
where tablename = 'location_statistics_y2019m03w';
schemaname tablename attname inherited null_frac avg_width n_distinct most_common_vals most_common_freqs histogram_bounds correlation most_common_elems most_common_elem_freqs elem_count_histogrammobility_insights location_statistics_y2019m03w daterange false 0 14 -1mobility_insights location_statistics_y2019m03w spatial_feature_id false 0 4 600 {12675,7869,7867,7892,7915,7963,12677,12683,12237,7909,7868,9478,7914,11309,7913,7911,12509,9510,7962,10547,9559,10471,11782,10590,9552,10554,9527,10488,12680,9546,11330,11409,9595,12293,10845,11469,10531,10467,9525,7927,11115,10541,10544,9509,9515,10637,10486,10859,9703,9591,11195,11657,7878,7938,7910,9560,9565,9532,11016,12435,12525,9578,7973,9558,10536,12650,9516,9547,7871,10537,10923,10812,12546,9574,12454,9511,10435,11840,7926,12540,8187,10469,7935,9504,9536,11203,7964,9484,10534,10538,12391,10888,8237,9501,9517,12516,10927,11102,7985,10527} {0.11813333630561829,0.06599999964237213,0.03723333403468132,0.031433332711458206,0.027033332735300064,0.023233333602547646,0.022333333268761635,0.0212333332747221,0.021166667342185974,0.02083333395421505,0.02033333294093609,0.0201666671782732,0.02006666734814644,0.019200000911951065,0.018833333626389503,0.01823333278298378,0.01510000042617321,0.014333332888782024,0.013633333146572113,0.013399999588727951,0.01146666705608368,0.011300000362098217,0.011233333498239517,0.011033332906663418,0.009666666388511658,0.009233333170413971,0.008433333598077297,0.007966666482388973,0.007966666482388973,0.007466666866093874,0.007300000172108412,0.007199999876320362,0.006566666532307863,0.006500000134110451,0.005799999926239252,0.00570000009611249,0.005166666582226753,0.004833333194255829,0.004766666796058416,0.004666666500270367,0.00423333328217268,0.0041333334520459175,0.004100000020116568,0.003966666758060455,0.0038333332631736994,0.0037666666321456432,0.003700000001117587,0.0035000001080334187,0.003433333244174719,0.0033666666131466627,0.0033333334140479565,0.003100000089034438,0.002933333395048976,0.00286666676402092,0.00283333333209157,0.00283333333209157,0.0026666666381061077,0.0024666667450219393,0.0024333333130925894,0.0024333333130925894,0.0024333333130925894,0.0023333332501351833,0.002266666619107127,0.002266666619107127,0.002266666619107127,0.002266666619107127,0.002233333420008421,0.002233333420008421,0.002199999988079071,0.002199999988079071,0.002199999988079071,0.002166666556149721,0.002166666556149721,0.002133333357051015,0.002099999925121665,0.0020666667260229588,0.0020666667260229588,0.0020666667260229588,0.002033333294093609,0.002033333294093609,0.0019333333475515246,0.0018666667165234685,0.0018333332845941186,0.0018333332845941186,0.0018333332845941186,0.0017999999690800905,0.0017666666535660625,0.0017666666535660625,0.0017666666535660625,0.0017666666535660625,0.0017666666535660625,0.0017333333380520344,0.0017000000225380063,0.0017000000225380063,0.0016666667070239782,0.0016333333915099502,0.0015999999595806003,0.0015999999595806003,0.001500000013038516,0.001500000013038516} 
{7870,7891,7906,7917,7954,7965,7966,7969,7974,7977,7979,7984,7986,8132,8171,8194,9479,9482,9488,9491,9493,9496,9498,9499,9503,9507,9512,9513,9520,9521,9524,9526,9530,9534,9537,9541,9544,9554,9562,9570,9573,9577,9581,9583,9586,9599,9675,9736,10436,10442,10450,10464,10482,10491,10495,10510,10513,10515,10516,10523,10529,10535,10539,10543,10553,10575,10602,10718,10756,10816,10882,10902,10928,11008,11025,11064,11158,11276,11316,11382,11486,11538,11602,11673,11731,11766,11775,11835,11906,12052,12088,12130,12277,12356,12383,12397,12408,12471,12545,12627,12678} 0.11252771mobility_insights location_statistics_y2019m03w visitor_profile_id false 0 4 9806 {3081,3114,2739,3642,2445,103,1625,1874,4005,2282,1550,3792,5564,750,1526,4427,2993,4881,1498,2682,5345,5601,8210,1613,2407,5019,1944,2266,3690,4529,4354,1218,11605,4126,5453,11698,11988,4207,6935,559,9151,12020,12048,12006,12049,3695,4874,5596,5945,6740,1366,7186,101,2026,5694,9152,4446,5788,8892,9365,11619,12027,871,5943,7567,7936,7939,8653,437,3971,5733,5961,7872,2728,3358,4154,4605,6187,9057,1967,4625,4837,5784,8910,1482,2036,6268,7557,8835,9,576,933,1686,2145,2229,3000,3692,4645,4666,5386} {0.0024666667450219393,0.0023333332501351833,0.002300000051036477,0.002199999988079071,0.0020666667260229588,0.002033333294093609,0.002033333294093609,0.002033333294093609,0.0019666666630655527,0.0019333333475515246,0.0019000000320374966,0.0018666667165234685,0.0018666667165234685,0.0018333332845941186,0.0018333332845941186,0.0018333332845941186,0.0017999999690800905,0.0017333333380520344,0.0016333333915099502,0.0015999999595806003,0.0015666666440665722,0.0015666666440665722,0.0015666666440665722,0.0015333333285525441,0.001500000013038516,0.001466666697524488,0.00143333338201046,0.00143333338201046,0.00143333338201046,0.00143333338201046,0.00139999995008111,0.001366666634567082,0.001366666634567082,0.0013333333190530539,0.0013333333190530539,0.0013333333190530539,0.0013333333190530539,0.0012666666880249977,0.0012666666880249977,0.0012333333725109696,0.0012333333725109696,0.0012333333725109696,0.0012333333725109696,0.0012000000569969416,0.0012000000569969416,0.0011666666250675917,0.0011666666250675917,0.0011666666250675917,0.0011666666250675917,0.0011666666250675917,0.0011333333095535636,0.0011333333095535636,0.0010999999940395355,0.0010999999940395355,0.0010999999940395355,0.0010999999940395355,0.0010666666785255075,0.0010666666785255075,0.0010666666785255075,0.0010666666785255075,0.0010666666785255075,0.0010666666785255075,0.0010333333630114794,0.0010333333630114794,0.0010333333630114794,0.0010333333630114794,0.0010333333630114794,0.0010333333630114794,0.0010000000474974513,0.0010000000474974513,0.0010000000474974513,0.0010000000474974513,0.0010000000474974513,0.0009666666737757623,0.0009666666737757623,0.0009666666737757623,0.0009666666737757623,0.0009666666737757623,0.0009666666737757623,0.0009333333582617342,0.0009333333582617342,0.0009333333582617342,0.0009333333582617342,0.0009333333582617342,0.0008999999845400453,0.0008999999845400453,0.0008999999845400453,0.0008999999845400453,0.0008999999845400453,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172,0.0008666666690260172} 
{1,89,222,365,497,628,786,886,987,1108,1200,1320,1459,1584,1677,1812,1953,2080,2183,2306,2436,2581,2690,2798,2871,3018,3138,3294,3391,3525,3678,3783,3917,3992,4097,4253,4362,4442,4564,4693,4788,4897,5045,5157,5285,5414,5520,5630,5722,5843,5941,6041,6217,6444,6683,6892,7117,7330,7544,7730,7906,8076,8273,8471,8645,8789,8931,9063,9227,9378,9519,9610,9657,10667,10998,11483,11760,11960,12181,12262,12336,12440,12519,12629,13608,13782,13974,14116,14278,15670,16742,17892,18814,20657,23107,26119,31244,39466,59333,68728,83799} -0.03462254mobility_insights location_statistics_y2019m03w activity_type_combination_id false 0 4 145 {6,1,8,10,59,28,5,2,67,14,4,11,12,3,9,133,23,90,25,45,92,32,213,37,50,182,71,89,29,33,46,195,61,84,43,17,20,106,18,160,95,137,15,125,203,214,206,218,107,105,143,85,211,27,38,221,126,79,135,217,175,128,42,108,120,159,208,76,130} {0.15360000729560852,0.14463333785533905,0.11789999902248383,0.06403333693742752,0.056533332914114,0.04636666551232338,0.035466667264699936,0.033533334732055664,0.02669999934732914,0.026133334264159203,0.023900000378489494,0.0203000009059906,0.019866665825247765,0.019233332946896553,0.01876666583120823,0.011966666206717491,0.01126666646450758,0.010066666640341282,0.009533333592116833,0.009499999694526196,0.00860000029206276,0.008366666734218597,0.0077666668221354485,0.00706666661426425,0.006899999920278788,0.006866666488349438,0.006599999964237213,0.006466666702181101,0.00566666666418314,0.004999999888241291,0.004533333238214254,0.004533333238214254,0.004333333112299442,0.0041333334520459175,0.004000000189989805,0.0033333334140479565,0.0031999999191612005,0.0031333332881331444,0.0025333333760499954,0.002166666556149721,0.0020666667260229588,0.0016333333915099502,0.0015333333285525441,0.00143333338201046,0.0013000000035390258,0.0013000000035390258,0.0012666666880249977,0.0012666666880249977,0.0011666666250675917,0.0010333333630114794,0.0010333333630114794,0.0009333333582617342,0.0009333333582617342,0.0007666666642762721,0.000733333348762244,0.000733333348762244,0.000699999975040555,0.0006666666595265269,0.0006666666595265269,0.0006666666595265269,0.0006333333440124989,0.0006000000284984708,0.0004666666791308671,0.0004666666791308671,0.0004666666791308671,0.0004333333345130086,0.0004333333345130086,0.000366666674381122,0.000366666674381122} {22,26,36,54,54,54,64,64,70,77,87,88,96,97,98,98,101,112,114,114,118,119,127,127,131,138,145,148,148,151,151,153,153,155,155,155,163,164,164,165,166,169,169,170,170,173,176,180,184,187,187,187,194,194,201,201,201,219,227,227,228,231,231,232,233,233,251,256,272,274,286,303,303,315,324,490} 0.027344994mobility_insights location_statistics_y2019m03w activity_chain_id false 0 4 75638 {5161,5206,5162,5184,5195,5323,5397,5815,6530,5216,7603,6545,5153,6332,6981,7432,5818,5415,5596,7121,7531,5359,5618,5967,6393,7884,14611,21593,355,5325,5986,6407,23475,5213,6039,6385,6621,6849,9910,10026,11114,15860,164,165,200,5165,5262,5890,6043,6231,6659,6950,7251,7284,8228,8456,8923,9212,9851,9886,12203,12983,14685,16472,21550,43,271,307,992,5220,5243,5481,5482,5509,5516,5532,5603,5621,5757,5917,6026,6063,6139,6146,6210,6214,6464,6499,6671,6728,6758,6889,7010,7173,7643,8032,8081,8290,9676,10875} 
{0.002133333357051015,0.0017999999690800905,0.00143333338201046,0.0011333333095535636,0.0010333333630114794,0.000699999975040555,0.000699999975040555,0.000699999975040555,0.0006666666595265269,0.0006333333440124989,0.0006333333440124989,0.0005000000237487257,0.0004666666791308671,0.0004666666791308671,0.0004666666791308671,0.0004666666791308671,0.0004333333345130086,0.000366666674381122,0.000366666674381122,0.000366666674381122,0.000366666674381122,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503} {16,3832,5935,6980,8254,9534,11187,13024,15280,17910,20278,23752,27191,30933,35166,39736,44912,84588,87937,91731,96462,98710,99978,101481,102822,104232,105743,107178,108599,109896,111309,112882,114244,115636,117258,118951,120523,122033,123500,124882,126475,127916,129472,131137,132751,134476,135966,137506,139103,140651,142235,143923,145489,147256,148803,150223,151772,153331,155019,156745,158504,160131,161734,163321,164954,166505,168223,169899,171482,173009,174615,176117,177796,179595,181180,182924,184591,186335,188152,189909,191799,193278,194998,196949,198845,200761,202607,204272,206366,208030,209664,211457,213181,214854,216416,218122,219912,221852,223592,225495,227061} -0.13226064mobility_insights location_statistics_y2019m03w visitors false 0 4 141556 {2.231728,2.515927,1.690992,2.716124,1.666667,4.006526,4.547657,2.685691,2.042206,2.0369,2.907664,3.202489,3.321924,5,2.21855,0.357143,1.781995,2.773392,2.430318,3.585561,0.251593,0.294118,0.333333,0.416667,0.47619,1.997838,2.901269,3.665649,0.083864,0.166667,0.228721,0.278577,0.284229,0.3125,0.375056,0.833333,2.434593,2.616505,2.744186,2.95092,3.26703,3.7,3.959243} 
{0.0008999999845400453,0.0007999999797903001,0.0006333333440124989,0.0005666666547767818,0.0005333333392627537,0.0005333333392627537,0.0005333333392627537,0.0005000000237487257,0.0004333333345130086,0.00039999998989515007,0.00039999998989515007,0.000366666674381122,0.000366666674381122,0.000366666674381122,0.00033333332976326346,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.00026666666963137686,0.00026666666963137686,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503,0.00019999999494757503} {2e-06,0.00196,0.003629,0.00529,0.00717,0.00941,0.011622,0.013755,0.016387,0.019173,0.022388,0.02522,0.028369,0.031243,0.03431,0.037177,0.04011,0.043427,0.046591,0.04976,0.052685,0.05561,0.058774333,0.061956,0.065245,0.068608,0.072032,0.0754775,0.078632,0.081756,0.084959,0.088382,0.091822,0.095209,0.098459,0.102495,0.106105,0.109757,0.113244,0.116785,0.120467,0.124337,0.128564,0.132854,0.136804,0.140986,0.145268,0.149572,0.153727,0.157896,0.162,0.166096,0.170477,0.174326,0.178639,0.182968,0.187422,0.191749,0.19638,0.200433,0.205387,0.209918,0.214573,0.218993,0.224327,0.229155,0.234454,0.239658,0.244123,0.249223,0.254667,0.260309,0.265922,0.271871,0.277339,0.283247,0.289332,0.296549,0.303343,0.309744,0.317473,0.325838,0.335268,0.344108,0.352898,0.363003,0.3743145,0.387081,0.401563,0.420192,0.440096,0.461973,0.490929,0.528797,0.574014,0.652174,0.7746,1.056453,1.79342,2.771285,14.935622} 0.010959746mobility_insights location_statistics_y2019m03w dwell_time false 0 4 45441 {84600,82800,3600,4500,5400,8100,22500,24300,19800,85499,6300,7200,20700,23400,28800,3722,9000,15300,21600,10800,9900,10802,17100,79200,85500,11700,13500,14400,18900,25200,12600,16200,18000,83700,900,3672,3785,3885,5395,5803,5882,7227,27000,27900,43200,80100} {0.002199999988079071,0.0010000000474974513,0.0008333333535119891,0.0007999999797903001,0.0006666666595265269,0.0006000000284984708,0.0006000000284984708,0.0006000000284984708,0.0005666666547767818,0.0005666666547767818,0.0004666666791308671,0.0004666666791308671,0.0004666666791308671,0.0004666666791308671,0.0004666666791308671,0.00039999998989515007,0.00039999998989515007,0.00039999998989515007,0.00039999998989515007,0.000366666674381122,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.00033333332976326346,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.0003000000142492354,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00026666666963137686,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356} 
{901,2191,3633,3768,3915,4052,4205,4339,4491,4656,4827,5001,5185,5397,5634,5858,6082,6301,6551,6807,7083,7382,7726,8047,8396,8763,9194,9619,9983,10422,10807,11222,11641,12068,12558,13041,13493,13974,14398,14902,15401,15892,16457,16919,17431,17930,18442,18975,19508,20077,20672,21238,21709,22227,22779,23430,24002,24556,25239,26011,26758,27547,28312,29178,29973,30780,31617,32484,33460,34584,35745,36979,38294,39664,41203,42960,44652,46476,48492,50223,52200,54421,56359,58815,61658,64739,67538,70443,73060,75490,77594,79466,80991,82197,83188,83999,84836,85406,85738,86091,86400} -0.066642396mobility_insights location_statistics_y2019m03w travel_time false 0 4 11756 {0,5,2700,900,3600,1800,3599,10,425,810,1680,2245} {0.5346666574478149,0.0006666666595265269,0.0005000000237487257,0.0004666666791308671,0.00039999998989515007,0.000366666674381122,0.00026666666963137686,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356,0.00023333333956543356} {2,139,279,423,551,648,752,852,937,1024,1112,1195,1286,1375,1451,1540,1631,1760,1861,1958,2058,2162,2264,2367,2470,2575,2683,2805,2912,3013,3146,3270,3373,3513,3604,3709,3824,3951,4067,4205,4328,4437,4532,4681,4841,5002,5147,5291,5452,5602,5763,5924,6060,6223,6390,6554,6719,6917,7109,7294,7490,7698,7904,8095,8299,8537,8724,8982,9242,9536,9775,10066,10363,10632,10933,11273,11643,12014,12368,12776,13176,13580,14021,14450,14922,15462,15934,16468,17097,17693,18538,19456,20254,21245,22403,23780,25470,27648,31072,36178,62080} 0.31811374mobility_insights location_statistics_y2019m03w n false 0 4 7 {1,2,3,4,5,6,7} {0.9218999743461609,0.04879999905824661,0.014600000344216824,0.0075333332642912865,0.0038666666951030493,0.0026000000070780516,0.000699999975040555} 0.85469824-------------------------------------------------------------------------------------------------------
Query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(visitors * n)
FROM location_statistics st
WHERE st.daterange = '[2019-03-04,2019-03-11)'::DATERANGE
  AND spatial_feature_id = 12675

QUERY PLAN
Aggregate  (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
  Buffers: shared hit=67334
  ->  Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st  (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
        Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
        Buffers: shared hit=67334
Planning Time: 0.082 ms
Execution Time: 143.095 ms
For completeness' sake:

EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(visitors * n)
FROM location_statistics_y2019m03w st
WHERE st.daterange = '[2019-03-04,2019-03-11)'::DATERANGE
  AND spatial_feature_id = 12675

QUERY PLAN
Aggregate  (cost=2.79..2.80 rows=1 width=8) (actual time=156.304..156.305 rows=1 loops=1)
  Buffers: shared hit=66602 read=732
  ->  Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st  (cost=0.56..2.78 rows=1 width=8) (actual time=0.194..111.464 rows=516277 loops=1)
        Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
        Buffers: shared hit=66602 read=732
Planning Time: 0.058 ms
Execution Time: 156.326 ms
As can be seen, the planner predicts one row to be returned, although it should be around 3% of the table (11% of the entries have the given ID, and they are distributed over 4 weeks = date ranges). Querying the partition table directly does not change this.
How can I solve this problem?
Thank you very much in advance.
Julian P. Wolf | Invenium Data Insights GmbH
julian.wolf@invenium.io | +43 664 88 199 013
Herrengasse 28 | 8010 Graz | www.invenium.io
On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
> daterange daterange NOT NULL,
> spatial_feature_id INTEGER,
> Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
> Buffers: shared hit=67334
> -> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
> Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
> Buffers: shared hit=67334
>
> As can be seen, the planner predicts one row to be returned, although it should be around 3% (11% of the entries are of the given ID, which are distributed over 4 weeks = date ranges) of the table. Using the partition table directly, does not change this fact.

Is there a correlation between daterange and spacial_feature_id ?

Are the estimates good if you query on *only* daterange? spacial_feature_id ?

Maybe what you need is:
https://www.postgresql.org/docs/devel/sql-createstatistics.html
CREATE STATISTICS stats (dependencies) ON daterange, spacial_feature_id FROM location_statistics;
ANALYZE location_statistics;

--
Justin
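If the dependencies-statistics route is tried, the resulting statistics object can be checked in the catalogs afterwards, e.g. with something like:

SELECT stxrelid::regclass, stxname, stxkind
FROM pg_statistic_ext;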
Hello Justin,
thank you very much for your fast response.
> Is there a correlation between daterange and spacial_feature_id ?
I am not entirely sure what you mean by that. Basically, no, they are not correlated - spatial features are places on a map, date ranges are time periods. But since they are both part of the primary key of this particular table, they are correlated insofar as together they uniquely identify a row.
> Are the estimates good if you query on *only* daterange? spacial_feature_id ?
Unfortunately no, they are not:
------------------------------------------------------------------------------------------------------------------------------------------------
EXPLAIN (ANALYZE , BUFFERS)
SELECT sum(visitors * n)
FROM location_statistics st
WHERE st.daterange = '[2019-03-04,2019-03-11)'::DATERANGE
QUERY PLAN
Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=1143.393..1143.393 rows=1 loops=1)
Buffers: shared hit=304958
-> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.024..931.645 rows=4296639 loops=1)
Index Cond: (daterange = '[2019-03-04,2019-03-11)'::daterange)
Buffers: shared hit=304958
Planning Time: 0.080 ms
Execution Time: 1143.421 ms
------------------------------------------------------------------------------------------------------------------------------------------------
EXPLAIN (ANALYZE , BUFFERS)
SELECT sum(visitors * n)
FROM location_statistics_y2019m03w st
WHERE st.daterange = '[2019-03-04,2019-03-11)'::DATERANGE
QUERY PLAN
Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=1126.819..1126.820 rows=1 loops=1)
Buffers: shared hit=304958
-> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.023..763.852 rows=4296639 loops=1)
Index Cond: (daterange = '[2019-03-04,2019-03-11)'::daterange)
Buffers: shared hit=304958
Planning Time: 0.046 ms
Execution Time: 1126.845 ms
------------------------------------------------------------------------------------------------------------------------------------------------
Filtering only on spatial_feature_id is not the same query, as the table contains 4 different date ranges. Furthermore, there is no index for this operation, so I can only run the query against one partition; otherwise it would take days.
EXPLAIN (ANALYZE , BUFFERS)
SELECT sum(visitors * n)
FROM location_statistics_y2019m03w st
WHERE spatial_feature_id = 12675
QUERY PLAN
Finalize Aggregate (cost=288490.25..288490.26 rows=1 width=8) (actual time=1131.593..1131.593 rows=1 loops=1)
Buffers: shared hit=40156 read=139887
-> Gather (cost=288490.03..288490.24 rows=2 width=8) (actual time=1131.499..1148.872 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 1
Buffers: shared hit=40156 read=139887
-> Partial Aggregate (cost=287490.03..287490.04 rows=1 width=8) (actual time=1118.578..1118.579 rows=1 loops=2)
Buffers: shared hit=40156 read=139887
-> Parallel Seq Scan on location_statistics_y2019m03w st (cost=0.00..280378.27 rows=948235 width=8) (actual time=3.544..1032.899 rows=1134146 loops=2)
Filter: (spatial_feature_id = 12675)
Rows Removed by Filter: 8498136
Buffers: shared hit=40156 read=139887
Planning Time: 0.218 ms
JIT:
Functions: 12
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.929 ms, Inlining 0.000 ms, Optimization 0.426 ms, Emission 6.300 ms, Total 7.655 ms
Execution Time: 1191.741 ms
The estimates seem to be good though.
Thanks in Advance
Julian
Julian P. Wolf | Invenium Data Insights GmbH
julian.wolf@invenium.io | +43 664 88 199 013
Herrengasse 28 | 8010 Graz | www.invenium.io
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Tuesday, July 21, 2020 7:27 PM
To: Julian Wolf <julian.wolf@invenium.io>
Cc: pgsql-performance Postgres Mailing List <pgsql-performance@lists.postgresql.org>
Subject: Re: Too few rows expected by Planner on partitioned tables
On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
> daterange daterange NOT NULL,
> spatial_feature_id INTEGER,
> Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
> Buffers: shared hit=67334
> -> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
> Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
> Buffers: shared hit=67334
>
> As can be seen, the planner predicts one row to be returned, although it should be around 3% (11% of the entries are of the given ID, which are distributed over 4 weeks = date ranges) of the table. Using the partition table directly, does not change this fact.
Is there a correlation between daterange and spacial_feature_id ?
Are the estimates good if you query on *only* daterange? spacial_feature_id ?
Maybe what you need is:
https://www.postgresql.org/docs/devel/sql-createstatistics.html
CREATE STATISTICS stats (dependencies) ON daterange, spacial_feature_id FROM location_statistics;
ANALYZE location_statistics;
--
Justin
On Wed, Jul 22, 2020 at 06:33:17AM +0000, Julian Wolf wrote:
> Hello Justin,
>
> thank you very much for your fast response.
>
> > Is there a correlation between daterange and spacial_feature_id ?
> I am not entirely sure, what you mean by that. Basically, no, they are not correlated - spatial features are places on a map, date ranges are time periods. But, as they are both part of a primary key in this particular table, they are correlated in some way as to be a part of uniquely identifying a row.
>
> > Are the estimates good if you query on *only* daterange? spacial_feature_id ?
> Unfortunately no, they are not:

I checked and found that range types don't have "normal" statistics, and in particular seem to use a poor ndistinct estimate..

    /* Estimate that non-null values are unique */
    stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);

You could try to cheat and hardcode a different ndistinct that's "less wrong" by doing something like this:

ALTER TABLE t ALTER a SET (N_DISTINCT=-0.001); ANALYZE t;

Maybe a better way is to create an index ON: lower(range),upper(range)
And then query: WHERE (lower(a),upper(a)) = (1,112);

Since you'd be storing the values separately in the index anyway, maybe this means that range types won't work well for you for primary, searchable columns.

But if you're stuck with the schema, another kludge, if you want to do something extra weird, is to remove statistics entirely by disabling autoanalyze on the table and then manually run ANALYZE(columns) where columns doesn't include the range column. You'd have to remove the stats:

begin;
DELETE FROM pg_statistic s USING pg_attribute a
WHERE s.staattnum=a.attnum AND s.starelid=a.attrelid
  AND starelid='t'::regclass AND a.attname='a';

--
Justin
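Adapted to the weekly partition from this thread, the lower()/upper() workaround could look roughly like this (a sketch only, reusing the example query from the first mail):

-- expression index, so ANALYZE collects ordinary scalar statistics on the range bounds
CREATE INDEX ON location_statistics_y2019m03w (lower(daterange), upper(daterange), spatial_feature_id);
ANALYZE location_statistics_y2019m03w;

-- the query then filters on the bounds instead of on the range value itself
SELECT sum(visitors * n)
FROM location_statistics_y2019m03w st
WHERE (lower(st.daterange), upper(st.daterange)) = (DATE '2019-03-04', DATE '2019-03-11')
  AND st.spatial_feature_id = 12675;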
On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Hello,
>
> A description of what you are trying to achieve and what results you expect:
> Our database is growing on a daily basis by about 2.5 million rows per table (2 at the moment). Because of that, we decided to partition the data, especially, as we are pre-aggregating the data for weeks, months, quarters and years. Every aggregation is stored in a separate partition:
> ...
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
...
> PRIMARY KEY ( daterange, spatial_feature_id, visitor_profile_id, activity_type_combination_id,
>               activity_chain_id),
...
> ) PARTITION BY LIST (daterange);
> schemaname relname n_live_tup
> mobility_insights location_statistics_y2019m03d 23569853
> mobility_insights location_statistics_y2019m03w 19264373
> mobility_insights location_statistics_y2019m03 18105295
> Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
> Buffers: shared hit=67334
> -> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
> Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
> Buffers: shared hit=67334

I guess this isn't actually the problem query, since it takes 143ms and not dozens of seconds. I don't know what is the problem query, but maybe it might help to create a new index on spatial_feature_id, which could be scanned rather than scanning the unique index.

Also, if daterange *and* spatial_feature_id are always *both* included, then this might work:

postgres=# CREATE STATISTICS t_stats (mcv) ON daterange,spatial_feature_id FROM t ; ANALYZE t;

--
Justin
Hi,
Thank you very much for your answers and sorry for the delayed response.
> I checked and found that range types don't have "normal" statistics, and in
> particular seem to use a poor ndistinct estimate..
> /* Estimate that non-null values are unique */
> stats->stadistinct = -1.0 * (1.0 - stats->stanullfrac);
I investigated this idea and played around with the n_distinct value, and you are absolutely right: the statistics do behave strangely with range types. Even creating extended statistics
(CREATE STATISTICS t_stats (mcv) ON daterange,spatial_feature_id FROM t ; ANALYZE t;)
doesn't change the fact.
I do get that range types were designed with GiST and range comparisons in mind, but since they are a really neat way to describe not only a date but also a granularity (i.e. "this data represents this exact week"), it would be really nice if these data types also worked well in primary keys and thus in b-tree indexes.
In my case, I replaced the daterange type with a BIGINT, which holds the exact same information at byte level. This value can be immutably converted back to a daterange and vice versa. This solved the problem for me.
Thank you very much for your time and help.
Best Regards
Julian P. Wolf | Invenium Data Insights GmbH
julian.wolf@invenium.io | +43 664 88 199 013
Herrengasse 28 | 8010 Graz | www.invenium.io
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Wednesday, July 22, 2020 4:40 PM
To: Julian Wolf <julian.wolf@invenium.io>
Cc: pgsql-performance Postgres Mailing List <pgsql-performance@lists.postgresql.org>
Subject: Re: Too few rows expected by Planner on partitioned tables
On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Hello,
>
> A description of what you are trying to achieve and what results you expect:
> Our database is growing on a daily basis by about 2.5million rows per table (2 at the moment). Because of that, we decided to partition the data, especially, as we are pre-aggregating the data for weeks, months, quarters and years. Every aggregation is stored in a separate partition:
>
...
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
...
> PRIMARY KEY ( daterange, spatial_feature_id, visitor_profile_id, activity_type_combination_id,
> activity_chain_id),
...
> ) PARTITION BY LIST (daterange);
> schemaname relname n_live_tup
> mobility_insights location_statistics_y2019m03d 23569853
> mobility_insights location_statistics_y2019m03w 19264373
> mobility_insights location_statistics_y2019m03 18105295
> Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
> Buffers: shared hit=67334
> -> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
> Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
> Buffers: shared hit=67334
I guess this isn't actually the problem query, since it takes 143ms and not
dozens of seconds. I don't know what is the problem query, but maybe it might
help to create an new index on spatial_feature_id, which could be scanned
rather than scanning the unique index.
Also, if daterange *and* spatial_feature_id are always *both* included, then
this might work:
postgres=# CREATE STATISTICS t_stats (mcv) ON daterange,spatial_feature_id FROM t ; ANALYZE t;
--
Justin
Hi Justin,
thank you very much for your help and sorry for the late answer.
After testing around with your suggestions, it actually was the daterange type that caused all the problems. Messing around with the statistics value improved performance drastically but did not solve the problem. We decided to replace the daterange type with a BIGINT and compute the "id" of a daterange as its BIGINT representation (2x 4 bytes). Thus, it can be transformed in both directions immutably.
CREATE OR REPLACE FUNCTION to_daterange_id(daterange DATERANGE)
RETURNS BIGINT
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
return (extract(EPOCH FROM lower(daterange))::BIGINT << 32) |
extract(EPOCH FROM upper(daterange))::BIGINT;
end;
$$;
--------------------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION to_daterange(daterange_id BIGINT)
RETURNS DATERANGE
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN daterange(to_timestamp(daterange_id >> 32)::DATE, to_timestamp(daterange_id & x'FFFFFFFF'::BIGINT)::DATE);
END;
$$;
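For illustration, a quick round-trip check of the two helpers (assuming both functions above are installed; note that the DATE casts in to_daterange depend on the session TimeZone, which is Etc/UTC in this setup):

SELECT to_daterange(to_daterange_id('[2019-03-04,2019-03-11)'::DATERANGE));
-- expected: [2019-03-04,2019-03-11)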
So there is no daterange object messing up the primary key index. Your other suggestions sadly didn't work: as the daterange was also the partition key of the table, this field was inevitably the first criterion in all queries and thus overruled every other index.
With that said and done, it would be nice if daterange objects could be used in unique indexes too. They are a great way to identify data that represents a week's, month's, etc. worth of data (similar to a two-column date representation).
Thank you very much again for your time and help
Julian
Julian P. Wolf | Invenium Data Insights GmbH
julian.wolf@invenium.io | +43 664 88 199 013
Herrengasse 28 | 8010 Graz | www.invenium.io
From: Justin Pryzby <pryzby@telsasoft.com>
Sent: Wednesday, July 22, 2020 4:40 PM
To: Julian Wolf <julian.wolf@invenium.io>
Cc: pgsql-performance Postgres Mailing List <pgsql-performance@lists.postgresql.org>
Subject: Re: Too few rows expected by Planner on partitioned tables
On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Hello,
>
> A description of what you are trying to achieve and what results you expect:
> Our database is growing on a daily basis by about 2.5million rows per table (2 at the moment). Because of that, we decided to partition the data, especially, as we are pre-aggregating the data for weeks, months, quarters and years. Every aggregation is stored in a separate partition:
>
...
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.
...
> PRIMARY KEY ( daterange, spatial_feature_id, visitor_profile_id, activity_type_combination_id,
> activity_chain_id),
...
> ) PARTITION BY LIST (daterange);
> schemaname relname n_live_tup
> mobility_insights location_statistics_y2019m03d 23569853
> mobility_insights location_statistics_y2019m03w 19264373
> mobility_insights location_statistics_y2019m03 18105295
> Aggregate (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
> Buffers: shared hit=67334
> -> Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
> Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
> Buffers: shared hit=67334
I guess this isn't actually the problem query, since it takes 143ms and not
dozens of seconds. I don't know what is the problem query, but maybe it might
help to create an new index on spatial_feature_id, which could be scanned
rather than scanning the unique index.
Also, if daterange *and* spatial_feature_id are always *both* included, then
this might work:
postgres=# CREATE STATISTICS t_stats (mcv) ON daterange,spatial_feature_id FROM t ; ANALYZE t;
--
Justin
On Wed, Aug 26, 2020, 1:37 AM Julian Wolf <julian.wolf@invenium.io> wrote:
Hi Justin,
thank you very much for your help and sorry for the late answer.
After testing around with your suggestions, it actually was the daterange type which caused all the problems. Messing around with the statistics value improved performance drastically but did not solve the problem. We decided to replace the daterange type with a BIGINT and calculate the "id" of the daterange by just using the BIGINT (2x 4 bytes) representation of the daterange. Thus, it can be transformed in both directions immutably.

CREATE OR REPLACE FUNCTION to_daterange_id(daterange DATERANGE)
RETURNS BIGINT
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
return (extract(EPOCH FROM lower(daterange))::BIGINT << 32) |
extract(EPOCH FROM upper(daterange))::BIGINT;
end;
$$;
--------------------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION to_daterange(daterange_id BIGINT)
RETURNS DATERANGE
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
RETURN daterange(to_timestamp(daterange_id >> 32)::DATE, to_timestamp(daterange_id & x'FFFFFFFF'::BIGINT)::DATE);
END;
$$;
You might want to consider changing that language declaration to SQL.
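For example, the first helper rewritten as a SQL function might look roughly like this (a sketch, not taken from the thread; the body is the same expression, referencing the argument via $1):

CREATE OR REPLACE FUNCTION to_daterange_id(daterange DATERANGE)
    RETURNS BIGINT
    IMMUTABLE
    LANGUAGE sql
AS
$$
-- same packing logic as the plpgsql version, written as a single SQL expression
SELECT (extract(EPOCH FROM lower($1))::BIGINT << 32) |
       extract(EPOCH FROM upper($1))::BIGINT;
$$;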