Thread: Abbreviated keys for text cost model fix
Over in the abbreviated keys for numeric thread, Tomas Vondra reports
a case where the ad-hoc cost model of abbreviated keys/sortsupport for
text is far too conservative, and aborts a case where abbreviated keys
can still greatly help.

Previously, I proposed additions to the cost model that dealt with all
this, by passing down a reltuples hint from the optimizer or a
relcache entry to tuplesort. Robert seemed to think that this was a
little bit much, and that it could be discussed at a later point. It's
clear that we need to do something about cases like the one reported
by Tomas, though.

Attached patch adds a decay to the threshold that (estimated)
abbreviated key cardinality must cross as a proportion of the
(estimated) cardinality of the original set of strings to be sorted,
while also quadrupling the initial required proportional cardinality
to 20% of full key cardinality (but for just the first 10,000 strings,
before this threshold is decayed aggressively). This seems
significantly less invasive than what I previously proposed, while
still being effective. I suggest that this be treated as a bugfix,
since Tomas reports a 5%-7% regression with his case. OTOH, with this
patch, I saw the same case have a ~300% improvement in performance.

The adjusted cost model strongly prefers to decide to abort early,
which seems desirable. In one way this makes it a little more
vulnerable to skewed sets, since early data matters more, but in
another more important way it's less vulnerable, since perhaps that
skew indicates a general skew in the data, a skew that makes the idea
of measuring the cardinality of abbreviated keys as a proxy for full
key cardinality less useful.

Thoughts?

--
Peter Geoghegan
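The shape of the heuristic described above is roughly the following. This is a
hedged sketch only: the function, struct, and field names are assumed for
illustration, and the 0.65 decay factor is inferred from the DEBUG_ABBREV_KEYS
output posted later in this thread (prop_card steps 0.200000 -> 0.130000 ->
0.084500 -> 0.054925); it is not the code from the attached patch.

    /*
     * Hedged sketch of the decayed-threshold abort logic described above.
     * Names are illustrative only; this is not the attached patch's code.
     */
    #include <stdbool.h>

    typedef struct TextSortCostModel
    {
        double      prop_card;      /* required abbreviated-key cardinality as
                                     * a proportion of full-key cardinality;
                                     * starts at 0.20 */
    } TextSortCostModel;

    /* Called at doubling intervals (160, 320, 640, ...) while copying tuples. */
    bool
    abbrev_should_abort(TextSortCostModel *cm, int memtupcount,
                        double abbrev_distinct, double key_distinct)
    {
        /*
         * Keep abbreviating while the estimated abbreviated-key cardinality
         * clears prop_card of the estimated full-key cardinality.
         */
        if (abbrev_distinct > key_distinct * cm->prop_card)
        {
            /*
             * Past the first 10,000 tuples, decay the requirement
             * aggressively, so that a later abort becomes increasingly
             * unlikely.
             */
            if (memtupcount > 10000)
                cm->prop_card *= 0.65;  /* assumed decay factor */
            return false;               /* keep using abbreviated keys */
        }

        return true;                    /* cardinality too low: abort */
    }

Read this way, the first check (at up to 10,000 tuples) demands 20% of full-key
cardinality, and each subsequent check lowers the bar by roughly a third, which
is why the decision to abort is effectively made early or not at all.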
Attachment
Hi,

On 22.2.2015 02:59, Peter Geoghegan wrote:
> Over in the abbreviated keys for numeric thread, Tomas Vondra reports
> a case where the ad-hoc cost model of abbreviated keys/sortsupport for
> text is far too conservative, and aborts a case where abbreviated keys
> can still greatly help.
>
> Previously, I proposed additions to the cost model that dealt with all
> this, by passing down a reltuples hint from the optimizer or a
> relcache entry to tuplesort. Robert seemed to think that this was a
> little bit much, and that it could be discussed at a later point. It's
> clear that we need to do something about cases like the one reported
> by Tomas, though.
>
> Attached patch adds a decay to the threshold that (estimated)
> abbreviated key cardinality must cross as a proportion of the
> (estimated) cardinality of the original set of strings to be sorted,
> while also quadrupling the initial required proportional cardinality
> to 20% of full key cardinality (but for just the first 10,000 strings,
> before this threshold is decayed aggressively). This seems
> significantly less invasive than what I previously proposed, while
> still being effective. I suggest that this be treated as a bugfix,
> since Tomas reports a 5%-7% regression with his case. OTOH, with this
> patch, I saw the same case have a ~300% improvement in performance.
>
> The adjusted cost model strongly prefers to decide to abort early,
> which seems desirable. In one way this makes it a little more
> vulnerable to skewed sets, since early data matters more, but in
> another more important way it's less vulnerable, since perhaps that
> skew indicates a general skew in the data, a skew that makes the idea
> of measuring the cardinality of abbreviated keys as a proxy for full
> key cardinality less useful.
>
> Thoughts?

I've done some more testing, and this helps a lot. I did the tests on
two different machines (one with a Xeon E5450, one with an i5-2500k), to
see how the patches (this and the datum/numeric ones) behave on
different CPUs. I also extended the tests a bit, to cover random data
(as before) and data already sorted in ASC/DESC order. And of course, I
did that with different dataset sizes.

The aggregated results are available as a spreadsheet here:

    Xeon E5450: http://bit.ly/1AyRN7M
    i5-2500k:   http://bit.ly/1z9dLdP

You probably only want to look at the 'summary' sheet, which shows
speedup vs. master. Detailed results along with benchmarking scripts
and logs attached.

In short, this fixes all the cases except for the ASC sorted data. I
haven't done any code review, but I think we want this.

I'll use data from the i5-2500k, but it applies to the Xeon too, except
that the Xeon results are more noisy and the speedups are not that
significant.

For the 'text' data type, and 'random' dataset, the results are these:

    scale      datum   cost-model
    -------------------------------
    100000      328%         323%
    1000000     392%         391%
    2000000      96%         565%
    3000000      97%         572%
    4000000      97%         571%
    5000000      98%         570%

The numbers are speedup vs. master, so 100% means exactly the same
speed, 200% means twice as fast.

So while with the 'datum' patch this actually caused a very nice speedup
for small datasets - about 3-4x speedup up to 1M rows - for larger
datasets we've seen a small regression (~3% slower). With the cost model
fix, we actually see a significant speedup (about 5.7x) for these cases.

I haven't verified whether this produces the same results, but if it
does this is very nice.

For the 'DESC' dataset (i.e. data sorted in reverse order), we get even
better numbers, with up to 6.5x speedup on large datasets.

But for the 'ASC' dataset (i.e. already sorted data), we get this:

    scale      datum   cost-model
    -------------------------------
    100000       85%          84%
    1000000      87%          87%
    2000000      76%          96%
    3000000      82%          90%
    4000000      91%          83%
    5000000      93%          81%

Ummm, not that great, I guess :-(

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> In short, this fixes all the cases except for the ASC sorted data. I
> haven't done any code review, but I think we want this.
>
> I'll use data from the i5-2500k, but it applies to the Xeon too, except
> that the Xeon results are more noisy and the speedups are not that
> significant.
>
> For the 'text' data type, and 'random' dataset, the results are these:
>
>     scale      datum   cost-model
>     -------------------------------
>     100000      328%         323%
>     1000000     392%         391%
>     2000000      96%         565%
>     3000000      97%         572%
>     4000000      97%         571%
>     5000000      98%         570%
>
> The numbers are speedup vs. master, so 100% means exactly the same
> speed, 200% means twice as fast.
>
> So while with the 'datum' patch this actually caused a very nice speedup
> for small datasets - about 3-4x speedup up to 1M rows - for larger
> datasets we've seen a small regression (~3% slower). With the cost model
> fix, we actually see a significant speedup (about 5.7x) for these cases.

Cool.

> I haven't verified whether this produces the same results, but if it
> does this is very nice.
>
> For the 'DESC' dataset (i.e. data sorted in reverse order), we get even
> better numbers, with up to 6.5x speedup on large datasets.
>
> But for the 'ASC' dataset (i.e. already sorted data), we get this:
>
>     scale      datum   cost-model
>     -------------------------------
>     100000       85%          84%
>     1000000      87%          87%
>     2000000      76%          96%
>     3000000      82%          90%
>     4000000      91%          83%
>     5000000      93%          81%
>
> Ummm, not that great, I guess :-(

You should try it with the data fully sorted like this, but with one
tiny difference: The very last tuple is out of order. How does that
look?

--
Peter Geoghegan
On Sun, Feb 22, 2015 at 1:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
> You should try it with the data fully sorted like this, but with one
> tiny difference: The very last tuple is out of order. How does that
> look?

Another thing that may be of particular interest to you as a Czech
person is how various locales perform. I believe that the collation
rules of Czech and Hungarian are particularly complex, with several
passes often required for strcoll() (I think maybe more than 4). You
should either measure that, or control for it.

I was prepared to attribute the differences in the two systems to
differences in compute bandwidth between the CPUs involved (which are
perhaps more noticeable now, since memory latency is less of a
bottleneck). However, the inconsistent use of collations could actually
matter here.

--
Peter Geoghegan
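One way to measure the raw strcoll() cost of a particular collation outside
the server is a standalone loop along these lines. This is a hedged sketch:
the locale name, the sample strings, and the iteration count are assumptions,
and the locale must actually be installed on the test machine.

    /*
     * Hedged sketch: micro-benchmark strcoll() under an explicit collation.
     * The locale name is an assumption and must be installed (see locale -a).
     */
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    int
    main(void)
    {
        /* In Czech, "ch" collates as its own letter, after "h". */
        const char *pairs[][2] = {
            {"chata", "cesta"},
            {"hrad", "chata"},
        };

        if (setlocale(LC_COLLATE, "cs_CZ.UTF-8") == NULL)
        {
            fprintf(stderr, "locale cs_CZ.UTF-8 not installed\n");
            return 1;
        }

        clock_t     start = clock();
        long        sink = 0;

        for (int i = 0; i < 1000000; i++)
            for (size_t j = 0; j < sizeof(pairs) / sizeof(pairs[0]); j++)
                sink += strcoll(pairs[j][0], pairs[j][1]) > 0;

        printf("sink=%ld, elapsed %.3f s\n", sink,
               (double) (clock() - start) / CLOCKS_PER_SEC);
        return 0;
    }

Swapping the locale for en_US.UTF-8 (or running the same ORDER BY under both
collations in the server) would show whether collation cost explains any of
the cross-machine variation.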
On 23.2.2015 00:16, Peter Geoghegan wrote:
> On Sun, Feb 22, 2015 at 1:30 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> You should try it with the data fully sorted like this, but with one
>> tiny difference: The very last tuple is out of order. How does that
>> look?

I'm running that test now, I'll post the results tomorrow.

> Another thing that may be of particular interest to you as a Czech
> person is how various locales perform. I believe that the collation
> rules of Czech and Hungarian are particularly complex, with several
> passes often required for strcoll() (I think maybe more than 4). You
> should either measure that, or control for it. I was prepared to
> attribute the differences in the two systems to differences in compute
> bandwidth between the CPUs involved (which are perhaps more noticeable
> now, since memory latency is less of a bottleneck). However, the
> inconsistent use of collations could actually matter here.

That's a good point, but all the tests were executed with en_US.UTF-8.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,

On 22.2.2015 22:30, Peter Geoghegan wrote:
> On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> In short, this fixes all the cases except for the ASC sorted data. I
>> haven't done any code review, but I think we want this.
>>
>> I'll use data from the i5-2500k, but it applies to the Xeon too, except
>> that the Xeon results are more noisy and the speedups are not that
>> significant.
>>
>> For the 'text' data type, and 'random' dataset, the results are these:
>>
>>     scale      datum   cost-model
>>     -------------------------------
>>     100000      328%         323%
>>     1000000     392%         391%
>>     2000000      96%         565%
>>     3000000      97%         572%
>>     4000000      97%         571%
>>     5000000      98%         570%
>>
>> The numbers are speedup vs. master, so 100% means exactly the same
>> speed, 200% means twice as fast.
>>
>> So while with the 'datum' patch this actually caused a very nice speedup
>> for small datasets - about 3-4x speedup up to 1M rows - for larger
>> datasets we've seen a small regression (~3% slower). With the cost model
>> fix, we actually see a significant speedup (about 5.7x) for these cases.
>
> Cool.
>
>> I haven't verified whether this produces the same results, but if it
>> does this is very nice.
>>
>> For the 'DESC' dataset (i.e. data sorted in reverse order), we get even
>> better numbers, with up to 6.5x speedup on large datasets.
>>
>> But for the 'ASC' dataset (i.e. already sorted data), we get this:
>>
>>     scale      datum   cost-model
>>     -------------------------------
>>     100000       85%          84%
>>     1000000      87%          87%
>>     2000000      76%          96%
>>     3000000      82%          90%
>>     4000000      91%          83%
>>     5000000      93%          81%
>>
>> Ummm, not that great, I guess :-(
>
> You should try it with the data fully sorted like this, but with one
> tiny difference: The very last tuple is out of order. How does that
> look?

So here are the results for the ASC-ordered dataset, with one 'unsorted'
row added to the end of the dataset. As before, the complete scripts are
attached, and the raw results are available in a spreadsheet:

    http://bit.ly/18g1nTU

The durations are much higher than without the single unsorted row added
at the end. Queries often take 20x longer to finish (on the same code),
depending on the scale. The speedup results (compared to master) look
like this:

    scale      query#   datum   numeric   cost model
    -------------------------------------------------
    100000     1         859%     861%      856%
    100000     2         811%     814%      805%
    100000     3         100%     100%       97%
    1000000    1         805%     804%      807%
    1000000    2         769%     773%      770%
    1000000    3         100%     100%       98%
    2000000    1          97%      97%      673%
    2000000    2          96%      97%      646%
    2000000    3          99%     101%      678%
    3000000    1          98%      98%      578%
    3000000    2          96%      97%      557%
    3000000    3          99%     101%      579%
    4000000    1          99%      99%      513%
    4000000    2          97%      98%      497%
    4000000    3          99%     101%      510%
    5000000    1          99%      99%      469%
    5000000    2          97%      98%      456%
    5000000    3          99%     101%      466%

What's interesting here is that some queries are much faster, but query
#3 is slow until we hit 2M rows:

    select * from (select * from stuff_int_desc order by randint
                   offset 100000000000) foo

Looking at the previous tests, I see this is exactly what's happening to
this query with the 'random' dataset - it's slightly slower than master
up until 2M rows, when it suddenly jumps to the same speedup as the
other queries. Can we do something about that?

Anyway, I'm wondering what conclusions we can draw from this. I believe
the vast majority of datasets in production won't be perfectly sorted,
because when the table is CLUSTERed by an index we tend to use an index
scan to do the sort (so no problem), or the data are not actually
perfectly sorted (and here we get a significant speedup).

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Mon, Feb 23, 2015 at 8:40 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> The durations are much higher than without the single unsorted row added
> at the end. Queries often take 20x longer to finish (on the same code),
> depending on the scale.

I knew this would happen. :-)

> What's interesting here is that some queries are much faster, but query
> #3 is slow until we hit 2M rows:
>
>     select * from (select * from stuff_int_desc order by randint
>                    offset 100000000000) foo

Not sure why that would be. Can you see what a "#define
DEBUG_ABBREV_KEYS" (within varlena.c) build shows happens with the cost
model? Enable debug1 output for that.

> Looking at the previous tests, I see this is exactly what's happening to
> this query with the 'random' dataset - it's slightly slower than master
> up until 2M rows, when it suddenly jumps to the same speedup as the
> other queries. Can we do something about that?

Possibly.

> Anyway, I'm wondering what conclusions we can draw from this. I believe
> the vast majority of datasets in production won't be perfectly sorted,
> because when the table is CLUSTERed by an index we tend to use an index
> scan to do the sort (so no problem), or the data are not actually
> perfectly sorted (and here we get a significant speedup).

One conclusion might be that commit a3f0b3d6 should not have added the
pre-check for perfectly sorted input, which is not in the Bentley &
McIlroy original - exploiting the fact that the set is already partially
sorted is a fine thing, but we're not doing the necessary bookkeeping.

But that's not really why I suggested that test case. Rather, I wanted
to put the possible regression in the event of perfectly sorted input in
perspective. Maybe that regression isn't worth worrying about, if in the
event of a single tuple being out of order at the end of the set the
outcome dramatically shifts to strongly favor abbreviation.

Another way of looking at it would be that the pre-sorted case is an
argument against strxfrm() sorting in general more than abbreviated keys
in particular, an argument that seems new and strange to me. The
analysis of algorithms focuses on the worst case and average case for a
reason, and (say) the glibc documentation never considers this when
discussing strxfrm().

--
Peter Geoghegan
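For context, the pre-sorted pre-check being discussed is conceptually just a
guarded linear scan in front of the real sort. The following is a hedged
sketch of that general idea under assumed names; it is not the actual code
added by commit a3f0b3d6.

    /*
     * Hedged sketch of a "skip the sort if the input is already sorted"
     * pre-check.  Illustration only; not the code added by commit a3f0b3d6.
     */
    #include <stdbool.h>
    #include <stdlib.h>

    typedef int (*cmp_fn) (const void *a, const void *b);

    static bool
    already_sorted(const char *base, size_t n, size_t elem_size, cmp_fn cmp)
    {
        /* One extra comparison per element, wasted unless the check passes. */
        for (size_t i = 1; i < n; i++)
            if (cmp(base + i * elem_size, base + (i - 1) * elem_size) < 0)
                return false;   /* first out-of-order element: give up */
        return true;
    }

    void
    sort_with_precheck(void *base, size_t n, size_t elem_size, cmp_fn cmp)
    {
        /* Perfectly sorted input is detected in O(n) and returned as-is. */
        if (n < 2 || already_sorted(base, n, elem_size, cmp))
            return;

        /*
         * A single out-of-order tuple, e.g. at the very end, defeats the
         * pre-check, and the full sort runs anyway; that is why the test
         * case above shifts the numbers so dramatically.
         */
        qsort(base, n, elem_size, cmp);
    }

This also illustrates the "no bookkeeping" point: the pre-check either wins
completely or learns nothing that the subsequent sort can reuse.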
On 23.2.2015 19:22, Peter Geoghegan wrote:
> On Mon, Feb 23, 2015 at 8:40 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The durations are much higher than without the single unsorted row added
>> at the end. Queries often take 20x longer to finish (on the same code),
>> depending on the scale.
>
> I knew this would happen. :-)
>
>> What's interesting here is that some queries are much faster, but query
>> #3 is slow until we hit 2M rows:
>>
>>     select * from (select * from stuff_int_desc order by randint
>>                    offset 100000000000) foo
>
> Not sure why that would be. Can you see what a "#define
> DEBUG_ABBREV_KEYS" (within varlena.c) build shows happens with the
> cost model? Enable debug1 output for that.

test=# select * from (select * from stuff_text_asc order by randtxt offset 100000000000) foo;
DEBUG:  abbrev_distinct after 160: 152.862464 (key_distinct: 155.187096, norm_abbrev_card: 0.955390, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 320: 314.784336 (key_distinct: 313.425344, norm_abbrev_card: 0.983701, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 640: 629.050490 (key_distinct: 643.945298, norm_abbrev_card: 0.982891, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 1280: 1194.271440 (key_distinct: 1257.153875, norm_abbrev_card: 0.933025, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 2560: 2402.820431 (key_distinct: 2631.772790, norm_abbrev_card: 0.938602, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 5120: 4490.142750 (key_distinct: 5261.865309, norm_abbrev_card: 0.876981, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 10240: 8000.884171 (key_distinct: 10123.941172, norm_abbrev_card: 0.781336, prop_card: 0.200000)
DEBUG:  abbrev_distinct after 20480: 13596.546899 (key_distinct: 20563.364696, norm_abbrev_card: 0.663894, prop_card: 0.130000)
DEBUG:  abbrev_distinct after 40960: 23369.899021 (key_distinct: 40500.600680, norm_abbrev_card: 0.570554, prop_card: 0.084500)
DEBUG:  abbrev_distinct after 81920: 30884.544030 (key_distinct: 81466.848436, norm_abbrev_card: 0.377009, prop_card: 0.054925)
 randtxt
---------
(0 rows)

>> Looking at the previous tests, I see this is exactly what's happening to
>> this query with the 'random' dataset - it's slightly slower than master
>> up until 2M rows, when it suddenly jumps to the same speedup as the
>> other queries. Can we do something about that?
>
> Possibly.
>
>> Anyway, I'm wondering what conclusions we can draw from this. I believe
>> the vast majority of datasets in production won't be perfectly sorted,
>> because when the table is CLUSTERed by an index we tend to use an index
>> scan to do the sort (so no problem), or the data are not actually
>> perfectly sorted (and here we get a significant speedup).
>
> One conclusion might be that commit a3f0b3d6 should not have added the
> pre-check for perfectly sorted input, which is not in the Bentley &
> McIlroy original - exploiting the fact that the set is already partially
> sorted is a fine thing, but we're not doing the necessary bookkeeping.

Would that pre-check hurt even the random test case? I can imagine it
might behave strangely with almost perfectly sorted data (when only the
very last row is unsorted), but not for random data (but see below).

> But that's not really why I suggested that test case. Rather, I wanted
> to put the possible regression in the event of perfectly sorted input in
> perspective. Maybe that regression isn't worth worrying about, if in the
> event of a single tuple being out of order at the end of the set the
> outcome dramatically shifts to strongly favor abbreviation.
>
> Another way of looking at it would be that the pre-sorted case is an
> argument against strxfrm() sorting in general more than abbreviated keys
> in particular, an argument that seems new and strange to me. The
> analysis of algorithms focuses on the worst case and average case for a
> reason, and (say) the glibc documentation never considers this when
> discussing strxfrm().

Yeah, every algorithm has a worst case - either you get very fast
execution on average with a few rare cases where it's much slower, or
you get very bad average performance.

But after looking at the data a bit more, I actually think this is a red
herring. After looking at the actual runtimes (and not just speedups), I
get this:

    scale      query#   master   cost-model   speedup
    --------------------------------------------------
    100000     1           884        103       856%
    1000000    1         12153       1506       807%
    2000000    1         26187       3894       673%
    3000000    1         38147       6601       578%
    4000000    1         56332      10976       513%
    5000000    1         77009      16409       469%
    100000     2           883        110       805%
    1000000    2         12114       1574       770%
    2000000    2         26104       4038       646%
    3000000    2         38028       6828       557%
    4000000    2         56138      11294       497%
    5000000    2         76720      16829       456%
    100000     3           102        106        97%
    1000000    3          1507       1537        98%
    2000000    3         26984       3977       678%
    3000000    3         39090       6753       579%
    4000000    3         57239      11228       510%
    5000000    3         78150      16769       466%

So while it's true that for the 3rd query we get much worse results
compared to the other queries (i.e. we don't get a >400% speedup but a
~3% slowdown compared to master), it's also true that master performs
exceptionally well for this query with small datasets. Once we get to 2M
rows, the master performance drops significantly, but cost-model keeps
the performance characteristics and the speedup jumps back to ~700%,
which is nice.

These numbers are for the 'ASC + unsorted row' test, but I get exactly
the same pattern for the 'random' tests done previously.

It would be nice if we could address the 3% regression for the last
query, but I guess it's not a big deal. The numbers in general are
absolutely impressive. Kudos.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 23, 2015 at 11:17 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> So while it's true that for the 3rd query we get much worse results
> compared to the other queries (i.e. we don't get a >400% speedup but a
> ~3% slowdown compared to master), it's also true that master performs
> exceptionally well for this query with small datasets. Once we get to 2M
> rows, the master performance drops significantly, but cost-model keeps
> the performance characteristics and the speedup jumps back to ~700%,
> which is nice.
>
> These numbers are for the 'ASC + unsorted row' test, but I get exactly
> the same pattern for the 'random' tests done previously.

Yeah. Looks like you're comparing a case where the old cost model did
the right thing anyway (i.e. used abbreviation). The difference would
then be entirely explainable as noise. Right?

> It would be nice if we could address the 3% regression for the last
> query, but I guess it's not a big deal. The numbers in general are
> absolutely impressive. Kudos.

Thanks.

--
Peter Geoghegan
On 23.2.2015 21:52, Peter Geoghegan wrote:
> On Mon, Feb 23, 2015 at 11:17 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> So while it's true that for the 3rd query we get much worse results
>> compared to the other queries (i.e. we don't get a >400% speedup but a
>> ~3% slowdown compared to master), it's also true that master performs
>> exceptionally well for this query with small datasets. Once we get to 2M
>> rows, the master performance drops significantly, but cost-model keeps
>> the performance characteristics and the speedup jumps back to ~700%,
>> which is nice.
>>
>> These numbers are for the 'ASC + unsorted row' test, but I get exactly
>> the same pattern for the 'random' tests done previously.
>
> Yeah. Looks like you're comparing a case where the old cost model did
> the right thing anyway (i.e. used abbreviation). The difference would
> then be entirely explainable as noise. Right?

I don't think it's just noise - on the i5-2500 CPU, the results are very
consistent. For 1M random rows with this query (#3):

    select * from (select * from stuff_text order by randtxt
                   offset 100000000000) foo

I get this (over 20 runs of the query):

                    min       max     median      mean   stddev
    ------------------------------------------------------------
    master        932.449   932.902   932.637   932.639   0.116
    cost-model    957.16    958.47    957.497   957.586   0.411

That is super-consistent, IMHO, and the ~2.6% slowdown is clearly
visible. But it might be due to the padding described by Andrew.

Similarly on the Xeon E5450 CPU, although the results there are much
more noisy.

>> It would be nice if we could address the 3% regression for the last
>> query, but I guess it's not a big deal. The numbers in general are
>> absolutely impressive. Kudos.
>
> Thanks.

Are you going to add this into the CF? Would be nice to get this into
9.5. Strictly speaking it should go to 2015-06 I guess, but I'd consider
it part of one of the existing sortsupport patches.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 23, 2015 at 4:41 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Are you going to add this into the CF? Would be nice to get this into
> 9.5. Strictly speaking it should go to 2015-06 I guess, but I'd consider
> it part of one of the existing sortsupport patches.

It's a bugfix, IMV. I guess I should add it to the current CF, but I
think I might be forbidden from doing so as a non-CF-manager...

--
Peter Geoghegan
On 24.2.2015 01:44, Peter Geoghegan wrote:
> On Mon, Feb 23, 2015 at 4:41 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Are you going to add this into the CF? Would be nice to get this into
>> 9.5. Strictly speaking it should go to 2015-06 I guess, but I'd consider
>> it part of one of the existing sortsupport patches.
>
> It's a bugfix, IMV. I guess I should add it to the current CF, but I
> think I might be forbidden from doing so as a non-CF-manager...

Yes, I consider that a bugfix too, fixing a performance regression.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 23/02/15 16:40, Tomas Vondra wrote:
> On 22.2.2015 22:30, Peter Geoghegan wrote:
>> You should try it with the data fully sorted like this, but with one
>> tiny difference: The very last tuple is out of order. How does that
>> look?

If this case is actually important, a merge-sort can take significant
advantage of the partial order:

test=# explain analyze select * from (select * from stuff_text_asc order by randtxt offset 100000000000) foo;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=247054.81..247054.81 rows=1 width=18) (actual time=25133.029..25133.029 rows=0 loops=1)
   ->  Sort  (cost=242054.81..247054.81 rows=2000001 width=18) (actual time=25025.931..25088.406 rows=2000001 loops=1)
         Sort Key: stuff_text_asc.randtxt
         Sort Method: quicksort  Memory: 221213kB  Compares: 95541376
         ->  Seq Scan on stuff_text_asc  (cost=0.00..32739.01 rows=2000001 width=18) (actual time=0.011..118.390 rows=2000001 loops=1)
 Planning time: 0.080 ms
 Execution time: 25144.538 ms
(7 rows)

Time: 25145.185 ms

test=# set enable_intmerge_sort to on;
SET
Time: 0.378 ms
test=# explain analyze select * from (select * from stuff_text_asc order by randtxt offset 100000000000) foo;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=247054.81..247054.81 rows=1 width=18) (actual time=1051.603..1051.603 rows=0 loops=1)
   ->  Sort  (cost=242054.81..247054.81 rows=2000001 width=18) (actual time=943.304..1006.988 rows=2000001 loops=1)
         Sort Key: stuff_text_asc.randtxt
         Sort Method: internal merge  Memory: 221213kB  Compares: 2000002
         ->  Seq Scan on stuff_text_asc  (cost=0.00..32739.01 rows=2000001 width=18) (actual time=0.009..98.474 rows=2000001 loops=1)
 Planning time: 0.072 ms
 Execution time: 1063.434 ms
(7 rows)

Time: 1064.113 ms
test=# set enable_intmerge_sort to off;
SET
Time: 0.353 ms

test=# explain analyze select count(distinct randtxt) from stuff_text_asc;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37739.01..37739.02 rows=1 width=18) (actual time=25196.814..25196.815 rows=1 loops=1)
   ->  Seq Scan on stuff_text_asc  (cost=0.00..32739.01 rows=2000001 width=18) (actual time=0.010..114.995 rows=2000001 loops=1)
 Planning time: 0.053 ms
 Execution time: 25196.857 ms
(4 rows)

Time: 25197.371 ms

test=# explain analyze select count(*) from (select distinct randtxt from stuff_text_asc) as foo;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=277054.83..277054.84 rows=1 width=0) (actual time=25521.258..25521.258 rows=1 loops=1)
   ->  Unique  (cost=242054.81..252054.81 rows=2000001 width=18) (actual time=25101.157..25438.622 rows=1999100 loops=1)
         ->  Sort  (cost=242054.81..247054.81 rows=2000001 width=18) (actual time=25101.156..25184.436 rows=2000001 loops=1)
               Sort Key: stuff_text_asc.randtxt
               Sort Method: quicksort  Memory: 221213kB  Compares: 95541376
               ->  Seq Scan on stuff_text_asc  (cost=0.00..32739.01 rows=2000001 width=18) (actual time=0.011..116.509 rows=2000001 loops=1)
 Planning time: 0.088 ms
 Execution time: 25532.947 ms
(8 rows)

Time: 25533.642 ms

test=# set enable_intmerge_sort to on;
SET
Time: 0.401 ms
test=# explain analyze select count(*) from (select distinct randtxt from stuff_text_asc) as foo;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=272054.82..272054.83 rows=1 width=0) (actual time=1184.289..1184.289 rows=1 loops=1)
   ->  Sort  (cost=242054.81..247054.81 rows=2000001 width=18) (actual time=1037.019..1100.720 rows=1999100 loops=1)
         Sort Key: stuff_text_asc.randtxt
         Sort Method: dedup internal merge  Memory: 221143kB  Compares: 2000001
         ->  Seq Scan on stuff_text_asc  (cost=0.00..32739.01 rows=2000001 width=18) (actual time=0.010..106.729 rows=2000001 loops=1)
 Planning time: 0.086 ms
 Execution time: 1195.891 ms
(7 rows)

Time: 1196.514 ms

--
Cheers,
  Jeremy
On Tue, Feb 24, 2015 at 4:32 PM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 23/02/15 16:40, Tomas Vondra wrote:
>> On 22.2.2015 22:30, Peter Geoghegan wrote:
>>> You should try it with the data fully sorted like this, but with one
>>> tiny difference: The very last tuple is out of order. How does that
>>> look?
>
> If this case is actually important, a merge-sort can take significant
> advantage of the partial order:
>
> test=# set enable_intmerge_sort to on;

What is this?

--
Peter Geoghegan
On 25/02/15 00:42, Peter Geoghegan wrote:
> On Tue, Feb 24, 2015 at 4:32 PM, Jeremy Harris <jgh@wizmail.org> wrote:
>> On 23/02/15 16:40, Tomas Vondra wrote:
>>> On 22.2.2015 22:30, Peter Geoghegan wrote:
>>>> You should try it with the data fully sorted like this, but with one
>>>> tiny difference: The very last tuple is out of order. How does that
>>>> look?
>>
>> If this case is actually important, a merge-sort can take significant
>> advantage of the partial order:
>>
>> test=# set enable_intmerge_sort to on;
>
> What is this?

Associated with an implementation of an internal merge sort, a control
to disable it. Not in the mainline sourcebase.

--
Cheers,
  Jeremy
On 25/02/15 00:32, Jeremy Harris wrote:
> On 23/02/15 16:40, Tomas Vondra wrote:
>> On 22.2.2015 22:30, Peter Geoghegan wrote:
>>> You should try it with the data fully sorted like this, but with one
>>> tiny difference: The very last tuple is out of order. How does that
>>> look?
>
> If this case is actually important, a merge-sort can take significant
> advantage of the partial order:

Presumably it is not, as nobody commented on the alleged 20 or 30x
speedup.
<div dir="ltr"><br /><div class="gmail_extra"><br /><div class="gmail_quote">On Mon, Mar 2, 2015 at 8:04 PM, Jeremy Harris<span dir="ltr"><<a href="mailto:jgh@wizmail.org" target="_blank">jgh@wizmail.org</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On25/02/15 00:32, Jeremy Harris wrote:<br /> > On 23/02/15 16:40, Tomas Vondra wrote:<br /> >> On 22.2.201522:30, Peter Geoghegan wrote:<br /> >>> You should try it with the data fully sorted like this, but withone<br /> >>> tiny difference: The very last tuple is out of order. How does that<br /> >>> look?<br/> ><br /> > If this case is actually important, a merge-sort can take<br /> > significant advantage ofthe partial order:<br /><br /></span>Presumably it is not, as nobody commented<br /> on the alleged 20 or 30x speedup.<br/><div class="HOEnZb"><div class="h5"><br /><br /><br /><br /> --<br /> Sent via pgsql-hackers mailing list (<ahref="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> To make changes to your subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br /></div><divclass="gmail_extra">Does it always perform mergesort instead of quicksort when enabled?<br />Seems like the casefor a hybrid sort (like timsort). I know there was some talk to replace quicksort with timsort back in 2012 but it wasa deadend at the time.<br /></div></div>
On 03/03/15 03:08, Arthur Silva wrote:
> Does it always perform mergesort instead of quicksort when enabled?

Yes; there seemed no advantage to any additional complexity.

The merge consistently performs fewer comparisons than the quicksort,
on random input - and many fewer if there are any sorted (or
reverse-sorted) sections. However, it also consistently takes slightly
longer (a few percent). I was unable to chase this down but assume it
to be a caching effect. So I don't currently think it should replace
the current sort for all use.

If the planner can identify cases where there is some pre-sort in the
input (CLUSTER was mentioned?), or maybe a correlated index, it could
be worthwhile. Also useful might be very-expensive-comparison cases,
and distinct-output cases (uniquification is supported at each submerge
stage, so data disappears early).

There's potential for running submerges in parallel using multiple
cores, but I've not worked on that.

--
Cheers,
  Jeremy
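To make the comparison-count claim concrete, below is a hedged sketch of a
natural (run-detecting) merge sort on a plain int array. The names are
assumptions and this is not Jeremy's internal merge sort patch; it only
illustrates why already-sorted input costs a single O(n) detection pass, and
why one out-of-order trailing element merely adds one short extra run to merge.

    /* Hedged sketch of a "natural" bottom-up merge sort; illustration only. */
    #include <stdlib.h>
    #include <string.h>

    /* Merge a[lo..mid-1] and a[mid..hi-1] via buf, then copy back. */
    static void
    merge(int *a, int *buf, size_t lo, size_t mid, size_t hi)
    {
        size_t  i = lo, j = mid, k = lo;

        while (i < mid && j < hi)
            buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        while (i < mid)
            buf[k++] = a[i++];
        while (j < hi)
            buf[k++] = a[j++];
        memcpy(a + lo, buf + lo, (hi - lo) * sizeof(int));
    }

    /*
     * Natural merge sort: find maximal ascending runs, then merge adjacent
     * runs pairwise until one run remains.  Fully sorted input is detected
     * in one O(n) pass; a single out-of-order trailing element just adds one
     * run of length 1.
     */
    void
    natural_merge_sort(int *a, size_t n)
    {
        int        *buf = malloc(n * sizeof(int));
        size_t     *runs = malloc((n + 1) * sizeof(size_t));
        size_t      nruns = 0;

        if (buf == NULL || runs == NULL || n < 2)
            goto done;

        /* Pass 1: record run boundaries (one comparison per element). */
        runs[nruns++] = 0;
        for (size_t i = 1; i < n; i++)
            if (a[i] < a[i - 1])
                runs[nruns++] = i;
        runs[nruns] = n;

        /* Pass 2+: merge adjacent runs until a single run is left. */
        while (nruns > 1)
        {
            size_t  out = 0;

            for (size_t r = 0; r + 1 < nruns; r += 2)
            {
                merge(a, buf, runs[r], runs[r + 1], runs[r + 2]);
                runs[out++] = runs[r];
            }
            if (nruns % 2 == 1)             /* odd run carried over unchanged */
                runs[out++] = runs[nruns - 1];
            runs[out] = n;
            nruns = out;
        }

    done:
        free(buf);
        free(runs);
    }

That behaviour is consistent with the plans shown earlier in the thread:
roughly 2 million comparisons for the 'internal merge' sorts of 2,000,001
nearly-sorted rows, versus about 95 million for the quicksort on the same
input.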
On Tue, Mar 3, 2015 at 2:00 AM, Jeremy Harris <jgh@wizmail.org> wrote:
> Yes; there seemed no advantage to any additional complexity.
>
> The merge consistently performs fewer comparisons than the quicksort,
> on random input - and many fewer if there are any sorted (or
> reverse-sorted) sections. However, it also consistently takes slightly
> longer (a few percent). I was unable to chase this down but assume it
> to be a caching effect. So I don't currently think it should replace
> the current sort for all use.

It's definitely caching effects. That's a very important consideration
here.

--
Peter Geoghegan
On Mon, Feb 23, 2015 at 7:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Feb 23, 2015 at 4:41 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Are you going to add this into the CF? Would be nice to get this into
>> 9.5. Strictly speaking it should go to 2015-06 I guess, but I'd consider
>> it part of one of the existing sortsupport patches.
>
> It's a bugfix, IMV. I guess I should add it to the current CF, but I
> think I might be forbidden from doing so as a non-CF-manager...

Committed. For future reference, I'd prefer to have things like this
added to the next CF rather than no CF at all.

I'm not sure if you've got all the details right here, but the concept
makes sense to me: if we're going to give up, we should do it early.
Doing it later throws away more work and is therefore riskier.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Apr 3, 2015 at 1:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Committed. For future reference, I'd prefer to have things like this
> added to the next CF rather than no CF at all.

I'll bear that in mind. Thanks.

--
Peter Geoghegan