Thread: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY
Here's another small patch, this time from Simon Riggs. Maybe he already posted it for this commitfest, but I didn't find it in a quick look so here it is.

This patch reduces the amount of bloat you get from running CREATE INDEX CONCURRENTLY by destroying the snapshot taken in the first phase, before entering the second phase. This allows the global xmin to advance, letting concurrent vacuum keep bloat in other tables in check.

Currently this implements the change for btree indexes only, but doing it for other indexes should be a one-liner.

--
Álvaro Herrera

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> This patch reduces the amount of bloat you get from running CREATE INDEX
> CONCURRENTLY by destroying the snapshot taken in the first phase, before
> entering the second phase. This allows the global xmin to advance,

Um ... isn't there a transaction boundary there anyway?

regards, tom lane
On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> This patch reduces the amount of bloat you get from running CREATE INDEX
>> CONCURRENTLY by destroying the snapshot taken in the first phase, before
>> entering the second phase. This allows the global xmin to advance,
>
> Um ... isn't there a transaction boundary there anyway?

Yes, the patch releases the snapshot early, so it does not hold it once the build scan has completed. This allows the sort and build phases to occur without holding back the xmin.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
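The xmin argument above can be made concrete with a small toy model. This is a sketch in Python, not PostgreSQL code: the names `Backend`, `global_xmin`, `removable`, and `demo`, and all the transaction IDs, are made up for illustration. It only assumes what the thread states, namely that a held snapshot keeps the global xmin from advancing and that vacuum can only clean up rows older than that horizon.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Backend:
    """Hypothetical stand-in for a server process and its advertised xmin."""
    name: str
    snapshot_xmin: Optional[int] = None  # None means no snapshot is held


def global_xmin(backends: List[Backend], next_xid: int) -> int:
    """Oldest xmin held by any backend, or next_xid if nobody holds a snapshot."""
    held = [b.snapshot_xmin for b in backends if b.snapshot_xmin is not None]
    return min(held, default=next_xid)


def removable(dead_xids: List[int], horizon: int) -> List[int]:
    """Rows deleted by transactions older than the horizon can be vacuumed."""
    return [xid for xid in dead_xids if xid < horizon]


def demo() -> Tuple[List[int], List[int]]:
    next_xid = 200
    # Rows in *other* tables, deleted while the index build is running.
    dead_xids = [120, 150, 180]

    cic = Backend("create-index-concurrently", snapshot_xmin=100)
    backends = [cic, Backend("idle-client")]

    # Build scan still running: the snapshot pins the horizon at 100.
    before = removable(dead_xids, global_xmin(backends, next_xid))

    # Scan finished, snapshot released before the sort/build phase.
    cic.snapshot_xmin = None
    after = removable(dead_xids, global_xmin(backends, next_xid))
    return before, after
```

In this model nothing is removable while the build holds its snapshot, and all three dead rows become removable as soon as it is released, even though the (long) sort and build phases have not started yet.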
Simon Riggs <simon@2ndquadrant.com> writes:
> On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um ... isn't there a transaction boundary there anyway?

> Yes, the patch releases the snapshot early, so it does not hold it
> once the build scan has completed. This allows the sort and build
> phases to occur without holding back the xmin.

Oh ... so Alvaro explained it badly. The reason this is specific to btree is that it's the only AM with any significant post-scan building time.

However, now that I read the patch: this is a horribly ugly hack. I really don't like the API (if it even deserves the dignity of that name) that you've added to snapmgr. I suppose the zero documentation for it fits in nicely with the fact that it's a badly-thought-out kluge.

I think it would be better to just move the responsibility for snapshot popping in this sequence to the index AMs, full stop.

regards, tom lane
On 28 February 2017 at 13:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Um ... isn't there a transaction boundary there anyway?
>
>> Yes, the patch releases the snapshot early, so it does not hold it
>> once the build scan has completed. This allows the sort and build
>> phases to occur without holding back the xmin.
>
> Oh ... so Alvaro explained it badly. The reason this is specific to
> btree is that it's the only AM with any significant post-scan building
> time.
>
> However, now that I read the patch: this is a horribly ugly hack.
> I really don't like the API (if it even deserves the dignity of that
> name) that you've added to snapmgr. I suppose the zero documentation
> for it fits in nicely with the fact that it's a badly-thought-out kluge.

WTF. Frankly, knowing it would generate such a ridiculously negative response was the reason it wasn't me that submitted it and why it's not fully documented. Documentation in this case would be a short paragraph in the index AM, explaining for the user what is already in code comments.

You're right to point out that there is significant post-scan build time and the reduction in bloat during that time is well worth the trouble. I'm pleased to have thought of it and to have contributed it to the community.

> I think it would be better to just move the responsibility for snapshot
> popping in this sequence to the index AMs, full stop.

There were two choices:
a) leave the responsibility to the index AM, giving a clean API, or
b) don't trust that all index AMs would know or implement this correctly.

If the index AM doesn't implement this correctly it becomes a crash bug, which seemed unacceptable in an extensible server. After implementing (a), I chose (b) and took extra time to implement the ugly API in preference to the possibility of a crash bug.

I am open to following consensus on that and to resubmit other patches as required.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
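The trade-off between choices (a) and (b) can be illustrated generically. The following Python sketch does not reproduce the patch or the snapmgr API, neither of which is shown in this thread. Every name in it (`SnapshotStack`, `build_am_responsible`, `build_core_guarded`, the two toy AM callbacks) is hypothetical, and the `RuntimeError` is only a stand-in for whatever failure a misbehaving index AM would cause in the real server.

```python
from typing import Callable, List


class SnapshotStack:
    """Toy stack of active snapshots (hypothetical, not snapmgr)."""

    def __init__(self) -> None:
        self._stack: List[str] = []

    def push(self, snap: str) -> None:
        self._stack.append(snap)

    def pop(self) -> str:
        if not self._stack:
            raise RuntimeError("no active snapshot to pop")
        return self._stack.pop()

    def depth(self) -> int:
        return len(self._stack)


AmBuild = Callable[[SnapshotStack], None]


def am_pops_after_scan(stack: SnapshotStack) -> None:
    """A well-behaved AM: scan, release the snapshot, then sort and build."""
    stack.pop()


def am_forgets_to_pop(stack: SnapshotStack) -> None:
    """An AM that was never updated: it leaves the snapshot in place."""


def build_am_responsible(stack: SnapshotStack, am_build: AmBuild) -> None:
    """Choice (a): core trusts the AM to have popped the snapshot."""
    base = stack.depth()
    stack.push("build-snapshot")
    am_build(stack)
    if stack.depth() != base:
        # Stand-in for the failure mode of a misbehaving AM.
        raise RuntimeError("index AM left the snapshot stack unbalanced")


def build_core_guarded(stack: SnapshotStack, am_build: AmBuild) -> None:
    """Choice (b): core cleans up whatever the AM left behind."""
    base = stack.depth()
    stack.push("build-snapshot")
    am_build(stack)
    while stack.depth() > base:
        stack.pop()
```

Under these assumptions, the first variant has the simpler contract but fails whenever an AM gets it wrong, while the second tolerates both AMs at the cost of extra bookkeeping in core.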