[HACKERS] Skip all-visible pages during second HeapScan of CIC - Mailing list pgsql-hackers

From Pavan Deolasee
Subject [HACKERS] Skip all-visible pages during second HeapScan of CIC
Date
Msg-id CABOikdO+=3=rK_Y=8o-xd5oPiNSPsoORYThJUCNE8kWm1pWOow@mail.gmail.com
Responses Re: [HACKERS] Skip all-visible pages during second HeapScan of CIC
List pgsql-hackers
Hello All,

During the second heap scan of CREATE INDEX CONCURRENTLY, we're only interested in the tuples which were inserted after the first scan was started. Such tuples can only exist in pages which have their VM bit unset. So I propose the attached patch, which consults the VM during the second scan and skips all-visible pages. We use the same trick of skipping pages only if a certain threshold of pages can be skipped, to ensure the OS's read-ahead is not disturbed.
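
To make the skipping rule concrete, here is a rough standalone sketch of the idea. It is not code from the patch. The function name plan_second_scan, the in-memory bool array standing in for the visibility map, the threshold value of 32, and the choice to count runs of consecutive all-visible pages are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold for illustration; the value used by the patch is not stated here. */
#define SKIP_PAGES_THRESHOLD 32

/*
 * Decide which heap pages the second scan would read.
 *
 * all_visible[i] is true when page i has its VM bit set (assumed model of the VM).
 * On return, must_scan[i] is true for every page that should be read.
 * A run of all-visible pages is skipped only if it is at least
 * `threshold` pages long; shorter runs are read anyway so that
 * sequential read-ahead is not broken up by small gaps.
 * Returns the number of pages skipped.
 */
size_t
plan_second_scan(const bool *all_visible, size_t npages,
                 size_t threshold, bool *must_scan)
{
    size_t skipped = 0;
    size_t i = 0;

    assert(threshold > 0);

    while (i < npages)
    {
        size_t run_end;
        size_t run_len;
        bool   skip;
        size_t j;

        if (!all_visible[i])
        {
            /* VM bit unset: the page may hold tuples inserted after the first scan. */
            must_scan[i] = true;
            i++;
            continue;
        }

        /* Measure the run of consecutive all-visible pages starting at i. */
        run_end = i;
        while (run_end < npages && all_visible[run_end])
            run_end++;
        run_len = run_end - i;

        /* Skip the run only when it is long enough to be worth it. */
        skip = (run_len >= threshold);
        for (j = i; j < run_end; j++)
            must_scan[j] = !skip;
        if (skip)
            skipped += run_len;

        i = run_end;
    }

    return skipped;
}
```

With a table where almost every page is all-visible, nearly the whole second scan is skipped. With many short all-visible runs between modified pages, the function falls back to reading everything, which is the case where the threshold matters.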

As expected, the patch shows a significant reduction in the time taken to build an index concurrently on very large tables that are not updated frequently and were vacuumed recently (so that the VM bits are set). I can post performance numbers if there is interest. For tables that are updated heavily, the threshold-based skipping was indeed useful; without it we saw a slight regression.

Since VM bits are only set during VACUUM, which conflicts with CIC on the relation lock, I don't see any risk of incorrectly skipping pages that the second scan should have scanned.

Comments?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services