Re: REPACK and naming - Mailing list pgsql-hackers

From Álvaro Herrera
Subject Re: REPACK and naming
Date
Msg-id 202509191243.7o2i3qnbhjmb@alvherre.pgsql
In response to Re: REPACK and naming  (Antonin Houska <ah@cybertec.at>)
List pgsql-hackers
On 2025-Sep-19, Antonin Houska wrote:

> Admittedly I haven't thought about clause like ORDER BY yet, but I wonder if
> it'd really be useful. My understanding is that the purpose of clustering is
> to make index scan more efficient:

Not necessarily.  For some queries in some workloads, having tuples in a
certain order for a seqscan might also give a considerable performance
benefit.  More so with, say, BRIN indexes, where having one tuple in one
page range or another could mean the difference between having to scan
that page range and eliding it completely.
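
To make the BRIN point concrete, here is a toy model in Python.  It is
not PostgreSQL code: the function name, the 100-rows-per-page figure and
the data are assumptions made up for illustration.  The only things
taken from BRIN itself are the idea of a min/max summary per block range
and the default of 128 pages per range.

```python
def brin_ranges_scanned(values, lo, hi, pages_per_range=128, rows_per_page=100):
    """Count block ranges whose min/max summary overlaps the predicate [lo, hi].

    Toy model of a BRIN minmax index: `values` is the column in physical
    (heap) order.  rows_per_page is an arbitrary assumption.
    """
    rows_per_range = pages_per_range * rows_per_page
    scanned = 0
    total = 0
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        total += 1
        # A range can be skipped only if its summary lies entirely
        # outside the predicate; otherwise every page in it is read.
        if min(chunk) <= hi and max(chunk) >= lo:
            scanned += 1
    return scanned, total


# Heap in perfect key order, as right after clustering on this column.
ordered = list(range(100_000))

# Same data, but one matching tuple has ended up in the last block range.
one_stray = list(ordered)
one_stray[5000], one_stray[-1] = one_stray[-1], one_stray[5000]

scanned_ordered, total_ranges = brin_ranges_scanned(ordered, 5000, 5099)
scanned_stray, _ = brin_ranges_scanned(one_stray, 5000, 5099)

print(f"ordered:   {scanned_ordered} of {total_ranges} ranges scanned")  # 1 of 8
print(f"one stray: {scanned_stray} of {total_ranges} ranges scanned")    # 2 of 8
```

In this model a single misplaced tuple doubles the number of block
ranges that have to be read for the same predicate, since the range it
landed in can no longer be ruled out by its summary.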

> with a clustered table, the heap tuples
> pertaining to given index tuple should be located on the same page, so the
> heap access is not that random.

Yes, I suppose this is the first-order reason, and probably why we
currently only support basing clustering on an index.  But I doubt it's
the only one.  (It's also worth pointing out that quite possibly having
REPACK CONCURRENTLY is going to make clustering a lot more popular;
without concurrency, clustering is practically useless.)

> If IOT-AM table does not have anything like index, I imagine it has some kind
> of ordering information in the system catalog. Without that the query planner
> can hardly utilize the ordering.

Sure.

> In such case REPACK should use the catalog information on ordering
> rather than accept arbitrary ORDER BY clause.

... but, as David said, it might be valuable to change that ordering for
whatever reason.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/


