Re: Should we optimize the `ORDER BY random() LIMIT x` case? - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Date
Msg-id CAJ7c6TOgEX02tofvoMaZHAO4E5pe=pZs2VNHa9m5aEpM9mwviw@mail.gmail.com
Whole thread Raw
In response to Re: Should we optimize the `ORDER BY random() LIMIT x` case?  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
Andrei,

> > ```
> > -- imagine replacing inefficient array_sample(array_agg(t), 10)
> > -- with more efficient array_sample_reservoir(t, 10)
> > SELECT (unnest(agg)).* AS k FROM
> > (  SELECT array_sample(array_agg(t), 10) AS agg FROM (
> >     ... here goes the subquery ...
> >     ) AS t
> > );
> > ```
> >
> > ... if only we supported such a column expansion for not registered
> > records. Currently such a query fails with:
> >
> > ```
> > ERROR:  record type has not been registered
> > ```
> I know about this issue. Having resolved it in a limited number of local
> cases (like FDW push-down of row types), I still do not have a universal
> solution worth proposing upstream. Do you have any public implementation
> of the array_sample_reservoir to play with?

array_sample_reservoir() is purely a figment of my imagination at the
moment. Semantically it does the same as array_sample(array_agg(t), N)
except the fact that array_sample(..., N) requires the array to have
at least N items. You can experiment with array_sample(array_agg(...),
N) as long as the subquery returns much more than N rows.

-- 
Best regards,
Aleksander Alekseev



pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Conflict detection for update_deleted in logical replication
Next
From: Sami Imseih
Date:
Subject: Re: Regression in statement locations