Thread: TABLESAMPLE doesn't actually satisfy the SQL spec, does it?
As best I can tell (evidence below), the SQL standard requires that if a single query reads a table with a TABLESAMPLE clause multiple times (say, because it's on the inside of a nestloop), then the exact same set of sampled rows are returned each time.

The BERNOULLI code, at least, fails to achieve this. Suppose that a concurrent session adds some tuples to a page in between scans. When that page is visited the next time, the sampler RNG will be advanced more times than it was previously, because it's called a number of times that depends on PageGetMaxOffsetNumber(). So the RNG state will be different at the end of the page, and then the set of tuples selected from subsequent pages will be completely different from what it was before. This will happen even if none of the added rows are committed yet, let alone visible to the query's snapshot; which makes it a failure of serializability even if you don't believe the argument that it violates the requirements for TABLESAMPLE.

Now, as for the language-lawyering aspect of it, I read in SQL2011 7.6 <table reference> general rule 5a:

    iv) Case:
    1) If <sample method> specifies BERNOULLI, then the result of TF is a
       table containing approximately (N*S/100) rows of RT. The probability
       of a row of RT being included in result of TF is S/100. Further,
       whether a given row of RT is included in result of TF is independent
       of whether other rows of RT are included in result of TF.
    2) Otherwise, result of TF is a table containing approximately
       (N*S/100) rows of RT. The probability of a row of RT being included
       in result of TF is S/100.
    v) If TF contains outer references, then a table with identical rows is
       generated every time TF is evaluated with a given set of values for
       outer references.

This seems to me to clearly require that a fixed set of samples (ie, a "table", whether real or virtual) is selected during any query. Note that this requirement is the same whether or not REPEATABLE is used.
Also, I found a description of IBM DB2's implementation of this feature, http://www.almaden.ibm.com/cs/people/peterh/idugjbig.pdf and in that they say:

    Semantically, sampling of a table occurs before any other query
    processing, such as applying predicates or performing joins. One can
    envision the original tables referenced in a query being initially
    replaced by temporary "reduced" tables containing sampled rows, and
    then normal query processing commencing on the reduced tables. (For
    performance reasons, actual processing may not occur in exactly this
    manner.) It follows, for example, that repeated accesses of a sampled
    table within the same query, such as in a nested-loop join or a
    correlated subquery, will see the same sample of that table within a
    single execution of that query.

They do mention that alterations to a table may result in *successive* queries giving different results, even when using REPEATABLE. But I do not think it's okay for this to happen within a single query. That would mean that vagaries of plan choice, eg nestloop vs some other join method, would produce inconsistent results.

A possible way around this problem is to redefine the sampling rule so that it is not history-dependent but depends only on the tuple TIDs. For instance, one could hash the TID of a candidate tuple, xor that with a hash of the seed being used for the current query, and then select the tuple if (hash/MAXINT) < P.

regards, tom lane
On Sun, Jul 12, 2015 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As best I can tell (evidence below), the SQL standard requires that if a
> single query reads a table with a TABLESAMPLE clause multiple times (say,
> because it's on the inside of a nestloop), then the exact same set of
> sampled rows are returned each time.

Hmm, I tend to agree that it would be good if it behaved that way. Otherwise, it seems like the behavior could be quite surprising. Generally, we don't want the set of tuples that can be seen by a query to change during the query; that's one of the things that snapshot isolation does for us, as compared with, say, a literal interpretation of READ COMMITTED, which would behave as SnapshotNow used to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jul 12, 2015 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As best I can tell (evidence below), the SQL standard requires that if a
>> single query reads a table with a TABLESAMPLE clause multiple times (say,
>> because it's on the inside of a nestloop), then the exact same set of
>> sampled rows are returned each time.

> Hmm, I tend to agree that it would be good if it behaved that way.
> Otherwise, it seems like the behavior could be quite surprising.

Yeah. As a concrete example, consider

    select * from t1, t2 tablesample ... where t1.x = t2.x

and suppose that there are multiple occurrences of x = 10 in both tables. As things stand, if the join is done as a nestloop then a particular t2 row with x = 10 might appear in the output joined with some of the t1 rows with x = 10 but not with others. On the other hand, the results of a hash join would not be inconsistent in that way, since t2 would be read only once.

regards, tom lane
On 12 July 2015 at 18:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jul 12, 2015 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> As best I can tell (evidence below), the SQL standard requires that if a
>>> single query reads a table with a TABLESAMPLE clause multiple times (say,
>>> because it's on the inside of a nestloop), then the exact same set of
>>> sampled rows are returned each time.
>> Hmm, I tend to agree that it would be good if it behaved that way.
>> Otherwise, it seems like the behavior could be quite surprising.
> Yeah. As a concrete example, consider
>     select * from t1, t2 tablesample ... where t1.x = t2.x
> and suppose that there are multiple occurrences of x = 10 in both tables.
> As things stand, if the join is done as a nestloop then a particular t2
> row with x = 10 might appear in the output joined with some of the t1 rows
> with x = 10 but not with others. On the other hand, the results of a hash
> join would not be inconsistent in that way, since t2 would be read only
> once.
Hmm, a non-key join to a sampled table. What would the meaning of such a query be? The table would need to be big enough to experience updates and also be under current update activity. BERNOULLI isn't likely to have many users because it is so slow. So overall, such a query is not useful and as such unlikely.
The mechanism of sampling was discussed heavily before and there wasn't an approach that met all of the goals: IIRC we would need to test visibility twice on each tuple to get around these problems. Given that users of TABLESAMPLE have already explicitly stated their preference for speed over accuracy, minor tweaks to handle corner cases don't seem warranted.
If you have a simple, better way I would not object. Forgive me, I haven't yet understood your proposal about the sampling rule above.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-07-12 18:02, Tom Lane wrote:
> A possible way around this problem is to redefine the sampling rule so
> that it is not history-dependent but depends only on the tuple TIDs.
> For instance, one could hash the TID of a candidate tuple, xor that with
> a hash of the seed being used for the current query, and then select the
> tuple if (hash/MAXINT) < P.

That would work for bernoulli for physical tuples, yes. Only thing that worries me is future extensibility for data sources that only provide virtual tuples.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 2015-07-12 18:02, Tom Lane wrote:
>> A possible way around this problem is to redefine the sampling rule so
>> that it is not history-dependent but depends only on the tuple TIDs.
>> For instance, one could hash the TID of a candidate tuple, xor that with
>> a hash of the seed being used for the current query, and then select the
>> tuple if (hash/MAXINT) < P.

> That would work for bernoulli for physical tuples, yes. Only thing that
> worries me is future extensibility for data sources that only provide
> virtual tuples.

Well, repeatability of a TABLESAMPLE attached to a join seems like an unsolved and possibly unsolvable problem anyway. I don't think we should assume that the API we define today will cope with that.

But that is another reason why the current API is inadequate: there's no provision for specifying whether or how a tablesample method can be applied to non-base-table RTEs. (I re-read the thread and noted that Peter E. complained about that some time ago, but nothing was done about it. I'm fine with not supporting the case right now, but nonetheless it's another reason why we'd better make the API more easily extensible.)

regards, tom lane
On 2015-07-16 16:22, Tom Lane wrote:
> Petr Jelinek <petr@2ndquadrant.com> writes:
>> On 2015-07-12 18:02, Tom Lane wrote:
>>> A possible way around this problem is to redefine the sampling rule so
>>> that it is not history-dependent but depends only on the tuple TIDs.
>>> For instance, one could hash the TID of a candidate tuple, xor that with
>>> a hash of the seed being used for the current query, and then select the
>>> tuple if (hash/MAXINT) < P.
>
>> That would work for bernoulli for physical tuples, yes. Only thing that
>> worries me is future extensibility for data sources that only provide
>> virtual tuples.
>
> Well, repeatability of a TABLESAMPLE attached to a join seems like an
> unsolved and possibly unsolvable problem anyway. I don't think we should
> assume that the API we define today will cope with that.

OK, it's true that the implementations I've seen in other databases so far only concern themselves with sampling physical relations and ignore the rest.

> But that is another reason why the current API is inadequate: there's no
> provision for specifying whether or how a tablesample method can be
> applied to non-base-table RTEs. (I re-read the thread and noted that
> Peter E. complained about that some time ago, but nothing was done about
> it. I'm fine with not supporting the case right now, but nonetheless
> it's another reason why we'd better make the API more easily extensible.)

Nothing in terms of implementation, true, but I did write up my idea of how this could be done by extending the current API in the future. I won't try to pretend to be absolutely sure that the API might not need some breaking change to do that, though.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services