Thread: Wording in TABLESAMPLE documentation
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/9.6/static/sql-select.html
Description:

Regarding the TABLESAMPLE documentation on [1], I think in the following sentence

> If REPEATABLE is not given then a new random sample is selected for each query.

the word "sample" should be "seed". Of course it results in a new random sample as well, but IMHO this sentence is about what happens to the seed in case REPEATABLE (seed) is omitted.

Best regards,
Patrik Wenger

[1] https://www.postgresql.org/docs/9.6/static/sql-select.html
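For context, the two forms of the clause under discussion look roughly like this. This is a minimal sketch: the table name `foo`, the 10 percent sampling fraction, and the seed value 42 are illustrative assumptions and do not come from the report. BERNOULLI is one of the built-in sampling methods.

```sql
-- Without REPEATABLE: per the documentation sentence quoted above,
-- a new random sample is selected for each query.
SELECT * FROM foo TABLESAMPLE BERNOULLI (10);

-- With REPEATABLE (seed): the supplied seed drives the sampling.
-- The value 42 is an arbitrary, hypothetical choice.
SELECT * FROM foo TABLESAMPLE BERNOULLI (10) REPEATABLE (42);
```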
On 11 August 2016 at 17:21, <paddor@gmail.com> wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/9.6/static/sql-select.html
> Description:
>
> Regarding the TABLESAMPLE documentation on [1], I think in the following
> sentence
>
> > If REPEATABLE is not given then a new random sample is selected for each query.
>
> the word "sample" should be "seed". Of course it results in a new random
> sample as well, but IMHO this sentence is about what happens to the seed in
> case REPEATABLE (seed) is omitted.

Corrected, thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 11 August 2016 at 17:21, <paddor@gmail.com> wrote:
>> > If REPEATABLE is not given then a new random sample is selected for each query.
>>
>> the word "sample" should be "seed". Of course it results in a new random
>> sample as well, but IMHO this sentence is about what happens to the seed in
>> case REPEATABLE (seed) is omitted.

> Corrected, thanks.

I do not think this is an improvement. The sentence was specifically about whether the sample (that is, the set of rows selected) would change. This rewording essentially removes that user-visible behavior guarantee, and for what? It's certainly not any clearer.

regards, tom lane
On 12 August 2016 at 15:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 11 August 2016 at 17:21, <paddor@gmail.com> wrote:
>>> > If REPEATABLE is not given then a new random sample is selected for each query.
>>>
>>> the word "sample" should be "seed". Of course it results in a new random
>>> sample as well, but IMHO this sentence is about what happens to the seed in
>>> case REPEATABLE (seed) is omitted.
>
>> Corrected, thanks.
>
> I do not think this is an improvement. The sentence was specifically about
> whether the sample (that is, the set of rows selected) would change. This
> rewording essentially removes that user-visible behavior guarantee, and
> for what? It's certainly not any clearer.

It was supposed to be a correction, rather than an improvement. I saw the use of the word "sample" as an error.

But now you mention it, I agree with you. Let's put it back to say "sample" but also explain where that new sample comes from... my attempt to explain this better is in square brackets

"If REPEATABLE is not given then a new random sample will be taken for each query [based upon the global seed value for the current user.]"

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> But now you mention it, I agree with you. Let's put it back to say
> "sample" but also explain where that new sample comes from... my
> attempt to explain this better is in square brackets

> "If REPEATABLE is not given then a new random sample will be taken for
> each query [based upon the global seed value for the current user.]"

I think "global" might have implications we don't want. How about adding ", based on a system-generated seed"?

regards, tom lane
On 12 August 2016 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> But now you mention it, I agree with you. Let's put it back to say
>> "sample" but also explain where that new sample comes from... my
>> attempt to explain this better is in square brackets
>
>> "If REPEATABLE is not given then a new random sample will be taken for
>> each query [based upon the global seed value for the current user.]"
>
> I think "global" might have implications we don't want. How about
> adding ", based on a system-generated seed"?

What I was trying to express was that

SELECT setseed(dp);
SELECT * FROM foo TABLESAMPLE ...;
SELECT * FROM foo TABLESAMPLE ...;
SELECT * FROM foo TABLESAMPLE ...;

would yield a repeatable set of samples, similarly repeatable but not same samples as

SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
SELECT * FROM foo TABLESAMPLE ... REPEATABLE;

so that people understand there is some predictability even without REPEATABLE.

So I don't understand the "based on a system-generated seed", but maybe I'm missing information.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 12 August 2016 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think "global" might have implications we don't want. How about
>> adding ", based on a system-generated seed"?

> What I was trying to express was that

> SELECT setseed(dp);
> SELECT * FROM foo TABLESAMPLE ...;
> SELECT * FROM foo TABLESAMPLE ...;
> SELECT * FROM foo TABLESAMPLE ...;

> would yield a repeatable set of samples, similarly repeatable but not
> same samples as

> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;

But that's *wrong*. Not all tablesample methods make any such guarantee. In fact, neither of our contrib methods do. Only if you use REPEATABLE (and the method allows it) is there any promise at all about repeatability.

regards, tom lane
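To make the distinction above concrete, here is a hedged sketch of the two cases. The table `foo` comes from the quoted queries, but the SYSTEM method, the 10 percent fraction, and the seed values 0.5 and 42 are hypothetical stand-ins for the parts the quoted queries leave elided. Only the second case carries a repeatability promise, and even then only for a sampling method that accepts REPEATABLE.

```sql
-- Case 1: session seed set with setseed(), no REPEATABLE clause.
-- Per the message above, nothing promises that rerunning this
-- script returns the same rows.
SELECT setseed(0.5);
SELECT * FROM foo TABLESAMPLE SYSTEM (10);
SELECT * FROM foo TABLESAMPLE SYSTEM (10);

-- Case 2: explicit seed via REPEATABLE (seed).
-- Two queries with the same seed and arguments are expected to select
-- the same sample, provided the table has not changed in the meantime.
SELECT * FROM foo TABLESAMPLE SYSTEM (10) REPEATABLE (42);
SELECT * FROM foo TABLESAMPLE SYSTEM (10) REPEATABLE (42);
```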
On 12 August 2016 at 18:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 12 August 2016 at 16:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think "global" might have implications we don't want. How about
>>> adding ", based on a system-generated seed"?
>
>> What I was trying to express was that
>
>> SELECT setseed(dp);
>> SELECT * FROM foo TABLESAMPLE ...;
>> SELECT * FROM foo TABLESAMPLE ...;
>> SELECT * FROM foo TABLESAMPLE ...;
>
>> would yield a repeatable set of samples, similarly repeatable but not
>> same samples as
>
>> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
>> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
>> SELECT * FROM foo TABLESAMPLE ... REPEATABLE;
>
> But that's *wrong*. Not all tablesample methods make any such guarantee.
> In fact, neither of our contrib methods do. Only if you use REPEATABLE
> (and the method allows it) is there any promise at all about repeatability.

OK, fair enough. I'll just use your wording then. Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services