Thread: check for interrupts in set_rtable_names

check for interrupts in set_rtable_names

From

Jeff Janes

Date:

13 November 2015, 22:51:14

Someone sent my server a deranged query, it tripped my
auto_explain.log_min_duration setting, that hit some kind of
pathological case while assigning aliases, and now it sits
uninterruptibly in set_rtable_names for hours.

Is there any reason we can't check for interrupts in set_rtable_names,
like the attached?

I've tried it on a deranged query and it seems to do the job.

A deranged query:

perl -le 'print "explain"; print "select * from pgbench_accounts where
aid=5 union" foreach 1..5000; print "select * from pgbench_accounts
where aid=5 ;"'|psql

Cheers,

Jeff

Attachment

rtable_names_interrupts.patch

Re: check for interrupts in set_rtable_names

From

Tom Lane

Date:

13 November 2015, 23:13:57

Jeff Janes <jeff.janes@gmail.com> writes:
> Someone sent my server a deranged query, it tripped my
> auto_explain.log_min_duration setting, that hit some kind of
> pathological case while assigning aliases, and now it sits
> uninterruptibly in set_rtable_names for hours.

> Is there any reason we can't check for interrupts in set_rtable_names,
> like the attached?

There's probably no reason not to do that, but I'd be much more interested
in eliminating the slowness to begin with ...
        regards, tom lane

Re: check for interrupts in set_rtable_names

From

Jeff Janes

Date:

13 November 2015, 23:37:03

On Fri, Nov 13, 2015 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> Someone sent my server a deranged query, it tripped my
>> auto_explain.log_min_duration setting, that hit some kind of
>> pathological case while assigning aliases, and now it sits
>> uninterruptibly in set_rtable_names for hours.
>
>> Is there any reason we can't check for interrupts in set_rtable_names,
>> like the attached?
>
> There's probably no reason not to do that, but I'd be much more interested
> in eliminating the slowness to begin with ...

I was thinking about that as well, but I don't think that would be
back-patchable, at least not the way I was envisioning it.

I was thinking of detecting bad cases (had to count to over 10 before
finding a novel name, more than 10 times) and then switching from an
object-local count, to a global count, for the numbers to add to the
name.  But that would be a behavior change under some conditions.

I think the code says it clearer than prose, so crude first attempt at
that attached.

Cheers,

Jeff

Attachment

rtable_names_faster.patch

Re: check for interrupts in set_rtable_names

From

Tom Lane

Date:

16 November 2015, 00:14:21

Jeff Janes <jeff.janes@gmail.com> writes:
> On Fri, Nov 13, 2015 at 3:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's probably no reason not to do that, but I'd be much more interested
>> in eliminating the slowness to begin with ...

> I was thinking about that as well, but I don't think that would be
> back-patchable, at least not the way I was envisioning it.

> I was thinking of detecting bad cases (had to count to over 10 before
> finding a novel name, more than 10 times) and then switching from an
> object-local count, to a global count, for the numbers to add to the
> name.  But that would be a behavior change under some conditions.

I experimented with using a hash table to avoid the O(N^2) behavior.
This seems to work quite well, and I think it doesn't change the results
(leastwise, it does not break the regression tests).

It would be possible to do something similar to dodge the O(N^2) costs
in make_colname_unique/colname_is_unique; but it would be a lot messier,
and the tests I did suggest that it's fairly hard to get into a regime
where those functions are a huge part of the runtime anyway, because
we have limits on the numbers of columns.  So I'm inclined to leave that
alone.

            regards, tom lane

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3388f7c..2b6de8f 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***************
*** 55,60 ****
--- 55,61 ----
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"
  #include "utils/rel.h"
  #include "utils/ruleutils.h"
*************** typedef struct
*** 267,272 ****
--- 268,282 ----
  #define deparse_columns_fetch(rangetable_index, dpns) \
      ((deparse_columns *) list_nth((dpns)->rtable_columns, (rangetable_index)-1))

+ /*
+  * Entry in set_rtable_names' hash table
+  */
+ typedef struct
+ {
+     char        name[NAMEDATALEN];        /* Hash key --- must be first */
+     int            counter;        /* Largest addition used so far for name */
+ } NameHashEntry;
+

  /* ----------
   * Global data
*************** static void print_function_rettype(Strin
*** 312,319 ****
  static void print_function_trftypes(StringInfo buf, HeapTuple proctup);
  static void set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
                   Bitmapset *rels_used);
- static bool refname_is_unique(char *refname, deparse_namespace *dpns,
-                   List *parent_namespaces);
  static void set_deparse_for_query(deparse_namespace *dpns, Query *query,
                        List *parent_namespaces);
  static void set_simple_column_names(deparse_namespace *dpns);
--- 322,327 ----
*************** static void
*** 2676,2690 ****
  set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
                   Bitmapset *rels_used)
  {
      ListCell   *lc;
-     int            rtindex = 1;

      dpns->rtable_names = NIL;
      foreach(lc, dpns->rtable)
      {
          RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
          char       *refname;

          if (rels_used && !bms_is_member(rtindex, rels_used))
          {
              /* Ignore unreferenced RTE */
--- 2684,2744 ----
  set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
                   Bitmapset *rels_used)
  {
+     HASHCTL        hash_ctl;
+     HTAB       *names_hash;
+     NameHashEntry *hentry;
+     bool        found;
+     int            rtindex;
      ListCell   *lc;

      dpns->rtable_names = NIL;
+     /* nothing more to do if empty rtable */
+     if (dpns->rtable == NIL)
+         return;
+
+     /*
+      * We use a hash table to hold known names, so that this process is O(N)
+      * not O(N^2) for N names.
+      */
+     MemSet(&hash_ctl, 0, sizeof(hash_ctl));
+     hash_ctl.keysize = NAMEDATALEN;
+     hash_ctl.entrysize = sizeof(NameHashEntry);
+     hash_ctl.hcxt = CurrentMemoryContext;
+     names_hash = hash_create("set_rtable_names names",
+                              list_length(dpns->rtable),
+                              &hash_ctl,
+                              HASH_ELEM | HASH_CONTEXT);
+     /* Preload the hash table with names appearing in parent_namespaces */
+     foreach(lc, parent_namespaces)
+     {
+         deparse_namespace *olddpns = (deparse_namespace *) lfirst(lc);
+         ListCell   *lc2;
+
+         foreach(lc2, olddpns->rtable_names)
+         {
+             char       *oldname = (char *) lfirst(lc2);
+
+             if (oldname == NULL)
+                 continue;
+             hentry = (NameHashEntry *) hash_search(names_hash,
+                                                    oldname,
+                                                    HASH_ENTER,
+                                                    &found);
+             /* we do not complain about duplicate names in parent namespaces */
+             hentry->counter = 0;
+         }
+     }
+
+     /* Now we can scan the rtable */
+     rtindex = 1;
      foreach(lc, dpns->rtable)
      {
          RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
          char       *refname;

+         /* Just in case this takes an unreasonable amount of time ... */
+         CHECK_FOR_INTERRUPTS();
+
          if (rels_used && !bms_is_member(rtindex, rels_used))
          {
              /* Ignore unreferenced RTE */
*************** set_rtable_names(deparse_namespace *dpns
*** 2712,2767 ****
          }

          /*
!          * If the selected name isn't unique, append digits to make it so
           */
!         if (refname &&
!             !refname_is_unique(refname, dpns, parent_namespaces))
          {
!             char       *modname = (char *) palloc(strlen(refname) + 32);
!             int            i = 0;

!             do
              {
!                 sprintf(modname, "%s_%d", refname, ++i);
!             } while (!refname_is_unique(modname, dpns, parent_namespaces));
!             refname = modname;
          }

          dpns->rtable_names = lappend(dpns->rtable_names, refname);
          rtindex++;
      }
- }
-
- /*
-  * refname_is_unique: is refname distinct from all already-chosen RTE names?
-  */
- static bool
- refname_is_unique(char *refname, deparse_namespace *dpns,
-                   List *parent_namespaces)
- {
-     ListCell   *lc;
-
-     foreach(lc, dpns->rtable_names)
-     {
-         char       *oldname = (char *) lfirst(lc);
-
-         if (oldname && strcmp(oldname, refname) == 0)
-             return false;
-     }
-     foreach(lc, parent_namespaces)
-     {
-         deparse_namespace *olddpns = (deparse_namespace *) lfirst(lc);
-         ListCell   *lc2;
-
-         foreach(lc2, olddpns->rtable_names)
-         {
-             char       *oldname = (char *) lfirst(lc2);

!             if (oldname && strcmp(oldname, refname) == 0)
!                 return false;
!         }
!     }
!     return true;
  }

  /*
--- 2766,2809 ----
          }

          /*
!          * If the selected name isn't unique, append digits to make it so, and
!          * make a new hash entry for it once we've got a unique name.
           */
!         if (refname)
          {
!             hentry = (NameHashEntry *) hash_search(names_hash,
!                                                    refname,
!                                                    HASH_ENTER,
!                                                    &found);
!             if (found)
!             {
!                 /* Name already in use, must choose a new one */
!                 char       *modname = (char *) palloc(strlen(refname) + 32);
!                 NameHashEntry *hentry2;

!                 do
!                 {
!                     sprintf(modname, "%s_%d", refname, ++(hentry->counter));
!                     hentry2 = (NameHashEntry *) hash_search(names_hash,
!                                                             modname,
!                                                             HASH_ENTER,
!                                                             &found);
!                 } while (found);
!                 hentry2->counter = 0;    /* init new hash entry */
!                 refname = modname;
!             }
!             else
              {
!                 /* Name not previously used, need only initialize hentry */
!                 hentry->counter = 0;
!             }
          }

          dpns->rtable_names = lappend(dpns->rtable_names, refname);
          rtindex++;
      }

!     hash_destroy(names_hash);
  }

  /*

Re: check for interrupts in set_rtable_names

From

Tom Lane

Date:

16 November 2015, 16:35:26

I wrote:
> I experimented with using a hash table to avoid the O(N^2) behavior.
> This seems to work quite well, and I think it doesn't change the results
> (leastwise, it does not break the regression tests).

Looking at this again in the light of morning, it occurred to me that it's
pretty broken in the face of long (approaching NAMEDATALEN) input
identifiers.  If the de-duplication digits are beyond the NAMEDATALEN
threshold, it would actually get into an infinite loop because hash_search
would ignore them as not part of the key.  That can be fixed by truncating
the name appropriately.

However: actually, this code had a problem already with long identifiers.
What happened before was that it would blithely add digits and thereby
produce an overlength identifier, which indeed looks distinct from the
other ones in the query --- but if we're dumping a view/rule, the
identifier won't be distinct after truncation, which means that
dump/reload could fail, or even worse, silently produce something with
different semantics than intended.

So the attached updated patch takes care of that problem, not only for
relation names but also for column names.

I had been leaning towards not back-patching this, but I now feel that
we must back-patch at least the truncation fix, and we probably might as
well back-patch it in toto.  Comments?

            regards, tom lane

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 3388f7c..771b14b 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***************
*** 38,43 ****
--- 38,44 ----
  #include "commands/tablespace.h"
  #include "executor/spi.h"
  #include "funcapi.h"
+ #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
***************
*** 55,60 ****
--- 56,62 ----
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/fmgroids.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"
  #include "utils/rel.h"
  #include "utils/ruleutils.h"
*************** typedef struct
*** 267,272 ****
--- 269,283 ----
  #define deparse_columns_fetch(rangetable_index, dpns) \
      ((deparse_columns *) list_nth((dpns)->rtable_columns, (rangetable_index)-1))

+ /*
+  * Entry in set_rtable_names' hash table
+  */
+ typedef struct
+ {
+     char        name[NAMEDATALEN];        /* Hash key --- must be first */
+     int            counter;        /* Largest addition used so far for name */
+ } NameHashEntry;
+

  /* ----------
   * Global data
*************** static void print_function_rettype(Strin
*** 312,319 ****
  static void print_function_trftypes(StringInfo buf, HeapTuple proctup);
  static void set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
                   Bitmapset *rels_used);
- static bool refname_is_unique(char *refname, deparse_namespace *dpns,
-                   List *parent_namespaces);
  static void set_deparse_for_query(deparse_namespace *dpns, Query *query,
                        List *parent_namespaces);
  static void set_simple_column_names(deparse_namespace *dpns);
--- 323,328 ----
*************** static void
*** 2676,2690 ****
  set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
                   Bitmapset *rels_used)
  {
      ListCell   *lc;
-     int            rtindex = 1;

      dpns->rtable_names = NIL;
      foreach(lc, dpns->rtable)
      {
          RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
          char       *refname;

          if (rels_used && !bms_is_member(rtindex, rels_used))
          {
              /* Ignore unreferenced RTE */
--- 2685,2745 ----
  set_rtable_names(deparse_namespace *dpns, List *parent_namespaces,
                   Bitmapset *rels_used)
  {
+     HASHCTL        hash_ctl;
+     HTAB       *names_hash;
+     NameHashEntry *hentry;
+     bool        found;
+     int            rtindex;
      ListCell   *lc;

      dpns->rtable_names = NIL;
+     /* nothing more to do if empty rtable */
+     if (dpns->rtable == NIL)
+         return;
+
+     /*
+      * We use a hash table to hold known names, so that this process is O(N)
+      * not O(N^2) for N names.
+      */
+     MemSet(&hash_ctl, 0, sizeof(hash_ctl));
+     hash_ctl.keysize = NAMEDATALEN;
+     hash_ctl.entrysize = sizeof(NameHashEntry);
+     hash_ctl.hcxt = CurrentMemoryContext;
+     names_hash = hash_create("set_rtable_names names",
+                              list_length(dpns->rtable),
+                              &hash_ctl,
+                              HASH_ELEM | HASH_CONTEXT);
+     /* Preload the hash table with names appearing in parent_namespaces */
+     foreach(lc, parent_namespaces)
+     {
+         deparse_namespace *olddpns = (deparse_namespace *) lfirst(lc);
+         ListCell   *lc2;
+
+         foreach(lc2, olddpns->rtable_names)
+         {
+             char       *oldname = (char *) lfirst(lc2);
+
+             if (oldname == NULL)
+                 continue;
+             hentry = (NameHashEntry *) hash_search(names_hash,
+                                                    oldname,
+                                                    HASH_ENTER,
+                                                    &found);
+             /* we do not complain about duplicate names in parent namespaces */
+             hentry->counter = 0;
+         }
+     }
+
+     /* Now we can scan the rtable */
+     rtindex = 1;
      foreach(lc, dpns->rtable)
      {
          RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
          char       *refname;

+         /* Just in case this takes an unreasonable amount of time ... */
+         CHECK_FOR_INTERRUPTS();
+
          if (rels_used && !bms_is_member(rtindex, rels_used))
          {
              /* Ignore unreferenced RTE */
*************** set_rtable_names(deparse_namespace *dpns
*** 2712,2767 ****
          }

          /*
!          * If the selected name isn't unique, append digits to make it so
           */
!         if (refname &&
!             !refname_is_unique(refname, dpns, parent_namespaces))
          {
!             char       *modname = (char *) palloc(strlen(refname) + 32);
!             int            i = 0;

!             do
              {
!                 sprintf(modname, "%s_%d", refname, ++i);
!             } while (!refname_is_unique(modname, dpns, parent_namespaces));
!             refname = modname;
          }

          dpns->rtable_names = lappend(dpns->rtable_names, refname);
          rtindex++;
      }
- }
-
- /*
-  * refname_is_unique: is refname distinct from all already-chosen RTE names?
-  */
- static bool
- refname_is_unique(char *refname, deparse_namespace *dpns,
-                   List *parent_namespaces)
- {
-     ListCell   *lc;
-
-     foreach(lc, dpns->rtable_names)
-     {
-         char       *oldname = (char *) lfirst(lc);

!         if (oldname && strcmp(oldname, refname) == 0)
!             return false;
!     }
!     foreach(lc, parent_namespaces)
!     {
!         deparse_namespace *olddpns = (deparse_namespace *) lfirst(lc);
!         ListCell   *lc2;
!
!         foreach(lc2, olddpns->rtable_names)
!         {
!             char       *oldname = (char *) lfirst(lc2);
!
!             if (oldname && strcmp(oldname, refname) == 0)
!                 return false;
!         }
!     }
!     return true;
  }

  /*
--- 2767,2828 ----
          }

          /*
!          * If the selected name isn't unique, append digits to make it so, and
!          * make a new hash entry for it once we've got a unique name.  For a
!          * very long input name, we might have to truncate to stay within
!          * NAMEDATALEN.
           */
!         if (refname)
          {
!             hentry = (NameHashEntry *) hash_search(names_hash,
!                                                    refname,
!                                                    HASH_ENTER,
!                                                    &found);
!             if (found)
!             {
!                 /* Name already in use, must choose a new one */
!                 int            refnamelen = strlen(refname);
!                 char       *modname = (char *) palloc(refnamelen + 16);
!                 NameHashEntry *hentry2;

!                 do
!                 {
!                     hentry->counter++;
!                     for (;;)
!                     {
!                         /*
!                          * We avoid using %.*s here because it can misbehave
!                          * if the data is not valid in what libc thinks is the
!                          * prevailing encoding.
!                          */
!                         memcpy(modname, refname, refnamelen);
!                         sprintf(modname + refnamelen, "_%d", hentry->counter);
!                         if (strlen(modname) < NAMEDATALEN)
!                             break;
!                         /* drop chars from refname to keep all the digits */
!                         refnamelen = pg_mbcliplen(refname, refnamelen,
!                                                   refnamelen - 1);
!                     }
!                     hentry2 = (NameHashEntry *) hash_search(names_hash,
!                                                             modname,
!                                                             HASH_ENTER,
!                                                             &found);
!                 } while (found);
!                 hentry2->counter = 0;    /* init new hash entry */
!                 refname = modname;
!             }
!             else
              {
!                 /* Name not previously used, need only initialize hentry */
!                 hentry->counter = 0;
!             }
          }

          dpns->rtable_names = lappend(dpns->rtable_names, refname);
          rtindex++;
      }

!     hash_destroy(names_hash);
  }

  /*
*************** make_colname_unique(char *colname, depar
*** 3589,3604 ****
                      deparse_columns *colinfo)
  {
      /*
!      * If the selected name isn't unique, append digits to make it so
       */
      if (!colname_is_unique(colname, dpns, colinfo))
      {
!         char       *modname = (char *) palloc(strlen(colname) + 32);
          int            i = 0;

          do
          {
!             sprintf(modname, "%s_%d", colname, ++i);
          } while (!colname_is_unique(modname, dpns, colinfo));
          colname = modname;
      }
--- 3650,3683 ----
                      deparse_columns *colinfo)
  {
      /*
!      * If the selected name isn't unique, append digits to make it so.  For a
!      * very long input name, we might have to truncate to stay within
!      * NAMEDATALEN.
       */
      if (!colname_is_unique(colname, dpns, colinfo))
      {
!         int            colnamelen = strlen(colname);
!         char       *modname = (char *) palloc(colnamelen + 16);
          int            i = 0;

          do
          {
!             i++;
!             for (;;)
!             {
!                 /*
!                  * We avoid using %.*s here because it can misbehave if the
!                  * data is not valid in what libc thinks is the prevailing
!                  * encoding.
!                  */
!                 memcpy(modname, colname, colnamelen);
!                 sprintf(modname + colnamelen, "_%d", i);
!                 if (strlen(modname) < NAMEDATALEN)
!                     break;
!                 /* drop chars from colname to keep all the digits */
!                 colnamelen = pg_mbcliplen(colname, colnamelen,
!                                           colnamelen - 1);
!             }
          } while (!colname_is_unique(modname, dpns, colinfo));
          colname = modname;
      }

Re: check for interrupts in set_rtable_names

From

Jeff Janes

Date:

16 November 2015, 23:52:35

On Mon, Nov 16, 2015 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I experimented with using a hash table to avoid the O(N^2) behavior.
>> This seems to work quite well, and I think it doesn't change the results
>> (leastwise, it does not break the regression tests).
>
> Looking at this again in the light of morning, it occurred to me that it's
> pretty broken in the face of long (approaching NAMEDATALEN) input
> identifiers.  If the de-duplication digits are beyond the NAMEDATALEN
> threshold, it would actually get into an infinite loop because hash_search
> would ignore them as not part of the key.  That can be fixed by truncating
> the name appropriately.
>
> However: actually, this code had a problem already with long identifiers.
> What happened before was that it would blithely add digits and thereby
> produce an overlength identifier, which indeed looks distinct from the
> other ones in the query --- but if we're dumping a view/rule, the
> identifier won't be distinct after truncation, which means that
> dump/reload could fail, or even worse, silently produce something with
> different semantics than intended.
>
> So the attached updated patch takes care of that problem, not only for
> relation names but also for column names.
>
> I had been leaning towards not back-patching this, but I now feel that
> we must back-patch at least the truncation fix, and we probably might as
> well back-patch it in toto.  Comments?

As-committed it has solved the problem, as best as I can tell.

Thanks,

Jeff