Thread: notify duplicate elimination performance
I know that it is not a big problem for most users, but allowing a very large number of notifications while using linear search is a bit dumb. I can fix this with a very small modification to struct Notification: {char *channel ;char *payload ;uint32 hash ;struct Notification *left ;struct Notification *right ; } AsyncExistsPendingNotify does an iterative binary tree search. The tree is insert-only, there is no need for rebalancing, and the code is quite simple. Any comments?
Hi, On 2014-02-08 18:56:41 +0100, Hardy Falk wrote: > I know that it is not a big problem for most users, but allowing a very > large number of notifications while using linear search is a bit dumb. > I can fix this with a very small modification to > struct Notification: > { > char *channel ; > char *payload ; > uint32 hash ; > struct Notification *left ; > struct Notification *right ; > } > AsyncExistsPendingNotify does an iterative binary tree search. > The tree is insert-only, there is no need for rebalancing, and the code > is quite simple. > Any comments? Well, you didn't add any code, so it's hard to say... Simple ways of doing what I think you describe will remove the queue's order. Do you preserve the ordering guarantees? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> Well, you didn't add any code, so it's hard to say... Simple ways of > doing what I think you describe will remove the queue's order. Do you > preserve the ordering guarantees? > > Greetings, > > Andres Freund > Yes, the order is preserved. I didn't remove the the original list code. The tree is just an additional access path. > oldcontext = MemoryContextSwitchTo(CurTransactionContext); > > n = (Notification *) palloc(sizeof(Notification)); > n->channel = pstrdup(channel); > if (payload) > n->payload = pstrdup(payload); > else > n->payload = ""; > n->hash = hash ; > n->left = NULL ; > n->right= NULL ; > *tt = n ; tt is a Notification** obtained by the search. > static Notification **search(const char *channel, const char *payload ) > { > Notification *t,**tt ; > uint32 hash ; > t = hashroot ; > tt = &hashroot ; > hash = hashf(channel,691) ; > hash = hashf(payload,hash) ; > while ( t ) > { > if ( hash < t->hash ) > { > tt = &t->left ; > t = t->left ; > } > else if ( hash > t->hash ) > { > tt = &t->right ; > t = t->right ; > } > else > { > if (0==strcmp(t->channel,channel) && 0==strcmp(t->payload,payload)) > { > return NULL > } > else > { > tt = &t->left ; > t = t->left ; > } > } > } > return tt ; > }
Hi, On 2014-02-08 19:28:56 +0100, Hardy Falk wrote: > > Well, you didn't add any code, so it's hard to say... Simple ways of > > doing what I think you describe will remove the queue's order. Do you > > preserve the ordering guarantees? > > > > Greetings, > > > > Andres Freund > > > Yes, the order is preserved. > I didn't remove the the original list code. > The tree is just an additional access path. How about actually attaching a patch? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hardy Falk <hardy.falk@blue-cable.de> writes: >> Well, you didn't add any code, so it's hard to say... Simple ways of >> doing what I think you describe will remove the queue's order. Do you >> preserve the ordering guarantees? > Yes, the order is preserved. > I didn't remove the the original list code. > The tree is just an additional access path. It seems likely that this approach would be a net loss for small numbers of notify events (which is surely the common case). Have you done any measurements of the performance tradeoff? regards, tom lane
Tom Lane schrieb: > Hardy Falk <hardy.falk@blue-cable.de> writes: >>> Well, you didn't add any code, so it's hard to say... Simple ways of >>> doing what I think you describe will remove the queue's order. Do you >>> preserve the ordering guarantees? > >> Yes, the order is preserved. >> I didn't remove the the original list code. >> The tree is just an additional access path. > > It seems likely that this approach would be a net loss for small numbers > of notify events (which is surely the common case). Have you done any > measurements of the performance tradeoff? > > regards, tom lane > > I can easily keep the tail test. This avoids the hash computation in a common case. The tree search compares only uint32 values, this should be quite fast. Even a degenerate tree is no worse than the list. Im my first tests I didn't use murmurhash, a select pg_notify('alasys',format('Hello %s',x) from generate_series(1,1000000) as x triggered this worst case. With murmurhash2 the average search depth for 10^6 entries is 24.57. I am about to prepare a patch, should I do some performance tests with rtdsc first?