Re: MD5 aggregate - Mailing list pgsql-hackers

From Andres Freund
Subject Re: MD5 aggregate
Msg-id 20130614145657.GH19500@alap2.anarazel.de
In response to Re: MD5 aggregate  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On 2013-06-14 15:49:31 +0100, Dean Rasheed wrote:
> On 14 June 2013 15:19, Stephen Frost <sfrost@snowman.net> wrote:
> > * Andrew Dunstan (andrew@dunslane.net) wrote:
> >> I'd rather go the other way, processing the records without having
> >> to process them otherwise at all. Turning things into text must slow
> >> things down, surely.
> >
> > That's certainly an interesting idea also..
> >
> 
> md5_agg(record) ?
> 
> Yes, I like it.

It's more complex than just memcmp()ing HeapTupleData though. At least
if the Datum contains varlena columns, there are so many different
representations (short, long, compressed, external, external compressed)
of the same data that an md5 computed without normalizing them wouldn't
be very interesting.
So you would at least need a normalizing version of
toast_flatten_tuple() that also deals with short/long varlenas. But even
after that, you would still need to deal with Datums that can have
different representations of the same value (like short numerics, old
style hstore, ...).
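
To make the problem concrete, here is a rough sketch in Python rather than
backend C. The header layout is a toy model made up for illustration (a
1-byte "short" header versus a 4-byte "long" header), not the real varlena
format, and encode_short(), encode_long() and normalize() are hypothetical
helpers, not PostgreSQL functions. It only shows that hashing the raw bytes
of two encodings of the same value gives different digests, while hashing
after normalization gives the same one.

```python
import hashlib
import struct

# Toy marker (an assumption of this sketch): high bit set in the first
# byte means the value uses a 1-byte header.
SHORT_FLAG = 0x80


def encode_short(payload: bytes) -> bytes:
    """Toy 1-byte header: flag bit plus total length (header included)."""
    total = len(payload) + 1
    if total > 0x7F:
        raise ValueError("payload too long for the toy short header")
    return bytes([SHORT_FLAG | total]) + payload


def encode_long(payload: bytes) -> bytes:
    """Toy 4-byte big-endian header holding the total length."""
    total = len(payload) + 4
    if total >= 1 << 31:
        raise ValueError("payload too long for the toy long header")
    return struct.pack(">I", total) + payload


def normalize(datum: bytes) -> bytes:
    """Strip whichever toy header is present and return the payload."""
    if datum[0] & SHORT_FLAG:
        total = datum[0] & 0x7F
        return datum[1:total]
    (total,) = struct.unpack(">I", datum[:4])
    return datum[4:total]


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


short = encode_short(b"hello")
long_ = encode_long(b"hello")

# Same logical value, different bytes, so the raw digests differ.
raw_differs = md5_hex(short) != md5_hex(long_)

# After stripping the headers both encodings hash identically.
normalized_matches = md5_hex(normalize(short)) == md5_hex(normalize(long_))

print(raw_differs, normalized_matches)
```

A real implementation would additionally have to detoast and decompress,
and that still would not cover types whose payload itself has more than
one valid encoding.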

It might be more realistic to use the binary output functions, but I am
not sure whether all of those are sufficiently reproducible.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


