Change sort order on UUIDs? - Mailing list pgsql-hackers
| From | Robert Wojciechowski |
|---|---|
| Subject | Change sort order on UUIDs? |
| Date | |
| Msg-id | 85D4F2C294E8434CA0AF7757415326864AA826@server1.ssgi.local Whole thread Raw |
| Responses |
Re: Change sort order on UUIDs?
Re: Change sort order on UUIDs? Re: Change sort order on UUIDs? |
| List | pgsql-hackers |
<div class="Section1"><p class="MsoNormal">I’ve been testing the new UUID functionality in 8.3dev and noticed that
UUIDsare sorted using memcmp in their default in-memory layout, which is:<p class="MsoNormal"> <p
class="MsoNormal"> struct uuid {<p class="MsoNormal"> uint32_t time_low;<p
class="MsoNormal"> uint16_t time_mid;<p class="MsoNormal"> uint16_t
time_hi_and_version;<pclass="MsoNormal"> uint8_t clock_seq_hi_and_reserved;<p
class="MsoNormal"> uint8_t clock_seq_low;<p class="MsoNormal"> uint8_t
node[_UUID_NODE_LEN];<pclass="MsoNormal"> };<p class="MsoNormal"> <p class="MsoNormal">When done that way, you’re
goingto see a lot of index B-tree fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs, as described
above.With random (version 4) or hashed based (version 3 or 5) UUIDs there’s nothing that can be done to improve the
situation,obviously.<p class="MsoNormal"> <p class="MsoNormal">So I went down the path of changing the pgsql sorting
orderto instead sort by, from most significant to least:<p class="MsoNormal"> <p class="MsoListParagraph"
style="margin-left:22.5pt;text-indent:-.25in;
mso-list:l0 level1 lfo1"><span style="mso-list:Ignore">1)<span style="font:7.0pt "Times New Roman"">
</span></span>Node(MAC address),<p class="MsoListParagraph" style="margin-left:22.5pt;text-indent:-.25in;
mso-list:l0 level1 lfo1"><span style="mso-list:Ignore">2)<span style="font:7.0pt "Times New Roman"">
</span></span>clocksequence, then<p class="MsoListParagraph" style="margin-left:22.5pt;text-indent:-.25in;
mso-list:l0 level1 lfo1"><span style="mso-list:Ignore">3)<span style="font:7.0pt "Times New Roman"">
</span></span>time.<pclass="MsoNormal"> <p class="MsoNormal">The implementation is as follows:<p class="MsoNormal"> <p
class="MsoNormal">/*internal uuid compare function */<p class="MsoNormal">static int<p
class="MsoNormal">uuid_internal_cmp(constpg_uuid_t *arg1, const pg_uuid_t *arg2)<p class="MsoNormal">{<p
class="MsoNormal"> int result;<p class="MsoNormal"> <p class="MsoNormal"> /* node */<p class="MsoNormal"> if ((result
=memcmp(&arg1->data[10], &arg2->data[10], 6)) != 0)<p class="MsoNormal"> return result;<p
class="MsoNormal"> <pclass="MsoNormal"> /* clock_seq_hi_and_reserved, clock_seq_low */<p class="MsoNormal"> if
((result= memcmp(&arg1->data[8], &arg2->data[8], 2)) != 0)<p class="MsoNormal"> return result;<p
class="MsoNormal"> <pclass="MsoNormal"> /* time_hi_and_version */<p class="MsoNormal"> if ((result =
memcmp(&arg1->data[6],&arg2->data[6], 2)) != 0)<p class="MsoNormal"> return result;<p
class="MsoNormal"> <pclass="MsoNormal"> /* time_mid */<p class="MsoNormal"> if ((result =
memcmp(&arg1->data[4],&arg2->data[4], 2)) != 0)<p class="MsoNormal"> return result;<p
class="MsoNormal"> <pclass="MsoNormal"> /* time_low */<p class="MsoNormal"> return memcmp(&arg1->data[0],
&arg2->data[0],4);<p class="MsoNormal">}<p class="MsoNormal"> <p class="MsoNormal">This results in much less
fragmentationand reduced page hits when indexing a UUID column. When multiple UUID generators with different node
valuescontribute to a single table concurrently, it should also result in better performance than if they sorted the
waythey do now or by time first.<p class="MsoNormal"> <p class="MsoNormal">Sorting UUIDs when they are random/hashed
withmemcmp seems pretty darn useless in all scenarios and performs poorly on indexes. This method is equally poor with
random/hashedUUIDs, but much better with version 1 time based UUIDs.<p class="MsoNormal"> <p class="MsoNormal">What do
youguys think about changing the default behavior of pgsql to compare UUIDs this way?<p class="MsoNormal"> <p
class="MsoNormal">--Robert</div>
pgsql-hackers by date: