Change sort order on UUIDs? - Mailing list pgsql-hackers
From | Robert Wojciechowski |
---|---|
Subject | Change sort order on UUIDs? |
Date | |
Msg-id | 85D4F2C294E8434CA0AF7757415326864AA826@server1.ssgi.local Whole thread Raw |
Responses |
Re: Change sort order on UUIDs?
Re: Change sort order on UUIDs? Re: Change sort order on UUIDs? |
List | pgsql-hackers |
<div class="Section1"><p class="MsoNormal">I’ve been testing the new UUID functionality in 8.3dev and noticed that UUIDsare sorted using memcmp in their default in-memory layout, which is:<p class="MsoNormal"> <p class="MsoNormal"> struct uuid {<p class="MsoNormal"> uint32_t time_low;<p class="MsoNormal"> uint16_t time_mid;<p class="MsoNormal"> uint16_t time_hi_and_version;<pclass="MsoNormal"> uint8_t clock_seq_hi_and_reserved;<p class="MsoNormal"> uint8_t clock_seq_low;<p class="MsoNormal"> uint8_t node[_UUID_NODE_LEN];<pclass="MsoNormal"> };<p class="MsoNormal"> <p class="MsoNormal">When done that way, you’re goingto see a lot of index B-tree fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs, as described above.With random (version 4) or hashed based (version 3 or 5) UUIDs there’s nothing that can be done to improve the situation,obviously.<p class="MsoNormal"> <p class="MsoNormal">So I went down the path of changing the pgsql sorting orderto instead sort by, from most significant to least:<p class="MsoNormal"> <p class="MsoListParagraph" style="margin-left:22.5pt;text-indent:-.25in; mso-list:l0 level1 lfo1"><span style="mso-list:Ignore">1)<span style="font:7.0pt "Times New Roman""> </span></span>Node(MAC address),<p class="MsoListParagraph" style="margin-left:22.5pt;text-indent:-.25in; mso-list:l0 level1 lfo1"><span style="mso-list:Ignore">2)<span style="font:7.0pt "Times New Roman""> </span></span>clocksequence, then<p class="MsoListParagraph" style="margin-left:22.5pt;text-indent:-.25in; mso-list:l0 level1 lfo1"><span style="mso-list:Ignore">3)<span style="font:7.0pt "Times New Roman""> </span></span>time.<pclass="MsoNormal"> <p class="MsoNormal">The implementation is as follows:<p class="MsoNormal"> <p class="MsoNormal">/*internal uuid compare function */<p class="MsoNormal">static int<p class="MsoNormal">uuid_internal_cmp(constpg_uuid_t *arg1, const pg_uuid_t *arg2)<p class="MsoNormal">{<p class="MsoNormal"> int result;<p class="MsoNormal"> <p class="MsoNormal"> /* node */<p class="MsoNormal"> if ((result =memcmp(&arg1->data[10], &arg2->data[10], 6)) != 0)<p class="MsoNormal"> return result;<p class="MsoNormal"> <pclass="MsoNormal"> /* clock_seq_hi_and_reserved, clock_seq_low */<p class="MsoNormal"> if ((result= memcmp(&arg1->data[8], &arg2->data[8], 2)) != 0)<p class="MsoNormal"> return result;<p class="MsoNormal"> <pclass="MsoNormal"> /* time_hi_and_version */<p class="MsoNormal"> if ((result = memcmp(&arg1->data[6],&arg2->data[6], 2)) != 0)<p class="MsoNormal"> return result;<p class="MsoNormal"> <pclass="MsoNormal"> /* time_mid */<p class="MsoNormal"> if ((result = memcmp(&arg1->data[4],&arg2->data[4], 2)) != 0)<p class="MsoNormal"> return result;<p class="MsoNormal"> <pclass="MsoNormal"> /* time_low */<p class="MsoNormal"> return memcmp(&arg1->data[0], &arg2->data[0],4);<p class="MsoNormal">}<p class="MsoNormal"> <p class="MsoNormal">This results in much less fragmentationand reduced page hits when indexing a UUID column. When multiple UUID generators with different node valuescontribute to a single table concurrently, it should also result in better performance than if they sorted the waythey do now or by time first.<p class="MsoNormal"> <p class="MsoNormal">Sorting UUIDs when they are random/hashed withmemcmp seems pretty darn useless in all scenarios and performs poorly on indexes. This method is equally poor with random/hashedUUIDs, but much better with version 1 time based UUIDs.<p class="MsoNormal"> <p class="MsoNormal">What do youguys think about changing the default behavior of pgsql to compare UUIDs this way?<p class="MsoNormal"> <p class="MsoNormal">--Robert</div>
pgsql-hackers by date: