Thread: Weird procedure question
Hi all, I've got an interesting use case that I am stuck on. It's a bit of a complicated environment, but I'll try to keep it simple. In short; I have a history schema that has tables that match the public schema, plus one 'history_id' column that has a simple sequential bigserial value. Then I have a procedure and trigger that, on UPDATE or INSERT, copies the data to history. Example use case is that I can UPDATE a sensor value in the public table and it's also INSERTs the data into history. So public shows just the most recent values, but I can see changes over time in the history schema. I have built my system to support writing to one or more DBs. I keep a list of connected DBs and send INSERT/UPDATE calls to a method that then runs the UPDATE/INSERT against all connected databases, as a form of redundancy. This all works fine. The problem I've hit is that the 'history_id' differs between the various databases. So I want to switch this to 'history_uuid' and use UUIDs instead of bigserial. Now the question; Can I tell a produce to use a specific UUID? The idea is to generate a UUID for 'history_uuid' so that I have consistency across databases. Of course, if an UPDATE will change multiple rows, then I'll need to predefine multiple UUIDs. This is where things start to get really complicated I think... Maybe I could pass an array of UUIDs? I don't care if I find out which UUID was used for which record, just that the same UUID was used for the same record when the procedure is (re)run on other DBs. The databases are not clustered, on purpose. I've been trying to handle all the HA stuff in my application for various reasons. If it helps, here is an example pair of tables, the procedure and the trigger I currently use; ==== CREATE TABLE host_variable ( host_variable_uuid uuid not null primary key, host_variable_host_uuid uuid not null, host_variable_name text not null, host_variable_value text not null, modified_date timestamp with time zone not null ); ALTER TABLE host_variable OWNER TO admin; CREATE TABLE history.host_variable ( history_id bigserial, host_variable_uuid uuid, host_variable_host_uuid uuid, host_variable_name text, host_variable_value text, modified_date timestamp with time zone not null ); ALTER TABLE history.host_variable OWNER TO admin; CREATE FUNCTION history_host_variable() RETURNS trigger AS $$ DECLARE history_host_variable RECORD; BEGIN SELECT INTO history_host_variable * FROM host_variable WHERE host_uuid = new.host_uuid; INSERT INTO history.host_variable (host_variable_uuid, host_variable_host_uuid, host_variable_name, host_variable_value, modified_date) VALUES (history_host_variable.host_variable_uuid, history_host_variable.host_variable_host_uuid, history_host_variable.host_variable_name, history_host_variable.host_variable_value, history_host_variable.modified_date); RETURN NULL; END; $$ LANGUAGE plpgsql; ALTER FUNCTION history_host_variable() OWNER TO admin; CREATE TRIGGER trigger_host_variable AFTER INSERT OR UPDATE ON host_variable FOR EACH ROW EXECUTE PROCEDURE history_host_variable(); ==== I know this might sound odd, but I didn't want to complicate things with how my system works. However, if it would help solve the problem, I'm happy to dig into more detail. Thanks!
On 2018-09-25 1:22 a.m., digimer wrote: > Can I tell a produce to use a specific UUID? s/produce/procedure/
Do you need a single field for the pk or can you just make it the (original_table_pk, modified_time)? Alternatively, you could generate a uuid v3 from the (original_table_pk, modified_time) using something like uuid_generate_v3(uuid_nil(), original_table_pk || ":" || modified_time)?
On 2018-09-25 1:33 a.m., James Keener wrote: > Do you need a single field for the pk or can you just make it the > (original_table_pk, modified_time)? Alternatively, you could generate > a uuid v3 from the (original_table_pk, modified_time) using something > like uuid_generate_v3(uuid_nil(), original_table_pk || ":" || > modified_time)? I need to preset the modified_time, I can't use now() or else the value would differ between databases. Also, unless I am missing something, uuid_generate_v3() would generate a different UUID per trigger of the procedure, so I'd end up with different history_uuids on each database that I ran the query against. If I am missing something (and entirely possible I am), please hit me with a clue stick. :) digimer
v3 UUIDs are basically MD5 hashes (v5 is sha1?). So for the same input you'll always get the same hash.
I had assumed the modified time would be the same; if that's not, then I'm not sure and my gut tells me this becomes A Really Hard Problem™.
Jim
On Tue, Sep 25, 2018 at 1:38 AM digimer <lists@alteeve.ca> wrote:
On 2018-09-25 1:33 a.m., James Keener wrote:
> Do you need a single field for the pk or can you just make it the
> (original_table_pk, modified_time)? Alternatively, you could generate
> a uuid v3 from the (original_table_pk, modified_time) using something
> like uuid_generate_v3(uuid_nil(), original_table_pk || ":" ||
> modified_time)?
I need to preset the modified_time, I can't use now() or else the value
would differ between databases. Also, unless I am missing something,
uuid_generate_v3() would generate a different UUID per trigger of the
procedure, so I'd end up with different history_uuids on each database
that I ran the query against.
If I am missing something (and entirely possible I am), please hit me
with a clue stick. :)
digimer
Also, modified time doesn't need to be the current time, if it starts as "null" and is set on the first update, and all subsequent updates, the pre-update modified time could be used to help key the history pk.
Jim
On Tue, Sep 25, 2018 at 1:45 AM James Keener <jim@jimkeener.com> wrote:
v3 UUIDs are basically MD5 hashes (v5 is sha1?). So for the same input you'll always get the same hash.I had assumed the modified time would be the same; if that's not, then I'm not sure and my gut tells me this becomes A Really Hard Problem™.JimOn Tue, Sep 25, 2018 at 1:38 AM digimer <lists@alteeve.ca> wrote:On 2018-09-25 1:33 a.m., James Keener wrote:
> Do you need a single field for the pk or can you just make it the
> (original_table_pk, modified_time)? Alternatively, you could generate
> a uuid v3 from the (original_table_pk, modified_time) using something
> like uuid_generate_v3(uuid_nil(), original_table_pk || ":" ||
> modified_time)?
I need to preset the modified_time, I can't use now() or else the value
would differ between databases. Also, unless I am missing something,
uuid_generate_v3() would generate a different UUID per trigger of the
procedure, so I'd end up with different history_uuids on each database
that I ran the query against.
If I am missing something (and entirely possible I am), please hit me
with a clue stick. :)
digimer
Oooooh, this is a very interesting approach! I didn't realize any UUIDs could be created in a predictable way. Thank you, this might be what I need.
digimer
On 2018-09-25 1:47 a.m., James Keener wrote:
Also, modified time doesn't need to be the current time, if it starts as "null" and is set on the first update, and all subsequent updates, the pre-update modified time could be used to help key the history pk.JimOn Tue, Sep 25, 2018 at 1:45 AM James Keener <jim@jimkeener.com> wrote:v3 UUIDs are basically MD5 hashes (v5 is sha1?). So for the same input you'll always get the same hash.I had assumed the modified time would be the same; if that's not, then I'm not sure and my gut tells me this becomes A Really Hard Problem™.JimOn Tue, Sep 25, 2018 at 1:38 AM digimer <lists@alteeve.ca> wrote:On 2018-09-25 1:33 a.m., James Keener wrote:
> Do you need a single field for the pk or can you just make it the
> (original_table_pk, modified_time)? Alternatively, you could generate
> a uuid v3 from the (original_table_pk, modified_time) using something
> like uuid_generate_v3(uuid_nil(), original_table_pk || ":" ||
> modified_time)?
I need to preset the modified_time, I can't use now() or else the value
would differ between databases. Also, unless I am missing something,
uuid_generate_v3() would generate a different UUID per trigger of the
procedure, so I'd end up with different history_uuids on each database
that I ran the query against.
If I am missing something (and entirely possible I am), please hit me
with a clue stick. :)
digimer
digimer <lists@alteeve.ca> writes: > Hi all, > > I've got an interesting use case that I am stuck on. It's a bit of a > complicated environment, but I'll try to keep it simple. > > In short; I have a history schema that has tables that match the > public schema, plus one 'history_id' column that has a simple sequential > bigserial value. Then I have a procedure and trigger that, on UPDATE or > INSERT, copies the data to history. Example use case is that I can > UPDATE a sensor value in the public table and it's also INSERTs the data > into history. So public shows just the most recent values, but I can see > changes over time in the history schema. > > I have built my system to support writing to one or more DBs. I keep > a list of connected DBs and send INSERT/UPDATE calls to a method that > then runs the UPDATE/INSERT against all connected databases, as a form > of redundancy. This all works fine. > > The problem I've hit is that the 'history_id' differs between the > various databases. So I want to switch this to 'history_uuid' and use > UUIDs instead of bigserial. > > Now the question; > > Can I tell a produce to use a specific UUID? > > The idea is to generate a UUID for 'history_uuid' so that I have > consistency across databases. Of course, if an UPDATE will change > multiple rows, then I'll need to predefine multiple UUIDs. This is where > things start to get really complicated I think... Maybe I could pass an > array of UUIDs? I don't care if I find out which UUID was used for which > record, just that the same UUID was used for the same record when the > procedure is (re)run on other DBs. > > The databases are not clustered, on purpose. I've been trying to > handle all the HA stuff in my application for various reasons. > > If it helps, here is an example pair of tables, the procedure and the > trigger I currently use; > > ==== > CREATE TABLE host_variable ( > host_variable_uuid uuid not null primary key, > host_variable_host_uuid uuid not null, > host_variable_name text not null, > host_variable_value text not null, > modified_date timestamp with time zone not null > ); > ALTER TABLE host_variable OWNER TO admin; > > CREATE TABLE history.host_variable ( > history_id bigserial, > host_variable_uuid uuid, > host_variable_host_uuid uuid, > host_variable_name text, > host_variable_value text, > modified_date timestamp with time zone not null > ); > ALTER TABLE history.host_variable OWNER TO admin; > > CREATE FUNCTION history_host_variable() RETURNS trigger > AS $$ > DECLARE > history_host_variable RECORD; > BEGIN > SELECT INTO history_host_variable * FROM host_variable WHERE > host_uuid = new.host_uuid; > INSERT INTO history.host_variable > (host_variable_uuid, > host_variable_host_uuid, > host_variable_name, > host_variable_value, > modified_date) > VALUES > (history_host_variable.host_variable_uuid, > history_host_variable.host_variable_host_uuid, > history_host_variable.host_variable_name, > history_host_variable.host_variable_value, > history_host_variable.modified_date); > RETURN NULL; > END; > $$ > LANGUAGE plpgsql; > ALTER FUNCTION history_host_variable() OWNER TO admin; > > CREATE TRIGGER trigger_host_variable > AFTER INSERT OR UPDATE ON host_variable > FOR EACH ROW EXECUTE PROCEDURE history_host_variable(); > ==== > > I know this might sound odd, but I didn't want to complicate things > with how my system works. However, if it would help solve the problem, > I'm happy to dig into more detail. > > Thanks! I think James has probably given you the input you need - basically, don't allow the system to automatically set the modified time - make that parameter to your function or set that value before the copy to the history tables - content would then be the same, so uuid v3 should work. However, I do think you have another big problem lurking in the shadows. What happens if any of your connected databases are unavailable or unreachable for a period of time? I suspect your going to run into update anomalies and depending on your setup/environment, possibly even partitioning problems (depending on number of clients and typology etc). These are well known problems in distributed or replication systems. You appear to be implementing a 'poor mans' replication system. There are lots of complex issues to deal with and I wonder why you want to take them on when PG has already got well tested and robust solutions for this that would simplify your architecture and avoid the need to re-implement functionality which already exists? regards, Tim -- Tim Cross
On 2018-09-25 6:22 p.m., Tim Cross wrote: > digimer <lists@alteeve.ca> writes: > >> Hi all, >> >> I've got an interesting use case that I am stuck on. It's a bit of a >> complicated environment, but I'll try to keep it simple. >> >> In short; I have a history schema that has tables that match the >> public schema, plus one 'history_id' column that has a simple sequential >> bigserial value. Then I have a procedure and trigger that, on UPDATE or >> INSERT, copies the data to history. Example use case is that I can >> UPDATE a sensor value in the public table and it's also INSERTs the data >> into history. So public shows just the most recent values, but I can see >> changes over time in the history schema. >> >> I have built my system to support writing to one or more DBs. I keep >> a list of connected DBs and send INSERT/UPDATE calls to a method that >> then runs the UPDATE/INSERT against all connected databases, as a form >> of redundancy. This all works fine. >> >> The problem I've hit is that the 'history_id' differs between the >> various databases. So I want to switch this to 'history_uuid' and use >> UUIDs instead of bigserial. >> >> Now the question; >> >> Can I tell a produce to use a specific UUID? >> >> The idea is to generate a UUID for 'history_uuid' so that I have >> consistency across databases. Of course, if an UPDATE will change >> multiple rows, then I'll need to predefine multiple UUIDs. This is where >> things start to get really complicated I think... Maybe I could pass an >> array of UUIDs? I don't care if I find out which UUID was used for which >> record, just that the same UUID was used for the same record when the >> procedure is (re)run on other DBs. >> >> The databases are not clustered, on purpose. I've been trying to >> handle all the HA stuff in my application for various reasons. >> >> If it helps, here is an example pair of tables, the procedure and the >> trigger I currently use; >> >> ==== >> CREATE TABLE host_variable ( >> host_variable_uuid uuid not null primary key, >> host_variable_host_uuid uuid not null, >> host_variable_name text not null, >> host_variable_value text not null, >> modified_date timestamp with time zone not null >> ); >> ALTER TABLE host_variable OWNER TO admin; >> >> CREATE TABLE history.host_variable ( >> history_id bigserial, >> host_variable_uuid uuid, >> host_variable_host_uuid uuid, >> host_variable_name text, >> host_variable_value text, >> modified_date timestamp with time zone not null >> ); >> ALTER TABLE history.host_variable OWNER TO admin; >> >> CREATE FUNCTION history_host_variable() RETURNS trigger >> AS $$ >> DECLARE >> history_host_variable RECORD; >> BEGIN >> SELECT INTO history_host_variable * FROM host_variable WHERE >> host_uuid = new.host_uuid; >> INSERT INTO history.host_variable >> (host_variable_uuid, >> host_variable_host_uuid, >> host_variable_name, >> host_variable_value, >> modified_date) >> VALUES >> (history_host_variable.host_variable_uuid, >> history_host_variable.host_variable_host_uuid, >> history_host_variable.host_variable_name, >> history_host_variable.host_variable_value, >> history_host_variable.modified_date); >> RETURN NULL; >> END; >> $$ >> LANGUAGE plpgsql; >> ALTER FUNCTION history_host_variable() OWNER TO admin; >> >> CREATE TRIGGER trigger_host_variable >> AFTER INSERT OR UPDATE ON host_variable >> FOR EACH ROW EXECUTE PROCEDURE history_host_variable(); >> ==== >> >> I know this might sound odd, but I didn't want to complicate things >> with how my system works. However, if it would help solve the problem, >> I'm happy to dig into more detail. >> >> Thanks! > I think James has probably given you the input you need - basically, > don't allow the system to automatically set the modified time - make > that parameter to your function or set that value before the copy to the > history tables - content would then be the same, so uuid v3 should work. > > However, I do think you have another big problem lurking in the > shadows. What happens if any of your connected databases are unavailable > or unreachable for a period of time? I suspect your going to run into > update anomalies and depending on your setup/environment, possibly even > partitioning problems (depending on number of clients and typology > etc). These are well known problems in distributed or replication > systems. > > You appear to be implementing a 'poor mans' replication system. There > are lots of complex issues to deal with and I wonder why you want to > take them on when PG has already got well tested and robust solutions > for this that would simplify your architecture and avoid the need to > re-implement functionality which already exists? > > regards, > > Tim > Hi Tim, Last I checked, pgsql couldn't handle this; Two DBs up, getting data. DB1 goes down, DB2 continues to collect data. DB2 goes down DB1 comes back up, starts collecting data. DB2 comes back up, now I need to move data in both directions (DB1 has data 2 doesn't and vice-versa). I've created a way to resolve this in my application and it's worked for some time (obviously, in my application only. It's not a general purpose system nor is it intended to be). For the record, I realized I was looking for a complex solution to a simple problem. I do create the 'modified_date' value in my app, and I just needed to refresh it between UPDATEs/INSERTs on the same column so that no two records in the history table have the same 'modified_date'. With that, my resync works again. Cheers, digimer
On 2018-09-25 6:41 p.m., digimer wrote: > On 2018-09-25 6:22 p.m., Tim Cross wrote: >> digimer <lists@alteeve.ca> writes: >> >>> Hi all, >>> >>> I've got an interesting use case that I am stuck on. It's a bit of a >>> complicated environment, but I'll try to keep it simple. >>> >>> In short; I have a history schema that has tables that match the >>> public schema, plus one 'history_id' column that has a simple >>> sequential >>> bigserial value. Then I have a procedure and trigger that, on UPDATE or >>> INSERT, copies the data to history. Example use case is that I can >>> UPDATE a sensor value in the public table and it's also INSERTs the >>> data >>> into history. So public shows just the most recent values, but I can >>> see >>> changes over time in the history schema. >>> >>> I have built my system to support writing to one or more DBs. I keep >>> a list of connected DBs and send INSERT/UPDATE calls to a method that >>> then runs the UPDATE/INSERT against all connected databases, as a form >>> of redundancy. This all works fine. >>> >>> The problem I've hit is that the 'history_id' differs between the >>> various databases. So I want to switch this to 'history_uuid' and use >>> UUIDs instead of bigserial. >>> >>> Now the question; >>> >>> Can I tell a produce to use a specific UUID? >>> >>> The idea is to generate a UUID for 'history_uuid' so that I have >>> consistency across databases. Of course, if an UPDATE will change >>> multiple rows, then I'll need to predefine multiple UUIDs. This is >>> where >>> things start to get really complicated I think... Maybe I could pass an >>> array of UUIDs? I don't care if I find out which UUID was used for >>> which >>> record, just that the same UUID was used for the same record when the >>> procedure is (re)run on other DBs. >>> >>> The databases are not clustered, on purpose. I've been trying to >>> handle all the HA stuff in my application for various reasons. >>> >>> If it helps, here is an example pair of tables, the procedure and the >>> trigger I currently use; >>> >>> ==== >>> CREATE TABLE host_variable ( >>> host_variable_uuid uuid not null primary key, >>> host_variable_host_uuid uuid not null, >>> host_variable_name text not null, >>> host_variable_value text not null, >>> modified_date timestamp with time zone not null >>> ); >>> ALTER TABLE host_variable OWNER TO admin; >>> >>> CREATE TABLE history.host_variable ( >>> history_id bigserial, >>> host_variable_uuid uuid, >>> host_variable_host_uuid uuid, >>> host_variable_name text, >>> host_variable_value text, >>> modified_date timestamp with time zone not null >>> ); >>> ALTER TABLE history.host_variable OWNER TO admin; >>> >>> CREATE FUNCTION history_host_variable() RETURNS trigger >>> AS $$ >>> DECLARE >>> history_host_variable RECORD; >>> BEGIN >>> SELECT INTO history_host_variable * FROM host_variable WHERE >>> host_uuid = new.host_uuid; >>> INSERT INTO history.host_variable >>> (host_variable_uuid, >>> host_variable_host_uuid, >>> host_variable_name, >>> host_variable_value, >>> modified_date) >>> VALUES >>> (history_host_variable.host_variable_uuid, >>> history_host_variable.host_variable_host_uuid, >>> history_host_variable.host_variable_name, >>> history_host_variable.host_variable_value, >>> history_host_variable.modified_date); >>> RETURN NULL; >>> END; >>> $$ >>> LANGUAGE plpgsql; >>> ALTER FUNCTION history_host_variable() OWNER TO admin; >>> >>> CREATE TRIGGER trigger_host_variable >>> AFTER INSERT OR UPDATE ON host_variable >>> FOR EACH ROW EXECUTE PROCEDURE history_host_variable(); >>> ==== >>> >>> I know this might sound odd, but I didn't want to complicate things >>> with how my system works. However, if it would help solve the problem, >>> I'm happy to dig into more detail. >>> >>> Thanks! >> I think James has probably given you the input you need - basically, >> don't allow the system to automatically set the modified time - make >> that parameter to your function or set that value before the copy to the >> history tables - content would then be the same, so uuid v3 should work. >> >> However, I do think you have another big problem lurking in the >> shadows. What happens if any of your connected databases are unavailable >> or unreachable for a period of time? I suspect your going to run into >> update anomalies and depending on your setup/environment, possibly even >> partitioning problems (depending on number of clients and typology >> etc). These are well known problems in distributed or replication >> systems. >> >> You appear to be implementing a 'poor mans' replication system. There >> are lots of complex issues to deal with and I wonder why you want to >> take them on when PG has already got well tested and robust solutions >> for this that would simplify your architecture and avoid the need to >> re-implement functionality which already exists? >> >> regards, >> >> Tim >> > Hi Tim, > > Last I checked, pgsql couldn't handle this; > > Two DBs up, getting data. > DB1 goes down, DB2 continues to collect data. > DB2 goes down > DB1 comes back up, starts collecting data. > DB2 comes back up, now I need to move data in both directions (DB1 has > data 2 doesn't and vice-versa). > > I've created a way to resolve this in my application and it's worked > for some time (obviously, in my application only. It's not a general > purpose system nor is it intended to be). > > For the record, I realized I was looking for a complex solution to a > simple problem. I do create the 'modified_date' value in my app, and I > just needed to refresh it between UPDATEs/INSERTs on the same column > so that no two records in the history table have the same > 'modified_date'. With that, my resync works again. > > Cheers, > > digimer I should mention, I used two DBs in the example, but it could be 3 or more, and I need to resync in as many directions at once as needed. So it's a "poor person's" N-way replication cluster, as you guessed. I'd love to use existing tools if they actually exist. digimer
On Tue, Sep 25, 2018, 2:19 AM digimer <lists@alteeve.ca> wrote:
Oooooh, this is a very interesting approach! I didn't realize any UUIDs could be created in a predictable way. Thank you, this might be what I need.
Yep, DCE defined about 5 different versions of UUIDs, each with somewhat differing characteristics.
Versions 3 and 5 generate repeatable values, which is possibly what you are after.
1 and 2 used timestamp info plus node. At one time MAC addresses were used as node info, but that seems to have fallen into disrepute. (I think because it leaks network identifier info)
It's worth struggling through understanding them; the variations certainly are useful.
I think I have a function around to generate time-based UUIDs based on giving the timestamp as parameter; I'll see if I can dig that out. That would allow writing up old history with UUIDs that look old. Bug me off list if that sounds interesting.