Thread: How to do faster DML
Hello All,

We have a non-partitioned table with ~4.8 billion rows, ~1.4 TB of data, an average row size of ~313 bytes, and ~127 columns. Roughly 1 billion+ duplicate rows were inserted into it. We want to remove the duplicate data and then add a PK/unique constraint back so duplicate values cannot appear in future. We are struggling to do this.

Teammates suggested doing it with the CTAS method, i.e. create a new table with the unique record set and then drop the main table. Something like this:

create table TAB1_New
as
SELECT * from TAB1 A
where ID in
(select min(ID) from TAB1
group by ID having count(ID)>=1 );

But for the above to work faster, they suggested creating an index on the column used for the duplicate check, i.e. the ID column. Creating that index alone took ~2 hours+ and the index size now shows as ~116GB:

Create index idx1 on TAB1(ID)

We then ran the SELECT subquery part of the CTAS statement to check whether it returns the correct unique record count. It ran for 2.5 hours and then we killed it. Below is its plan:

explain select min(ID) from TAB1 A
group by ID having count(ID)>=1

GroupAggregate (cost=0.71..6025790113.87 rows=29432861 width=46)
  Group Key: ID
  Filter: (count(ID) >= 1)
  ->  Index Only Scan using idx1 on TAB1 a (cost=0.71..5988060903.17 rows=4883397120 width=14)

I want to understand whether this can be done faster in any way. I am also worried that recreating the PK constraint/index after deleting the duplicates is going to run forever. Is there any way to make these heavy operations faster on Postgres by giving them more database resources through some parameter setup, like a parallel hint? We have the pg_hint_plan extension installed, but the parallel hint is not being enforced when we add it to the query.

In Oracle we have parallel hints, direct-path read/write for faster read/write operations, parallel index scans, etc. Is anything similar available in Aurora PostgreSQL to give batch operations more horsepower and speed them up? And how can we monitor the progress of a running query?

Just to note: it's a db.r7g.8xlarge Aurora PostgreSQL instance, 32 vCPU, 256 GB RAM, PG version 15.4.

Regards
Lok
Apologies, one correction: the query is as below, i.e. the filter is on ctid, which I believe is the equivalent of Oracle's rowid, so we won't need the index on the ID column. But it still runs long, so is there any other way to make the duplicate removal faster?

Also, could the index creation that took ~2.5 hours+ have been made faster in any way by allowing more database resources through some session-level parameter settings?

create table TAB1_New
as
SELECT * from TAB1 A
where CTID in
(select min(CTID) from TAB1
group by ID having count(ID)>=1 );
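A minimal sketch of the same rewrite-and-swap idea using DISTINCT ON, which keeps one arbitrary physical row per id without the IN (subquery). The table and column names are taken from the thread; everything else is illustrative and untested at this scale.

-- Sketch only: assumes "id" is the de-duplication key and that keeping an
-- arbitrary surviving row per id is acceptable.
CREATE TABLE tab1_new (LIKE tab1 INCLUDING DEFAULTS);

INSERT INTO tab1_new
SELECT DISTINCT ON (id) *
FROM tab1
ORDER BY id;                 -- DISTINCT ON keeps the first row per id in this ordering

-- After verifying row counts, swap the tables and add the constraint:
-- ALTER TABLE tab1 RENAME TO tab1_old;
-- ALTER TABLE tab1_new RENAME TO tab1;
-- ALTER TABLE tab1 ADD CONSTRAINT tab1_pk PRIMARY KEY (id);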
Aurora is not PostgreSQL, so configurations might not work. Having said that: have you analyzed the table lately? Also, what are your work_mem and maintenance_work_mem settings?
show max_parallel_workers_per_gather;- 4
show max_parallel_maintenance_workers; - 2
show maintenance_work_mem; - 4155MB
show work_mem; - 8MB
show shared_buffers ; -22029684
SET max_parallel_maintenance_workers TO 16;
SET maintenance_work_mem TO '16 GB';
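Those two settings are typically applied in the session that runs the index build itself; a sketch of how such a session might look (the values and the use of a plain, blocking CREATE INDEX are illustrative assumptions, not from the thread):

-- Sketch: speed up a large btree build by giving the session more memory and
-- more parallel workers (capped by max_parallel_workers / max_worker_processes).
SET maintenance_work_mem = '16GB';
SET max_parallel_maintenance_workers = 16;

CREATE INDEX idx1 ON tab1 (id);

RESET maintenance_work_mem;
RESET max_parallel_maintenance_workers;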
On 2024-02-03 19:25:12 +0530, Lok P wrote:
> create table TAB1_New
> as
> SELECT * from TAB1 A
> where CTID in
> (select min(CTID) from TAB1
> group by ID having count(ID)>=1 );

That »having count(ID)>=1« seems redundant to me. Surely every id which occurs in the table occurs at least once?

Since you want ID to be unique I assume that it is already almost unique - so only a small fraction of the ids will be duplicates. So I would start with creating a list of duplicates:

create table tab1_dups as
select id, count(*)
from tab1
group by id
having count(*) > 1;

This will still take some time because it needs to build a temporary structure large enough to hold a count for each individual id. But at least then you'll have a much smaller table to use for further cleanup.

hp
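One way the duplicates list could then be used for the cleanup itself; a sketch under the assumption that keeping an arbitrary one of each duplicated id is fine. (Deleting ~1 billion rows in place will still bloat the table, so the CTAS route discussed above may remain preferable.)

-- Sketch: delete every row that has a same-id sibling with a larger ctid,
-- restricted to the ids listed in tab1_dups. Relies on the tid comparison
-- operators available in recent PostgreSQL releases (including 15).
DELETE FROM tab1 a
USING tab1 b
WHERE a.id = b.id
  AND a.ctid < b.ctid
  AND a.id IN (SELECT id FROM tab1_dups);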
On Sat, 3 Feb 2024 at 19:29, Greg Sabino Mullane <htamfids@gmail.com> wrote:
...
> Given the size of your table, you probably want to divide that up.
> As long as nothing is changing the original table, you could do:
>
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 0;
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 10_000_000;
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 20_000_000;
Is it me or does this have the faint smell of quadratic behaviour? I
mean, you need to read and discard the first 10M to do offset 10M (
also I believe ctid order is implied in sequential scan, but not so
sure, if not it would need a full table sort on each pass ).
When doing things like this, I normally have some kind of unique index
and do it by selecting with limit above the last read value( stored
when doing it via code, as I normally load first, index later so I
cannot select max() fast on the target ). Or, with some kind of
"sparse" index (meaning, groups much less than the batch size ) and a
little code you can select where index_col > last order by index_col
limit 10M, remember last received index_col and reselect discarding
missing ( or just reselect it when doing your on conflict do nothing,
which also has a fast select max(id) due to the PK, it will work if it
has an index on id column on the original even if not unique ) to
avoid that.
Also, I'm not sure if ctid is ordered and you can select where
ctid>last ordered, if that works it probably is faster for immutable
origins.
Francisco Olarte.
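A sketch of the keyset-style batching Francisco describes, assuming the existing idx1 index on the source and a unique/PK index on the target; :last_id stands for the value remembered from the previous batch (for example in a driving script).

-- Each pass reads only the next slice of the id range instead of re-reading
-- and discarding rows the way OFFSET does.
INSERT INTO tab1_new
SELECT DISTINCT ON (id) *
FROM tab1
WHERE id > :last_id              -- start with a value below the smallest id
ORDER BY id
LIMIT 10000000;

-- Cheap thanks to the index on tab1_new(id); feed the result back as :last_id.
SELECT max(id) FROM tab1_new;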
On 2024-02-04 02:14:20 +0530, Lok P wrote:
> However, as we have ~5 billion rows in the base table and out of that, we
> were expecting almost half, i.e. ~2 billion, would be duplicates.

That's way more than I expected from your original description. And it of course raises the question whether it's a good idea to just throw away all that data or if you need to keep it in a normalized way.

> I am still trying to understand the other approach which you suggested. Not
> able to understand "you can select where index_col > last order by index_col
> limit 10M".
>
> However, to get the max ID value of the last 10M loaded rows in target, do
> you say that having a PK index created on that target table column (ID) will
> help,

Yes. Getting the maximum value from an index is a very fast operation. You just have to traverse down the right edge of the tree (or you may even be able to access the right-most leaf page directly).

> and we can save the max(ID) value subsequently in another table to fetch
> and keep loading from the source table (as ID > Max_ID stored in temp table)?

Another table, or a variable in a script. Personally, if I need to do something repeatedly, I usually write a script in the scripting language I feel most comfortable in (which has been Python for the last 7 or 8 years, Perl before that), which gives you variables, loops, conditionals and - above all - repeatability.

> OR
> Would it be better to do it in one shot only, but by setting a higher value
> of some parameters like "maintenance_work_mem" or "max_parallel_workers"?

Hard to say. Normally, processing in fewer, bigger chunks is faster. But RAM is much faster than disk (even with SSDs), so it might be faster to make work_mem as large as you can and then use a chunk size which just fits inside work_mem. Of course finding that sweet spot takes experimentation, hence time, and it may make little sense to experiment for 20 hours just to save 40 minutes.

hp
> On 3 Feb 2024, at 13:20, Lok P <loknath.73@gmail.com> wrote:
>
> A non partitioned table having ~4.8 billion rows in it and having data size
> as ~1.4TB [...]. This has got approx ~1billion+ duplicate rows inserted in it
> and we want to get the duplicate data removed for this table and create a
> PK/unique constraint back [...]

~4.8 billion rows of which ~1 billion are duplicates… Wait a minute…

Did you verify that your ID column is larger than 32 bits? Because if that's a 32-bit integer, the range of values it can hold is about 4.3 billion, after which it wraps around. With ~4.8 billion rows that would result in about ~0.5 billion repeated ID values, giving you the reported ~1 billion duplicate IDs.

If that's the case, your duplicates obviously aren't really duplicates and you require a different type of solution.

> create table TAB1_New
> as
> SELECT * from TAB1 A
> where ID in
> (select min(ID) from TAB1
> group by ID having count(ID)>=1 );
>
> Create index idx1 on TAB1(ID)

Are your duplicates exact duplicates? Or is there an order of preference among them? And if so, what really makes those rows unique?

That matters for solutions on how to deduplicate these rows.

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
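Two quick sanity checks along the lines Alban suggests; a sketch that assumes unquoted lower-case object names, and note that count(DISTINCT ...) is itself a heavy query on a table this size.

-- What type is the key column really?
SELECT data_type
FROM information_schema.columns
WHERE table_name = 'tab1' AND column_name = 'id';

-- How many distinct ids are there compared to total rows?
SELECT count(*) AS total_rows, count(DISTINCT id) AS distinct_ids
FROM tab1;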
Lok:

On Sat, 3 Feb 2024 at 21:44, Lok P <loknath.73@gmail.com> wrote:
> However, as we have ~5 billion rows in the base table and out of that, we
> were expecting almost half, i.e. ~2 billion, would be duplicates. And you
> said, doing the inserts using the "order by CTID offset" approach must cause
> one full sequential scan of the whole table for loading each chunk/10M of
> rows and that would take a long time I believe.

I did not say "MUST CAUSE". In fact I said I believe it would not. What I am gonna say (now) is test it. Make 1k, 10k, 100k, 1M tables in a scratch database, explain and test your things there w/ & w/o index etc. Not all needed, but testing 100k & 1M in 1k batches could show you missing quadratic behaviour. Explain would show you unexpected sorts or scans.

> I am still trying to understand the other approach which you suggested. Not
> able to understand "you can select where index_col > last order by index_col
> limit 10M".
>
> However, to get the max ID value of the last 10M loaded rows in target, do
> you say that having a PK index created on that target table column (ID) will
> help, and we can save the max(ID) value subsequently in another table to
> fetch and keep loading from the source table (as ID > Max_ID stored in temp
> table)?

I am a programmer by trade. When faced with problems like these, unless I find a trivial solution, I tend to make auxiliary programs, as it is much easier for me to make a thousand lines of Perl (or Python/C/C++/Java) or a couple hundred of plpgsql (if I want to avoid roundtrips) than to debug complicated SQL-only workflows.

For your kind of problem I would make a program to read the rows and insert them. As an example, let's say you have a non-unique index on ID and are going to use the on-conflict-do-nothing route (I do believe this would be slow due to the need of having an index on the target table to support it, but let's assume it is ok). To do that I may just do a loop, starting with last_id=-1 (some id less than any other id), selecting a chunk of rows with id>=last_id ordered by id and inserting them. After doing that I may notice that I do not need the index if the sort order is right, drop the index and the on-conflict and just do, for every row, if (id>last_id) insert before storing last_id=id.

Anyway, not knowing the real table characteristics and current usage patterns I cannot recommend anything concrete.

> Would it be better to do it in one shot only, but by setting a higher value
> of some parameters like "maintenance_work_mem" or "max_parallel_workers"?

It depends on a lot of unknown (to us) things.

Francisco Olarte.
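A rough plpgsql rendering of the loop described above; a sketch only, assuming an integer-ish id, a unique index on tab1_new(id) to act as the ON CONFLICT arbiter, and PostgreSQL 11+ so the DO block can COMMIT between batches when run outside an explicit transaction.

DO $$
DECLARE
    last_id bigint := -1;          -- anything below the smallest real id
    n       bigint;
BEGIN
    LOOP
        INSERT INTO tab1_new
        SELECT *
        FROM tab1
        WHERE id > last_id         -- skip everything already copied
        ORDER BY id
        LIMIT 10000000
        ON CONFLICT (id) DO NOTHING;   -- drops duplicates within the batch

        GET DIAGNOSTICS n = ROW_COUNT;
        EXIT WHEN n = 0;               -- no new rows inserted: we are done

        SELECT max(id) INTO last_id FROM tab1_new;  -- cheap via the unique index
        RAISE NOTICE 'copied up to id %', last_id;
        COMMIT;                        -- keep each batch in its own transaction
    END LOOP;
END
$$;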
> insert into mytable2 select * from mytable1 order by ctid limit 10_000_000 offset 20_000_000;

You can use min/max values, grouping them by 10_000_000 records, so you don't need that offset; then generate the commands and run them:

select format('insert into mytable2 select * from mytable1 where i between %s and %s;', min(i), max(i))
from t
group by i/10_000_000;
I'm surprised no one has mentioned that it's perhaps a good idea to partition this table while adding the PK. By your own statements the table is difficult to work with as is. Without partitioning the table, row inserts would need to walk the PK index, and that could be a factor. If this is a static table then perhaps that's ok, but if not...

Anyway, I don't recall what type the ID is or how it's set, but I suggest you seriously investigate using it to partition this table into manageable smaller tables.

Best of luck.
Thank you so much. You are correct, it has indeed been difficult to operate on this table. Every read query is struggling, so partitioning is something we must think about, and we hope Postgres will be able to handle this scale with a proper partitioning and indexing strategy.

I have a few related questions.

1) Even after partitioning the target table, to speed up the data load, is there an option to disable the primary and foreign keys and re-enable them after the data load finishes? Would that be a good idea, or would it be cumbersome/resource intensive to re-enable the constraints after all the data is persisted in the table?

2) I understand there is no theoretical limit on the number or size of partitions a table can have in Postgres. But I want to know from the experts here, from real-life experience, whether there is anything we should take care of before deciding the partitioning strategy, so as to stay within a soft limit (both with respect to size and number of partitions). Note: currently this table will be around ~1TB in size and will hold approx ~3 billion rows (post removal of duplicates). But per business needs it may grow up to ~45 billion rows in future.

3) As the size of the table or each partition is going to be very large and this will be a read-intensive application, compressing the historical partitions would help us save storage space and would also help read query performance. So, can you please throw some light on the compression strategy we should follow here (considering a daily range-partitioned table based on transaction_date as the partition key)?

Regards
Lok
Partitioning will depend a lot on how you access the data. Is it always using that transaction_date column in the where clause? Can you share the table definition via \d? We've talked about this table quite a bit, but not actually seen what it looks like. We are flying blind a little bit. You mentioned your queries are slow, but can you share an explain analyze on one of these slow queries?

45 billion rows is quite manageable. How many rows are in each day? You may want to do a larger bucket than 24 hours per partition.

> 1) Even after partitioning the target table, to speed up the data load, is
> there an option to disable the primary and foreign keys and re-enable them
> after the data load finishes? Would that be a good idea [...]?

No.

> 3) As the size of the table or each partition is going to be very large and
> this will be a read-intensive application, compressing the historical
> partitions would help us save storage space and would also help read query
> performance.

I am not sure what this means. If you are not going to need the data anymore, dump the data to deep storage and drop the partition.

Cheers,
Greg
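For reference, a sketch of what a daily range-partitioned layout on transaction_date could look like (column names are from the thread, everything else is illustrative; note that the primary key of a partitioned table must include the partition key):

CREATE TABLE tab1_part (
    id               bigint NOT NULL,
    transaction_date date   NOT NULL
    -- ... the remaining columns ...
) PARTITION BY RANGE (transaction_date);

-- One partition per day (or per week, if a coarser bucket is chosen):
CREATE TABLE tab1_part_2024_02_05 PARTITION OF tab1_part
    FOR VALUES FROM ('2024-02-05') TO ('2024-02-06');

-- The PK must contain the partition key on a partitioned table:
ALTER TABLE tab1_part ADD PRIMARY KEY (id, transaction_date);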
1. Load the children before attaching them to the parent.
2. Create the child indices, PK and FKs before attaching to the parent.
3. Do step 2 in multiple parallel jobs. (cron is your friend.)
4. Attach the children to the "naked" (no PK, no FK, no indices) parent.
5. Use the ONLY keyword on CREATE INDEX and ALTER TABLE to add the indices, PK and FK to the parent after step 4.
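A compressed sketch of that flow for one daily child (table names, dates and the transaction_date filter are illustrative; the CHECK constraint matching the partition bounds lets ATTACH PARTITION skip its full validation scan):

-- 1. Load a detached child table, with no PK/FK/indexes yet.
CREATE TABLE tab1_2024_02_05 (LIKE tab1_part INCLUDING DEFAULTS);
INSERT INTO tab1_2024_02_05
SELECT * FROM tab1
WHERE transaction_date >= DATE '2024-02-05'
  AND transaction_date <  DATE '2024-02-06';

-- 2. Index and constrain the child on its own (several children can be
--    processed in parallel from separate sessions).
ALTER TABLE tab1_2024_02_05 ADD PRIMARY KEY (id, transaction_date);

-- 3. Attach it to the parent.
ALTER TABLE tab1_2024_02_05 ADD CONSTRAINT tab1_2024_02_05_bounds
    CHECK (transaction_date >= DATE '2024-02-05' AND transaction_date < DATE '2024-02-06');
ALTER TABLE tab1_part ATTACH PARTITION tab1_2024_02_05
    FOR VALUES FROM ('2024-02-05') TO ('2024-02-06');

-- 4. Finally, create parent-level indexes with ONLY and attach the matching
--    child indexes, e.g.:
CREATE INDEX tab1_part_txn_date ON ONLY tab1_part (transaction_date);
-- ALTER INDEX tab1_part_txn_date ATTACH PARTITION <child_index_name>;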
Then we will create the primary key index on the child table. When you said to use multiple parallel jobs, do you mean creating the PK index on each partition separately from different sessions, rather than creating it on the table with a single statement? Or is there some other, faster way to create the PK index?

Now the last step is attaching the PK and FK of the above child table to the parent. This will validate each and every row for uniqueness and also ensure the foreign key is present in the parent table. Won't this step take a long time?
Thank you.

The table has ~127 columns of different data types: combinations of numeric, varchar, date etc. Its current size is ~1TB, holding ~3 billion rows, and the row size is ~300 bytes.

Currently it has a lower volume, but in future the daily transactions inserted into this table will be at most ~500 million rows/day. The plan is to persist at least ~3 months of transaction data, which will be around 45 billion rows in total. And if all goes well, we may need to persist ~6 months of data in this database in future, which will be ~90 billion rows.

This table will always be queried with the transaction_date column as one of the filter criteria. But the query/search criteria can span from a day to a month of transaction_date data.

When you said "You may want to do a larger bucket than 24 hours per partition", do you mean partitioning by week or so? Currently, per the math (1TB of storage for ~3 billion rows), the daily range partition size (to hold ~500 million transactions/day) will be around ~100-150GB. Won't that be too much data for a single partition to operate on, and won't making the bucket coarser (say weekly) make the partitions even bulkier?

What I meant was: we will definitely need the data for querying purposes by the users, but just to keep storage space in control (and to help the read queries), we were thinking of having the historical partitions compressed. Is there any compression strategy we should follow for that on Postgres?

With regard to loading data into the table faster, I am wondering why you said 'No' to loading the data first and enabling/creating the primary key and foreign key constraints later. Because this table is a child table and the parent already has data in it, loading data into this table with the PK and FK in place makes it too slow, as it tries to validate them for each set of rows. So we were thinking that doing it at a later stage in one shot would be a faster approach. Please suggest.

I will try to collect some SELECT queries and post the explain analyze. Currently we are trying to get rid of the duplicates.

Regards
Lok
Any thoughts, based on the above usage pattern?

I did the math based on the current stats with a limited data set: the partition size comes out at 100 to 150GB if we keep daily range partitions, as I explained above. Should we make it hourly instead?

So I would like some expert views on whether this is going to work fine for a read-latency-intensive application, or whether we should use a different strategy.
On Sun, Feb 4, 2024 at 3:52 PM Lok P <loknath.73@gmail.com> wrote:
> This table will always be queried with the transaction_date column as one of
> the filter criteria. But the query/search criteria can span from a day to a
> month of transaction_date data.

In that case, and based on the numbers you provided, daily partitioning seems a decent solution.

> What I meant was: we will definitely need the data for querying purposes by
> the users, but just to keep storage space in control (and to help the read
> queries), we were thinking of having the historical partitions compressed.
> Is there any compression strategy we should follow for that on Postgres?

There is no compression strategy, per se. There are ways Postgres internally compresses the data (see "TOAST"), but in general, the table is either there or not. If space is a concern you should examine if you truly need 127 columns, make sure they are arranged correctly (search for 'postgres column tetris'), and move unused and older partitions elsewhere - like a separate cheaper Postgres server, or something offline.

> With regard to loading data into the table faster, I am wondering why you
> said 'No' to loading the data first and enabling/creating the primary key
> and foreign key constraints later.

For an initial load, this is fine, if you do things very carefully. I'd build the child table indexes post load but still feed things into the main table as an initial tradeoff, but YMMV.

Just looked back and saw this is actually Aurora, not Postgres. Most of the advice on this thread should still apply, but be aware that things are not the same as Postgres, especially at the storage layer. For all the money you are giving them, don't forget to bug them as well.

Cheers,
Greg
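For the "column tetris" part, a catalog query of the kind usually used to inspect column layout (a sketch; tab1 is the table name used in this thread):

-- List each column with its storage length (-1 = variable) and alignment,
-- so wide, strictly-aligned types can be grouped at the front of the row.
SELECT a.attnum, a.attname, t.typname, t.typlen, t.typalign
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'tab1'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;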
> So when you said "In that case, and based on the numbers you provided, daily
> partitioning seems a decent solution.", does it mean that the size of the
> partition (which will be ~100-150GB per partition here) is not an issue for
> Postgres when serving read-latency-sensitive queries?

Yes, that should be fine. At the end of the day, what's going to be more important is making sure you are using good indexing, and keeping all your active stuff in cache. Since you have 127 columns, only pull back the columns you absolutely need for each query.

Cheers,
Greg
Thank you Greg.

> ... keeping all your active stuff in cache. Since you have 127 columns, only
> pull back the columns you absolutely need for each query.

Understood the point. As Postgres is a "row store" database, keeping the row size smaller by trimming the number of columns to the minimum should definitely help, as more rows can then be packed into one page or block (which I believe is 8KB by default). That means more blocks can be cached while reading, so a better cache hit ratio.

As you rightly pointed out, another thing I noticed is that the shared_buffers parameter is set to 2029684 in this instance, which comes to ~21MB, and that seems very small for a database operating at large scale. And I see the instance has ~256GB of RAM. So I am thinking of bumping it up to something like ~40-50GB; hopefully that will help to some extent. I am not sure whether there are methods to manually cache some objects (tables/indexes) that are used frequently by the read queries.
Rearranging the table columns by typlen desc didn't give much storage space reduction.

Also, it seems the compression does not apply to rows inserted using CTAS or INSERT ... SELECT. Does that mean it is only applicable to row-by-row inserts and not batch inserts (which a bigger system normally has)? I was not expecting this, so it was disappointing.
On 2024-02-06 11:25:05 +0530, veem v wrote:
> With respect to the storage part: in Oracle we were supposed to keep the
> frequently accessed columns first and the less frequently accessed columns
> towards the end of the table when creating the table DDL. This used to help
> query performance when accessing those columns, as it's a row store
> database. Are there any such performance implications in Postgres? And there
> the data types like varchar used to only occupy the space which the real
> data/value contains.
>
> But here in Postgres it seems a bit different, as there is significant
> overhead in storage: these seem to be fixed length data types and will
> occupy the space no matter what the stored data holds.

Yes. Numbers in Oracle are variable length, so most Oracle tables wouldn't contain many fixed length columns. In PostgreSQL most numeric types are fixed length, so you'll have quite a lot of them.

> So curious to know: in this case, will there be any performance impact when
> accessing a column which is frequently accessed but has been put towards the
> end of the table to save storage space?

Yes. PostgreSQL has to skip over each column until it gets to the column it wants to read. So reading a column at the end of the row will be slower than reading one at the start. A very simplistic benchmark (100 columns of type text, each containing a single character) suggests that accessing column 100 takes about 4 or 5 times as long as column 1, and the access times for the columns in between are pretty linear.

So there's a bit of a tradeoff between minimizing alignment overhead and arranging columns for fastest access.

hp
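A rough way to reproduce that kind of micro-benchmark (a sketch; the table name, column count and row count are arbitrary):

-- Build a 100-column table of single-character text values...
DO $$
BEGIN
    EXECUTE 'CREATE TABLE colpos_test (' ||
            (SELECT string_agg(format('c%s text', i), ', ')
             FROM generate_series(1, 100) AS i) ||
            ')';
    EXECUTE 'INSERT INTO colpos_test SELECT ' ||
            (SELECT string_agg(quote_literal('x'), ', ')
             FROM generate_series(1, 100)) ||
            ' FROM generate_series(1, 1000000)';
END
$$;

-- ...then compare (with \timing in psql) reading an early vs. a late column:
SELECT count(c1)   FROM colpos_test;
SELECT count(c100) FROM colpos_test;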
On Sun, 2024-02-11 at 13:25 +0530, veem v wrote:
> So it means, say in other databases like Oracle, we were careless choosing
> the data length, say for example varchar2(4000): if the real data which is
> inserted into the table holds a varchar string of length 20 bytes, then
> Oracle trimmed it to occupy only the 20 bytes in storage. But in Postgres we
> need to be cautious and define the length as the maximum the data attribute
> can contain, because that amount of fixed space is allocated to every value
> which is inserted into the table for that attribute/data element. Similarly
> for the Number/Numeric data type. Please correct me if my understanding is
> wrong.

Your understanding is wrong. Oracle and PostgreSQL are not too different about storing values.

The data type that Oracle calls "number", and that is called "numeric" in PostgreSQL and in the SQL standard, is stored as a variable length data type in both Oracle and PostgreSQL.

It is just that Oracle has very few data types (I saw them gush about adding "boolean" as a great new feature in version 23). So, for example, there are no integer data types in Oracle, and you have to store them as a variable length data type. PostgreSQL has integer data types, which are fixed length (2, 4 or 8 bytes) and provide much more efficient storage for integers.

"character varying" is also pretty similar in Oracle and PostgreSQL, except that Oracle calls it "varchar2". The only fixed-length character data type is "character", but that is always a bad choice, in Oracle as in PostgreSQL.

About your initial question: in PostgreSQL there is also a certain performance gain if you store frequently used columns first, since the database has to skip fewer columns to get to the data. If the previous columns are fixed length data types like integers, that is cheaper, because we don't have to look at the data to know how long they are.

Another thing to consider is padding. Each fixed-width data type has certain alignment requirements (imposed by CPU processing) that you can find in "pg_type.typalign". This can lead to wasted space in the form of "padding bytes". For example, if a "bigint" follows an ASCII single-character "text" or "varchar" value (which occupies two bytes), there will be six padding bytes between them to align the "bigint" at a storage address that is a multiple of eight.

But although both of these considerations (skipping over previous columns and padding) are relevant for performance, they are often a micro-optimization that you won't be able to measure, and you shouldn't lose too much sleep over them.

> > So there's a bit of a tradeoff between minimizing alignment overhead and
> > arranging columns for fastest access.

Precisely.

Yours,
Laurenz Albe
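The padding effect can be seen directly with pg_column_size() on row values (a sketch; the exact byte counts include a fixed tuple header, so only the difference between the two results matters):

-- A smallint followed by a bigint forces padding up to the bigint's 8-byte
-- alignment; the reverse order packs the same data more tightly.
SELECT pg_column_size(ROW(1::smallint, 2::bigint)) AS smallint_then_bigint,
       pg_column_size(ROW(2::bigint, 1::smallint)) AS bigint_then_smallint;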
On 2024-02-11 13:25:10 +0530, veem v wrote:
> So it means, say in other databases like Oracle, we were careless choosing
> the data length, say for example varchar2(4000): if the real data which is
> inserted into the table holds a varchar string of length 20 bytes, then
> Oracle trimmed it to occupy only the 20 bytes in storage. But in Postgres we
> need to be cautious and define the length as the maximum the data attribute
> can contain, because that amount of fixed space is allocated to every value
> which is inserted into the table for that attribute/data element.

No. Varchar is (as the name implies) a variable length data type and both Oracle and PostgreSQL store only the actual value plus some length indicator in the database. Indeed, in PostgreSQL you don't need to specify the maximum length at all.

However, if you specify a column as "integer" in PostgreSQL it will always occupy 4 bytes, whether you store the value 15 in it or 999999999. In Oracle, there is no "integer" type and the roughly equivalent number(10,0) is actually a variable length floating point number. So 15 will occupy only 3 bytes and 999999999 will occupy 7 bytes[1].

> Similarly for Number/Numeric data type.

Number in Oracle and numeric in PostgreSQL are variable length types. But in PostgreSQL you also have a lot of fixed length numeric types (from boolean to bigint as well as float4 and float8) and you would normally prefer those over numeric (unless you really need a decimal or very long type). So padding is something you would encounter in a typical PostgreSQL database while it just wouldn't happen in a typical Oracle database.

But as Laurenz wrote, that's a micro optimization which you usually shouldn't spend much time on. OTOH, if you have hundreds of columns in a table, maybe it is worthwhile to spend some time on analyzing access patterns and optimizing the data layout.

hp

[1] From memory. I may be misremembering the details.
On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> Number in Oracle and numeric in PostgreSQL are variable length types. But in
> PostgreSQL you also have a lot of fixed length numeric types (from boolean
> to bigint as well as float4 and float8) and you would normally prefer those
> over numeric (unless you really need a decimal or very long type). So
> padding is something you would encounter in a typical PostgreSQL database
> while it just wouldn't happen in a typical Oracle database.

When you said "you would normally prefer those over numeric" I was thinking the opposite. As you mentioned, integer is a fixed length data type and will occupy 4 bytes whether you store 15 or 99999999. But in the case of a variable length type like number or numeric, it will resize itself based on the actual data. So is there any downside to always going with variable length data types like numeric and varchar for defining the data elements?
When you said "you would normally prefer those over numeric " I was thinking the opposite. As you mentioned integer is a fixed length data type and will occupy 4 bytes whether you store 15 or 99999999.But in case of variable length type like Number or numeric , it will resize itself based on the actual data, So is there any downside of going with the variable length data type like Numeric,
Varchar type always for defining the data elements?
On 2024-02-11 22:23:58 +0530, veem v wrote:
> On Sun, 11 Feb 2024 at 19:02, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     Number in Oracle and numeric in PostgreSQL are variable length types.
>     But in PostgreSQL you also have a lot of fixed length numeric types
>     (from boolean to bigint as well as float4 and float8) and you would
>     normally prefer those over numeric (unless you really need a decimal or
>     very long type). So padding is something you would encounter in a
>     typical PostgreSQL database while it just wouldn't happen in a typical
>     Oracle database.
>
> When you said "you would normally prefer those over numeric" I was thinking
> the opposite. As you mentioned, integer is a fixed length data type and will
> occupy 4 bytes whether you store 15 or 99999999. But a variable length type
> like Number or numeric will resize itself based on the actual data. So is
> there any downside of going with variable length data types like numeric or
> varchar for defining the data elements?

The fixed width types are those that the CPU can directly process: integers
with 16, 32 and 64 bits, floating point numbers with 32 and 64 bits. The CPU
can read and write them with a single memory access, it can do arithmetic with
a single instruction, etc.

Number/Numeric are not native types on any CPU. To read them the CPU needs
several memory accesses (probably one per byte unless you get really clever)
and then it can't do any calculations with them directly; instead it has to
run a subroutine which does operations on little chunks and then puts those
chunks together again - basically the same as you do when you're doing long
addition or multiplication on paper. So that's not very efficient.

Also the space savings aren't that great and probably even negative: In my
example the length of a numeric type with at most 10 digits varied between 3
and 7 bytes. Only for values between -99 and +99 is this (3 bytes) actually
shorter; for other values it's the same length or longer. So you would only
save space if most of your values are in that ±99 range. But not if all of
them are, because then you could simply use a smallint (range -32768..32767)
in PostgreSQL and save another byte.

Finally - and I'm probably biased in this as I learned programming 40 years
ago - to me the native types feel more natural than product specific
variable-length decimal types.

        hp
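To get a rough feel for the arithmetic cost difference on your own hardware, a simple sketch (absolute timings are machine dependent, so treat the comparison qualitatively):

explain (analyze, timing off)
select sum(g) from generate_series(1, 5000000) as g;           -- integer/bigint accumulation

explain (analyze, timing off)
select sum(g::numeric) from generate_series(1, 5000000) as g;  -- numeric accumulation, typically slower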
On 2024-02-11 12:08:47 -0500, Ron Johnson wrote:
> On Sun, Feb 11, 2024 at 11:54 AM veem v <veema0000@gmail.com> wrote:
> When you said "you would normally prefer those over numeric " I was
> thinking the opposite. As you mentioned integer is a fixed length data type
> and will occupy 4 bytes whether you store 15 or 99999999.But in case of
> variable length type like Number or numeric , it will resize itself based
> on the actual data, So is there any downside of going with the variable
> length data type like Numeric,
>
>
> Consider a table with a bunch of NUMERIC fields. One of those records has
> small values (aka three bytes). It fits neatly in 2KiB.
>
> And then you update all those NUMERIC fields to big numbers that take 15
> bytes. Suddenly (or eventually, if you update them at different times), the
> record does not fit in 2KiB, and so must be moved to its own.page. That causes
> extra IO.
I think that's not much of a concern with PostgreSQL because you can't
update a row in-place anyway because of MVCC. So in any case you're
writing a new row. If you're lucky there is enough free space in the same
page and you can do a HOT update, but that's quite independent of
whether the row changes size.

        hp
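As a side note, whether updates actually end up as HOT updates can be checked from the standard statistics view; a sketch (the table name is just a placeholder):

select relname, n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'tab1';   -- replace with the table you are watching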
Good to know. So it means that here in Postgres there is no such concern as "row chaining",
"row migration" etc., which we normally have in a non-MVCC database (like Oracle, say).
Another thing I noticed: the shared_buffers parameter is set to 2029684 in this instance, which comes to ~21MB, and that seems very small for a database operating at large scale. And I see the instance has ~256GB of RAM, so I am thinking of bumping it to something like ~40-50GB.
Hope that will help to some extent. I am also not sure if there is a method to manually cache some objects (tables/indexes) which are used frequently by the read queries.
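For the manual-caching part, one option to look at (a sketch; it assumes the pg_prewarm extension is available on your Aurora instance, and the object names are just examples):

show shared_buffers;                        -- displays the current setting with units

create extension if not exists pg_prewarm;
select pg_prewarm('tab1');                  -- load the table's pages into shared buffers
select pg_prewarm('idx1');                  -- same for an index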
So we were thinking adding many columns to a table should be fine in Postgres (here we have a use case in which the total number of columns may go up to ~500+). But then, accessing columns towards the end of a row is going to take more time compared to a column at the start of the row. As you mentioned, accessing the 100th column may take 4 to 5 times longer than accessing the first column. So, is it advisable here to take a similar approach of breaking the table into two if the total number of columns for a table reaches a certain threshold?
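For reference, a minimal sketch of the kind of split being asked about (table and column names are hypothetical): keep the frequently accessed columns in a narrow table and move the rarely read ones into a second table sharing the same primary key.

create table orders_hot (
    order_id bigint primary key,
    status   text,
    amount   float8
);
create table orders_cold (
    order_id     bigint primary key references orders_hot(order_id),
    long_note    text,
    legacy_attr1 text,
    legacy_attr2 text
);
-- Queries that only need the hot columns never touch the wide rows:
select status, amount from orders_hot where order_id = 42;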
On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> The fixed width types are those that the CPU can directly process:
> Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64
> bits. The CPU can read and write them with a single memory access, it
> can do arithmetic with a single instruction, etc.
> Number/Numeric are not native types on any CPU. To read them the CPU
> needs several memory accesses (probably one per byte unless you get
> really clever) and then it can't do any calculations with them
> directly; instead it has to run a subroutine which does operations on
> little chunks and then puts those chunks together again - basically the
> same as you do when you're doing long addition or multiplication on
> paper. So that's not very efficient.
So it looks like the fixed length data types (like integer, float) should be the first choice when choosing the data type of an attribute wherever possible, as these are native types
(like choosing "integer/float" over "numeric", "char" over "varchar", etc.).
However, I do see that Oracle databases have an Integer type too, but the blog below suggests going with the Number type over Integer and Varchar2 over Char, which is the opposite of what we are discussing here. Do the fixed length data types behave differently in Postgres vs Oracle, and should they therefore be treated differently?
From that blog:
"When to use CHAR: There should be no reason to use the CHAR data type, as it is similar to a VARCHAR2 and it's better to be consistent."
"When to use INTEGER: You should use the NUMBER data type instead."
On Mon, Feb 12, 2024 at 3:23 PM veem v <veema0000@gmail.com> wrote:
> [snip]
> So it looks like the fixed length data types (like integer, float) should be
> the first choice when choosing the data type of an attribute wherever
> possible, as these are native types.

Correct.

> (Like choosing "integer/float" over "numeric", "char" over "varchar", etc.).
> However, I do see that Oracle databases have an Integer type too, but the
> blog below suggests going with the Number type over Integer and Varchar2 over
> Char, which is the opposite of what we are discussing here. Do the fixed
> length data types behave differently in Postgres vs Oracle, and should they
> therefore be treated differently?
> From that blog:

Oracle is not Postgresql. WTH are you using an Oracle blog to decide on
Postgresql data types????

> "When to use CHAR: There should be no reason to use the CHAR data type, as
> it is similar to a VARCHAR2 and it's better to be consistent."
> "When to use INTEGER: You should use the NUMBER data type instead."

Did you actually read that blog post?

Have you even read the Postgresql documentation on data types?
On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote:
> On Mon, Feb 12, 2024 at 1:50 AM veem v <veema0000@gmail.com> wrote:
>     So we were thinking adding many columns to a table should be fine in
>     Postgres (here we have a use case in which the total number of columns
>     may go up to ~500+). But then, accessing columns towards the end of a row
>     is going to take more time compared to a column at the start of the row.
>     As you mentioned, accessing the 100th column may take 4 to 5 times longer
>     than accessing the first column. So, is it advisable here to take a
>     similar approach of breaking the table into two if the total number of
>     columns for a table reaches a certain threshold?
>
> I'm not sure of what Peter was testing exactly to get those 4-5x figures,

Sorry, I should have included my benchmark code (it's short enough - see
below). What I was actually timing was

    select count(*) from t_postgresql_column_bench where v{i} = 'a'

for various i.

> but I presume that is column access time,

That was the goal. Of course there is always some overhead but I figured that
by counting rows where a column has a constant value the overhead is minimal
or at least constant.

> which would not mean a direct effect on your total query time of 4-5x.

Right. In any real application the column access time is only a part of the
total processing time and probably a small part, so the effect on total
processing time is correspondingly smaller.

        hp

------------------------------------------------------------------------
#!/usr/bin/python3

import random
import time
import psycopg2

n_cols = 100
n_rows = 100000

db = psycopg2.connect("")
csr = db.cursor()

# Create a table with n_cols text columns v0 .. v99.
csr.execute("drop table if exists t_postgresql_column_bench")
q = "create table t_postgresql_column_bench ("
q += ", ".join(f"v{i} text" for i in range(n_cols))
q += ")"
csr.execute(q)

# Fill it with random single-character values.
q = "insert into t_postgresql_column_bench values("
q += ", ".join("%s" for i in range(n_cols))
q += ")"
for j in range(n_rows):
    v = [chr(random.randint(96+1, 96+26)) for i in range(n_cols)]
    csr.execute(q, v)
db.commit()

# Time the same count(*) filter against each column in turn.
for i in range(n_cols):
    q = f"select count(*) from t_postgresql_column_bench where v{i} = 'a'"
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    csr.execute(q)
    r = csr.fetchall()
    print(r)
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print(i, t1 - t0)
db.commit()
------------------------------------------------------------------------
On 2024-02-13 01:53:25 +0530, veem v wrote:
> On Mon, 12 Feb 2024 at 03:40, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     The fixed width types are those that the CPU can directly process:
>     Integers with 16, 32 and 64 bits, floating point numbers with 32 and 64
>     bits. The CPU can read and write them with a single memory access, it
>     can do arithmetic with a single instruction, etc.
>     [snip]
>
> So it looks like the fixed length data types (like integer, float) should be
> the first choice when choosing the data type of an attribute wherever
> possible, as these are native types (like choosing "integer/float" over
> "numeric", "char" over "varchar", etc.).

Please do not conflate "char(n)" with native machine types like int or float.
These are very different things. A char(n) is a string of fixed but arbitrary
length. This is not something a CPU can process in a single instruction. It
has to go over it character by character.

There is almost never a reason to use char(n). Just use varchar(n) or in the
case of PostgreSQL just varchar or text.

> However I do see even in Oracle databases, we have Integer type too,

Not really. INTEGER is just an alias for NUMBER(38) in Oracle (see for example
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlqr/Data-Types.html).
It's not the same as an INTEGER in PostgreSQL.

        hp
Basically, as I understood it, we can follow the below steps in order:
Step-1) Wherever possible use smallint, integer, bigint or float data types rather than numeric. This will give better performance.
Step-2) Put the frequently queried columns first and the least frequently queried columns last in the row while creating the table. This too is intended for better performance.
Step-3) Order the columns by typlen descending, as per the below query (the "column tetris" trick). This is for better storage space utilization.
SELECT a.attname, t.typname, t.typalign, t.typlen
FROM pg_class c
JOIN pg_attribute a ON (a.attrelid = c.oid)
JOIN pg_type t ON (t.oid = a.atttypid)
WHERE c.relname = 'user_order'
AND a.attnum >= 0
ORDER BY t.typlen DESC;
One question here: if we have defined a column with the fixed length data type "integer", and over time we notice the data keeps growing (say a sequence-generated PK column which will keep increasing) and we want to alter the column to "bigint", will it then pad additional space onto all the existing values which were already stored as integer in the table? That would be an increase to the existing table storage. Please correct me if I'm wrong.
On Tue, Feb 13, 2024 at 4:17 PM veem v <veema0000@gmail.com> wrote:
> [snip]
> One question here: if we have defined a column with the fixed length data
> type "integer", and over time we notice the data keeps growing (say a
> sequence-generated PK column which will keep increasing) and we want to
> alter the column to "bigint", will it then pad additional space onto all the
> existing values which were already stored as integer in the table?

ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT; rewrites the whole table. Do
yourself a favor, and start with BIGINT.
On Wed, 14 Feb 2024, 10:30 am Ron Johnson <ronljohnsonjr@gmail.com> wrote:
> [snip]
> ALTER TABLE foo ALTER COLUMN bar TYPE BIGINT; rewrites the whole table. Do
> yourself a favor, and start with BIGINT.

Not yet tried, but that's interesting. I was initially thinking that since it
increases the length it would be just a metadata change and finish within
seconds.
It depends:

https://www.postgresql.org/docs/current/sql-altertable.html

"Adding a column with a volatile DEFAULT or changing the type of an existing
column will require the entire table and its indexes to be rewritten. As an
exception, when changing the type of an existing column, if the USING clause
does not change the column contents and the old type is either binary
coercible to the new type or an unconstrained domain over the new type, a
table rewrite is not needed. However, indexes must always be rebuilt unless
the system can verify that the new index would be logically equivalent to the
existing one. For example, if the collation for a column has been changed, an
index rebuild is always required because the new sort order might be
different. However, in the absence of a collation change, a column can be
changed from text to varchar (or vice versa) without rebuilding the indexes
because these data types sort identically. Table and/or index rebuilds may
take a significant amount of time for a large table; and will temporarily
require as much as double the disk space."
create table int_test(int_fld integer);
insert into int_test select * from generate_series(1, 10000, 1);
select ctid, int_fld from int_test ;
ctid | int_fld
----------+---------
(0,1) | 1
(0,2) | 2
(0,3) | 3
(0,4) | 4
(0,5) | 5
(0,6) | 6
(0,7) | 7
(0,8) | 8
(0,9) | 9
(0,10) | 10
alter table int_test alter column int_fld set data type bigint;
select ctid, int_fld from int_test ;
ctid | int_fld
----------+---------
(0,1) | 1
(0,2) | 2
(0,3) | 3
(0,4) | 4
(0,5) | 5
(0,6) | 6
(0,7) | 7
(0,8) | 8
(0,9) | 9
(0,10) | 10
update int_test set int_fld = int_fld;
select ctid, int_fld from int_test order by int_fld;
(63,1) | 1
(63,2) | 2
(63,3) | 3
(63,4) | 4
(63,5) | 5
(63,6) | 6
(63,7) | 7
(63,8) | 8
(63,9) | 9
(63,10) | 10
Where ctid is:
https://www.postgresql.org/docs/current/ddl-system-columns.html
"
ctid
The physical location of the row version within its table. Note that although the
ctid
can be used to locate the row version very quickly, a row'sctid
will change if it is updated or moved byVACUUM FULL
. Thereforectid
is useless as a long-term row identifier. A primary key should be used to identify logical rows.- "
But as you mentioned, it seems to be the effect of the "fixed length data type", which is why it is going to rewrite the whole table even if we just increase the column length. Hopefully that won't be the case for variable length data types.
-- Adrian Klaver adrian.klaver@aklaver.com
On Thu, 15 Feb 2024 at 00:43, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> [snip]
Thank you so much. As I also tested the same as you posted, there was no change in "ctid" when I altered the column data type from 'int' to 'bigint' in the table, so that means a full table rewrite won't happen in such a scenario.
On Tuesday, February 13, 2024, veem v <veema0000@gmail.com> wrote:
> float data types rather than numeric. This will give better performance.

Only use an inexact floating-point data type if you truly understand what you
are getting yourself into. Quickly getting the wrong answer isn't a tolerable
solution.
On 2/15/24 08:16, Greg Sabino Mullane wrote:
> > So as I also tested the same as you posted, there has been no change in
> > "ctid" when I altered the column data type from 'int' to 'bigint' in the
> > table, so that means full table rewriting won't happen in such a scenario.
>
> No it was definitely rewritten - do not depend on the ctid to verify that.
> Take our word for it, or use pg_relation_filenode('int_test'); before and
> after, as well as pg_relation_size('int_test');

That is a mixed bag:

test=# select pg_relation_filenode('int_test');
 pg_relation_filenode
----------------------
 69999
(1 row)

test=# select pg_relation_size('int_test');
 pg_relation_size
------------------
 368640
(1 row)

test=# alter table int_test alter column int_fld set data type bigint;
ALTER TABLE

test=# select pg_relation_filenode('int_test');
 pg_relation_filenode
----------------------
 70002
(1 row)

test=# select pg_relation_size('int_test');
 pg_relation_size
------------------
 368640

-- 
Adrian Klaver adrian.klaver@aklaver.com
On 2/15/24 09:00, Greg Sabino Mullane wrote:
> On Thu, Feb 15, 2024 at 11:43 AM Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
> That is a mixed bag:
>
>
> Ha! Good point. Our contrived example table does suffer from that, so
> perhaps the test should be:
>
> create table int_test(c1 int, c2 int);
Alright now I see:
test=# create table int_test(c1 int, c2 int);
CREATE TABLE
test=# select pg_relation_filenode('int_test');
pg_relation_filenode
----------------------
70021
(1 row)
test=# insert into int_test select a, a+1 from generate_series(1,
10000, 1) as t(a);
INSERT 0 10000
test=# select pg_relation_size('int_test');
pg_relation_size
------------------
368640
(1 row)
test=# alter table int_test alter column c2 set data type bigint;
ALTER TABLE
test=# select pg_relation_filenode('int_test');
pg_relation_filenode
----------------------
70024
(1 row)
test=# select pg_relation_size('int_test');
pg_relation_size
------------------
450560
(1 row)
So one learning for me, i.e. one of the downsides of a fixed length data type, is that any future change to it will be a full table rewrite. And thus this sort of change for big tables will be a nightmare.
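For completeness, the workaround people commonly reach for on very large tables is to avoid the in-place type change altogether; just a sketch, with table and column names as placeholders:

alter table tab1 add column id_new bigint;                     -- nullable, no default: metadata-only
update tab1 set id_new = id where id between 1 and 1000000;    -- backfill in ranges, commit per batch
-- ...repeat for the remaining ranges, then build a unique index on id_new,
-- drop the old column and rename:
-- alter table tab1 drop column id;
-- alter table tab1 rename column id_new to id;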
On 2024-02-14 22:55:01 -0700, David G. Johnston wrote:
> On Tuesday, February 13, 2024, veem v <veema0000@gmail.com> wrote:
>     float data types rather than numeric. This will give better performance.
>
> Only use an inexact floating-point data type if you truly understand what
> you are getting yourself into. Quickly getting the wrong answer isn't a
> tolerable solution.

Do NOT assume that a decimal type (even if it can grow to ridiculous lengths
like PostgreSQL's numeric) is exact in the mathematical sense. It isn't. It
cannot represent almost all real numbers. No pi or e of course, but also no
seemingly simple fractions like 1/3 or 1/7. Unless you never divide anything,
you will need to be aware of the rounding behaviour, just as you have to with
binary floating point types. And if you use a finite precision you will also
have to deal with rounding on multiplication (and possibly even addition and
subtraction, if you use different precisions).

Almost 40 years ago, our numerical methods professor started his first lecture
with the sentence "You can use a computer for anything - except computing". He
spent the rest of the semester proving himself wrong, of course, but computing
correctly is hard - and choosing a data type which more closely mimics the way
we learn to compute in primary school doesn't necessarily make it easier.
Mostly it just makes it harder to spot the errors ;-).

        hp
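A one-line illustration of that point (a sketch; the number of digits in the output depends on the server version, but the comparison result is what matters):

select 1::numeric / 3 * 3 as numeric_result,      -- 0.9999... rather than 1
       1::numeric / 3 * 3 = 1 as is_exactly_one;  -- false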
On 2024-02-15 16:51:56 -0700, David G. Johnston wrote:
> On Thu, Feb 15, 2024 at 4:31 PM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     Do NOT assume that a decimal type (even if it can grow to ridiculous
>     lengths like PostgreSQL's numeric) is exact in the mathematical sense.
>     It isn't. It cannot represent almost all real numbers.
>
> That is an unusual definition for exact, I wouldn't have considered the
> requirement to represent all real numbers to be included in it.
>
> What you see with an exact type is what you get, which allows for
> implementing equality, unlike inexact which requires epsilon checking.

You can check binary fp values for equality. If they are equal, they will
compare equal. If they aren't, they won't.

What you can't expect is that the laws of commutativity, associativity, etc.
hold. If you compute a value in two different ways which should be equivalent
mathematically (e.g. a*b/c vs. a/c*b), the result may be different and an
equality test may fail. But that is also the case for numeric (and of course
integer). You might get around that by epsilon checking, but whether that's
the right thing to do depends on your application.

And most importantly, what gets most people on the "oh noes, binary fp is
inexact" bandwagon is that decimal fractions (1/10, 1/100, ...) are not
exactly representable in binary, just like 1/3, 1/7, 1/11, ... aren't exactly
representable in decimal. People are used to the latter, but not the former.
But mathematically, that doesn't really make a difference.

        hp
On 2024-02-16 12:10:20 +0530, veem v wrote:
> On Fri, 16 Feb 2024 at 06:04, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
>     Do NOT assume that a decimal type (even if it can grow to ridiculous
>     lengths like PostgreSQL's numeric) is exact in the mathematical sense.
>     It isn't. It cannot represent almost all real numbers.
>
> Thank You.
>
> So it looks like for the use cases where we don't need precision or decimal
> point values to be stored in Postgres, the integer data type is the way to
> go without a doubt.
>
> However in cases where precision is required, as you all mentioned there are
> certain issues (rounding errors etc.) with the "float" data type, so from a
> normal developer's usage point of view it should be the numeric type which
> we should use.

You misunderstood. My point was that these rounding errors also happen with
numeric, and if you want to avoid or minimize them you have to understand what
you are doing. For a hilarious example of what happens if you don't understand
that, see
https://en.wikipedia.org/wiki/Vancouver_Stock_Exchange#Rounding_errors_on_its_Index_price

I basically see two reasons to use numeric:

* Your numbers are amounts of money. Accountants are neither mathematicians
  nor engineers, and numeric mimics the way they think. So the results will be
  wrong in the correct way ;-)

* You need lots (more than 15 or 18) digits.

For anything else there is a good chance that float8 or int8 is a better
choice, because those types behave much more consistently. Of course there are
valid reasons to use other types (including numeric) but the point is that
each type has real pros and cons, and false arguments like "numeric is an
exact type and float isn't" are not helpful.

That said, "I don't understand binary numbers" might be a valid reason.

        hp
On 2024-02-16 01:34:01 +0100, Peter J. Holzer wrote:
> On 2024-02-15 16:51:56 -0700, David G. Johnston wrote:
> > What you see with an exact type is what you get, which allows for
> > implementing equality, unlike inexact which requires epsilon checking.
>
> You can check binary fp values for equality. If they are equal, they
> will compare equal. If they aren't, they won't.
>
> What you can't expect is that the laws of commutativity, associativity,
> etc. hold. If you compute a value in two different ways which should be
> equivalent mathematically (e.g. a*b/c vs. a/c*b), the result may be
> different and an equality test may fail. But that is also the case for
> numeric (and of course integer).

To illustrate that point:

hjp=> create table t_n (a numeric, b numeric, c numeric);
CREATE TABLE
hjp=> insert into t_n values(47, 52, 10);
INSERT 0 1

-- the numbers are not specially chosen. I just invoked
-- select (random()*100)::int;
-- three times, and they were the ones that came up.

hjp=> select *, a / b * c, a * c / b, a / b * c = a * c / b from t_n;
╔════╤════╤════╤════════════════════════╤════════════════════╤══════════╗
║ a  │ b  │ c  │        ?column?        │      ?column?      │ ?column? ║
╟────┼────┼────┼────────────────────────┼────────────────────┼──────────╢
║ 47 │ 52 │ 10 │ 9.03846153846153846150 │ 9.0384615384615385 │ f        ║
╚════╧════╧════╧════════════════════════╧════════════════════╧══════════╝
(1 row)

So with type numeric two expressions which should be equal mathematically
aren't in fact equal. Now let's try the same thing with binary floating point:

hjp=> create table t_f (a float8, b float8, c float8);
CREATE TABLE
hjp=> insert into t_f values(47, 52, 10);
INSERT 0 1
hjp=> select *, a / b * c, a * c / b, a / b * c = a * c / b from t_f;
╔════╤════╤════╤═══════════════════╤═══════════════════╤══════════╗
║ a  │ b  │ c  │     ?column?      │     ?column?      │ ?column? ║
╟────┼────┼────┼───────────────────┼───────────────────┼──────────╢
║ 47 │ 52 │ 10 │ 9.038461538461538 │ 9.038461538461538 │ t        ║
╚════╧════╧════╧═══════════════════╧═══════════════════╧══════════╝
(1 row)

Now they are indeed equal. This is *not* guaranteed and I got a bit lucky
here, but the fact that I got lucky on the first try shows that "float bad,
numeric good" is not backed up by reality.

        hp