Thread: Count of non-null values per table column
Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null valuesfor all columns as a single number. I want the count for each column by column.<br><br>I have inherited a databaseconsisting of two related huge monolithic tables that lack referential integrity between them, or even basic dataconstraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understandthe information that is actually stored and used so that I can implement a (hopefully) better design. Towards thatend I would like to know the count of non-null values in each column per column. In other words I would like to get thefollowing output from a table (the numbers are totally made up):<br><br>column_name | num_values<br>------------+-----------<br>col1 | 5787<br>------------+-----------<br>col2 | 17<br>------------+-----------<br>col3 | 567<br>------------+-----------<br>col4 | 5787<br>------------+-----------<br>col5 | 143<br>------------+-----------<br>col6 | 1<br>------------+-----------<br>...<br><br>Is this possible throughone or more of the system views, or will I need to write a function to do this? Obviously I can just issue multipleSELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David
Well it is certainly nice to see that my choice to send my question using plain text was honored by this email service. Apologiesfor that mess. The output I am looking for is a series of rows with two columns, one the name of the table column,and the other the count of non-null values in a table's column of that same name, for all column names in the table.<br><br>Thanks<br><br>----------------------------------------<br>>From: dlnelson77808@outlook.com<br>> To: pgsql-general@postgresql.org<br>>Subject: [GENERAL] Count of non-null values per table column<br>> Date: Fri, 14 Aug2015 14:32:36 +0000<br>><br>> Hello list,<br><br>Apologies if this has been asked before. My searchonly turned up ways to list the total non-null values for all columns as a single number. I want the count for eachcolumn by column.<br><br>I have inherited a database consisting of two related huge monolithic tables thatlack referential integrity between them, or even basic data constraints. One of the problems these tables have is everysingle non-PK column is NULLable. I am trying to understand the information that is actually stored and used so thatI can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in eachcolumn per column. In other words I would like to get the following output from a table (the numbers are totally madeup):<br><br>column_name | num_values<br>------------+-----------<br>col1 | 5787<br>------------+-----------<br>col2 | 17<br>------------+-----------<br>col3 | 567<br>------------+-----------<br>col4 | 5787<br>------------+-----------<br>col5 | 143<br>------------+-----------<br>col6 | 1<br>------------+-----------<br>...<br><br>Isthis possible through one or more of the system views,or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements,but I'd rather not.<br><br>Thanks,<br>David<br>><br>> --<br>> Sent via pgsql-generalmailing list (pgsql-general@postgresql.org)<br>> To make changes to your subscription:<br>> http://www.postgresql.org/mailpref/pgsql-general<br>
On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.<br><br>I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):<br><br>column_name | num_values<br>------------+-----------<br>col1 | 5787<br>------------+-----------<br>col2 | 17<br>------------+-----------<br>col3 | 567<br>------------+-----------<br>col4 | 5787<br>------------+-----------<br>col5 | 143<br>------------+-----------<br>col6 | 1<br>------------+-----------<br>...<br><br>Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David
I didn't quite catch a Postgres version number in that mess :)
I assume the tables are quite large if you don't want to just issue a: SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL) ... (assuming you're on a version new enough to support agg FILTER)
On the other hand if you were happy with just an approximation then you could look at pg_stats;
create table abc(a int, b int,c int);
insert into abc values(1, 1, NULL),(1, NULL, NULL),(NULL, NULL, NULL);
analyze abc;
select attname, null_frac from pg_stats where tablename = 'abc';
attname | null_frac
---------+-----------
a | 0.333333
b | 0.666667
c | 1
Keep in mind though that this is an *approximation* and possibly could be inaccurate. If you want exact do the COUNT(col).
If you're not on a version new enough for COUNT(col) FILTER WHERE ..., you could just SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)
Regards
David Rowley
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
David,
It still came through as junk. But I reconstructed it below
=== original message ===
Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.
I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):
column_name | num_values
------------+-----------
col1 | 5787
------------+-----------
col2 | 17
------------+-----------
col3 | 567
------------+-----------
col4 | 5787
------------+-----------
col5 | 143
------------+-----------
col6 | 1
------------+-----------
====
I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table. E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.
Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
David Rowley <david.rowley@2ndquadrant.com> writes: > On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote: >> Hello list,<br><br>Apologies if this has been asked before. My search only >> turned up ways to list the total non-null values for all columns as a >> single number. I want the count for each column by column. > I assume the tables are quite large if you don't want to just issue a: > SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL) > ... (assuming you're on a version new enough to support agg FILTER) AFAIK this should work in any version, or indeed any SQL-compliant DBMS: select count(col1), count(col2), ... from table; COUNT with an argument counts the non-null values of that argument. > On the other hand if you were happy with just an approximation then you > could look at pg_stats; Yeah; you might want to ANALYZE the table first to be sure the stats are up to date. regards, tom lane
On Fri, Aug 14, 2015 at 9:59 AM, John McKown <john.archie.mckown@gmail.com> wrote:
>
> David,
>
> It still came through as junk. But I reconstructed it below
>
> === original message ===
> Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.
>
> I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):
>
> column_name | num_values
> ------------+-----------
> col1 | 5787
> ------------+-----------
> col2 | 17
> ------------+-----------
> col3 | 567
> ------------+-----------
> col4 | 5787
> ------------+-----------
> col5 | 143
> ------------+-----------
> col6 | 1
> ------------+-----------
>
> ====
>
> I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table. E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.
>
Thanks John for fixing that which microsoft screwed up, and I've switched to my gmail account for this.
For my purposes 12 would be fine. I just want to know of the total number of
rows in the table, how many have a value in each column. I guess I'm actually trying to get the complement of the number of NULLs per column in the end.
BTW, aplologies for omitting basic info:
SELECT version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
>
>
> --
>
> Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
>
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
>
> He's about as useful as a wax frying pan.
>
> 10 to the 12th power microphones = 1 Megaphone
>
> Maranatha! <><
> John McKown
>
> David,
>
> It still came through as junk. But I reconstructed it below
>
> === original message ===
> Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.
>
> I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):
>
> column_name | num_values
> ------------+-----------
> col1 | 5787
> ------------+-----------
> col2 | 17
> ------------+-----------
> col3 | 567
> ------------+-----------
> col4 | 5787
> ------------+-----------
> col5 | 143
> ------------+-----------
> col6 | 1
> ------------+-----------
>
> ====
>
> I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table. E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.
>
Thanks John for fixing that which microsoft screwed up, and I've switched to my gmail account for this.
For my purposes 12 would be fine. I just want to know of the total number of
rows in the table, how many have a value in each column. I guess I'm actually trying to get the complement of the number of NULLs per column in the end.
BTW, aplologies for omitting basic info:
SELECT version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
>
>
> --
>
> Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
>
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
>
> He's about as useful as a wax frying pan.
>
> 10 to the 12th power microphones = 1 Megaphone
>
> Maranatha! <><
> John McKown
On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
> >> Hello list,<br><br>Apologies if this has been asked before. My search only
> >> turned up ways to list the total non-null values for all columns as a
> >> single number. I want the count for each column by column.
>
> > I assume the tables are quite large if you don't want to just issue a:
> > SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
> > ... (assuming you're on a version new enough to support agg FILTER)
>
> AFAIK this should work in any version, or indeed any SQL-compliant DBMS:
>
> select count(col1), count(col2), ... from table;
Thanks Tom (I've switched to my gmail account for this conversation). Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
> >> Hello list,<br><br>Apologies if this has been asked before. My search only
> >> turned up ways to list the total non-null values for all columns as a
> >> single number. I want the count for each column by column.
>
> > I assume the tables are quite large if you don't want to just issue a:
> > SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
> > ... (assuming you're on a version new enough to support agg FILTER)
>
> AFAIK this should work in any version, or indeed any SQL-compliant DBMS:
>
> select count(col1), count(col2), ... from table;
BTW, aplologies for omitting basic info:
SELECT version();
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit
>
> COUNT with an argument counts the non-null values of that argument.
>
> > On the other hand if you were happy with just an approximation then you
> > could look at pg_stats;
>
> Yeah; you might want to ANALYZE the table first to be sure the stats are
> up to date.
>
> regards, tom lane
>
On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com> wrote:
On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.
\set my_table my_real_table_name
SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT 'COUNT('||column_name::text ||') AS ' || column_name::text FROM information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM ' || :'my_table' || ';';
Cheers,
Ken
-- 
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.
On Fri, Aug 14, 2015 at 9:17 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:
On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com> wrote:On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.You could use this to generate the SQL:\set my_table my_real_table_nameSELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT 'COUNT('||column_name::text ||') AS ' || column_name::text FROM information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM ' || :'my_table' || ';';
Oh yeah, SQL to generate SQL. Done it many times to create ETL scripts. Don't know why I didn't think of that here. Definitely a viable option.
Thanks Ken.
Cheers,Ken--AGENCY SoftwareA Free Software data systemBy and for non-profitslearn more about AGENCY orfollow the discussion.