Thread: Count of non-null values per table column

Count of non-null values per table column

From

David Nelson

Date:

14 August 2015, 14:32:42

Hello list,<br><br>Apologies if this has been asked before. My search only turned up ways to list the total non-null
valuesfor all columns as a single number. I want the count for each column by column.<br><br>I have inherited a
databaseconsisting of two related huge monolithic tables that lack referential integrity between them, or even basic
dataconstraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to
understandthe information that is actually stored and used so that I can implement a (hopefully) better design. Towards
thatend I would like to know the count of non-null values in each column per column. In other words I would like to get
thefollowing output from a table (the numbers are totally made up):<br><br>column_name |
num_values<br>------------+-----------<br>col1       
|      5787<br>------------+-----------<br>col2       
|        
17<br>------------+-----------<br>col3       
|       
567<br>------------+-----------<br>col4       |      
5787<br>------------+-----------<br>col5       
|       
143<br>------------+-----------<br>col6       
|         1<br>------------+-----------<br>...<br><br>Is this possible
throughone or more of the system views, or will I need to write a function to do this? Obviously I can just issue
multipleSELECT COUNT(column)... statements, but I'd rather not.<br><br>Thanks,<br>David

Re: Count of non-null values per table column

From

David Nelson

Date:

14 August 2015, 14:38:29

Well it is certainly nice to see that my choice to send my question using plain text was honored by this email service.
Apologiesfor that mess. The output I am looking for is a series of rows with two columns, one the name of the table
column,and the other the count of non-null values in a table's column of that same name, for all column names in the
table.<br><br>Thanks<br><br>----------------------------------------<br>>From: dlnelson77808@outlook.com<br>> To:
pgsql-general@postgresql.org<br>>Subject: [GENERAL] Count of non-null values per table column<br>> Date: Fri, 14
Aug2015 14:32:36 +0000<br>><br>> Hello list,<br><br>Apologies if this has been asked before. My
searchonly turned up ways to list the total non-null values for all columns as a single number. I want the count for
eachcolumn by column.<br><br>I have inherited a database consisting of two related huge monolithic tables
thatlack referential integrity between them, or even basic data constraints. One of the problems these tables have is
everysingle non-PK column is NULLable. I am trying to understand the information that is actually stored and used so
thatI can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in
eachcolumn per column. In other words I would like to get the following output from a table (the numbers are totally
madeup):<br><br>column_name |
num_values<br>------------+-----------<br>col1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5787<br>------------+-----------<br>col2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
17<br>------------+-----------<br>col3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
567<br>------------+-----------<br>col4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5787<br>------------+-----------<br>col5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
143<br>------------+-----------<br>col6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
1<br>------------+-----------<br>...<br><br>Isthis possible through one or more of the system
views,or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)...
statements,but I'd rather not.<br><br>Thanks,<br>David<br>><br>> --<br>> Sent via
pgsql-generalmailing list (pgsql-general@postgresql.org)<br>> To make changes to your subscription:<br>>
http://www.postgresql.org/mailpref/pgsql-general<br>

Re: Count of non-null values per table column

From

David Rowley

Date:

14 August 2015, 14:52:07

On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:

Hello list, Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column. I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up): column_name | num_values ------------+----------- col1        |       5787 ------------+----------- col2        |         17 ------------+----------- col3        |        567 ------------+----------- col4        |       5787 ------------+----------- col5        |        143 ------------+----------- col6        |          1 ------------+----------- ... Is this possible through one or more of the system views, or will I need to write a function to do this? Obviously I can just issue multiple SELECT COUNT(column)... statements, but I'd rather not. Thanks, David

I didn't quite catch a Postgres version number in that mess :)

I assume the tables are quite large if you don't want to just issue a: SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL) ... (assuming you're on a version new enough to support agg FILTER)

On the other hand if you were happy with just an approximation then you could look at pg_stats;

create table abc(a int, b int,c int);

insert into abc values(1, 1, NULL),(1, NULL, NULL),(NULL, NULL, NULL);

analyze abc;

select attname, null_frac from pg_stats where tablename = 'abc';

attname | null_frac

---------+-----------

a | 0.333333

b | 0.666667

c | 1

Keep in mind though that this is an *approximation* and possibly could be inaccurate. If you want exact do the COUNT(col).

If you're not on a version new enough for COUNT(col) FILTER WHERE ..., you could just SUM(CASE WHEN col IS NULL THEN 1 ELSE 0 END)

Regards

David Rowley

David Rowley http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Training & Services

Re: Count of non-null values per table column

From

John McKown

Date:

14 August 2015, 14:59:50

David,

It still came through as junk. But I reconstructed it below

=== original message ===

Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.

I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):

column_name | num_values

------------+-----------

col1 | 5787

------------+-----------

col2 | 17

------------+-----------

col3 | 567

------------+-----------

col4 | 5787

------------+-----------

col5 | 143

------------+-----------

col6 | 1

------------+-----------

====

I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table. E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.

Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.

Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown

Re: Count of non-null values per table column

From

Tom Lane

Date:

14 August 2015, 15:01:13

David Rowley <david.rowley@2ndquadrant.com> writes:
> On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
>> Hello list,<br><br>Apologies if this has been asked before. My search only
>> turned up ways to list the total non-null values for all columns as a
>> single number. I want the count for each column by column.

> I assume the tables are quite large if you don't want to just issue a:
> SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
> ... (assuming you're on a version new enough to support agg FILTER)

AFAIK this should work in any version, or indeed any SQL-compliant DBMS:

      select count(col1), count(col2), ... from table;

COUNT with an argument counts the non-null values of that argument.

> On the other hand if you were happy with just an approximation then you
> could look at pg_stats;

Yeah; you might want to ANALYZE the table first to be sure the stats are
up to date.

            regards, tom lane

Re: Count of non-null values per table column

From

David Nelson

Date:

15 August 2015, 01:31:39

On Fri, Aug 14, 2015 at 9:59 AM, John McKown <john.archie.mckown@gmail.com> wrote:
>
> David,
>
> It still came through as junk. But I reconstructed it below
>
> === original message ===
> Apologies if this has been asked before. My search only turned up ways to list the total non-null values for all columns as a single number. I want the count for each column by column.
>
> I have inherited a database consisting of two related huge monolithic tables that lack referential integrity between them, or even basic data constraints. One of the problems these tables have is every single non-PK column is NULLable. I am trying to understand the information that is actually stored and used so that I can implement a (hopefully) better design. Towards that end I would like to know the count of non-null values in each column per column. In other words I would like to get the following output from a table (the numbers are totally made up):
>
> column_name | num_values
> ------------+-----------
> col1 | 5787
> ------------+-----------
> col2 | 17
> ------------+-----------
> col3 | 567
> ------------+-----------
> col4 | 5787
> ------------+-----------
> col5 | 143
> ------------+-----------
> col6 | 1
> ------------+-----------
>
> ====
>
> I assume what "num_values" contains is the _distinctly different_ number of values in "column_name" for each and every column name in a table. E.g. if "col1" contains value 'x' twice and 'y' ten times,then "num_values" would be 2, not 12. Or do you really want the 12? I'm unsure.
>

Thanks John for fixing that which microsoft screwed up, and I've switched to my gmail account for this.

For my purposes 12 would be fine. I just want to know of the total number of
rows in the table, how many have a value in each column. I guess I'm actually trying to get the complement of the number of NULLs per column in the end.

BTW, aplologies for omitting basic info:
SELECT version();
version

----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit

>
>
> --
>
> Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
>
> Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
>
> He's about as useful as a wax frying pan.
>
> 10 to the 12th power microphones = 1 Megaphone
>
> Maranatha! <><
> John McKown

Re: Count of non-null values per table column

From

David Nelson

Date:

15 August 2015, 01:35:15

On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
> > On 15 August 2015 at 02:32, David Nelson <dlnelson77808@outlook.com> wrote:
> >> Hello list, Apologies if this has been asked before. My search only
> >> turned up ways to list the total non-null values for all columns as a
> >> single number. I want the count for each column by column.
>
> > I assume the tables are quite large if you don't want to just issue a:
> > SELECT COUNT(a) FILTER(WHERE a IS NULL),COUNT(b) FILTER(WHERE b IS NULL)
> > ... (assuming you're on a version new enough to support agg FILTER)
>
> AFAIK this should work in any version, or indeed any SQL-compliant DBMS:
>
> select count(col1), count(col2), ... from table;

Thanks Tom (I've switched to my gmail account for this conversation). Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.

BTW, aplologies for omitting basic info:
SELECT version();
version

----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.1.18 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11), 64-bit

>
> COUNT with an argument counts the non-null values of that argument.
>
> > On the other hand if you were happy with just an approximation then you
> > could look at pg_stats;
>
> Yeah; you might want to ANALYZE the table first to be sure the stats are
> up to date.
>
> regards, tom lane
>

Re: Count of non-null values per table column

From

Ken Tanzer

Date:

15 August 2015, 02:19:12

On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com> wrote:

On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:

Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.

You could use this to generate the SQL:

\set my_table my_real_table_name

SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT 'COUNT('||column_name::text ||') AS ' || column_name::text FROM information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM ' || :'my_table' || ';';

Cheers,

Ken

AGENCY Software

A Free Software data system

By and for non-profits

http://agency-software.org/

https://agency-software.org/demo/client

ken.tanzer@agency-software.org

(253) 245-3801

Subscribe to the mailing list to

learn more about AGENCY or

follow the discussion.

Re: Count of non-null values per table column

From

David Nelson

Date:

16 August 2015, 00:59:13

On Fri, Aug 14, 2015 at 9:17 PM, Ken Tanzer <ken.tanzer@gmail.com> wrote:

On Fri, Aug 14, 2015 at 6:35 PM, David Nelson <dnelson77808@gmail.com> wrote:
On Fri, Aug 14, 2015 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:

Tthat is the way I would do it for a table with a small number of columns, but these have several dozen so this would get tedious. Although I just realized I could output the list of column names from the pg_stat view to a file and whip up some vi find and replace to create the entire statement pretty quickly. I was just wondering if that was the only way or not.

You could use this to generate the SQL:

\set my_table my_real_table_name

SELECT 'SELECT COUNT(*) AS total_rows, '||array_to_string(array(SELECT 'COUNT('||column_name::text ||') AS ' || column_name::text FROM information_schema.columns WHERE table_name=:'my_table'),E',\n') || ' FROM ' || :'my_table' || ';';

Oh yeah, SQL to generate SQL. Done it many times to create ETL scripts. Don't know why I didn't think of that here. Definitely a viable option.

Thanks Ken.

Cheers,
Ken
--
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://agency-software.org/demo/client
ken.tanzer@agency-software.org
(253) 245-3801

Subscribe to the mailing list to
learn more about AGENCY or
follow the discussion.