Thread: Fwd: error in cur.mogrify line

Fwd: error in cur.mogrify line

From
Octavi Fors
Date:
Dear psycopg2 users/developers,

I'm trying to use psycopg2 2.6 to insert some columns from the attached FITS binary table file 'img_20130109_001427_2_10.0s_c1_calib.cat' into a PostgreSQL 9.2 server table called 'referencecat'.


To do so I'm using the attached 'insert.py' script.
As you can see, I'm reading the FITS table with the ATpy 0.9.7 module.
From there I convert the columns of tbl2 that I want to insert into a numpy.array (L36), I connect to the database (the password is fake, but that doesn't matter for the purpose of my question :), and then I get the error around the cur.mogrify call at L54:

Traceback (most recent call last):
  File "insert.py", line 54, in <module>
    args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in targets)
  File "insert.py", line 54, in <genexpr>
    args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in targets)
TypeError: not all arguments converted during string formatting


The 'referencecat' table definition is the following:

CREATE TABLE referencecat
(
  ref_id bigserial NOT NULL, -- SEXtractor running reference object number.
  ref_alphawin_j2000 double precision, -- Windowed right ascension (J2000).
  ref_deltawin_j2000 double precision, -- Windowed declination (J2000).
  ref_object character varying(18), -- Source identificator from (RA,DEC): JHHMMSSSS+DDMMSSSS
  ref_snr real, -- SNR (flux_auto / fluxerr_auto)
  ref_elongation real, -- A_IMAGE/B_IMAGE
  ref_fwhm_image real, -- FWHM assuming a gaussian core.
  ref_flags character varying(3), -- Extraction flags.
  CONSTRAINT pk_ref_id PRIMARY KEY (ref_id)
)
WITH (
  OIDS=FALSE
);


I've been struggling and googling for a solution, but without success :(

Could anybody please shed some light on this?
Any advice/help would be more than welcome.

Thanks a lot in advance,

Octavi.

PS: I read here from ant32 that with this kind of args_str and a single cur.execute (with a for x loop inside) the insert speed can be boosted up to 100x. That is why I'm trying to replicate the same approach.

--
Octavi Fors
Postdoctoral Research Associate
Department of Physics and Astronomy
The University of North Carolina at Chapel Hill
CB #3255, #157 Phillips Hall
Chapel Hill, NC 27599
Office: (919) 962-3606
Fax:    (919) 962-0480
http://octavi.web.unc.edu/
Attachment

Re: Fwd: error in cur.mogrify line

From
Adrian Klaver
Date:
On 03/27/2015 07:30 AM, Octavi Fors wrote:
> Dear psycopg2 users/developers,
>
> I'm trying to use psycopg2.2.6 to insert some columns from the attached
> FITS binary table file 'img_20130109_001427_2_10.0s_c1_calib.cat
> <http://img_20130109_001427_2_10.0s_c1_calib.cat>' into a postgresql 9.2
> server table called 'referencecat'.
>
>
> To do so I'm using the attached 'insert.py' script.
> As you see I'm reading the FITS table with ATpy 0.9.7
> <https://atpy.readthedocs.org/en/latest/> module.
>  From there I convert the columns of tbl2 I want to insert into a
> numpy.array (L36), I connect to the database (password is fake but
> doesn't matter for the purpose of my question :), and then I get the
> error in around cur.mogrify in L54:
>
> Traceback (most recent call last):
>    File "insert.py", line 54, in <module>
>      args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for
> x in targets)
>    File "insert.py", line 54, in <genexpr>
>      args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for
> x in targets)
> TypeError: not all arguments converted during string formatting

Typically that means there is a mismatch between the %s placeholders and
the values passed in. My best guess is that you are not unpacking the
array the way you think you are.
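Without the attached insert.py it is hard to be certain, but a common way to hit this TypeError with numpy is iterating over an array whose rows do not each hold the eight values the placeholders expect (for example, an array built column-wise yields columns when iterated). A minimal sketch of the shape to aim for; the shapes and values here are assumptions, not taken from the script:

```python
import numpy as np

# Eight fake columns of three values each, standing in for the columns
# pulled out of tbl2. Iterating np.array([col1, col2, ...]) directly
# would yield whole columns, not one object's row of eight values.
cols = [np.arange(3, dtype=float) for _ in range(8)]

# Stack column-wise so each row holds one object's eight values.
targets = np.column_stack(cols)        # shape (3, 8)

# Convert each row to a plain tuple, one per set of %s placeholders.
rows = [tuple(row) for row in targets]
print(len(rows), len(rows[0]))         # -> 3 8
```

Each of those tuples then has exactly eight elements to match the eight %s placeholders handed to cur.mogrify().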

>
>
> The 'referencecat' table definition is the following:
>
> CREATE TABLE referencecat
> (
>    ref_id bigserial NOT NULL, -- SEXtractor running reference object number.
>    ref_alphawin_j2000 double precision, -- Windowed right ascension (J2000).
>    ref_deltawin_j2000 double precision, -- Windowed declination (J2000).
>    ref_object character varying(18), -- Source identificator from
> (RA,DEC): JHHMMSSSS+DDMMSSSS
>    ref_snr real, -- SNR (flux_auto / fluxerr_auto)
>    ref_elongation real, -- A_IMAGE/B_IMAGE
>    ref_fwhm_image real, -- FWHM assuming a gaussian core.
>    ref_flags character varying(3), -- Extraction flags.
>    CONSTRAINT pk_ref_id PRIMARY KEY (ref_id)
> )
> WITH (
>    OIDS=FALSE
> );
>
>
> I've been struggling and googling searching for a solution but
> unsuccessfully :(
>
> Could anybody please provide some light into this?
> Any advise/help would be more than welcome.

Look at here:

http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
>
> Thanks a lot in advance,
>
> Octavi.
>
> PS: I read here
> <http://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query>
> from ant32 that with the kind of arg_str and cur.execute (with a for x
> loop inside) the insert speed can be boosted up to 100x. This is why I'm
> trying to replicate the same approach.
>
> --
> Octavi Fors
> Postdoctoral Research Associate
> Department of Physics and Astronomy
> The University of North Carolina at Chapel Hill
> CB #3255, #157 Phillips Hall
> Chapel Hill, NC 27599
> Office: (919) 962-3606
> Fax:    (919) 962-0480
> http://octavi.web.unc.edu/
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Fwd: error in cur.mogrify line

From
Adrian Klaver
Date:
On 03/27/2015 07:30 AM, Octavi Fors wrote:
> Dear psycopg2 users/developers,
>
> I'm trying to use psycopg2.2.6 to insert some columns from the attached
> FITS binary table file 'img_20130109_001427_2_10.0s_c1_calib.cat
> <http://img_20130109_001427_2_10.0s_c1_calib.cat>' into a postgresql 9.2
> server table called 'referencecat'.
>
>
> To do so I'm using the attached 'insert.py' script.
> As you see I'm reading the FITS table with ATpy 0.9.7
> <https://atpy.readthedocs.org/en/latest/> module.
>  From there I convert the columns of tbl2 I want to insert into a
> numpy.array (L36), I connect to the database (password is fake but
> doesn't matter for the purpose of my question :), and then I get the
> error in around cur.mogrify in L54:
>
> Traceback (most recent call last):
>    File "insert.py", line 54, in <module>
>      args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for
> x in targets)
>    File "insert.py", line 54, in <genexpr>
>      args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for
> x in targets)
> TypeError: not all arguments converted during string formatting
>
>
> The 'referencecat' table definition is the following:
>
> CREATE TABLE referencecat
> (
>    ref_id bigserial NOT NULL, -- SEXtractor running reference object number.
>    ref_alphawin_j2000 double precision, -- Windowed right ascension (J2000).
>    ref_deltawin_j2000 double precision, -- Windowed declination (J2000).
>    ref_object character varying(18), -- Source identificator from
> (RA,DEC): JHHMMSSSS+DDMMSSSS
>    ref_snr real, -- SNR (flux_auto / fluxerr_auto)
>    ref_elongation real, -- A_IMAGE/B_IMAGE
>    ref_fwhm_image real, -- FWHM assuming a gaussian core.
>    ref_flags character varying(3), -- Extraction flags.
>    CONSTRAINT pk_ref_id PRIMARY KEY (ref_id)
> )
> WITH (
>    OIDS=FALSE
> );
>
>
> I've been struggling and googling searching for a solution but
> unsuccessfully :(
>
> Could anybody please provide some light into this?
> Any advise/help would be more than welcome.

To follow up on my previous post, I'm not sure why you put the parameters
into a numpy.array. Why not create a dictionary and use the named-style
placeholders? One added benefit is that it documents the arguments used
in the SQL string.
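As a rough sketch of that suggestion, using the referencecat columns from the table definition above (the placeholder names and sample values are made up, ref_id is left to its bigserial default, and the cursor/connection setup is omitted):

```python
import re

# INSERT using named placeholders; each %(name)s is matched to a dict key.
sql = """INSERT INTO referencecat
             (ref_alphawin_j2000, ref_deltawin_j2000, ref_object,
              ref_snr, ref_elongation, ref_fwhm_image, ref_flags)
         VALUES (%(ra)s, %(dec)s, %(object)s,
                 %(snr)s, %(elongation)s, %(fwhm)s, %(flags)s)"""

# One row's worth of (made-up) values, keyed by placeholder name.
row = {'ra': 123.456, 'dec': -7.89, 'object': 'J08138240-0753240',
       'snr': 52.1, 'elongation': 1.05, 'fwhm': 3.2, 'flags': '0'}

# Self-check that every placeholder has a matching key (runs without a DB).
assert set(re.findall(r'%\((\w+)\)s', sql)) == set(row)

# With a live connection this would be:
# cur.execute(sql, row)
```

Because matching is by name, the order of keys in the dict does not matter.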

>
> Thanks a lot in advance,
>
> Octavi.
>
> PS: I read here
> <http://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query>
> from ant32 that with the kind of arg_str and cur.execute (with a for x
> loop inside) the insert speed can be boosted up to 100x. This is why I'm
> trying to replicate the same approach.

I would start simple, test, and then move up to something more sophisticated.

>
> --
> Octavi Fors
> Postdoctoral Research Associate
> Department of Physics and Astronomy
> The University of North Carolina at Chapel Hill
> CB #3255, #157 Phillips Hall
> Chapel Hill, NC 27599
> Office: (919) 962-3606
> Fax:    (919) 962-0480
> http://octavi.web.unc.edu/
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Fwd: error in cur.mogrify line

From
Octavi Fors
Date:
Hi Adrian,

I've made some progress.
First, I realized that tbl2['OBJECT'] should be the 4th parameter in the list targets1 (does this matter when using dicts?).

Now I can insert into the database by either:

  - transforming the list targets1 (see L39 of the attached 'insert_list_tuple.py') into the tuple targets,

  - or transforming the numpy.array targets1 (see L39 of the attached 'insert_nparray_tuple.py') into the tuple targets.

Execution timings when inserting the 17586-row 'img_20130109_001427_2_10.0s_c1_calib.cat' file I attached in my previous message are a bit better for the numpy.array script:

Script                     Real execution time
insert_nparray_tuple.py    4.3-4.9
insert_list_tuple.py       5.1-5.5

Fluctuations in the timings might be because I'm reading the 'img_20130109_001427_2_10.0s_c1_calib.cat' file from a NAS.


> To follow up on my previous post, not sure why you put the parameters into a numpy.array? Why not create a dictionary and use the name style placeholder. One added benefit is it documents the arguments used in the SQL string.

Good question.
I wouldn't mind at all giving up putting the parameters into a numpy.array and building a dictionary as you suggest, as long as the insert timing does not increase.

The problem is that I don't have experience converting a list like targets1 (L36) into a dict.
Also, I'm not able to find an example of cur.execute("""INSERT ...""") with dicts.

I would be very grateful if you and/or someone else could give me a hand with these last issues.

Thanks in advance,

Octavi.


On Fri, Mar 27, 2015 at 12:52 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 03/27/2015 07:30 AM, Octavi Fors wrote:
Dear psycopg2 users/developers,

I'm trying to use psycopg2.2.6 to insert some columns from the attached
FITS binary table file 'img_20130109_001427_2_10.0s_c1_calib.cat
<http://img_20130109_001427_2_10.0s_c1_calib.cat>' into a postgresql 9.2
server table called 'referencecat'.


To do so I'm using the attached 'insert.py' script.
As you see I'm reading the FITS table with ATpy 0.9.7
<https://atpy.readthedocs.org/en/latest/> module.

 From there I convert the columns of tbl2 I want to insert into a
numpy.array (L36), I connect to the database (password is fake but
doesn't matter for the purpose of my question :), and then I get the
error in around cur.mogrify in L54:

Traceback (most recent call last):
   File "insert.py", line 54, in <module>
     args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for
x in targets)
   File "insert.py", line 54, in <genexpr>
     args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s)", x) for
x in targets)
TypeError: not all arguments converted during string formatting


The 'referencecat' table definition is the following:

CREATE TABLE referencecat
(
   ref_id bigserial NOT NULL, -- SEXtractor running reference object number.
   ref_alphawin_j2000 double precision, -- Windowed right ascension (J2000).
   ref_deltawin_j2000 double precision, -- Windowed declination (J2000).
   ref_object character varying(18), -- Source identificator from
(RA,DEC): JHHMMSSSS+DDMMSSSS
   ref_snr real, -- SNR (flux_auto / fluxerr_auto)
   ref_elongation real, -- A_IMAGE/B_IMAGE
   ref_fwhm_image real, -- FWHM assuming a gaussian core.
   ref_flags character varying(3), -- Extraction flags.
   CONSTRAINT pk_ref_id PRIMARY KEY (ref_id)
)
WITH (
   OIDS=FALSE
);


I've been struggling and googling searching for a solution but
unsuccessfully :(

Could anybody please provide some light into this?
Any advise/help would be more than welcome.

To follow up on my previous post, not sure why you put the parameters into a numpy.array? Why not create a dictionary and use the name style placeholder. One added benefit is it documents the arguments used in the SQL string.


Thanks a lot in advance,

Octavi.

PS: I read here
<http://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query>
from ant32 that with the kind of arg_str and cur.execute (with a for x
loop inside) the insert speed can be boosted up to 100x. This is why I'm
trying to replicate the same approach.

I would start simple, test and then move up to more sophisticated.



--
Octavi Fors
Postdoctoral Research Associate
Department of Physics and Astronomy
The University of North Carolina at Chapel Hill
CB #3255, #157 Phillips Hall
Chapel Hill, NC 27599
Office: (919) 962-3606
Fax:    (919) 962-0480
http://octavi.web.unc.edu/





--
Adrian Klaver
adrian.klaver@aklaver.com



Attachment

Re: Fwd: error in cur.mogrify line

From
Adrian Klaver
Date:
On 03/27/2015 11:53 AM, Octavi Fors wrote:
> Hi Adrian,
>
> I've made some progress.
> First I realized that tbl2['OBJECT'] should be the 4th parameters in the
> list targets1 (does this matter when using dicts)?).

No, that is why I like named parameters: order is not important. The
placeholders and the values in the parameter dict are matched up by name;
this is where the self-documenting feature comes in. Also, if you change
your table by adding fields and are doing a SELECT * somewhere, you do
not have to contend with the new fields hitting your SQL string.

>

> Now I can insert into the database by either:
>
>    - transforming the list targets1 (see L39, attached
> 'insert_list_tuple.py') into the tuple targets,
>
>    - or transforming the numpy.array targets1 (see L39, attached
> 'insert_nparray_tuple.py') into the tuple targets.
>
> Executions timings when inserting the 17586-row
> 'img_20130109_001427_2_10.0s_c1_calib.cat
> <http://img_20130109_001427_2_10.0s_c1_calib.cat>' file I passed in my
> previous message, are a bit better for numpy.array script:
>
> Script                           Real time execution
> insert_nparray_tuple.py        4.3-4.9
> insert_list_tuple.py               5.1-5.5
>
> Fluctuations in timings might be due to I'm reading the
> 'img_20130109_001427_2_10.0s_c1_calib.cat
> <http://img_20130109_001427_2_10.0s_c1_calib.cat>' file from a NAS.
>
>
>     To follow up on my previous post, not sure why you put the
>     parameters into a numpy.array? Why not create a dictionary and use
>     the name style placeholder. One added benefit is it documents the
>     arguments used in the SQL string.
>
>
> good question.
> Yes, I wouldn't bother at all to give up putting the parameters into a
> numpy.array and doing it in a dictionary as you suggest, as long as the
> insert timing does not increase.
>
> The problem is that I don't have experience converting a list targets1
> (L36) into a dict.

See dict comprehension:

http://stackoverflow.com/questions/14507591/python-dictionary-comprehension
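For example, a dict for one row can be built with a comprehension over the field names and values; the names and values below are illustrative, not taken from the attached script:

```python
# Pair placeholder names with one row's values via a dict comprehension.
fields = ('ra', 'dec', 'object', 'snr', 'elongation', 'fwhm', 'flags')
values = (123.456, -7.89, 'J08138240-0753240', 52.1, 1.05, 3.2, '0')

params = {name: value for name, value in zip(fields, values)}
print(params['object'])  # -> J08138240-0753240
```

The resulting dict can then be passed straight to cur.execute() alongside a query that uses %(ra)s-style placeholders.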

> Also I'm not able to find an example of cur.execute("""INSERT with dicts.

http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries

cur.execute(
    """INSERT INTO some_table (an_int, a_date, another_date, a_string)
       VALUES (%(int)s, %(date)s, %(date)s, %(str)s);""",
    {'int': 10, 'str': "O'Reilly", 'date': datetime.date(2005, 11, 18)})


>
> I would be very grateful if you and/or someone else could give me a hand
> one these those last issues.
>
> Thanks in advance,
>
> Octavi.
>
>



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Fwd: error in cur.mogrify line

From
Adrian Klaver
Date:
On 03/27/2015 11:53 AM, Octavi Fors wrote:
> Hi Adrian,
>
> I've made some progress.
> First I realized that tbl2['OBJECT'] should be the 4th parameters in the
> list targets1 (does this matter when using dicts)?).
>
> Now I can insert into the database by either:
>
>    - transforming the list targets1 (see L39, attached
> 'insert_list_tuple.py') into the tuple targets,
>
>    - or transforming the numpy.array targets1 (see L39, attached
> 'insert_nparray_tuple.py') into the tuple targets.
>
> Executions timings when inserting the 17586-row
> 'img_20130109_001427_2_10.0s_c1_calib.cat
> <http://img_20130109_001427_2_10.0s_c1_calib.cat>' file I passed in my
> previous message, are a bit better for numpy.array script:
>
> Script                           Real time execution
> insert_nparray_tuple.py        4.3-4.9
> insert_list_tuple.py               5.1-5.5
>
> Fluctuations in timings might be due to I'm reading the
> 'img_20130109_001427_2_10.0s_c1_calib.cat
> <http://img_20130109_001427_2_10.0s_c1_calib.cat>' file from a NAS.
>
>
>     To follow up on my previous post, not sure why you put the
>     parameters into a numpy.array? Why not create a dictionary and use
>     the name style placeholder. One added benefit is it documents the
>     arguments used in the SQL string.
>
>
> good question.
> Yes, I wouldn't bother at all to give up putting the parameters into a
> numpy.array and doing it in a dictionary as you suggest, as long as the
> insert timing does not increase.
>
> The problem is that I don't have experience converting a list targets1
> (L36) into a dict.
> Also I'm not able to find an example of cur.execute("""INSERT with dicts.
>
> I would be very grateful if you and/or someone else could give me a hand
> one these those last issues.

A little digging found atpy is now part of astropy and astropy.Table has:

http://docs.astropy.org/en/stable/io/unified.html#built-in-table-readers-writers

In particular ascii.csv.

Seems this opens up reading the Table data into a io object:

https://docs.python.org/2/library/io.html#module-io

and then using the psycopg2 COPY methods to move the data in bulk:

http://initd.org/psycopg/docs/cursor.html#cursor.copy_from

The COPY methods start at the link above. I generally end up using
copy_expert() as it allows for more customization.

The exact usage of the above depends on the size of the files you are
working with.
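A minimal sketch of that route, with made-up rows and the referencecat columns from earlier in the thread ('cur' would be a psycopg2 cursor, which is not created here, so the actual copy call is left commented out):

```python
import csv
import io

# Two made-up rows matching the non-serial referencecat columns.
rows = [
    (123.456, -7.89, 'J08138240-0753240', 52.1, 1.05, 3.2, '0'),
    (124.001, -7.91, 'J08160020-0754360', 18.7, 1.22, 2.9, '2'),
]

# Write the rows as CSV into an in-memory text buffer.
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerows(rows)
buf.seek(0)

# COPY statement naming the target columns; ref_id keeps its bigserial default.
copy_sql = """COPY referencecat
                  (ref_alphawin_j2000, ref_deltawin_j2000, ref_object,
                   ref_snr, ref_elongation, ref_fwhm_image, ref_flags)
              FROM STDIN WITH (FORMAT csv)"""

# With a live cursor this would be:
# cur.copy_expert(copy_sql, buf)
```

For a 17586-row file this loads the whole CSV into memory, which should be fine; for much larger files the buffer could be a temporary file instead.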

>
> Thanks in advance,
>
> Octavi.
>


--
Adrian Klaver
adrian.klaver@aklaver.com