Re: Connection function - Mailing list psycopg
From | Bill House |
---|---|
Subject | Re: Connection function |
Date | |
Msg-id | 4F6FBFBD.9060408@house-grp.net Whole thread Raw |
In response to | Re: Connection function (Adrian Klaver <adrian.klaver@gmail.com>) |
Responses |
Re: Connection function
Re: Connection function |
List | psycopg |
On 03/25/2012 02:12 PM, Adrian Klaver wrote: > On 03/25/2012 08:14 AM, Bill House wrote: >> On 03/24/2012 08:25 PM, Adrian Klaver wrote: >>> On 03/24/2012 05:03 PM, Bill House wrote: >>>> Hello all, >>>> >>>> I am very new to postgresql and python, but I have written a >>>> function to >>>> retrieve some admin data from the server. >>>> >>>> I think this is more of a python question rather than a psycopg2 >>>> question, but the context is psycopg2. >>>> >>>> I was wondering if this is the right place to post it and ask for a >>>> critique? >>> >>> I would say yes, though I am not sure what the question is:)? >>> >>>> >>>> Thanks, >>>> >>>> Bill >>>> >>> >>> >> Thanks, >> >> Sometimes it is helpful to know how many records one is working with. >> >> From my reading in the psycopg2 documentation (2.4.4) and >> experimentation, I think that this information is only available if one >> uses an unnamed cursor (in which case all the data will have been >> fetched to the client) or with an named cursor (but only after all the >> data has been fetched). >> >> If there is a large amount of data, this may be problematic. >> >> Am I wrong about this? If so, please let me know. > > See comments in line. > >> >> Anyway, the way I approached this issue was to count the records I was >> interested in and return the value. >> >> The code below works. >> >> I am interested in a critique of it from more experienced persons. Are >> there more elegant or efficient ways to do what I have done? >> >> In particular, I am interested in: >> 1) the python string substitution technique I used > > That works, though it also possible to use dictionaries for the > Psycopg strings see here: > > http://initd.org/psycopg/docs/usage.html#query-parameters > > I find the dict form easier to use for many parameters because it is > self documenting. > > > The rest of the string substitution seems to be tied to using exec > which I do not think is necessary, see below for more. > > >> 2) the choice of the postgresql source of data I was seeking (the >> record count) >> 3) if there was an existing function I overlooked which does the >> same thing (in psycopg2 or postgresql) >> >> Regarding # 2 above, I had been relying on the table: >> pg_stat_user_tables and the n_live_tup value. This item has shown the >> correct count for many days. Today, after updating my machines and >> rebooting, the count is 0 and I am not really sure why. Records are >> still there; and even though I have selected data from the database, the >> count is still 0. > > The above are system tables/views used by the statistics mechanism of > Postgres. For a starting point on this look at: > > http://www.postgresql.org/docs/9.0/interactive/sql-analyze.html > > Short version running Analyze will repopulate the tables. > >> >> So I took the more direct but time consuming approach of just counting >> the records in the table. > > That is the only way to get a 'true' count due to the MVCC nature of > Postgres. There has been a recent optimization of that, but I cannot > remember what version of Postgres has it. > >> >> This will be better anyway because many times one will not be selecting >> all the records in a table, so this function can be expanded to include >> a parameter for the selection criteria so that that identical criteria >> may be used again in the work portion of any operation. >> >> Thanks in advance for your input. >> >> Bill >> >> code (indent 2): >> ------------------------------------------------------- >> #!/usr/bin/python >> # z_psycopg2_row_cnt7f.py >> # A program to illustrate the methods of creating or supplying runtime >> variables >> # to functions using psycopg2 functions >> >> import sys >> import os >> >> import psycopg2 >> import psycopg2.extras >> import psycopg2.extensions >> >> #for string generator >> import string >> import random >> >> con = None >> >> def gen_random_str(str_len): >> """generates a random string str_len long - >> randomly chooses and assembles a number of characters from the digits >> and the upper and lower case alphabet """ >> rand_str = ''.join(random.choice(string.ascii_uppercase + >> string.digits + string.ascii_lowercase) for x in range(str_len)) >> return rand_str >> >> def rec_cnt_func(conn, table_name): >> """Count the records selected from the table, right now all of >> them""" >> #Generate a random string to use as a curson name (short right now >> for >> debugging purposes) >> #The commands following will use this string >> rand_curs_name = gen_random_str(3) >> #Substitute data into a command string and execute >> #create the cursor >> exec "%s = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)" % >> (rand_curs_name,) >> #count the records in the table >> exec "%s.execute(\"SELECT COUNT (*) FROM %s;\")" % (rand_curs_name, >> table_name) > > Not sure why you are using exec? Why not?: > > cur = > conn.cursor(cursor_factory=psycopg2.extras.DictCursor,name=rand_curs_name) > > Same for below. > >> #fetch the data >> exec "row = %s.fetchone()" % (rand_curs_name,) >> #All done now, throw the cursor away >> exec "%s.close()" % (rand_curs_name,) >> #return the data to the calling program >> return row[0] >> >> >> def main(): >> #supply your connection string here >> current_dsn = "host=xx" >> >> try: >> con = psycopg2.connect(current_dsn) >> >> #supply your table name here >> table_name_in = "xxxx" >> >> #Call the function which will retrieve the data from postgresql >> rec_cnt_val = rec_cnt_func(con, table_name_in) >> >> #print the formatted data in an informative way >> print "Records in table: {}: {:> 20,}".format(table_name_in, >> rec_cnt_val) >> >> except psycopg2.DatabaseError, e: >> print "psycopg2 reports error: {}".format(e) >> sys.exit(1) >> >> finally: >> >> if con: >> con.close() >> return >> >> if __name__ == '__main__': >> main() >> ----------------------------------------------------- >> end code >> > > Thanks, I will do the reading you have suggested. Regarding your question about why the use of exec: I read where psycopg2 has quite a bit of capability for parameter substitution, but that functionality is limited to just psycopg2. I grew to rely on runtime construction, substitution and execution of commands in another environment and I was trying to mimic that behavior in a more general way in python. This way works, I was just wondering if there was a better way. Regarding the statistics, I have since learned that the command: select reltuples from pg_class where relname = 'your_file_name'; Will give a record count. I don't know how well it will keep up in a dynamic environment but it's a start. Thanks, Bill