Re: New to the list; would this be an okay question? - Mailing list pgsql-general
From | Madison Kelly |
---|---|
Subject | Re: New to the list; would this be an okay question? |
Date | |
Msg-id | 40D70026.1020506@alteeve.com Whole thread Raw |
In response to | Re: New to the list; would this be an okay question? (Richard Huxton <dev@archonet.com>) |
Responses |
Re: New to the list; would this be an okay question?
|
List | pgsql-general |
Richard Huxton wrote: > Madison Kelly wrote: > >> Hi all, >> >> I am new to the list and I didn't want to seem rude at all so I >> wanted to ask if this was okay first. > > > No problem. Reading your message below, you might want to try the > performance list, but general is a good place to start. > >> I have a program I have written in perl which uses a postgresSQL >> database as the backend. The program works but the performance is >> really bad. I have been reading as much as I can on optimizing >> performance but still it isn't very reasonable. At one point I had my >> program able to process 175,000 records in 16min 10sec on a Pentium3 >> 650MHz, 448MB RAM test machine. Since then I got a Pentium3 1GHz, >> 512MB system and I have tried a lot of things to get the performance >> up but now it is substantially slower and I can't seem to figure out >> what I am doing wrong. > > > A few places to start: > 1. VACUUM FULL > This will make sure any unused space is reclaimed > 2. ANALYZE > This will recalculate stats for the tables > 3. Basic performce tuning: > http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php > There's also a good guide to the postgresql.conf file on varlena.com > >> Would it be appropriate to ask for help on my program on this list? >> Full disclosure: The program won't be initially GPL'ed because it is >> for my company but it will be released for free to home users and the >> source code will be made available (similar to other split-license >> programs) though once my company makes it's money back I think they >> will fully GPL it (I am on my boss's case about it :p ). > > > No problem - what you licence your software under is your concern. Once > you've taken the basic steps described above, try to pick out a specific > query that you think is too slow and provide: > > 1. PostgreSQL version > 2. Basic hardware info (as you have) > 3. Sizes of tables. > 4. Output of EXPLAIN ANALYZE <query here> > > The EXPLAIN ANALYZE runs the query and shows how much work PG thought it > would be and how much it actually turned out to be. > > HTH Thank you very much!! I am using Psql 7.4 on a stock install of Fedora Core 2 on my IBM thinkpad a22m (P3 1GHz, 512MB RAM, not the fastest HDD). The drive carrier I am using is connected via USB2 and uses a few different hard drives with the fastest being a couple of Barracuda 7200.7 drives (2MB cache, 7,200rpm). I described the program in my reply to Martijn so here is some of the code (code not related to psql snipped, let me know if posting it would help - sorry for the wrapping...): =-[ Calling the database ]-= # Open the connection to the database my $DB = DBI->connect("DBI:Pg:dbname=$db_name","$user")|| die("Connect error (Is PostgresSQL running?): $DBI::errstr"); # Prepare the select statements before using them for speed: $select_sth = $DB->prepare("SELECT null FROM file_dir WHERE file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die "$DBI::errstr"; $select_up = $DB->prepare("UPDATE file_dir SET file_perm=?, file_own_user=?, file_own_grp=?, file_size=?, file_mod_date=?, file_mod_time=?, file_mod_time_zone=?, file_exist=? WHERE file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die "$DBI::errstr"; $select_in = $DB->prepare("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir, file_parent_dir, file_perm, file_own_user, file_own_grp, file_size, file_mod_date, file_mod_time, file_mod_time_zone, file_backup, file_restore, file_display, file_exist ) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )") || die "$DBI::errstr"; # Set the 'file_exist' flag to 'false' and reset exiting files to 'true'. $DB->do("UPDATE file_dir SET file_exist='f' WHERE file_src_uuid='$file_src_uuid'") || die "$DBI::errstr"; # Start scanning the drive $num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) || die "$DBI::errstr"; if ( $num > 0 ) { $select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name) || die "$DBI::errstr"; } else { $select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist) || die "$DBI::errstr"; } # We need to grab the existing file settings for the special file '/.' $DBreq=$DB->prepare("SELECT file_backup, file_restore, file_display FROM file_dir WHERE file_parent_dir='/' AND file_name='.' AND file_src_uuid='$file_src_uuid'") || die $DBI::errstr; $file_backup=$DBreq->execute(); @file_backup=$DBreq->fetchrow_array(); $file_backup=@file_backup[0]; $file_restore=@file_backup[1]; $file_display=@file_backup[2]; # Jump into the re-entrant subroutine to scan directories and sub-dirs &list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup, $file_restore, $file_display); # Inside the sub routine # Does the directory/file/symlink already exist? (there are three of these for each file type) $num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) || die "$DBI::errstr"; if ( $num > 0 ) { $select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name) || die "$DBI::errstr"; } else { # The file did not exist so we will use the passed parent settings for the 'file_backup' flag and leave the 'file_display' flag set to 'f' $select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist) || die "$DBI::errstr"; } # If this was a file I would loop and process the next file in the directory, if it was a directory itself I would now re-enter the subroutine to process it's contents and when I fell back I would pick up where I left off # Returning from the final subroutine and finishing up $DB->do("VACUUM ANALYZE"); =-[ finished DB related source code ]-= Here is the schema for the 'file_dir' table which I hit repeatedly here: =-[ file_dir table and index schemas ]-= CREATE TABLE file_dir ( -- Used to store info on every file on source partitions file_id serial unique, -- make this 'bigserial' if there may be more than 2 billion files in the database file_src_uuid varchar(40) not null, -- the UUID of the source partition hosting the original file file_org_uuid varchar(40), -- the UUID that the file came from (when the file was moved by TLE-BU) file_name varchar(255) not null, -- Name of the file or directory file_dir bool not null, -- t = is directory, f = file file_parent_dir varchar(255) not null, -- if directory '/foo/bar', parent is '/foo', if file '/foo/bar/file', parent is '/foo/bar'. The mount directory is treated as '/' so any directories below it will be ignored for this record. file_perm varchar(10) not null, -- file or directory permissions file_own_user varchar(255) not null, -- The file's owning user (by name, not UID!!) file_own_grp varchar(255) not null, -- The file's owning group (by name, not GID!!) file_size bigint not null, -- File size in bytes file_mod_date varchar(12) not null, -- File's last edited date file_mod_time varchar(20) not null, -- File's last edited time file_mod_time_zone varchar(6) not null, -- File's last edited time zone file_backup boolean not null default 'f', -- 't' = Include in backup jobs, 'f' = Do not include in backup jobs file_restore boolean not null default 'f', -- 't' = Include in restore jobs, 'f' = Do not include in restore jobs file_display boolean not null default 'f', -- 't' = display, 'f' = hide file_exist boolean default 't' -- Used to catch files that have been deleted since the last scan. Before rescan, all files in a given src_uuid are set to 0 (deleted) and then as each file is found or updated it is reset back to 1 (exists) and anything left with a value of '0' at the end of the scan is deleted and we will remove their record. ); -- CREATE INDEX file_dir_idx ON file_dir (file_src_uuid,file_name,file_parent_dir); =-[ Finish file_dir table and index schemas ]-= Thanks so much!! Madison
pgsql-general by date: