Thread: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
From: hubert depesz lubaczewski
hi,

first of all - i know next to nothing about bsd. a friend asked me to set up
replication on their machines, and i noticed that one of the cpus
(2 quad xeons) is at 100%:

pgsql 58241 99.0 0.2 22456 7432 ?? Rs Thu10AM 1530:35.93 postgres: stats collector process (postgres)

what might be the reason? here are the settings:

name | setting
-----+---------
add_missing_from | off
allow_system_table_mods | off
archive_command | (disabled)
archive_mode | off
archive_timeout | 0
array_nulls | on
authentication_timeout | 60
autovacuum | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.2
autovacuum_vacuum_threshold | 50
backslash_quote | safe_encoding
bgwriter_delay | 200
bgwriter_lru_maxpages | 100
bgwriter_lru_multiplier | 2
block_size | 8192
bonjour_name |
check_function_bodies | on
checkpoint_completion_target | 0.5
checkpoint_segments | 3
checkpoint_timeout | 300
checkpoint_warning | 30
client_encoding | UTF8
client_min_messages | notice
commit_delay | 0
commit_siblings | 5
config_file | /usr/local/pgsql/data/postgresql.conf
constraint_exclusion | off
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
custom_variable_classes |
data_directory | /usr/local/pgsql/data
DateStyle | ISO, YMD
db_user_namespace | off
deadlock_timeout | 1000
debug_assertions | off
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 10
default_tablespace |
default_text_search_config | pg_catalog.simple
default_transaction_isolation | read committed
default_transaction_read_only | off
default_with_oids | off
dynamic_library_path | $libdir
effective_cache_size | 16384
enable_bitmapscan | on
enable_hashagg | on
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
escape_string_warning | on
explain_pretty_print | on
external_pid_file |
extra_float_digits | 0
from_collapse_limit | 8
fsync | on
full_page_writes | on
geqo | on
geqo_effort | 5
geqo_generations | 0
geqo_pool_size | 0
geqo_selection_bias | 2
geqo_threshold | 12
gin_fuzzy_search_limit | 0
hba_file | /usr/local/pgsql/data/pg_hba.conf
ident_file | /usr/local/pgsql/data/pg_ident.conf
ignore_system_indexes | off
integer_datetimes | off
join_collapse_limit | 8
krb_caseins_users | off
krb_realm |
krb_server_hostname |
krb_server_keyfile |
krb_srvname | postgres
lc_collate | C
lc_ctype | pl_PL.UTF-8
lc_messages | pl_PL.UTF-8
lc_monetary | pl_PL.UTF-8
lc_numeric | pl_PL.UTF-8
lc_time | pl_PL.UTF-8
listen_addresses | 192.168.0.160
local_preload_libraries |
log_autovacuum_min_duration | -1
log_checkpoints | off
log_connections | off
log_destination | syslog
log_directory | pg_log
log_disconnections | off
log_duration | off
log_error_verbosity | default
log_executor_stats | off
log_filename | postgresql-%Y-%m-%d_%H%M%S.log
log_hostname | off
log_line_prefix |
log_lock_waits | off
log_min_duration_statement | -1
log_min_error_statement | error
log_min_messages | notice
log_parser_stats | off
log_planner_stats | off
log_rotation_age | 1440
log_rotation_size | 10240
log_statement | none
log_statement_stats | off
log_temp_files | -1
log_timezone | Poland
log_truncate_on_rotation | off
logging_collector | off
maintenance_work_mem | 16384
max_connections | 180
max_files_per_process | 1000
max_fsm_pages | 204800
max_fsm_relations | 1000
max_function_args | 100
max_identifier_length | 63
max_index_keys | 32
max_locks_per_transaction | 64
max_prepared_transactions | 5
max_stack_depth | 2048
password_encryption | on
port | 5432
post_auth_delay | 0
pre_auth_delay | 0
random_page_cost | 4
regex_flavor | advanced
search_path | "$user",public
seq_page_cost | 1
server_encoding | UTF8
server_version | 8.3.3
server_version_num | 80303
session_replication_role | origin
shared_buffers | 4096
shared_preload_libraries |
silent_mode | on
sql_inheritance | on
ssl | off
ssl_ciphers | ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
standard_conforming_strings | off
statement_timeout | 0
superuser_reserved_connections | 3
synchronize_seqscans | on
synchronous_commit | on
syslog_facility | LOCAL0
syslog_ident | postgres
tcp_keepalives_count | 0
tcp_keepalives_idle | 0
tcp_keepalives_interval | 0
temp_buffers | 1024
temp_tablespaces |
TimeZone | Poland
timezone_abbreviations | Default
trace_notify | off
trace_sort | off
track_activities | on
track_counts | on
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
unix_socket_directory |
unix_socket_group |
unix_socket_permissions | 511
update_process_title | off
vacuum_cost_delay | 0
vacuum_cost_limit | 200
vacuum_cost_page_dirty | 20
vacuum_cost_page_hit | 1
vacuum_cost_page_miss | 10
vacuum_freeze_min_age | 100000000
wal_buffers | 8
wal_sync_method | fsync
wal_writer_delay | 200
work_mem | 1024
xmlbinary | base64
xmloption | content
zero_damaged_pages | off
(189 rows)

any ideas?

Best regards,

depesz
hubert depesz lubaczewski <depesz@depesz.com> writes:
> a friend asked me to set up replication on their machines, and i noticed
> that one of the cpus (2 quad xeons) is at 100%:
> pgsql 58241 99.0 0.2 22456 7432 ?? Rs Thu10AM 1530:35.93 postgres: stats collector process (postgres)

Hmm, we had some problems with the stats collector going nuts a couple
of versions ago; maybe you've found another way to trigger that.
Anything unusual about this DB (lots of tables, for instance)?
Can you try strace and/or gdb to figure out what the collector is doing?

regards, tom lane
On Fri, 27 Jun 2008 18:04:19 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > a friend asked me to set up replication on their machines, and i noticed
> > that one of the cpus (2 quad xeons) is at 100%:
> > pgsql 58241 99.0 0.2 22456 7432 ?? Rs Thu10AM 1530:35.93 postgres: stats collector process (postgres)
>
> Hmm, we had some problems with the stats collector going nuts a couple
> of versions ago; maybe you've found another way to trigger that.
> Anything unusual about this DB (lots of tables, for instance)?
> Can you try strace and/or gdb to figure out what the collector is doing?

Just in case you're not a FreeBSD expert, it's ktrace on FreeBSD. strace
is the Linux equivalent.

--
Bill Moran <wmoran@collaborativefusion.com>
Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
From: hubert depesz lubaczewski
On Fri, Jun 27, 2008 at 06:37:45PM -0400, Bill Moran wrote:
> Just in case you're not a FreeBSD expert, it's ktrace on FreeBSD. strace
> is the Linux equivalent.

i'm not a freebsd expert.

i ran ktrace -p <pid> - it exited immediately. then i ran
ktrace -p <pid> -f stats.ktrace.log -t\+ and it also exited immediately.
in the cwd i now have 2 binary files - not text ones, as i would expect
from strace: ktrace.out and stats.ktrace.log.

are these files "ok"? i mean - i can make them available, but i'm not
sure whether they're usable, given that they don't look like the text
output of strace.

Best regards,

depesz
hubert depesz lubaczewski wrote:
> On Fri, Jun 27, 2008 at 06:37:45PM -0400, Bill Moran wrote:
>> Just in case you're not a FreeBSD expert, it's ktrace on FreeBSD. strace
>> is the Linux equivalent.
>
> i'm not a freebsd expert.
>
> i ran ktrace -p <pid> - it exited immediately. then i ran
> ktrace -p <pid> -f stats.ktrace.log -t\+ and it also exited immediately.
> in the cwd i now have 2 binary files - not text ones, as i would expect
> from strace: ktrace.out and stats.ktrace.log.
>
> are these files "ok"? i mean - i can make them available, but i'm not
> sure whether they're usable, given that they don't look like the text
> output of strace.
>
> Best regards,
>
> depesz

From the ktrace man page:

DESCRIPTION
     The ktrace utility enables kernel trace logging for the specified
     processes.  Kernel trace data is logged to the file ktrace.out.  The
     kernel operations that are traced include system calls, namei
     translations, signal processing, and I/O.

     Once tracing is enabled on a process, trace data will be logged until
     either the process exits or the trace point is cleared.  A traced
     process can generate enormous amounts of log data quickly; it is
     strongly suggested that users memorize how to disable tracing before
     attempting to trace a process.  The following command is sufficient
     to disable tracing on all user owned processes, and, if executed by
     root, all processes:

           $ ktrace -C

So, read the man page, and at least run ktrace -C and read the output in
the ktrace.out file.

Best regards,
Rodrigo Gonzalez
Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
From: hubert depesz lubaczewski
On Fri, Jun 27, 2008 at 06:04:19PM -0400, Tom Lane wrote:
> Anything unusual about this DB (lots of tables, for instance)?

sorry, missed that question - nothing unusual. a couple of dbs (10-15),
around 20 tables per database, and only 1 database really used - all the
others tend to be used only for testing purposes.

Best regards,

depesz
Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
From: hubert depesz lubaczewski
On Sat, Jun 28, 2008 at 02:22:26AM -0300, Rodrigo Gonzalez wrote:
> So, read the man page, and at least run ktrace -C and read the output in
> the ktrace.out file.

ok, i've read it and didn't understand it. it says how to disable tracing,
but it doesn't say anything about enabling tracing. yesterday
ktrace -p <pid> produced output, and today it generated an empty file.

so - apparently it's much more complicated than linux strace, and since i
generally never use bsd, i'm not really interested in learning what to do
on this system to get a trace. the only problem is that there definitely
is some kind of problem with the stats collector on bsd, but i'm not able
to help diagnose it at this moment.

Best regards,

depesz
On Sat, 28 Jun 2008 11:35:24 +0200 hubert depesz lubaczewski <depesz@depesz.com> wrote:

> On Sat, Jun 28, 2008 at 02:22:26AM -0300, Rodrigo Gonzalez wrote:
> > So, read the man page, and at least run ktrace -C and read the output in
> > the ktrace.out file.
>
> ok, i've read it and didn't understand it. it says how to disable tracing,
> but it doesn't say anything about enabling tracing.

Use "ktrace -p [pid]" to start tracing.

Then use "ktrace -C" to stop tracing. Trace data is dumped in binary
format to the file ktrace.out (unless you use the -f option to specify
another file).

Use the kdump utility to convert the ktrace.out file to something usable.
Something like "kdump > ktrace.txt" will probably get you what you want,
assuming your ktrace file is ktrace.out.

--
Bill Moran <wmoran@collaborativefusion.com>
Re: freebsd + postgresql 8.3.3 = 100% of cpu usage on stats collector?
From: hubert depesz lubaczewski
On Sun, Jun 29, 2008 at 05:13:59PM -0400, Bill Moran wrote:
> Use the kdump utility to convert the ktrace.out file to something usable.
> Something like "kdump > ktrace.txt" will probably get you what you want,
> assuming your ktrace file is ktrace.out.

ok, did it. in about 7 seconds i got 250mb of ktrace.out; ktrace.txt is
176 megabytes. the content is pretty simple:

 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
 58241 postgres RET  poll -1 errno 4 Interrupted system call
 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
 58241 postgres RET  poll -1 errno 4 Interrupted system call
 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
 58241 postgres RET  poll -1 errno 4 Interrupted system call
 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
 58241 postgres RET  poll -1 errno 4 Interrupted system call
 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
 58241 postgres RET  poll -1 errno 4 Interrupted system call
 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
 58241 postgres RET  poll -1 errno 4 Interrupted system call
 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)

should i make the ktrace.txt available, or should i do something else to
help you (i mean, the pg-hackers community) diagnose the issue?

Best regards,

depesz
hubert depesz lubaczewski <depesz@depesz.com> writes:
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET  poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET  poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET  poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET  poll -1 errno 4 Interrupted system call
> 58241 postgres CALL poll(0x7fffffffd4e0,0x1,0x7d0)
> 58241 postgres RET  poll -1 errno 4 Interrupted system call

Hmm. The wait-for-input loop in the stats collector assumes that if it
gets EINTR, it should just retry the poll() call immediately. Which
AFAIK is correct. I think you're looking at a kernel or libc bug here
--- somehow the EINTR failure is recurring continuously. It's probably
time to ask about this on a FreeBSD-specific list.

regards, tom lane
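For readers following the thread, here is a minimal, hypothetical C sketch
of the kind of retry-on-EINTR wait loop Tom describes. It is not the actual
pgstat.c source; the function name and structure are illustrative only. If
poll() keeps failing with EINTR before it ever manages to block - which is
exactly what the kdump output above shows - the retry path runs back-to-back
and the process pins a CPU at 100%.

/*
 * Hypothetical sketch (not the real stats collector code) of a poll()-based
 * wait loop that retries on EINTR.
 */
#include <errno.h>
#include <poll.h>
#include <stdio.h>

void
wait_for_stats_message(int sock_fd)
{
    struct pollfd pfd;

    pfd.fd = sock_fd;
    pfd.events = POLLIN;

    for (;;)
    {
        /* 2000 ms timeout: 0x7d0, the value visible in the ktrace output */
        int rc = poll(&pfd, 1, 2000);

        if (rc < 0)
        {
            if (errno == EINTR)
                continue;       /* normally harmless: a signal arrived, retry */
            perror("poll");     /* any other error is unexpected here */
            return;
        }

        if (rc == 0)
            continue;           /* timeout expired with nothing to read */

        /*
         * rc > 0: the socket is readable; a real collector would now read
         * the pending stats message and process it.
         */
        return;
    }
}

Retrying on EINTR is the textbook-correct behaviour, which is why the
suspicion falls on the FreeBSD kernel or libc rather than on PostgreSQL:
something below the collector has to be delivering the interruption
continuously for this loop to spin.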