Thread: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
"Mark Kramer"
Date:
The following bug has been logged online:

Bug reference:      4787
Logged by:          Mark Kramer
Email address:      root@asarian-host.net
PostgreSQL version: 8.3.7
Operating system:   FreeBSD 7.1
Description:        Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
Details:

I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for updates, instead of just /usr/local). As a result, I made hard-links like this:

    cd /usr/local/bin/
    ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl

Etc. Seems PostgreSQL can't handle the fact. I try and start the server, with:

    /usr/bin/su -l pgsql -c "exec /usr/local/bin/pg_ctl start -D /var/db/PostgreSQL -w -s -m fast"

I get this error, though:

    May 1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL: invalid value for parameter "timezone_abbreviations": "Default"

Which is a silly error, because it's rather untrue, and it's rather strange, honestly, for PostgreSQL to want to be started from a hardcoded location. Starting up the usual way, with:

    /usr/bin/su -l pgsql -c "exec /usr/local/PostgreSQL/bin/pg_ctl start -D /var/db/PostgreSQL -w -s -m fast"

works just fine.

So, at the very least, change the error message to something that actually makes sense, like "FATAL: Binary started from location other than the one used at compile-time;" or something to that effect. But better still, there's no need for PostgreSQL to have this hard location requirement: its lib paths have been set (at boot) with ldconfig, so it should find whatever libs it needs, regardless of where the binary I use to start the server resides.
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Tom Lane
Date:
"Mark Kramer" <root@asarian-host.net> writes:
> I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for
> updates, instead of just /usr/local) As a result, I made hard-links like
> this,
> cd /usr/local/bin/
> ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl

This isn't going to work because pg_ctl assumes it can find postgres in the same directory it is in. Try using a symlink instead. (It'll be less likely to fail miserably after an upgrade, too.)

> I get this error, though:
> May 1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL: invalid value for
> parameter "timezone_abbreviations": "Default"

I agree this is an odd error message though. Perhaps you hardlinked a few other things you didn't tell us about? I'm not sure what it would take to make this be the first complaint. What is probably happening is that postgres is trying to find /usr/local/PostgreSQL/share/ relative to itself, but I'd have thought it would notice the problem sooner.

regards, tom lane
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Mark
Date:
-----Original Message-----
From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Friday, 1 May 2009 17:46
To: Mark Kramer
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error

"Mark Kramer" <root@asarian-host.net> writes:
> > I have my PostgreSQL installed in /usr/local/PostgreSQL/ (cleaner for
> > updates, instead of just /usr/local) As a result, I made hard-links
> > like this,
> > cd /usr/local/bin/
> > ln /usr/local/PostgreSQL/bin/pg_ctl pg_ctl
> This isn't going to work because pg_ctl assumes it can find postgres in
> the same directory it is in. Try using a symlink instead. (It'll be
> less likely to fail miserably after an upgrade, too.)

I tried a symlink as well. Then pg_ctl *can* start the server (which is kinda odd, by itself, that it can do so now, whereas not with a hardlink; unless pg_ctl actually reads the symlink content, which is very unlikely), but it reports a spurious error nonetheless: "could not start server" (whilst it DOES start the server just fine).

As for pg_ctl assuming it can find postgres in the same directory it is in, it SHOULD. :) Basically, I hard-linked all files in /usr/local/PostgreSQL/bin/ to /usr/local/bin/. So, even when pg_ctl got started from /usr/local/bin/, it should have found /usr/local/bin/postgres right under its very nose! Also, the error message actually DOES seem to come from postgres (postgres[9742]: [6-1] FATAL), but that may well be an optical illusion on my end (as pg_ctl could log as 'postgres' too: haven't examined that yet).

Clearly, it seems PostgreSQL just really wants to be started from its original install-location.

> > I get this error, though:
> > May 1 04:40:26 asarian-host postgres[9742]: [6-1] FATAL: invalid
> > value for parameter "timezone_abbreviations": "Default"
> I agree this is an odd error message though. Perhaps you hardlinked a
> few other things you didn't tell us about? I'm not sure what it would
> take to make this be the first complaint. What is probably happening is
> that postgres is trying to find /usr/local/PostgreSQL/share/ relative
> to itself, but I'd have thought it would notice the problem sooner.

The /share/ thingy is what I strongly suspected too; but since the bug report FAQ strongly discourages one from writing your assumptions about what you *think* might be the issue, I refrained from mentioning it. :) But yes, that seems like a logical place to look.

- Mark
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Tom Lane
Date:
Mark <admin@asarian-host.net> writes:
> As for pg_ctl assuming it can find postgres in the same directory it is
> in, it SHOULD. :) Basically, I hard-linked all files in
> /usr/local/PostgreSQL/bin/ to /usr/local/bin/. So, even when pg_ctl got
> started from /usr/local/bin/, it should have found /usr/local/bin/postgres
> right under its very nose!

Well, it did (else you'd not have got as far as you did). The point you are missing is that other components of the distribution, such as the share/ directory, are expected to be found relative to where the binaries are. (This behavior isn't a bug, but intentional to allow relocatable distribution packages.) If postgres is executed via a symlink then it will correctly determine its own location and successfully locate the share/ directory; otherwise not so much.

I think pg_ctl needs to be able to find share/ as well, though that might depend on other things such as whether you have NLS enabled.

I was under the impression that there was some code in there to complain if the path-finding code failed, but maybe it's being executed too late. Anyway the bug here is an inadequate error message, not that we should support the configuration.

regards, tom lane
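The relative lookup described above can be illustrated with a small C sketch. It is a deliberately simplified model, not PostgreSQL's get_share_path: the helper name sketch_share_path is hypothetical, and the assumption that share/ sits next to bin/ under a common prefix is made only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (NOT PostgreSQL's get_share_path): derive
 * "<prefix>/share" from an executable path of the form
 * "<prefix>/bin/<program>".
 *
 * Returns 0 on success, -1 if the path has too few components or the
 * result does not fit into the output buffer.
 */
int
sketch_share_path(const char *exec_path, char *out, size_t outlen)
{
    char        tmp[1024];
    char       *slash;

    assert(exec_path != NULL && out != NULL);

    if (strlen(exec_path) >= sizeof(tmp))
        return -1;
    strcpy(tmp, exec_path);

    /* Drop the program name: ".../bin/postgres" becomes ".../bin". */
    slash = strrchr(tmp, '/');
    if (slash == NULL || slash == tmp)
        return -1;
    *slash = '\0';

    /* Drop the bin directory: ".../bin" becomes "...". */
    slash = strrchr(tmp, '/');
    if (slash == NULL)
        return -1;
    *slash = '\0';

    /* Append the sibling share directory, checking for truncation. */
    if ((size_t) snprintf(out, outlen, "%s/share", tmp) >= outlen)
        return -1;
    return 0;
}
```

Under this model, a binary at /usr/local/PostgreSQL/bin/postgres yields /usr/local/PostgreSQL/share, while a hardlinked copy at /usr/local/bin/postgres yields /usr/local/share, where no timezonesets directory would be expected to exist.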
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Tom Lane
Date:
I wrote:
> I was under the impression that there was some code in there to complain
> if the path-finding code failed, but maybe it's being executed too late.

I looked at this a bit more, and found that there is no such code. Mark's complaint is easy to reproduce if you move (or hardlink) the postgres executable into some other directory away from the share directory and then try to start it on a valid data directory. (If it doesn't find postgresql.conf it'll fail sooner.) initdb behaves a bit more sanely under similar circumstances:

    $ initdb
    initdb: file "/home/tgl/trial/share/postgresql/postgres.bki" does not exist
    This might mean you have a corrupted installation or identified
    the wrong directory with the invocation option -L.
    $

The postmaster however is much less dependent on the contents of the share dir than initdb is, so the first time it really notices something is wrong is when it tries to find the file that the timezone_abbreviations GUC is supposed to reference. And when we get there, in perhaps an overabundance of brevity we intentionally don't report the file path:

    get_share_path(my_exec_path, share_path);
    snprintf(file_path, sizeof(file_path), "%s/timezonesets/%s",
             share_path, filename);
    tzFile = AllocateFile(file_path, "r");
    if (!tzFile)
    {
        /* at level 0, if file doesn't exist, guc.c's complaint is enough */
        if (errno != ENOENT || depth > 0)
            ereport(tz_elevel,
                    (errcode_for_file_access(),
                     errmsg("could not read time zone file \"%s\": %m",
                            filename)));
        return -1;
    }

So there are a number of things we could consider doing about this, including just tweaking the above bit of code. But that only helps so long as this is the first such reference to fail during startup --- which is surely pretty coincidental.

What I'm inclined to do is modify PostmasterMain so that immediately after find_my_exec, it checks that get_share_path returns the name of a readable directory. (I see that it's already invoking get_pkglib_path at that point, but not checking that the result points to anything --- maybe we should check that too?) The error message would then be something similar to what initdb is saying above, ie, misconfigured installation.

Maybe initdb should have an explicit test of this nature too, because the message quoted above could still be misinterpreted.

Or maybe this is more work than it's worth. I don't recall many similar complaints previously.

Comments?

regards, tom lane
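The kind of early check proposed here, verifying that the computed share path names a readable directory, might look roughly like the following. This is only a sketch under stated assumptions: sketch_dir_is_readable is a hypothetical name, it uses plain POSIX opendir rather than any PostgreSQL wrapper, and it says nothing about where or how the postmaster would actually report the failure.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Hypothetical check (not PostgreSQL code): returns 1 if "path" can be
 * opened as a directory by the current user, 0 otherwise.  On failure,
 * errno is left as set by opendir (for example ENOENT, ENOTDIR or
 * EACCES), so a caller could include the reason in its error message.
 */
int
sketch_dir_is_readable(const char *path)
{
    DIR        *dir;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return 0;

    closedir(dir);
    return 1;
}
```

A caller could run this once at startup on the derived share path and, if it returns 0, print the full path it tried, which is the piece of information missing from the "timezone_abbreviations" message.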
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Mark
Date:
-----Original Message-----
From: pgsql-bugs-owner@postgresql.org [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Friday, 1 May 2009 23:57
To: Mark; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error

> What I'm inclined to do is modify PostmasterMain so that immediately
> after find_my_exec, it checks that get_share_path returns the name of
> a readable directory.

I understand the rationale for relocatable packages. So, I guess hardlinks are out. But, barring hardlinks, perhaps, in the existence of a symlink, a simple 'readlink' function could be done to auto-correct PostgreSQL's base-location? A la:

    char buf[1024];
    ssize_t len;
    ....
    if ((len = readlink("/usr/local/bin/pg_ctl", buf, sizeof(buf)-1)) != -1)
        buf[len] = '\0';

Symlinks are used quite often, *especially* when dealing with relocatable packages (read: that will likely not reside in /usr/local/, etc.). And it would only require two or three extra lines of code, no?

At any rate, I appreciate you looking into this.

- Mark
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Tom Lane
Date:
Mark <admin@asarian-host.net> writes:
> I understand the rationale for relocatable packages. So, I guess hardlinks
> are out. But, barring hardlinks, perhaps, in the existence of a symlink, a
> simple 'readlink' function could be done to auto-correct PostgreSQL's
> base-location? Ala:

That's exactly what it already does, and why it would've worked if you'd used symlinks not hardlinks.

regards, tom lane
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Mark
Date:
On Sat, 02 May 2009 14:47:48 GMT, Tom Lane wrote
> Mark <admin@asarian-host.net> writes:
> > I understand the rationale for relocatable packages. So,
> > I guess hardlinks are out. But, barring hardlinks,
> > perhaps, in the existence of a symlink, a simple 'readlink'
> > function could be done to auto-correct PostgreSQL's
> > base-location? Ala:
>
> That's exactly what it already does, and why it would've worked
> if you'd used symlinks not hardlinks.

Interesting. Yet, as I reported earlier, whilst a symlink does seem to start the server, pg_ctl takes a long time to do so, and then reports "could not start server" anyway. But it actually *does* get started. So I figured maybe something was not entirely right with the symlink, either.

- Mark
Re: BUG #4787: Hardlink (ln) causes startup failure with bizarre "timezone_abbreviations" error
From
Tom Lane
Date:
Mark <admin@asarian-host.net> writes:
> Interesting. Yet, as I reported earlier, whilst a symlink does seem to start
> the server, pg_ctl takes a long time to do so, and then report: "could not
> start server" anyway. But it actually *does* get started. So I figured maybe
> something was not entirely right with the symlink, either.

That sounds like pg_ctl isn't finding the postmaster's socket file ... were you playing games with the location of that, too? pg_ctl is not terribly bright about relocated socket files (in particular, it does not read the postmaster's postgresql.conf, so a nonstandard setting there for unix_socket_directory will confuse it).

regards, tom lane