Thread: ERROR: invalid memory alloc request size, and others

ERROR: invalid memory alloc request size, and others

From

Jonathan Hedstrom

Date:

09 January 2007, 15:38:37

We recently upgraded from 8.1.4 to 8.2.0 on Fedora Core 6, and are now
seeing a few rather ominous-looking messages.

The problem started with this one, during an update involving a rather
complex view:

ERROR:  invalid memory alloc request size 1174405120

Then, when attempting to re-run the same update statement:

PANIC:  cannot abort transaction 8682091, it was already committed

and then again:

PANIC:  right sibling's left-link doesn't match


I ran reindex on the tables in question, which fixed the problem in the
short term and allowed the update to complete, but then I got the memory
allocation error and these overnight:

ERROR:  invalid page header in block 3362 of relation
"index_clin_dal_staff_id"
ERROR:  invalid page header in block 2325 of relation "index_clin_dal_batch"

I will gladly provide additional information to help track down and
hopefully solve this problem, but at this point I'm not sure what else
to include.

Any help would be much appreciated.

Thanks,
Jonathan

Re: ERROR: invalid memory alloc request size, and others

From

Scott Marlowe

Date:

09 January 2007, 16:01:57

On Tue, 2007-01-09 at 13:38, Jonathan Hedstrom wrote:
> We recently upgraded from 8.1.4 to 8.2.0 on Fedora Core 6, and are now
> seeing a few rather ominous-looking messages.
>
> The problem started with this one, during an update involving a rather
> complex view:
>
> ERROR:  invalid memory alloc request size 1174405120
>
> Then, when attempting to re-run the same update statement:
>
> PANIC:  cannot abort transaction 8682091, it was already committed
>
> and then again:
>
> PANIC:  right sibling's left-link doesn't match
>
>
> I ran reindex on the tables in question, which fixed the problem in the
> short term and allowed the update to complete, but then I got the memory
> allocation error and these overnight:
>
> ERROR:  invalid page header in block 3362 of relation
> "index_clin_dal_staff_id"
> ERROR:  invalid page header in block 2325 of relation "index_clin_dal_batch"
>
> I will gladly provide additional information to help track down and
> hopefully solve this problem, but at this point I'm not sure what else
> to include.

First step, update to 8.2.1, it came out yesterday, and there were a few
bugs that got stomped.  Don't know if they are related to your problem,
but having the latest version is usually a "good thing".

Also, schedule some maintenance window for your server to run memtest86
and possibly something to check for bad blocks on your drives.  Often
errors like the invalid memory alloc request size you're seeing, and the
left-link mismatch, are caused by bad hardware.
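
As a rough sanity check on the first error quoted above, the requested size can be compared against PostgreSQL's allocation ceiling. This is only a sketch: MAX_ALLOC_SIZE mirrors the MaxAllocSize constant (0x3fffffff, one byte under 1 GB) that ordinary allocations are checked against, and the helper name describe_request is made up for illustration.

```python
# Ceiling for ordinary allocations in PostgreSQL (MaxAllocSize): 1 GB - 1.
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Size reported in "ERROR:  invalid memory alloc request size 1174405120".
REQUESTED = 1174405120


def describe_request(size):
    """Summarise a requested allocation size (hypothetical helper)."""
    return {
        "hex": hex(size),                      # byte pattern of the request
        "mib": size / (1024 * 1024),           # size in MiB
        "over_limit": size > MAX_ALLOC_SIZE,   # would the request be rejected?
    }


info = describe_request(REQUESTED)
print(info)
```

The value comes out as 0x46000000, which is above the ceiling. A length with only its top byte set would be consistent with a damaged length word rather than a genuine request, though nothing in this thread confirms that.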

Re: ERROR: invalid memory alloc request size, and others

From

Jonathan Hedstrom

Date:

09 January 2007, 17:13:53

Scott Marlowe wrote:
> On Tue, 2007-01-09 at 13:38, Jonathan Hedstrom wrote:
>
>> We recently upgraded from 8.1.4 to 8.2.0 on Fedora Core 6, and are now
>> seeing a few rather ominous-looking messages.
>>
>> The problem started with this one, during an update involving a rather
>> complex view:
>>
>> ERROR:  invalid memory alloc request size 1174405120
>>
>> Then, when attempting to re-run the same update statement:
>>
>> PANIC:  cannot abort transaction 8682091, it was already committed
>>
>> and then again:
>>
>> PANIC:  right sibling's left-link doesn't match
>>
>>
>> I ran reindex on the tables in question, which fixed the problem in the
>> short term and allowed the update to complete, but then I got the memory
>> allocation error and these overnight:
>>
>> ERROR:  invalid page header in block 3362 of relation
>> "index_clin_dal_staff_id"
>> ERROR:  invalid page header in block 2325 of relation "index_clin_dal_batch"
>>
>> I will gladly provide additional information to help track down and
>> hopefully solve this problem, but at this point I'm not sure what else
>> to include.
>>
>
> First step, update to 8.2.1, it came out yesterday, and there were a few
> bugs that got stomped.  Don't know if they are related to your problem,
> but having the latest version is usually a "good thing".
>
>
I noticed that 8.2.1 had been released shortly after I sent off my
initial email. I've upgraded and hopefully that will take care of it.

> Also, schedule some maintenance window for your server to run memtest86
> and possibly something to check for bad blocks on your drives.  Often
> errors like the invalid memory alloc request size you're seeing, and the
> link doesn't match one are caused by bad hardware.

I'll try to run these tests soon, but this is a production server, so
scheduling downtime takes a bit of planning. I should also mention that
this is the exact same server we've been running 8.1.4 on for about 6
months without a problem (doing the same nightly update causing the
problem etc), so I'm a bit skeptical about having suddenly developed
hardware issues.

Thanks,
Jonathan

Re: ERROR: invalid memory alloc request size, and others

From

Tom Lane

Date:

09 January 2007, 17:24:28

Scott Marlowe <smarlowe@g2switchworks.com> writes:
> On Tue, 2007-01-09 at 13:38, Jonathan Hedstrom wrote:
>> We recently upgraded from 8.1.4 to 8.2.0 on Fedora Core 6, and are now
>> seeing a few rather ominous-looking messages.

> First step, update to 8.2.1, it came out yesterday, and there were a few
> bugs that got stomped.

Definitely good advice, although none of these messages look related to
the known fixes.

> Also, schedule some maintenance window for your server to run memtest86
> and possibly something to check for bad blocks on your drives.

+1 ... I have not seen any instance of "invalid page header" that could
be traced to a Postgres bug.  The cases I've been able to study all
seemed to involve either flaky hardware or kernel-level bugs (such as
dumping a fragment of some unrelated file into a Postgres table :-()

            regards, tom lane

Re: ERROR: invalid memory alloc request size, and others

From

Jonathan Hedstrom

Date:

09 January 2007, 18:12:28

Tom Lane wrote:
> Scott Marlowe <smarlowe@g2switchworks.com> writes:
>
>> Also, schedule some maintenance window for your server to run memtest86
>> and possibly something to check for bad blocks on your drives.
>>
>
> +1 ... I have not seen any instance of "invalid page header" that could
> be traced to a Postgres bug.  The cases I've been able to study all
> seemed to involve either flaky hardware or kernel-level bugs (such as
> dumping a fragment of some unrelated file into a Postgres table :-()
>
>             regards, tom lane
>
Since it sounds like this is either a hardware or a kernel issue, we're
wondering if our downtime would be better spent rebooting to the
standard FC6 kernel, or trying some of the aforementioned hardware tests...

We are running a xen kernel:   2.6.18-1.2798.fc6xen

and getting these kernel errors in our logs:

Jan  7 18:51:23 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
Jan  7 18:51:23 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
Jan  9 08:52:12 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
Jan  9 13:07:35 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172

(The memory alloc error first occurred early in the morning on the 8th).

Thanks,
Jonathan
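
To line the kernel messages up against the database errors, the log excerpt in the message above can be tallied by date. This is a minimal sketch, not a general syslog parser: the year 2007 is an assumption taken from the message dates (syslog lines omit it), the names join_wrapped and parse_timestamp are made up for illustration, and the excerpt is pasted in as mail-wrapped text.

```python
import re
from collections import Counter
from datetime import datetime

# Excerpt copied from the message above. The mail client wrapped each
# entry onto two lines, so continuation lines are joined back first.
RAW_LOG = """\
Jan  7 18:51:23 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
Jan  7 18:51:23 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
Jan  9 08:52:12 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
Jan  9 13:07:35 ws116 kernel: SKB BUG: Invalid truesize (4012)
len=16384, sizeof(sk_buff)=172
"""

# A new entry starts with a syslog timestamp such as "Jan  7 18:51:23 ".
ENTRY_START = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s")


def join_wrapped(text):
    """Merge mail-wrapped continuation lines into their log entry."""
    entries = []
    for line in text.splitlines():
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def parse_timestamp(entry, year=2007):
    """Parse the leading syslog timestamp; the year is assumed."""
    month, day, clock = entry.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}",
                             "%Y %b %d %H:%M:%S")


entries = [e for e in join_wrapped(RAW_LOG) if "SKB BUG" in e]
per_day = Counter(parse_timestamp(e).date().isoformat() for e in entries)
first_seen = min(parse_timestamp(e) for e in entries)

print(dict(per_day))
print("first SKB BUG:", first_seen.strftime("%A %Y-%m-%d %H:%M:%S"))
```

In this excerpt the earliest SKB BUG entry falls on the evening of the 7th, shortly before the first memory alloc error early on the 8th. That ordering is suggestive only; it does not show that one caused the other.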

Re: ERROR: invalid memory alloc request size, and others

From

Tom Lane

Date:

09 January 2007, 18:31:18

Jonathan Hedstrom <jhedstrom@desc.org> writes:
> We are running a xen kernel:   2.6.18-1.2798.fc6xen

When did you start doing that ... any relation to the time when the
problems started?

I've heard some unkind remarks about the stability of Xen, though
I have no direct knowledge about it.

            regards, tom lane

Re: ERROR: invalid memory alloc request size, and others

From

Andrew Kroeger

Date:

09 January 2007, 18:40:03

Jonathan Hedstrom wrote:
> Tom Lane wrote:
>> Scott Marlowe <smarlowe@g2switchworks.com> writes:
>>
>>> Also, schedule some maintenance window for your server to run memtest86
>>> and possibly something to check for bad blocks on your drives.
>>>
>> +1 ... I have not seen any instance of "invalid page header" that could
>> be traced to a Postgres bug.  The cases I've been able to study all
>> seemed to involve either flaky hardware or kernel-level bugs (such as
>> dumping a fragment of some unrelated file into a Postgres table :-()
>>
>>             regards, tom lane
>>
> Since it sounds like this is either a hardware or a kernel issue, we're
> wondering if our downtime would be better spent rebooting to the
> standard FC6 kernel, or trying some of the aforementioned hardware tests...
>
> We are running a xen kernel:   2.6.18-1.2798.fc6xen

This is the base Xen kernel from the FC 6 release.  There have been 3
updates released since then (most recently 01-Jan).  I see a number of
Xen fixes in the changelog, and I know that the major factor in the
slippage of the FC 6 release was getting Xen into the distro -- so I
would definitely expect some Xen bugs in the initial cut from the release.

Simplest advice I can think of:  if you don't need Xen, go back to the
stock (albeit most recent update) kernel.  If you do need Xen, try the
most recent update of the stock kernel anyway.  If the problems persist,
you've at least eliminated one variable.  If they go away, you've got
the culprit.

Note that I don't use Xen, so I'm not completely up-to-date...  Last I
knew there were issues preventing Xen from being included in the
upstream Linux kernel (vendors are patching it in individually), and
that speaks volumes as to the "newness" of the technology.

Andrew

Re: ERROR: invalid memory alloc request size, and others

From

Andrew Kroeger

Date:

09 January 2007, 18:42:44

Jonathan Hedstrom wrote:
> Scott Marlowe wrote:
>> On Tue, 2007-01-09 at 13:38, Jonathan Hedstrom wrote:
>>
>>> We recently upgraded from 8.1.4 to 8.2.0 on Fedora Core 6, and are now
>>> seeing a few rather ominous-looking messages.

[ SNIP ]

>> Also, schedule some maintenance window for your server to run memtest86
>> and possibly something to check for bad blocks on your drives.  Often
>> errors like the invalid memory alloc request size you're seeing, and the
>> link doesn't match one are caused by bad hardware.
>
> I'll try to run these tests soon, but this is a production server, so
> scheduling downtime takes a bit of planning. I should also mention that
> this is the exact same server we've been running 8.1.4 on for about 6
> months without a problem (doing the same nightly update causing the
> problem etc), so I'm a bit skeptical about having suddenly developed
> hardware issues.

Not having experienced any of the above issues myself, the following are
totally off-the-cuff:

You indicated you are now running Fedora Core 6, and that your previous
8.1.4 configuration was running for ~6 months prior to upgrading.  I
assume that because it is a production server, the FC 6 upgrade was
performed at the same time as the 8.2 upgrade.  You indicated that
you're running on the exact same hardware as before, so unless you
opened the machine, relocated it, etc. as part of the upgrade, hardware
issues would seem unlikely.

FC 6 brings a number of changes to the table -- the biggest one being
the 2.6.18 Linux kernel.  The 2.6.18 kernel updates were released for FC
5 in the middle of October, but as you are talking about a production
server, I would imagine you did not upgrade at that time.

Have you applied all current FC 6 updates?  I definitely recommend
making sure you are current with all updates.  A quick check shows that
FC 6 has released almost as many package updates since its release as
FC 5 has since it was released (6 months more time).  I know there were
a number of issues that were queued up and released right after the FC 6
release, to avoid slipping the release date any further than it had
already slipped.

If you can't attribute the issues to any software problems, I would
definitely start looking at the hardware, as Scott & Tom suggested.  As
unlikely as hardware issues might seem, that seems to be the most
frequent time those little buggers manage to show their teeth :^)

Hope this helps...

Andrew

Re: ERROR: invalid memory alloc request size, and others

From

Jonathan Hedstrom

Date:

09 January 2007, 19:36:10

Andrew Kroeger wrote:
> This is the base Xen kernel from the FC 6 release.  There have been 3
> updates released since then (most recently 01-Jan).  I see a number of
> Xen fixes in the changelog, and I know that the major factor in the
> slippage of the FC 6 release was getting Xen into the distro -- so I
> would definitely expect some Xen bugs in the initial cut from the
> release.
>
We had downloaded the kernel updates, but after doing so, forgot to
reboot in order to use the updated kernel...

> Simplest advice I can think of:  if you don't need Xen, go back to the
> stock (albeit most recent update) kernel.  If you do need Xen, try the
> most recent update of the stock kernel anyway.  If the problems
> persist, you've at least eliminated one variable.  If they go away,
> you've got the culprit.
>
We downloaded the most recent stock FC6 kernel and rebooted to that.
Hopefully this will take care of the issue.

I reindexed the tables related to the failing update. Other than that,
is there any cleanup work I should do relating to these errors?

Thanks for all the quick responses.

-Jonathan

Re: ERROR: invalid memory alloc request size, and others

From

Jonathan Hedstrom

Date:

09 January 2007, 20:05:36

Tom Lane wrote:
> Jonathan Hedstrom <jhedstrom@desc.org> writes:
>
>> We are running a xen kernel:   2.6.18-1.2798.fc6xen
>>
>
> When did you start doing that ... any relation to the time when the
> problems started?
>

We started using the xen kernel on the 3rd, and there wasn't any
indication of a problem until the 8th, so that means the update ran w/o
problem on the 4th and 5th (it doesn't run over the weekend). We have
decided to run the stock fc6 kernel for now.

-Jonathan

Re: ERROR: invalid memory alloc request size, and others

From

Jonathan Hedstrom

Date:

11 January 2007, 15:32:47

Jonathan Hedstrom wrote:
> We downloaded the most recent stock FC6 kernel and rebooted to that.
> Hopefully this will take care of the issue.

We've been up and running for 2 days now on the stock kernel, and
haven't seen any of these errors. I'm thinking the issue is resolved.
Thanks again for all the replies.

-Jonathan