Thread: PostgreSQL logical decoder output plugin - unchanged toast data

PostgreSQL logical decoder output plugin - unchanged toast data

From

Georgy Buranov

Date:

31 August 2018, 16:36:26

Hi. I have a question about PostgreSQL logical decoder output plugin.

I am not specialist in Postgres at all, so maybe I miss some very
basic point. In the plugin, I want to always get all the values (even
those that are unchanged  toast data)


When I try to get the datum that is internal on disk (here is code)

```
        struct varlena *s = (struct varlena *)DatumGetPointer(origval);
        struct varlena * ret = heap_tuple_untoast_attr(s);
        Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));
```
it fails with no known snapshots error (from heap_tuple_untoast_attr).
My question is why is it that.

So, even theoretically it is not possible to the the varlena on disk
from logical replication plugin?



Mit freundlichen Grüßen,
Georgy Buranov

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Andres Freund

Date:

31 August 2018, 17:36:37

Hi,

Hi,

On 2018-08-31 15:36:26 +0200, Georgy Buranov wrote:
> I am not specialist in Postgres at all, so maybe I miss some very
> basic point. In the plugin, I want to always get all the values (even
> those that are unchanged  toast data)

> When I try to get the datum that is internal on disk (here is code)
> 
> ```
>         struct varlena *s = (struct varlena *)DatumGetPointer(origval);
>         struct varlena * ret = heap_tuple_untoast_attr(s);
>         Datum result = PointerGetDatum(PG_DETOAST_DATUM(ret));
> ```
> it fails with no known snapshots error (from heap_tuple_untoast_attr).
> My question is why is it that.

Yes, that's not possible in general. On-disk toasted data for tuples
from the WAL are not guaranteed in any way to be retained. If that
weren't the case database tables would bloat while logical replication
is behind, and the sequential reads (i.e. fast) reads of logical
decoding would turn into random IO.

You can however alter the replication identity of tables to FULL. Then
the "old" tuple in change callbacks will have the full old tuple. But
that will increase the size of the WAL stream obviously.

> So, even theoretically it is not possible to the the varlena on disk
> from logical replication plugin?

Correct.

Greetings,

Andres Freund

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Georgy Buranov

Date:

31 August 2018, 17:55:37

Ok, thank you very much for your explanation,

maybe I need something else in my case.

As far as I understand, "On-disk toasted data for tuples from the WAL
are not guaranteed in any way to be retain", but still, the LATEST
value for the same cell should exist in postgres (in on-disk toast if
it is huge, or not).

If I cannot get access to the on-disk toasted data for tuple from the
WAL, can I have the access to the _latest_ value in this case
(hopefully I describe it correct)

> Yes, that's not possible in general. On-disk toasted data for tuples
> from the WAL are not guaranteed in any way to be retained. If that
> weren't the case database tables would bloat while logical replication
> is behind, and the sequential reads (i.e. fast) reads of logical
> decoding would turn into random IO.

Mit freundlichen Grüßen,
Georgy Buranov

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Andres Freund

Date:

31 August 2018, 18:07:03

Hi,

On 2018-08-31 16:55:37 +0200, Georgy Buranov wrote:
> Ok, thank you very much for your explanation,
> 
> maybe I need something else in my case.
> 
> As far as I understand, "On-disk toasted data for tuples from the WAL
> are not guaranteed in any way to be retain", but still, the LATEST
> value for the same cell should exist in postgres (in on-disk toast if
> it is huge, or not).

No, there's absolutely no such guarantee.  The tuple could since have
been deleted, the column could have been updated to non-toasted, the
table could have been dropped, ...


> If I cannot get access to the on-disk toasted data for tuple from the
> WAL, can I have the access to the _latest_ value in this case
> (hopefully I describe it correct)

Again, you can set REPLICA IDENTITY to FULL and it'll be there.

Greetings,

Andres Freund

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Georgy Buranov

Date:

31 August 2018, 18:34:02

> Again, you can set REPLICA IDENTITY to FULL and it'll be there.
>

So, why I think this is complicated

* We use primary keys for all tables, so we do not need REPLICA
IDENTITY full actually. As far as I understand, it will make
master/slave replication ineffective as well
* I need the information about this primary key in rd_replidindex  of
relation (to send it later to kafka). As far as I understand, with
REPLICA IDENTITY FULL the rd_replidindex will not be the primary key

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Andres Freund

Date:

31 August 2018, 18:39:12

On 2018-08-31 17:34:02 +0200, Georgy Buranov wrote:
> > Again, you can set REPLICA IDENTITY to FULL and it'll be there.
> >
> 
> So, why I think this is complicated
> 
> * We use primary keys for all tables, so we do not need REPLICA
> IDENTITY full actually. As far as I understand, it will make
> master/slave replication ineffective as well

What do you mean with "ineffective"?  That there's more data in the WAL?
Sure. Otherwise I don't know what you could mean.

There's no free lunch :/

> * I need the information about this primary key in rd_replidindex  of
> relation (to send it later to kafka). As far as I understand, with
> REPLICA IDENTITY FULL the rd_replidindex will not be the primary key

If you want the pkey, that's in rd_pkindex. rd_replidindex will only
differ if the identity is manually set to another candidate key
(possibly because there's no pkey).

Greetings,

Andres Freund

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Georgy Buranov

Date:

31 August 2018, 19:30:28

Ok, I got false understanding that REPLICA IDENTITY is used for
something more than a WAL. This is basically not true. So, what I can
do

* Set the REPLICA IDENTITY to full, and in this case I can still get
the pk from rd_pkindex. In this case the WAL will be bigger, but we
will have all the values in it.
* Other solution is basically (since we know exact primary key) - get
the row, and take the latest value from there using SQL query on the
client side (other part of kafka)

Thank you very much fro your help

> What do you mean with "ineffective"?  That there's more data in the WAL?
> Sure. Otherwise I don't know what you could mean.
>
> There's no free lunch :/

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Georgy Buranov

Date:

04 September 2018, 12:45:10

Hi Andres. Thank you very much for your help.

I tried the following solution and I have some problems.

* I have 9.6 postgres and I do not have separate field for rd_pkindex
* All I have is rd_replidindex field. Usually (when REPLICA IDENTITY
is NOT FULL), it still contains the primary key
* But in the case of REPLICA IDENTITY FULL - rd_replidindex is NULL
and rd_pkindex does not exist

So, is there a way to get the primary key on 9.6 postgres?

> If you want the pkey, that's in rd_pkindex. rd_replidindex will only
> differ if the identity is manually set to another candidate key
> (possibly because there's no pkey).
>

Re: PostgreSQL logical decoder output plugin - unchanged toast data

From

Andres Freund

Date:

04 September 2018, 19:42:40

Hi,

On 2018-09-04 11:45:10 +0200, Georgy Buranov wrote:
> Hi Andres. Thank you very much for your help.
> 
> I tried the following solution and I have some problems.
> 
> * I have 9.6 postgres and I do not have separate field for rd_pkindex
> * All I have is rd_replidindex field. Usually (when REPLICA IDENTITY
> is NOT FULL), it still contains the primary key
> * But in the case of REPLICA IDENTITY FULL - rd_replidindex is NULL
> and rd_pkindex does not exist
> 
> So, is there a way to get the primary key on 9.6 postgres?

You have to to query the syscache yourself, in that case.  I know
nothing about what you're doing in your output plugin.  In the ones I
wrote there's a side-cache for per-relation output plugin specific data,
do you have something like that?  You can probably just crib the code
for the lookup from dblink.c:get_pkey_attnames().  If you have a cache,
itmight be worth caching the results.

Greetings,

Andres Freund