Re: Questions about update, delete, ctid... - Mailing list pgsql-general
From | DANTE Alexandra |
---|---|
Subject | Re: Questions about update, delete, ctid... |
Date | |
Msg-id | 44CDF926.4030500@bull.net Whole thread Raw |
In response to | Re: Questions about update, delete, ctid... (Martijn van Oosterhout <kleptog@svana.org>) |
Responses |
Re: Questions about update, delete, ctid...
|
List | pgsql-general |
Hello Martijn, hello List, Last question on this subject, what's happened during a SELECT query on a tuple just updated and commited ? I followed in the source code the links between these methods : CreateQueryDesc, ExecutorStart, ExecutorRun, ExecutePlan, ExecSelect but I still have a question : - during a SELECT query on a tuple just updated and commited, does the executor first detect the old tuple and then via the c_tid link go to the new version of the tuple ? or go directly to the new version ? - is it the same for the index ? Could you tell me where theses checks are done in the source code ? Thank you very much ! Regards, Alexandra DANTE Martijn van Oosterhout wrote: >On Mon, Jul 31, 2006 at 11:04:58AM +0200, DANTE Alexandra wrote: > > >>I've just seen that I've done a mistake in my example. My question was : >>is it correct to think that the ctid of the old version of the tuple is >>a link to newer version ? In my example, is it correct to think that the >>tuple : >>140049 | 0 | (0,12) | 11 | IRAQ >>has become : >>new value | 0 | (0,26) | 11 | *IRAQ* >>Could you give me more details about the link between the old and the >>new version, please ? >>For me, the link is the c_tid, but maybe I'm wrong... >> >> > >Well, in your case where there are no other transactions running, yes. >In the general case there may have been other updates so all you know >is that the new tuple is a descendant of the old one. The chain of >t_ctid links can be arbitrarily long. > >Note: with multiple psql sessions you can see some of this happening. > >======= Session 1 ======= >test=# begin; >BEGIN >test=# set TRANSACTION ISOLATION LEVEL SERIALIZABLE ; >SET >test=# select xmin, xmax, cmin, cmax, * from a; > xmin | xmax | cmin | cmax | country >--------+------+------+------+--------- > 277264 | 0 | 0 | 0 | IRAQ >(1 row) > >======= Session 2 ======= >test=# select xmin, xmax, cmin, cmax, * from a; > xmin | xmax | cmin | cmax | country >--------+------+------+------+--------- > 277264 | 0 | 0 | 0 | IRAQ >(1 row) > >test=# update a set country = 'ITALY'; >UPDATE 1 >test=# select xmin, xmax, cmin, cmax, * from a; > xmin | xmax | cmin | cmax | country >--------+------+------+------+--------- > 277269 | 0 | 0 | 0 | ITALY >(1 row) > >======= Session 1 again ======= >test=# select xmin, xmax, cmin, cmax, * from a; > xmin | xmax | cmin | cmax | country >--------+--------+--------+------+--------- > 277264 | 277269 | 277269 | 0 | IRAQ >(1 row) > >As you can see now, both sessions are seeing different views of the >same table. The old tuple has now updated xmax and cmin values. If >session two updated the tuple again it would get a cmax value also. You >can't see the t_ctid link here, but there is one from the old row to >the new one. > > > >>My question about the "infomask" strucutre was linked to the code of >>VACUUM. I've seen in the "lazy_scan_heap method that the >>"HeapTupleSatisfiesVacuum" method is called. In this method, according >>to the value of "infomask", a tuple is defined as "dead" or not. >>That's why I wonder if the "infomask" structure is changed after an >>commited update or delete, and what are the values set ? >> >> > >It was Tom who pointed this out to me first: the infomask is not the >important part. The infomask is just a cache of the results of tests. > >The problem is that checking if a transaction has been committed or not >can be reasonably expensive since it might have to check on disk. If >you had to do that every time you looked up a tuple the performence >would be terrible. So what happens is that the first time someone looks >up the status of a transaction and finds it's committed or aborted, it >sets that bit so no-one else has to do the test. > >The basic result is that you can read the code as if the infomask was >blank and the result should be the same. The only difference is that >various bits allow the code to skip certain tests because somebody has >already done them before. The end result should be the same. > > > >>So, consequently, it is not necessary to rebuild the B-tree index after >>an update or a delete. >>Is it correct ? >> >> > >You never have to rebuild the index. The whole system is designed so >many people can be reading and writing the index simultaneously without >getting in eachothers way. > >Hope this helps, > >
pgsql-general by date: