From bade818d2a77dd4f5cf93cfaba05f6a11899732c Mon Sep 17 00:00:00 2001
From: Kommi
Date: Mon, 18 Feb 2019 12:41:34 +1100
Subject: [PATCH 10/10] Table access method API explanation

All the table access method APIs and their details are explained.
---
 doc/src/sgml/am.sgml | 548 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 544 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/am.sgml b/doc/src/sgml/am.sgml
index 579187ed1b..d440ebeb58 100644
--- a/doc/src/sgml/am.sgml
+++ b/doc/src/sgml/am.sgml
@@ -18,12 +18,552 @@
 All Tables in PostgreSQL are the primary data store. Each table is stored as its own physical relation
- and so is described by an entry in the pg_class
- catalog. The contents of an table are entirely under the control of its
- access method. (All the access methods furthermore use the standard page
- layout described in .)
+ and is described by an entry in the pg_class
+ catalog. A table's contents are entirely controlled by its access method,
+ although all access methods use the same standard page layout described in .
+
+ Table access method API
+
+ Each table access method is described by a row in the
+ pg_am system catalog. The pg_am entry
+ specifies the type of the access method and a handler function for
+ the access method. These entries can be created and deleted using the
+ and SQL commands.
+
+ A table access method handler function must be declared to accept a
+ single argument of type internal and to return the
+ pseudo-type table_am_handler. The argument is a dummy value that
+ simply serves to prevent handler functions from being called directly from
+ SQL commands. The result of the function must be a pointer to a struct of
+ type TableAmRoutine, which contains everything
+ that the core code needs to know to make use of the table access method.
+ The struct must remain valid for the lifetime of the server, which is
+ typically achieved by declaring it as a static const variable at file scope.
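The handler pattern described above can be sketched in plain C. This is a minimal, self-contained sketch under stated assumptions: MiniTableAmRoutine, MiniParallelScanState, mini_tableam_handler and the other mini_ names are hypothetical stand-ins, not PostgreSQL APIs. A real handler is an fmgr-callable function that takes a dummy internal argument and returns a Datum. The sketch collects the callbacks in a file-scope static const struct, so the pointer the handler returns never dangles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for TableAmRoutine. The real
 * struct (src/include/access/tableam.h) has many more members, and its
 * callbacks take PostgreSQL types such as Relation. */
typedef struct MiniTableAmRoutine
{
    bool    supports_bitmap_scans;              /* a fixed property */
    size_t  (*parallelscan_estimate) (void);    /* support functions */
    void    (*finish_bulk_insert) (int options);
} MiniTableAmRoutine;

/* Hypothetical shared state for a parallel scan of this toy AM. */
typedef struct MiniParallelScanState
{
    unsigned int nblocks;       /* total blocks to scan */
    unsigned int next_block;    /* next block to hand out to a worker */
} MiniParallelScanState;

static int bulk_insert_calls = 0;   /* lets a caller observe the callback */

static size_t
mini_parallelscan_estimate(void)
{
    /* Report how much shared memory this AM's parallel scan state needs. */
    return sizeof(MiniParallelScanState);
}

static void
mini_finish_bulk_insert(int options)
{
    (void) options;             /* nothing to flush in this toy AM */
    bulk_insert_calls++;
}

/* All callbacks collected once, with static storage duration. */
static const MiniTableAmRoutine mini_methods = {
    .supports_bitmap_scans = false,
    .parallelscan_estimate = mini_parallelscan_estimate,
    .finish_bulk_insert = mini_finish_bulk_insert,
};

/* Hypothetical handler: always hands back the same long-lived struct. */
const MiniTableAmRoutine *
mini_tableam_handler(void)
{
    return &mini_methods;
}
```

The core code then only ever calls through the returned pointers, for example mini_tableam_handler()->parallelscan_estimate(), which is what keeps the support functions invisible at the SQL level.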
+
+ The TableAmRoutine struct, also called the access
+ method's API struct, includes fields specifying assorted
+ fixed properties of the access method, such as whether it can support
+ bitmap scans. More importantly, it contains pointers to support
+ functions for the access method, which do all of the real work to access
+ tables. These support functions are plain C functions and are not
+ visible or callable at the SQL level. The support functions are described
+ below, grouped by purpose; for full details, refer to the file
+ src/include/access/tableam.h.
+
+ Developers of a new table access method can use the existing heap
+ implementation in src/backend/access/heap/heapam_handler.c as a
+ reference for how each of these functions is implemented.
+
+ The API functions fall into the following groups.
+
+ Slot implementation functions
+
+const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
+
+ This function must return the slot implementation that the AM uses for
+ its tuples. The predefined slot implementations are
+ TTSOpsVirtual, TTSOpsHeapTuple,
+ TTSOpsMinimalTuple and TTSOpsBufferHeapTuple;
+ an AM can use any one of them. For details of these slot implementations,
+ refer to src/include/executor/tuptable.h.
+
+ Table scan functions
+
+ The following functions are used to scan a table.
+
+TableScanDesc (*scan_begin) (Relation rel,
+                             Snapshot snapshot,
+                             int nkeys, struct ScanKeyData *key,
+                             ParallelTableScanDesc parallel_scan,
+                             bool allow_strat,
+                             bool allow_sync,
+                             bool allow_pagemode,
+                             bool is_bitmapscan,
+                             bool is_samplescan,
+                             bool temp_snap);
+
+ This function starts a scan of the relation rel using the given
+ snapshot, scan keys and options, and returns a TableScanDesc.
+ If the AM supports parallel scans, parallel_scan points to the
+ shared parallel scan state.
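A caller typically pairs scan_begin with repeated calls to scan_getnextslot and a final scan_end. The sketch below shows that lifecycle with toy types; all mini_ names, the in-memory relation and the visibility rule (a tuple is visible when its creating transaction number is below the snapshot's xmax) are hypothetical simplifications, and returning NULL from the next-tuple function at the end of the scan is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory "relation": each tuple records the (toy)
 * transaction number that created it. None of these are PostgreSQL types. */
typedef struct MiniTuple
{
    int value;
    int created_by;
} MiniTuple;

typedef struct MiniRelation
{
    const MiniTuple *tuples;
    int              ntuples;
} MiniRelation;

/* Toy snapshot: tuples created by transactions below xmax are visible. */
typedef struct MiniSnapshot
{
    int xmax;
} MiniSnapshot;

/* Toy scan descriptor, playing the role of TableScanDesc. */
typedef struct MiniScanDesc
{
    const MiniRelation *rel;
    MiniSnapshot        snapshot;
    int                 next;   /* index of the next tuple to examine */
} MiniScanDesc;

/* Toy slot: holds one tuple's value. */
typedef struct MiniSlot
{
    int value;
} MiniSlot;

MiniScanDesc *
mini_scan_begin(const MiniRelation *rel, MiniSnapshot snapshot)
{
    MiniScanDesc *scan = malloc(sizeof(MiniScanDesc));

    assert(scan != NULL);
    scan->rel = rel;
    scan->snapshot = snapshot;
    scan->next = 0;
    return scan;
}

/* Stores the next visible tuple in slot and returns slot, or returns NULL
 * once the relation is exhausted. */
MiniSlot *
mini_scan_getnextslot(MiniScanDesc *scan, MiniSlot *slot)
{
    while (scan->next < scan->rel->ntuples)
    {
        const MiniTuple *tup = &scan->rel->tuples[scan->next++];

        if (tup->created_by < scan->snapshot.xmax)  /* toy visibility test */
        {
            slot->value = tup->value;
            return slot;
        }
    }
    return NULL;
}

void
mini_scan_end(MiniScanDesc *scan)
{
    free(scan);                 /* release everything scan_begin acquired */
}

/* Caller side: begin, fetch until exhausted, end. */
int
mini_sum_visible(const MiniRelation *rel, MiniSnapshot snapshot)
{
    MiniScanDesc *scan = mini_scan_begin(rel, snapshot);
    MiniSlot      slot;
    int           sum = 0;

    while (mini_scan_getnextslot(scan, &slot) != NULL)
        sum += slot.value;
    mini_scan_end(scan);
    return sum;
}
```

The caller never inspects the descriptor's internals; everything AM-specific (here, the cursor position and the visibility test) stays behind the three callbacks.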
+
+void (*scan_end) (TableScanDesc scan);
+
+ This function ends a scan started by scan_begin.
+
+void (*scan_rescan) (TableScanDesc scan, struct ScanKeyData *key, bool set_params,
+                     bool allow_strat, bool allow_sync, bool allow_pagemode);
+
+ This function restarts a scan already started by scan_begin,
+ using the provided options, and releases any resources (such as buffer
+ pins) held by the scan.
+
+TupleTableSlot *(*scan_getnextslot) (TableScanDesc scan,
+                                     ScanDirection direction, TupleTableSlot *slot);
+
+ This function returns, in slot, the next tuple of a scan started by
+ scan_begin that satisfies the scan's snapshot, moving in the
+ given direction.
+
+ Parallel table scan functions
+
+ The following functions are used to perform a parallel scan.
+
+Size (*parallelscan_estimate) (Relation rel);
+
+ This function returns the total amount of shared memory the AM needs to
+ perform a parallel table scan. The minimum size required is that of
+ ParallelBlockTableScanDescData.
+
+Size (*parallelscan_initialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+
+ This function initializes the parallel_scan structure that the AM
+ needs to perform a parallel scan, and returns the total size it requires.
+
+void (*parallelscan_reinitialize) (Relation rel, ParallelTableScanDesc parallel_scan);
+
+ This function reinitializes the parallel scan structure pointed to by
+ parallel_scan.
+
+ Index scan functions
+
+struct IndexFetchTableData *(*begin_index_fetch) (Relation rel);
+
+ This function returns an allocated and initialized IndexFetchTableData
+ structure, which is used to fetch table tuples during an index scan.
+
+void (*reset_index_fetch) (struct IndexFetchTableData *data);
+
+ This function releases the AM-specific resources held by the
+ IndexFetchTableData of an index scan.
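The division of labour between begin_index_fetch, reset_index_fetch and end_index_fetch (described next) can be illustrated with a toy fetch state that keeps at most one buffer pinned between fetches, similar in spirit to an AM that holds on to its current buffer. The global pin counter, mini_fetch_tuple and the other mini_ names are hypothetical; real AMs pin shared buffers through the buffer manager.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global pin counter standing in for shared-buffer pins. */
static int pinned_buffers = 0;

/* Hypothetical analogue of IndexFetchTableData plus AM-private state. */
typedef struct MiniIndexFetch
{
    int current_buffer;         /* -1 when no buffer is pinned */
} MiniIndexFetch;

MiniIndexFetch *
mini_begin_index_fetch(void)
{
    MiniIndexFetch *fetch = malloc(sizeof(MiniIndexFetch));

    assert(fetch != NULL);
    fetch->current_buffer = -1;
    return fetch;
}

/* Fetching a tuple pins the block that holds it; the pin is kept so that
 * consecutive fetches from the same block can reuse it. */
void
mini_fetch_tuple(MiniIndexFetch *fetch, int block)
{
    if (fetch->current_buffer == block)
        return;                 /* already pinned: reuse */
    if (fetch->current_buffer != -1)
        pinned_buffers--;       /* drop the pin on the previous block */
    fetch->current_buffer = block;
    pinned_buffers++;
}

/* reset: release resources but keep the fetch state reusable. */
void
mini_reset_index_fetch(MiniIndexFetch *fetch)
{
    if (fetch->current_buffer != -1)
    {
        pinned_buffers--;
        fetch->current_buffer = -1;
    }
}

/* end: release resources and free the fetch state itself. */
void
mini_end_index_fetch(MiniIndexFetch *fetch)
{
    mini_reset_index_fetch(fetch);
    free(fetch);
}
```

Keeping reset separate from end lets a rescan drop its pins without paying for a fresh allocation.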
+
+void (*end_index_fetch) (struct IndexFetchTableData *data);
+
+ This function releases the AM-specific resources held by the
+ IndexFetchTableData of an index scan and frees the
+ IndexFetchTableData itself.
+
+TransactionId (*compute_xid_horizon_for_tuples) (Relation rel,
+                                                 ItemPointerData *items,
+                                                 int nitems);
+
+ This function returns the newest XID among the tuples pointed to by
+ items. It is used to determine which snapshots conflict with the
+ removed items when replaying WAL records for page-level index vacuums.
+
+ Physical tuple manipulation functions
+
+void (*tuple_insert) (Relation rel, TupleTableSlot *slot, CommandId cid,
+                      int options, struct BulkInsertStateData *bistate);
+
+ This function inserts the tuple contained in slot into the
+ relation and stores the tuple's new identifier (an ItemPointerData)
+ in the slot. It uses bistate, if one is provided.
+
+void (*tuple_insert_speculative) (Relation rel,
+                                  TupleTableSlot *slot,
+                                  CommandId cid,
+                                  int options,
+                                  struct BulkInsertStateData *bistate,
+                                  uint32 specToken);
+
+ This function is similar to tuple_insert, but inserts the tuple
+ with the additional information needed for speculative insertion; the
+ insertion is confirmed or aborted later, depending on whether the
+ corresponding index insertion succeeds.
+
+void (*tuple_complete_speculative) (Relation rel,
+                                    TupleTableSlot *slot,
+                                    uint32 specToken,
+                                    bool succeeded);
+
+ This function completes a speculative insertion started by
+ tuple_insert_speculative. It is called after the index insertion
+ has finished; succeeded indicates whether the speculatively
+ inserted tuple should be confirmed or aborted.
+
+HTSU_Result (*tuple_delete) (Relation rel,
+                             ItemPointer tid,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             bool changingPart);
+
+ This function deletes the tuple at tid from the relation and returns
+ the result of the operation.
+ On failure, details are returned in hufd.
+
+HTSU_Result (*tuple_update) (Relation rel,
+                             ItemPointer otid,
+                             TupleTableSlot *slot,
+                             CommandId cid,
+                             Snapshot snapshot,
+                             Snapshot crosscheck,
+                             bool wait,
+                             HeapUpdateFailureData *hufd,
+                             LockTupleMode *lockmode,
+                             bool *update_indexes);
+
+ This function replaces the tuple at otid with the new tuple in
+ slot and returns the result of the operation. It also sets
+ *update_indexes to indicate whether the indexes need to be updated.
+ On failure, details are returned in hufd.
+
+void (*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
+                      CommandId cid, int options, struct BulkInsertStateData *bistate);
+
+ This function inserts multiple tuples into the relation at once, which is
+ faster than inserting them one at a time. It uses bistate, if one
+ is provided.
+
+HTSU_Result (*tuple_lock) (Relation rel,
+                           ItemPointer tid,
+                           Snapshot snapshot,
+                           TupleTableSlot *slot,
+                           CommandId cid,
+                           LockTupleMode mode,
+                           LockWaitPolicy wait_policy,
+                           uint8 flags,
+                           HeapUpdateFailureData *hufd);
+
+ This function locks the tuple at tid (following its update chain
+ to the newest version when requested via flags) and returns the
+ result of the operation. On failure, details are returned in hufd.
+
+void (*finish_bulk_insert) (Relation rel, int options);
+
+ This function performs any operations needed to complete insertions made
+ via tuple_insert and multi_insert with a
+ BulkInsertState. For example, it may flush the relation to disk
+ when the inserts skipped WAL, or it may do nothing at all.
+
+ Non-modifying tuple functions
+
+bool (*tuple_fetch_row_version) (Relation rel,
+                                 ItemPointer tid,
+                                 Snapshot snapshot,
+                                 TupleTableSlot *slot,
+                                 Relation stats_relation);
+
+ This function fetches the tuple at tid into slot after
+ checking its visibility against snapshot, and returns true if a
+ visible tuple was found, false otherwise. It does not follow the update
+ chain; tuple_get_latest_tid is used to find the newest version
+ of a tuple.
+
+void (*tuple_get_latest_tid) (Relation rel,
+                              Snapshot snapshot,
+                              ItemPointer tid);
+
+ This function finds the latest version of the tuple identified by
+ tid and updates tid to point to it. For example, the heap
+ AM creates an update chain whenever a tuple is updated, and this function
+ follows that chain to the newest version.
+
+bool (*tuple_fetch_follow) (struct IndexFetchTableData *scan,
+                            ItemPointer tid,
+                            Snapshot snapshot,
+                            TupleTableSlot *slot,
+                            bool *call_again, bool *all_dead);
+
+ This function is called during an index scan. Using the
+ IndexFetchTableData state, it fetches the tuple at tid
+ into slot. It sets *call_again if more tuples may be returned
+ for the same tid (for example, along a heap HOT chain), and
+ *all_dead if all tuples at tid are dead to all transactions.
+
+bool (*tuple_satisfies_snapshot) (Relation rel,
+                                  TupleTableSlot *slot,
+                                  Snapshot snapshot);
+
+ This function checks the visibility of the tuple in slot against
+ snapshot, returning true if the tuple is visible and false otherwise.
+
+ DDL-related functions
+
+void (*relation_set_new_filenode) (Relation rel,
+                                   char persistence,
+                                   TransactionId *freezeXid,
+                                   MultiXactId *minmulti);
+
+ This function creates the storage needed to hold the relation's tuples and
+ sets *freezeXid and *minmulti to the values to be stored
+ as the relation's relfrozenxid and relminmxid. For example, the
+ heap AM creates the relfilenode that stores its heap tuples.
+
+void (*relation_nontransactional_truncate) (Relation rel);
+
+ This function truncates the relation. The operation is not
+ transactional, so it cannot be rolled back.
+
+void (*relation_copy_data) (Relation rel, RelFileNode newrnode);
+
+ This function copies the relation's data from its existing filenode to the
+ new filenode specified by newrnode, and then removes the existing
+ filenode.
+
+void (*relation_vacuum) (Relation onerel, int options,
+                         struct VacuumParams *params, BufferAccessStrategy bstrategy);
+
+ This function vacuums the relation according to the specified
+ params: it gathers the dead tuples of the relation and removes
+ them, along with the index entries that point to them.
+
+void (*scan_analyze_next_block) (TableScanDesc scan, BlockNumber blockno,
+                                 BufferAccessStrategy bstrategy);
+
+ This function prepares the scan to analyze the relation block
+ blockno. The statistics gathered by this analysis are used by the
+ planner to choose query plans for the relation.
+
+bool (*scan_analyze_next_tuple) (TableScanDesc scan, TransactionId OldestXmin,
+                                 double *liverows, double *deadrows, TupleTableSlot *slot);
+
+ This function returns, in slot, the next tuple suitable for
+ sampling from the block selected by scan_analyze_next_block,
+ using OldestXmin to classify tuples, and updates the counts of
+ live and dead rows encountered.
+
+void (*relation_copy_for_cluster) (Relation NewHeap, Relation OldHeap, Relation OldIndex,
+                                   bool use_sort,
+                                   TransactionId OldestXmin, TransactionId FreezeXid, MultiXactId MultiXactCutoff,
+                                   double *num_tuples, double *tups_vacuumed, double *tups_recently_dead);
+
+ This function copies the contents of a relation into a new relation,
+ optionally ordered either by scanning OldIndex or by an explicit
+ sort (use_sort). Dead tuples are removed in the process, and the
+ counts are reported in num_tuples, tups_vacuumed and
+ tups_recently_dead.
+
+double (*index_build_range_scan) (Relation heap_rel,
+                                  Relation index_rel,
+                                  IndexInfo *index_nfo,
+                                  bool allow_sync,
+                                  bool anyvisible,
+                                  BlockNumber start_blockno,
+                                  BlockNumber end_blockno,
+                                  IndexBuildCallback callback,
+                                  void *callback_state,
+                                  TableScanDesc scan);
+
+ This function scans the specified range of blocks of the relation and
+ passes the tuples found to the provided callback function, which inserts
+ them into the specified index.
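The callback-driven shape of index_build_range_scan can be sketched with toy types: the AM walks a block range and hands each live tuple to a caller-supplied callback together with an opaque state pointer. Everything here is hypothetical (MiniTable, MiniBuildCallback, mini_sum_callback and the fixed tuples-per-block layout); treating end_blockno as exclusive and returning the number of tuples passed to the callback as a double are assumptions of this sketch, not statements about the real API.

```c
#include <assert.h>
#include <stdbool.h>

#define MINI_TUPLES_PER_BLOCK 4

/* Hypothetical table: each block holds a fixed number of item slots; an
 * item with live == false models a dead or unused slot. */
typedef struct MiniItem
{
    int  value;
    bool live;
} MiniItem;

typedef struct MiniBlock
{
    MiniItem items[MINI_TUPLES_PER_BLOCK];
} MiniBlock;

typedef struct MiniTable
{
    const MiniBlock *blocks;
    unsigned         nblocks;
} MiniTable;

/* Hypothetical analogue of IndexBuildCallback: receives the tuple's
 * location and key plus the caller's opaque state. */
typedef void (*MiniBuildCallback) (unsigned block, int offset, int value,
                                   void *state);

/* Scans blocks [start_blockno, end_blockno) and hands every live tuple to
 * callback; returns how many tuples were passed on. */
double
mini_index_build_range_scan(const MiniTable *table,
                            unsigned start_blockno, unsigned end_blockno,
                            MiniBuildCallback callback, void *callback_state)
{
    double ntuples = 0;

    if (end_blockno > table->nblocks)
        end_blockno = table->nblocks;   /* clamp to the table's size */
    for (unsigned blk = start_blockno; blk < end_blockno; blk++)
    {
        for (int off = 0; off < MINI_TUPLES_PER_BLOCK; off++)
        {
            const MiniItem *item = &table->blocks[blk].items[off];

            if (!item->live)
                continue;               /* dead tuples are not indexed */
            callback(blk, off, item->value, callback_state);
            ntuples += 1;
        }
    }
    return ntuples;
}

/* Example callback: a toy "index" that just sums the keys it receives. */
void
mini_sum_callback(unsigned block, int offset, int value, void *state)
{
    (void) block;
    (void) offset;
    *(long *) state += value;
}
```

Because the index-specific work lives entirely in the callback, the same range scan can feed any index type, and the AM only has to know how to enumerate its own tuples.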
+
+void (*index_validate_scan) (Relation heap_rel,
+                             Relation index_rel,
+                             IndexInfo *index_info,
+                             Snapshot snapshot,
+                             struct ValidateIndexState *state);
+
+ This function scans the table according to the given snapshot and inserts
+ into the specified index those tuples that satisfy the snapshot but whose
+ TIDs are not already present in the index, as collected in the
+ ValidateIndexState struct. It is used as the last phase of a
+ concurrent index build.
+
+ Planner functions
+
+void (*relation_estimate_size) (Relation rel, int32 *attr_widths,
+                                BlockNumber *pages, double *tuples, double *allvisfrac);
+
+ This function estimates the size of the relation, returning the number of
+ pages, the number of tuples and the fraction of all-visible pages in
+ *pages, *tuples and *allvisfrac.
+
+ Executor functions
+
+bool (*scan_bitmap_pagescan) (TableScanDesc scan,
+                              TBMIterateResult *tbmres);
+
+ This function scans the relation block described by tbmres and
+ collects the tuples it requests that are visible to the scan.
+
+bool (*scan_bitmap_pagescan_next) (TableScanDesc scan,
+                                   TupleTableSlot *slot);
+
+ This function returns, in slot, the next tuple from the set of
+ tuples collected for the current page; it returns false when there are no
+ more tuples.
+
+bool (*scan_sample_next_block) (TableScanDesc scan,
+                                struct SampleScanState *scanstate);
+
+ This function selects the next block of the relation, using the given
+ sampling method or sequentially, and records it in the scan descriptor;
+ it returns false when the sample scan is finished.
+
+bool (*scan_sample_next_tuple) (TableScanDesc scan,
+                                struct SampleScanState *scanstate,
+                                TupleTableSlot *slot);
+
+ This function returns, in slot, the next tuple to sample from the
+ current block, as chosen by the sampling method; otherwise it returns the
+ next visible tuple of the block selected by scan_sample_next_block.
-- 
2.20.1.windows.1