Table AM modifications to accept column projection lists - Mailing list pgsql-hackers
From | Soumyadeep Chakraborty |
---|---|
Subject | Table AM modifications to accept column projection lists |
Date | |
Msg-id | CAE-ML+9RmTNzKCNTZPQf8O3b-UjHWGFbSoXpQa3Wvuc8YBbEQw@mail.gmail.com |
Responses | Re: Table AM modifications to accept column projection lists |
List | pgsql-hackers |
Hello,

This patch introduces a set of changes to the table AM APIs, making them accept a column projection list. This helps columnar table AMs, which then need to fetch only the columns actually needed from disk rather than all of them. The set of changes in this patch is not exhaustive: there are many more opportunities, which are discussed in the TODO section below. Before digging deeper, we want to elicit early feedback on the API changes and the column extraction logic.

TableAM APIs that have been modified are:

1. Sequential scan APIs
2. Index scan APIs
3. API to lock and return a row
4. API to fetch a single row

We have seen performance benefits in Zedstore for many of the optimized operations [0]. This patch is extracted from the larger patch shared in [0].

------------------------------------------------------------------------
Building the column projection set:

In terms of building the column projection set necessary for each of these APIs, this patch builds off of the scanCols patch [1], which Ashwin and Melanie had started earlier. As noted in [1], there are cases where the scanCols set is not representative of the columns to be projected. For instance, in a DELETE ... RETURNING query, there is typically a sequential scan and a separate invocation of tuple_fetch_row_version() in order to satisfy the RETURNING clause (see ExecDelete()). So for a query such as:

    DELETE FROM foo WHERE i < 100 AND j < 1000 RETURNING k, l;

we need to pass the set (i, j) to the scan and (k, l) to the tuple_fetch_row_version() invocation. This is why we had to introduce the returningCols field.

In the same spirit, separate column projection sets are computed for any operations that involve an EPQ check (INSERT, DELETE, UPDATE, row-level locking, etc.), for the columns involved in an ON CONFLICT UPDATE, and so on.
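To make the DELETE ... RETURNING example concrete, here is a small standalone C sketch that computes the two sets separately. It is only a model under stated assumptions: a 64-bit mask (the hypothetical ColSet type) stands in for Bitmapset, foo is assumed to be declared with columns (i, j, k, l) so that they have attnums 1 to 4, and none of the helper names come from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of attnums: bit n set means
 * attnum n is a member. Only attnums 0..63 fit in this sketch. */
typedef uint64_t ColSet;

static ColSet colset_add(ColSet set, int attnum)
{
    assert(attnum >= 0 && attnum < 64);
    return set | ((ColSet) 1 << attnum);
}

static bool colset_is_member(ColSet set, int attnum)
{
    return attnum >= 0 && attnum < 64 && (set & ((ColSet) 1 << attnum)) != 0;
}

/* Assumed attnums for: CREATE TABLE foo (i int, j int, k int, l int); */
enum { ATT_I = 1, ATT_J = 2, ATT_K = 3, ATT_L = 4 };

/* Columns the sequential scan needs to evaluate the qual
 * "i < 100 AND j < 1000" (a scanCols-like set). */
static ColSet delete_scan_cols(void)
{
    ColSet set = 0;
    set = colset_add(set, ATT_I);
    set = colset_add(set, ATT_J);
    return set;
}

/* Columns the separate per-row fetch needs to build "RETURNING k, l"
 * (a returningCols-like set). */
static ColSet delete_returning_cols(void)
{
    ColSet set = 0;
    set = colset_add(set, ATT_K);
    set = colset_add(set, ATT_L);
    return set;
}
```

This illustrates why keeping a RETURNING set separate from scanCols pays off for a columnar AM: handing the scan the union {i, j, k, l} would still give correct results, but would read k and l for every scanned row rather than only for the rows actually deleted.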
Recognizing and collecting these sets of columns is done at various stages (analyze and rewrite, planner, and executor), depending on the type of operation for which the subset of columns is calculated. The column bitmaps are stored in different places as well: the ones for scans and RETURNING are stored in RangeTblEntry, whereas the set of columns for ON CONFLICT UPDATE is stored in OnConflictSetState.

------------------------------------------------------------------------
Table AM API changes:

The changes made to the table AM API, introducing the column projection set, come in different flavors. We would like feedback on which style we should converge to, or whether we should use different styles depending on the situation.

- A new function variant that takes a column projection list, such as:

      TableScanDesc (*scan_begin) (Relation rel, Snapshot snapshot,
                                   int nkeys, struct ScanKeyData *key,
                                   ParallelTableScanDesc pscan, uint32 flags);
      ->
      TableScanDesc (*scan_begin_with_column_projection) (Relation relation, Snapshot snapshot,
                                                           int nkeys, struct ScanKeyData *key,
                                                           ParallelTableScanDesc parallel_scan,
                                                           uint32 flags,
                                                           Bitmapset *project_columns);

- Modifying the existing function to take a column projection list, such as:

      TM_Result (*tuple_lock) (Relation rel, ItemPointer tid, Snapshot snapshot,
                               TupleTableSlot *slot, CommandId cid,
                               LockTupleMode mode, LockWaitPolicy wait_policy,
                               uint8 flags, TM_FailureData *tmfd);
      ->
      TM_Result (*tuple_lock) (Relation rel, ItemPointer tid, Snapshot snapshot,
                               TupleTableSlot *slot, CommandId cid,
                               LockTupleMode mode, LockWaitPolicy wait_policy,
                               uint8 flags, TM_FailureData *tmfd,
                               Bitmapset *project_cols);

- A new function, index_fetch_set_column_projection(), to be called after index_beginscan() to set the column projection set, which will later be used by index_getnext_slot():

      void (*index_fetch_set_column_projection) (struct IndexFetchTableData *data,
                                                 Bitmapset *project_columns);

The set of columns expected by the new/modified functions is represented as a Bitmapset of attnums for a specific base relation. An empty/NULL bitmap signals to the AM that no data columns are needed. A bitmap containing the single element 0 indicates that we want all data columns to be fetched. The bitmaps do not include system columns.

Additionally, the TupleTableSlots populated by functions such as table_scan_getnextslot() need to be densely filled up to the highest-numbered column in the projection list (any column not in the projection list should be populated with NULL). This is due to the implicit assumptions of the slot_get_***() APIs.

------------------------------------------------------------------------
TODOs:

- Explore opportunities to push the column extraction logic from the executor stage to the planner or pre-planner stages (like scanCols and returningCols), or at least to perform the column extraction once per executor run instead of once per tuple.
- As was requested in [1], we should guard the column projection set extraction logic with a table_scans_leverage_column_projection() call. We wouldn't want a non-columnar AM to incur the overhead.
- Standardize the table AM API for passing columns.
- The optimization for DELETE RETURNING does not currently work for views. We have to properly populate the list of columns for the base relation beneath the view.
- Currently, the benefit of passing in an empty projection set for ON CONFLICT DO UPDATE (UPSERT) and ON CONFLICT DO NOTHING (see ExecCheckTIDVisible()) is masked by a preceding call to check_exclusion_or_unique_constraint(), which has not yet been modified to pass a column projection list to the index scan.
- Compute scanCols earlier than set_base_rel_sizes() and use that information to produce better relation size estimates in the planner (relation size will depend on the number of columns projected). Essentially, we need to absorb the work done by Pengzhou [2].
- Right now, we do not extract a set of columns for the call to table_tuple_lock() within GetTupleForTrigger(), as it may be hard to determine the list of columns used in a trigger body [3].
- validateForeignKeyConstraint() should only need to fetch the foreign key column.
- List of index scan callsites that will benefit from calling index_fetch_set_column_projection():
  -- table_index_fetch_tuple_check() does not need to fetch any columns (we have to pass an empty column bitmap); fetching the tid should be enough.
  -- unique_key_recheck() performs a liveness check for which we do not need to fetch any columns (we have to pass an empty column bitmap).
  -- check_exclusion_or_unique_constraint() needs to fetch only the columns that are part of the exclusion or unique constraint.
  -- IndexNextWithReorder() needs to fetch only the columns being projected, along with the columns in the index qual and in the ORDER BY clause.
  -- get_actual_variable_endpoint() only performs visibility checks, so we don't need to fetch any columns (we have to pass an empty column projection bitmap).
- BitmapHeapScans can benefit from a column projection list in the same way as IndexScans and SeqScans. We could possibly pass down scanCols in ExecInitBitmapHeapScan(). We would have to modify the BitmapHeapScan table AM calls to take a column projection bitmap.
- There may be more callsites where we can pass a column projection list.
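The projection-set convention from the API section (NULL/empty means no data columns, the element 0 means all data columns, system columns never included) and the rule that slots be densely filled up to the highest projected column can be modeled roughly as below. This is a standalone sketch under stated assumptions: ColSet is a 64-bit mask standing in for Bitmapset, and MiniSlot is a hypothetical, much-simplified substitute for a TupleTableSlot's value/null arrays. None of these names are PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ColSet;   /* hypothetical Bitmapset stand-in; bit n = attnum n */

#define MAX_ATTS 16

/* Hypothetical simplified slot: arrays are indexed by attnum - 1 and
 * nvalid counts how many leading attributes have been filled in. */
typedef struct MiniSlot
{
    int  nvalid;
    long values[MAX_ATTS];
    bool isnull[MAX_ATTS];
} MiniSlot;

/* Does the projection ask for data column 'attnum' (1-based)? */
static bool projection_wants(ColSet proj, int attnum)
{
    if (attnum < 1 || attnum >= 64)
        return false;                     /* data columns only; sketch limit 63 */
    if (proj & 1)
        return true;                      /* contains 0: all data columns (assumed
                                           * to hold even if 0 is not alone) */
    return ((proj >> attnum) & 1) != 0;   /* empty set yields false here */
}

/* Highest attnum the slot must be filled up to. */
static int projection_last_att(ColSet proj, int natts)
{
    if (proj & 1)
        return natts;
    int last = 0;
    for (int att = 1; att <= natts && att < 64; att++)
        if ((proj >> att) & 1)
            last = att;
    return last;
}

/* Fill the slot densely: projected columns get values, gaps below the
 * highest projected column get NULL. A real columnar AM would read only
 * the wanted columns; 'row' holds every column only to keep this
 * sketch self-contained. */
static void fill_slot_dense(MiniSlot *slot, const long *row, int natts, ColSet proj)
{
    assert(natts >= 0 && natts <= MAX_ATTS);
    int last = projection_last_att(proj, natts);
    for (int att = 1; att <= last; att++)
    {
        bool wanted = projection_wants(proj, att);
        slot->values[att - 1] = wanted ? row[att - 1] : 0;
        slot->isnull[att - 1] = !wanted;
    }
    slot->nvalid = last;
}
```

In this model the gaps below the highest projected attnum are explicit NULLs, so positional access up to nvalid never reads unset entries, while attnums above nvalid are simply not materialized.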
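One possible shape for the "new function variant" API style, combined with the table_scans_leverage_column_projection() guard requested in the TODO list, is sketched below. Everything here is hypothetical: MiniAm, begin_scan() and the callback names only mimic the scan_begin / scan_begin_with_column_projection pair, return plain int handles instead of a TableScanDesc, and are not the patch's code. The point is the control flow: the projection extraction runs only when the AM reports that it can use it, and row-oriented AMs keep calling the unchanged callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ColSet;   /* hypothetical Bitmapset stand-in; bit n = attnum n */

typedef struct MiniAm
{
    /* Callback every AM provides (mimics scan_begin). */
    int (*scan_begin)(void);
    /* Optional projection-aware variant (mimics
     * scan_begin_with_column_projection); NULL if unsupported. */
    int (*scan_begin_with_projection)(ColSet project_cols);
} MiniAm;

/* Hypothetical guard in the spirit of table_scans_leverage_column_projection(). */
static bool am_leverages_column_projection(const MiniAm *am)
{
    return am->scan_begin_with_projection != NULL;
}

static int extraction_calls = 0;   /* how often the extraction cost was paid */

/* Stand-in for walking the plan to collect referenced attnums. */
static ColSet extract_scan_cols(void)
{
    extraction_calls++;
    return ((ColSet) 1 << 1) | ((ColSet) 1 << 2);   /* e.g. {i, j} */
}

static int begin_scan(const MiniAm *am)
{
    if (am_leverages_column_projection(am))
        return am->scan_begin_with_projection(extract_scan_cols());
    return am->scan_begin();   /* row AM: skip extraction entirely */
}

/* Two toy AMs; handle values are arbitrary so callers can tell them apart. */
static int heap_like_begin(void) { return 1; }
static int columnar_begin(void) { return 2; }
static int columnar_begin_proj(ColSet cols) { return cols == 0 ? -1 : 3; }

static const MiniAm heap_like_am = { heap_like_begin, NULL };
static const MiniAm columnar_am = { columnar_begin, columnar_begin_proj };
```

Whether the guard should be a separate callback, a flag, or simply the presence of the projection-aware callback is the same API-style question raised above; the NULL check is just the simplest choice for a sketch.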
Regards,
Soumyadeep & Jacob

[0] https://www.postgresql.org/message-id/CAE-ML%2B-HwY4X4uTzBesLhOotHF7rUvP2Ur-rvEpqz2PUgK4K3g%40mail.gmail.com
[1] https://www.postgresql.org/message-id/flat/CAAKRu_Yj%3DQ_ZxiGX%2BpgstNWMbUJApEJX-imvAEwryCk5SLUebg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAG4reAQc9vYdmQXh%3D1D789x8XJ%3DgEkV%2BE%2BfT9%2Bs9tOWDXX3L9Q%40mail.gmail.com
[3] https://www.postgresql.org/message-id/23194.1560618101%40sss.pgh.pa.us