Re: [HACKERS] WIP: [[Parallel] Shared] Hash - Mailing list pgsql-hackers
From: Ashutosh Bapat
Subject: Re: [HACKERS] WIP: [[Parallel] Shared] Hash
Msg-id: CAFjFpReeHV+nyroU11M98yBRB5yzcsvs0ppr5mdJv6nbewTfsg@mail.gmail.com
In response to: Re: [HACKERS] WIP: [[Parallel] Shared] Hash (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
>> However, ExecHashIncreaseNumBatches() may change the
>> number of buckets; the patch does not seem to account for spaceUsed changes
>> because of that.
>
> That's what this hunk is intended to do:
>
> @@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
>      TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,
>                                             hashtable->nbuckets_optimal);
>
> +    /* account for the increase in space that will be used by buckets */
> +    hashtable->spaceUsed += sizeof(HashJoinTuple) *
> +        (hashtable->nbuckets_optimal - hashtable->nbuckets);
> +    if (hashtable->spaceUsed > hashtable->spacePeak)
> +        hashtable->spacePeak = hashtable->spaceUsed;
> +

Sorry, I missed that hunk. You are right, it's getting accounted for.

>>
>> In ExecHashTableReset(), do we want to update spacePeak while setting
>> spaceUsed?
>
> I figured there was no way that the new spaceUsed value could be
> bigger than spacePeak, because we threw out all chunks and have just
> the bucket array, and we had that number of buckets before, so
> spacePeak must at least have been set to a number >= this number
> either when we expanded to this many buckets, or when we created the
> hashtable in the first place. Perhaps I should
> Assert(hashtable->spaceUsed <= hashtable->spacePeak).

That would help; better still, explain that reasoning in a comment above the
Assert.

>> While this patch tracks space usage more accurately, I am afraid we might be
>> overdoing it; a reason why we don't track space usage accurately now. But I
>> think I will leave it to be judged by someone who is more familiar with the
>> code and possibly has historical perspective.
>
> Well it's not doing more work; it doesn't make any practical
> difference whatsoever but it's technically doing less work than
> master, by doing memory accounting only when acquiring a new 32KB
> chunk.

This patch does do more work when counting the space used by buckets, I
think. AFAIU, right now that happens only after the hash table is built
completely. But that's fine; I am not worried about whether it's less work
or more.

> But if by overdoing it you mean that no one really cares about
> the tiny increase in accuracy so the patch on its own is a bit of a
> waste of time, you're probably right.

This is what I meant by overdoing it; you have spelled it out better.

> Depending on tuple size, you
> could imagine something like 64 bytes of header and unused space per
> 32KB chunk that we're not accounting for, and that's only 0.2%. So I
> probably wouldn't propose this refactoring just on accuracy grounds
> alone.
>
> This refactoring is intended to pave the way for shared memory
> accounting in the later patches, which would otherwise generate ugly
> IPC if done every time a tuple is allocated. I considered using
> atomic add to count space per tuple, or maintaining per-backend local
> subtotals and periodically summing. Then I realised that switching to
> per-chunk accounting would fix the IPC problem AND be justifiable on
> theoretical grounds. When we allocate a new 32KB chunk, we really are
> using 32KB more of your memory.

+1. Thanks for considering the comments.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
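
To make the per-chunk accounting idea discussed above concrete, here is a
minimal standalone C sketch. Every name in it (HashTableAcct, Chunk,
chunk_alloc, hash_table_reset) is invented for illustration and is not the
actual PostgreSQL hashjoin code; it only shows the shape of the technique:
spaceUsed is touched once per 32KB chunk rather than once per tuple, the
bucket array is accounted for at creation, and a reset can assert that
spaceUsed never exceeds spacePeak.

#include <assert.h>
#include <stdlib.h>

#define CHUNK_SIZE (32 * 1024)	/* account in 32KB steps, as in the patch */

typedef struct Chunk
{
	struct Chunk *next;
	size_t		used;			/* bytes handed out from data[] */
	char		data[CHUNK_SIZE];
} Chunk;

typedef struct HashTableAcct
{
	Chunk	   *chunks;			/* dense storage for tuples */
	size_t		nbuckets;		/* size of the (hypothetical) bucket array */
	size_t		spaceUsed;		/* updated only on chunk allocation */
	size_t		spacePeak;
} HashTableAcct;

/* At creation, account for the bucket array itself. */
static void
hash_table_create(HashTableAcct *ht, size_t nbuckets)
{
	ht->chunks = NULL;
	ht->nbuckets = nbuckets;
	ht->spaceUsed = nbuckets * sizeof(void *);
	ht->spacePeak = ht->spaceUsed;
}

/* Allocate size bytes of tuple storage; size must be <= CHUNK_SIZE. */
static void *
chunk_alloc(HashTableAcct *ht, size_t size)
{
	Chunk	   *c = ht->chunks;
	void	   *p;

	if (c == NULL || c->used + size > CHUNK_SIZE)
	{
		/* Need a new chunk: the ONLY place accounting happens. */
		c = malloc(sizeof(Chunk));
		if (c == NULL)
			abort();
		c->next = ht->chunks;
		c->used = 0;
		ht->chunks = c;

		ht->spaceUsed += CHUNK_SIZE;	/* one update per 32KB, not per tuple */
		if (ht->spaceUsed > ht->spacePeak)
			ht->spacePeak = ht->spaceUsed;
	}

	p = c->data + c->used;
	c->used += size;
	return p;
}

/* Throw away all tuple storage, keeping only the bucket array. */
static void
hash_table_reset(HashTableAcct *ht)
{
	while (ht->chunks)
	{
		Chunk	   *next = ht->chunks->next;

		free(ht->chunks);
		ht->chunks = next;
	}
	ht->spaceUsed = ht->nbuckets * sizeof(void *);

	/*
	 * spacePeak already covers a state in which this bucket array plus at
	 * least as many chunks existed, so the reset value cannot exceed it.
	 */
	assert(ht->spaceUsed <= ht->spacePeak);
}

Because accounting only happens at chunk boundaries, a shared-memory variant
of this sketch could maintain the counter with one atomic add per 32KB chunk
instead of one per tuple, which is the IPC saving described above.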