Re: PATCH: pgbench - merging transaction logs - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: pgbench - merging transaction logs |
Date | |
Msg-id | 55060DF8.9050600@2ndquadrant.com Whole thread Raw |
In response to | Re: PATCH: pgbench - merging transaction logs (Fabien COELHO <coelho@cri.ensmp.fr>) |
Responses |
Re: PATCH: pgbench - merging transaction logs
|
List | pgsql-hackers |
On 15.3.2015 20:35, Fabien COELHO wrote: > >> Firstly, the fact that pgbench produces one file per thread is awkward. > > I agree, but I think it is due to the multi process thread emulation: if > you have real threads, you can do a simple fprintf, possibly with some > mutex, and you're done. There is really nothing to do to implement this > feature. My fear was that this either requires explicit locking, or some implicit locking - for example while fprintf() in glibc is thread-safe, I really am not sure about Win32 for example. This might influence the results for very short transactions - I haven't tried that, though, so this might be a false assumption. The way I implemented the merge is immute to this. >> Secondly, a separate tool (even if provided with pgbench) would >> require passing detailed info about the log format - whether it's >> aggregated or not, throttling or not, .... > > Hmmm. > >>> It could belong to pgbench if pgbench was *generating* the merged >>> log file directly. [...] >> >> I considered this approach, i.e. adding another 'collector' thread >> receiving results from all the other threads, but decided not to >> do that. That would require a fair amount of inter-process >> communication, locking etc. and might affect the measurements > > I agree that inter-process stuff should be avoided. This is not what > I had in mind. I was thinking of "fprintf" on the same file handler > by different threads. That still involves some sort of 'implicit' locking, no? And as I mentioned, I'm not sure fprintf() is thread-safe on all the platforms we support. >>> Another issue raised by your patch is that the log format may be >>> improved, say by providing a one-field timestamp at the beginning >>> of the line. >> >> I don't see how this is relevant to this patch? > > For the simple log format, all the parsing needed would be for the > timestamp, and the remainder would just be text to pass along, no > need to %d %f... whatever. Oh, ok. Well, that's true, but I don't think that significantly changes the overall complexity. >>> The current IO complexity is in p²n where it should be simply pn... >> >> Maybe. Implementing a 2-way merge sort was the simplest solution at >> the moment, and I don't think this really matters because the I/O >> generated by the benchmark itself is usually much higher than this. > > If you do not do a n-way merge, you could do a 2-way merge on a > binary tree so that the IO complexity would be p.log(p).n (I think), > and not p²n. Yes, I could do that. I still think this probably an overengineering, but not a big deal. I'll leave this for later, though (this patch is in 2015-06 CF anyway). >>> For aggregates, some data in the output may be special values >>> "NaN/-/...", [...] >> You mean the aggregated log? I can't think of a way to get there such >> values - can you provide a plausible scenario how that could happen? > > Possibly I'm wrong. Ok, I tried it: > > sh> ./pgbench --rate=0.5 --aggregate-interval=1 --log > sh> cat pgbench_log.12671 > 1426445236 1 5034 25341156 5034 5034 687 471969 687 687 > 1426445237 0 0 0 0 0 0 0 0 0 > 1426445238 0 0 0 0 0 0 0 0 0 > 1426445239 1 8708 75829264 8708 8708 2063 4255969 2063 2063 > ... > > Good news, I could not generate strange values. Just when there are > 0 transactions possibly some care should be taken when combining > values. Yeah, I wasn't able to come up with such scenario, but wasn't sure. > >>> It seems that system function calls are not tested for errors. >> >> That's true, but that's how the rest of pgbench code is written. > > Hmmm.... This is not entirely true, there are *some* checks if you look > carefully:-) > > if ((fd = fopen(filename, "r")) == NULL) ... > if ((fp = popen(command, "r")) == NULL) ... > nsocks = select(...) > if (nsocks < 0) ... OK, there are a few checks ;-) But none of the fprintf calls IIRC. Anyway, I plan to refactor this part of the patch to get rid of the copy'n'paste pieces, so I'll take care of this too. >>> (b) if we could get rid of the "thread emulation", pgbench could >>> generate the merged logs directly and simply, and the option could >>> be provided then. >> >> That however is not the goal of this patch. > > Sure. My point is that the amount of code you write to implement this > merge stuff is due to this feature. Without it, the patch would > probably need 10 lines of code. Moreover, the way it is implement > requires scanning and reprinting, which means more work in many > places to update the format later. Possibly. But it's not written like that :-( >> The thread emulation is there for a reason, > > My opinion is that it *was* there for a reason. Whether it makes > sense today to still have it, maintain it, and have to write such > heavy code for a trivial feature just because of it is another > matter. Possibly. I can't really decide that. >> and I certainly am not going to work on eliminating it >> (not sure that's even possible). > > I wish it will be:-) > > I would suggest this that I would support: implement this feature the > simple way (aka fprintf, maybe a mutex) when compiled with threads, and > generate an error "feature not available with process-based thread > emulation" when compiled with processes. This way we avoid a, lot of > heavy code to maintain in the future, and you still get the feature > within pgbench. There are already some things which are not the same > with thread emulation because it would have been tiring to implement for > it for very little benefit. I really dislike the features supported only in some cases (although most deployments probably support proper threading these days). -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: