Re: COPY-able csv log outputs - Mailing list pgsql-patches
From | Andrew Dunstan |
---|---|
Subject | Re: COPY-able csv log outputs |
Date | |
Msg-id | 4651108D.9070806@dunslane.net Whole thread Raw |
In response to | Re: COPY-able csv log outputs (Greg Smith <gsmith@gregsmith.com>) |
Responses |
Re: COPY-able csv log outputs
Re: COPY-able csv log outputs |
List | pgsql-patches |
Greg Smith wrote: > I got a chance to review this patch over the weekend. Basic API seems > good, met all my requirements, no surprises with how the GUC variable > controlled the feature. > > The most fundamental issue I have with the interface is that using > COPY makes it difficult to put any unique index on the resulting > table. I like to have a unique index on my imported log table because > it rejects the dupe records if you accidentally import the same > section of log file twice. COPY tosses the whole thing if there's an > index violation, which is a problem during a regular import because > you will occasionally come across lines with the same timestamp that > are similar in every way except for their statment; putting an index > on the timestamp+statement seems impractical. Does the format not include the per-process line number? (I know i briefly looked at this patch previously, but I forget the details.) One reason I originally included line numbers in log_line-prefix was to handle this sort of problem. > > I've had a preference for INSERT from the beginning here that this > reinforces. COPY is our standard bulk insert mechanism. I think arguing against it would be a very hard sell. > I'm planning to just work around this issue by doing the COPY into a > temporary table and then INSERTing from there. I didn't want to just > let the concern pass by without mentioning it though. It crosses my > mind that inserting some sort of unique log file line ID number would > prevent the dupe issue and make for better ordering (it's possible to > have two lines with the same timestamp show up in the wrong order > now), not sure that's a practical idea to consider. I guess that answers my question. We should definitely provide a unique line key. > > The basic coding of the patch seemed OK to me, but someone who is much > more familiar than myself with the mechanics of pipes should take a > look at that part of the patch before committing; it's complicated > code and I can't comment on it. There are some small formatting > issues that need to be fixed, particularly in the host+port mapping. > I can fix those myself and submit a slightly updated patch. There's > some documentation improvements I want to make before this goes in as > well. > > The patch is actually broken fairly hard right now because of the > switch from INSERT to COPY FROM CSV as the output format at the last > minute. It outputs missing fields as NULL (fine for INSERT) that > chokes the CSV import when the session_start timestamp is missing. > All of those NULL values need to be just replaced with nothing for > proper CSV syntax; there should just the comma for the next field. I > worked around this with > > copy pglog from '/opt/pgsql/testlog.csv' with CSV null as 'NULL'; > > I missed that before - yes it should be fixed. cheers andrew
pgsql-patches by date: