Re: PG vs ElasticSearch for Logs - Mailing list pgsql-general
| From | Andy Colson |
|---|---|
| Subject | Re: PG vs ElasticSearch for Logs |
| Date | |
| Msg-id | 054bf95c-93ad-3ae0-2ad3-9652ed81a3af@squeakycode.net |
| In response to | Re: PG vs ElasticSearch for Logs (Thomas Güttler <guettliml@thomas-guettler.de>) |
| Responses | Re: PG vs ElasticSearch for Logs |
| List | pgsql-general |
On 8/22/2016 2:39 AM, Thomas Güttler wrote:
>
> On 19.08.2016 at 19:59, Andy Colson wrote:
>> On 8/19/2016 2:32 AM, Thomas Güttler wrote:
>>> I want to store logs in a simple table.
>>>
>>> Here are my columns:
>>>
>>> Primary-key (auto generated)
>>> timestamp
>>> host
>>> service-on-host
>>> loglevel
>>> msg
>>> json (optional)
>>>
>>> I am unsure which DB to choose: Postgres, ElasticSearch or ...?
>>>
>>> We don't have high traffic. About 200k rows per day.
>>>
>>> My heart beats for postgres. We have been using it for several years.
>>>
>>> On the other hand, the sentence "Don't store logs in a DB" is
>>> somewhere in my head.....
>>>
>>> What do you think?
>>
>> I played with ElasticSearch a little, mostly because I wanted to use
>> Kibana, which looks really pretty. I dumped a ton of logs into it and
>> made a pretty dashboard ... but in the end it didn't really help me,
>> and wasn't that useful. My problem is, I don't want to have to go
>> look at it. If something goes bad, then I want an email alert, at
>> which point I'm going to go run top and tail the logs.
>>
>> Another problem I had with Kibana/ES is that the search syntax is
>> different from what I'm used to, which made it hard to find things
>> in Kibana.
>>
>> Right now, I have a Perl script that reads Apache logs and fires off
>> updates into PG to keep stats. But it's an hourly summary, which the
>> website turns around and queries to show pretty usage graphs.
>
> You use Perl to read Apache logs. Does this work?
>
> Forwarding logs reliably is not easy. Logs are streams; files in Unix
> are not streams. Sooner or later the files get rotated. RELP exists,
> but AFAIK its usage is not widespread:
>
> https://en.wikipedia.org/wiki/Reliable_Event_Logging_Protocol
>
> Let's see how to get the logs into postgres ....
>
>> In the end, PG or ES, it all depends on what you want.
>
> Most of my logs start from an HTTP request. I want a unique id per
> request in every log line which gets created. This way I can trace
> the request, even if its impact spans several hosts and systems which
> do not receive HTTP requests.
>
> Regards,
> Thomas Güttler

I don't read the file. In apache.conf:

    # v, countyia, ip, sess, ts, url, query, status
    LogFormat "3,%{countyName}e,%a,%{VCSID}C,%{%Y-%m-%dT%H:%M:%S%z}t,\"%U\",\"%q\",%>s" csv3
    CustomLog "|/usr/local/bin/statSender.pl -r 127.0.0.1" csv3

I think I read somewhere that if you pipe to a script (like above) and you don't read fast enough, it can slow Apache down. That's why the script above dumps to Redis first. That way I can move processes around, restart the database, etc., and not break Apache in any way.

The important part of the script (a sketch of the redis_connect() helper it relies on appears after this message):

    while (my $x = <>) {
        chomp($x);
        next unless ($x);
      try_again:
        if ($redis) {
            eval { $redis->lpush($qname, $x); };
            if ($@) {
                $redis = redis_connect();
                goto try_again;
            }
            # just silence this one
            eval { $redis->ltrim($qname, 0, 1000); };
        }
    }

Any other machine, or even multiple, then reads from Redis and inserts into PG (a consumer along those lines is also sketched below). You can see, in my script, I trim the queue to 1000 items, but that's because I'm not too worried about losing results. Your setup would probably be different. I also set up Redis to not save anything to disk, again because I don't mind if I lose a few hits here or there. But you get the idea.

-Andy
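The redis_connect() helper the script calls is not shown in the message. A minimal sketch of what it might look like, assuming the CPAN Redis module and the local server implied by the -r 127.0.0.1 option (the function body here is an assumption, not Andy's actual code):

    use Redis;

    # Hypothetical reconstruction of the helper the script calls.
    # Returns a connected handle, or undef on failure, so the caller's
    # "if ($redis)" guard simply skips the line instead of dying.
    sub redis_connect {
        my $r = eval {
            Redis->new(
                server    => '127.0.0.1:6379',  # address assumed from the -r option
                reconnect => 2,                 # retry the connection for up to 2 seconds
            );
        };
        return $@ ? undef : $r;
    }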
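And a sketch of the consumer side Andy describes: popping lines out of Redis and inserting them into Postgres. The queue name, table name, and connection details below are invented for illustration; only the CSV field order follows the csv3 LogFormat line above.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Redis;
    use Text::CSV;

    my $qname = 'apache_hits';    # queue name assumed
    my $redis = Redis->new(server => '127.0.0.1:6379');
    my $dbh   = DBI->connect('dbi:Pg:dbname=logs', 'loguser', 'secret',
                             { RaiseError => 1, AutoCommit => 1 });
    my $csv   = Text::CSV->new({ binary => 1 });

    # Field order follows the csv3 LogFormat:
    # version, county, ip, session, timestamp, url, query, status
    my $sth = $dbh->prepare(q{
        INSERT INTO hits (version, county, ip, sess, ts, url, query, status)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    });

    while (1) {
        # The producer LPUSHes, so RPOP drains the list oldest-first (FIFO).
        my $line = $redis->rpop($qname);
        if (!defined $line) {
            sleep 1;    # queue empty; poll again
            next;
        }
        if ($csv->parse($line)) {
            my @f = $csv->fields;
            $sth->execute(@f) if @f == 8;    # skip rows that don't match the format
        }
    }

Running several of these in parallel is safe because RPOP is atomic, which matches Andy's "any other machine, or even multiple" remark.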
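Finally, tying back to the question that started the thread: Thomas's column list maps onto a plain table quite directly. A minimal sketch, assuming DBD::Pg; the table name, column names, and connection details are placeholders:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=logs', 'loguser', 'secret',
                           { RaiseError => 1, AutoCommit => 1 });

    # One row per log line, following the columns listed in the original post.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS log (
            id       bigserial PRIMARY KEY,  -- auto-generated primary key
            ts       timestamptz NOT NULL,   -- timestamp
            host     text NOT NULL,
            service  text NOT NULL,          -- service-on-host
            loglevel text NOT NULL,
            msg      text NOT NULL,
            extra    jsonb                   -- optional json payload
        )
    });

    $dbh->disconnect;

At the roughly 200k rows per day Thomas mentions, a plain table like this is well within Postgres's comfort zone; an index on ts would be the obvious first addition.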