We’ve got an auto-scaling EC2 setup, so servers can appear and disappear at will, each with a random IP address. We want to store the Apache error/access logs (or system logs) from these servers for later debugging and analysis, but nothing else on them matters (they’re basically read-only, except for the logging). How should we store these?
Seems like a simple problem, so surely there’s a simple solution? Here are the existing options:
- syslog-ng or rsyslogd logging to a central location
  - Pros: centralized logs, easy to analyse, near real time.
  - Cons: requires an extra dedicated server (costs), and you have to extend its storage or upload to S3 periodically. It’s also a PITA to set up securely unless you know the IP addresses in advance (you can run your own CA, generate a batch of SSL keys, distribute them to the servers on startup and use TLS-encrypted communications, but I really CBA with all that hassle - not to mention that last time I checked, Amazon Linux didn’t support these OOTB, so I’d have to install them from a CentOS RPM or similar).
- Message queuing
  - (I don’t think this is a perfect match.)
- Hadoop cluster
  - Pros: awesome data-crunching ability.
  - Cons: we don’t have 10 million users yet; I think this is a bit heavy-handed (not to mention expensive).
- Scribe from Facebook
  - Cons: too much setup; requires a specific server hierarchy.
- logg.ly and splunk
  - Cons: too expensive/untested/unknown for now.
- Something else…
I chose “something else” - a custom solution using logrotate to upload the logs to S3 on a per-server basis. Here’s how I did it (and how you can do it too). You’ll need:
- `/usr/local/bin/s3uploadfile` (download) - a script that uploads a file to S3
- `/etc/init.d/s3logrotate` (download) - an init script that triggers `logrotate` to be called (twice) on server shutdown
- `/usr/local/bin/s3logrotated` (download) - a callback script to call when a log has been rotated (it processes the logrotate message and calls `s3uploadfile` to upload the log to S3)
- `/root/.s3env` - a simple bash script to export your S3 credentials for `s3uploadfile` to work
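If you want a feel for what `s3uploadfile` has to do before downloading it, here’s a minimal sketch that PUTs a file to S3 via the REST API with Signature V2 auth (the scheme of the era). The bucket name, key layout, and credential variable names here are assumptions, not the original script:

```bash
#!/bin/bash
# Sketch of s3uploadfile: upload one file to S3 using Signature V2.
# Assumes /root/.s3env exports S3_ACCESS_KEY and S3_SECRET_KEY (names are
# an assumption) and that the bucket below already exists.
source /root/.s3env

FILE="$1"
BUCKET="my-server-logs"                   # hypothetical bucket name
KEY="$(hostname)/$(basename "$FILE")"     # one "folder" per server
CONTENT_TYPE="application/x-gzip"
DATE="$(date -R)"

# Signature V2: base64(HMAC-SHA1(secret, string-to-sign))
STRING_TO_SIGN="PUT\n\n${CONTENT_TYPE}\n${DATE}\n/${BUCKET}/${KEY}"
SIGNATURE=$(printf "${STRING_TO_SIGN}" \
  | openssl sha1 -hmac "${S3_SECRET_KEY}" -binary | base64)

# -T implies a PUT request; headers must match what was signed above
curl -s -T "${FILE}" \
  -H "Date: ${DATE}" \
  -H "Content-Type: ${CONTENT_TYPE}" \
  -H "Authorization: AWS ${S3_ACCESS_KEY}:${SIGNATURE}" \
  "https://${BUCKET}.s3.amazonaws.com/${KEY}"
```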
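And `/root/.s3env` is nothing more than a couple of exports for the other scripts to source - again, the variable names here are assumptions; use whatever your `s3uploadfile` expects:

```bash
#!/bin/bash
# /root/.s3env - sourced by s3uploadfile; keep it root-readable only
export S3_ACCESS_KEY="AKIA..."   # your AWS access key id
export S3_SECRET_KEY="..."       # your AWS secret access key
```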
Step 1: Create your dedicated server-log S3 bucket (if you haven’t already).
Step 2: Download and install the scripts above to the locations above, including `/root/.s3env`, and then make them all executable (`chmod +x [file]`).
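Something along these lines (a sketch - adjust to taste):

```bash
# install to the locations above, then make everything executable
chmod +x /usr/local/bin/s3uploadfile \
         /usr/local/bin/s3logrotated \
         /etc/init.d/s3logrotate \
         /root/.s3env
chmod 700 /root/.s3env   # my suggestion: credentials shouldn't be world-readable
```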
Step 3: Modify the relevant logrotate scripts to use the `s3logrotated` callback and add `s3logrotate` to the server `rc*.d` scripts. When configuring logrotate: don’t use `dateext`; do use `compress`, `delaycompress`, and `sharedscripts`. Here’s what I use:
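A sketch along those lines, assuming the callback is wired in via a shared `postrotate` script (the log path, schedule, and rotation count are illustrative):

```
# /etc/logrotate.d/httpd (illustrative path) - follows the advice above:
# no dateext; compress + delaycompress + sharedscripts
/var/log/httpd/*log {
    daily
    rotate 7
    missingok
    compress
    delaycompress
    sharedscripts
    postrotate
        /sbin/service httpd reload > /dev/null 2>&1 || true
        # depending on the logrotate version, the glob pattern is passed
        # as $1; s3logrotated finds the fresh *.2.gz files and uploads them
        /usr/local/bin/s3logrotated "$1"
    endscript
}
```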
Note that the ".2.gz"
is because of compress
and delaycompress
-
it is the first compressed log file. This is why we need
/etc/init.d/s3logrotate
to run “logrotate -f
” twice on server
shutdown.
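The shutdown hook itself can be tiny - here’s a sketch of such an init script, assuming a chkconfig-style header (runlevels and priorities are illustrative):

```bash
#!/bin/bash
# /etc/init.d/s3logrotate - sketch; register with: chkconfig --add s3logrotate
# chkconfig: 2345 99 01
# description: force log rotation (and S3 upload) on shutdown
case "$1" in
  stop)
    # first pass rotates the live logs to .1 (uncompressed, thanks to
    # delaycompress); second pass pushes them on to .2.gz, which the
    # s3logrotated callback then uploads
    logrotate -f /etc/logrotate.conf
    logrotate -f /etc/logrotate.conf
    ;;
  start|restart|status)
    ;;  # nothing to do at boot
esac
exit 0
```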
Step 4: Analytics, I suppose - but I’ll leave that as an exercise for the reader. I hope this helps someone - if it does, let me know in the comments. If you want any help setting it up (or understanding how it works), just shoot me an email or leave me a comment.