Requirements
All the log collectors tried below will be using Elasticsearch as the backend system and Kibana as its web interface.
Installed Elasticsearch and Kibana
Elasticsearch serves as storage for the logs and Kibana as the Elasticsearch user interface. We’re going to need them as they come, so just download them from the official website.
Let’s just run them both with the following commands (*nix/macOS):
$ ./bin/elasticsearch
$ ./bin/kibana
Fluentd
This tool comes with a service that needs to be installed on the system: td-agent.
td-agent is a tool that collects logs and conveys them to a storage system, in this case Elasticsearch. I was able to set it up using the system log and a REST API that Fluentd listens to as inputs.
To make it work with Elasticsearch I had to install a plugin that connects the two tools together. The Fluentd installation guide is very straightforward.
I can send logs to Fluentd and visualize them in Elasticsearch in the following ways:
$ curl -X POST -d 'json={"message":"message2", "ident": "rest"}' http://localhost:8888/td.rest.test
The configuration for this listener is done in Fluentd’s configuration file:
/etc/td-agent/td-agent.conf
An input is configured:
# HTTP input
# POST http://localhost:8888/<tag>?json=<json>
# POST http://localhost:8888/td.rest.login?json={"user"%3A"me"}
# @see http://docs.fluentd.org/articles/in_http
<source>
@type http
@id input_http
port 8888
</source>

# live debugging agent
<source>
@type debug_agent
@id input_debug_agent
bind 127.0.0.1
port 24230
</source>
And then a match section is used to send the data to Elasticsearch:
<match td.rest.*>
@type elasticsearch
logstash_format true
<buffer>
flush_interval 10s # for testing
</buffer>
</match>
To send a system log to Fluentd just run:
$ logger -t test foobar22
Also in this case there is an input section and a match section in the configuration file.
#get logs from syslog
<source>
@type syslog
port 42185
tag syslog
</source>

<source>
@type forward
</source>

<match syslog.**>
@type elasticsearch
logstash_format true
<buffer>
flush_interval 10s # for testing
</buffer>
</match>
With Fluentd, filters can be used to transform incoming logs before saving them. There are not many predefined filters, but new plugins can be developed to extend Fluentd.
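For example, the bundled record_transformer filter plugin can add or rewrite fields before a match ships the events off. A minimal sketch (the tag pattern and the added field are just for illustration):

<filter td.rest.*>
  @type record_transformer
  <record>
    # add the host name to every event (evaluated once when the config is loaded)
    hostname "#{Socket.gethostname}"
  </record>
</filter>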
Available output plugins:
- out_copy
- out_null
- out_roundrobin
- out_stdout
- out_exec_filter
- out_forward
- out_mongo or out_mongo_replset
- out_exec
- out_file
- out_s3
- out_webhdfs
For integration with Elasticsearch:
$ sudo /usr/sbin/td-agent-gem install fluent-plugin-elasticsearch --no-document
Bottomline
Documentation: Poor
Community: Poor
Distributed extensions
Personally had some issues setting it up; not straightforward.
fluentd-ui helps a bit with the setup.
Graylog
Prerequisites:
- Java (>= 8)
- MongoDB (>= 2.4)
- Elasticsearch (>= 2.x)
Installation is somewhat tricky when just following the documentation. A few missing steps in the documentation led to errors when trying to set up the system.
It was necessary to insert a password secret (password_secret) and a SHA-2 hashed password for the root admin (root_password_sha2) in the configuration file.
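The values can be generated on the command line, for example (a sketch, assuming pwgen and sha256sum are available):

$ pwgen -N 1 -s 96   # value for password_secret
$ echo -n 'yourpassword' | sha256sum | cut -d' ' -f1   # value for root_password_sha2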
$ systemctl start graylog-server
Configuration file: /etc/graylog/server/server.conf
Web UI reachable from the browser at: http://localhost:9000
This solution needs MongoDB and Elasticsearch, as specified above, but it doesn’t need Kibana to visualise the data, because it offers its own UI.
From the web interface it’s possible to fully manage the Graylog system.
- Manage input sources
- Manage extractors (methods to transform, show and extract messages)
- Configure alerts
All configurations are done directly through the web interface: input sources, transformations, outputs, alerts and so on. It is a full-stack environment.
Pipelines can be created to build a chain of transformations to log messages.
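Pipeline rules are written in Graylog’s own rule language. A minimal sketch (the field name is an assumption):

rule "normalize log level"
when
  has_field("level")
then
  // uppercase the level field so all sources use the same convention
  set_field("level", uppercase(to_string($message.level)));
end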
Plugins can be written to provide custom handling of:
- Inputs: Accept/write any messages into Graylog
- Outputs: Forward ingested messages to other systems as they are processed
- Services: Run at startup and able to implement any functionalities
- Alert Conditions: Decide whether an alert will be triggered depending on a condition
- Alert Notifications: Called when a stream alert condition has been triggered
- Processors: Transform/drop incoming messages (can create multiple new messages)
- Filters: (Deprecated) Transform/drop incoming messages during processing
- REST API Resources: An HTTP resource exposed as part of the Graylog REST API
- Periodical: Called at periodical intervals during server runtime
- Decorators: Used during search time to modify the presentation of messages
- Authentication Realms: Allowing to implement different authentication mechanisms (like single sign-on or 2FA)
Nice overall impression. It comes with many tools out of the box but has to be configured.
Bottomline
Documentation: Tricky
Overall impression: fully fledged
Offers a great tool for managing logs. In fact, it is considered a log manager rather than a log collector.
Logstash
Installation and setup is easy by following through the documentation provided online.
Can be integrated with different input sources out of the box and accepts complex filters to manage and extract data from messages. Grok patterns can be used, like in Graylog, to extract data from log messages. Messages are sent to Elasticsearch and visualized with Kibana.
Easy to configure, but there is no UI for configuration: everything is done through a config file (or a set of files).
$ ./bin/logstash -f config/logstash.conf
{ put your own Logstash configuration here … }
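A minimal sketch of what such a configuration could look like (the log path is an assumption):

input {
  file {
    path => "/var/log/*.log"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}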
Bottomline
Documentation: Tricky
Considered one of the two most popular options (the other being fluentd)
Beautifully integrated with Elasticsearch, Kibana and Elastic Beats
Overall impression: works nicely, especially with Filebeat
The persistent event queue option must be enabled, because the fixed-size in-memory event queue, which can hold only 20 events, might not be sufficient.
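Enabling it is a one-line change in logstash.yml (a sketch; the size cap is optional):

# spill events to disk instead of the small in-memory queue
queue.type: persisted
queue.max_bytes: 1gb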
Beats
Filebeat, Heartbeat, Metricbeat, Auditbeat
Filebeat
Filebeat is a nice tool that reads from many different sources at the same time and feeds one or more outputs. I tried it with Logstash and made it read all the log files in
- /var/log/*.log
and it seems to handle the stream very well.
Run
$ sudo filebeat -e -c /etc/filebeat/filebeat.yml
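The relevant parts of /etc/filebeat/filebeat.yml look roughly like this (a sketch; the Logstash port is an assumption, and older versions use filebeat.prospectors instead of filebeat.inputs):

filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]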
Outputting the Filebeat stream directly to Elasticsearch works as well, but the data, although already somewhat structured, cannot be transformed and extracted the way it can through Logstash filters.
Filebeat ensures at-least-once-delivery for messages to the defined output source.
Documentation: Good
Heartbeat
It’s an Elastic Beat designed exclusively to monitor uptime. It continuously pings the configured web resources over HTTP. The preconfigured setup monitors a local Elasticsearch installation at http://127.0.0.1:9200.
$ sudo heartbeat -e -c /etc/heartbeat/heartbeat.yml
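The monitor definition in /etc/heartbeat/heartbeat.yml looks roughly like this (a sketch based on the preconfigured setup):

heartbeat.monitors:
- type: http
  urls: ["http://127.0.0.1:9200"]
  schedule: '@every 10s'

output.elasticsearch:
  hosts: ["localhost:9200"]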
Configuration: Easy
Documentation: Trivial
Metricbeat
Metricbeat collects metrics from your systems and services. From CPU to memory, Redis to NGINX, and much more, Metricbeat is a lightweight way to send system and service statistics. It provides roughly the same kind of reporting/monitoring that Icinga does. We’ll get to that in a bit.
$ sudo metricbeat -e -c /etc/metricbeat/metricbeat.yml
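A sketch of enabling the system module in /etc/metricbeat/metricbeat.yml (the metricsets are chosen for illustration):

metricbeat.modules:
- module: system
  metricsets: ["cpu", "memory", "network"]
  period: 10s

output.elasticsearch:
  hosts: ["localhost:9200"]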
Configuration: Easy
Documentation: Trivial
Auditbeat
Auditbeat collects Linux audit data and monitors file integrity across the system. It comes with predefined Kibana dashboards, as do the other Beats.
To install the available Kibana dashboards, ensure that Kibana is up and running and then run the following:
$ sudo auditbeat setup -e [-c /etc/auditbeat/auditbeat.yml]
The same command installs the available dashboards for all Beats. Generally, every Beat comes with predefined dashboards that visualise log data beautifully.
To run Auditbeat:
$ sudo auditbeat -e -c /etc/auditbeat/auditbeat.yml
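A sketch of the file integrity part of /etc/auditbeat/auditbeat.yml (the watched paths are assumptions):

auditbeat.modules:
- module: file_integrity
  paths:
    - /bin
    - /etc

output.elasticsearch:
  hosts: ["localhost:9200"]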
Bottomline
Configuration: Easy
Documentation: Trivial
Beautiful integration with Kibana; the predefined dashboards and visualisations are awesome
The only downside is that I didn’t figure out how to use the predefined dashboards when the logs go through Logstash first
Icinga2
Icinga 2 is an open source monitoring system which checks the availability of your network resources, notifies users of outages and generates performance data for reporting.
Prerequisites:
$ apt-get install monitoring-plugins
Installation is easy: it’s enough to add the package repository and then install the icinga2 package. It’s not immediately usable out of the box, though, because some external plugins need to be installed as well.
Icinga is used to check Hosts and Services, so it’s a service monitoring system.
Configuration seems simple, but there are too many files and the documentation doesn’t fully explain how to set up the environment.
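For reference, a minimal host/service definition in Icinga 2’s configuration DSL, typically placed under /etc/icinga2/conf.d/ (the names and address are just an example):

object Host "my-server" {
  address = "127.0.0.1"
  check_command = "hostalive"
}

object Service "http" {
  host_name = "my-server"
  check_command = "http"
}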
Offers an API interface for configuration but no UI out of the box.
It can be integrated with Elasticsearch / Logstash through an Elastic Beats component.
The UI can be installed but needs:
- mysql
- apache2 (web server)
- php
It can be reached at: http://localhost/icingaweb2
It has a lot of configuration files and a lot of checkers.
Bottomline
Documentation: Somewhat incomplete
A fork of Nagios. The out-of-the-box configuration gives a really great system monitoring service, but it’s not a log collector. It’s a monitoring system that can be used to check and keep under surveillance the state of different network objects.
I also created a simple Java application and a simple Node.js application. Both applications just produce logs of different levels continuously.
Run with:
$ gradle clean run
In the Java app I’ve configured different appenders with Logback (a sketch follows the list):
- Log file
- Log file with logs formatted in JSON
- UDP GELF-formatted logs sent to Graylog
- Logs sent to Logstash
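A minimal logback.xml sketch for the first, second and fourth appenders, assuming the logstash-logback-encoder library (the file paths and the Logstash port are assumptions; the GELF appender depends on which GELF library is used, so it is omitted here):

<configuration>
  <!-- plain text log file -->
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>logs/app.log</file>
    <encoder>
      <pattern>%d{ISO8601} %-5level [%thread] %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- JSON formatted log file -->
  <appender name="JSON_FILE" class="ch.qos.logback.core.FileAppender">
    <file>logs/app-json.log</file>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <!-- logs sent straight to Logstash over TCP -->
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>localhost:5000</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="FILE"/>
    <appender-ref ref="JSON_FILE"/>
    <appender-ref ref="LOGSTASH"/>
  </root>
</configuration>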
I configured Logstash to read the log files. I made the first one pass through some filters using a GROK pattern I defined. For the second one I just defined it as a JSON type of log; Logstash manages to identify the fields by itself this way. Just to mix things up, the Graylog server stores log messages on my Elasticsearch instance.
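A sketch of that Logstash pipeline (the file paths, the grok pattern and the index naming are illustrative assumptions):

input {
  file {
    path => "/path/to/app/logs/app.log"
    type => "plain"
  }
  file {
    path => "/path/to/app/logs/app-json.log"
    type => "json"
    codec => "json"
  }
}
filter {
  if [type] == "plain" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:thread}\] %{DATA:logger} - %{GREEDYDATA:msg}" }
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # one index per input type, so the results can be compared
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
  }
}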
So basically, from the same logs produced by my Java application and output in 4 different ways, I end up with 4 indexes on Elasticsearch that contain the same data. This makes it possible to compare the differences between the various methods.
Opinion first: JSON formatted logs were better structured. GROK parsing eventually ends up in failures every now and again.
I created a dashboard displaying the same data for the four indexes put together. Basically, the charts I created show the count for each error level logged.
Because of the GROK failures, some entries were missing in the plain-text log file case. The other indexes were easier to work with.
Eni Sinanaj
JIT Senior Software Engineer