04.03.2021. | Author: Eni Sinanaj

Log collectors — How to set them up


All the log collectors tried below use Elasticsearch as the backend system and Kibana as its web interface.

Install Elasticsearch and Kibana

Elasticsearch serves as storage for the logs and Kibana as the Elasticsearch user interface. We need them as they come, so just download them from the official website.

Let's run them both with the following commands (*nix/macOS):

$ ./bin/elasticsearch
$ ./bin/kibana


Fluentd

Fluentd comes with a service that needs to be installed on the system: td-agent.

td-agent is a tool that collects logs and conveys them to a storage system, in this case Elasticsearch. I was able to set it up using the system log and a REST API that Fluentd listens to as inputs.

To make it work with Elasticsearch I had to install a plugin that connects the two tools. The Fluentd installation guide is very straightforward.

I can send logs to Fluentd and visualize them in Elasticsearch as follows:

$ curl -X POST -d 'json={"message":"message2", "ident": "rest"}' http://localhost:8888/

The configuration for this listener is done in the Fluentd configuration file.

An input is configured:

<source>
  # HTTP input
  # POST http://localhost:8888/<tag>?json=<json>
  # POST http://localhost:8888/{"user"%3A"me"}
  # @see
  @type http
  @id input_http
  port 8888
</source>

<source>
  # live debugging agent
  @type debug_agent
  @id input_debug_agent
  port 24230
</source>

And then a match, which is used to send the data to Elasticsearch:

<match **>
  @type elasticsearch
  logstash_format true
  flush_interval 10s # for testing
</match>

To send a system log to Fluentd just run:

$ logger -t test foobar22

In this case, too, there is an input section and a match section in the configuration file.

<source>
  # get logs from syslog
  @type syslog
  port 42185
  tag syslog
</source>

<source>
  @type forward
</source>

<match syslog.**>
  @type elasticsearch
  logstash_format true
  flush_interval 10s # for testing
</match>

With Fluentd, filters can be used to transform incoming logs before saving them. There are not many predefined filters, but new plugins can be developed to extend Fluentd.

Output sources:

For integration with Elasticsearch

$ sudo /usr/sbin/td-agent-gem install fluent-plugin-elasticsearch --no-document


Documentation: Poor
Community: Poor
Distributed extensions
Personally, I had some issues setting it up; not straightforward.
fluentd-ui helps a bit with the setup.



Graylog

Graylog requires:

  • Java (>= 8)
  • MongoDB (>= 2.4)
  • Elasticsearch (>= 2.x)

Installation is somewhat tricky when just following the documentation. A few missing steps in the documentation led to errors when setting up the system.

It was necessary to set a password secret and a SHA-2 hashed password for the root admin in the configuration file.
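A minimal sketch of how those two values can be produced on the shell; the config keys `password_secret` and `root_password_sha2` are real Graylog settings, while the example password "admin" is only a placeholder:

```shell
# password_secret: any long random string works; the Graylog docs suggest
# generating one with pwgen (assumption: pwgen is installed):
#   pwgen -N 1 -s 96
#
# root_password_sha2: SHA-256 hash of the desired admin password:
echo -n "admin" | sha256sum | cut -d" " -f1
```

Both values then go into the Graylog configuration file before starting the server.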

$ systemctl start graylog-server

Configuration file: /etc/graylog/server/server.conf
Web UI reachable from the browser at: http://localhost:9000

This solution needs MongoDB and Elasticsearch, as specified above, but it doesn't need Kibana to visualise the data because it offers its own UI.

From the web interface it’s possible to fully manage the Graylog system.

  • Manage input sources
  • Manage extractors (methods to transform, show and extract messages)
  • Configure alerts

All configuration is done directly through the web interface: input sources, transformations, outputs, alerts and so on. It is a full-stack environment.
Pipelines can be created to build a chain of transformations to log messages.

Plugins can be written to provide custom handling of:

  • Inputs: Accept/write any messages into Graylog
  • Outputs: Forward ingested messages to other systems as they are processed
  • Services: Run at startup and able to implement any functionalities
  • Alert Conditions: Decide whether an alert will be triggered depending on a condition
  • Alert Notifications: Called when a stream alert condition has been triggered
  • Processors: Transform/drop incoming messages (can create multiple new messages)
  • Filters: (Deprecated) Transform/drop incoming messages during processing
  • REST API Resources: An HTTP resource exposed as part of the Graylog REST API
  • Periodical: Called at periodical intervals during server runtime
  • Decorators: Used during search time to modify the presentation of messages
  • Authentication Realms: Allowing to implement different authentication mechanisms (like single sign-on or 2FA)

Nice overall impression. It comes with many tools out of the box but has to be configured.


Documentation: Tricky
Overall impression: fully fledged
Offers a great tool for managing logs. In fact, it is considered a log manager rather than a log collector.


Logstash

Installation and setup are easy when following the documentation provided online.

Can be integrated with different input sources out of the box and accepts complex filters to manage and extract data from messages. Grok patterns can be used, like in Graylog, to extract data from log messages. Messages are sent to Elasticsearch and visualized with Kibana.

Easy to configure, but it doesn't offer a UI for configuration; instead there is a config file (or a set of files).

$ ./bin/logstash -f config/logstash.conf

{put here own logstash configuration … }
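As a minimal sketch of what could go into logstash.conf (the Beats port 5044 and the local Elasticsearch host are assumptions):

```conf
input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { }
}
```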


Documentation: Tricky
Considered one of the two most popular options (the other being fluentd)
Beautifully integrated with elasticsearch, kibana and elastic beats
Overall impression: works pretty nice especially with filebeat
You must enable the option for a persistent layer of events, because the fixed-size in-memory event queue, which can hold only 20 events, might not be sufficient.
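Assuming a reasonably recent Logstash, the persistent queue can be enabled in logstash.yml; `queue.type` and `queue.max_bytes` are real settings, the size shown is an arbitrary example:

```yaml
# logstash.yml
queue.type: persisted   # default is "memory"
queue.max_bytes: 1gb    # upper bound for the on-disk queue
```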


Filebeat, Heartbeat, Metricbeat, Auditbeat


Filebeat

Filebeat is a nice system that reads from many different sources at the same time and feeds one or more outputs. I tried it with Logstash and made it read all the log files in

– /var/log/*.log

and it seems to handle the stream very well.


$ sudo filebeat -e -c /etc/filebeat/filebeat.yml
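The corresponding input section of filebeat.yml might look like this (the Logstash host and port are assumptions):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]
```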

Outputting the Filebeat stream directly to Elasticsearch works as well, but the data, although already somewhat structured, cannot be transformed and extracted the way it can with Logstash filters.

Filebeat ensures at-least-once delivery of messages to the configured output.

Documentation: Good


Heartbeat

Heartbeat is an Elastic Beat designed exclusively to monitor uptime. It continuously pings the configured web resources over HTTP. The preconfigured setup monitors a local Elasticsearch installation.

$ sudo heartbeat -e -c /etc/heartbeat/heartbeat.yml
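A minimal monitor definition in heartbeat.yml could look like this (the URL and interval are assumptions; the default configuration ships with a similar entry):

```yaml
heartbeat.monitors:
  - type: http
    urls: ["http://localhost:9200"]
    schedule: '@every 10s'
```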

Configuration: Easy
Documentation: Trivial


Metricbeat

Metricbeat collects metrics from your systems and services. From CPU to memory, Redis to NGINX, and much more, it is a lightweight way to ship system and service statistics. It offers roughly the same kind of reporting/monitoring that Icinga does. We'll get to it in a bit.

$ sudo metricbeat -e -c /etc/metricbeat/metricbeat.yml
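Metricbeat is organised into modules; a minimal system-metrics setup in metricbeat.yml might look like this (the metricsets and period shown are assumptions):

```yaml
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "filesystem"]
    period: 10s
```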

Configuration: Easy
Documentation: Trivial


Auditbeat

Auditbeat collects Linux audit data and monitors file integrity across the system. Like the other Beats, it comes with predefined Kibana dashboards.

To install the available Kibana dashboards, ensure that Kibana is up and running and then run the following:

$ sudo auditbeat setup -e [-c /etc/auditbeat/auditbeat.yml]

The same command for installing the available dashboards works for all Beats. Generally, every Beat comes with predefined dashboards that visualise log data beautifully.

To run Auditbeat:

$ sudo auditbeat -e -c /etc/auditbeat/auditbeat.yml


Configuration: Easy
Documentation: Trivial
Beautiful integration with Kibana; the predefined dashboards and visualisations are awesome
The only downside is that I didn't figure out how to use the predefined dashboards when the logs go through Logstash first


Icinga 2

Icinga 2 is an open source monitoring system which checks the availability of your network resources, notifies users of outages and generates performance data for reporting.


$ apt-get install monitoring-plugins

Installation is easy: it is enough to add the package repository and then install the icinga2 package. It doesn't work right away, though, because some external plugins need to be installed as well.

Icinga is used to check hosts and services, so it is a service monitoring system.
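As an illustration, host and service checks are defined in Icinga 2's configuration DSL roughly like this (the host name and address are hypothetical; `hostalive` and `http` are standard check commands):

```conf
object Host "web-server" {
  address = "192.0.2.10"
  check_command = "hostalive"
}

object Service "http" {
  host_name = "web-server"
  check_command = "http"
}
```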

Configuration seems simple, but there are too many files and the documentation doesn't fully explain how to set up the environment.

Offers an API interface for configuration but no UI out of the box.
It can be integrated with Elasticsearch / Logstash through an Elastic Beats component.

The UI can be installed but needs:

  • MySQL
  • Apache2 (web server)
  • PHP

It can be reached at: http://localhost/icingaweb2
It has a lot of configuration files and a lot of checkers.


Documentation: Somewhat incomplete
Fork of Nagios. The out-of-the-box configuration gives a really great system monitoring service, but it's not a log collector. It's a monitoring system that can be used to check and keep under surveillance the state of different network objects.

I also created a simple Java application and a simple Node.js application. Both applications continuously produce logs of different levels.

The Java app is run with:

$ gradle clean run

In the Java app I've configured different appenders with Logback:

  • Log file
  • Log file with logs formatted in JSON
  • UDP Gelf formatted logs sent to Graylog
  • Logs sent to Logstash
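A sketch of what the first two appenders might look like in logback.xml (the file names are hypothetical; the JSON encoder assumes the logstash-logback-encoder library is on the classpath):

```xml
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>app.log</file>
    <encoder>
      <pattern>%d{ISO8601} %-5level [%thread] %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <appender name="JSON" class="ch.qos.logback.core.FileAppender">
    <file>app.json.log</file>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>

  <root level="INFO">
    <appender-ref ref="FILE"/>
    <appender-ref ref="JSON"/>
  </root>
</configuration>
```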

I configured Logstash to read the log files. The first one passes through a filter using a Grok pattern I defined; the second one I simply declared as a JSON type of log, and Logstash manages to identify the fields by itself that way. Just to mix things up, the Graylog server stores its log messages on my Elasticsearch instance.
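A sketch of that Logstash configuration; the paths and the Grok pattern are hypothetical, while the `grok` plugin and the `json` codec are standard:

```conf
input {
  file { path => "/path/to/app.log"      type => "plain" }
  file { path => "/path/to/app.json.log" type => "json" codec => "json" }
}

filter {
  if [type] == "plain" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
    }
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```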

So basically, from the same logs produced by my Java application and output in four different ways, I end up with four Elasticsearch indexes containing the same data. This allows me to compare the various methods.

Opinion first: JSON-formatted logs were better structured. Grok parsing fails every now and then.

I created a dashboard displaying the same data for the four indexes side by side. Basically, the charts show the count for each error level logged.


Because of the Grok failures, some log messages were missing in the plain-text log file case. The other indexes were easier to work with.

Eni Sinanaj
JIT Senior Software Engineer

Photo by Stephen Dawson on Unsplash
