Troubleshooting ELK Syslog Performance

Summary

When running Logstash in large-scale environments, it can be difficult to troubleshoot performance, particularly when dealing with UDP packets.

The issue can occur at any of several layers, listed here in dependency order (each layer relies on the one before it):

  • Infrastructure
  • Logstash Application
  • Pipeline

The following steps assume Logstash is installed on a Linux machine (CentOS 7.4), but similar steps apply to other systems.

1. Troubleshooting Infrastructure

Issue: Communication issues from source

Diagnose:

  1. Dump all packets on a given protocol and port (run on the OS hosting Logstash) to check whether you are receiving data. The example below captures UDP traffic to port 514 on interface ens160:
    tcpdump -i ens160 udp port 514
    
  2. If you are troubleshooting TCP traffic, you can telnet to the port from the source to check whether a connection can be established. The example below is run on the source and tests traffic flow to port 514 on the Logstash host with IP 10.10.10.4.
    telnet 10.10.10.4 514
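
Telnet only exercises TCP. For UDP, one option is to send a single test datagram from the source and watch for it in the tcpdump capture on the Logstash host. The sketch below is a minimal example, not a full syslog client: the function names, the tag and the message text are made up for illustration, the host and port are the ones used in the example above, and the send step relies on bash's /dev/udp redirection.

```shell
# Build a minimal syslog-style line with a priority prefix.
# PRI 14 = facility "user" (1) * 8 + severity "info" (6).
build_test_message() {
  # $1 = tag, $2 = message text (both have hypothetical defaults)
  printf '<14>%s: %s\n' "${1:-elk-test}" "${2:-udp path check}"
}

# Send one datagram. This uses bash's /dev/udp/HOST/PORT redirection,
# so run it with bash rather than plain sh.
send_udp_test() {
  # $1 = Logstash host, $2 = port, $3 and $4 = optional tag and text
  build_test_message "$3" "$4" > "/dev/udp/$1/$2"
}

# Example (run on the source machine):
#   send_udp_test 10.10.10.4 514
```

Because UDP is connectionless, the send succeeding says nothing about delivery; the packet showing up (or not) in tcpdump on the Logstash side is the actual check.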
    

Fixes:

  1. Check every intermediate network device (firewalls, load balancers, switches, etc.) and confirm that the traffic gets through on each leg.

Issue: Dropped UDP Packets

Diagnose:

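  1. View the network statistics (run on the OS hosting Logstash) to check whether your operating system is dropping packets. Steadily increasing "packet receive errors" or "receive buffer errors" counters indicate drops.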
    watch netstat -s --udp
    
    A good read on how to view the results of this command can be found here
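
The same counters that netstat summarizes are exposed by the Linux kernel in /proc/net/snmp, which is easier to script against. As a rough sketch (the function name is hypothetical), the snippet below prints each UDP counter as NAME=VALUE so that two readings taken a few seconds apart can be compared:

```shell
# Print the kernel's UDP counters as NAME=VALUE lines.
# $1 = file to read; defaults to /proc/net/snmp on Linux.
udp_counters() {
  awk '$1 == "Udp:" {
         if (!seen) {
           # First "Udp:" line holds the column names
           for (i = 2; i <= NF; i++) name[i] = $i
           seen = 1
         } else {
           # Second "Udp:" line holds the matching values
           for (i = 2; i <= NF; i++) print name[i] "=" $i
         }
       }' "${1:-/proc/net/snmp}"
}

# Example:
#   udp_counters | grep -E 'InErrors|RcvbufErrors'
```

If InErrors or RcvbufErrors grows between readings while traffic is flowing, the operating system is discarding datagrams before Logstash reads them.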

Fixes:

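  1. If there is packet loss, check the CPU usage of the nodes Logstash is sending to (they are likely to be running hot). A saturated downstream node slows Logstash down, and UDP packets that are not read in time are dropped by the operating system.

  2. Commercial Only: Check the pipeline via monitoring to identify which stage has a high processing time.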
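
Separately from the monitoring UI, Logstash exposes a node stats API on the host itself. The sketch below assumes the API is listening on its default port 9600 and uses the Logstash 6.x endpoint name (`pipelines`; 5.x releases use `pipeline`). The function names are hypothetical.

```shell
# Build the node stats URL.
# $1 = host (default localhost), $2 = API port (default 9600)
stats_url() {
  printf 'http://%s:%s/_node/stats/pipelines?pretty\n' "${1:-localhost}" "${2:-9600}"
}

# Fetch per-pipeline statistics as JSON (requires curl and a running Logstash).
pipeline_stats() {
  curl -s "$(stats_url "$1" "$2")"
}

# Example (run on the Logstash host):
#   pipeline_stats | less
```

In the response, comparing each plugin's duration_in_millis against its event counts gives a rough idea of which filter or output is taking the most time.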

2. Troubleshooting Logstash Application

Issue: Logstash keeps restarting

Diagnose:

  1. Print the journal of the service to see the errors:
    journalctl -u logstash.service
  2. View the logs stored in /var/log/logstash/

Fix:

  1. The application may be trying to listen on port 514 without sufficient permission (ports below 1024 are privileged). Have Logstash listen on an unprivileged port instead, and use iptables to forward traffic from port 514 to it. Discussion can be found here.
  2. Commercial Only (X-Pack security): The application may be failing to connect to the Elasticsearch nodes due to an incorrect certificate. Check that the assigned CA is correct.
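
As an illustration of the port-forwarding approach, the sketch below prints the iptables rules rather than applying them, so they can be reviewed before being run as root. Port 5514 is an assumed example of an unprivileged port that Logstash would listen on, and the function name is hypothetical.

```shell
# Print (do not apply) NAT rules that redirect a privileged port
# to the unprivileged port Logstash actually binds.
# $1 = port clients send to (default 514)
# $2 = port Logstash listens on (default 5514, an assumed example)
print_redirect_rules() {
  from="${1:-514}"
  to="${2:-5514}"
  for proto in udp tcp; do
    # PREROUTING only affects traffic arriving from other hosts
    echo "iptables -t nat -A PREROUTING -p $proto --dport $from -j REDIRECT --to-ports $to"
  done
}

# Example:
#   print_redirect_rules            # review the output
#   print_redirect_rules | sudo sh  # then apply it
```

Rules added this way do not survive a reboot unless they are saved with whatever persistence mechanism the host uses.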

3. Troubleshooting Pipelines

Issue: Pipeline is not passing logs to Elasticsearch

Diagnose:

  1. View the logs stored in /var/log/logstash/
  2. Review the pipeline to ensure the output section uses the Elasticsearch output plugin, and add a stdout output to confirm that logs reach the end of the pipeline
  3. Check the inputs to ensure the correct port is bound
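
A throwaway pipeline can make these checks concrete. The sketch below writes a minimal config with a plain udp input, the Elasticsearch output and an extra stdout output. The port 5514, the Elasticsearch address localhost:9200 and the function name are assumptions for illustration and should be swapped for your own values.

```shell
# Write a minimal debugging pipeline to the given path.
# $1 = path of the config file to create
write_debug_pipeline() {
  cat > "$1" <<'EOF'
input {
  udp { port => 5514 }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
EOF
}

# Example:
#   write_debug_pipeline /tmp/debug.conf
#   /usr/share/logstash/bin/logstash -f /tmp/debug.conf --config.test_and_exit
#   ss -lun | grep 5514    # once Logstash is running, confirm the port is bound
```

With the stdout output in place, every event that makes it through the filters is printed by Logstash, which separates "nothing is arriving" from "events arrive but never reach Elasticsearch".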

Fix:

  1. Instead of using the syslog input, swap to the tcp/udp input to determine whether the input plugin is the problem
  2. Check every drop filter (drop { }) in the filter section to make sure events are not being discarded unintentionally
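
To locate drop filters quickly, a recursive grep over the pipeline directory is usually enough. The sketch below assumes the package-default config directory /etc/logstash/conf.d, and the function name is hypothetical.

```shell
# List every line that opens a drop filter block, with file name and line number.
# $1 = directory holding the pipeline configs (default /etc/logstash/conf.d)
find_drop_filters() {
  grep -rnE '(^|[^[:alnum:]_])drop[[:space:]]*\{' "${1:-/etc/logstash/conf.d}"
}

# Example:
#   find_drop_filters
#   find_drop_filters /path/to/other/pipelines
```

Each hit is worth reading together with the conditional that surrounds it, since an overly broad condition can silently discard most of the traffic.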



Categories: Devops, Elasticsearch
