Summary
When running Logstash in large-scale environments, it can be quite difficult to troubleshoot performance issues, particularly when dealing with UDP packets.
The issue can occur at multiple layers. In order of dependency, the layers of concern are:
- Infrastructure
- Logstash Application
- Pipeline
The following steps assume an installation of Logstash on a Linux machine (CentOS 7.4), but similar steps can be used on other systems.
1. Troubleshooting Infrastructure
Issue: Communication issues from the source
Diagnose:
- Dump all packets for a given protocol and interface (run on the OS hosting Logstash) to check whether you are receiving data:
tcpdump -i ens160 udp
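- If nothing shows up, or there is too much noise, you can narrow the capture to the expected port. This is only a sketch: the ens160 interface and port 514 are the examples used elsewhere in this article, so substitute your own interface and port.
tcpdump -i ens160 -nn -vv -c 20 udp port 514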
- If you are troubleshooting TCP traffic, you can telnet from the source to the destination port to determine where the issue lies. The example below is run from the source to diagnose traffic flow to port 514 on the Logstash host with IP 10.10.10.4.
telnet 10.10.10.4 514
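- telnet only exercises TCP. For UDP there is no handshake to observe, but you can send a test datagram from the source and watch for it arriving in the tcpdump capture above. This is a sketch that assumes netcat (nc) is installed on the source machine and reuses the example destination 10.10.10.4 and port 514.
echo "logstash udp connectivity test" | nc -u -w1 10.10.10.4 514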
Fixes:
- Check all interim networking devices (firewalls, load balancers, switches, etc.) and ensure the traffic is getting through at every leg.
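- To narrow down which leg is dropping the traffic, a port-specific traceroute from the source can help. This is only a sketch: it assumes a Linux traceroute that supports UDP probes to a fixed destination port (-U) and reuses the example destination and port from above; intermediate devices may silently drop probes, so treat the output as a hint rather than proof.
traceroute -U -p 514 10.10.10.4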
Issue: Dropped UDP Packets
Diagnose:
- View the network statistics (run on the OS hosting Logstash) to check whether your operating system is dropping packets. A good read on how to interpret the results of this command can be found here.
watch netstat -s --udp
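- To make drops easier to spot, you can filter the output down to the UDP receive and error counters and watch whether they keep increasing between refreshes. The exact counter names vary slightly between distributions, so adjust the grep pattern as needed.
watch -n 2 "netstat -s --udp | grep -iE 'errors|received'"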
Fixes:
- If there is packet loss, check the CPU of the nodes that Logstash is pointed at (it is likely to be running hot).
- Commercial Only: Check the pipeline via the monitoring UI to identify where processing time is high.
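- Even without a commercial licence, the Logstash node stats API (enabled by default on port 9600) reports per-plugin event counts and processing times, which can point at a slow filter or output. A minimal check run on the Logstash host might look like the following; the localhost address and default port are assumptions based on a default install.
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'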
2. Troubleshooting Logstash Application
Issue: Logstash keeps restarting
Diagnose:
- Print the journal of the service to see any errors:
journalctl -u logstash.service
- Review the logs stored under /var/log/logstash/
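- For example, to follow the most recent entries in the plain-text log file (the file name logstash-plain.log is the default produced by the shipped log4j2 configuration and may differ on your install):
tail -f /var/log/logstash/logstash-plain.log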
Fix:
- The application may be trying to listen on port 514 without sufficient permission; you can use iptables to redirect the traffic from port 514 to an unprivileged port (above 1024) that Logstash can bind to (a sketch follows at the end of this list). Discussion can be found here.
- Commercial Only (X-Pack security): The application may be failing to connect to the Elasticsearch nodes due to an incorrect certificate; check that the assigned CA is correct.
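- A minimal iptables sketch for the port-permission case above: redirect the privileged syslog port 514 to a hypothetical unprivileged port 5514, then configure the Logstash udp/tcp input to listen on 5514 instead. Adjust the protocols and ports to your environment, and persist the rules with your distribution's mechanism (e.g. firewalld or iptables-services on CentOS).
# Redirect inbound syslog traffic from privileged port 514 to unprivileged port 5514
iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-ports 5514
iptables -t nat -A PREROUTING -p tcp --dport 514 -j REDIRECT --to-ports 5514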
3. Troubleshooting Pipelines
Issue: Pipeline is not passing logs to Elasticsearch
Diagnose:
- Review the logs stored under /var/log/logstash/
- Review the pipeline to ensure the output is using the Elasticsearch output plugin, and add a stdout output to confirm that logs are reaching the end of the pipeline (see the sketch after this list)
- Check the inputs to ensure the right port is bound
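- A few quick checks along these lines, as a sketch; the port numbers (514/5514) and paths are assumptions based on the examples in this article and a default RPM install (binary under /usr/share/logstash, pipeline files under /etc/logstash/conf.d/).
# Confirm the input is actually bound to the expected UDP/TCP port
ss -ulpn | grep -E ':(514|5514)'
ss -tlpn | grep -E ':(514|5514)'
# Validate the installed pipeline configuration without starting Logstash
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/ --config.test_and_exit
# Run a throwaway pipeline with a stdout output to confirm events arrive at all
# (stop the logstash service first to avoid port and data-directory conflicts)
/usr/share/logstash/bin/logstash -e 'input { udp { port => 5514 } } output { stdout { codec => rubydebug } }'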
Fix:
- Instead of using the syslog input, swap to the tcp/udp inputs to determine whether the input plugin itself is the problem
- Check all drop {} calls in the filters to make sure they are not discarding the events you expect (a quick way to locate them is sketched below)
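- To locate the drop filters quickly, a simple search over the pipeline files is usually enough; the path below assumes the default package-install configuration directory.
grep -rn "drop" /etc/logstash/conf.d/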