Exporting data from Elasticsearch in a common format such as .csv is a frequent requirement; an example is exporting syslog data for audits. The easiest way I have found to complete this task is to use Python, as the language is accessible and its Elasticsearch packages are very well implemented.
In this post we will be adapting the full script found here.
To be able to test this script, we will need:
Assuming that your Elasticsearch cluster is ready, let's seed the data by running the following in Kibana:
POST logs/_doc
{
"host": "172.16.6.38",
"@timestamp": "2020-04-10T01:03:46.184Z",
"message": "this is a test log"
}
This will add a document to the "logs" index resembling what is commonly ingested via Logstash using the syslog input plugin.
Now let's adapt the script by filling in our details on lines 7-13.
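Those details are essentially the connection and output settings. As a rough sketch (the variable names and default values here are illustrative only, not necessarily the ones used in the full script):

from elasticsearch import Elasticsearch

# Illustrative values only - replace with your own cluster details
ES_HOST = "https://localhost:9200"
ES_USER = "elastic"
ES_PASSWORD = "changeme"
INDEX = "logs"
OUTPUT_FILE = "output.csv"

# Connect to the cluster (drop http_auth if security is disabled)
es = Elasticsearch([ES_HOST], http_auth=(ES_USER, ES_PASSWORD))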
By default the script will match all documents in the index; however, if you would like to adapt the query, you can edit the query block.
Note: By default the script also sorts by the "@timestamp" field in descending order; you may want to change the sort for your data.
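For reference, a query body that matches all documents and sorts by "@timestamp" descending (a sketch of the default behavior described above, not necessarily the exact block in the script) looks like:

query = {
    "query": {"match_all": {}},
    "sort": [{"@timestamp": {"order": "desc"}}]
}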
Here is the tricky Python part! You need to loop through your results and customize how you want to write your data out. As the .csv format uses commas (new column) and newline characters (\n) to structure the document, the default script includes some basic formatting.
1. In the output written to the file, each comma is a new column, so the written message will look like the following for each hit returned:
| column 1 | column 2 | column 3 |
| --- | --- | --- |
| result._source.host | result._source.@timestamp | result._source.message |
2. Note that when a write to the file fails, the message is added to an array to be printed back.
3. At the end of the script, all the failed messages are re-printed to the user.
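Putting those three points together, the write loop looks roughly like the sketch below (it reuses the illustrative es, INDEX, OUTPUT_FILE and query names from the earlier sketches; the full script linked above remains the reference):

failed = []

result = es.search(index=INDEX, body=query, size=10000)

with open(OUTPUT_FILE, "w") as f:
    for hit in result["hits"]["hits"]:
        source = hit["_source"]
        # Commas separate the columns and "\n" ends the row
        row = "{},{},{}\n".format(source.get("host", ""),
                                  source.get("@timestamp", ""),
                                  source.get("message", ""))
        try:
            f.write(row)
        except Exception:
            # Point 2: keep any row that failed to write
            failed.append(row)

# Point 3: re-print all failed rows to the user
for row in failed:
    print("Failed to write: " + row)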
Looking at your directory you will now see an output.csv, and in Excel the contents will look like:
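Opened as plain text rather than Excel, the row for the example document seeded earlier would look roughly like this:

172.16.6.38,2020-04-10T01:03:46.184Z,this is a test log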
Logs are usually sent via UDP and are most commonly available as syslog messages:
In many products (especially SIEMs – security information and event management systems), these logs have their sources identified using the source IP of the packet instead of the content of the message (often the content of the log does not contain source identification information, which is a result of poor logging design). As such, in more complex topologies with log caching or staged log propagation, the source of the logs cannot be differentiated by the final system.
This may not matter in an environment where the cache is only caching logs from a single device; however, if the cache is centralizing logs from multiple sources, it becomes impossible for the SIEM to differentiate the device sources of the logs, limiting the functionality available.
This has been an open ticket since 04/05/2017.
Elasticsearch is commonly used as a centralization point for logs due to its high ingestion capability and the extensive library of plugins that can be used for collecting, transforming and forwarding almost all types of data. This is especially important for service providers, who may have a responsibility both to store a copy of logs and to send a copy to customers for use in their own SIEM solutions.
The default UDP Logstash output plugin does not allow for spoofing the source device (IP address, port and MAC). I have written a new Logstash output to enable this behavior. The plugin is able to spoof the source IP, source port and source MAC address of the packet on every individual message. To use the plugin you need to specify additional information about the target device:
The plugin uses the jnetpcap library and therefore a number of prerequisites need to be completed on the host:
It is possible to run the library on different operating systems; I have tested on Ubuntu 18.04. For instructions on how to run it on other operating systems, there are notes in the Release Notes of the library.
After deploying a new Ubuntu Server with the default Logstash installation, complete the following steps:
wget -O jnetpcap-1.4.r1425.tgz https://downloads.sourceforge.net/project/jnetpcap/jnetpcap/Latest/jnetpcap-1.4.r1425-1.linux64.x86_64.tgz
tar -xvf jnetpcap-1.4.r1425.tgz
sudo cp jnetpcap-1.4.r1425/libjnetpcap.so /lib/
sudo apt-get install libpcap-dev
Note: On CentOS, the package is only available via the RHEL optional channel.
Note: If you are running Logstash as a service, the default permissions for the logstash user are not sufficient; run the service as root (if anyone knows the exact permissions needed to harden this, please DM me).
This can be done by editing /etc/systemd/system/logstash.service if you are using systemd.
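For example, a minimal sketch of the change in the [Service] section (the rest of the unit file stays as installed by the package):

[Service]
User=root
Group=root

After editing, reload systemd and restart the service:

sudo systemctl daemon-reload
sudo systemctl restart logstash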
You can download the source code and build the code yourself. Alternatively you can download the gem directly from here.
cd /usr/share/logstash
./bin/logstash-plugin install --no-verify <path-to-gem>/logstash-output-spoof-0.1.0.gem
vi /usr/share/logstash/test.conf
Note: Remember to replace the values marked for replacement below.
input {
  generator { message => "Hello world!" count => 1 }
}
filter {
  mutate {
    add_field => {
      "extra_field" => "this is the test field"
      "src_host" => "3.3.3.3"
    }
    update => { "message" => "this should be the new message" }
  }
}
output {
  spoof {
    dest_host => "<REPLACE WITH YOUR DESTINATION IP>"
    dest_port => "<REPLACE WITH YOUR DESTINATION PORT>"
    src_host => "%{src_host}"
    src_port => "2222"
    dest_mac => "<REPLACE WITH YOUR DESTINATION MAC ADDRESS>"
    src_mac => "<REPLACE WITH YOUR MAC ADDRESS>"
    message => "%{message}"
    interface => "ens32"
  }
}
sudo tcpdump -A -i any src 3.3.3.3 -v
Note: You may need to install tcpdump
./bin/logstash -f test.conf
Note: Be patient, Logstash is very slow to start up
On the system where you are capturing traffic, you should see that the source of the packet is 3.3.3.3! Congratulations on spoofing your first message.
In this post I have demonstrated how you can use the new Logstash plugin to spoof traffic. Because the spoofing is driven by event-based data, the plugin can be used to support many exotic deployment topologies while remaining SIEM compliant.
Using this plugin, hopefully you can support complex log forwarding topologies regardless of what technologies the end device uses.
When running Logstash in large-scale environments it can be quite difficult to troubleshoot performance, specifically when dealing with UDP packets.
The issue could occur at multiple layers; the sections below address them in order of dependency:
The following steps assume installation of Logstash on a Linux machine (CentOS 7.4) but similar steps can be used for other machines.
Issue: Communication issues from source
Diagnose:
tcpdump -i ens160 udp
telnet 10.10.10.4 514
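Note that telnet only exercises TCP; if the source sends syslog over UDP, a quick alternative check (an addition here, not part of the original steps) is to fire a test datagram with netcat and look for it in the tcpdump capture above:

echo "test message" | nc -u 10.10.10.4 514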
Fixes:
Issue: Dropped UDP Packets
Diagnose:
watch netstat -s --udp
A good read on how to view the results of this command can be found here.
Fixes:
If there is packet loss, check the CPU of the nodes that Logstash is pointed at (it is likely to be running hot).
Commercial only: check the pipeline via monitoring to identify where processing time is high.
Issue: Logstash keeps restarting
Diagnose:
journalctl -u logstash.service
Fix:
Issue: Pipeline is not passing logs to Elasticsearch
Diagnose:
Fix:
Elasticsearch is a fantastic tool for logging as it allows logs to be viewed as just another time-series piece of data. This is important for any organization’s journey through the evolution of data.
This evolution can be outlined as the following:
Data that is not purposely collected for this journey will simply be bits wandering through the abyss of computing purgatory without a meaningful destiny! In this article we will discuss using Docker to scale out your Logstash deployment.
If you have ever used Logstash (LS) to push logs to Elasticsearch (ES), here are a number of challenges you may encounter:
When looking at solutions, the approach I take is:
Using Docker, a generic infrastructure can be deployed due to the abstraction of containers from the underlying OS (besides the difference between Windows and Linux hosts).
Docker solves the challenges inherent in the LS deployment:
For example, let’s say you have 1M logs to ingest per day and a requirement to run 3 virtual machines so that at most 1 virtual machine can be lost.
Why not deploy 3 Logstash instances straight onto the OS, each sized at 4 CPU and 8 GB RAM?
Let’s take a look at how this architecture looks:
When a node goes down the resulting environment looks like:
An added bonus of this deployment is that if you wanted to ship logs from Logstash to Elasticsearch for central, real-time monitoring of the logs, it is as simple as adding Filebeat to the docker-compose.
What does the docker-compose look like?
version: '3.3'
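# The full compose file is not reproduced here; the service below is a minimal
# illustrative sketch only (image tag, port and paths are assumptions).
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:7.6.2   # assumed version tag
    ports:
      - "5514:5514/udp"                                # assumed syslog UDP listener
    volumes:
      - ./pipeline:/usr/share/logstash/pipeline:ro     # pipeline configs from the host
    restart: unless-stopped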
As with most good things, there is a caveat: with Docker you add another layer of complexity. However, I would argue that as the Docker images for Logstash are managed and maintained by Elastic, this reduces the implementation headaches.
That said, I found one big issue with routing UDP traffic within Docker.
This issue will cause you to lose a proportional number of logs after container re-deployments!!!
Disclaimer: This article only represents my personal opinion and should not be considered professional advice. A healthy dose of skepticism is recommended.
If you are currently developing using C# (particularly .NET Core 2.0+), here are some shortcuts that I hope will save you some of the time I wish I could have back.
There is official documentation for C# Elasticsearch development, however I found the examples to be quite lacking. I do recommend going through the documentation anyway, especially for the NEST client, as it is essential to understanding Elasticsearch with C#.
“The low level client, ElasticLowLevelClient, is a low level, dependency free client that has no opinions about how you build and represent your requests and responses.”
Unfortunately the low level client in particular has very sparse documentation, especially examples. The following was discovered through Googling and painstaking testing.
JObjects are a quite popular way to work with JSON objects in .NET, so it may be necessary to pass JObjects to Elasticsearch; this may be a result of one of the following:
The JObject cannot be used as the generic type parameter for indexing, as you will receive this error:
Instead, use “BytesResponse” as the <T> class.
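A minimal sketch of what that looks like with the low level client (the index name, document id and client setup here are assumptions, and older 6.x clients also take a type parameter in the Index call):

using System;
using Elasticsearch.Net;
using Newtonsoft.Json.Linq;

var lowLevelClient = new ElasticLowLevelClient(
    new ConnectionConfiguration(new Uri("http://localhost:9200")));

var doc = JObject.Parse(@"{ ""host"": ""172.16.6.38"", ""message"": ""this is a test log"" }");

// Serialize the JObject yourself and use BytesResponse as <T>
var indexResponse = lowLevelClient.Index<BytesResponse>(
    "logs", "1", PostData.String(doc.ToString()));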
The examples given by the Elasticsearch documentation do not include a bool query using the low-level client. Why is the “bool” query particularly difficult? Using the Query DSL in C#, “bool” will automatically resolve to the C# type and therefore will throw an error:
Not very anonymous-type friendly… The solution to this one is quite simple: add an ‘@’ character in front of the bool.
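For example, a sketch of a bool query against the same hypothetical logs index using the low level client:

var searchResponse = lowLevelClient.Search<StringResponse>("logs", PostData.Serializable(new
{
    query = new
    {
        // '@' lets the reserved word bool be used as an anonymous type member
        @bool = new
        {
            must = new object[]
            {
                new { match = new { message = "test" } },
                new { term = new { host = "172.16.6.38" } }
            }
        }
    }
}));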
This one seems a bit obvious, but if you want to define an array for use with the DSL, use the anonymous-typed array new object[] (an example can be seen in figure 4 and in the bool query sketch above).
Nested fields in Elasticsearch are referenced by their full path as a '.'-delimited string. This creates a problem when trying to query such a field specifically, as the '.' makes it an invalid member name for anonymous types.
The solution is to define a Dictionary and use the dictionary in the anonymous type.
The Dictionary can be embedded in the anonymous type and will successfully query the nested field in Elasticsearch.
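A sketch of the pattern, using a hypothetical dotted field name host.name:

using System.Collections.Generic;

var body = new
{
    query = new
    {
        // "host.name" cannot be an anonymous type member, so use a Dictionary for it
        term = new Dictionary<string, object>
        {
            { "host.name", "web-01" }
        }
    }
};

var nestedResponse = lowLevelClient.Search<StringResponse>("logs", PostData.Serializable(body));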
“The high level client, ElasticClient, provides a strongly typed query DSL that maps one-to-one with the Elasticsearch query DSL.”
The NEST documentation is much more comprehensive; the only issue I found was with keyword Term searches.
All string fields are mapped by default to both text and keyword; the documentation can be found here. The issue is that the strongly typed object used in the Elasticsearch mapping has no “.keyword” property to reference, therefore an error is thrown.
Example:
For the Object:
public class SampleObject
{
public string TextField { get; set; }
}
Searching on the keyword sub-field looks like it requires a .Keyword property on the object. Unfortunately the .Keyword field does not exist; the solution is to use the .Suffix function with property name inference. This is documented in the docs, however it is not immediately apparent that this is how you access “keyword”.
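A minimal sketch of the resulting term query (the cluster URL and index name are assumptions; default NEST field inference camel-cases TextField to textField):

var client = new ElasticClient(
    new ConnectionSettings(new Uri("http://localhost:9200")).DefaultIndex("sample"));

var response = client.Search<SampleObject>(s => s
    .Query(q => q
        .Term(t => t
            // Property name inference plus .Suffix resolves to "textField.keyword"
            .Field(f => f.TextField.Suffix("keyword"))
            .Value("this is a test log"))));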
I hope this post was helpful and saved you some time. If you have any tips of your own please comment below!