Exporting data from Elasticsearch into a common format such as .csv is a frequent requirement, for example when exporting syslog data for audits. The easiest way I have found to do this is with Python, as the language is accessible and the Elasticsearch packages are well implemented.
In this post we will be adapting the full script found here.
To be able to test this script, we will need an Elasticsearch cluster, Kibana (to seed the test data), and Python with the elasticsearch package installed.
Assuming that your Elasticsearch cluster is ready, let's seed the data in Kibana by running:
POST logs/_doc
{
"host": "172.16.6.38",
"@timestamp": "2020-04-10T01:03:46.184Z",
"message": "this is a test log"
}
This will add a log to the "logs" index with fields similar to what is commonly ingested via Logstash using the syslog input plugin.
Now let's adapt the script by filling in our details on lines 7-13.
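For reference, those lines look roughly like the sketch below. This assumes the official elasticsearch Python client; the endpoint, credentials, index name, and output filename are placeholders to replace with your own details.

```python
# A minimal sketch of the connection block (roughly lines 7-13 of the script).
# The endpoint, credentials, index name, and output filename below are
# placeholders -- replace them with your own details.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    ["http://localhost:9200"],           # your cluster endpoint(s)
    http_auth=("elastic", "changeme"),   # username/password, if security is enabled
)

index_name = "logs"         # the index we seeded above
output_file = "output.csv"  # where the export will be written
```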
By default the script will match all documents in the index; however, if you would like to narrow the results, you can edit the query block.
Note: by default the script will also sort by the "@timestamp" field in descending order; you may want to change the sort for your data.
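Put together, the query block looks something like the sketch below, assuming the standard search API. The size value is illustrative, and the real script may page through results differently (for example with the scroll API); swap match_all for a range or term query if you only want a subset of documents.

```python
# A sketch of the query body: match every document in the index and sort by
# @timestamp descending. Swap match_all for a range or term query to narrow
# the export; the size value here is illustrative.
query_body = {
    "query": {
        "match_all": {}
    },
    "sort": [
        {"@timestamp": {"order": "desc"}}
    ]
}

response = es.search(index=index_name, body=query_body, size=1000)
```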
Here is the tricky Python part! You need to loop through your results and customize how you want to write your data out. Since the .csv format uses commas to separate columns and newlines (\n) to separate rows, the default script includes some basic formatting (a sketch of this loop follows the list below):
1. The output is written to the file with each comma starting a new column, so the written line will look like the following for each hit returned:
column 1 | column 2 | column 3 |
---|---|---|
result._source.host | result._source.@timestamp | result._source.message |
2. Note that when a hit fails to write to the file, its message is added to an array to print back later.
3. At the end of the script, all of the failed messages are re-printed to the user.
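Putting those three points together, the write-out loop looks roughly like the sketch below. The field names match the document we seeded earlier, and the output_file and response variables come from the earlier snippets; adjust both for your own data.

```python
# A rough sketch of the write-out loop described above. Each hit becomes one
# "host,@timestamp,message" line; any hit that fails to write is kept in a
# list and re-printed at the end. Field names match the document we seeded.
failed_messages = []

with open(output_file, "w") as f:
    f.write("host,@timestamp,message\n")  # header row
    for result in response["hits"]["hits"]:
        source = result["_source"]
        try:
            f.write("{},{},{}\n".format(
                source["host"],
                source["@timestamp"],
                source["message"],
            ))
        except Exception:
            # keep the failed hit so it can be reported back to the user
            failed_messages.append(source)

# re-print anything that could not be written
for message in failed_messages:
    print("Failed to write: {}".format(message))
```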
Looking at your directory, you will now see an output.csv, and its contents will look like this in Excel: