Exporting data from Elasticsearch using Python

A common requirement is to export data from Elasticsearch for users in a familiar format such as .csv; exporting syslog data for audits is one example. The easiest way I have found to do this is with Python, as the language is accessible and the Elasticsearch client packages are very well implemented.

In this post we will be adapting the full script found here.

1. Prerequisites

To be able to test this script, we will need:

  • A working Elasticsearch cluster
  • A workstation that can run Python (.py) files
  • Sample data to export

Assuming your Elasticsearch cluster is ready, let's seed some sample data by running the following in the Kibana Dev Tools console:

POST logs/_doc
{
  "host": "172.16.6.38",
  "@timestamp": "2020-04-10T01:03:46.184Z",
  "message": "this is a test log"
}

This adds a document to the "logs" index that resembles what is commonly ingested via Logstash using the syslog input plugin.
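If you would rather seed the sample from Python, the same request can be built with only the standard library. This is a minimal sketch, not part of the original script: `build_index_request` and `sample_log` are hypothetical names, and the base URL and credentials you pass in are placeholders for your own cluster.

```python
import base64
import json
import urllib.request

# Same sample document as the Kibana request above.
sample_log = {
    "host": "172.16.6.38",
    "@timestamp": "2020-04-10T01:03:46.184Z",
    "message": "this is a test log",
}


def build_index_request(base_url, index, doc, username, password):
    """Build (but do not send) the equivalent of `POST <index>/_doc`.

    Hypothetical helper: encodes the document as JSON and adds an
    HTTP Basic auth header for the given credentials.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`; on a cluster with a self-signed certificate you would also need to pass an `ssl` context that trusts that certificate via the `context` argument.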

2. Using the script

2.1. Update configuration values

Now let's adapt the script by filling in our details on lines 7-13:

  • username: the username for your Elasticsearch cluster
  • password: the password for your Elasticsearch cluster
  • url: the URL or IP address of a node in the Elasticsearch cluster
  • port: the HTTP port for your Elasticsearch cluster (defaults to 9200)
  • scheme: the scheme to connect to your Elasticsearch cluster with (defaults to https)
  • index: the index to read from
  • output: the file to output all your data to
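Collected in one place, those settings might look like the sketch below. The class and field names are hypothetical (they mirror the list above, not necessarily the variable names in the original script), and the `index` and `output` defaults are simply the values used in this post.

```python
from dataclasses import dataclass


@dataclass
class ExportConfig:
    """Hypothetical container for the connection and export settings."""

    username: str
    password: str
    url: str                    # host name or IP address of one node
    port: int = 9200            # HTTP port
    scheme: str = "https"
    index: str = "logs"         # index to read from
    output: str = "output.csv"  # file to write the export to

    def base_url(self) -> str:
        """Join scheme, host and port into the address the client connects to."""
        return f"{self.scheme}://{self.url}:{self.port}"
```

With version 8 of the official `elasticsearch` package, the client could then be created with something like `Elasticsearch(cfg.base_url(), basic_auth=(cfg.username, cfg.password))`; 7.x clients take `http_auth` instead of `basic_auth`.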

2.2. Customizing the Query

By default the script matches all documents in the index; if you would like to narrow the export, edit the query block.
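For audits it is common to export only a time window rather than the whole index. Below is a minimal sketch of a query block that keeps `match_all` as the default but switches to a `range` filter on `@timestamp` when bounds are given. `build_query` is a hypothetical helper, and it assumes `@timestamp` is mapped as a date.

```python
from typing import Optional


def build_query(start: Optional[str] = None, end: Optional[str] = None) -> dict:
    """Return the query clause for the export.

    With no arguments this is match_all (every document in the index).
    Passing start and/or end restricts the export to that @timestamp window.
    """
    if start is None and end is None:
        return {"match_all": {}}
    window = {}
    if start is not None:
        window["gte"] = start
    if end is not None:
        window["lte"] = end
    # A filter clause narrows the results without computing relevance scores.
    return {"bool": {"filter": [{"range": {"@timestamp": window}}]}}
```

The result goes under the `query` key of the search body, e.g. `{"query": build_query("2020-04-10T00:00:00Z", "2020-04-10T23:59:59Z")}`. Using `filter` rather than `must` skips scoring, which an export has no use for.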

Note: By default the script also sorts by the "@timestamp" field in descending order; you may want to change the sort to suit your data.
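The sort is a list of field/order pairs. Here is a small sketch, with `build_sort` as a hypothetical helper whose default reproduces the `@timestamp` descending sort described above.

```python
def build_sort(field: str = "@timestamp", order: str = "desc") -> list:
    """Return a sort clause ordering hits by `field` in `order` ('asc' or 'desc')."""
    if order not in ("asc", "desc"):
        raise ValueError(f"order must be 'asc' or 'desc', got {order!r}")
    return [{field: {"order": order}}]


# Default export body: every document, newest first.
search_body = {"query": {"match_all": {}}, "sort": build_sort()}
```

Elasticsearch cannot sort on a `text` field by default, so with dynamic mapping you would sort on the `keyword` sub-field, e.g. `build_sort("host.keyword", "asc")`, rather than on `host`.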

2.3. Customizing the Output

Here is the tricky Python part! You need to loop through the results and customize how each one is written out. Because the .csv format uses commas to start a new column and newline characters (\n) to start a new row, the default script already includes some basic formatting.
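Writing the commas by hand works until a field itself contains a comma or newline, at which point the row gains extra columns. One way around this, sketched here as an alternative to hand-built lines rather than as the original script's code, is Python's built-in `csv` module, which quotes such fields automatically. `hit_to_row` and `rows_to_csv` are hypothetical names; the three columns follow the default layout (host, @timestamp, message).

```python
import csv
import io


def hit_to_row(hit: dict) -> list:
    """Pick the three exported columns out of one search hit."""
    src = hit["_source"]
    return [src["host"], src["@timestamp"], src["message"]]


def rows_to_csv(rows) -> str:
    """Serialise rows as CSV text; fields containing commas or quotes get quoted."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

A message such as `disk full, retrying` comes out as `"disk full, retrying"`, so Excel still sees three columns. When writing straight to disk, open the file with `open(output, "w", newline="")` and pass it to `csv.writer`, as the `csv` documentation recommends.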

1. Each hit returned is written to the file as one line, and each comma starts a new column, so every line looks like this:

  • Column 1: result._source.host
  • Column 2: result._source.@timestamp
  • Column 3: result._source.message

2. If a hit cannot be written to the file, its message is added to an array so it can be printed back later.

3. At the end of the script, all failed messages are printed back to the user.
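Steps 2 and 3 amount to a collect-then-report pattern. This is a minimal sketch, assuming hits shaped like Elasticsearch search results (`_id` plus `_source`); the function names and the exceptions treated as failures (a missing field, an I/O or encoding error) are assumptions, not taken from the original script.

```python
import csv
import io


def export_hits(hits, out):
    """Write one CSV row per hit to the file-like `out`.

    Hits that cannot be written are not fatal: their id and the error are
    collected and returned so they can be reported at the end.
    """
    writer = csv.writer(out)
    failed = []
    for hit in hits:
        try:
            src = hit["_source"]
            writer.writerow([src["host"], src["@timestamp"], src["message"]])
        except (KeyError, OSError, UnicodeError) as err:
            failed.append((hit.get("_id"), repr(err)))
    return failed


def report_failures(failed):
    """Print every failed hit back to the user once the export has finished."""
    for doc_id, reason in failed:
        print(f"could not write document {doc_id}: {reason}")


if __name__ == "__main__":
    # Demo with an in-memory file: the second hit has no @timestamp.
    demo_hits = [
        {"_id": "1", "_source": {"host": "172.16.6.38",
                                 "@timestamp": "2020-04-10T01:03:46.184Z",
                                 "message": "this is a test log"}},
        {"_id": "2", "_source": {"host": "172.16.6.38"}},
    ]
    report_failures(export_hits(demo_hits, io.StringIO()))
```

Collecting failures instead of raising means one malformed document does not abort a long export, and the user still sees exactly which documents were skipped.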

2.4. Enjoying your hard work!

Looking in your directory, you will now see output.csv. Opened in Excel, each hit appears on its own row, with the host, @timestamp and message values in columns 1 to 3.
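If you want to sanity-check the export without opening Excel, reading the first few rows back with the `csv` module is enough. `preview_rows` is a hypothetical helper; it accepts any iterable of text lines, such as an open file.

```python
import csv
from itertools import islice


def preview_rows(lines, limit: int = 5) -> list:
    """Parse up to `limit` CSV rows from an iterable of text lines."""
    return list(islice(csv.reader(lines), limit))
```

For example, `with open("output.csv", newline="") as f:` followed by `for row in preview_rows(f): print(row)` prints each row as a list, with quoted fields that contain commas kept as a single column.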



