{"id":266,"date":"2020-04-10T19:21:45","date_gmt":"2020-04-10T19:21:45","guid":{"rendered":"http:\/\/tonysbit.blog\/?p=266"},"modified":"2020-04-10T19:21:45","modified_gmt":"2020-04-10T19:21:45","slug":"exporting-data-from-elasticsearch-using-python","status":"publish","type":"post","link":"https:\/\/tonysbit.blog\/?p=266","title":{"rendered":"Exporting data from Elasticsearch using Python"},"content":{"rendered":"\n

It is a common requirement to export data from Elasticsearch for users in a widely used format such as .csv; an example is exporting syslog data for audits. The easiest way I have found to complete this task is with Python, as the language is accessible and the Elasticsearch packages are very well implemented.<\/p>\n

In this post, we will adapt the full script found here.<\/a><\/p>\n<\/div>\n\n\n\n

1. Prerequisites<\/h2>\n

To be able to test this script, we will need:<\/p>\n

    \n
  • A working Elasticsearch cluster<\/li>\n
  • A workstation that can execute .py (Python) files<\/li>\n
  • Sample data to export<\/li>\n<\/ul>\n

    Assuming that your Elasticsearch cluster is ready, let's seed the data in Kibana by running:<\/p>\n

    POST logs\/_doc\n{\n  "host": "172.16.6.38",\n  "@timestamp": "2020-04-10T01:03:46.184Z",\n  "message": "this is a test log"\n}\n<\/code><\/pre>\n

    This adds a document to the "logs" index with the same fields commonly ingested via Logstash using the syslog input plugin.<\/p>\n
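If you prefer to seed the same document from Python, a minimal sketch looks like the following. Using the official elasticsearch-py client is an assumption here (the post seeds the data through Kibana instead), so the client calls are shown as comments and only the document itself is built:

```python
# The same document the Kibana command above indexes, built as a Python dict.
doc = {
    "host": "172.16.6.38",
    "@timestamp": "2020-04-10T01:03:46.184Z",
    "message": "this is a test log",
}

# With the elasticsearch-py package installed, it could be indexed like so
# (hypothetical connection details):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(["https://localhost:9200"], http_auth=("user", "pass"))
#   es.index(index="logs", body=doc)
```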

    2. Using the script<\/h2>\n

    2.1. Update configuration values<\/h3>\n

    Now let's adapt the script by filling in our details on lines 7-13.<\/p>\n<\/div>\n\n\n\n

    [Screenshot: configuration values on lines 7-13 of the script]<\/figure><\/div>\n\n\n\n
      \n
    • username<\/strong>: the username for your Elasticsearch cluster<\/li>\n
    • password<\/strong>: the password for your Elasticsearch cluster<\/li>\n
    • url<\/strong>: the URL or IP address of a node in the Elasticsearch cluster<\/li>\n
    • port<\/strong>: the HTTP port for your Elasticsearch cluster (defaults to 9200)<\/li>\n
    • scheme<\/strong>: the scheme used to connect to your Elasticsearch cluster (defaults to https)<\/li>\n
    • index<\/strong>: the index to read from<\/li>\n
    • output<\/strong>: the file to output all your data to<\/li>\n<\/ul>\n<\/div>\n\n\n\n
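Filled in, the configuration block might look like the following sketch. All values here are placeholders for illustration; substitute your own cluster details:

```python
# Placeholder configuration values mirroring lines 7-13 of the script.
username = "elastic"        # username for your Elasticsearch cluster
password = "changeme"       # password for your Elasticsearch cluster
url = "localhost"           # URL or IP address of a node in the cluster
port = 9200                 # HTTP port (defaults to 9200)
scheme = "https"            # connection scheme (defaults to https)
index = "logs"              # index to read from
output = "output.csv"       # file to write all your data to
```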

      2.2. Customizing the Query<\/h3>\n

      By default, the script matches all documents in the index; if you would like to adapt the query, you can edit the query block.<\/p>\n<\/div>\n\n\n\n

      [Screenshot: the query block of the script]<\/figure><\/div>\n\n\n\n

      Note<\/strong>: By default, the script also sorts on the "@timestamp" field in descending order; you may want to change the sort for your data.<\/p>\n<\/div>\n\n\n\n
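As a sketch, a query block that matches everything and sorts newest-first would look like this (the "@timestamp" field name comes from the sample log above; adjust it for your data):

```python
# match_all returns every document in the index;
# sort newest-first on the @timestamp field.
query = {
    "query": {"match_all": {}},
    "sort": [{"@timestamp": {"order": "desc"}}],
}

# To export only a subset, for example, match_all could be swapped
# for a match query (hypothetical filter):
#   query["query"] = {"match": {"message": "error"}}
```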

      2.3. Customizing the Output<\/h3>\n

      Here is the tricky Python part! You need to loop through your results and customize how you want to write your data out. Because the .csv format uses commas (new column) and newline characters (\\n) to structure the document, the default script includes some basic formatting.<\/p>\n<\/div>\n\n\n\n

      [Screenshot: the output loop of the script]<\/figure><\/div>\n\n\n\n

      1. In the output written to the file, each comma starts a new column, so for each hit returned the written message will look like the following:<\/p>\n\n\n\n\n\n
      column 1<\/th>\ncolumn 2<\/th>\ncolumn 3<\/th>\n<\/tr>\n<\/thead>\n
      result._source.host<\/td>\nresult._source.@timestamp<\/td>\nresult._source.message<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n

      2. Note that when a message fails to write to the file, the script appends it to an array to print back later.<\/p>\n\n\n\n

      3. At the end of the script, all the failed messages are printed back to the user.<\/p>\n\n\n\n
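The loop described in steps 1-3 can be sketched as follows. The hits list is hypothetical sample data shaped like an Elasticsearch response; this sketch uses the stdlib csv module for the row formatting (which also quotes embedded commas), while the actual script builds each line itself:

```python
import csv

# Hypothetical hits, shaped like the result["hits"]["hits"] list
# returned by an Elasticsearch search.
hits = [
    {"_source": {"host": "172.16.6.38",
                 "@timestamp": "2020-04-10T01:03:46.184Z",
                 "message": "this is a test log"}},
    {"_source": {"host": "172.16.6.39",
                 "@timestamp": "2020-04-10T01:04:00.000Z",
                 "message": "another, log with a comma"}},
]

failed = []  # messages that could not be written, re-printed at the end
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)  # handles quoting of embedded commas/newlines
    for hit in hits:
        try:
            src = hit["_source"]
            # column 1: host, column 2: @timestamp, column 3: message
            writer.writerow([src["host"], src["@timestamp"], src["message"]])
        except (KeyError, csv.Error) as err:
            failed.append((hit, err))

# Re-print anything that failed so no data is silently lost.
for hit, err in failed:
    print("failed to write:", hit, err)
```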

      2.4. Enjoying your hard work!<\/h3>\n

      Looking at your directory, you will now see an output.csv file; opened in Excel, the contents will look like:<\/p>\n<\/div>\n\n\n\n

      [Screenshot: output.csv opened in Excel]<\/figure><\/div>\n","protected":false},"excerpt":{"rendered":"

      It is a common requirement to export the data in Elasticsearch for users in a common format such as .csv. An example of this is exporting syslog data for audits. The easiest way to complete this task I have found is to use python as the language is accessible and the Elasticsearch packages are very well implemented.<\/p>\n","protected":false},"author":1,"featured_media":285,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[7,11,13],"tags":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/tonysbit.blog\/wp-content\/uploads\/2020\/04\/feature-1.png?fit=928%2C431&ssl=1","_links":{"self":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/posts\/266"}],"collection":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=266"}],"version-history":[{"count":0,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/posts\/266\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/media\/285"}],"wp:attachment":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=266"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=266"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=266"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}