{"id":266,"date":"2020-04-10T19:21:45","date_gmt":"2020-04-10T19:21:45","guid":{"rendered":"http:\/\/tonysbit.blog\/?p=266"},"modified":"2020-04-10T19:21:45","modified_gmt":"2020-04-10T19:21:45","slug":"exporting-data-from-elasticsearch-using-python","status":"publish","type":"post","link":"https:\/\/tonysbit.blog\/?p=266","title":{"rendered":"Exporting data from Elasticsearch using Python"},"content":{"rendered":"\n

It is a common requirement to export data from Elasticsearch for users in a widely used format such as .csv; an example is exporting syslog data for audits. The easiest way I have found to complete this task is with Python, as the language is accessible and the Elasticsearch packages are very well implemented.<\/p>\n

In this post, we will adapt the full script found here.<\/a><\/p>\n<\/div>\n\n\n\n

1. Prerequisites<\/h2>\n

To be able to test this script, we will need:<\/p>\n

    \n
  • A working Elasticsearch cluster<\/li>\n
  • A workstation that can execute .py (Python) files<\/li>\n
  • Sample data to export<\/li>\n<\/ul>\n

    Assuming that your Elasticsearch cluster is ready, let's seed the data in Kibana by running:<\/p>\n

    POST logs\/_doc\n{\n  "host": "172.16.6.38",\n  "@timestamp": "2020-04-10T01:03:46.184Z",\n  "message": "this is a test log"\n}\n<\/code><\/pre>\n

    This adds a document to the "logs" index with the same fields commonly ingested via Logstash using the syslog input plugin.<\/p>\n
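If you prefer to seed the same document from Python, a minimal sketch looks like the following. Using the official elasticsearch-py client is an assumption here (the post seeds the data through Kibana instead), so the client calls are shown as comments and only the document itself is built:

```python
# The same document the Kibana command above indexes, built as a Python dict.
doc = {
    "host": "172.16.6.38",
    "@timestamp": "2020-04-10T01:03:46.184Z",
    "message": "this is a test log",
}

# With the elasticsearch-py package installed, it could be indexed like so
# (hypothetical connection details):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(["https://localhost:9200"], http_auth=("user", "pass"))
#   es.index(index="logs", body=doc)
```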

    2. Using the script<\/h2>\n

    2.1. Update configuration values<\/h3>\n

    Now let's adapt the script by filling in our details on lines 7-13.<\/p>\n<\/div>\n\n\n\n

    [Screenshot: configuration values on lines 7-13 of the script]<\/figure><\/div>\n\n\n\n
      \n
    • username<\/strong>: the username for your Elasticsearch cluster<\/li>\n
    • password<\/strong>: the password for your Elasticsearch cluster<\/li>\n
    • url<\/strong>: the URL or IP address of a node in the Elasticsearch cluster<\/li>\n
    • port<\/strong>: the HTTP port for your Elasticsearch cluster (defaults to 9200)<\/li>\n
    • scheme<\/strong>: the scheme used to connect to your Elasticsearch cluster (defaults to https)<\/li>\n
    • index<\/strong>: the index to read from<\/li>\n
    • output<\/strong>: the file to output all your data to<\/li>\n<\/ul>\n<\/div>\n\n\n\n
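Filled in, the configuration block might look like the following sketch. All values here are placeholders for illustration; substitute your own cluster details:

```python
# Placeholder configuration values mirroring lines 7-13 of the script.
username = "elastic"        # username for your Elasticsearch cluster
password = "changeme"       # password for your Elasticsearch cluster
url = "localhost"           # URL or IP address of a node in the cluster
port = 9200                 # HTTP port (defaults to 9200)
scheme = "https"            # connection scheme (defaults to https)
index = "logs"              # index to read from
output = "output.csv"       # file to write all your data to
```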

      2.2. Customizing the Query<\/h3>\n

      By default, the script matches all documents in the index; if you would like to adapt the query, you can edit the query block.<\/p>\n<\/div>\n\n\n\n

      [Screenshot: the query block of the script]<\/figure><\/div>\n\n\n\n

      Note<\/strong>: By default, the script also sorts on the "@timestamp" field in descending order; you may want to change the sort for your data.<\/p>\n<\/div>\n\n\n\n
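As a sketch, a query block that matches everything and sorts newest-first would look like this (the "@timestamp" field name comes from the sample log above; adjust it for your data):

```python
# match_all returns every document in the index;
# sort newest-first on the @timestamp field.
query = {
    "query": {"match_all": {}},
    "sort": [{"@timestamp": {"order": "desc"}}],
}

# To export only a subset, for example, match_all could be swapped
# for a match query (hypothetical filter):
#   query["query"] = {"match": {"message": "error"}}
```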

      2.3. Customizing the Output<\/h3>\n

      Here is the tricky Python part! You need to loop through your results and customize how you want to write your data out. Because the .csv format uses commas (new column) and newline characters (\\n) to structure the document, the default script includes some basic formatting.<\/p>\n<\/div>\n\n\n\n

      [Screenshot: the output loop of the script]<\/figure><\/div>\n\n\n\n

      1. In the output written to the file, each comma starts a new column, so for each hit returned the written message will look like the following:<\/p>\n\n\n\n\n\n
      column 1<\/th>\ncolumn 2<\/th>\ncolumn 3<\/th>\n<\/tr>\n<\/thead>\n
      result._source.host<\/td>\nresult._source.@timestamp<\/td>\nresult._source.message<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n\n\n\n

      2. Note that when a message fails to write to the file, the script appends it to an array to print back later.<\/p>\n\n\n\n

      3. At the end of the script, all the failed messages are printed back to the user.<\/p>\n\n\n\n
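The loop described in steps 1-3 can be sketched as follows. The hits list is hypothetical sample data shaped like an Elasticsearch response; this sketch uses the stdlib csv module for the row formatting (which also quotes embedded commas), while the actual script builds each line itself:

```python
import csv

# Hypothetical hits, shaped like the result["hits"]["hits"] list
# returned by an Elasticsearch search.
hits = [
    {"_source": {"host": "172.16.6.38",
                 "@timestamp": "2020-04-10T01:03:46.184Z",
                 "message": "this is a test log"}},
    {"_source": {"host": "172.16.6.39",
                 "@timestamp": "2020-04-10T01:04:00.000Z",
                 "message": "another, log with a comma"}},
]

failed = []  # messages that could not be written, re-printed at the end
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)  # handles quoting of embedded commas/newlines
    for hit in hits:
        try:
            src = hit["_source"]
            # column 1: host, column 2: @timestamp, column 3: message
            writer.writerow([src["host"], src["@timestamp"], src["message"]])
        except (KeyError, csv.Error) as err:
            failed.append((hit, err))

# Re-print anything that failed so no data is silently lost.
for hit, err in failed:
    print("failed to write:", hit, err)
```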

      2.4. Enjoying your hard work!<\/h3>\n

      Looking at your directory, you will now see an output.csv file; opened in Excel, the contents will look like:<\/p>\n<\/div>\n\n\n\n

      [Screenshot: output.csv opened in Excel]<\/figure><\/div>\n","protected":false},"excerpt":{"rendered":"

      It is a common requirement to export the data in Elasticsearch for users in a common format such as .csv. An example of this is exporting syslog data for audits. The easiest way to complete this task I have found is to use python as the language is accessible and the Elasticsearch packages are very well implemented.<\/p>\n","protected":false},"author":1,"featured_media":285,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[7,11,13],"tags":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/tonysbit.blog\/wp-content\/uploads\/2020\/04\/feature-1.png?fit=928%2C431&ssl=1","_links":{"self":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/posts\/266"}],"collection":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=266"}],"version-history":[{"count":0,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/posts\/266\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=\/wp\/v2\/media\/285"}],"wp:attachment":[{"href":"https:\/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=266"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=266"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\
/\/tonysbit.blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=266"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}