Victor Ferrer's Java and SW blog: How to clean up ElasticSearch with Curator 4.0.6

Summary

Today, I would like to share a quick introduction to Curator, a tool that keeps your ElasticSearch cluster clean and free from old data (thanks flopezlasanta for the tip on this!).

This tool, like ElasticSearch itself, evolves very quickly. When I was looking for information on the subject, I found this blog entry from 2014 and noticed how much the way of working with the tool has changed since then. Anyway, kudos to @ragingcomputer!

Installation instructions

Before you install Curator, you need the Python package installer (pip), which in turn requires Python. Note that if you are running Python 3.4 or later (version 3), or 2.7.9 or later (version 2), pip already comes installed with Python. More info here.

Note: You need to be superuser to perform the installation.

	# Install pip
	curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
	python get-pip.py

	# Get the curator
	pip install elasticsearch-curator
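
A quick way to confirm the installation worked is to check that the curator command is on your PATH. This is only a sketch: have_cmd is a hypothetical helper name, not part of Curator, and it assumes pip placed the executable somewhere your shell can find it.

```shell
# Hypothetical helper: succeed if the given command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd curator; then
    # Print the installed Curator version.
    curator --version
else
    echo "curator not found on PATH; check the pip output for errors" >&2
fi
```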

And you are ready to go! Now let's check the other two files needed to make it work.

Configuration file

This is a standard configuration file for an ElasticSearch cluster running on the same host and on the default port. Note that there are many more configuration options for more advanced scenarios. Check this for further info on them.

	---
	# Remember, leave a key empty if there is no value. None will be a string,
	# not a Python "NoneType"
	client:
	  hosts: "127.0.0.1"
	  port: 9200
	  url_prefix:
	  use_ssl: False
	  certificate:
	  client_cert:
	  client_key:
	  aws_key:
	  aws_secret_key:
	  aws_region:
	  ssl_no_validate: False
	  http_auth:
	  timeout: 30
	  master_only: False

	logging:
	  loglevel: INFO
	  logfile:
	  logformat: default
	  blacklist: ['elasticsearch', 'urllib3']
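
Before pointing Curator at the cluster, it can help to check that something actually answers on the host and port given in the configuration. The following is a minimal sketch, assuming curl is installed and the cluster speaks plain HTTP (use_ssl: False, as above). Both es_url and es_reachable are hypothetical helper names.

```shell
# Hypothetical helper: build the base URL from a host and a port.
es_url() {
    printf 'http://%s:%s' "$1" "$2"
}

# Hypothetical helper: succeed if something answers HTTP requests at host:port.
es_reachable() {
    curl --silent --fail --max-time 5 "$(es_url "$1" "$2")" >/dev/null
}

# Usage, with the values from the configuration file:
#   es_reachable 127.0.0.1 9200 || echo "ElasticSearch is not reachable" >&2
```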

Action file

I don't want to go into too much detail, but Curator can perform multiple actions on your indices. Among the most important are:

Delete an index: Pretty obvious, it deletes the data.
Close an index: This keeps the data but it is not loaded into memory.
Open an index: If you want to look further back, you can reopen a closed index.

You can also take snapshots and restore them... Plenty of choices but, for the moment, let's stick to the basic delete:

	---
	# Remember, leave a key empty if there is no value. None will be a string,
	# not a Python "NoneType"
	#
	# Also remember that all examples have 'disable_action' set to True. If you
	# want to use this action as a template, be sure to set this to False after
	# copying it.
	actions:
	  1:
	    action: delete_indices
	    description: >-
	      Delete indices older than 80 days (based on index name), for logstash-
	      prefixed indices. Ignore the error if the filter does not result in an
	      actionable list of indices (ignore_empty_list) and exit cleanly.
	    options:
	      ignore_empty_list: True
	      timeout_override:
	      continue_if_exception: False
	      disable_action: False
	    filters:
	    - filtertype: pattern
	      kind: prefix
	      value: logstash-
	      exclude:
	    - filtertype: age
	      source: name
	      direction: older
	      timestring: '%Y.%m.%d'
	      unit: days
	      unit_count: 80
	      exclude:

This is the basic delete action: it looks for indices created by Logstash (see the logstash- prefix filter) and deletes the ones older than 80 days. If you want something more advanced, there are a lot of useful examples pretty much ready to take and use here. Don't forget to enable the actions, as they come disabled by default.
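
To get a feeling for what the age filter does with source: name and timestring '%Y.%m.%d', here is a rough shell approximation. It is not Curator's own code, only a sketch of the idea: pull the date out of the index name and compare its age with the threshold. The function names index_date and older_than are hypothetical, and the date arithmetic assumes GNU date (for the -d option).

```shell
# Extract the date from an index name such as logstash-2016.06.01 and
# print it as YYYY-MM-DD. Prints nothing if the name contains no date.
index_date() {
    printf '%s\n' "$1" |
        sed -n 's/.*\([0-9]\{4\}\)\.\([0-9]\{2\}\)\.\([0-9]\{2\}\).*/\1-\2-\3/p'
}

# Succeed if index $1 is more than $2 days old, counted from the
# reference date $3 (YYYY-MM-DD). Assumes GNU date.
older_than() {
    idx_day=$(index_date "$1")
    [ -n "$idx_day" ] || return 1
    idx_epoch=$(date -u -d "$idx_day" +%s)
    ref_epoch=$(date -u -d "$3" +%s)
    [ $(( (ref_epoch - idx_epoch) / 86400 )) -gt "$2" ]
}

# Usage:
#   older_than logstash-2016.06.01 80 2016-08-28 && echo "would be deleted"
```

With 28 August 2016 as the reference date, logstash-2016.06.01 is 88 days old and passes the 80-day filter, while an index from 1 August 2016 does not.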

Standalone execution

Now you are ready to execute a one-off cleaning of your cluster. Just invoke curator referencing the configuration file and the actions file.

Note: If you are not too sure about the outcome of this execution, it is recommended to use the --dry-run option, which will simulate the actions taken but will not change anything in your cluster. This is an example of a simulated execution:

$ /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml --dry-run
Action #1: delete_indices
DRY-RUN MODE.  No changes will be made.
(CLOSED) indices may be shown that may not be acted on by action "delete_indices".
DRY-RUN: delete_indices: logstash-2016.06.01 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.02 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.03 with arguments: {}
Action #1: completed
Job completed.

As you can see, if this had been a real run, Curator would have deleted the indices for the first, second and third of June 2016.
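
If the dry run lists many indices, you may want just the names, for example to review them before the real run. A minimal sketch, assuming the output lines look like the ones shown above (dry_run_targets is a hypothetical helper name):

```shell
# Read Curator dry-run output on stdin and print only the names of the
# indices that delete_indices would act on, one per line.
dry_run_targets() {
    sed -n 's/.*DRY-RUN: delete_indices: \([^ ]*\) with arguments.*/\1/p'
}

# Usage:
#   /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml --dry-run | dry_run_targets
```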

Scheduled Execution with CronTab

To make this tool even more useful, you can schedule a periodic cleanup, which should run automatically without any manual action needed. You can do this very easily with crontab.
Note: If you are running on Windows, you can use schtasks or any other scheduling tool.

	crontab -e

	# Delete old ELK indices every day, 30 minutes after midnight
	30 0 * * * /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml

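You can check the result of the cron execution by tailing this file: /var/log/cron (this is the location on Red Hat-based systems; other distributions may log cron activity elsewhere).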
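
The cron log typically records that the job was started, not what Curator printed. If you also want to keep Curator's own output, one option is a small wrapper script that appends it to a log file. This is only a sketch: log_run is a hypothetical helper, and the script and log file paths in the usage comment are assumptions, not something Curator provides.

```shell
# Hypothetical helper: run a command, append its output to a log file,
# and record when it started and how it exited.
log_run() {
    log_file=$1
    shift
    printf '%s start: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log_file"
    status=0
    "$@" >> "$log_file" 2>&1 || status=$?
    printf '%s exit status: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >> "$log_file"
    return "$status"
}

# Usage, as the last line of a wrapper script (hypothetical path
# /home/victor/curator-cron.sh) that the crontab entry calls instead:
#   log_run /home/victor/curator-cron.log /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml
```

The helper returns the exit status of the command it ran, so cron still sees a failure when Curator fails.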

I hope this helps you keep only the relevant data in your ElasticSearch cluster, so it stays quick and useful!

Sunday, 28 August 2016