Sunday 28 August 2016

How to clean up ElasticSearch with Curator 4.0.6

Summary

Today, I would like to share with you a quick introduction to a tool that keeps your ElasticSearch cluster clean and free of old data: Curator (thanks to flopezlasanta for the tip on this!)

This tool, along with ElasticSearch itself, evolves very quickly. When I was looking for information on the subject and found this blog entry from 2014, I noticed how much the way of working with the tool has changed. Anyway, kudos to @ragingcomputer!

Installation instructions

Before you install Curator, you need the Python package installer (pip), so Python is also a requirement. Note that if you are running Python 3.4 or later (version 3) or 2.7.9 or later (version 2), pip already comes installed with Python. More info here.

Note: You need superuser privileges to perform the installation.
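
The installation itself is a single pip call. A minimal sketch, assuming pip is on your PATH and that you want exactly the version covered in this post (the package is published on PyPI as elasticsearch-curator; pinning the version is my assumption, not a requirement):

```shell
# Run as root, or prefix the command with sudo.
# The version pin is an assumption made to match the title of this post.
pip install elasticsearch-curator==4.0.6

# Quick check that the command-line tool is now available
curator --version
```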

And you are ready to go! Now let's check the other two files needed to make it work.


Configuration file

The configuration file used here is a standard one, pointing to an ElasticSearch cluster running on the same host and on the default port (9200). Note that there are many more configuration options for more advanced scenarios. Check the Curator documentation for further info on them.
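
For reference, a configuration along those lines could look like the sketch below. It is written as a shell heredoc so it can be pasted straight into a terminal. The file name curator.yml matches the one used later in this post; the timeout, SSL and logging values are assumptions based on the defaults in the Curator 4 documentation, not necessarily the exact file I used.

```shell
# Write a minimal Curator 4 client configuration into the current directory.
cat > curator.yml <<'EOF'
---
# Leave a key empty if there is no value.
client:
  hosts:
    - 127.0.0.1        # ElasticSearch on the same host
  port: 9200           # default ElasticSearch HTTP port
  url_prefix:
  use_ssl: False
  ssl_no_validate: False
  http_auth:
  timeout: 30          # assumed value, in seconds
  master_only: False

logging:
  loglevel: INFO
  logfile:             # empty means log to the console
  logformat: default
EOF
```

Move the resulting file wherever you keep your configuration (this post uses /home/victor/curator.yml).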

Action file

I don't want to go into too much detail, but Curator can perform multiple actions on your indices. Among the most important are:

  • Delete an index: Pretty obvious, it deletes the data.
  • Close an index: This keeps the data but it is not loaded into memory.
  • Open an index: If you want to look further back, you can reopen a closed index.
You can also take snapshots and restore them. There are plenty of choices but, for the moment, let's stick to the basic delete.

The basic delete action searches for indices created by Logstash (see the logstash-* search pattern) and deletes the ones older than 80 days. If you want something more advanced, there are plenty of useful examples, pretty much ready to use, in the Curator documentation. Don't forget to enable the actions, as they come disabled by default.
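
As a sketch, an action file implementing that delete could look like the one below, again written through a shell heredoc. The logstash- prefix and the 80 days come from the description above; taking the index age from the date in the index name (source: name with a %Y.%m.%d timestring) is an assumption that fits index names such as logstash-2016.06.01.

```shell
# Write an action file that deletes logstash- indices older than 80 days.
cat > action.yml <<'EOF'
---
actions:
  1:
    action: delete_indices
    description: >-
      Delete logstash- prefixed indices older than 80 days,
      based on the date in the index name.
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False    # sample files ship with True, i.e. disabled
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name             # assumed: age is read from the index name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 80
EOF
```

Filters are combined, so only indices that match both the prefix and the age condition are selected for deletion.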

Standalone execution

Now you are ready to execute a one-off cleaning of your cluster. Just invoke curator, passing the configuration file with the --config option and the action file as the last argument.

Note: If you are not too sure about the outcome of this execution, it is recommended to use the --dry-run option, which will simulate the actions taken but will not change anything in your cluster. This is an example of a simulated execution:
$ /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml --dry-run
Action #1: delete_indices
DRY-RUN MODE.  No changes will be made.
(CLOSED) indices may be shown that may not be acted on by action "delete_indices".
DRY-RUN: delete_indices: logstash-2016.06.01 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.02 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.03 with arguments: {}
Action #1: completed
Job completed.
As you can see, had this not been a dry run, Curator would have deleted the indices for the first, second and third of June 2016.

Scheduled Execution with CronTab

To make this tool even more useful, you can schedule a periodic cleanup that runs automatically, with no manual action needed. You can do this very easily with crontab.
Note: If you are running on Windows, you can use schtasks or any other scheduling tool.
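
As an illustration, the crontab entry could look like the sketch below. The schedule (every day at 01:00) is a hypothetical choice, and the paths are the ones from the dry-run example above; adjust both to your setup.

```shell
# Save the entry to a file first, so it can be reviewed before installing it.
cat > curator.cron <<'EOF'
# min hour day-of-month month day-of-week  command
0 1 * * * /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml
EOF

# To install it, append it to the current user's crontab (left commented out here):
# ( crontab -l 2>/dev/null; cat curator.cron ) | crontab -
```

Alternatively, run crontab -e and paste the entry line by hand.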

You can check the result of the cron execution by tailing this file: /var/log/cron

I hope this helps you keep only the relevant data in your ElasticSearch cluster, so that it stays quick and useful!
