Today I would like to share a quick introduction to Curator, a tool that keeps your ElasticSearch cluster clean and free of old data (thanks flopezlasanta for the tip on this!).
This tool, like ElasticSearch itself, evolves very quickly. While looking for information on the subject I found this blog entry from 2014, and I noticed how much the way of working with the tool has changed since then. Anyway, kudos to @ragingcomputer!
Installation instructions
Before you install Curator, you need the Python package installer (pip), which means Python itself is also a requirement. Note that if you are running Python 3.4 or newer (version 3) or Python 2.7.9 or newer (version 2), pip already comes installed with Python. More info here.
Note: You need superuser privileges to perform the installation.
# Install pip
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
python get-pip.py
# Get the curator
pip install elasticsearch-curator
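Before running the pip bootstrap step above, it may be worth checking whether your interpreter already ships pip. The following is a minimal sketch of that check; the helper name `pip_is_bundled` is hypothetical, and some Linux distributions package pip separately, so treat the result as a hint rather than a guarantee.

```python
import sys


def pip_is_bundled(version_info=sys.version_info):
    """Return True if this Python release ships pip by default.

    pip has been bundled since Python 2.7.9 (version 2) and
    Python 3.4 (version 3).
    """
    major, minor, micro = version_info[:3]
    if major == 2:
        return (minor, micro) >= (7, 9)
    return (major, minor) >= (3, 4)


if pip_is_bundled():
    print("pip should already be available; try: python -m pip --version")
else:
    print("pip is not bundled with this Python; use get-pip.py first")
```

If the first message is printed and `python -m pip --version` works, you can skip straight to the `pip install elasticsearch-curator` step.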
Configuration file
This is a standard configuration file for an ElasticSearch cluster running on the same host and on the default port. Note that there are many more configuration options for more advanced scenarios. Check this for further info on them.
---
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
client:
  hosts: "127.0.0.1"
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False
logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
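Before handing this file to Curator, it can help to confirm that something is actually answering on the configured host and port. The sketch below is not part of Curator: the function names `cluster_url` and `cluster_is_reachable` are hypothetical, and it only assumes that the root endpoint of an ElasticSearch node returns a JSON document containing a `version` key.

```python
import http.client
import json
import urllib.request


def cluster_url(host="127.0.0.1", port=9200, use_ssl=False, url_prefix=""):
    """Build the base URL from the same values used in the client section."""
    scheme = "https" if use_ssl else "http"
    prefix = url_prefix.strip("/")
    path = "/" + prefix + "/" if prefix else "/"
    return "{}://{}:{}{}".format(scheme, host, port, path)


def cluster_is_reachable(url, timeout=30):
    """Return True if the URL answers with JSON that looks like a cluster banner."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error or a non-JSON reply
        return False
    return isinstance(info, dict) and "version" in info


# Only builds and prints the URL; no network call is made here.
print(cluster_url())  # → http://127.0.0.1:9200/
```

Calling `cluster_is_reachable(cluster_url())` on the host that runs the cluster should return True when the values in the configuration file are right.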
I don't want to go into too much detail, but Curator can perform multiple actions on your indices, which you describe in an action file. Among the most important actions are:
- Delete an index: Pretty obvious, it deletes the data.
- Close an index: This keeps the data on disk, but the index is not loaded into memory and cannot be searched until it is reopened.
- Open an index: If you want to look further back, you can reopen a closed index.
---
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True. If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 80 days (based on index name), for logstash-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 80
      exclude:
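To make the two filters in this action file more concrete, here is a rough sketch of the selection logic they describe: keep only names with the `logstash-` prefix, read the date from the index name, and select the index if that date is more than 80 days old. This is not Curator's implementation; the function `is_older_than` is hypothetical, and real Curator locates the timestring inside the name and compares against the current time.

```python
from datetime import datetime, timedelta


def is_older_than(index_name, now, prefix="logstash-",
                  timestring="%Y.%m.%d", unit_count=80):
    """Mimic a prefix pattern filter followed by a name-based age filter."""
    # Pattern filter: only indices starting with the prefix are considered
    if not index_name.startswith(prefix):
        return False
    # Age filter (source: name): the date comes from the index name itself
    try:
        index_date = datetime.strptime(index_name[len(prefix):], timestring)
    except ValueError:
        return False
    # direction: older, unit: days
    return index_date < now - timedelta(days=unit_count)


indices = ["logstash-2016.06.01", "logstash-2016.08.20", ".kibana"]
reference = datetime(2016, 8, 25)  # fixed date so the example is repeatable
print([name for name in indices if is_older_than(name, reference)])
# → ['logstash-2016.06.01']
```

Because the age comes from the name rather than from the index creation time, an index whose name does not follow the `%Y.%m.%d` pattern is simply never selected in this sketch.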
Standalone execution
Now you are ready to execute a one-off cleaning of your cluster. Just invoke curator referencing the configuration file and the actions file.
Note: If you are not too sure about the outcome of this execution, it is recommended to use the --dry-run option, which will simulate the actions taken but will not change anything in your cluster. This is an example of a simulated execution:
$ /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml --dry-run
Action #1: delete_indices
DRY-RUN MODE. No changes will be made.
(CLOSED) indices may be shown that may not be acted on by action "delete_indices".
DRY-RUN: delete_indices: logstash-2016.06.01 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.02 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.03 with arguments: {}
Action #1: completed
Job completed.
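If the dry run lists many indices, a few lines of Python can pull the index names out of the output so you can review them before running for real. This is only a sketch that assumes the `DRY-RUN: <action>: <index> with arguments:` line format shown in the example above; the helper name `indices_in_dry_run` is hypothetical, and the format may differ in other Curator versions.

```python
import re

# Matches lines such as:
#   DRY-RUN: delete_indices: logstash-2016.06.01 with arguments: {}
DRY_RUN_ENTRY = re.compile(r"DRY-RUN: \w+: (\S+) with arguments:")


def indices_in_dry_run(output):
    """Return the index names that a dry run reports it would act on."""
    return DRY_RUN_ENTRY.findall(output)


sample = """Action #1: delete_indices
DRY-RUN MODE. No changes will be made.
DRY-RUN: delete_indices: logstash-2016.06.01 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.02 with arguments: {}
DRY-RUN: delete_indices: logstash-2016.06.03 with arguments: {}
Action #1: completed
Job completed."""

print(indices_in_dry_run(sample))
# → ['logstash-2016.06.01', 'logstash-2016.06.02', 'logstash-2016.06.03']
```

The "DRY-RUN MODE" banner is skipped because it does not follow the `DRY-RUN: <action>:` shape.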
Scheduled Execution with CronTab
To make this tool even more useful, you can schedule a periodic cleanup, which should run automatically without any manual action needed. You can do this very easily with crontab.
Note: If you are running on Windows, you can use schtasks or any other task scheduling tool.
crontab -e
# Delete old ELK indices every day, 30 minutes after midnight
30 0 * * * /usr/bin/curator --config /home/victor/curator.yml /home/victor/action.yml
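If the five fields at the start of the crontab line look cryptic, the sketch below shows how they are read: minute, hour, day of month, month and day of week, where `*` means "any value". It is a deliberately simplified illustration (the function names are hypothetical, and only `*` and plain numbers are supported), not a full cron parser.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; this sketch only handles '*' and plain numbers."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Return True if the datetime falls on the given five-field schedule."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0
    cron_weekday = (moment.weekday() + 1) % 7
    # Simplification: real cron treats day-of-month and day-of-week as
    # alternatives when both are restricted; here all fields must match.
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(day, moment.day)
            and field_matches(month, moment.month)
            and field_matches(weekday, cron_weekday))


schedule = "30 0 * * *"
print(cron_matches(schedule, datetime(2016, 8, 25, 0, 30)))   # → True
print(cron_matches(schedule, datetime(2016, 8, 25, 12, 30)))  # → False
```

So `30 0 * * *` fires at 00:30 on every day of every month, which matches the comment in the crontab entry.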
I hope this helps you keep only the relevant data in your ElasticSearch cluster, so that it stays quick and useful!