Friday, 31 July 2015

Display your data in Kibana + Query it with the ES REST API


At this stage, our application is up and running, sending data to the ElasticSearch node in order to be indexed. Now it´s time for us to exploit it, so what we are going to do is:
- Start Kibana and define a simple date histogram
- Query Elastic Search REST API and SOAP UI
- Test aggregations a couple of simple aggregations on queries

Fire up Kibana

Unlike ElasticSearch, the Spring Cloud libraries do not seem to support the embedded execution of Kibana (I have opened a question in StackOverflow and a ticket in Spring GitHub project, I will update this post if any reply fixes the problem).
Therefore, if order to run Kibana, you will have to
- Download it from here.
- Run it by executing /bin/kibana.bat (or kibana.sh from a Unix system)

By default, Kibana Web GUI will be listening on port 5601 and it will try to connect to a ElasticSearch node at localhost:9200.

These parameters might be changed by editing the file /config/kibana.yml accordingly.

Register your index and fields


The first thing that Kibana will ask you is the name of the index you want to work with and whether it contains a temporal reference (it does)
  • Fill the name of the index (stockquotations) and specify which field is the time reference (timestamp in this case)
  • Go to Discover and select stock, timestamp and value from the available field list.
    • You should see something like this image below (if not, expand the selected time frame to allow some result to be shown)
Now, let´s go to Visualizations and let´s create a date histogram with the evolution of our stocks along the time.

Stock date histogram

Kibana supports a large number of graph types, what we are showing here is just one of them, which is the most suitable for purpose in this case: The line chart.
Basically, two pieces of configuration need to be provided:

What is going to be shown in the Y-axis

  • This is very easy, we will show the average value of the field value.

What is going to be shown in the X-axis

  • In this case two aggregations are needed:
    • A date aggregation: Add a date histogram of the field timestamp. If you choose an automatic granularity, it will adapt seamlessly whenever you change the date range of the graph (i.e. granularity should be different if you are showing five years of data or five hours).
    • We will need one line per stock: Add a Sub-bucket, choose Split Lines and split by terms of the field stock.
And voilá! You have a graph showing all stocks, which changes automatically if you change the date range (try showing a couple of years and then switch to a couple of weeks).

Finally is also interesting to perform further filtering once the graph is defined. For instance you might want to see two stocks, let´s say MAP.MC and SAN.MC. In that case, just enter this query in the field above the graph:

stock in (map.mc, san.mc)


Use SOAP-UI to interact with ElasticSearch REST API

SOAP-UI is an incredibly useful tool for any developer working in integration projects, web services (SOAP, REST) or even coding Web interfaces. Among many other features it offers:
- Possibility of creating clients from WSDL, WADL contracts
- Perform load testing
- Request manual creation and play/replay (this is my favorite)

I would´t like to make this too long, but perhaps in the future I can write on how this tool eases the development and testing of this type of interfaces. In the example below, I have used the request builder to perform a POST on http://localhost:9200/stockquotations/search while comfortable editing the JSON payload in a text box.

Example of an aggregation on a search query

The Search API is huge and I am not entirely familiar with it. You can find extensive documentation here. Anyway, we can check a really simple example where we will query our index and retrieve:
- The list of stocks present in our system, grouping the rest of the query by stock.
- Ordered by the average stock value, descending.
- A number of useful statistical parameters per stock (standard deviation,max,min, etc.)

This is the request:

And this is the response (part of it for the sake of brevity):


Cool, isn´t? Later on, we will develop a service to retrieve this parameters and make further use of them.


Calculating Moving Averages

In the technical analysis of stock markets, there are a group of calculations that are specially useful: Moving Averages (either simple or exponential, weighted, etc.).
ElasticSearch will support this type of aggregation starting  from version 2.0.0 (current is 1.7.0). A really nice feature to have.
See more info here.

What is next?

Once we have played a little bit with Kibana and the REST interface, is time to make some queries from our Spring Boot application and make it available in our own REST interface to other applications or to our custom Web interface. We will see that in the next article.

As usual, you can find the source code in this Git Repository.


Sunday, 26 July 2015

Integrate ElasticSearch with Spring-Boot

Summary

As commented in the previous post, the destinations for the recovered information will be:
- A simple log
- An Elasticsearch node, fur further data indexing and analysis.
How do we accomplish this? Very easy.

As we saw, during the project creation, we included the dependency spring-boot-starter-data-elasticsearch. This dependency will provide us with:
- A "template" API to perform  REST operations in towards an ES node, but without all the REST boilerplate.
- An embedded ES node, that works great in order to perform tests and help you understand ES.

Launching an ES node along with your Spring-Boot application:

If you want to launch an embedded ES node, you just need to:

  • Enable it in your application .properties file by setting spring.data.elasticsearch.properties.http.enabled=true.
  • Check that is properly running this curl command:
    • $ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
  • Some cluster health statistics will be displayed (name, shards available, etc.)

Sending the information to ES

If you recall, our Spring Integration flow ended in a Service Activator. This component specified that @ELKClient.pushToELK()  would be invoked along with the payload of the message received (that is the CSV quotation). Well, let´s see the code used to perform the sending (Which I think is pretty much self-explanatory)

Building the document to be indexed

Finally we just need to create a simple bean, with some annotations specifying which fields are to be sent, analyzed and stored and in which format:

Next Steps

Once we run the application, it will load some files containing historical data and it will query the spreadsheets stored in Google Docs. Each entry will be converted to a Java Object and sent to Elastic Search. In the next post we will see how can we build some nice graphs using Kibana (part of ELK stack) and the data we have indexed. One example (showing the historical data):

Example of one dashboard created with Kibana and the historical data

Moreover, we will see how can we deploy the micro-service, the elastic search node and Kibana to the cloud, so we can continuously gather stock quotations and perform better analysis.

Source code

You can find the code in this GitHub repository.

Friday, 24 July 2015

Spring Boot/Integration gateway microservice


Summary


As we saw in the previous article, we created a stream of information coming from stock quotations that we will process in our application. What are the objectives? For the moment two:
  • De-serialize the information and split it in tuples of (Stock,Price,Date)
  • Publish these items to two streams
    • A simple Log
    • To the ELK stack (Elastic Search + Logstash + Kibana). We won´t use Logstash, though.

As usual with Spring Boot + Integration, the implementation of this kind of services is extremely easy.

Creation of the application  with Spring STS

Of course you can do this manually or by using a Maven archetype. However, Spring STS provides a handy wizard for creating applications with Spring-Boot that apart from creating the basic project framework, it allows you to add whatever Spring modules you may consider needed:
  1. Modules chosen:
    • Spring-Integration: Support for integration flows and usage of reusable components (file reading, messaging and tons more...)
    •  Spring-Elastic Search (Cloud): This includes, among other features, a ES client that facilitates the sending of information towards ES. Moreover, using the default configuration, it launches a local ES node to play with.
  2. I manually added two dependencies:
    • Spring-Integration-File: This will allow us to load files with historical data
    • Elastic Search: This is needed to work with the latest Kibana version (This will be explained more in depth in the next article)
For the sake of clarity, you can see how the pom.xml looks like here.

Implementation of the Integration business logic

All is accomplished by means of an XML defining the flow, a couple of Java classes and an import in the main class. This can be reduced even more by using more auto-configuration features.

  • Enable the Spring Integration support for you XML file containing the flow


  • Write the flow, in a file called /META-INF/feedFlow.xml, which will contain:
    1. A http-outbound-gateway, that polls regularly the URL where the stock quotations are published
    2. A file reader that reads the local files containing historical data.
    3. A splitter, that creates one "item" per stock quotation.
    4. A logger that logs each item
    5. A service activator that sends the information to Elastic Search through REST
STS-generated diagram with the flow


Let´s take a look to the most relevant elements with some more detail:


HTTP part of the flow
  1. int-http:outbound-gateway performs a HTTP request to the specified URL. The result of the request (the whole CSV file) is sent to the quotations channel
  2. The request is triggered after an int:inbound-channel-adapter, which is driven by a poller, injects a message in the trigger channel.
File part of the flow
It might be useful, for further studies to have historical quotations. In order to process them a int-file:inbound-channel-adapter is polling a local folder where the user can place the files that he/she would like to load. These files are read and send to the quotations channel.

In this really simple example, we can see the powerful abstraction that Spring Integration gives, as we can inject quotations independently from where they come, as long as they respect the appropriate contract (format).

Final part of the flow
Finally, the CSV is split  in lines by a custom implementation of a Splitter and the result is made public in a publisher-subscriber channel, which is the Spring equivalent to a JMS topic.
Two consumers feed on these messages:
- A really simple logger, that logs the whole message (headers included)
- A service activator that send the quotations to the Elastic Search node.

We will see, how do we publish the data to ES and in which format in the next article. Moreover, we will create a couple of nice graphs using the data indexed in ES.

Source code

You can find the code in this GitHub repository.

I´m still improving things, like the error handling, configuration and proper conversion from CSV to the data expected by ES. In any case, feel free to share your critics/feedback.

Thursday, 23 July 2015

Stock quotations as input for your Spring Integration App

Summary


As you might know Google deprecated its Google Finance API some time ago and other options such as Yahoo Finance API do not seem to work when dealing with Stocks outside the US (i.e. Spanish stocks are not supported).

So here I found an option of getting Stock prices by means of using a Google Spreadsheet as a source of information. Later on, we will consume this source of information with Spring Integration.

Creating and publishing your Google Spreadsheet containing stock quotations

If you are familiar with Google Docs and Spreadsheets, you will know for sure how simple is to create such documents, so I will omit the creation of the spreadsheet for the sake of clarity.

We will use the function GOOGLEFINANCE to retrieve the prices and timestamp of a number of stocks (there are many more features, you can see full function documentation is Google Spreadsheets) using three columns:

  • Column A: Stock ticker
  • Column B: Price
    • The ticket will be taken from column A
    • =GOOGLEFINANCE($A1)
  • Column C:
    • Last trade dateThe ticket will be taken from column A
    • =GOOGLEFINANCE($A1;"tradetime")
With this setup, you will be able to write in column A as much stocks as you want and extend the formula to create a table. See this example (all but one are spanish stocks, OHI trades in the NYSE, note that for today we still don´t have a quotation).



Publish your sheet

Just go to File... Publish in the Web... and choose which sheets you want to publish and CSV format as output (of the formats available is the easiest by far to read later on by our application).
Check out this example here.

Next steps

After this simple set up, you will have a simple source of information with the stock parameters we need. In the next article, we will start building our Spring application (using Boot and Integration) to consume this source of data, store and analyze it.
Bear in mind that this free stock data is not suitable for automatic trading as it is not 100% reliable and usually come with a delay of around 20 minutes.


Update


This article refers to getting "live" stock data (i.e. delayed by 20 minutes). If you are also interested in getting historical stock data, Yahoo finance is still a good choice for you.  See this other article with more information on how to query the API provided by Yahoo Finance.


Hello World!


Yet another software blog. I know, that the idea is, by far, not the most original in the world.
However, in the past weeks, I found myself involved in a series of learning tasks that I have the feeling will be more useful for me and others if made public and accessible for everyone instead of being written to a .txt file and the source code stored in a local sandbox (only to be lost in the next HDD failure).

Let´s see how it works...