Sunday, 31 January 2016

Calculating Moving Averages with Spark MLlib

Update on Friday, 13th of January, 2017

This post is about the implementation using the Java API. If you want to see the same example of calculating moving averages with Spark MLlib but using Scala, please check out this post.

Introduction

Since I began learning Spark, one of the things that has given me the most pain is the Java API that wraps the Scala API used in Spark. So, I would like to share a problem I had using the Spark Machine Learning library, in case it is useful for anybody learning Spark. It's also nice to take a look at the possibilities that sliding windows give us when working with time series data.

The problem

As I wrote in my previous post, I created a new independent module to run studies on stock data using Spark. One of the most interesting (and basic) studies one can run on stock data is to calculate simple moving averages.

What is a simple moving average (SMA)?

If we take a window of time of length N, the SMA at a point M is the average of the last N prices:

    SMA = (p_M + p_(M-1) + ... + p_(M-(N-1))) / N

Source Wikipedia: https://en.wikipedia.org/wiki/Moving_average
Wait, what? In plain English: for every element, we take the N preceding elements and calculate their average. For example, with N = 3 and the prices 10, 11, 12, 13, the SMA at the last point is (11 + 12 + 13) / 3 = 12. The aim is to smooth the data in order to easily detect underlying patterns in it.

One possible solution: Spark MLLib

Searching around on the internet, I found that the Spark Machine Learning library (MLlib) has a sliding window operation over an RDD. Using this, it would be very easy to calculate the SMA, as described in this StackOverflow answer.


Note: The RDD must be sorted before applying the sliding function!

The caveat is that the previous code is written in Scala. What would it look like in Java?
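Below is a minimal sketch of an answer (names, data and window size are illustrative). The RDDFunctions.fromRDD bridge and the hand-built ClassTag are precisely the kind of Scala-wrapping pain mentioned in the introduction:

    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.rdd.RDDFunctions;

    import scala.reflect.ClassTag;
    import scala.reflect.ClassTag$;

    public class SimpleMovingAverage {

        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext("local[*]", "sma-example");

            // Remember: the RDD must be sorted before applying the sliding function!
            JavaRDD<Double> prices = sc.parallelize(
                    Arrays.asList(10.0, 11.0, 12.0, 13.0, 14.0, 15.0));

            int windowSize = 3;
            // The ClassTag is Scala boilerplate that the Java API forces us to build by hand
            ClassTag<Double> tag = ClassTag$.MODULE$.apply(Double.class);

            // RDDFunctions.fromRDD() bridges to the Scala API that offers sliding();
            // each element of the resulting RDD is one window of N consecutive prices
            JavaRDD<Double> sma = RDDFunctions.fromRDD(prices.rdd(), tag)
                    .sliding(windowSize)
                    .toJavaRDD()
                    .map(window -> {
                        Double[] values = (Double[]) window;
                        double sum = 0;
                        for (Double value : values) {
                            sum += value;
                        }
                        return sum / values.length;
                    });

            sma.collect().forEach(System.out::println); // 11.0, 12.0, 13.0, 14.0
            sc.stop();
        }
    }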

Saturday, 30 January 2016

Orchestrating your microservices with Eureka and Feign


Introduction


This is a well-known scenario: it's time to add some new functionality to your application, and you decide to add a new member to the family of microservices.
In this example, we are going to use a service called "Sparkker" that will consume stock quotations from another service called "Stokker" and will publish the results to a third service called "Portfolio-Manager".

This is a diagram showing the whole picture:



It's obvious that the number of lines connecting the services increases and increases, making the maintenance and configuration of the system very tedious. How can we solve this? We can't spend the whole day writing integration code; we need to deliver added value!

One possible solution to all this mess is to use some of the components from the Netflix stack that have been incorporated into the Spring Cloud project and that are very simple to use within Spring Boot. Interested?
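To give a taste of it, this is roughly what the integration code could shrink to with Feign (a sketch; the endpoint path and method names are assumptions for illustration): Sparkker declares an annotated interface, and Eureka resolves at runtime where "Stokker" actually lives.

    import java.util.List;

    import org.springframework.cloud.netflix.feign.FeignClient;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    // Hypothetical client: "stokker" is the service name registered in Eureka,
    // so no host or port needs to be hard-coded anywhere in Sparkker.
    @FeignClient("stokker")
    public interface StockQuotationsClient {

        // Assumption: Stokker exposes its quotations under /quotations/{ticker}
        @RequestMapping(method = RequestMethod.GET, value = "/quotations/{ticker}")
        List<String> getQuotationsFor(@PathVariable("ticker") String ticker);
    }

No URL juggling and no RestTemplate boilerplate: the integration code becomes a plain annotated interface.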

Thursday, 19 November 2015

DockerCon 2015

Just returned from DockerCon 15, which took place in Barcelona, Spain. It has been a somewhat tiring but interesting experience. Here are some quick thoughts I'd like to share with you.

Brand new Docker functionalities


In the general session, at the very beginning of the conference, some new features were showcased and seemed very interesting:
  • Production-ready Docker Swarm (which handles the creation and orchestration of a large number of containers in distributed applications).
  • Out of the box multi-host networking.
These two features were demonstrated by creating 50,000 containers on 1,000 Amazon Web Services nodes. However, I would have loved to see them actually performing some work (rather than just starting up). Especially hard would have been sharing some workload and performing some kind of coordination...

Here you can see the video from the general session:


The demo of the Docker Toolbox was quite interesting; it might be a really good tool to ramp up new developers joining your project.

Tuesday, 27 October 2015

Introduction to CI and Cloud deployment with GitHub, Travis and Heroku

Introduction


These past weeks, while (eventually) working on my Stokker project, I noticed the lack of the continuous build environment that I usually enjoy at work. I committed many bugs to my code-base and even non-compiling code!

So, what are the alternatives to fix this situation? Perhaps having a local Jenkins environment? Too hard to maintain. No CI at all? Not feasible if we expect somebody to collaborate in the future.

One possible solution: Use Github integration with Travis CI and Heroku


This is one of many options and, to be fair, I didn't do a thorough study of the different choices before trying it. I noticed that JHipster is using it as the CI system for part of its code-base, though.

Basically, what Travis CI does is watch your repository (via GitHub hooks) and perform a build based on the parameters specified in a file called .travis.yml.

Here, we can see a very basic example of what this file looks like:
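(The original snippet was embedded as an image; this is a rough reconstruction of a minimal file, assuming JDK 8 and with the encrypted Heroku API key as a placeholder:)

    language: java
    jdk: oraclejdk8
    deploy:
      provider: heroku
      api_key:
        secure: "<your encrypted Heroku API key>"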
Basically, we are stating that:
  • This is a Java project. As no build.gradle file has been provided, it will execute these Maven commands:
    • mvn clean install
    • mvn test
  • We intend to deploy the result of the successful build to Heroku
    • You can get the API Key later on, after registering in Heroku.

A hint on working with the .travis.yml file: one handy tool is this YAML validator, especially given the cryptic error messages produced by Travis CI.

Once a change in the master branch is detected, Travis CI will pull the code and perform a Maven build. If a compilation or test failure is detected, the process will be stopped and notification mails will be sent.

This is what a successful build looks like:


As said, once you have a successful build, Travis CI will trigger a Heroku deployment. By the way, what is Heroku? Heroku is a commercial PaaS that also offers some free functionality for hobby programmers and people learning. If you want to test it, follow these steps:

  • Register in Heroku with your personal details.
  • Create an application and link it with your Github:
    • By doing so, you will be provided with an API key (which was inserted in the YML snippet above).
  • Choose the "Free" dyno mode. This mode offers you very limited resources and the machine is shut down after 60 minutes of inactivity.
    • By the way, these are the different plans available:



The way in which your application is executed is defined in a Procfile (this file should be placed in the same folder as .travis.yml).
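For a Spring Boot fat jar, the Procfile can be a single line; the jar name below is an assumption:

    web: java -Dserver.port=$PORT $JAVA_OPTS -jar target/stokker-1.0-SNAPSHOT.jar

Heroku injects the port to bind to in the PORT environment variable, so we pass it to Spring Boot via server.port.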



Running example


You can check the unfinished application running here: https://stokker.herokuapp.com

Update on November 1st 2015: I had to separate Stokker and Portfolio Manager into two different GitHub repositories (and two Heroku applications) in order to be able to run both of them, so the URL for the Portfolio Manager in Heroku is now: https://stokker-portfolio-manager.herokuapp.com

  • Please excuse the ugly interface (learning AngularJS is still one of my unachieved goals!). Moreover, I have not yet provided a CRUD JavaScript interface for the entities managed in the project (Portfolio, Market Position and Stock). If you want to interact with the database, you can use the Spring Data REST endpoints:
  • You will notice that no price is used for the portfolios, as I still have not figured out how to run more than one application per repository.
  • Kibana functionality won't be available.


Manage your application with Heroku toolbelt

Heroku offers a tool called the Heroku Toolbelt that you can install on your local computer and use to:
  • Manage and monitor your applications:
    • You can log in and check the application logs.
    • You can deploy directly by performing a git push heroku.
  • Run your applications in a local environment in order to ensure that everything is OK before pushing your code (see the quick sketch below).
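A typical session could look like this (a sketch; the application name is a placeholder):

    heroku login
    heroku logs --tail --app stokker     # follow the application logs
    git push heroku master               # deploy the current branch to Heroku
    heroku local web                     # run the Procfile locally before pushing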

Resources

This is a really shallow introduction to Travis CI and Heroku. I am astonished: the more I read, the more documentation I find. Here are some interesting links in case you want to continue learning about this topic:


Hope you find this quick introduction interesting; any feedback is always more than welcome! You can find the code of the unfinished application here.

Wednesday, 14 October 2015

Integrate your Kibana dashboard into your Spring Boot + AngularJS application

Summary


Let's review the process to follow in order to:

  •  Provide your Spring Boot application with a very basic AngularJS interface
  •  Show an already defined Kibana dashboard and interact with it.

The application for which we are going to define the user interface is called Portfolio Manager, a really simple Spring Boot application with some JPA entities (Stock, Position, Portfolio, etc.) exposed via REST.
In order to feed the Elasticsearch node from which Kibana will read, we will use Stokker (an application described in past articles).
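To give an idea of how little code this involves, exposing one of those entities through Spring Data REST boils down to declaring a repository interface (a sketch; the Portfolio entity and its Long id are assumptions):

    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.data.rest.core.annotation.RepositoryRestResource;

    // Spring Data REST automatically exposes this repository at /portfolios,
    // with full CRUD endpoints and no controller code at all.
    @RepositoryRestResource(collectionResourceRel = "portfolios", path = "portfolios")
    public interface PortfolioRepository extends JpaRepository<Portfolio, Long> {
    }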

See this high-level diagram for a clearer picture of the system:



Both of them can be checked out in this GitHub repository, along with the instructions to run all the components.

Wednesday, 30 September 2015

Summer rentrée + upcoming DockerCon Europe 2015


Sorry, it's been a while since my last post, but it has been a terrible time for writing (summer holidays, good weather, "la feria de Málaga", ...).
I have some unfinished topics that I'm working on, and I would like to polish them before writing here, so stay tuned:

  • Spring Boot 1.3.0 + Elasticsearch 2.0 together (I'm having problems trying to run them together; the beta components are packed with bugs pending to be fixed in upcoming releases).
  • Setting up OAuth2 in my micro-services environment along with the rest of Netflix components (Eureka, Zuul, ...).
  • Exploring the usage of Spring Data REST to make it extremely easy to provide your persisted entities with a nice REST interface.
  • ...

In the meantime, something new and exciting has come up: some colleagues and I will attend DockerCon Europe 2015 in Barcelona!

The agenda is still open, but I have watched some of the videos from the recent SFO DockerCon 2015 and some of them are pretty interesting, especially when it comes to dealing with services deployed in a Cloud environment that you have to deploy/undeploy, orchestrate and monitor.

I really hope that this event meets my expectations and that I can come back and share with you plenty of interesting features worth learning.

Here are some interesting resources from YouTube regarding the SFO DockerCon:

Especially interesting, as they are really close to what I'm doing, are these two videos:
Finally, as I am somewhat of a novice in this technology, I am following this official tutorial series (given by one of the SFO speakers), which is a really nice place to start learning:

That's all folks (for now).

Wednesday, 12 August 2015

Historical Stock Quotations as input for your Spring Application

Update on July 2nd 2017

Unfortunately, the API described in this article has changed and now requires a session cookie to access the data in CSV format. I guess the guys at Yahoo do not like to serve their data without having the chance to display their ads...
Still, you can download the data manually (example URL here).
In the meantime, I'm looking for an alternative source of prices; I'll keep you posted.

Summary


In a previous article, we described how to get new stock quotations as soon as they were published on Google Finance (of course, after a 20-minute delay; this is a free service after all).
However, it was hard to study this data without any historical data to compare it with, so we defined a manual process to load some historical prices.

This process is very inefficient, though: you need to run it by hand, and there is a data gap between the last file update and the current quotations.

Solution: Yahoo finance historical quotations


We know that this service is not suitable for "real-time" quotations in the case of the Spanish stock exchange; however, it is perfectly good for getting historical quotations over a period of time, and it offers more information, such as:
- Volume
- Open, close, high and low prices
- Statistical data, such as moving averages, etc.

API Description


There seems to be no documentation for this interface at all (or at least, I have not been able to find any); however, I found some information about it buried in this 2009 blog article.

Basically, you need to compose a URL with the stock ticker and some time delimiters, for instance:
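(The original example is missing from this page. As far as I recall, the URL looked like the sketch below: s is the ticker, a/b/c the start month (zero-based), day and year, d/e/f the end month, day and year, and g the frequency, d for daily:)

    http://ichart.finance.yahoo.com/table.csv?s=TEF.MC&a=0&b=1&c=2015&d=11&e=31&f=2015&g=d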


New Spring integration flow


As I had to replace the logic for reading the old static files, I created a new flow defined entirely in a separate XML file. Basically, what it does is:
- Read from the application settings which stocks we are going to retrieve.
- Read the time range to recover.
- Perform one request per stock to the specific URL.
- Parse and convert the received CSV to our internal format.
- Publish the new POJOs to the topic so the consumer can use this information seamlessly (it does not really care how the POJOs were built, as long as the contract is respected).

Application.properties
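
(The original snippet is missing; this is a sketch with illustrative property names and some Spanish tickers:)

    # Comma-separated list of stocks to retrieve
    stocks.to.retrieve=TEF.MC,SAN.MC,BBVA.MC
    # Time range to recover
    quotations.start.year=2014
    quotations.end.year=2015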


HistoricalCSVRequestCreator

This class will receive a signal from the Spring ApplicationContext as soon as it is initialised (as it implements the ApplicationListener interface); it will then parse the property containing the stocks to be retrieved and finally inject one message per stock into the downstream channel (note that the ticker code is set as a message header).
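A sketch of the class (the property and channel names are assumptions matching the snippet above):

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.ApplicationListener;
    import org.springframework.context.event.ContextRefreshedEvent;
    import org.springframework.integration.support.MessageBuilder;
    import org.springframework.messaging.MessageChannel;
    import org.springframework.stereotype.Component;

    @Component
    public class HistoricalCSVRequestCreator implements ApplicationListener<ContextRefreshedEvent> {

        @Value("${stocks.to.retrieve}")
        private String stocksToRetrieve;

        @Autowired
        @Qualifier("historicalRequestsChannel")
        private MessageChannel historicalRequestsChannel;

        @Override
        public void onApplicationEvent(ContextRefreshedEvent event) {
            // One message per stock; the ticker code travels as a message header
            for (String ticker : stocksToRetrieve.split(",")) {
                historicalRequestsChannel.send(MessageBuilder.withPayload(ticker.trim())
                        .setHeader("ticker", ticker.trim())
                        .build());
            }
        }
    }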


HTTP Outbound Gateway

This is the XML configuration for the component that will perform the request. Note how the target ticker is retrieved from the ticker message header and how the time range is taken from the properties file.
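A sketch of that configuration (the URL and channel names are illustrative, matching the sketches above):

    <int-http:outbound-gateway
            request-channel="historicalRequestsChannel"
            reply-channel="csvParsingChannel"
            url="http://ichart.finance.yahoo.com/table.csv?s={ticker}&amp;c=${quotations.start.year}&amp;f=${quotations.end.year}&amp;g=d"
            http-method="GET"
            expected-response-type="java.lang.String">
        <!-- The target ticker comes from the message header set upstream -->
        <int-http:uri-variable name="ticker" expression="headers['ticker']"/>
    </int-http:outbound-gateway>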
And that's it! As soon as you see the application running, a stream of data will be loaded into the Elasticsearch node so you can perform studies with wider statistical information.

As usual, you can find the code in this GitHub repository. Feedback is always welcome!

Related articles


  1. Live Stock data as input for your Spring application
  2. Implement a Spring Integration flow with Spring Boot
  3. Integrate the application with ElasticSearch
  4. Represent and search your data with Kibana
  5. Deploy your Spring Boot microservice in a Docker container