Sunday, 31 January 2016

Calculating Moving Averages with Spark MLLib

Update on Friday, 13th of January, 2017

This post is about the implementation using the Java API. If you want to see the same example of calculating Moving Averages with Spark MLLib but using Scala, please check out this post.


Since I began learning Spark, one of the things that has given me the most pain is the Java API that wraps the Scala API used in Spark. So I would like to share a problem I had using the Spark Machine Learning library, in case it is useful for anybody learning Spark. It's also nice to take a look at the possibilities that sliding windows give us when working with time series data.

The problem

As I wrote in my previous post, I created a new independent module to perform studies on stock data using Spark. One of the most interesting (and basic) studies one can perform on stock data is to calculate the simple moving averages.

What is a simple moving average (SMA)?

If we take a window of time of length N, the SMA at a point M is (source: Wikipedia):

    SMA = (p_M + p_(M-1) + ... + p_(M-N+1)) / N
Wait, what? In plain English: for every element, we take the N preceding elements and calculate their average. The aim is to smooth the data in order to easily detect underlying patterns.
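Outside Spark, the formula itself is straightforward to implement. A minimal plain-Java sketch (the class and method names are my own, not from the post); note that the first N-1 points have no full window, so no SMA is emitted for them:

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleMovingAverage {

    // For each index i >= n-1, average the n values ending at i,
    // maintaining a running window sum instead of re-summing each window.
    static List<Double> sma(double[] prices, int n) {
        List<Double> result = new ArrayList<>();
        double windowSum = 0.0;
        for (int i = 0; i < prices.length; i++) {
            windowSum += prices[i];
            if (i >= n) {
                windowSum -= prices[i - n];  // drop the element that left the window
            }
            if (i >= n - 1) {
                result.add(windowSum / n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] closes = {10, 11, 12, 13, 14};  // made-up closing prices
        System.out.println(sma(closes, 3));      // prints [11.0, 12.0, 13.0]
    }
}
```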

One possible solution: Spark MLLib

Searching around on the Internet, I found that the Spark Machine Learning Library has a sliding window operation over an RDD. Using it, it is very easy to calculate the SMA, as described in this StackOverflow answer.

Note: The RDD must be sorted before applying the sliding function!

The caveat is that the previous code is written in Scala. How would it look in Java?
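The sliding operation lives in org.apache.spark.mllib.rdd.RDDFunctions, which from Java has to be reached through the Scala-facing API, ClassTag included. A hedged, untested sketch of what that could look like (the prices RDD and the surrounding SparkContext setup are assumed, and the exact generic types at the Java call site may differ between Spark versions):

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.rdd.RDDFunctions;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

// 'prices' is assumed to be an already sorted JavaRDD<Double> of closing prices.
int windowSize = 3;

// sliding() is defined on the Scala RDDFunctions, so we must unwrap the
// JavaRDD and supply the ClassTag evidence ourselves.
ClassTag<Double> tag = ClassTag$.MODULE$.apply(Double.class);
JavaRDD<Object> windows = RDDFunctions.fromRDD(prices.rdd(), tag)
        .sliding(windowSize)
        .toJavaRDD();

// Each element is an Object[] holding windowSize consecutive prices.
JavaRDD<Double> sma = windows.map(w -> {
    Object[] window = (Object[]) w;
    double sum = 0.0;
    for (Object p : window) sum += (Double) p;
    return sum / window.length;
});
```

The boilerplate around ClassTag and the erased return type of sliding() is precisely the kind of Scala-to-Java friction mentioned at the start of the post.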

Saturday, 30 January 2016

Orchestrating your microservices with Eureka and Feign


This is a well-known scenario: it's time to add some new functionality to your application, and you decide to add a new member to the family of microservices.
In this example, we are going to use a service called "Sparkker" that will consume stock quotations from another service called "Stokker" and will publish the results to a third service called "Portfolio-Manager".

This is a diagram showing the whole picture:

It's obvious that the number of lines connecting the services keeps increasing, making the maintenance and configuration of the system very tedious. How can we solve this? We can't spend the whole day writing integration code; we need to deliver added value!

One possible solution to all this mess is to use some of the components from the Netflix stack that have been incorporated into the Spring Cloud project and are very simple to use with Spring Boot. Interested?
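To give a flavour of how little integration code this approach needs: with Eureka handling service discovery, a Feign client is just an annotated interface. A minimal sketch (the client interface, endpoint path, and service name "stokker" are illustrative assumptions, not taken from the actual services; the package name matches the 2016-era Spring Cloud Netflix release):

```java
import org.springframework.cloud.netflix.feign.FeignClient;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

// Hypothetical client inside Sparkker: Eureka resolves "stokker" by its
// registered service name, so no host or port is hard-coded anywhere.
@FeignClient("stokker")
public interface StokkerClient {

    // Spring Cloud generates the HTTP plumbing for this declaration at runtime.
    @RequestMapping(method = RequestMethod.GET, value = "/quotations/{symbol}")
    String getQuotation(@PathVariable("symbol") String symbol);
}
```

Sparkker would then inject StokkerClient like any other Spring bean and call getQuotation(...) as a plain method, with no hand-written REST client code.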