Thursday, 4 February 2016

Video: Lambda architecture with Spring XD and Spark

Summary


Today I do not have any code worth showing you. Instead, I would like to briefly comment on a YouTube video I watched recently: a talk from the SpringOne2GX 2015 event (later released to the public).



Note: The fact that these talks are published more and more often (as happens with the DockerCon talks) kind of puts me off paying the high premium involved in attending these events in person...



The Lambda Architecture


Although the title talks about implementing a stock prediction system based on machine learning, what really caught my attention is the architecture being used: a lambda architecture.

(Image source: SlideShare)

This type of software architecture is quite popular nowadays, and it is known for its two-pronged structure:

  • A "fast" data track, executed in memory, where data is processed quickly (sometimes, as in this example, with the help of a statistical language such as R).
  • A "batch" data track, where data is persisted to disk so it can be analyzed more thoroughly later on.
  • Both tracks are fed the same data (usually time-stamped events).
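To make the two-track idea concrete, here is a toy sketch in plain Python. It is my own illustration under simplifying assumptions, not code from the talk: `Event`, `LambdaSketch` and the "average price" view are hypothetical names. Each time-stamped event is appended both to a master log (standing in for the batch track's disk storage) and to an in-memory buffer (the fast track). A query merges the precomputed batch view with whatever the fast track has seen since the last batch run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical time-stamped tick: a symbol, a price and a timestamp."""
    symbol: str
    price: float
    timestamp: int


class LambdaSketch:
    """Toy lambda architecture where every event feeds both a batch and a fast track."""

    def __init__(self):
        self.master_log = []                    # batch track: append-only history (stands in for disk)
        self.batch_view = {}                    # symbol -> (price total, count), rebuilt by the batch job
        self.realtime_view = defaultdict(list)  # fast track: in-memory prices seen since the last batch run

    def ingest(self, event):
        # The same event is delivered to both tracks.
        self.master_log.append(event)
        self.realtime_view[event.symbol].append(event.price)

    def run_batch_job(self):
        # Recompute the view from scratch over the full history, then drop
        # the fast-track data that the new batch view now covers.
        totals = defaultdict(lambda: [0.0, 0])
        for event in self.master_log:
            totals[event.symbol][0] += event.price
            totals[event.symbol][1] += 1
        self.batch_view = {symbol: (total, count) for symbol, (total, count) in totals.items()}
        self.realtime_view.clear()

    def average_price(self, symbol):
        # A query merges the thorough-but-stale batch view with the fresh realtime view.
        total, count = self.batch_view.get(symbol, (0.0, 0))
        recent = self.realtime_view.get(symbol, [])
        total += sum(recent)
        count += len(recent)
        return total / count if count else None


if __name__ == "__main__":
    lam = LambdaSketch()
    lam.ingest(Event("ACME", 10.0, 1))
    lam.ingest(Event("ACME", 12.0, 2))
    lam.run_batch_job()
    lam.ingest(Event("ACME", 14.0, 3))
    print(lam.average_price("ACME"))  # prints 12.0
```

In a real deployment the batch job runs asynchronously over far more data, so the hand-off between the two views is much trickier than the simple `clear()` used here.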


Tools being used for this project


It is quite striking how quickly they seem to have developed this application, and I think that is due to their choice of tools:

  • Spring XD
    • Used as the "glue" in the background, it lets you easily define flows (much in the style of Spring Integration), with the advantage of running them on a cluster, effectively enabling distributed computing.
    • It also has a command-line interface for creating and deploying jobs, plus a web interface that makes the task easier.
    • Note: Spring XD also comes with a single-node setup, so you can easily play around with it.
  • Apache Geode
    • Distributed key-value data store (as they put it, a HashMap on steroids)
    • Open-source version of Pivotal GemFire
  • Spark MLlib
    • In this example the use of Spark is somewhat limited: it only relies on MLlib to run linear regressions that produce the predictions.
  • R
    • Introduced in the fast data track, R is used here to calculate the statistical indicators that are later fed to the Spark module.
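The indicator-plus-regression step can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions. The talk does not say which indicators the R module computes, so a simple moving average stands in for them. Instead of MLlib's distributed linear regression, I use closed-form ordinary least squares on that single feature. The function names are hypothetical.

```python
def simple_moving_average(prices, window):
    """Trailing simple moving average; entry i covers prices[i .. i + window - 1]."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def fit_simple_linear_regression(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_next_price_model(prices, window=3):
    """Pair each moving-average value with the price that follows it, then fit a line."""
    sma = simple_moving_average(prices, window)
    # sma[k] ends at prices[k + window - 1]; its target is the next price, prices[k + window].
    features = sma[:-1]
    targets = prices[window:]
    return fit_simple_linear_regression(features, targets)


if __name__ == "__main__":
    prices = [float(p) for p in range(1, 11)]  # made-up, perfectly linear series
    slope, intercept = train_next_price_model(prices, window=3)
    latest_sma = simple_moving_average(prices, 3)[-1]
    print(slope, intercept, slope * latest_sma + intercept)  # prints 1.0 2.0 11.0
```

In the setup described in the video, the indicator computation would live in the fast track and the model fitting in Spark. Keeping both in one script here just makes the data flow between them easy to follow.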

I hope you enjoy the video; I think it is very interesting. As they say, it is not about the stock predictions themselves, but about what you can do with this kind of setup.

