Summary
It's been a while since I last wrote anything here (again), but I don't have much free time nowadays. I'm currently taking some training on Scala and Spark, which, by the way, is what brings us here today.
A recipe for quick prototypes for Data Analysis: Scala + Spark + Zeppelin
If you remember, some time ago I wrote about some personal learning projects I was working on, which basically pulled stock price information from the Web (using Spring Integration) and ran a couple of Spark analyses whose results were later displayed in an AngularJS interface.
Nothing complicated at all, but rather verbose and time-consuming to set up, especially if you just want to learn the subject.
With Apache Zeppelin, Scala and Spark, a prototype with basically the same functionality is reduced to these two code snippets in a Zeppelin notebook:
import java.net.URL
import java.nio.charset.Charset
import org.apache.commons.io.IOUtils
import org.apache.spark.mllib.rdd.RDDFunctions

// Data format
// |Date       |Open        |High        |Low         |Close       |Volume     |Adj Close   |
// |2017-01-11 |2268.600098 |2275.320068 |2260.830078 |2275.320068 |3620410000 |2275.320068 |
case class StockQuotation(ticker: String, date: String, close: Float, volume: Double)
// We get the data from here
val url = "http://chart.finance.yahoo.com/table.csv?s=^GSPC&a=0&b=3&c=1950&d=0&e=12&f=2017&g=d&ignore=.csv"
// Read the lines (dropping the CSV header) and build our business objects
val stockText = sc.parallelize(IOUtils.toString(new URL(url),
  Charset.forName("utf8")).split("\n").drop(1))
// x(0) is Date, x(6) is Adj Close and x(5) is Volume
val stockRDD = stockText.map(x => x.split(",")).map(x => StockQuotation("SP500", x(0), x(6).toFloat, x(5).toDouble))
stockRDD.cache()
val stockDF = stockRDD.toDF()
// Define a sliding window function and return the avg of each window and the first date of it as a DataFrame
// Note: the window below is 30 rows, despite the "200" in the variable names
val slidingRDD200 = RDDFunctions.fromRDD(stockRDD).sliding(30)
val sma200 = slidingRDD200.map(window => (window(0).date, window.map(x => x.close).sum / window.size))
val sma200DF = sma200.toDF().withColumnRenamed("_1", "date").withColumnRenamed("_2", "sma")
// Join the price DataFrame and the one with the moving averages
val joinedDF = stockDF.as("a").join(sma200DF.as("b"), $"a.date" === $"b.date").select("a.date", "a.close", "b.sma")
// We will query this table later on
joinedDF.registerTempTable("SP500")
That reads the data, creates a DataFrame with the price and the moving average, and registers it as a temporary table.
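To make the sliding-window step a bit more concrete, here is a minimal sketch of the same idea using plain Scala collections instead of Spark's RDDFunctions, so it runs without a cluster. The object name SmaSketch, the Quote class and the sample values are hypothetical and only there for illustration; they are not real S&P 500 data.

```scala
object SmaSketch {
  // Hypothetical, simplified stand-in for StockQuotation
  final case class Quote(date: String, close: Double)

  // Average of every window of `size` consecutive quotes,
  // labelled with the date of the first quote in the window
  def sma(quotes: Seq[Quote], size: Int): List[(String, Double)] =
    quotes
      .sliding(size)
      .filter(_.size == size) // skip the partial window Scala returns for short inputs
      .map(w => (w.head.date, w.map(_.close).sum / size))
      .toList

  def main(args: Array[String]): Unit = {
    // Made-up sample, newest first
    val quotes = List(
      Quote("2017-01-11", 40.0),
      Quote("2017-01-10", 30.0),
      Quote("2017-01-09", 20.0),
      Quote("2017-01-06", 10.0)
    )
    println(sma(quotes, 2))
    // prints List((2017-01-11,35.0), (2017-01-10,25.0), (2017-01-09,15.0))
  }
}
```

With a window of 2 over four quotes you get three averages, one per full window, which is the same shape of result the notebook builds with a window of 30.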
Later on, you can query that table with a regular SQL query, and its result set will be available as a data source for a number of graph types:
%sql
select * from SP500
where date > "2016-05-01" and date < "2017-01-15"
order by date asc
This produces a nice-looking graph after just a few minutes of coding, showing the index price and our friend, the Simple Moving Average (SMA).
The notebook I'm talking about is hosted on ZeppelinHub and you can access it here.
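One detail of the SQL query is worth spelling out: date is a String in StockQuotation, so the where and order by clauses compare plain strings. That works because dates written as yyyy-MM-dd sort alphabetically in the same order as chronologically. A small sketch in plain Scala, with a hypothetical object name and made-up dates, shows the same predicate at work:

```scala
object DateFilterSketch {
  // Same predicate and ordering as the SQL query, applied to plain strings
  def inRange(dates: Seq[String], from: String, to: String): Seq[String] =
    dates.filter(d => d > from && d < to).sorted

  def main(args: Array[String]): Unit = {
    // Made-up dates, deliberately out of order
    val dates = List("2017-01-20", "2016-12-30", "2016-04-29", "2016-05-02")
    println(inRange(dates, "2016-05-01", "2017-01-15"))
    // prints List(2016-05-02, 2016-12-30)
  }
}
```

This would not hold for other layouts such as dd/MM/yyyy, in which case the column would need to be parsed into a real date type first.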
And that's it! No project setup, no unnecessary pain with AngularJS libraries and no horrible Java Spark API. If you are interested, you can find below the links to the articles explaining the topics seen here today and how they were implemented in the traditional way.
Cheers!
Resources
- Consuming a REST service in Scala
- Calculating Moving Averages with Spark MLlib
- Stock quotations as input for your Spring application
- Calculating and stock indicators with AngularJS and Spark
Hey Victor,
I would like to showcase my Github code on blogger as well.
https://gregperrypage.blogspot.ca/
Would you direct me to the documentation that allows me to do that?
I like your blog and I want to simulate the same look and feel, but I would like some direction.
Greg
gtfperry at gmail dot com
Hi Greg,
Thanks for your comment! Well... actually I don't think there is any documentation per se about writing code-related stuff...
I can share what I use, although you might find other, much better ways to do it:
- Post my code snippets as Github gists (see https://gist.github.com/discover for some samples)
- Then you have the option of embedding them in your blog html (see the "embed" combo). Just copy the HTML code and paste it right away.
Some colleagues of mine use Wordpress and, I have to say, the look and feel of their code blog is really cool...
Hope it helps!
Victor