I haven't posted my "favorite reads of the season" in a while, so here's a bunch of links to keep you occupied over the next few days.
Graphs, search and recommendations:
- Under the Hood: Building out the infrastructure for Graph Search
- LinkedIn's Cleo and Search (Cleo claims to have been inspired by FB's graph search)
- GraphLab - another take on large-scale, distributed graph processing. Similar to Apache Giraph but not based on Hadoop code.
- Graph Based Recommendation Systems at eBay. Graphs, algebra and Cassandra. (I had to go back to basics to understand slide 9)
Statistics, machine learning presentations and resources:
- Data, Data, Data: Thousands of Public Data Sources
- Scalable and Flexible Machine Learning With Scala @ LinkedIn
- Big Data, Small Computers (H2O)
- A Practical Intro to Data Science
- Numeric Programming in Scala with Spire
While doing some research on NoSQL systems, especially Cassandra, I was surprised to learn that newer releases of Cassandra are moving away from the flexible, semi-structured column families. Instead, with CQL, there is a, well, somewhat restrictive, repetitive schema that should work well for certain workloads. Is it me, or does it look like NoSQL is grudgingly moving towards SQL?
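To make the "flexible versus repetitive" contrast a bit more concrete, here is a rough Python sketch. It is only my mental model, not Cassandra code: the sensor data, the `Reading` type and the `to_cql_rows` helper are all made up for illustration. The assumption is that an old-style wide row (one row key, arbitrary column names) ends up looking like several fixed-shape rows once it is viewed through a CQL table with a compound primary key.

```python
from typing import Dict, List, NamedTuple

# Old-style column family (hypothetical data): one row key maps to an
# open-ended set of column names, here timestamps, each with a value.
wide_rows: Dict[str, Dict[str, float]] = {
    "sensor-1": {"2013-03-01T00:00": 20.5, "2013-03-01T00:05": 20.7},
    "sensor-2": {"2013-03-01T00:00": 18.1},
}


class Reading(NamedTuple):
    """Fixed-shape row, roughly what a CQL table with
    PRIMARY KEY (sensor_id, taken_at) would expose (illustrative names)."""
    sensor_id: str  # plays the role of the partition key
    taken_at: str   # plays the role of the clustering column
    value: float


def to_cql_rows(rows: Dict[str, Dict[str, float]]) -> List[Reading]:
    """Flatten each wide row into one fixed-shape row per column.

    The row key is repeated on every output row, which is the
    'repetitive' part of the schema.
    """
    return [
        Reading(key, column, value)
        for key, columns in sorted(rows.items())
        for column, value in sorted(columns.items())
    ]


cql_rows = to_cql_rows(wide_rows)
for row in cql_rows:
    print(row)
```

The three columns spread over two wide rows come out as three rows of the same shape, with `sensor-1` showing up twice. That is the trade as I understand it: you give up ad hoc column names and get something that reads a lot like a SQL table.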
Speaking of SQL, PostgreSQL is moving in the other direction. Recent (9.x+) versions have some very interesting column data types: arrays, hstore, JSON, etc. And of course, its SQL support is fantastic.
And finally, a nice talk on trade processing and a paper on MongoDB for finance.