{ Make this readable }
Showing posts with label tech. Show all posts

Friday, February 24, 2017

Spring 2017 tech reading

Hello and a belated happy new year to you! Here's another big list of articles I thought were worth sharing. As always, thanks to the authors who wrote these articles and to the people who shared them on Twitter/HackerNews/etc.

Distributed systems (and even plain systems)


SQL lateral view

Docker and containers

Science and math


Java streams and reactive systems

Java Lambdas

Just Java

General and/or fun

Until next time!

Wednesday, November 16, 2016

Fall 2016 tech reading

It's almost the end of the year, so here's another big list to go through while you wait at the airport on your way to your vacation.


Streaming JSON and JAX-RS streaming

Java Strings

Data - Small and Big

Channels and Actors



Misc Science/Tech

Misc - Fun, Tidbits

Happy holidays!

Tuesday, July 26, 2016

Distraction-free environment

Hello! I thought I'd share with you how I'm trying to maintain a distraction-free environment at work (and at home).


Friday, July 22, 2016

Summer 2016 tech reading

Hi there! Summer is here and almost gone. So here's a gigantic list of my favorite recent articles, which I should've shared sooner.


Other languages

Reactive programming

Persistent data structures



Systems and other computer science-y stuff


Until next time! Ashwin.

Saturday, March 26, 2016

Spring 2016 tech reading

Hello! Spring is here, and so is another set of links that I've bookmarked for your reading pleasure.


Cassandra and other distributed systems



Functional programming

Some DevOps

Misc tech

Until next time!

Sunday, December 06, 2015

Fall 2015 tech reading


Big systems:
Until next time!

Saturday, October 10, 2015

Late summer 2015 tech reading

This should keep you busy for a few weekends.

(Once again, thanks to all the people who shared some of these originally on Twitter, Google+, HackerNews and other sources)


Java Bytecode Notes:
Java 8/Lambdas:
Tech Vids:
Some old notes on SQL Cubes and Rollups:
Until next time!

Wednesday, August 12, 2015

Summer 2015 tech reading and goodies

Graph and other stores:
  • http://www.slideshare.net/HBaseCon/use-cases-session-5
  • http://www.datastax.com/dev/blog/tales-from-the-tinkerpop
  • TAO: Facebook's Distributed Data Store for the Social Graph
    Architecture & Implementation
    All of the data for objects and associations is stored in MySQL. A non-SQL store could also have been used, but when looking at the bigger picture SQL still has many advantages:
    …it is important to consider the data accesses that don’t use the API. These include back-ups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging. An alternate store would also have to provide atomic write transactions, efficient granular writes, and few latency outliers
  • Twitter Heron: Stream Processing at Scale
    Storm has no backpressure mechanism. If the receiver component is unable to handle incoming data/tuples, then the sender simply drops tuples. This is a fail-fast mechanism, and a simple strategy, but it has the following disadvantages:
    Second, as mentioned in [20], Storm uses Zookeeper extensively to manage heartbeats from the workers and the supervisors. Use of Zookeeper limits the number of workers per topology, and the total number of topologies in a cluster, as at very large numbers, Zookeeper becomes the bottleneck.
    Hence in Storm, each tuple has to pass through four threads from the point of entry to the point of exit inside the worker process. This design leads to significant overhead and queue contention issues.
    Furthermore, each worker can run disparate tasks. For example, a Kafka spout, a bolt that joins the incoming tuples with a Twitter internal service, and another bolt writing output to a key-value store might be running in the same JVM. In such scenarios, it is difficult to reason about the behavior and the performance of a particular task, since it is not possible to isolate its resource usage. As a result, the favored troubleshooting mechanism is to restart the topology. After restart, it is perfectly possible that the misbehaving task could be scheduled with some other task(s), thereby making it hard to track down the root cause of the original problem.
    Since logs from multiple tasks are written into a single file, it is hard to identify any errors or exceptions that are associated with a particular task. The situation gets worse quickly if some tasks log a larger amount of information compared to other tasks. Furthermore, an unhandled exception in a single task takes down the entire worker process, thereby killing other (perfectly fine) running tasks. Thus, errors in one part of the topology can indirectly impact the performance of other parts of the topology, leading to high variance in the overall performance. In addition, disparate tasks make garbage collection-related issues extremely hard to track down in practice.
    For resource allocation purposes, Storm assumes that every worker is homogeneous. This architectural assumption results in inefficient utilization of allocated resources, and often results in over-provisioning. For example, consider scheduling 3 spouts and 1 bolt on 2 workers. Assuming that the bolt and the spout tasks each need 10GB and 5GB of memory respectively, this topology needs to reserve a total of 15GB memory per worker since one of the workers has to run a bolt and a spout task. This allocation policy leads to a total of 30GB of memory for the topology, while only 25GB of memory is actually required; thus, wasting 5GB of memory resource. This problem gets worse with an increasing number of diverse components being packed into a worker.
    A tuple failure anywhere in the tuple tree leads to failure of the entire tuple tree. This effect is more pronounced with high fan-out topologies where the topology is not doing any useful work, but is simply replaying the tuples.
    The next option was to consider using another existing open-source solution, such as Apache Samza [2] or Spark Streaming [18]. However, there are a number of issues with respect to making these systems work in their current form at our scale. In addition, these systems are not compatible with Storm’s API. Rewriting the existing topologies with a different API would have been time consuming resulting in a very long migration process. Also note that there are different libraries that have been developed on top of the Storm API, such as Summingbird [8], and if we changed the underlying API of the streaming platform, we would have to change other components in our stack.
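
The memory arithmetic in the Heron excerpt above (3 spouts and 1 bolt on 2 workers) is easy to check with a few lines of Python. This is only a rough sketch of the numbers in that quote, not how Storm's scheduler is actually implemented: the function names and the particular task placement are my own assumptions, and the only rule modeled is "every worker reserves as much as the hungriest worker".

```python
# Memory per task type in GB, as given in the quoted example.
TASK_MEMORY_GB = {"spout": 5, "bolt": 10}


def worker_memory(tasks):
    """Memory actually needed by one worker running the given tasks."""
    return sum(TASK_MEMORY_GB[task] for task in tasks)


def homogeneous_reservation(workers):
    """Total reserved when every worker is sized like the largest one.

    Hypothetical helper: models the 'all workers are the same size'
    assumption described in the quote, nothing more.
    """
    return max(worker_memory(tasks) for tasks in workers) * len(workers)


def actual_requirement(workers):
    """Total memory the tasks really need across all workers."""
    return sum(worker_memory(tasks) for tasks in workers)


# Assumed placement: one worker gets the bolt plus a spout,
# the other worker gets the remaining two spouts.
workers = [["bolt", "spout"], ["spout", "spout"]]

reserved = homogeneous_reservation(workers)  # 15 GB per worker x 2 workers
required = actual_requirement(workers)       # 15 GB + 10 GB
wasted = reserved - required

if __name__ == "__main__":
    print(f"reserved={reserved}GB required={required}GB wasted={wasted}GB")
```

With this placement the sketch gives 30GB reserved against 25GB required, which matches the 5GB of waste mentioned in the paper. Adding more task types with different memory needs to the workers list makes the gap grow, which is the point the authors are making.
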
Until next time!

Monday, June 01, 2015

Spring 2015 reading list

Here's a giant list of articles I read and liked (hat tip to the people I follow on Twitter and blogs; I'm just re-sharing these):

Sunday, April 12, 2015

A simple guide to using Unix/GNU Linux command line tools for fiddling with log files (*runs on Windows too)

I've been meaning to write this post for years now. Every time I thought about compiling a basic list, I told myself "Nah... there must be tons of examples on the net". Yes, there are tons of them, but I couldn't find anything:

  • That helped absolute noobs with a consolidated list
  • That demonstrated actual fiddling with Java log files
  • That worked on Windows(!). No, I don't mean the awful Cygwin tool but something like Busybox or the wonderful Gow
So, here it is:

Sunday, February 01, 2015

Starting 2015 with yet another link dump

A belated happy new year! Here's some reading material I've been accumulating for a few months.

Distributed systems:

Performance related:
On tuning:
Misc tech articles:
Formatting comments on Gerrit:
That's it for now!