Sunday, November 21, 2010

Scalable compute & storage frameworks - A Refcard (in progress)

If you have been closely following the NoSQL space, or have even shown a mild interest in scalable technologies such as Compute Grids, Data Grids, Distributed Caches or the countless other terms that people use interchangeably - you have probably realized that most Architects do not have the time or the resources to sift through the noise and decide on what to use.

Since I've had some experience using one such framework and also because I follow the progress of some others, I thought it would be helpful to everyone if I put together some information.

Please share and contribute information. Spread the word. Your efforts will be acknowledged. Ask for permission to work on the Wiki and Spreadsheet.
I have created a Google Code project, scalable-frameworks. I hope to find the time to keep it updated and to enlist help from the community at large in gathering correct information.
  • The intention is for it to serve as a ready reckoner and not be complete or authoritative
  • Performance is a criterion that has consciously been excluded from the lists here to avoid flame wars
  • For the full information it would be best to visit the actual product's/project's website
  • It is not official, nor has it been prepared by thorough research
  • If you have questions or would like to clarify/contribute, please get in touch
To start with, there are 2 parts:
  1. A very simple introduction with images describing the basic concepts
  2. A spreadsheet that is meant to serve as a ready reckoner - to help you choose the right framework/platform
    • It has some basic features listed
    • Pay attention to the features that you would find most useful and pick the project that has most/all the ones you are looking for

Basic concepts:
To help understand the basic processing and storage idioms being explored, here are a few images:


"Store and retrieve" (Scatter-Gather) on a cluster of compute + storage nodes:

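In case the picture does not come through, here is a minimal Python sketch of the same idea. It is only an illustration and not how any particular product does it: the `Node` and `Cluster` classes, the CRC32 based key placement and the thread pool are all assumptions made for the example, and everything runs inside one process.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


class Node:
    """One compute + storage node holding a slice of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def query(self, predicate):
        # The filter runs next to the data; only the matches travel back.
        return [(k, v) for k, v in self.store.items() if predicate(k, v)]


class Cluster:
    def __init__(self, node_count):
        self.nodes = [Node(f"node-{i}") for i in range(node_count)]

    def node_for(self, key):
        # Stable hash, so the same key always lands on the same node.
        return self.nodes[crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, value):
        # Store: each entry lives on exactly one node.
        self.node_for(key).put(key, value)

    def get(self, key):
        return self.node_for(key).store.get(key)

    def scatter_gather(self, predicate):
        # Scatter the query to every node in parallel, then gather and merge.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = pool.map(lambda node: node.query(predicate), self.nodes)
        return sorted(item for partial in partials for item in partial)


cluster = Cluster(node_count=3)
for i in range(10):
    cluster.put(f"order-{i}", i * 10)

big_orders = cluster.scatter_gather(lambda key, value: value >= 70)
print(big_orders)  # [('order-7', 70), ('order-8', 80), ('order-9', 90)]
```

The caller never needs to know which node holds which key. A single-key `get` goes to one node, while a query fans out to all of them and the partial answers are merged at the end.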

"Store, notify changes, apply changes" (Scatter-Relay-Compute) on a cluster of compute + storage nodes:


"Store, notify changes, calculate, notify new calculation result" (Scatter-Relay-ComputeAlert) on a cluster of compute + storage nodes: 

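A rough Python sketch of this flow, again in a single process. The `StorageNode` and `RunningTotal` classes, the listener lists and the "account:trade" key format are hypothetical names invented for the illustration; a real framework would relay the change events across the network to wherever the calculation lives.

```python
from collections import defaultdict


class StorageNode:
    """Hypothetical node that stores entries and relays every change to listeners."""

    def __init__(self):
        self.store = {}
        self.change_listeners = []

    def put(self, key, value):
        old = self.store.get(key)
        self.store[key] = value
        # Relay: tell interested parties what changed.
        for listener in self.change_listeners:
            listener(key, old, value)


class RunningTotal:
    """Compute step: keeps a per-group total current and alerts on each new result."""

    def __init__(self, group_of):
        self.group_of = group_of
        self.totals = defaultdict(int)
        self.result_listeners = []

    def on_change(self, key, old, new):
        group = self.group_of(key)
        # Apply only the delta instead of rescanning the stored data.
        self.totals[group] += new - (old or 0)
        # Alert: publish the freshly calculated result.
        for listener in self.result_listeners:
            listener(group, self.totals[group])


node = StorageNode()
totals = RunningTotal(group_of=lambda key: key.split(":")[0])
node.change_listeners.append(totals.on_change)

alerts = []
totals.result_listeners.append(lambda group, total: alerts.append((group, total)))

node.put("acct-1:trade-1", 100)
node.put("acct-1:trade-2", 50)
node.put("acct-1:trade-1", 80)  # an update: only the -20 delta is applied
node.put("acct-2:trade-1", 10)
print(alerts)
# [('acct-1', 100), ('acct-1', 150), ('acct-1', 130), ('acct-2', 10)]
```

Leaving `result_listeners` empty reduces this to the plain "store, notify changes, apply changes" case: the total is still kept up to date, but nobody is told about it.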

Refcard:
(Full Spreadsheet)

Until next time!

Wednesday, November 17, 2010

Variety is the spice of (the Architect's) life

A mind map of the various aspects of software - from an architect's point of view:

Saturday, November 13, 2010

Hiking in Memorial Park (Off CA 84)

I decided to go hiking in one of the parks off CA-84 (La Honda). It's a longer drive from I-280 and goes closer to the coast. Driving on La Honda is also quite fun if the weather is good. I had hiked in the same area a few years ago.

I went to Memorial Park, paid the $5 registration fee and went wandering around the empty campgrounds. I could not find a map anywhere (unlike the other parks) so I just went down to the creek from the Tan Oak campgrounds and walked along the creek, against the flow. After about 1/4 mile, you reach a bridge which is actually part of Pescadero Creek Road. There are many campgrounds here. I crossed the bridge and wandered around in the Wurr campgrounds. Still no sign of a trail.

So, I came back to the road and then saw the entrance to Pescadero Creek County Park - Hoffman Creek Trailhead. I went in and kept going. There's a map here but Pomponio trail appears to be in both parks. It's confusing.



After about 10-15 minutes, there's a junction where Old Haul Road (the one you will be on) makes a left. I turned left and kept walking. Then you see a sign that says Pomponio Trail. Make a left here and cross the creek. There is no bridge. If it has rained, I don't know if it's safe to cross the 10 ft span of ankle-deep water. There are rocks to step on to help cross the creek.

Pomponio trail looks like an out-and-back trail. There is no loop. I went in for a while and then headed back the same way.

Overall, it's a nice place. No crowd. I was worried that I did not see any fellow hikers until I saw just 1 couple on their way back. Camping here would also be fun - very convenient too, considering how close it is to 280.



Until next time!

Friday, November 12, 2010

If only the world were immune to diseases like it is to logic...

Tuesday, November 09, 2010

JMS spec - Time for an upgrade?

The last JMS specification was written 8 years ago! Since then the world has seen multi-core 64-bit systems, Hadoop, compute grids, data grids, AMQP, ZeroMQ, NoSQL, Twitter, Facebook ... and still JMS 1.1 is the backbone of enterprise systems.

There are a few things, however, that are sorely missing from the spec and consequently from any implementation in a "standard" way. Each provider no doubt has an answer to some of its limitations but the emphasis is on the lack of a standard way.

8 years ago, running a whole swarm of machines as a single cluster was rare. But in today's world it is not. It is exactly here that the spec is lacking. I have listed some features below that I would find useful.

Message acknowledgment:

  • Out of order Acks: Acknowledging messages individually and in an order that is different from how they were received. The spec has no standard way to do this. In CLIENT_ACKNOWLEDGE mode, acknowledging a message automatically acks all the messages that have been consumed so far in that session! TIBCO EMS has Explicit Client Acknowledgement that is non-standard but is certainly very useful
  • Negative Acks: In some systems, if a message causes an exception at the receiving end, that message will not be redelivered by the server until the original client session disconnects. It would've been nice if the spec allowed some kind of a Negative Ack to force the server to redeliver the message to the same or a different session
  • Batch Acks: JMS is used in many setups to correlate multiple messages coming from different queues, and Ack'ing is expensive at high message rates. JMS should therefore have a facility to accept a batch acknowledgment of many JMSMessageIDs, similar to JDBC batch statements
  • Disconnected & Federated Acks: In complex systems where messages can flow through multiple tiers of application servers either as the original JMS message or as an enriched DTO/VO, allowing the final tier in the flow to acknowledge the message purely by JMSMessageID would be very useful. Currently, only the client that received the message from the server can acknowledge it. Also, the message has to be held in memory for the duration. In SEDA systems this does not work well and forces the developers to jump through unnecessary design hoops
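To show how these acknowledgment styles differ, here is a small toy model in Python. It is not a JMS API and not any provider's implementation: `ToySession` and all of its method names are made up for the illustration, and the "broker" is just three in-memory collections.

```python
class ToySession:
    """Toy stand-in for a broker-side session. Every name here is hypothetical."""

    def __init__(self, message_ids):
        self.pending = list(message_ids)  # queued, not yet delivered
        self.unacked = []                 # delivered, waiting for an ack
        self.done = set()                 # acknowledged

    def receive(self):
        message_id = self.pending.pop(0)
        self.unacked.append(message_id)
        return message_id

    def ack_all_consumed(self):
        # JMS 1.1 CLIENT_ACKNOWLEDGE style: everything delivered so far is acked.
        self.done.update(self.unacked)
        self.unacked.clear()

    def ack(self, message_id):
        # Out-of-order ack: settle exactly one message, identified only by its ID.
        self.unacked.remove(message_id)
        self.done.add(message_id)

    def ack_batch(self, message_ids):
        # Batch ack: one call settles many IDs (one round trip in a real broker).
        for message_id in message_ids:
            self.ack(message_id)

    def nack(self, message_id):
        # Negative ack: hand the message back so it is redelivered next.
        self.unacked.remove(message_id)
        self.pending.insert(0, message_id)


session = ToySession(["m1", "m2", "m3", "m4", "m5"])
received = [session.receive() for _ in range(4)]  # m1 .. m4

session.ack("m3")                 # out of order: m1 and m2 stay unacked
session.nack("m2")                # processing failed, ask for redelivery
session.ack_batch(["m1", "m4"])   # settle the rest in one call

print(sorted(session.done))  # ['m1', 'm3', 'm4']
print(session.pending)       # ['m2', 'm5']
print(session.unacked)       # []
```

Because `ack` needs nothing but the message ID, the same call could in principle be made by a later tier that never held the original message object, which is the point of the disconnected/federated case.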
Message selectors:
  • Smart selectors: JMS Selectors are very static and expensive to apply at high message rates. They were meant to work as a rudimentary form of Content based routing. What is really needed is a way to let the client run custom logic (like RMI) through the queue and pick what it needs. This way clients should be able to use a modified form of QueueBrowser and consume anything they find interesting
  • Context aware routing: ActiveMQ has a non-standard feature called Message Groups that can be used to perform smart message routing using a custom header. This would be a welcome feature where data-aware routing would provide a huge performance improvement by exploiting data locality to deliver related messages to the same application server, thereby avoiding data/cache thrashing
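The routing idea can be sketched in a few lines of Python. This is only my illustration of sticky, header-based routing and not ActiveMQ's actual code: the `GroupRouter` class, the `"group"` header name and the round-robin assignment are all assumptions.

```python
class GroupRouter:
    """Hypothetical router that pins each group ID to a single consumer."""

    def __init__(self, consumers):
        self.consumers = list(consumers)
        self.owner = {}       # group ID -> consumer that owns it
        self.next_index = 0

    def route(self, headers):
        group = headers.get("group")
        if group is None:
            # No group header: plain load balancing.
            return self._round_robin()
        if group not in self.owner:
            # First message of a group picks the owner; later ones follow it.
            self.owner[group] = self._round_robin()
        return self.owner[group]

    def _round_robin(self):
        consumer = self.consumers[self.next_index % len(self.consumers)]
        self.next_index += 1
        return consumer


router = GroupRouter(["app-1", "app-2"])
messages = [
    {"group": "IBM"},
    {"group": "MSFT"},
    {"group": "IBM"},
    {"group": "ORCL"},
    {"group": "MSFT"},
]
assignments = [router.route(headers) for headers in messages]
print(assignments)  # ['app-1', 'app-2', 'app-1', 'app-1', 'app-2']
```

All the IBM messages reach app-1 and all the MSFT messages reach app-2, so each server only needs to keep the data for its own groups warm in its cache.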
Also, read this for a different take on the future of messaging.

Until next time, cheers!