{ Make this readable }

Tuesday, April 08, 2014

April tech reading

Here's a bunch of stuff I found to be of some interest and relevance. Happy reading!
An Apache HTTP client "bug"/weirdness I ran into recently, which would end up consuming a large number of ephemeral ports (client side) instead of reusing connections - fix description. The ports would end up waiting in the TIME_WAIT state for a long time and the client would eventually stop, unable to make any new requests.

Big data stuff. Naturally, any list is incomplete without big data: 
IntelliJ 13.1 and Git weirdness:
Random, clever tech stuff:
Until next time!

Wednesday, January 29, 2014

This month's good tech reading

(I discovered many of these links in my Google+, Twitter, HN or RSS feeds. I don't claim to be the first to find them.)

Until next time!

Sunday, January 19, 2014

Rsync in Java - a quick (and partial) hack

Over the years I've been (mildly) fascinated by how various version control tools and file backup utilities work, especially the core algorithm that drives many of these file send/diff/backup/de-duplication programs.

Rsync being one of the most widely used tools and the basis for many extensions, I naturally tried to wrap my head around how it works. But I found the details somewhat hazy. Maybe it was just me, but I was looking for a simpler, clearer implementation of the algorithm and not a fully functioning program.

Recently, I gave it another shot. I waded through some of the material available on the interwebs and bravely set out to implement it to see how much of it I had understood.

So, here is the basic implementation in Java. It may not be a faithful implementation of the paper but the gist is:

  • Create a summary out of fixed blocks of input text (original)
  • Use these blocks as reference against another text (modified)
    • This modified text is slightly different from the original text
    • Hence the assumption that the original text can be transformed to the modified text without having to send the entire modified text back
  • The modified text can now be transformed into a combination of:
    • References to those original blocks where there were no changes
    • And any differences as simple text
The code is available here and the same is embedded at the bottom of this post.

Some notes on the implementation:
  • It only handles Java Strings
  • It uses a combination of Rabin-Karp rolling hash for quick, incremental hashing of blocks and CRC32 for hash conflict resolution. In reality a much more robust hash should be used instead of CRC32
  • It assumes that the list of generated "blocks" is available on the other side to generate the patch. In reality there has to be a more clearly defined mechanism/protocol to exchange these blocks
  • The overall algorithm to identify common/repeating hashes should be smarter than this
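
The gist and the notes above can be sketched roughly as follows. This is an illustrative sketch written for this summary, not the code from the post: the names (RsyncSketch, Op, BLOCK, summarize, diff, patch), the tiny block size and the hash constants are all assumptions. It follows the same outline, with a Rabin-Karp style rolling hash to find candidate blocks and CRC32 to confirm them, and like the post it only handles Java Strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

/** Rough sketch of the block-matching idea; names and constants are hypothetical. */
class RsyncSketch {
    static final int BLOCK = 4;                     // tiny block size, for illustration only
    static final long BASE = 257, MOD = 1_000_000_007L;
    static final long TOP = pow(BASE, BLOCK - 1);   // weight of the char leaving the window

    /** One patch step: a reference to an original block, or literal text. */
    static final class Op {
        final int block; final String literal;
        Op(int block, String literal) { this.block = block; this.literal = literal; }
    }

    static long pow(long b, int e) { long r = 1; for (int i = 0; i < e; i++) r = r * b % MOD; return r; }

    /** Weak polynomial (Rabin-Karp style) hash of s[from, from + BLOCK). */
    static long weak(String s, int from) {
        long h = 0;
        for (int i = from; i < from + BLOCK; i++) h = (h * BASE + s.charAt(i)) % MOD;
        return h;
    }

    /** Slides the window one char to the right in constant time. */
    static long roll(long h, char out, char in) {
        return ((h - out * TOP % MOD + MOD) * BASE + in) % MOD;
    }

    /** CRC32 of one block, used only to confirm a weak-hash match. */
    static long strong(String s, int from) {
        CRC32 crc = new CRC32();
        crc.update(s.substring(from, from + BLOCK).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Summary of the original: weak hash -> (CRC32 -> index of a full block). */
    static Map<Long, Map<Long, Integer>> summarize(String original) {
        Map<Long, Map<Long, Integer>> summary = new HashMap<>();
        for (int b = 0; (b + 1) * BLOCK <= original.length(); b++)
            summary.computeIfAbsent(weak(original, b * BLOCK), k -> new HashMap<>())
                   .putIfAbsent(strong(original, b * BLOCK), b);
        return summary;
    }

    /** Describes the modified text as block references plus literal text. */
    static List<Op> diff(String modified, Map<Long, Map<Long, Integer>> summary) {
        List<Op> ops = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        long h = 0;
        boolean fresh = true;                       // hash must be recomputed from scratch
        while (i + BLOCK <= modified.length()) {
            if (fresh) { h = weak(modified, i); fresh = false; }
            Map<Long, Integer> candidates = summary.get(h);
            Integer block = candidates == null ? null : candidates.get(strong(modified, i));
            if (block != null) {                    // window equals a known block
                if (literal.length() > 0) { ops.add(new Op(-1, literal.toString())); literal.setLength(0); }
                ops.add(new Op(block, null));
                i += BLOCK;
                fresh = true;
            } else {                                // no match: emit one char, roll the hash
                literal.append(modified.charAt(i));
                if (i + BLOCK < modified.length())
                    h = roll(h, modified.charAt(i), modified.charAt(i + BLOCK));
                i++;
            }
        }
        literal.append(modified.substring(i));      // tail shorter than one block
        if (literal.length() > 0) ops.add(new Op(-1, literal.toString()));
        return ops;
    }

    /** Rebuilds the modified text from the original and the patch. */
    static String patch(String original, List<Op> ops) {
        StringBuilder out = new StringBuilder();
        for (Op op : ops)
            out.append(op.literal != null ? op.literal
                    : original.substring(op.block * BLOCK, (op.block + 1) * BLOCK));
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "the quick brown fox jumps over the lazy dog";
        String modified = "the quick red fox jumps over the very lazy dog";
        List<Op> ops = diff(modified, summarize(original));
        // true when the round trip reproduces the modified text
        System.out.println(patch(original, ops).equals(modified));
    }
}
```

Only full blocks of the original are summarized here, so a trailing partial block always travels as literal text. A real tool would also use a much larger block size and a stronger confirmation hash than CRC32, as noted above.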
Until next time!

Wednesday, January 01, 2014

Miami, Keys, Naples and Sanibel trip

This X'mas break we spent 9 days visiting Miami, Key Largo, Key West, the Everglades, Naples and Sanibel island. It was time well spent in 80F weather, on the beach (several), while it was 40F in California.

Here are some things that we would recommend to others.

Book a hotel a couple of months in advance. The Art Deco district/Collins Ave/Ocean Drive is a great place to stay at. Hotel prices are reasonable if you book well in advance. Ocean Drive is a lively scene in the evening and a great place to have dinner, sitting in the ocean breeze.

While travel books list many things to do, we found many such places to be skippable, like Little Havana, Jungle Island etc. Key Biscayne, the Ocean Drive/Art Deco area and the Vizcaya house are worth visiting. Save the rest of your time to just hang out on the beach.

There is one thing we would recommend that people absolutely avoid - the Miami Open City Tour bus. We were told that driving around Miami in our own car would be expensive (parking fees) and a waste of time, so a bus tour was recommended. We were sold tickets for this particular company at the hotel and I had not done my research. This tour service had all the makings of a scam. First, their buses are a lot less frequent than those of other companies. After spending 2 hours running around from stop to stop hoping to flag the bus down, we realized that their last pick up is at 4.30 pm and just decided to use our own car. The next day we wanted to do the Downtown tour. The bus picked us up on Ocean Drive, then misdirected us and dropped us at the wrong stop to catch the downtown bus. They then lied to us about the timings. Having spent $70 on 2 tickets, we waited an hour at the bus stop. It turned out that the bus that had originally misdirected us was the one that eventually did the Downtown tour, although we had been given the impression that there were many buses. Then the best part: it never stopped, even though we were frantically waving to flag it down. So, after 4 hours of frustration and good money down the drain, we drove off in our own car again.

If you really want to do a bus tour, use one of the other 2, more frequent services - Big Bus or Hop-on-Hop-off.

Great weather and we just missed some heavy showers. Well, it would be naive to not expect any rain in Florida.

John Pennekamp state park in Key Largo is a good place to go kayaking in the estuary. Then we drove down to Key West.

Surprisingly, Key West does not have good beaches, despite what people normally assume about the Keys. It has to do with the coral reefs there. Mallory Square and Duval Street are good places to hang out. You can also go parasailing, snorkeling and jetskiing here. Parasailing in Key West is worth doing for the views and the water color.

Fort Zachary Taylor's beach is nice, among all the other "beaches" in Key West.

Unlike Miami beaches, none of the beaches in the Keys, Naples or Fort Myers have any service. You have to bring your own chairs, drinks and of course towels.

The Everglades park has 2 main entrances - the Royal Palms visitor center near Florida City and Shark Valley, which is on the way to Naples from Miami.

The Anhinga trail is worth doing. We found that driving all the way down to the Flamingo visitor center was a waste. Surprisingly we did not spot a single gator here, but lots of birds. We got bitten by mosquitoes near the Flamingo center in the evening.

The gator farm and the airboat ride at the entrance to Royal Palms are worth a visit. We recommend the last show at 5 pm and its 5.20 pm airboat ride, just in time for sunset in December.

It turns out that most of the gators are on the Shark Valley side and further north. We did not have time to do the 2 hour tram tour but we did see many gators right at the park entrance.

Naples, Fort Myers and Sanibel:
We had originally not planned on visiting the Naples area. I had only booked hotels for the first half of our trip. We had the option of either visiting Fort Lauderdale or Naples for the second leg. We were told by the locals that the beaches in the Naples area were nicer and not commercialized. After Key West's disappointing beaches we decided to give Naples/Fort Myers/Sanibel a shot. It was worth it.

The drive to Naples from Doral/Tamiami via Shark Valley was "gatorsville". All along the highway, there is a storm ditch that runs parallel to it. You can spot a gator every 100 meters. Some tortoises too.

We booked hotels day-by-day, just 1 day ahead, online. Naples downtown is nice and its beach is the best. Don't miss eating the famous Abbot's frozen custard. We were also lucky to spot a pair of dolphins in the distance.

Fort Myers parking was too crowded so we skipped it. Sanibel island's lighthouse beach is nice but parking may be a little frustrating. Bowman's beach is also nice.

Happy new year and travel safely!

Sunday, December 15, 2013

Java/tech stuff I found on the internet (Dec 2013 edition)

Networking and big data:

Java/JVM perf:
Java memory model + arrays + visibility/ordering:
Happy holidays!

Sunday, November 24, 2013

Analyzing large Java heap dumps when Eclipse Memory Analyzer (MAT) UI fails

If you find yourself trying to analyze a big heap dump (20-30GB) downloaded from your production server to your staging/test machines, only to find out that X-over-SSH is too slow, then this article is for you.

As of Nov 2013, we have 2 options - Eclipse MAT and a hidden gem called Bheapsampler.

Option 1:
Eclipse Memory Analyzer is obviously the best tool for this job. However, trying to get the UI to run remotely is very painful. Launching Eclipse and updating the UI is an extra load on the JVM that is already busy analyzing a 30GB heap dump. Fortunately, there is a script that comes with MAT to parse the heap dump and generate HTML reports without ever having to launch Eclipse! It's just that the command line option is not well advertised.

Command line heap analysis using Eclipse MAT:

Assuming Eclipse MAT is installed and we are inside the mat/ directory, modify the heap settings in MemoryAnalyzer.ini (the -Xmx value listed under -vmargs) so that MAT has a heap large enough to handle large dumps.

Run MAT against the heap dump:

    ./ParseHeapDump.sh ../today_heap_dump/jvm.hprof

This takes a while to execute and generates indices and other files to make repeated analysis faster. Then use the indices created in the previous step and run a "Leak suspects" report on the heap dump.

    ./ParseHeapDump.sh ../today_heap_dump/jvm.hprof org.eclipse.mat.api:suspects

The output is a small and easy to download jvm_Leak_Suspects.zip. This has HTML files just like the MAT Eclipse UI. It can be easily SCP'ed/emailed around.

Other report types are possible too, such as org.eclipse.mat.api:overview and org.eclipse.mat.api:top_components.

More details - http://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ.

Option 2:
http://dr-brenschede.de/bheapsampler is something I chanced upon. It is a sampling heap dump reader and so it works for very large heap dumps where MAT sometimes fails. Being a sampling reader, the output is also a little imprecise but helps a great deal when you have nothing else. The tool seems to be closed source and is very sensitive to heap dump corruptions.

As an aside, here's something that might be useful for taking the initial heap dump quickly - https://blogs.atlassian.com/2013/03/so-you-want-your-jvms-heap/.
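
If a heap dump has to be taken from inside the application itself, one possible way on a HotSpot JVM is the HotSpotDiagnosticMXBean. The sketch below is only an illustration and is not the approach from the linked article: the class name HeapDumpSketch, the dump method and the file naming are assumptions, and the bean is specific to HotSpot-based JVMs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Minimal sketch: write an .hprof file for the current (HotSpot) JVM. */
class HeapDumpSketch {
    /** Dumps the heap into dir; live = true keeps only reachable objects. */
    static Path dump(Path dir, boolean live) throws IOException {
        // The target file must not exist yet; newer JDKs also expect the .hprof suffix.
        Path file = dir.resolve("sketch-" + System.nanoTime() + ".hprof");
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = dump(Files.createTempDirectory("heap"), true);
        System.out.println(file + " (" + Files.size(file) + " bytes)");
    }
}
```

The resulting file is in the same hprof format that ParseHeapDump.sh reads. Keep in mind that taking the dump pauses the application while the heap is written out.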

Sunday, November 17, 2013

Book review: Getting Started with Hazelcast

A few weeks ago Packt Publishing sent me a free copy of their new publication - Getting Started with Hazelcast by Mat Johns to read and write about. I have used distributed caches and compute grids quite a bit at work. So, I was happy to do a quick review of this book. I've used Oracle Coherence quite a lot and Hazelcast for some experiments.

The book is a gentle guide to building distributed compute and data grids. It assumes nothing about the reader and hence does a good job of doing what it says in the book's title - "getting started". I'd recommend this book to anyone who is completely new to this area, which is not to be confused with Hadoop, Storm, Cassandra or the other more "popular/hyped" cousins. I would say that for medium sized data, logic heavy, transactional/near real time applications, compute grids are the way to scale out.

Obviously this book is about using Hazelcast, which is a nice, Apache-licensed, Java-based distributed grid/cache. It is surprisingly feature rich, and in terms of usability, features and elegance it comes very close to its more expensive, older, rock solid cousin, Oracle Coherence.

The book explores the essential aspects of using such frameworks effectively, such as distributed maps, replication, network partitions, fault tolerance, data affinity and moving code closer to where the data is. It does this without being too overwhelming for first timers.

For a full and more thorough treatment I would obviously recommend the Hazelcast documentation. And if you are curious to know about other frameworks check out my old write up - Scalable compute & storage frameworks - A Refcard.