{ Make this readable }
Showing posts with label tech. Show all posts

Tuesday, April 08, 2014

April tech reading

Here's a bunch of stuff I found to be of some interest and relevance. Happy reading!
An Apache HTTP client "bug"/weirdness I ran into recently, which would end up consuming a large number of ephemeral ports (client side) instead of reusing connections - fix description. The ports would linger in the TCP TIME_WAIT state for a long time and the client would eventually stall, unable to make any new requests.

Big data stuff. Naturally, any list is incomplete without big data: 
IntelliJ 13.1 and Git weirdness:
Random, clever tech stuff:
Until next time!

Wednesday, January 29, 2014

This month's good tech reading

(I discovered many of these links in my Google+, Twitter, HN or RSS feeds. I don't claim credit for being the first to find them.)

Until next time!

Sunday, January 19, 2014

Rsync in Java - a quick (and partial) hack

Over the years I've been (mildly) fascinated by how various version control tools and file backup utilities work. Especially the core algorithm that drives many of these file send/diff/backup/de-duplication programs.

Rsync being one of the most widely used tools and the basis for many extensions, I naturally tried to wrap my head around its workings. But I found the details somewhat hazy. Maybe it was just me, but I was looking for a simpler, clearer implementation of the algorithm and not a fully functioning program.

Recently, I gave it another shot. I waded through some of the material available on the interwebs and bravely set out to implement it to see how much of it I had understood.

So, here is the basic implementation in Java. It may not be a faithful implementation of the paper but the gist is:

  • Create a summary out of fixed blocks of input text (original)
  • Use these blocks as reference against another text (modified)
    • This modified text is slightly different from the original text
    • Hence the assumption that the original text can be transformed to the modified text without having to send the entire modified text back
  • The modified text can now be transformed into a combination of:
    • References to those original blocks where there were no changes
    • And any differences as simple text
The code is available here and the same is embedded at the bottom of this post.
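To make the gist above concrete, here is a minimal sketch of the summarize/delta/apply round trip. It is not the code linked from this post: the class and method names, the 4-character block size and the "B"/"T" op encoding are all made up for illustration, and the summary is keyed by block content rather than by hashes purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; all names here are illustrative, not from the post's code. */
final class BlockDeltaSketch {
    static final int BLOCK = 4; // tiny block size so the demo output stays readable

    /** Summary of the original: block content -> block index (full blocks only). */
    static Map<String, Integer> summarize(String original) {
        Map<String, Integer> blocks = new HashMap<>();
        for (int i = 0; i + BLOCK <= original.length(); i += BLOCK) {
            blocks.putIfAbsent(original.substring(i, i + BLOCK), i / BLOCK);
        }
        return blocks;
    }

    /** Delta ops: "B<n>" references original block n, "T<text>" carries literal text. */
    static List<String> makeDelta(Map<String, Integer> blocks, String modified) {
        List<String> ops = new ArrayList<>();
        StringBuilder literal = new StringBuilder();
        int i = 0;
        while (i < modified.length()) {
            Integer idx = i + BLOCK <= modified.length()
                    ? blocks.get(modified.substring(i, i + BLOCK))
                    : null;
            if (idx != null) {
                // Flush pending literal text, then emit a block reference.
                if (literal.length() > 0) {
                    ops.add("T" + literal);
                    literal.setLength(0);
                }
                ops.add("B" + idx);
                i += BLOCK;
            } else {
                // No known block starts here: keep the character as plain text.
                literal.append(modified.charAt(i));
                i++;
            }
        }
        if (literal.length() > 0) {
            ops.add("T" + literal);
        }
        return ops;
    }

    /** Rebuilds the modified text from the original plus the delta ops. */
    static String apply(String original, List<String> ops) {
        StringBuilder out = new StringBuilder();
        for (String op : ops) {
            if (op.charAt(0) == 'B') {
                int n = Integer.parseInt(op.substring(1));
                out.append(original, n * BLOCK, n * BLOCK + BLOCK);
            } else {
                out.append(op.substring(1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "the quick brown fox jumps";
        String modified = "the quick red brown fox jumps!";
        List<String> delta = makeDelta(summarize(original), modified);
        System.out.println(delta);
        System.out.println(apply(original, delta).equals(modified));
    }
}
```

The sketch re-slices a substring at every offset of the modified text, which is exactly the cost a rolling hash is meant to avoid.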

Some notes on the implementation:
  • It only handles Java Strings
  • It uses a combination of Rabin-Karp rolling hash for quick, incremental hashing of blocks and CRC32 for hash conflict resolution. In reality a much more robust hash should be used instead of CRC32
  • It assumes that the list of generated "blocks" is available on the other side to generate the patch. In reality there has to be a more clearly defined mechanism/protocol to exchange these blocks
  • The overall algorithm to identify common/repeating hashes should be smarter than this
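The hashing note above can be sketched roughly as follows: a polynomial (Rabin-Karp style) hash slides over the text one character at a time, and CRC32 is only computed when the cheap hash matches. This is an illustration under assumptions, not the post's actual code; the BASE and MOD constants and all names are arbitrary choices.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of a rolling hash with a CRC32 confirmation step. */
final class RollingHashSketch {
    static final long BASE = 257;            // assumed polynomial base
    static final long MOD = 1_000_000_007L;  // assumed prime modulus

    /** Polynomial hash of s[from, from + len), computed from scratch. */
    static long hash(String s, int from, int len) {
        long h = 0;
        for (int i = from; i < from + len; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    /** BASE^(len-1) mod MOD: the weight of the character leaving the window. */
    static long topWeight(int len) {
        long w = 1;
        for (int i = 1; i < len; i++) {
            w = (w * BASE) % MOD;
        }
        return w;
    }

    /** Slides the window one character to the right in constant time. */
    static long roll(long h, char out, char in, long topWeight) {
        long withoutOut = (h + MOD - (out * topWeight) % MOD) % MOD;
        return (withoutOut * BASE + in) % MOD;
    }

    /** Second, independent checksum used to confirm a rolling-hash hit. */
    static long crc32(String s, int from, int len) {
        CRC32 crc = new CRC32();
        crc.update(s.substring(from, from + len).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** First offset in text whose window matches block, or -1 if none does. */
    static int find(String block, String text) {
        int n = block.length();
        if (n == 0 || text.length() < n) {
            return -1;
        }
        long target = hash(block, 0, n);
        long targetCrc = crc32(block, 0, n);
        long w = topWeight(n);
        long h = hash(text, 0, n);
        for (int i = 0; ; i++) {
            // CRC32 is only computed when the cheap rolling hash already agrees.
            if (h == target && crc32(text, i, n) == targetCrc) {
                return i;
            }
            if (i + n >= text.length()) {
                return -1;
            }
            h = roll(h, text.charAt(i), text.charAt(i + n), w);
        }
    }

    public static void main(String[] args) {
        System.out.println(find("brown", "the quick brown fox"));
    }
}
```

Even with both checks agreeing, a collision is still possible in principle, which is why a stronger second hash is the safer choice for real data.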
Until next time!

Sunday, December 15, 2013

Java/tech stuff I found on the internet (Dec 2013 edition)

Networking and big data:

Java/JVM perf:
Java memory model + arrays + visibility/ordering:
Happy holidays!

Sunday, November 24, 2013

Analyzing large Java heap dumps when Eclipse Memory Analyzer (MAT) UI fails

If you find yourself trying to analyze a big heap dump (20-30GB) downloaded from your production server to your staging/test machines, only to find out that X-over-SSH is too slow, then this article is for you.

As of Nov 2013, we have 2 options - Eclipse MAT and a hidden gem called Bheapsampler.

Option 1:
Eclipse Memory Analyzer is obviously the best tool for this job. However, trying to get the UI to run remotely is very painful. Launching Eclipse and updating the UI is an extra load on a JVM that is already busy analyzing a 30GB heap dump. Fortunately, there is a script that comes with MAT to parse the heap dump and generate HTML reports without ever having to launch Eclipse! It's just that the command line option is not well advertised.

Command line heap analysis using Eclipse MAT:

Assuming Eclipse MAT is installed and we are inside the mat/ directory, raise the heap settings in MemoryAnalyzer.ini (the -Xmx value under -vmargs) so that MAT has enough heap to handle large dumps.


Run MAT against the heap dump:

    ./ParseHeapDump.sh ../today_heap_dump/jvm.hprof

This takes a while to execute and generates indices and other files to make repeated analysis faster. Then use the indices created in the previous step and run a "Leak suspects" report on the heap dump.

    ./ParseHeapDump.sh ../today_heap_dump/jvm.hprof org.eclipse.mat.api:suspects

The output is a small and easy to download jvm_Leak_Suspects.zip. This has HTML files just like the MAT Eclipse UI. It can be easily SCP'ed/emailed around.

Other report types are possible, such as org.eclipse.mat.api:overview and org.eclipse.mat.api:top_components.

More details - http://wiki.eclipse.org/index.php/MemoryAnalyzer/FAQ.

Option 2:
Bheapsampler (http://dr-brenschede.de/bheapsampler) is something I chanced upon. It is a sampling heap dump reader, so it works for very large heap dumps where MAT sometimes fails. Being a sampling reader, its output is a little imprecise, but it helps a great deal when you have nothing else. The tool seems to be closed source and is very sensitive to heap dump corruption.

As an aside, here's something that might be useful for taking the initial heap dump quickly - https://blogs.atlassian.com/2013/03/so-you-want-your-jvms-heap/.