Ashwin Jayaprakash's Blog: Rsync in Java - a quick (and partial) hack

Sunday, January 19, 2014

Rsync in Java - a quick (and partial) hack

Over the years I've been (mildly) fascinated by how various version control tools and file backup utilities work. Especially the core algorithm that drives many of these file send/diff/backup/de-duplication programs.

Rsync being the most widely used tools and the basis for many extensions, I naturally tried to wrap my head around it's working. But I thought the details were somewhat hazy. Maybe it was just me but I was looking for a simpler, clearer implementation of the algorithm and not a fully functioning program.

Recently, I gave it another shot. I waded through some of the material available on the interwebs and bravely set out to implement it to see how much of I had understood.

So, here is the basic implementation in Java. It may not be a faithful implementation of the paper but the gist is:

Create a summary out of fixed blocks of input text (original)
Use these blocks as reference against another text (modified)

This modified text is slightly different from the original text
Hence the assumption that the original text can be transformed to the modified text without having to send the entire modified text back

The modified text can now be transformed into a combination of:

References to those original blocks where there were no changes
And any differences as simple text

The code is available here and the same is embedded at the bottom of this post.

Some notes on the implementation:

It only handles Java Strings
It uses a combination of Rabin-Karp rolling hash for quick, incremental hashing of blocks and CRC32 for hash conflict resolution. In reality a much more robust hash should be used instead of CRC32
It assumes that the list of generated "blocks" is available on the other side to generate the patch. In reality there has to be a more clearly defined mechanism/protocol to exchange these blocks
The overall algorithm to identify common/repeating hashes should be smarted than this

References:

Code:
Until next time!

Ashwin Jayaprakash's Blog

Sunday, January 19, 2014

Rsync in Java - a quick (and partial) hack

0 comments:

Who is Ashwin Jayaparkash?

Disclaimer

Quick links

Blog topics

Blog archive

Blog feed (Atom)