Timothy W Macinta - Contract Software Development

 


Fast MD5 Implementation in Java(TM)


Java's Built-In MD5 Support

You don't need to use this Fast MD5 Implementation to get an MD5 hash in Java (though you are certainly welcome to). The standard edition of Java comes with MD5 support built in. You might want to use this Fast MD5 Implementation if one or more of the following applies:

For those of you who ended up on this page searching for how to calculate an MD5 hash in Java, here's the gist of how to do it (without exception checking):
    // Requires: import java.security.MessageDigest;
    MessageDigest digest = MessageDigest.getInstance("MD5");
    digest.update(data);  // data is a byte[] holding the bytes you want to hash
    byte[] hash = digest.digest();
You can then convert the hash into the familiar hex format (e.g., "d41d8cd98f00b204e9800998ecf8427e"), if you wish. If you don't know how to do that, the Fast MD5 Implementation can do it for you - download the code and then check out the tutorial.
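
If you would rather stay with the JDK alone, the hex conversion can be done by hand. The following is a minimal sketch, not code from the Fast MD5 distribution: the class name Md5HexExample and its two helper methods are made up for illustration, and the checked exception is wrapped on the assumption that your runtime ships an MD5 provider.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class (not part of the Fast MD5 distribution).
final class Md5HexExample {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Returns the 16-byte MD5 digest of data using the JDK's built-in support. */
    static byte[] md5(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            digest.update(data);
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the runtime provides MD5; treat its absence as a setup error.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Converts bytes to lowercase hex, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hash of empty input: d41d8cd98f00b204e9800998ecf8427e
        System.out.println(toHex(md5(new byte[0])));
        // Hash of "abc" (RFC 1321 test vector): 900150983cd24fb0d6963f7d28e17f72
        System.out.println(toHex(md5("abc".getBytes(StandardCharsets.US_ASCII))));
    }
}
```

Masking each nibble with 0x0f matters because Java bytes are signed; without it, bytes above 0x7f would index outside the lookup table.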

How Fast Is It?

Short answer: Much faster than any other Java implementation that I have tested and (surprisingly) even faster than the native, non-Java MD5 implementation on some systems.

Long answer: First of all, "fast" is a relative term here. The implementation of the MD5 message digest algorithm available on this page is written in Java and is fast compared with other implementations written in Java, both because the Java code itself is heavily optimized and because an optional native method makes it even faster on platforms that support it. How it compares with a sensible implementation written in a language that is compiled directly to machine code, such as C, depends heavily on how well the JIT compiler in your JVM compiles the code and on whether you are able to use the optional native method.

Here is a table detailing how long it took to checksum one particular file of 679,477,248 bytes on one particular Linux system using various combinations of MD5 code, run under Sun's JDK 1.4.1 where applicable (see below for more recent test results):

    Implementation                     system      user        elapsed     sys+user
    Default Java Impl                  6.423333    78.70833    213.7667    85.13167
    Default Java Impl Interpreted      7.756667    1669.12     2275.383    1676.877
    Fast MD5, No Native                6.121667    43.40667    215.365     49.52833
    Fast MD5 Interpreted, No Native    8.526667    826.3383    1060.125    834.865
    Fast MD5                           6.336667    28.345      212.8033    34.68167
    Fast MD5 Interpreted               6.43        27.895      213.1483    34.325
    md5sum Binary                      4.53        55.34667    211.2883    59.87667

Default Java Impl - The default MD5 implementation that ships with the JDK (attainable through java.security.MessageDigest).
Default Java Impl Interpreted - The default MD5 implementation that ships with the JDK (attainable through java.security.MessageDigest). The JVM was forced to run in interpreted only mode.
Fast MD5, No Native - The Fast MD5 implementation on this web page. Native methods were disabled.
Fast MD5 Interpreted, No Native - The Fast MD5 implementation on this web page. Native methods were disabled. The JVM was forced to run in interpreted only mode.
Fast MD5 - The Fast MD5 implementation on this web page. Native methods were enabled.
Fast MD5 Interpreted - The Fast MD5 implementation on this web page. Native methods were enabled. The JVM was forced to run in interpreted only mode.
md5sum Binary - The Linux native "md5sum" program distributed in the textutils-2.0.14-2.i386.rpm with Red Hat Linux 7.2.

The implementations above were compared by using the "time" program to measure roughly how long each one took. As the table shows, all the implementations took roughly the same amount of real (elapsed) time, except when the pure Java code ran in interpreted mode without the native method, in which case it took much longer. This indicates that file I/O was likely the bottleneck. Therefore, the MD5 implementation that ships with the JDK will be adequate in a large number of cases. On the other hand, my fast implementation will be useful when the underlying I/O is very fast (e.g., if the data to be hashed is being created on-the-fly in memory), the CPU is inherently slow, CPU cycles are scarce (e.g., many other programs are running at the same time), the Java VM being used does not provide an adequately fast JIT, or in other cases where it is desirable to minimize CPU usage.
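
To see how much the hashing itself costs once file I/O is out of the picture, you can time a digest over data that is already in memory. The sketch below is only an illustration, not the harness used for the table above: the class name InMemoryMd5Timing, the buffer size, and the round counts are arbitrary assumptions, and it measures the JDK's built-in MessageDigest with wall-clock time rather than the sys+user figures reported by "time".

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical micro-benchmark (not part of the Fast MD5 distribution).
final class InMemoryMd5Timing {

    /** Feeds the same buffer to MD5 `rounds` times and returns elapsed nanoseconds. */
    static long timeMd5Nanos(byte[] buffer, int rounds) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                digest.update(buffer);
            }
            digest.digest(); // finish the hash so the work cannot be skipped
            return System.nanoTime() - start;
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the runtime provides MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[1 << 20]; // 1 MiB of zeros as a stand-in workload
        timeMd5Nanos(buffer, 64);          // warm-up pass so the JIT can compile the hot loop
        int rounds = 256;                  // 256 rounds of 1 MiB = 256 MiB hashed
        long nanos = timeMd5Nanos(buffer, rounds);
        double mibPerSecond = rounds / (nanos / 1e9);
        System.out.printf("MD5 over %d MiB in memory: %.1f MiB/s%n", rounds, mibPerSecond);
    }
}
```

Running the same program with the JVM's -Xint option forces interpreted mode and gives a feel for how much the result depends on the JIT.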

The very surprising (for me) thing to note is that the Fast MD5 implementation uses less CPU time (sys+user) than the native "md5sum" binary even when the native methods aren't used. Oddly enough, the same did not hold true on Windows, where the native binary was faster in all cases (perhaps the underlying I/O implementation in the Linux JVM is more efficient than its counterpart in the Windows JVM).

On November 19, 2009 I reran the most important of the tests using Java build 1.6.0_17-b04 on Windows Vista using an even larger file. A speed advantage still existed. The Fast MD5 implementation yielded an average 26% savings in sys+user time.

Making It Even Faster

Thanks to Benjamin "Quincy" Cabell V for sponsoring the addition of an optional native method to the MD5 package. Now that the native method has been added, the remaining optimizations that could be made to this package are no longer as dramatic as they used to be. If you try my optimized implementation and decide that you still need something even faster, try the following:

The Code

This implementation has been derived from code originally written by Santeri Paavolainen and was retrieved from his website at http://www.cs.hut.fi/~santtu/java/ . The initial changes that I (Tim Macinta) made were heavy optimization of the code, some bug fixes, and the replacement of Mr. Paavolainen's test suite (which was under a separate license) with an expanded test suite of my own. I also moved the code into a Java package because it was originally distributed without its own package. I attempted to contact Mr. Paavolainen to see if he was interested in incorporating my changes into his distribution, but after about a week I had not heard back from him, so I decided to make my changes available on my own website.

Browse the Javadoc documentation.

Highlights of the distribution contents:

Please Link To This Page

If you download the code and find it useful, please link to http://twmacinta.com/myjava/fast_md5.php to show your appreciation. This is not a requirement of the license, it is just a friendly request. This helps me because it helps spread the word about my contract software development services. It helps you because the more interest my MD5 library receives, the more it will be improved. Also, the more interest it receives, the more I will be encouraged to release other libraries I have sitting around. Linking to this page is a free, easy way to show your support.

Download Version 2.7.1

The code in the following download is provided under the GNU LGPL version 2.1, or (at your option) any later version. If you want to discuss licensing under different terms, contact me (distributing the code under a license that is not compatible with the GNU LGPL would require a full rewrite of the code because the code was derived from code under the GNU LGPL).

 

J2ME MIDP/CLDC Versions (Preliminary)

Note: you may be able to use the MD5 hashing support that is built into Java even in J2ME MIDP/CLDC, if your target platforms all come with the SATSA-CRYPTO Optional Package and the MD5 algorithm. Usage should be very close to the example above for Java's Standard Edition (the call to digest() will require parameters, in this case). If you cannot guarantee that all of your target platforms will have the necessary support, you can simply use the Fast MD5 Implementation instead.
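
As a rough illustration of a digest() call that takes parameters, the sketch below uses the three-argument digest(byte[] buf, int offset, int len) method, which writes the hash into a buffer you supply. It is written and runnable against Java SE, where that method also exists; that the SATSA-CRYPTO version looks the same is an assumption you should check against your platform's documentation. The class name DigestIntoBuffer is made up for this example.

```java
import java.security.DigestException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class (not part of the Fast MD5 distribution).
final class DigestIntoBuffer {

    /** Computes MD5 of data, writing the result into a caller-allocated buffer. */
    static byte[] md5(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            digest.update(data, 0, data.length);

            byte[] hash = new byte[16]; // an MD5 hash is always 16 bytes long
            int written = digest.digest(hash, 0, hash.length);
            if (written != hash.length) {
                throw new IllegalStateException("unexpected digest length: " + written);
            }
            return hash;
        } catch (NoSuchAlgorithmException | DigestException e) {
            // Assumption: MD5 is available and the 16-byte buffer is large enough.
            throw new IllegalStateException("MD5 digest failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] hash = md5(new byte[0]);
        // First byte of the MD5 of empty input is 0xd4
        System.out.println(Integer.toHexString(hash[0] & 0xff));
    }
}
```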

There is no official J2ME MIDP/CLDC version of the Fast MD5 Implementation yet, but you can probably use the distribution in J2ME anyway, with a tiny bit of effort. Multiple people have been kind enough to contribute changes back to make the library work in J2ME. I haven't personally tested those changes or integrated them with the build process yet, so I am providing them as a convenience until official support is added. Alternatively, you can use the regular distribution and strip it down to work in J2ME. The options in detail are:

Upcoming Versions

I generally release new versions of the Fast MD5 distribution either when I add a major new feature, when I have more than a few minor improvements accumulated, or when there are bugs found. You can receive notifications of new releases by subscribing to the project at freshmeat.net. Looking toward the future, here are some other things that I think might be nice to add to the distribution, but which I'll have to weigh against my free time and motivation (and you can always motivate me to add your own favorite feature by contacting me about contract development):

Quick Tutorial

Do you want to calculate the MD5 hash of a file from within your Java program? Here's a quick way to do it with the Fast MD5 Distribution (just set filename to the name of the file that you want to checksum):

    String hash = MD5.asHex(MD5.getHash(new File(filename)));
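
For comparison, here is roughly what the same task looks like with only the JDK's built-in MessageDigest. This is a sketch for illustration rather than a description of how MD5.getHash() works internally: the class name FileMd5, the method name md5Hex, and the 64 KB buffer size are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical JDK-only counterpart to the one-liner above.
final class FileMd5 {

    /** Streams the file through MD5 in chunks and returns the hash as lowercase hex. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the runtime provides MD5.
            throw new IllegalStateException("MD5 not available", e);
        }

        byte[] buffer = new byte[64 * 1024]; // arbitrary chunk size
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n); // hash only the bytes actually read
            }
        }

        StringBuilder hex = new StringBuilder(32);
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass the name of the file to checksum as the first argument.
        System.out.println(md5Hex(Paths.get(args[0])));
    }
}
```

Reading in chunks keeps memory use constant regardless of file size, which matters for files as large as the one used in the timing tests.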

Do you want to calculate the MD5 hash of a string from within your Java program? Here's a quick way to do it with the Fast MD5 Distribution (just set myString to the string that you want to checksum):

    MD5 md5 = new MD5();
    md5.Update(myString, null);
    String hash = md5.asHex();

Note that the above code will convert the string to an array of bytes using the ISO-8859-1 character encoding method. You can specify a different method using the second parameter to Update() or you can leave out the second parameter entirely to use the target platform's default encoding method.
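
The choice of encoding matters because MD5 operates on bytes, not characters: the same string can hash differently under two encodings as soon as it contains a character that they represent with different bytes. The sketch below demonstrates this with the JDK's built-in MessageDigest rather than the Fast MD5 classes; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demonstration class (not part of the Fast MD5 distribution).
final class EncodingAffectsHash {

    /** Encodes text with the given charset, then returns the MD5 of those bytes. */
    static byte[] md5(String text, Charset charset) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(charset));
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the runtime provides MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the text hashes to the same value under both encodings. */
    static boolean sameHash(String text, Charset a, Charset b) {
        return Arrays.equals(md5(text, a), md5(text, b));
    }

    public static void main(String[] args) {
        // Pure ASCII text is encoded identically by ISO-8859-1 and UTF-8.
        System.out.println(sameHash("cafe", StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
        // U+00E9 is one byte in ISO-8859-1 but two bytes in UTF-8, so the hashes differ.
        System.out.println(sameHash("caf\u00e9", StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

If hashes must match across machines, it is safer to name the encoding explicitly than to rely on the platform default.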

Software Using This Library

Want to add your software to this list? Contact me.

Acknowledgments

Thanks to all the contributors:

Change Log


Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

All Pages, Images, and Other Content Copyright © 1997 - 2024 Timothy W Macinta , except where noted. All Rights Reserved. The "Tim Macinta Now" button may be used on web pages that are external to this site to provide a link back to this page. For usage guidelines on KMFMS artwork please see http://www.kmfms.com/usage-guide.html.