There are already tools that look for duplicate files, such as fdupes or mp3dup. These tools are written in C and work the same way.
These 2 tools are approximately the same, and the differences between them are:
The main limitation of this kind of tool appears when, for instance, you burn some of your files to CD-ROM to save space on your hard disk. After that, these tools can't tell you whether a file on your hard drive is already on one of your CDs. Nodupes can.
To bypass this limitation, nodupes can store the size and the md5sum of the files on your CD. Nodupes can handle various types of media: CD-ROMs, audio CDs, diskettes, hard drives, network drives, DVDs, or any other media that can be accessed as a directory.
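To illustrate the idea of storing the size and md5sum of every file on a media, here is a minimal sketch in Python. It is not the actual nodupes code: the function names (md5_of_file, catalog_media, save_catalog), the record fields and the JSON catalog format are all assumptions made for this example.

```python
import hashlib
import json
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def catalog_media(label, root):
    """Walk `root` (e.g. a mounted CD) and return one record per regular file.

    `label` is a hypothetical name you give the media, such as "CD-001".
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            records.append({
                "media": label,
                "path": os.path.relpath(path, root),
                "size": os.path.getsize(path),
                "md5": md5_of_file(path),
            })
    return records


def save_catalog(records, catalog_path):
    """Write the records to a JSON file (the format is an assumption of this sketch)."""
    with open(catalog_path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
```

Once the catalog is saved, the media itself no longer needs to be in the drive: only the size and md5sum are consulted later.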
You can at last find out whether your downloaded files (videos/mp3/divx/pictures) are already on one of your media. Of course, nodupes also tells you which media the files are on.
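A lookup of this kind could be sketched as follows, in Python. Again this is only an illustration under assumptions: the record layout (media, path, size, md5) and the function names file_signature, build_index and locate are invented for the example and are not taken from nodupes itself.

```python
import hashlib
import os
from collections import defaultdict


def file_signature(path, chunk_size=1 << 20):
    """Return the (size, md5 hex digest) pair used to identify a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def build_index(records):
    """Map each (size, md5) signature to the list of (media, path) where it was seen."""
    index = defaultdict(list)
    for record in records:
        key = (record["size"], record["md5"])
        index[key].append((record["media"], record["path"]))
    return index


def locate(path, index):
    """Return every (media, path) already holding a file with the same signature."""
    return index.get(file_signature(path), [])
```

For a freshly downloaded file, locate returns an empty list when the file is unknown, or the list of media (and the path on each) where a file with the same size and md5sum was recorded.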
Another limitation of fdupes and mp3dup is that they only look for duplicate files. They cannot find duplicate directories. Nodupes will be able to, although it cannot for the moment.
Nodupes uses the size and the md5sum of each file to detect duplicate files. Fdupes and mp3dup make an extra diff to ensure the files really are duplicates; nodupes can't, because some of the files may not be physically available to diff (if they are on your CD, for instance). So you can't fully trust nodupes: it can be wrong and report a false duplicate. But in my opinion this case is very rare. If you find 2 different files with the same md5sum and the same size, please email me.
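The difference between the two approaches can be shown with a short Python sketch. It is an assumption-laden illustration, not the code of nodupes, fdupes or mp3dup: find_duplicates groups files by size and md5sum only, and confirm_identical is a hypothetical helper showing the extra byte-for-byte check that is only possible when both files are reachable.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def signature(path, chunk_size=1 << 20):
    """Return (size, md5 hex digest) for a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def find_duplicates(paths):
    """Group paths sharing the same size and md5sum; keep groups of 2 or more."""
    groups = defaultdict(list)
    for path in paths:
        groups[signature(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def confirm_identical(path_a, path_b):
    """Byte-for-byte comparison, usable only when both files are accessible."""
    return filecmp.cmp(path_a, path_b, shallow=False)
```

When one of the two files lives only in a stored catalog (a CD that is not in the drive), confirm_identical cannot be called, which is why a match on size and md5sum has to be accepted as is.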
Nodupes should have these features in the more or less distant future: