Use hash codes to avoid file duplicates

censhare calculates and manages a hash code for each file on the file system.

censhare calculates hash codes for storage items and manages them in the database.

When a new file is stored, its hash code is used to create the file name on the file system. As a result, repeatedly opening and saving an asset creates no additional files, which allows systems with many assets to save a great deal of storage space on their file server.
The hash code is stored in a new field "hashcode" in the table "storage_item". Whenever a new file is created on the file system, its hash code is generated at the same time. Both the local file system and remote file servers are supported.
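
The idea of deriving the file name from the hash code can be illustrated with a minimal Python sketch. It is not censhare's implementation: the function name store_by_hash, the flat directory layout, and the use of the plain hexadecimal hash as the file name are assumptions made for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def store_by_hash(content: bytes, storage_dir: Path) -> Path:
    """Store content under a file name derived from its SHA-1 hash code.

    Hypothetical helper: the naming scheme is an assumption, not the
    layout censhare actually uses.
    """
    hashcode = hashlib.sha1(content).hexdigest()
    target = storage_dir / hashcode
    # Identical content maps to the same name, so nothing new is written.
    if not target.exists():
        target.write_bytes(content)
    return target


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp)
    first = store_by_hash(b"layout version 1", storage)
    second = store_by_hash(b"layout version 1", storage)  # same content saved again
    file_count = len(list(storage.iterdir()))
    same_path = first == second

print(file_count)  # 1
print(same_path)   # True
```

Saving the same content a second time resolves to the existing file, so the number of files on the file system does not grow.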

Example

With the hash code implementation, duplicate files are avoided on the file server from the outset, because imported assets with identical file content are filtered out. This is especially helpful when you merge data from two servers by migrating the data of one into the database of the other. If an asset cannot be found through its hash code during the migration, the hash calculation for that asset is interrupted and a message is written to the log. The operation then continues normally with the remaining assets.
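
The filtering behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the censhare migration code: the function filter_new_files, the logger name, and the interpretation of "cannot be found" as a missing file are all hypothetical.

```python
import hashlib
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("hash_migration")  # hypothetical logger name


def filter_new_files(paths, known_hashes):
    """Return only the paths whose content is not yet known.

    Missing files are logged and skipped; processing continues
    with the remaining paths.
    """
    imported = []
    for path in paths:
        try:
            hashcode = hashlib.sha1(path.read_bytes()).hexdigest()
        except FileNotFoundError:
            logger.warning("File not found, hash calculation skipped: %s", path)
            continue
        if hashcode in known_hashes:
            continue  # identical content already present: filter it out
        known_hashes.add(hashcode)
        imported.append(path)
    return imported


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.txt").write_bytes(b"same content")
    (base / "b.txt").write_bytes(b"same content")   # duplicate of a.txt
    (base / "c.txt").write_bytes(b"other content")
    candidates = [base / "a.txt", base / "b.txt", base / "missing.txt", base / "c.txt"]
    imported_names = [p.name for p in filter_new_files(candidates, set())]

print(imported_names)  # ['a.txt', 'c.txt']
```

The duplicate b.txt is filtered because its hash code is already known, and the missing file only produces a log message without stopping the run.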

Configuration

censhare uses the Secure Hash Algorithm SHA-1 to calculate the hash codes for the files. It is highly improbable that two files with different content will ever receive the same code by chance.
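
For readers unfamiliar with SHA-1, the following sketch shows how such a hash code can be calculated with Python's standard hashlib module. The helper name sha1_hashcode and the chunk size are assumptions for illustration; how censhare reads the file data internally is not documented here.

```python
import hashlib
import io


def sha1_hashcode(stream, chunk_size=64 * 1024):
    """Calculate the SHA-1 hash code of a binary stream, reading it in chunks."""
    digest = hashlib.sha1()
    # Read until the stream returns an empty bytes object (end of data).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


code_abc = sha1_hashcode(io.BytesIO(b"abc"))
code_abc_again = sha1_hashcode(io.BytesIO(b"abc"))
code_abd = sha1_hashcode(io.BytesIO(b"abd"))

print(code_abc)                    # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(code_abc))               # 40 hexadecimal characters, i.e. 160 bits
print(code_abc == code_abc_again)  # True: identical content, identical code
print(code_abc == code_abd)        # False: different content, different code
```

Reading in chunks keeps memory use constant even for large files, and the result depends only on the content, not on the file name or location.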

When you set up the extended "Find assets duplicates" module under "Asset duplicate" in the Admin-Client, the hash code can be selected as a feature to compare.