Improve the performance of the CDB by adjusting the compression level and index configuration.


Compression level & index rebuild

The rebuild of the censhare database (CDB) is around 50 percent faster than before. This is achieved by reducing the footprint of the indexes by about 50 percent, which means lower memory consumption and smaller file sizes.
The admin module "Embedded database statistics" has a clearer layout and provides additional information; its output is now formatted via XML/XSLT. It is still available in the Tools menu (gear icon) of the Admin-Client. In addition, a separate module called "Embedded database data analysis" now provides a precise analysis of the resource requirements of the data. Because the list is sorted in descending order, you can easily identify the heavyweights in the CDB. From left to right, the columns show the key, the name of the feature, the total number, the total size, the number of entries in the cache, and the amount of memory they occupy (cache size).
The CDB now uses separate caches for temporary and persistent nodes and data. The new setting "cleaner-max-files-to-evict-per-run" limits the number of items released per transaction. Together, these changes relieve the cleanup process. Collation keys are now generated with ICU (International Components for Unicode) and compressed, which may reduce the size of the CDB.
Monitoring is more precise because the actual memory footprint is now tracked in bytes, in addition to the number of entries in the database. This gives a better overview of how heavily the files are utilized: the value is now calculated exactly and is no longer estimated from the number of entries.
The translation memory now uses a separate CDB index to search for exact matches, in addition to the regular database for fuzzy search.
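The effect of a per-run eviction limit can be illustrated with a small sketch. This is not the CDB cleaner itself: the function name evict, the capacity value, and the file names are all hypothetical, and the sketch only assumes that the limit caps how many items are released in one pass.

```python
from collections import OrderedDict

def evict(cache: OrderedDict, capacity: int, max_evict_per_run: int) -> list:
    """Release the oldest entries until the cache fits, but never more than the cap per run."""
    evicted = []
    while len(cache) > capacity and len(evicted) < max_evict_per_run:
        key, _ = cache.popitem(last=False)  # oldest entry first
        evicted.append(key)
    return evicted

# Hypothetical cache with ten entries and room for only four.
cache = OrderedDict((f"file-{n}", n) for n in range(10))
first_run = evict(cache, capacity=4, max_evict_per_run=3)
second_run = evict(cache, capacity=4, max_evict_per_run=3)
print(first_run, second_run, len(cache))
```

With a cap of three, the cleanup work is spread over two runs instead of releasing six entries at once, which keeps each individual run short.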

Enhanced synchronization

Automatic synchronization runs every hour. The sync process can also be triggered manually and is much faster than before, since only asset IDs and CCN numbers are compared between the Oracle database and the CDB.
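The idea of comparing only asset IDs and CCN numbers can be sketched as follows. This is a minimal illustration, not the censhare implementation: the function plan_sync, the dictionary layout, and the sample values are assumptions made for the example.

```python
def plan_sync(source: dict, index: dict) -> dict:
    """Compare {asset_id: ccn} maps and return the asset IDs that need work."""
    missing = sorted(a for a in source if a not in index)
    stale = sorted(a for a in source if a in index and index[a] != source[a])
    orphaned = sorted(a for a in index if a not in source)
    return {"add": missing, "update": stale, "remove": orphaned}

# Hypothetical sample data: asset ID -> CCN on each side.
oracle_rows = {101: 3, 102: 7, 103: 1}
cdb_rows = {101: 3, 102: 6, 104: 2}

plan = plan_sync(oracle_rows, cdb_rows)
print(plan)
```

Because only two small values per asset are compared, unchanged assets (here asset 101) are skipped without loading or comparing their full content.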

Embedded Database Data Analysis

The module displays a summary of the data set of the asset CDB and its current cache allocation.

The command is available to system administrators in the Admin-Client via the action menu. To make the command available in other censhare Clients or to specific groups or roles, or to assign a keyboard shortcut, change the settings in the module "Administration · Embedded database · Embedded database data analysis".

Comprehensive database data analysis

Configuration

Compression of the CDB files is optional and yields space savings of around 25 percent. The compression level is configured in the Admin-Client settings for the "Embedded Database" under "cdb-deflater-level".
The default is "0" (no compression). Values between "0" and "9" are allowed, identical to the compression levels for RMI-server connections.
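The trade-off between the levels can be explored with a short sketch. It assumes that "cdb-deflater-level" maps onto the standard Deflate levels 0 to 9, which the name suggests but which is not confirmed here; the sample payload is hypothetical, and real CDB files will compress differently.

```python
import zlib

# Hypothetical sample payload; real CDB files will compress differently.
payload = b"<asset id='1' name='example'/>" * 1000

def compressed_size(data: bytes, level: int) -> int:
    """Return the size in bytes of data after Deflate compression at the given level."""
    if not 0 <= level <= 9:
        raise ValueError("level must be between 0 and 9")
    return len(zlib.compress(data, level))

sizes = {level: compressed_size(payload, level) for level in (0, 1, 6, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes")
```

Level 0 stores the data without compressing it, so the output is slightly larger than the input. Higher levels generally trade more CPU time for smaller output, so it can be worth measuring with representative data before changing the setting.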

Maximum editing distance (fuzzy-max-distance and max-bitap-errors)

From release 5.4 onwards, the fuzzy parameters of the index search are optimized to avoid high memory consumption and improve performance. This section describes what has changed and how you can adjust the settings on your system.

Those parameters should be set for all features named censhare:text..., except for the quick search feature censhare:text.meta, because this feature has a virtual full-text configuration and no "real" full-text index configuration like the others.

Release note

Opt: #2724376/C4-41026 Change fuzzy search default to 1
Symptom: Fuzzy search parameter fuzzy-max-distance should be set to 1.
Cause: The default value of 2 causes too many results and reduces query performance.
Solution: Set the default values of fuzzy-max-distance and max-bitap-errors to 1.
Note: The values are changed in the default settings in insert-data.xml only, to avoid unexpected changes in search behavior at existing installations. It is recommended to change these settings in existing projects.
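Why a maximum distance of 2 returns so many more results than 1 can be seen in a small sketch. It uses a plain Levenshtein edit distance over a hypothetical word list; this is an illustration of the concept only, not the matching algorithm used by the CDB, and the function names are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def fuzzy_matches(query: str, vocabulary: list, max_distance: int) -> list:
    """Return all words within max_distance edits of the query."""
    return [w for w in vocabulary if edit_distance(query, w) <= max_distance]

# Hypothetical index vocabulary.
vocabulary = ["house", "horse", "mouse", "hose", "moose", "hours", "horses", "lease"]
within_1 = fuzzy_matches("house", vocabulary, 1)
within_2 = fuzzy_matches("house", vocabulary, 2)
print(len(within_1), within_1)
print(len(within_2), within_2)
```

In this small list, a distance of 1 keeps four words while a distance of 2 keeps seven. On a real index with many thousands of terms the candidate set grows far faster, which is consistent with the performance concern described in the release note.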

Versions 5.3.x and higher

In versions 5.x, the parameters for the full-text index fuzzy search are part of the feature. The default index settings of the features are read-only, but you can create a custom configuration with the small "+" button at the end of the dialog. This can even be a custom setting for a specific server. The available configuration settings are not new in censhare 5; they have only moved from the embedded database service configuration dialog to the feature definition. The intention was to keep all configuration concerning a feature definition in one place. The screens below show how to create a custom configuration. No further action is needed: neither a CDB rebuild nor a server restart is necessary.

1. Open the default configuration:



2. Set up a custom configuration: Use the same settings as in the default configuration, except for the two marked values.

Do this for all censhare:text... features except for censhare:text.meta.

There is an option to switch off the indexing for certain features. Furthermore, you can now prevent the values for word frequency and document length from being applied in a full-text index. censhare administrators should take this into consideration to prevent the memory requirements (database cache) and the database size from growing uncontrollably.

The "use-frequency" option is disabled only for the metadata (text.meta and text.name). For these two indices, the relevance has little to do with the ratio of hits and document length, so there are really no disadvantages, but rather better predictability of the results.

The censhare full-text index, in which "DocLength" is significantly more important, continues to use frequency.
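The difference between scoring with and without frequency can be sketched with a deliberately simplified relevance function. The formula below (term count divided by document length) is an assumption chosen for illustration and is not the relevance calculation used by censhare; the function name and sample texts are hypothetical.

```python
def score(term: str, text: str, use_frequency: bool) -> float:
    """Simplified relevance: term count over document length, or a plain hit/no-hit value."""
    words = text.lower().split()
    count = words.count(term.lower())
    if count == 0:
        return 0.0
    if not use_frequency:
        return 1.0  # every hit counts the same, regardless of document length
    return count / len(words)

# Hypothetical documents: a short name field and a longer body text.
short_name = "summer catalog"
long_body = "the catalog lists every product in the summer catalog " * 3

for label, text in (("name", short_name), ("body", long_body)):
    print(label, score("catalog", text, True), score("catalog", text, False))
```

With frequency enabled, the two-word name scores higher than the long body even though the body mentions the term more often. For short metadata fields this ratio says little about relevance, so a plain hit or no-hit value gives more predictable results there, while long full-text content benefits from the length normalization.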

Configuration

Edit the XML file "/app/services/assetstore/config.xml" and make the necessary or desired adjustments for your specific case.

<fulltext><index ... use-frequency="false"/>
<index name="..." disabled="true"/>

You can disable a feature index in the Admin-Client via the check box "Disable this index" under Configuration/Embedded Database.

Because of these changes to the database configuration, a rebuild of the CDB is required. The relevance of search results is now calculated slightly less accurately, but the difference is tolerable.

For more information, see this Wikipedia article.