Comparison of the different search methods in the censhare Client, including configuration practices.


Introduction

Most users start with the Quick Search because, as with most search engines, the results already appear while they are typing. The detailed Asset Search, however, is more suitable when users want to look for a combination of specific asset features, and it lets you make heavy use of Boolean operators.

Quick Search

In a standard configuration, the Quick Search is the text field next to the small magnifying glass at the top of the censhare window. Its main advantage is that it delivers search results quickly, a principle familiar from Spotlight search in macOS. The results are generated from the fast, buffered censhare database index (cdb) without a detour to the slower Oracle database.

Users also do not have to fill in a search form with search operators: all entered terms are combined with AND. However, the rules are less strict. When a search term yields no result, the results of the search terms already entered remain on display. With the simple-search method, all search terms that do not occur (including stop words) are ignored; otherwise, a single unmatched term would leave no search results at all. The Quick Search is intended for novice and advanced users alike.
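The combination rule described above can be sketched as follows. This is a simplified Python model, not censhare's implementation. The in-memory `INDEX` dictionary and the `quick_search` function are hypothetical, and fuzzy matching is left out so that only the "AND, but skip terms without hits" behaviour is visible.

```python
from __future__ import annotations

# Hypothetical inverted index: search term -> IDs of the assets containing it.
INDEX: dict[str, set[int]] = {
    "carola": {1, 2, 5},
    "person": {1, 3},
    "picture": {2, 4},
}


def quick_search(query: str) -> set[int]:
    """Combine all terms with AND, but ignore terms that match nothing."""
    result: set[int] | None = None
    for term in query.lower().split():
        hits = INDEX.get(term, set())
        if not hits:
            # Term does not occur in the index (e.g. a typo or a stop word):
            # skip it instead of letting it empty the whole result.
            continue
        result = set(hits) if result is None else result & hits
    return result if result is not None else set()


print(sorted(quick_search("person carola")))  # [1]
print(sorted(quick_search("carola the")))     # [1, 2, 5]  ("the" is ignored)
```

A strict AND search would return nothing for "carola the". The lenient variant keeps the hits of the terms that do occur.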

The Quick Search also has some drawbacks. Particularly with large databases, you may quite often receive too many similar or unwanted results. This is due to the fuzzy search, which is triggered only when the search terms produce no clear matches or too few matches. You can refine the search results in the filter tab on the right by clicking a filter type in the list, or narrow them by entering text in the search box of the 'Meta Data' filter area.
Moreover, it is possible to adapt the search behaviour in the censhare Admin Client to your needs (see below).

When it falls back on fuzzy search, the Quick Search may at first return too many or even confusing results. In this case, the filter list in the right pane comes in handy. If you still cannot find the expected assets, use the Asset Search to strictly limit the search results with operators. To switch from the Quick Search to the detailed Asset Search, run "Asset Query · Find Again" (Ctrl-R / cmd-R) to open a search dialog prepopulated with the last search terms.

Asset Search (detailed search)

You can open a detailed Asset Search window by clicking the large magnifying glass or by pressing the shortcut (Ctrl-F / cmd-F). This opens a dialog box that combines full-text fields, which use fuzzy search algorithms, with specific search fields. The specific fields do not accept combined search entries, but you can use the wildcards * (any number of characters) and ? (exactly one character) in them. In the "Advanced" tab you can combine search terms with the Boolean operators AND (default), OR and NOT by clicking the (+) icon. Full-text searches on the contents of documents can also be performed via the "Content" field.

Via "Recent searches" in the "Asset Query" menu you have access to the last ten searches of your current session. These entries are lost when the client is restarted. The last performed search can be saved permanently via the menu command "Asset Query · Save search as...". To rerun the previous search, you can also click the refresh icon to the right of the magnifying glass icon for the advanced search. If there are too many search results to show at once, the arrow buttons become active so that you can move to the next or previous page of the result list.

The search results can be narrowed down further in the "Advanced" tab, where the following operators are available:

<, ≦, =, >, ≧, !=, like(%), IN, IS NULL, NOT NULL


Examples:

Search query:
Name: "*ola"
Result:
"Carola"

A * wildcard in front of a search string returns all results that end with that string. This is also useful, for example, to find assets with a certain file extension.

Search query:
Name: "coc*"
Result:
"Coca-Cola Zero"
"Cocktail"

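Conversely, a * wildcard at the end returns all results that begin with the search string; as the example shows, the match is not case-sensitive. With a wildcard both before and after the search string, it is treated as a substring and can match anywhere in the name, so a much wider range of results is displayed.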
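The wildcard behaviour shown in these examples can be reproduced with a short Python sketch. It assumes that the whole name must match the pattern and that matching ignores case, as the "coc*" example suggests. `wildcard_to_regex`, `search_names` and the sample name "menu.pdf" are hypothetical and do not come from censhare.

```python
from __future__ import annotations

import re


def wildcard_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate * (any number of characters) and ? (exactly one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def search_names(pattern: str, names: list[str]) -> list[str]:
    regex = wildcard_to_regex(pattern)
    # fullmatch: the pattern has to cover the whole name, not just part of it
    return [name for name in names if regex.fullmatch(name)]


NAMES = ["Carola", "Caroline", "Coca-Cola Zero", "Cocktail", "menu.pdf"]

print(search_names("*ola", NAMES))   # ['Carola']
print(search_names("coc*", NAMES))   # ['Coca-Cola Zero', 'Cocktail']
print(search_names("*ola*", NAMES))  # ['Carola', 'Coca-Cola Zero']
print(search_names("*.pdf", NAMES))  # ['menu.pdf']
```

Because of the full-match rule, the position of the * decides whether the string is treated as a suffix, a prefix or a substring.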

Tips & Tricks

With the default settings of the "Embedded Database", a fuzzy search is performed for search terms of 5 or more characters ("Minimum word length"), using a "Maximum editing distance" (maximum deviation) of 2 characters.
A Quick Search for "Carola" returns "Carola" and "Caroline", but also "Coca-Cola Zero". Why the latter? The fuzzy search algorithm is less strict than an exact match. It considers the string "Cola" similar to "Carola", because "Cola" differs from "Carola" by only two character steps (omitting 'a' and 'r'). The function weights all letters equally and does not take similar-sounding letters into account, which is why the Coca-Cola beverage is found as well. However, it is not shown as the top hit but appears toward the bottom of the ranking. Since "Caroline" has more characters than the search keyword, this asset even appears below "Coca-Cola Zero". Conversely, a Quick Search for "Coca-Cola Zero" returns neither "Carol" nor "Caroline". Only a search for "Cola" also lists "Carola".
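The "two character steps" between "Cola" and "Carola" correspond to the classic Levenshtein edit distance. In this measure, every insertion, deletion or substitution costs one step, regardless of how the letters sound. The following Python sketch computes it. It only illustrates the measure and assumes that censhare's "Maximum editing distance" is compared against a distance of this kind, which the text implies but does not spell out.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (case-insensitive)."""
    a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (0 if equal)
            ))
        previous = current
    return previous[-1]


MAX_EDITING_DISTANCE = 2  # default value mentioned in the text

distance = levenshtein("Carola", "Cola")
print(distance)                          # 2 (omit 'a' and 'r')
print(distance <= MAX_EDITING_DISTANCE)  # True: within the default threshold
```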

If you want to prevent "Coca-Cola Zero" from appearing in the Quick Search results, a search string such as "person meta carola" is a possible workaround that ensures only person assets are listed. Correspondingly, a search term like "picture Carol" mainly returns image assets, and "Layout Carol" returns a list of layout documents. Combining the asset name with the asset type makes the results of the Quick Search more predictable. Despite this narrowing, you may still receive unexpected results, because the Quick Search also searches the metadata. If the text description of an image asset contains one of the search keywords, that asset appears in the results too. This should not bother you, because the asset you were looking for is among the results as well.
If for some reason you need to search directly in the Oracle database, you can activate this temporarily, but only in the Administration Mode of the Client, under "General" in the Client Preferences.

Keyword search

Suppose you are looking for a photo that was tagged with the keyword "pizza" and you want to use the advanced Asset Search dialog. There you can enter the term in the "Keywords" field with wildcards at the beginning and at the end, i.e. "*pizza*". If you want to search for two keywords and work without a Keyword Tree, you have to enter "*Pizza* *Pasta*" to find all categorized Italian restaurants in your system. This is an easy trap to fall into: in the Asset Search, a keyword search without wildcards only works if you use a Keyword Tree, in which the keywords are not separated by commas. A Keyword Tree is defined in the Admin Client under Master Data via "censhare:keyword". In the Quick Search you do not have to follow these strict rules and can enter keywords directly.

Search sharper or fuzzier

If you perform a Quick Search for "Maier" (a common German name with many different spellings) with the default database settings, person assets with the surnames Meyer, Mayer or Meier are not found reliably. You would have to know exactly how the name is spelled. The default setting only takes the word length and similar spellings such as "Mayr" into account. However, the fuzzy search can also solve this "Maier" problem with the database adjustment described below under "Configuration".

Configuration

In the Admin Client, open "Configuration · Services · Embedded database" and scroll down to "Full text index", where there is a separate setting for each index (Name, Meta Data).
Words with a minimum length of two characters can be indexed for the search. The threshold of the fuzzy search is configured with the parameters "Maximum editing distance" and "Maximum errors"; both are usually set to 2.

Adjusting the N-gram size is generally not recommended, as it brings no benefit for searching. The N-gram size determines the size of the individual fragments that are stored in memory. The "Minimum word length" refers to short words, usually with two or three characters. Changes to either parameter only take effect after the full-text index (cdb) has been rebuilt, and both relate primarily to how the data is stored rather than to the search itself. You can make the Quick Search noticeably sharper or fuzzier with the fields "Maximum editing distance" and "Maximum errors". A value below 2 is not recommended here, because non-ASCII characters in the search string, such as umlauts or accents, already cause an error of 1. The maximum allowed number of errors is 3.
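The following minimal Python sketch illustrates what the N-gram size controls. It splits a word into overlapping character fragments of length n. How censhare actually builds and stores its n-grams is not described in this text. The `char_ngrams` helper is hypothetical, and the fragment sizes used below are arbitrary examples.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Split a word into overlapping character fragments of length n."""
    word = word.lower()
    if len(word) <= n:
        return [word]  # assumption: short words are kept as a single fragment
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_ngrams("Carola", 2))  # ['ca', 'ar', 'ro', 'ol', 'la']
print(char_ngrams("Carola", 3))  # ['car', 'aro', 'rol', 'ola']
```

A larger n yields fewer but longer fragments per word, so the stored data changes. This fits the statement that such changes only take effect after the full-text index has been rebuilt.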


You can turn stop words on and off in the index. Activating the option "Use Stop Words" increases the number of results, since stop words occur in almost any context. As a result, the usefulness of the results deteriorates in most scenarios.
In practice, most users actually want more results, so for them the Quick Search is already set too sharp. If you consider making the Quick Search sharper than the standard, keep in mind that this also hides potentially meaningful results from users.
To find "Maier" and most of its different spellings in the metadata, the maximum deviation of the fuzzy search must be increased beyond the standard setting so that more results are returned.

With this change compared to the standard, the Quick Search finds the surname "Maier" in more variants with the letters a, e, i and y.

An adapted fuzzy search

In the next example, a person asset named "Renée Émile Malô" was created. These French first names, and the surname as well, can be spelled in different ways. In addition, anyone who does not know that the name is spelled with accent marks should still be able to find it. After the adjustment described above, this asset is also found with different spellings. The Quick Search returns it even if the user enters just "Rene" instead of "Renée" or mistakenly types "Emil" instead of "Émile". The surname is now even found with the search string "Marlo", which amounts to two errors (r, o). A search for "Marlow" also lists "Malô" once "Maximum errors" is set to 3.
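The error counts in this example can be checked with a Levenshtein edit distance, which counts one step per inserted, deleted or substituted character. The Python sketch below first normalizes both strings to Unicode NFC. This makes an accented letter such as "é" or "ô" a single character that differs from its plain counterpart. The sketch illustrates the arithmetic under that assumption and is not censhare's actual matching code.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance on NFC-normalized, lower-cased strings."""
    a = unicodedata.normalize("NFC", a).lower()
    b = unicodedata.normalize("NFC", b).lower()
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                        # delete
                               current[j - 1] + 1,                     # insert
                               previous[j - 1] + (char_a != char_b)))  # substitute
        previous = current
    return previous[-1]


for typed, stored in [("Rene", "Renée"), ("Emil", "Émile"),
                      ("Marlo", "Malô"), ("Marlow", "Malô")]:
    print(f"{typed} -> {stored}: {levenshtein(typed, stored)}")
# Rene -> Renée: 1
# Emil -> Émile: 2
# Marlo -> Malô: 2
# Marlow -> Malô: 3
```

With a maximum of 2 errors, "Marlo" still matches, while "Marlow" needs 3. This corresponds to raising "Maximum errors" to 3 as described above.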

The weighting factors of the individual steps will influence the sorting of the results:
     relevance-factor-prefix="0.9"
     relevance-factor-distance0="0.8"
     relevance-factor-distance-step="0.5"

These three parameters are not available in the Admin Client interface, but if necessary you can access them directly in the XML code.
Full-text index search (on the content of documents) can be disabled in a separate dialog if desired. This option is available under "Services · Configuration · Fulltext Search".

Before making any changes to the parameters above, you should test your search settings on a lab system and not on the production environment.

Customize the Quick Search

The Quick Search can also be customized to your needs. To do so, edit the file javaclient-app-def-actions.xml, go to the action 'asset-quick-search' and extend it with XSLT. This allows you to remodel the entire search. For example, with the following code sample only images are found.

<action key="asset-quick-search" title="${find}"
        class="com.censhare.client.javaclient.uiactions.AssetQuickSearchUIAction"
        icon-key="find-flag">
  <!-- min-chars / delay-ms: presumably the minimum input length and the typing delay (ms) before the search fires -->
  <params min-chars="2" delay-ms="400">
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
                    xmlns="http://www.w3.org/1999/xhtml">
      <!-- $search: presumably the string entered in the Quick Search field -->
      <xsl:param name="search"/>
      <xsl:param name="system"/>
      <xsl:template match="/">
        <query type="asset" xmlns:corpus="http://www.censhare.com/xml/3.0.0/corpus">
          <!-- full-text match of the search string against the asset metadata -->
          <condition name="censhare:text.meta" value="{$search}"/>
          <!-- additionally restrict the result to assets of type "picture." -->
          <condition name="censhare:asset.type" value="picture."
                     test="{$system/system/@timestamp}"
                     test2="{action/params/@min-chars}"/>
        </query>
      </xsl:template>
    </xsl:stylesheet>
  </params>
</action>