"Google Cloud Natural Language" is a cloud service by Google for text analysis. censhare uses this service to analyze text content stored in your system. As a result, it returns the content categories and entities like persons or companies that are relevant to the text.


Context

The results of Google Cloud Natural Language are shown in the "Info" tab of the selected text content asset in the censhare Client.

Prerequisites

Google Cloud Natural Language is a service provided by Google. Using this service may incur additional costs that Google invoices directly. censhare has no influence on or control over these costs and cannot be held responsible for them.

  • For asset automation, the automatic server action for Google Natural Language must be configured and executed regularly on the system.

  • For manual execution, the manual server action must be configured.

  • The Google Cloud Natural Language functionality is only available for certain text content assets.

Introduction

Google Cloud Natural Language is a cloud service for text analysis. censhare uses this service to find content categories in a text. In addition, Google Cloud Natural Language returns a list of commonly known entities that are mentioned in the text, for instance public persons or companies. If Google finds a Wikipedia web page for an entity, censhare links it accordingly.

censhare analyzes the following text content assets:

  • "Text" assets with the following file content: plain, ICML or XML text

  • "Text" assets with DOCX file content

  • "Image" assets with the JPEG MIME type

  • "PDF" assets with the PDF MIME type

The analysis can be triggered automatically by the server automation or executed manually. The results are presented in the "Info" tab of the censhare Client for the respective text content asset.

The "More information" section of the "Info" tab shows a list of the content categories found. For each content category, it also shows the confidence value. Calculated by Google, the confidence value indicates how certain it is that the content category matches the analyzed text.

For the entities found, a similar value is used: the salience. Calculated by Google, the salience value represents the importance of the entity for the text.
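To make the two values concrete, the following minimal sketch reads categories with their confidence and entities with their salience from dictionaries shaped like the JSON that the public Google Cloud Natural Language REST methods `classifyText` and `analyzeEntities` return. The sample values are made up, and the helper names `read_categories` and `read_entities` are hypothetical. This is an illustration only, not a description of how censhare processes the results internally.

```python
# Illustrative responses shaped like the Google Cloud Natural Language REST output.
# All values are made up for this example.
classify_response = {
    "categories": [
        {"name": "/Business & Industrial", "confidence": 0.31},
        {"name": "/Computers & Electronics/Software", "confidence": 0.92},
    ]
}
entities_response = {
    "entities": [
        {"name": "text analysis", "type": "OTHER", "salience": 0.08, "metadata": {}},
        {
            "name": "Google",
            "type": "ORGANIZATION",
            "salience": 0.71,
            "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Google"},
        },
    ]
}


def read_categories(response):
    """Return (category name, confidence) pairs, highest confidence first."""
    pairs = [(c["name"], c["confidence"]) for c in response.get("categories", [])]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def read_entities(response):
    """Return (name, type, salience, Wikipedia URL or None), most salient first."""
    rows = [
        (e["name"], e["type"], e["salience"], e.get("metadata", {}).get("wikipedia_url"))
        for e in response.get("entities", [])
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for name, confidence in read_categories(classify_response):
    print(f"category {name}: confidence {confidence}")
for name, entity_type, salience, wiki_url in read_entities(entities_response):
    print(f"entity {name} ({entity_type}): salience {salience}, Wikipedia: {wiki_url}")
```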

Depending on the text, Google Cloud Natural Language may find many content categories or entities. Not all of them are necessarily relevant for your use case, especially if their confidence or salience values are very low. For this reason, censhare applies thresholds for the confidence and salience values and filters out results that are lower than the defined values. As a result, the content asset page may show fewer results than Google delivers. The confidence and salience thresholds are defined by an administrator. Contact your administrator if you have further questions.
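The threshold filtering can be pictured with a minimal sketch. The `Result` type, the `apply_threshold` helper, and the threshold values are hypothetical and only illustrate the idea; the actual values are whatever your administrator has configured. How a score exactly equal to the threshold is treated is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    score: float  # confidence for content categories, salience for entities


def apply_threshold(results, threshold):
    """Drop results whose score is lower than the threshold.

    Assumption: a score exactly equal to the threshold is kept.
    """
    return [r for r in results if r.score >= threshold]


# Hypothetical thresholds; the real values are defined by an administrator.
CONFIDENCE_THRESHOLD = 0.5
SALIENCE_THRESHOLD = 0.1

categories = [Result("/News", 0.85), Result("/Finance", 0.2)]
entities = [Result("Google", 0.6), Result("Berlin", 0.04)]

shown_categories = apply_threshold(categories, CONFIDENCE_THRESHOLD)
shown_entities = apply_threshold(entities, SALIENCE_THRESHOLD)
print([r.name for r in shown_categories], [r.name for r in shown_entities])
```

With these sample values, only "/News" and "Google" remain, which is why the asset page can list fewer results than Google returned.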

To prevent overly large text files from being analyzed (especially by the server automation), censhare can apply a threshold for the text size. This protects the performance of the system. If the file of the content asset to be analyzed exceeds this size, the analysis is not executed.
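A size check of this kind might look like the following sketch. The limit of 100,000 bytes and the name `should_analyze` are assumptions made for the example; the real limit and the unit it is measured in depend on your system configuration.

```python
# Hypothetical limit in bytes; the real value is part of the system configuration.
MAX_TEXT_SIZE_BYTES = 100_000


def should_analyze(file_content: bytes, max_bytes: int = MAX_TEXT_SIZE_BYTES) -> bool:
    """Return True if the file content does not exceed the size limit."""
    return len(file_content) <= max_bytes


small_file = "A short press release.".encode("utf-8")
large_file = b"x" * (MAX_TEXT_SIZE_BYTES + 1)

print(should_analyze(small_file))  # within the limit, would be analyzed
print(should_analyze(large_file))  # exceeds the limit, would be skipped
```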

Each entity and each content category is stored as an asset in censhare. If an entity or content category is assigned to several text assets, all of them are linked to the same entity or content category asset. To see which text content assets share the same content category or mention the same entity, open the entity or content category asset in censhare Web, not in the censhare Client.

If you re-analyze a content asset, censhare updates the asset accordingly. This means that content category and entity information is added or removed according to the new results. If the confidence or salience results differ from prior analyses, censhare also updates the corresponding entries in the content asset. If the threshold values have changed between two analyses and there are now more or fewer results, the entries shown in the content asset are updated as well.
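The effect of a re-analysis can be thought of as a comparison between two result sets. The following sketch, with a hypothetical `diff_results` helper and made-up values, shows which entries would be added, removed, or updated. It illustrates the described behavior and is not censhare's actual update logic.

```python
def diff_results(previous, current):
    """Compare two analysis runs given as {name: score} dictionaries.

    Returns three sorted lists: names to add, names to remove, and names
    whose score changed and therefore needs to be updated.
    """
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    updated = sorted(
        name
        for name in current.keys() & previous.keys()
        if current[name] != previous[name]
    )
    return added, removed, updated


# Made-up results of two runs, already filtered by the thresholds.
previous_run = {"/News": 0.80, "/Finance": 0.55}
current_run = {"/News": 0.90, "/Travel": 0.60}

added, removed, updated = diff_results(previous_run, current_run)
print("add:", added, "remove:", removed, "update:", updated)
```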

Concept of the analysis

The mapping of content categories

Google has a list of content categories that it uses to analyze a text. censhare, in turn, has a list of content categories that it supports. The two lists can differ because changes on the Google side are not reflected on the censhare side. The content categories in censhare are stored as assets.

When running an analysis, censhare receives a list from Google with all content categories that have been found. If a content category found by Google exists as an asset, censhare creates a relation between the content asset and the content category asset. If no content category asset exists for a result from Google, the result is not shown in censhare because censhare does not create new categories. If categories that Google has found are missing in censhare and should be added, contact your administrator.
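The mapping rule can be summarized in a short sketch. The function `map_categories`, the dictionary of category assets, and the asset IDs are hypothetical stand-ins; matching by the exact category name is also an assumption.

```python
def map_categories(google_categories, category_assets):
    """Split Google results into linkable and unmapped categories.

    google_categories: list of (name, confidence) pairs returned by Google.
    category_assets: dict mapping a category name to a hypothetical asset ID.
    """
    relations = []  # (asset ID, confidence) pairs to relate to the content asset
    unmapped = []   # names found by Google that have no category asset
    for name, confidence in google_categories:
        asset_id = category_assets.get(name)
        if asset_id is None:
            unmapped.append(name)  # no new category asset is created
        else:
            relations.append((asset_id, confidence))
    return relations, unmapped


# Made-up category assets and results.
category_assets = {"/News": 1001, "/Finance": 1002}
google_categories = [("/News", 0.9), ("/Hobbies & Leisure", 0.7)]

relations, unmapped = map_categories(google_categories, category_assets)
print("relations:", relations)
print("not shown (no category asset):", unmapped)
```

In this example, "/Hobbies & Leisure" would not appear in the results because no matching category asset exists.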

The content category assets in censhare have the "Module/Content category" asset type. To see a full list of all supported categories, search for this asset type in the "Detail search": open the Detail search and select the asset type in the "Type" field.

Currently, Google only supports content categories in English.

Creating entity assets

censhare creates an asset for every entity whose salience is higher than the defined threshold. If an entity asset already exists in censhare, the existing asset is used. If there is a Wikipedia web page for an entity, censhare also creates a "Wikipedia web page" asset or uses the existing one.
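This "create or reuse" behavior can be sketched as follows. The `AssetStore` class is a tiny in-memory stand-in invented for this example, and identifying an entity by its type and name is an assumption; censhare's real asset handling is not shown here. Keeping an entity whose salience equals the threshold is likewise an assumption.

```python
import itertools


class AssetStore:
    """Tiny in-memory stand-in for an asset repository (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._assets = {}  # (asset type, key) -> asset ID

    def get_or_create(self, asset_type, key):
        """Return the existing asset ID for (type, key) or create a new one."""
        identity = (asset_type, key)
        if identity not in self._assets:
            self._assets[identity] = next(self._ids)
        return self._assets[identity]


def store_entity(store, entity, salience_threshold):
    """Create or reuse the assets for one entity.

    Returns (entity asset ID, Wikipedia asset ID or None), or None if the
    entity is filtered out. Assumption: a salience equal to the threshold is kept.
    """
    if entity["salience"] < salience_threshold:
        return None
    entity_id = store.get_or_create(entity["type"], entity["name"])
    wiki_url = entity.get("wikipedia_url")
    wiki_id = store.get_or_create("Wikipedia web page", wiki_url) if wiki_url else None
    return entity_id, wiki_id


store = AssetStore()
google = {
    "name": "Google",
    "type": "Company",
    "salience": 0.7,
    "wikipedia_url": "https://en.wikipedia.org/wiki/Google",
}
first = store_entity(store, google, 0.1)
second = store_entity(store, google, 0.1)  # reuses the assets created above
print(first, second)
```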

Entities can have one of the following asset types in censhare:

  • Consumer good

  • Company

  • Event

  • Location

  • Other

  • Person

  • Work of art

Contact your administrator if you want to change or extend the asset types.
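As an illustration of how entity types could correspond to these asset types, the sketch below maps the type names used in Google's entity analysis responses to the asset types listed above. The mapping itself is an assumption, in particular ORGANIZATION to "Company" and the fallback to "Other" for any type not in the list; the actual mapping is part of your system configuration.

```python
# Assumed mapping from Google entity type names to the censhare asset types
# listed above. This is an illustration, not the configured mapping.
ENTITY_TYPE_TO_ASSET_TYPE = {
    "CONSUMER_GOOD": "Consumer good",
    "ORGANIZATION": "Company",
    "EVENT": "Event",
    "LOCATION": "Location",
    "OTHER": "Other",
    "PERSON": "Person",
    "WORK_OF_ART": "Work of art",
}


def asset_type_for(google_type):
    """Return the asset type for a Google entity type.

    Assumption: types without an entry fall back to "Other".
    """
    return ENTITY_TYPE_TO_ASSET_TYPE.get(google_type, "Other")


print(asset_type_for("ORGANIZATION"))
print(asset_type_for("PHONE_NUMBER"))
```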

Note: Google currently supports 10 languages for entity analysis. For a full list of languages with their ISO 639-1 codes, see https://cloud.google.com/natural-language/docs/languages

Execute analysis

  1. In the censhare Client, select the asset in question in the asset list. You can also select more than one content asset if you want to execute the action for multiple assets simultaneously.

  2. Select "Google Cloud Natural Language" in the "Server actions" menu, either at the top of the Client or from the context menu.

View results

For a text content asset that has been analyzed:

  1. Select the text content asset in question in the asset list.

  2. Go to the "More information" section of the "Info" tab.

  3. There, you will find all "Content categories" and "Mentioned" entries that were created by the analysis.

If you want to switch to an asset shown in the "More information" section:

  1. Open the asset in question with "Edit metadata".

  2. Go to the "Features" tab.

  3. Go to the "Content category" or "Mentioned" property in question, right-click the referenced asset, and choose "Show/Show in a new window".

  4. A new window opens with the desired asset in the asset list of the query window.

Result

The "Google Natural Language" server action is executed for the selected asset(s) and the results are stored in censhare. The results of the analysis are shown in the "More information" section of the "Info" tab.