Concept of Google Cloud AI analysis - SysAdmin
Google Cloud AI service integration in censhare and basic functionality.
censhare uses Google Cloud Natural Language, Google Cloud Vision AI, and Google Cloud Video Intelligence API for content and media analyses.
Google Cloud AI services can be used with censhare Web and censhare WP.
Note: Google Cloud Storage, Google Cloud Natural Language, Google Cloud Vision AI, and Google Cloud Video Intelligence API are services provided by Google. The use of one or all of these Google services can result in additional costs that are invoiced directly by Google. censhare cannot influence or control these costs and therefore cannot be held responsible for them.
Prerequisites
Google Developer Account
- If the modules described below are not activated, the analysis actions are hidden in the censhare standard workspace.
Overview
Google Cloud offers extensive capabilities for analyzing media files. With the Google Cloud AI integration in censhare, you can analyze:
Texts. For example, find known entities.
Images. For example, find keywords, text (OCR), or web pages that contain the image. Find and identify multiple objects in the image.
Videos. For example, find keywords or inappropriate content.
The following media and content assets can be analyzed with the Google Cloud AI integration in censhare:
Google Cloud service | Asset type | File type/MIME type |
Natural Language | Text | plain, ICML, XML |
Natural Language | Text | DOCX |
Natural Language | Image | - |
Natural Language | Document | PowerPoint, Excel, Word, PDF |
Vision AI | Image & sub-asset types | JPEG, GIF, PNG (8-bit and 24-bit) |
Video Intelligence API | Video | Any MIME type if configured with JPEG previews |
Analyses can be executed manually in censhare Web and the censhare Client, or automatically via trigger events.
Analysis results are shown in the Analysis tab of an asset page in censhare Web.
Transcription converts speech detected in videos analyzed by Google Cloud into text.
Integration
Google Cloud integration consists of two parts:
censhare Google Cloud AI service: The service connects with the Google Cloud, uploads the media files for the analysis, and receives the results from Google Cloud AI.
Google module: The module is part of the censhare Server. It provides the server actions that start an analysis. Both manual and automatic server actions are available.
The Google module contains synchronous and asynchronous server actions. The synchronous server actions wait for the result from Google Cloud AI after they have sent the request.
The following server actions are available:
Name | Execution mode | Comment |
Analyze via Google Natural Language | Synchronous (1) | Starts the analysis of a text via Google Cloud Natural Language manually. |
Analyze via Google Natural Language (automatic) | Synchronous (1) | Starts the analysis of a text via Google Cloud Natural Language automatically. The execution is defined by the asset events configured for asset automation. |
Analyze via Google Vision API | Synchronous (1) | Starts the analysis of an image via Google Cloud Vision AI manually. |
Analyze via Google Vision API (automatic) | Synchronous (1) | Starts the analysis of an image via Google Cloud Vision AI automatically. The execution is defined by the asset events configured for asset automation. |
Analyze via Google Video Intelligence API | Asynchronous (2) | Starts the analysis of a video manually. The first execution for a video sends the request. Every subsequent execution requests the processing status. |
Analyze via Google Video Intelligence API (automatic) | Asynchronous (2) | Starts the analysis of a video via Google Cloud Video Intelligence API automatically. The execution is defined by the asset events configured for asset automation. The server action only starts the execution. For status updates, use Status check via Google Video Intelligence API (automatic). |
Status check via Google Video Intelligence API (automatic) | Asynchronous (2) | Regularly checks the execution status of videos that are currently being analyzed via Google Cloud Video Intelligence API. |
(1) The synchronous server actions wait for the result from Google Cloud AI after they have sent the request. During this time, the widget is disabled.
(2) When a server action sends the request, the action does not wait for the response. To receive an update of the processing status, configure the automatic server action for status checks.
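The difference between the two execution modes can be illustrated with a small Python simulation. This is a minimal sketch, not censhare or Google code: `FakeVideoAnalysis` and `AsyncAnalysisTracker` are hypothetical names, and a video analysis is modelled as finishing after a fixed number of status polls. The start action only records the request; a separate, regularly scheduled status check polls every pending analysis and moves finished ones out of the queue.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVideoAnalysis:
    """Hypothetical stand-in for a long-running video analysis.

    It reports "RUNNING" until it has been polled `polls_needed` times.
    """

    polls_needed: int
    polls: int = 0

    def check(self) -> str:
        self.polls += 1
        return "DONE" if self.polls >= self.polls_needed else "RUNNING"


@dataclass
class AsyncAnalysisTracker:
    """Hypothetical model of the asynchronous start/status-check pattern."""

    pending: dict = field(default_factory=dict)
    finished: set = field(default_factory=set)

    def submit(self, asset_id: str, analysis: FakeVideoAnalysis) -> None:
        # Start action: only records the request; it does not wait for a result.
        self.pending[asset_id] = analysis

    def status_check(self) -> None:
        # Scheduled status check: polls each pending analysis once per run.
        for asset_id, analysis in list(self.pending.items()):
            if analysis.check() == "DONE":
                self.finished.add(asset_id)
                del self.pending[asset_id]


tracker = AsyncAnalysisTracker()
tracker.submit("video-1", FakeVideoAnalysis(polls_needed=3))
for _ in range(3):
    tracker.status_check()
print(tracker.finished)  # {'video-1'}
```

A synchronous action, by contrast, would block until the result arrives, which is why the widget is disabled while it runs.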
Using Google Cloud AI
Authentication
The censhare Google AI service uses a service account key to authenticate itself to Google Cloud AI when it starts the various analysis tasks.
You must create your own key and provide it to the censhare Google AI service.
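A JSON key file downloaded for a Google Cloud service account contains, among other fields, `"type": "service_account"`, `project_id`, `private_key`, and `client_email`. The following sketch is a hypothetical pre-flight check (the function name is illustrative, not part of censhare) that tells such a key file apart from a bare API key string, which censhare used before 2020.1. How the key is actually handed to the censhare Google AI service is not covered here.

```python
import json

# Subset of the fields found in a Google Cloud service account JSON key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def looks_like_service_account_key(raw: str) -> bool:
    """Return True if `raw` parses as a service account key, not an API key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # e.g. a bare API key string
    if not isinstance(data, dict):
        return False
    return REQUIRED_FIELDS.issubset(data) and data["type"] == "service_account"
```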
Google Cloud Storage
To use the Google AI analyses, a bucket in Google Cloud Storage is required. If you have not set up a bucket, create one before you use the censhare Google AI service.
The censhare Google AI service first uploads the media file to a bucket in Google Cloud Storage. From there, the file is transferred to the respective analysis service in Google Cloud. With this approach, there is no file size restriction for uploaded files.
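Google Cloud analysis services address a file stored in a bucket by a `gs://<bucket>/<object>` URI. The sketch below shows how such a URI could be built for an uploaded asset file. The object naming scheme (`censhare/<asset ID>/<file name>`) and the helper name `gcs_uri` are hypothetical; the real naming used by the censhare Google AI service is not documented here. The bucket name check is deliberately simplified and rejects names containing dots.

```python
import re

# Simplified bucket name rule (dot-free names only): 3-63 characters of
# lowercase letters, digits, "-" and "_", starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def gcs_uri(bucket: str, asset_id: int, filename: str) -> str:
    """Build a gs:// URI for an uploaded file (hypothetical naming scheme)."""
    if not _BUCKET_RE.fullmatch(bucket):
        raise ValueError(f"unsupported bucket name: {bucket!r}")
    return f"gs://{bucket}/censhare/{asset_id}/{filename}"
```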
Text analyses
For a text, Google Cloud Natural Language returns content categories and calculates a confidence value for each category. Google also searches for known entities, such as public figures or companies, and calculates a salience value for each entity. censhare defines a threshold for both confidence and salience.
Note: Google currently supports 10 languages for entities analysis. For a list of supported languages, see cloud.google.com/natural-language/docs/languages.
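The threshold check can be pictured as a simple filter over the returned categories and entities. This is a minimal sketch under stated assumptions: the threshold values are hypothetical placeholders (the censhare defaults are not given here), results are modelled as plain dictionaries, and values equal to the threshold are assumed to be kept.

```python
# Hypothetical threshold values; the actual censhare defaults are not stated here.
CONFIDENCE_THRESHOLD = 0.5
SALIENCE_THRESHOLD = 0.01


def filter_results(categories, entities,
                   min_conf=CONFIDENCE_THRESHOLD, min_sal=SALIENCE_THRESHOLD):
    """Keep categories and entities whose score reaches the threshold."""
    kept_categories = [c for c in categories if c["confidence"] >= min_conf]
    kept_entities = [e for e in entities if e["salience"] >= min_sal]
    return kept_categories, kept_entities


cats = [{"name": "/Computers & Electronics", "confidence": 0.82},
        {"name": "/Arts & Entertainment", "confidence": 0.31}]
ents = [{"name": "Google", "type": "ORGANIZATION", "salience": 0.4}]
print(filter_results(cats, ents)[0])  # only the first category is kept
```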
Category mapping
Returned categories are stored in an asset reference. For this purpose, censhare provides a default set of Content category assets.
Each Content category is identified by an External source ID that contains the respective Google content category. If there is no asset for a found content category name, censhare skips this result.
Note: Currently, Google only supports content categories in English.
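The lookup by External source ID and the skip rule can be sketched as a dictionary lookup. Here `map_categories` is a hypothetical helper, and the dictionary stands in for the Content category assets indexed by their External source ID; the category names follow Google's path-like format (for example, `/Computers & Electronics`), while the asset IDs are invented for illustration.

```python
def map_categories(category_names, assets_by_external_id):
    """Return asset IDs of Content category assets matching the returned names.

    Categories without a matching asset are skipped.
    """
    return [assets_by_external_id[name]
            for name in category_names
            if name in assets_by_external_id]


# Hypothetical Content category assets, keyed by External source ID.
content_categories = {"/Computers & Electronics": 1001, "/Sports": 1002}
print(map_categories(["/Sports", "/Travel"], content_categories))  # [1002]
```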
Entity mapping
Because of the large number of possible entities, censhare does not provide a default set of entity assets. Instead, it uses the entity type that Google delivers for a found entity, for example, "CONSUMER_GOOD" or "PERSON".
Entities are mapped to an asset type and category via a mapping table. The mapping table is stored in the censhare Admin Client and can be edited as an XML file.
If the mapping table contains a mapping definition for the found entity type, censhare creates an asset.
If Google returns a Wikipedia page that refers to an entity, censhare creates a Wikipedia web page asset.
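The two rules above can be sketched as a planning step that decides which assets to create. Everything here is hypothetical: the XML schema of the real mapping table is not shown in this document, so the table is modelled as a dictionary, and the asset type and category names are placeholders. The entity fields (`type`, `name`, `metadata.wikipedia_url`) follow the shape of Google Cloud Natural Language entity results. The sketch applies the two rules independently; whether censhare couples them is not stated.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping table: Google entity type -> (asset type, category).
ENTITY_MAPPING = {
    "PERSON": ("Person", "People"),
    "ORGANIZATION": ("Company", "Companies"),
}


@dataclass(frozen=True)
class PlannedAsset:
    asset_type: str
    name: str
    category: Optional[str] = None
    url: Optional[str] = None


def plan_entity_assets(entities, mapping=ENTITY_MAPPING):
    """Return the assets that would be created for the returned entities."""
    planned = []
    for entity in entities:
        target = mapping.get(entity["type"])
        if target is not None:  # no mapping definition: no entity asset
            asset_type, category = target
            planned.append(PlannedAsset(asset_type, entity["name"], category=category))
        wiki = entity.get("metadata", {}).get("wikipedia_url")
        if wiki:  # a Wikipedia page was returned for this entity
            planned.append(PlannedAsset("Wikipedia web page", entity["name"], url=wiki))
    return planned
```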
Image analyses
The following table shows the categories that Google Cloud Vision AI returns and how they are mapped in censhare.
Note: Results are only shown if the functionality is activated in the censhare Admin Client.
Google category | Google sub-category | censhare result | censhare sub-result |
Web | Web Entities | Keywords | - |
Web | Pages with Matched Images | Matching images | Full matching image page (URL) |
Web | Fully Matched Images | Matching images | Full matching image |
Web | Partially Matched Images | Matching images | Partial matching image (URL) |
Properties | Dominant Colors | Content Colors | - |
Safe Search | - | Safe Search | - |
Landmarks | - | Locations | - |
Logos | - | Brands | - |
Document | - | Texts | - |
When censhare receives the results from Google Cloud Vision AI, it applies the following checks, depending on the category:
A threshold for the relevance score.
A limit for the number of results.
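The per-category checks can be sketched as follows. The rule table, its values, and the category keys are hypothetical, since the actual censhare settings are not listed here. The sketch also assumes that, when a limit applies, the highest-scoring results are kept; results without a score (such as matching images) keep their original order because Python's sort is stable.

```python
# Hypothetical per-category rules: (minimum score or None, maximum results or None).
CATEGORY_RULES = {
    "keywords": (0.6, 10),
    "matching_images": (None, 5),
}


def apply_rules(category, results, rules=CATEGORY_RULES):
    """Apply the optional score threshold and result limit for one category."""
    min_score, max_results = rules.get(category, (None, None))
    if min_score is not None:
        results = [r for r in results if r.get("score", 0.0) >= min_score]
    # Highest scores first; unscored results keep their order (stable sort).
    results = sorted(results, key=lambda r: r.get("score", 0.0), reverse=True)
    if max_results is not None:
        results = results[:max_results]
    return results
```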
Text recognition
censhare stores the recognized text as plain text in a storage item and assigns the storage item to the image asset. The storage item key is text-preview and the MIME type is application/xhtml+xml. The text is indexed, so users can search for it.
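Because the text-preview storage item has the MIME type application/xhtml+xml, the recognized plain text has to be wrapped in an XHTML document. The sketch below shows one way to do this with proper escaping. It is an assumption for illustration only: the exact markup and header that censhare generates are not described here, and `to_xhtml_preview` is a hypothetical name.

```python
from html import escape


def to_xhtml_preview(text: str) -> str:
    """Wrap plain text in a minimal XHTML document (hypothetical layout)."""
    # One <p> per non-empty line; escape &, <, > and quotes.
    paragraphs = [line for line in text.split("\n") if line.strip()]
    body = "\n".join(f"<p>{escape(line)}</p>" for line in paragraphs)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head><title>text-preview</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```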
Video analyses
censhare supports the following functionality from the Google Cloud Video Intelligence API:
Keywords
Safe search
Transcription
Each function can be activated individually in the censhare Admin Client.
For keyword detection, censhare defines a threshold. Results from Google below this threshold are not shown. In addition, censhare defines a limit for the number of keywords shown.
Video transcriptions
censhare stores the result of the transcription in the Time text storage item of a video asset. The text is stored in the WebVTT format (Web Video Text Tracks, .vtt).
By default, transcriptions include punctuation and form complete sentences. However, Google does not support punctuation for all languages, which can affect the result in the VTT file.
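A WebVTT file is plain text: a `WEBVTT` header line, a blank line, and then cues, each consisting of a `start --> end` timing line in `hh:mm:ss.ttt` notation followed by the cue text. The sketch below converts hypothetical `(start, end, text)` segments into that format. How censhare groups transcribed words into cues is not described here, so the segment input and the function names are assumptions.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


print(to_webvtt([(0.0, 2.5, "Hello world.")]))
```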
censhare stores the returned text as plain text in a storage item and assigns the storage item to the video asset. The storage item key is text-preview and the MIME type is application/xhtml+xml. The text is indexed, so users can search for it: select the Content (full text) field in the Detailed search or Expert search.
Update from previous versions
What is new?
New functionality: Google Cloud Video Intelligence API
No file size restrictions for analyzing texts, images, or videos.
The censhare Google Cloud AI service uses a Google service account key to authenticate to the Google Cloud. Before 2020.1, censhare used an API key.
A Google Cloud Storage bucket is now required.
All Google Cloud AI-related server actions have a new Host configuration parameter to connect to the censhare Google Cloud AI service. This setting replaces the obsolete Google API Key configuration.
Workspace configurations are updated: manual actions are moved to the right side of the screen, widgets are disabled during calls, and progress and status indicators are improved.
Steps to do after the update to 2020.1
Previously configured manual server actions and asset automation must be removed and reconfigured.
Recreate text previews for plain text assets: With 2020.1, the malformed HTML header of generated text previews is corrected. As a result, the Google Natural Language analysis no longer works with text previews in the previous format.
Check whether XML content should also be analyzed automatically with Google Natural Language analysis: Because of a bug fixed in 2020.1, censhare now also generates a preview-done event for generated text previews of master files with the 'text/xml' MIME type. If you configured asset automation for natural language analysis before version 2020.1, check whether XML content should also be analyzed automatically.
Ensure that your workspace configurations, both for static and asset pages, work according to your expectations.
Result
You understand the integration of Google Cloud services and the Google Cloud AI functionality. You know about the server actions that execute an analysis and how the server actions work.
You know how to install/configure:
Access to Google Cloud AI and storage space in Google Cloud Storage
The manual and automatic server actions to use the Google Cloud AI functionality
The censhare Google Cloud AI service
Further steps
Configure the manual and automatic server actions in the censhare Admin Client that you want to use with Google Cloud AI.
Install and configure the censhare Google Cloud AI service.