Configure Google Cloud AI service - SysAdmin
Google Cloud object detection with Google Cloud Vision is supported as of censhare 2020.3.1.
Context
censhare integrates the analysis tools Google Cloud Natural Language, Google Cloud Vision AI, and Google Cloud Video Intelligence API.
Each of these Google Cloud AI services can be executed manually or automatically.
The left navigation contains static pages for media and content assets for easy access.
Workspace view snippets provide the necessary search and list views.
Google Cloud AI services can be used with censhare Web and censhare WP.
Prerequisites
Google Developer Account
Introduction
Google Cloud offers extensive capabilities to analyze media files. censhare uses the following Google Cloud AI services:
Google Cloud Natural Language for text analysis
Google Cloud Vision AI for image analysis
Google Cloud Video Intelligence API for video analysis
In censhare, the Google Cloud AI integration consists of two parts:
censhare Google Cloud AI service: The service uploads media files to Google Cloud Storage, requests the analysis, and removes the files after the analysis results are delivered.
Google module: The module provides the server actions that start the analysis of a media or content asset. Manual and automatic server actions are available. Each server action is configured separately.
censhare Server
Configure the Google module in the censhare Admin Client in Configuration/Modules:
Configure the general part: The settings differ between manual and automatic server actions.
Configuration in Google
Google Storage
Create a bucket in Google Cloud Storage:
Open the Cloud Storage browser in the Google Cloud Console.
Click Create bucket to open the bucket creation form.
Specify a bucket name.
Note: Bucket names share a single global namespace, so the name must be unique across all of Google Cloud Storage.
Select a Default storage class for the bucket.
Select an Access control model.
Optionally, you can add bucket labels, set a retention policy, and choose an encryption method.
Click Done.
You also need the bucket name for the configuration of the censhare Google Cloud AI service.
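Because the bucket name must be globally unique and follow Google's naming rules, it can help to check a candidate name before creating the bucket. The following Python sketch checks a simplified subset of the Cloud Storage naming rules (lowercase letters, digits, dashes, underscores, and dots; 3 to 63 characters, or up to 222 characters with dots; no IP addresses; no "goog" prefix; no "google"). It does not check global uniqueness or the domain verification that dotted names require, and the function name is hypothetical.

```python
import ipaddress
import re

# Allowed characters; the name must start and end with a letter or digit.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9._-]*[a-z0-9]$")

def is_valid_bucket_name(name: str) -> bool:
    """Simplified check of the Cloud Storage bucket naming rules."""
    if not _ALLOWED.match(name):
        return False
    # Without dots: 3-63 characters. With dots: up to 222 characters,
    # but each dot-separated component is limited to 63 characters.
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        return False
    if any(len(part) > 63 for part in name.split(".")):
        return False
    # A name must not look like an IPv4 address in dotted-decimal notation.
    try:
        ipaddress.IPv4Address(name)
        return False
    except ValueError:
        pass
    # Reserved prefix and reserved term.
    return not name.startswith("goog") and "google" not in name
```

For example, is_valid_bucket_name("censhare-ai-media") returns True, while "Censhare" (uppercase) and "my-google-bucket" are rejected.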
Enable Google Cloud APIs
The use of Google Cloud APIs requires a Google project. To enable an API for a project, do the following:
Go to the Cloud Console API Library.
From the projects list, select a project, or create a new one.
In the API Library, enable the APIs that you want to use with Google Cloud AI:
Cloud Natural Language API
Cloud Vision API
Cloud Video Intelligence API
Cloud Speech-to-Text API: This API is used for speech transcription in videos.
In the API Library, you must also enable:
Cloud Storage
If you need help finding an API, use the search field or the filters.
On the API page, click ENABLE.
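As an alternative to the console, the same APIs can be enabled with the gcloud CLI command gcloud services enable. The Python sketch below only builds that command from the service names of the APIs listed above; it assumes the gcloud CLI is installed and authenticated when you run the printed command, and the project ID is a placeholder. You can verify the service names with gcloud services list --available.

```python
import shlex
from typing import List, Optional, Sequence

# Service names of the APIs listed above.
REQUIRED_SERVICES = (
    "language.googleapis.com",           # Cloud Natural Language API
    "vision.googleapis.com",             # Cloud Vision API
    "videointelligence.googleapis.com",  # Cloud Video Intelligence API
    "speech.googleapis.com",             # Cloud Speech-to-Text API
    "storage.googleapis.com",            # Cloud Storage
)

def enable_command(project_id: str,
                   services: Optional[Sequence[str]] = None) -> List[str]:
    """Build (but do not run) the gcloud command that enables the services."""
    return ["gcloud", "services", "enable",
            *(services or REQUIRED_SERVICES), f"--project={project_id}"]

if __name__ == "__main__":
    # "my-censhare-project" is a placeholder project ID.
    print(shlex.join(enable_command("my-censhare-project")))
```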
Google Authentication
Create a service account key:
Open the Google Cloud Console.
Select your project.
On the Dashboard, you get an overview of the available APIs and their usage.
Go to the Credentials tab.
Click Create Credentials and select Service account.
Enter the account details in the form and click CREATE.
Add the roles Owner and Storage Admin, and then click CONTINUE.
Click CREATE KEY, select JSON as Key type, and click CREATE. The key file is downloaded to your local computer.
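Before copying the downloaded key to the service, you can check that it really is a service account key. This Python sketch assumes the standard fields of a Google service account key file (type, project_id, private_key, client_email); the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")

def key_problems(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the key looks usable."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in data]
    if data.get("type") != "service_account":
        problems.append(f"unexpected key type: {data.get('type')!r}")
    return problems

def check_key_file(path: str) -> List[str]:
    """Load a downloaded JSON key file and report problems."""
    return key_problems(json.loads(Path(path).read_text(encoding="utf-8")))
```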
Configuration in the censhare Admin Client
Note: The analysis actions are only visible in the user interface when the module is enabled in the censhare Admin Client.
Configure access to the service
The censhare Server accesses the censhare Google Cloud AI service via HTTP/HTTPS requests to REST endpoints. The Host field contains the network address of the host that runs the service and the port to access it:
http://SERVER-ADDRESS:SERVICE-PORT
For example:
http://censhare.myCompany.de:8033
Check the configuration file of the censhare Google Cloud AI service (server.port in application.yml) to see which port is defined.
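To check a Host value before saving the server action, the following Python sketch splits the URL into host name and port and tests whether a TCP connection to that port can be opened. It only checks network reachability, not the REST endpoints of the service; the function names are hypothetical.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

def parse_host_field(url: str) -> Tuple[str, int]:
    """Split a Host value such as http://host:8033 into (hostname, port)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an HTTP(S) URL: {url!r}")
    # Use the scheme's default port if the URL does not contain one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port  # note: urlsplit lowercases the host name

def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the service port can be opened."""
    host, port = parse_host_field(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_reachable("http://censhare.myCompany.de:8033") run on the censhare Server shows whether the service port is reachable from there.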
Configure the External Source Provider
Because there can be other cloud services from the same or other service providers, you must identify:
The service provider
The analysis service
In censhare, there are two configuration assets for this purpose:
The Google asset (asset type Module/External provider) stores the service provider.
For each analysis service, a dedicated Module/Module Interface asset is required to identify the corresponding service provider API.
Both are configured in the Analyzer setup section of the server action.
Field | Google service | Value (Resource key) | Referenced asset name |
External provider key | | censhare:external-source.google | |
Configuration asset key | Text | censhare:interface.google-language-api | Google Natural Language API |
Configuration asset key | Image | censhare:interface.google-vision-api | Google Vision API |
Configuration asset key | Video | censhare:interface.google-video-ai | Google Video AI |
Note: In most cases, you do not need to change these values.
Configure file selection for analysis
For video and image analyses, you can select which storage item is transmitted and analyzed:
Select the storage item type in Storage type.
Check Fallback to master to use the master file if the selected storage type is not available for the current asset.
Note: Be aware that the master file can be larger than the selected storage type. This can result in much higher costs for the analysis by Google Cloud AI.
Hints for the selection of the Storage type:
Automatic analysis of images using Object Detection: If you select Preview or Thumbnail as storage type, you must configure Asset preview done as Asset event in Trigger events in Analyze via Google Vision API (automatic). This ensures that the selected storage item exists when the automatic analysis is triggered.
Analysis of image formats that are not supported by Google: Select Preview or Thumbnail as the storage type. These storage types have the MIME Type JPEG. This can be analyzed by Google. If Preview or Thumbnail do not exist, no analysis is possible.
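The effect of Storage type and Fallback to master can be summarized in a small Python sketch. The representation of an asset's storage items as a dictionary (item type to MIME type) and the function name are hypothetical; the sketch only mirrors the selection rule described above.

```python
from typing import Dict, Optional

def select_storage_item(items: Dict[str, str], storage_type: str,
                        fallback_to_master: bool) -> Optional[str]:
    """Return the storage item type to send for analysis, or None.

    `items` maps storage item types such as "master", "preview", or
    "thumbnail" to their MIME types (hypothetical representation).
    """
    if storage_type in items:
        return storage_type
    if fallback_to_master and "master" in items:
        return "master"  # may be much larger, and more expensive to analyze
    return None  # no analysis is possible for this asset
```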
Google Natural Language analyses
For text analyses, the following settings are available:
Salience threshold for detected entities: Enter the desired threshold into Entity salience threshold. Entities are stored as assets and then related to the analyzed asset. If an entity does not exist, it is created.
Confidence threshold for detected content categories: Enter the desired threshold in the corresponding confidence threshold field for categories.
Mapping of entity types provided by Google to asset types in the censhare Server: For each found entity, Google also delivers an entity type. censhare uses this information to map the entity type to an asset type. This asset type is also used to create a new entity asset if no existing asset is found. The mapping is stored in the Google to censhare type mappings section.
The default XML configuration for type mapping:
The src attribute stores the entity type delivered by Google. The complete list of entity types is available in the Google reference for Natural Language. The dest attribute stores the asset type that is created in censhare for this entity. The default mapping uses only Keyword assets as entities. The category attribute stores the classification of the keyword.
To edit the configuration, click Edit type mappings.
In the mapping you can do the following:
Add additional entity types.
Map different source entity types.
Map source entities to other target asset types.
Map two or more different source entity types to one target asset type.
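The following Python sketch shows how such a mapping is read. The mapping document in it is hypothetical: the element names and the dest and category values are placeholders, while the src values are Google Natural Language entity types and the src, dest, and category attributes are the ones described above. It maps two source entity types (CONSUMER_GOOD and WORK_OF_ART) to the same target.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical mapping document; element names and dest/category values are placeholders.
EXAMPLE = """
<type-mappings>
  <mapping src="PERSON" dest="keyword" category="person"/>
  <mapping src="LOCATION" dest="keyword" category="location"/>
  <mapping src="CONSUMER_GOOD" dest="keyword" category="product"/>
  <mapping src="WORK_OF_ART" dest="keyword" category="product"/>
</type-mappings>
"""

Target = Tuple[Optional[str], Optional[str]]  # (asset type, category)

def load_mappings(xml_text: str) -> Dict[str, Target]:
    """Map each Google entity type (src) to its (dest, category) pair."""
    root = ET.fromstring(xml_text.strip())
    return {m.get("src"): (m.get("dest"), m.get("category"))
            for m in root.iter("mapping")}

def target_for(entity_type: str, mappings: Dict[str, Target]) -> Optional[Target]:
    """Return the target for an entity type, or None if it is not mapped."""
    return mappings.get(entity_type)
```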
Google Cloud Vision
Control the results of the analysis
Google Cloud Vision provides several functions and can deliver a large number of results to censhare. In the configuration, you can control the results shown in censhare in different ways:
Enable/disable individual functions: Check/Uncheck Enabled.
Set a relevance threshold for a function: Enter a value in Threshold.
Limit the number of results for a function: Enter a value in Max result.
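How the Enabled, Threshold, and Max result settings narrow down one result list can be sketched as follows. The sketch assumes scores between 0 and 1, that results exactly at the threshold are kept, and that the highest-scoring results are kept when the list is cut; the Result type and the function name are hypothetical.

```python
from typing import List, NamedTuple, Optional

class Result(NamedTuple):
    description: str
    score: float  # relevance score, assumed to be in the range 0..1

def filter_results(results: List[Result], enabled: bool,
                   threshold: Optional[float],
                   max_results: Optional[int]) -> List[Result]:
    """Apply the Enabled, Threshold, and Max result settings to one list."""
    if not enabled:
        return []
    kept = [r for r in results if threshold is None or r.score >= threshold]
    kept.sort(key=lambda r: r.score, reverse=True)  # best matches first
    return kept if max_results is None else kept[:max_results]
```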
The following table shows which items are analyzed and which settings are available for each function (x = available, o = not available):
Label in Admin Client | Functionality | Activate | Threshold | Number of results |
Logo detection | Identify brands | x | x | x |
Landmark detection | Identify locations | x | x | x |
Text detection | Recognize text | x | o | o |
Dominant colors detection | Identify the main content colors | x | o | x |
Safe search detection | Check for inadequate content | x | o | o |
Web detection | Assign keywords | x | x | x 1) |
Detect web full matching images | x | o | ||
Detect partially matching images | x | o | ||
Detect partially matching images | x | o | ||
Object detection | Detect objects in the image and classify them | x | o | x |
1) The value limits each result list in Web detection.
Note: For Web detection, there is only one threshold and one maximum result number. The values are valid for all four result lists.
Results for brands, locations, content colors, and keywords are stored as assets, and an asset relation is created to the image. If an asset does not exist, censhare creates a new one.
Color
In the default configuration, censhare maps the returned colors from Google to the standard 16-color palette. The mapping is based on the RGB values of each color. censhare calculates the distances of the red, green and blue values between the Google color and the censhare color palette. The closest match in the censhare 16-color palette is mapped to the result and assigned to the image asset.
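The nearest-color mapping can be sketched as follows. Two assumptions apply: the palette below uses the 16 basic HTML/CSS color names and values as a stand-in for the censhare palette (whose actual definitions are in the feature item assets), and the distance is the squared Euclidean distance over the red, green, and blue values, since the text does not specify the exact formula.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Stand-in palette (assumption): the 16 basic HTML/CSS colors.
PALETTE: Dict[str, RGB] = {
    "black": (0, 0, 0), "silver": (192, 192, 192), "gray": (128, 128, 128),
    "white": (255, 255, 255), "maroon": (128, 0, 0), "red": (255, 0, 0),
    "purple": (128, 0, 128), "fuchsia": (255, 0, 255), "green": (0, 128, 0),
    "lime": (0, 255, 0), "olive": (128, 128, 0), "yellow": (255, 255, 0),
    "navy": (0, 0, 128), "blue": (0, 0, 255), "teal": (0, 128, 128),
    "aqua": (0, 255, 255),
}

def nearest_palette_color(rgb: RGB, palette: Dict[str, RGB] = PALETTE) -> str:
    """Return the palette entry with the smallest squared RGB distance."""
    r, g, b = rgb

    def distance(name: str) -> int:
        pr, pg, pb = palette[name]
        return (pr - r) ** 2 + (pg - g) ** 2 + (pb - b) ** 2

    return min(palette, key=distance)
```

For example, a color of (250, 10, 10) returned by Google is mapped to red.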
The censhare 16-color palette is a dynamic value list. Each color is represented by a “Feature item” asset (asset type: “Module/Feature/Feature item”). For the definition of the assets, see the folder on the censhare Server:
~/censhare/censhare-Server/install/system/required/features/content-color
Google Cloud Video Intelligence API
Features
To store the state of the analysis for a video asset, the censhare Server uses the following asset features:
Google service transaction ID: ID that Google returns to reference the analysis while it is ongoing.
Google service completion state: Completion state of the analysis for the video asset
Note: You must update the database before the features are available.
Status update for videos
When the manual or the automatic server action starts the analysis of a video, Google returns a transaction ID that is stored with the video. The automatic status check server action requests the status using the transaction ID if the transaction has not yet finished.
For each status request, Google returns a completion percentage for keywords, safe search, and speech transcription. censhare averages these percentages into one completion percentage and stores it in the video asset.
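The averaging step can be illustrated with the three task names that also appear in the service log (LABEL_DETECTION, EXPLICIT_CONTENT_DETECTION, SPEECH_TRANSCRIPTION). This Python sketch uses a plain arithmetic mean and treats an average of 100 % as finished; how censhare rounds the stored value is not specified here.

```python
from statistics import mean
from typing import Dict

def completion_state(progress: Dict[str, int]) -> float:
    """Average the per-task progress percentages reported by Google."""
    return mean(progress.values())

def is_finished(progress: Dict[str, int]) -> bool:
    """The analysis is complete when the average reaches 100 %."""
    return completion_state(progress) >= 100
```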
For the automatic status check, configure the Trigger events of the Status check via Google Video Intelligence API (automatic) server action as follows:
Select Cron events in the Trigger events section.
Enter a Cron pattern that suits your needs, for example, "* * * * *" for every minute. For more information, see Cron events in Configure the automatic execution of server actions.
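The example pattern uses the common five-field cron syntax (minute, hour, day of month, month, day of week). To check what a pattern means before entering it, the following Python sketch evaluates such a pattern for a given time. It assumes that censhare uses this five-field syntax, and it is a simplified matcher: it supports only *, */n, ranges, lists, and single values, and it combines all fields with AND, unlike the day-of-month/day-of-week rule of standard cron.

```python
from datetime import datetime

# (name, smallest allowed value) for the five cron fields.
_FIELDS = (("minute", 0), ("hour", 0), ("day", 1), ("month", 1), ("weekday", 0))

def _matches(field: str, value: int, minimum: int) -> bool:
    """Match one field: '*', '*/n', 'a-b', 'n', or a comma-separated list of these."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False

def cron_matches(pattern: str, when: datetime) -> bool:
    """True if the five-field pattern fires at `when` (seconds are ignored)."""
    values = {
        "minute": when.minute,
        "hour": when.hour,
        "day": when.day,
        "month": when.month,
        "weekday": (when.weekday() + 1) % 7,  # cron counts Sunday as 0
    }
    fields = pattern.split()
    if len(fields) != 5:
        raise ValueError("expected five fields")
    return all(_matches(f, values[name], minimum)
               for f, (name, minimum) in zip(fields, _FIELDS))
```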
Prevention of a repeated analysis
Once a video has been analyzed, the censhare Server prevents new analysis requests for it, because repeated analysis of a video can produce very high costs. This is especially important for automatic server actions.
For this purpose, the censhare Server checks whether the Google service transaction ID and Google service completion state features exist for the video asset.
Note: To allow a new analysis of a video that was already analyzed, you must manually remove the features and start a new request.
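The check can be sketched as follows, modeling the asset's features as a dictionary keyed by the feature names shown in the metadata dialog (a hypothetical representation). The sketch assumes that the presence of either feature blocks a new analysis; the text does not state whether censhare requires both.

```python
from typing import Dict, Mapping

TRANSACTION_ID = "Google service transaction ID"
COMPLETION_STATE = "Google service completion state"

def can_start_analysis(features: Mapping[str, object]) -> bool:
    """Return False if the video already carries analysis state features."""
    return TRANSACTION_ID not in features and COMPLETION_STATE not in features

def reset_analysis(features: Dict[str, object]) -> None:
    """Remove both features, as done manually to allow a new analysis."""
    features.pop(TRANSACTION_ID, None)
    features.pop(COMPLETION_STATE, None)
```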
Keyword detection
Configuration:
Enable keywords: Check Enabled.
Set a relevance threshold for keywords: Enter a value in Threshold.
Limit the number of results for keywords: Enter a value in Max result.
Keywords are stored as assets and then related to the analyzed video. If a keyword does not exist, it is created.
Safe search
Enable safe search: Check Enabled.
Speech transcription
Enable speech transcription: Check Enabled.
To allow transcription, censhare sends the language code to Google. Google Speech-to-Text requires the language code with language and region value, for example, en-US.
censhare allows you to define languages in various ways. For example, you can define language codes that are only valid within a company.
Therefore, a mapping is required to create the correct output format for the Google service. By default, the following mappings are available:
language code in censhare | language-region code |
en | en-US |
de | de-DE |
fr | fr-FR |
it | it-IT |
ja | ja-JP |
To add other languages to the mapping table, do the following:
Change to the Admin mode in the censhare Admin Client.
Go to the manual or automatic server action in the Google module and select it.
Click Show/edit XML file in the Admin menu in the censhare Admin Client.
Go to the tag and add the mapping in the following format:
zz is the language code defined in censhare. xx is the internationally defined language code. YY is the respective language region code.
censhare selects the language code in the following order of priority:
Content language of the video asset if defined
Language defined in Default language code in the manual or automatic server action
Language code en-US as a fallback if there is no entry in Default language code. This is hard-coded and cannot be changed.
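The priority order and the default mappings can be sketched in Python as follows. This is an illustration, not the censhare implementation; it assumes that both the content language and Default language code are censhare codes that go through the mapping table, and that a code without a mapping is skipped.

```python
from typing import Mapping, Optional

# Default mappings from the table above.
DEFAULT_MAPPING = {"en": "en-US", "de": "de-DE", "fr": "fr-FR",
                   "it": "it-IT", "ja": "ja-JP"}
FALLBACK = "en-US"  # hard-coded fallback

def resolve_language(content_language: Optional[str],
                     default_language_code: Optional[str],
                     mapping: Mapping[str, str] = DEFAULT_MAPPING) -> str:
    """Return the language-region code sent to Google Speech-to-Text."""
    for candidate in (content_language, default_language_code):
        if candidate and candidate in mapping:
            return mapping[candidate]
    return FALLBACK
```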
Enable new analysis
In the censhare Client: Change to Admin mode.
Search for the video asset in the censhare Client.
Open the edit dialogue for metadata for this asset.
Go to Features (internal section) on the Features tab:
Delete Google service completion state.
Delete Google service transaction ID.
Note: Be aware that the analysis of a video with Google Cloud Video Intelligence API can lead to high costs charged by Google!
Permissions
To manually execute the analysis through Google Cloud AI, one of the following permission keys is needed:
ID (Permission key) | Name | Comment |
app_google_all | Google tools (all) | Permission to use all Google Cloud AI tools |
app_google_nl | Google natural language | Permission to use Google Cloud Natural Language |
app_google_vision | Google vision | Permission to use Google Vision |
app_google_video | Google video intelligence | Permission to use Google Video Intelligence |
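The rule that either the general key or the service-specific key is sufficient can be sketched as follows; the service identifiers used as dictionary keys and the function name are hypothetical.

```python
from typing import Iterable

# Hypothetical service identifiers mapped to the permission keys above.
SERVICE_KEYS = {
    "natural-language": "app_google_nl",
    "vision": "app_google_vision",
    "video": "app_google_video",
}

def may_run_analysis(user_keys: Iterable[str], service: str) -> bool:
    """True if the user holds app_google_all or the service-specific key."""
    keys = set(user_keys)
    return "app_google_all" in keys or SERVICE_KEYS[service] in keys
```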
Monitoring
censhare Server
The censhare Server writes log messages when it requests an analysis from Google Cloud AI.
Use the command name of the respective server action to find the entries in the server log:
Name | Command name |
Analyze via Google Natural Language | google_nl.update-data-action |
Analyze via Google Natural Language (automatic) | google_nl.update-data |
Analyze via Google Vision API | google_vision.update-data-action |
Analyze via Google Vision API (automatic) | google_vision.update-data |
Analyze via Google Video Intelligence API | google_video_intelligence.update-data-action |
Analyze via Google Video Intelligence API (automatic) | google_video_intelligence.update-data |
Status check via Google Video Intelligence API (automatic) | google_video_intelligence.update-data-status |
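To get an overview of which Google server actions ran, you can count the command names in a server log. This Python sketch assumes only that the command name appears verbatim in the log line. Longer names are matched first so that, for example, google_nl.update-data-action is not counted as google_nl.update-data.

```python
import re
from collections import Counter
from typing import Iterable

COMMAND_NAMES = [
    "google_nl.update-data-action",
    "google_nl.update-data",
    "google_vision.update-data-action",
    "google_vision.update-data",
    "google_video_intelligence.update-data-action",
    "google_video_intelligence.update-data",
    "google_video_intelligence.update-data-status",
]

# Longest names first, so a name is never reported as its shorter prefix.
_PATTERN = re.compile("|".join(
    re.escape(n) for n in sorted(COMMAND_NAMES, key=len, reverse=True)))

def count_google_commands(lines: Iterable[str]) -> Counter:
    """Count occurrences of the Google command names in log lines."""
    counts: Counter = Counter()
    for line in lines:
        for match in _PATTERN.finditer(line):
            counts[match.group(0)] += 1
    return counts
```

You can pass an open server log file directly, because iterating over a file object yields its lines.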
Video analysis can take a long time. To track a video that is being analyzed, Google returns a transaction ID. This ID is also written to the log:
AAGoogleVideoIntelligence.serverActionSetup: GoogleAiService:
SERVER_NAME.20200304.144508.206[USER]: assetId[42942]:
Google AI Video Intelligence Analyze Async:
Processing RequestId = dd2eb205-c0d2-40cd-9dfe-c3a755169a1c
This ID is then used to request the status of the analysis for the related video:
AAGoogleVideoIntelligence.serverActionSetup: GoogleAiService:
SERVER_NAME.20200304.144537.604[USER]:
requestId[dd2eb205-c0d2-40cd-9dfe-c3a755169a1c]:
Google AI Video Intelligence Status Async: Processing progress percent = 66
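To follow a video analysis in the server log, the request ID and the progress values can be extracted from entries like the ones above. The regular expressions in this Python sketch are derived only from this log sample; other log formats or versions may need different patterns.

```python
import re
from typing import Dict, Optional

SAMPLE_LOG = """AAGoogleVideoIntelligence.serverActionSetup: GoogleAiService:
SERVER_NAME.20200304.144508.206[USER]: assetId[42942]:
Google AI Video Intelligence Analyze Async:
Processing RequestId = dd2eb205-c0d2-40cd-9dfe-c3a755169a1c
AAGoogleVideoIntelligence.serverActionSetup: GoogleAiService:
SERVER_NAME.20200304.144537.604[USER]:
requestId[dd2eb205-c0d2-40cd-9dfe-c3a755169a1c]:
Google AI Video Intelligence Status Async: Processing progress percent = 66
"""

_STARTED = re.compile(r"Processing RequestId = ([0-9a-f-]{36})")
_STATUS = re.compile(
    r"requestId\[([0-9a-f-]{36})\].*?Processing progress percent = (\d+)", re.S)

def latest_progress(log_text: str) -> Dict[str, Optional[int]]:
    """Map each started request ID to the last progress percent logged for it."""
    progress: Dict[str, Optional[int]] = {
        rid: None for rid in _STARTED.findall(log_text)}
    for rid, percent in _STATUS.findall(log_text):
        progress[rid] = int(percent)
    return progress
```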
Google Cloud AI service
The censhare Google Cloud AI service writes log messages.
Flags when a text or an image is analyzed:
Flag | Comment |
[analyze] | Analysis start. |
[uploadFile] | Upload file to Google Cloud Storage. |
[getStorageUrl] | Get the URL of the file at the Google Cloud storage. |
[analyzeStorageUrlSync] | Analyze files through Google Cloud AI. |
[deleteDirectory] | Delete the uploaded file. |
[cleanupDirectory] | Cleanup. The bucket is empty again. |
[analyze] result / output | Return the result. |
Flags when the request to analyze a video file is started:
Flag | Comment |
[analyze] | Request start. |
[uploadFile] | Uploads file to Google Cloud Storage. |
[getStorageUrl] | Get the URL of the file at the Google Cloud storage. |
[analyzeStorageUrlAsync] | Analyze files through Google Cloud AI. analyzeStorageUrlAsync contains the ID of the Google long-running operation. |
[analyze] result | No result is returned. The requests are started separately to get status updates and return the results. |
Flags when a status update request is sent:
Flag | Comment |
[asyncStatus] | Request start. |
[getAsyncOperationStatus] | Gets the status in percent for one of the following analysis tasks: EXPLICIT_CONTENT_DETECTION, SPEECH_TRANSCRIPTION, LABEL_DETECTION, or AVERAGE (average status result from all analysis tasks). |
Flags after the video analysis is finished (average status = 100 %):
Flag | Comment |
[asyncResult] | Starts to finish the request. |
[deleteDirectory] | Delete the uploaded file. |
[cleanupDirectory] | Cleanup. The bucket is empty again. |
[processAnalyseResult] | Update video asset with the results. |
censhare Google Cloud AI service
Introduction
The censhare Google Cloud AI service runs as a standalone service. It uses Apache Tomcat, which runs on a defined port. The service calls the censhare Server through a REST API. The REST API is also used to download the storage item from the censhare Server. Storage items are uploaded to the Google Cloud Storage bucket that is set in the configuration file.
The censhare Google Cloud AI service and the censhare Server can run on the same machine or on separate servers. If they run on separate servers, both machines must be located in the same intranet.
The service handles only one Google account; it is not multi-tenant.
By using Google Cloud AI services, you accept the terms and conditions of Google. Additional costs for analyses are invoiced directly by Google to the account that you configured in the service.
Installation
The censhare Google Cloud AI service is provided as an RPM package.
If the censhare Server is installed as an RPM package, the installation process also installs the analysis service. To install the Google Cloud AI service manually, use the following command:
yum install censhare-google-ai
Note: You must have an RPM repository with the censhare RPM sources configured. For more information, contact censhare support.
The Google module for the Google Cloud AI configuration is automatically installed with the censhare Server 2020.1.
Configuration
During the installation, the following local directory for the configuration is created:
/opt/censer/google-ai
The installation directory contains the following files:
Jar file for the Google Cloud AI service
application.yml for the configuration of the Google Cloud AI service
There is also a DB file for storage purposes. This file is created on demand and is not part of the installation.
The service is configured to start automatically. Before it can start for the first time, complete the following steps:
Install the Google service API key.
Edit the configuration file.
To install the Google service API key:
Obtain the JSON file for the Google service API key.
For more information, see Google Authentication.
Copy the file into the following directory:
/opt/censer/google-ai
Rename the file to google-api-key.json
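The copy-and-rename step can be scripted. The Python sketch below copies a downloaded key file into the service directory under the expected name. The default directory is the one named above; the function name is hypothetical, and you may need root privileges to write to the installation directory.

```python
import shutil
from pathlib import Path

SERVICE_DIR = "/opt/censer/google-ai"  # installation directory as named above
KEY_NAME = "google-api-key.json"

def install_key(downloaded_key: str, service_dir: str = SERVICE_DIR) -> Path:
    """Copy the downloaded JSON key into the service directory as google-api-key.json."""
    target = Path(service_dir) / KEY_NAME
    shutil.copyfile(downloaded_key, target)
    return target
```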
To update the configuration file:
Obtain the bucket name for Google Cloud storage. For more information, see Google storage.
Open the application.yml file in a text editor.
Go to # Google Storage Name.
Enter the bucket name as the value of the variable censhare.microservices.google-ai.google-storage.
To start the service, use the following command:
systemctl start censhare.google-ai.service
censhare.google-ai.service is the service name.
To check if the service is running, use the following command:
systemctl status censhare.google-ai.service
The variables in the application.yml configuration file:
Variable | Comment |
server.port | This port is used to reach the service from the outside. You must enter the port in the Host field in the configuration of the respective server action. |
censhare.microservices.standalone-web-server.host | URL to send requests to the censhare Server, for example, https://censhare.yourCompany.com:9443. This URL is used to access the censhare Server via REST calls. |
censhare.microservices.standalone-web-server.username | User name to access the REST API of the censhare Server |
censhare.microservices.standalone-web-server.password | Password of the user |
censhare.microservices.standalone-web-server.timeout-ms | Default: 500. Timeout in milliseconds to wait for a connection to the censhare Server. |
censhare.microservices.google-ai.google-storage | Name of the bucket in the Google Cloud Storage to use with censhare Google Cloud AI service |
censhare.microservices.google-ai.google-credentials | Relative path and name of the file that contains the Google service account key, for example, "./google-api-key.json". |
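The dotted variable names in the table correspond to nested keys in application.yml, assuming the file uses the usual nested YAML layout (an assumption; compare with the file installed with the service). The Python sketch below turns the dotted names into such a nested skeleton. All values are placeholders taken from the examples in this section; the user name, password, and bucket name are hypothetical.

```python
import json
from typing import Any, Dict

# Placeholder values; replace them with your own settings.
EXAMPLE_SETTINGS: Dict[str, Any] = {
    "server.port": 8033,
    "censhare.microservices.standalone-web-server.host": "https://censhare.yourCompany.com:9443",
    "censhare.microservices.standalone-web-server.username": "REST_USER",
    "censhare.microservices.standalone-web-server.password": "CHANGE_ME",
    "censhare.microservices.standalone-web-server.timeout-ms": 500,
    "censhare.microservices.google-ai.google-storage": "my-censhare-bucket",
    "censhare.microservices.google-ai.google-credentials": "./google-api-key.json",
}

def nest(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn dotted keys into nested dictionaries."""
    root: Dict[str, Any] = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root

def to_yaml(tree: Dict[str, Any], indent: int = 0) -> str:
    """Emit nested dictionaries as YAML; strings are double-quoted via JSON."""
    pad = "  " * indent
    lines = []
    for key, value in tree.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            scalar = json.dumps(value) if isinstance(value, str) else str(value)
            lines.append(f"{pad}{key}: {scalar}")
    return "\n".join(lines)

yaml_text = to_yaml(nest(EXAMPLE_SETTINGS))

if __name__ == "__main__":
    print(yaml_text)
```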
Result
You understand the architecture of the censhare Server and the censhare Google Cloud AI service for using the Google analysis functionality. You know that some server actions work synchronously and others asynchronously.
You know how to install/configure:
Google Cloud AI and Google Cloud Storage for use with censhare
The manual and automatic server actions to use the Google Cloud AI functionality
The censhare Google Cloud AI service