Amazon S3 - Configuration
S3 is the Amazon Web Services (AWS) cloud storage system. censhare uses Amazon S3 as an additional (or the only) file system for storing asset files. It allows you to set up a central repository for saving your files. You manage the censhare Server locally or as an EC2 instance in the Amazon cloud.
What's new
- Amazon S3 can be accessed using IAM roles.
- Amazon S3 filesystems can be configured with the new Amazon S3 protocol, or as an S3 clone from a third-party provider.
Update note
If you update from an earlier version to censhare 2019.2 or later, follow the instructions in the Update from earlier versions section at the end of this article.
Context
The configuration is carried out in the censhare Admin Client and in Amazon AWS.
Prerequisites
- Amazon AWS account
- Understanding Amazon S3
- Understanding Amazon IAM (Identity and Access Management)
- Understanding censhare file systems
- General knowledge of the administration of the censhare Server, including censhare Admin Client
- If censhare Server is running in Amazon AWS:
  - Understanding Amazon EC2
  - Understanding AWS Command Line Interface (CLI)
Introduction
You can run censhare completely within the AWS environment. censhare Server and database run as instances within Amazon AWS and use Amazon S3 as the storage system. Alternatively, you can run the censhare Server locally and use Amazon S3 as central or additional file repository.
Use cases
You can run censhare completely within the AWS environment. censhare Server and database run as instances within Amazon AWS and use Amazon S3 as the storage system.
The different topologies for using Amazon S3 with censhare Server: (1) using Amazon S3 as an auxiliary storage location for asset files; (2) Amazon S3 as a central repository for asset files; (3) asset file storage as well as censhare Server both working in the Amazon cloud.
Legend: Amazon Web Services (AWS) (4), asset files in Amazon S3 storage (5), asset files saved locally (6), censhare Server as Amazon EC2 instance (7), local censhare Server (8).
Alternatively, you can use local censhare Servers and use Amazon S3 as a central storage location for your asset files. If you need additional storage space, you can use Amazon S3 as a separate storage system for files. You configure it independently of any existing local storage systems for asset files. The latter approach allows you to get to know Amazon S3 with an existing censhare installation. For general information, see Using Amazon Web Services with censhare.
Note: censhare supports the use of Amazon S3 for asset file systems only. For interface file systems like hotfolders, use Amazon EBS. The configuration of Amazon EBS as a file system is the same as defining a file system for locally attached storage.
Configuration
To use Amazon S3 as a central repository, you do not need to configure a local asset file system. In this case, you only have a central asset file system located in a bucket on Amazon S3. A bucket is the storage space that you must configure on Amazon S3. Every local censhare Server that accesses the Amazon S3 file system also has its own local cache file system.
When using Amazon EC2 (Elastic Compute Cloud) instances for the censhare Server, a local cache must also be available. It is located on the respective EC2 instance. You can also define multiple buckets to save asset files on one server.
Once the censhare Server can access the Amazon S3 bucket, there are two asset file systems: the local file system and the Amazon S3 bucket. The two differ in their configuration. The local asset file system has the asset temp file system; an asset file system on Amazon S3 also has a local cache on the censhare Server.
If you have more than one asset file system, you must configure which asset files are stored on which asset file system. You do this with domain paths. Every asset file system must have a unique domain path. censhare uses the domain path of an asset to select the correct asset file system. For more information, see Configuring domains for asset file systems.
If you later change the domain path of an asset, censhare does not automatically move the associated files, even if the new domain path belongs to another asset file system. You can use a server action for selected assets to tell censhare to move the current file version (or all file versions) to the asset file system associated with the new domain path.
If the domain of an asset changes and the new domain uses a different file system, you must ensure that the asset files also move to the new file system. For more information, see Moving files to another asset file system.
Prepare Amazon AWS
Amazon S3
To configure the S3 file system in censhare, you need the following:
Name of the S3 bucket to use
Region of this S3 bucket
IAM roles
Security advice
The censhare Server must authenticate itself with AWS IAM (Identity and Access Management) to get access to Amazon S3. There are two possibilities:
- The censhare Server runs on an EC2 instance in AWS: Use an IAM role.
- The censhare Server runs outside AWS: Use IAM keys.
Use IAM keys only for use cases that run censhare server outside AWS. With an IAM role, no keys need to be stored on the censhare Server.
If you run the censhare Server in Amazon AWS, Amazon Machine Images (AMI) are used to provide censhare Server instances. If you use IAM keys, the keys are stored in the censhare Server AMI. For instance, if such an AMI is shared, the IAM key can be read because the key is stored in clear text in the censhare Server configuration.
Allow the EC2 instance that contains the censhare Server to access the S3 bucket:
- In the IAM service, create a new role. Select "AWS service" as the trusted entity and "EC2" as the service. For more information, see IAM Roles for Amazon EC2.
- Attach a policy to the role. Either give the role full access to all S3 buckets (policy = "AmazonS3FullAccess"), or create a JSON file with the required permissions and attach it to the role. For example:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": [ "s3:GetBucketLocation", "s3:ListBucket" ],
      "Resource": [ "arn:aws:s3:::bucket-name" ]
    },
    {
      "Sid": "ActionBucket",
      "Effect": "Allow",
      "Action": [ "s3:DeleteObject", "s3:GetObject", "s3:PutObject" ],
      "Resource": [ "arn:aws:s3:::bucket-name/*" ]
    }
  ]
}
For more information, see IAM Policies for Amazon EC2.
- Attach the role to the EC2 instance of the censhare application server. For more information, see Attaching an IAM Role to an Instance.
- Test whether the operating system has access to the bucket. You can use the AWS Command Line Interface (CLI) and the S3 command. For more information, see AWS CLI and S3 Reference.
For more information, see Using an IAM Role to Grant Permissions to Applications Running on Amazon EC2 Instances.
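The example policy above names the bucket in two places, once for the bucket itself and once for the objects in it. As a minimal sketch, the following Python snippet generates the same policy document for a given bucket name so that both Resource entries stay consistent. The function name build_bucket_policy and the bucket name my-censhare-assets are illustrative assumptions, not part of censhare or AWS.

```python
import json


def build_bucket_policy(bucket_name: str) -> dict:
    """Return the example IAM policy from this article for one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level permissions apply to everything below the bucket.
                "Sid": "ActionBucket",
                "Effect": "Allow",
                "Action": ["s3:DeleteObject", "s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-censhare-assets" is a placeholder bucket name.
    print(json.dumps(build_bucket_policy("my-censhare-assets"), indent=2))
```

The printed JSON can be saved to a file and attached to the role as described in the steps above.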
IAM keys
censhare needs the access keys to access the Amazon AWS API on Amazon S3. For more information, see Types of Security Credentials.
To configure the S3 file system, you need the following:
- Access Key ID for Amazon S3
- Secret Access Key for Amazon S3
For more information, see What is IAM.
Dedicated domains for S3 file systems
If you work with more than one asset file system, we recommend configuring a dedicated domain for each file system. If you use only one file system, the domain must always be Root. Dedicated domains are only necessary for asset file systems. The cache file systems always have Root domains.
To determine the domain in which censhare stores an asset file, it compares the domain path of an asset with the domain path of the file systems. The file is stored in the file system that matches best with the domain path of the asset.
A correct configuration contains exactly one asset file system with the domain path Root. censhare selects this file system if multiple domain paths match to the same degree. censhare also selects Root if no other full matches exist with another file system.
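The selection rule described above can be sketched in a few lines of Python. This is one plausible reading of "matches best" (the file system whose domain path is the longest prefix of the asset's domain path), not censhare's actual implementation. The dot-separated path format (for example root.print.), the file system IDs, and the function names are assumptions for illustration; the sketch also ignores the 2nd domain.

```python
def split_path(domain_path: str) -> list[str]:
    """Split a dot-separated domain path such as 'root.print.' into segments."""
    return [part for part in domain_path.split(".") if part]


def select_file_system(asset_domain: str, file_systems: dict[str, str]) -> str:
    """Pick the file system whose domain path is the longest prefix of the asset's.

    file_systems maps a file system ID to its domain path. Exactly one entry
    is expected to use the root path; it acts as the fallback.
    """
    asset_parts = split_path(asset_domain)
    best_id, best_len = None, -1
    for fs_id, fs_domain in file_systems.items():
        fs_parts = split_path(fs_domain)
        # A file system matches if all of its segments lead the asset's path.
        if asset_parts[: len(fs_parts)] == fs_parts and len(fs_parts) > best_len:
            best_id, best_len = fs_id, len(fs_parts)
    if best_id is None:
        raise ValueError("no file system with a matching (root) domain path")
    return best_id


# Hypothetical configuration: one root file system, one dedicated S3 file system.
file_systems = {"assets": "root.", "s3-assets": "root.print."}
print(select_file_system("root.print.magazine.", file_systems))
print(select_file_system("root.web.", file_systems))
```

In this sketch, an asset in root.print.magazine. lands in s3-assets, while an asset in root.web. falls back to the file system with the root domain path.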
Configuration
Every file system that you create requires a separate Amazon S3 bucket. The configuration is done in the censhare Admin Client, in two locations: the Master data/File systems table and the Filesystem service configuration.
Every S3 asset file system has its own local cache file system. If you use three S3 asset file systems, there are three local caches. Two parameters define each local cache: its size and its location on the local storage of the censhare Server. censhare manages these parameters for every cache separately and saves them with the configuration of the associated asset file system.
Configure Master data entry for Amazon S3 file system
To add an entry for an asset file system:
- In the censhare Admin Client, open the Master data/File systems table.
- Click to create a new entry.
- In the dialog, enter an ID, select a Type, and enter a Description. The Description is necessary in the next configuration step.
- Select the desired Domain and 2nd Domain. For more information, see Dedicated domains for S3 file systems.
- Leave the Master replication field empty.
- Click OK to save the entry and close the dialog.
To add an entry for a local cache file system, repeat the steps and in the Type field, select Others. In the Domain and 2nd Domain fields, select Root.
Configure Filesystem service for Amazon S3 file system
After creating the file systems, add the corresponding filesystem configurations. There are two ways to configure your Amazon S3 file system in censhare:
- Amazon S3 - you set up and manage your own S3 bucket.
- S3 Clone - you access an S3 clone from a third-party provider, or set up your own S3 clone in a Docker container.
- In the censhare Admin Client, go to the Configuration/Filesystem directory, and open the Configuration.
- Go to the end of the dialog, and click to add a new entry. You must create a new entry for each S3 file system, and for each separate local cache. Leave all other entries unchanged!
- In the File system field, select the file system that you created in the previous configuration step.
- In the Filesystem type field, select Cloud.
Introduced as of censhare 2019.3
The Cloud parameter enhances the transaction handling of files in cloud storage filesystems (Google Cloud, Microsoft Azure, Amazon S3). File transactions (for example, moving files from one directory to another) work faster and are more reliable. If a transaction fails, residual files are removed immediately.
- Leave the Replication filesystem field empty.
- Leave the External synchronization field unchecked.
- In the Usage field, select Assets.
- In the Protocol field, select one of the following:
  - To access your S3 bucket directly, select Amazon Simple Storage (S3). Then select a bucket region and enter the bucket name.
  - To access your S3 bucket through a clone provider, select S3 Clone. Then enter the bucket name without any path or other prefix, and enter the base URL to access the S3 clone.
- In the Access key and in the Secret key fields:
  - If you use an IAM role, leave the fields empty.
  - If you use IAM keys, enter the Access Key ID and the Secret Access Key.
- If you are using a firewall that completely blocks outgoing HTTP traffic, enter the access data for a proxy for the S3 data:
- Proxy Host: IP address or DNS name of the proxy
- Proxy Port: port to use for the proxy access
- Proxy Protocol: HTTP or HTTPS
- Proxy Username: username of the account to authenticate at the proxy
- Proxy Password: password of the account to authenticate at the proxy
- Check the Resume multipart uploads field if you want to use Amazon multipart upload. It is useful if censhare Server connects to an S3 bucket in a different geographical location.
- Check the Accelerated mode field to use Amazon S3 Transfer Acceleration. It is useful if censhare Server connects to an S3 bucket in a different geographical location. Note that this feature is only available if you select Amazon Simple Storage Service (S3) as Protocol.
- In the Cache filesystem field, select the entry for the desired cache file system. The values come from the Master data/File systems table. If you created your own cache file system there, select it here. To locate the cache within an existing cache file system, select the desired entry. The generic name is Temp.
- In the Max. size of cache (MB) field, enter the desired size of the cache. The default value is 20000. For more information, see Configure local cache size.
- To encrypt the files, you can use the server-side encryption that Amazon AWS S3 provides. Select Server side AES-256 to enable the server-side encryption.
- Click OK to save your configuration and close the dialog.
- To enable the file system, run Update Server Configuration.
Configure the local cache
The local caches can all be located in one directory. However, censhare creates a sub-directory for each cache. Alternatively, you can define a dedicated directory for each cache.
To configure a local cache, proceed as follows:
- In the censhare Admin Client, go to the Configuration/Filesystem directory, and open the Configuration.
- Go to the end of the dialog, and click to add a new entry. You must create a new entry for each separate local cache. Leave all other entries unchanged!
- In the File system field, select the desired cache file system.
- In the Filesystem type field, select Physical.
- Leave the Replication filesystem field empty.
- Leave the External synchronization field unchecked.
- In the Usage field, select Other.
- In the Protocol field, select Default.
- In the Type field, select Plain. This configuration allows you to store asset files larger than 4 GB.
- In the URL field, enter the URL. It begins with the prefix file: followed by the path relative to the installation directory of the server, for example, file:work/asset-s3-cache/.
- Leave the Lost+found-URL field empty.
- Click OK to save your configuration and close the dialog.
- To enable the file system, run Update Server Configuration.
Configure local cache size
There are two parameters for the cache size: the cache storage size (file cache size) and the hash table size. The hash table stores information about the files in the cache, for example, the file size. The default values for both parameters are usually large enough.
The default local cache size is 20 GByte per file system. If you define three different asset file systems, the caches can use up to 60 GByte in total (3 x 20 GByte).
You configure the cache size separately for every asset file system, in the configuration entry of that asset file system.
censhare checks the size of the cache at regular intervals. If the remaining cache size is too small, censhare removes the oldest files. Between two checks, the cache can temporarily be larger than the defined value.
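The cleanup behavior described above (remove the oldest files until the cache fits its limit again) can be illustrated with a short Python sketch. It is not censhare's implementation. It assumes that "oldest" means least recently accessed, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_mb: int
    last_access: float  # timestamp; a smaller value means an older file


def evict_oldest(files: list[CachedFile], max_size_mb: int) -> list[CachedFile]:
    """Return the files to remove so that the cache fits into max_size_mb."""
    total = sum(f.size_mb for f in files)
    to_remove = []
    # Walk from the oldest to the newest file and stop once the cache fits.
    for f in sorted(files, key=lambda f: f.last_access):
        if total <= max_size_mb:
            break
        to_remove.append(f)
        total -= f.size_mb
    return to_remove


# Hypothetical cache content: 24000 MB in total, limit 20000 MB (the default).
cache = [
    CachedFile("a.tif", 8000, last_access=1.0),
    CachedFile("b.tif", 9000, last_access=2.0),
    CachedFile("c.tif", 7000, last_access=3.0),
]
print([f.name for f in evict_oldest(cache, max_size_mb=20000)])
```

Because such a check only runs periodically, files written between two runs can push the cache above the limit until the next run, which matches the behavior described above.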
censhare logs the size of the used cache at regular intervals in the server log file. A log entry looks something like this:
2016.02.02-15:13:16.806 INFO: Thread-5: FileSystemCacheCleaner: Size of hard drive cache (s3-cache, s3-assets): 7 MB
- If the logged cache size is regularly higher than the defined value, you should increase the cache size.
- Access to the cache is much faster than access to the Amazon S3 bucket. If the file cache is not big enough, the censhare Server needs more time because files are retrieved from the Amazon S3 bucket instead of from the cache. In this case, too, you should increase the cache size.
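To watch the cache size over time, the log entries can be parsed with a small script. The following Python sketch extracts the values from a line in the format shown above and compares the size with a limit. It assumes that the two names in parentheses are the cache file system and the asset file system, in that order, and that the size is always logged in MB. Both are assumptions based on the single sample line, and the function names are hypothetical.

```python
import re
from typing import Optional

# Pattern derived from the sample log line in this article.
LOG_PATTERN = re.compile(
    r"FileSystemCacheCleaner: Size of hard drive cache "
    r"\((?P<cache>[^,]+), (?P<filesystem>[^)]+)\): (?P<size_mb>\d+) MB"
)


def parse_cache_size(line: str) -> Optional[tuple[str, str, int]]:
    """Return (cache name, file system name, size in MB), or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match["cache"], match["filesystem"], int(match["size_mb"])


def exceeds_limit(line: str, max_size_mb: int = 20000) -> bool:
    """True if the line reports a cache size above max_size_mb."""
    parsed = parse_cache_size(line)
    return parsed is not None and parsed[2] > max_size_mb


sample = (
    "2016.02.02-15:13:16.806 INFO: Thread-5: FileSystemCacheCleaner: "
    "Size of hard drive cache (s3-cache, s3-assets): 7 MB"
)
print(parse_cache_size(sample))
print(exceeds_limit(sample))
```

The default limit of 20000 MB in exceeds_limit mirrors the default cache size. Pass the value configured for the respective file system instead if it differs.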
The size of the hash table
Every local cache has a hash table with a default size of 100000 entries. You can increase the number separately for every asset file system.
censhare checks the size of the hash table at regular intervals. If the table is larger than the defined value, censhare removes the oldest entries. Between two checks, the hash table can be larger than the defined limit.
You can change the hash table size as follows:
- In the censhare Admin Client, switch to administration mode.
- Go to Configuration/Services/Filesystem and open the Configuration.
- Open the Admin menu in the top toolbar, and select Show/edit XML file.
- Search for the ID of the asset file system whose hash table size you want to change.
- Add a cacheMaxHashTableEntries attribute with the desired value to the <filesystem/> element:
<filesystem external-sync="false" protocol="s3" name="s3-assets"
    cacheMaxHashTableEntries="200000" usage="assets" url-creator="storage-sequence"
    type="plain" region="eu-central-1" encryption="NoEncryption" bucketName="bucket2"
    accessKey="YOUR_ACCESS_KEY_ID" secretKey="YOUR_SECRET_ACCESS_KEY"
    cacheFileSystem="s3-cache"/>
- Click OK to save your changes and close the XML configuration.
- Click OK to save the configuration and close the dialog.
The hash table allows censhare to retrieve file attributes faster. If the hash table is not large enough, the censhare Server needs longer to retrieve file attributes from the file system. In this case, increase the number of entries in the hash table. You have to do this for every asset file system separately.
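If you maintain several asset file systems, the same attribute change can be tried out on an exported copy of the XML before you enter it in the Admin Client. The following Python sketch sets cacheMaxHashTableEntries on the <filesystem/> element with a given name. The wrapping <config> element in the sample and the function name are assumptions for illustration; the real configuration file contains more elements and attributes. Note that the standard XML parser used here drops comments and may change the formatting, so work on a copy only.

```python
import xml.etree.ElementTree as ET


def set_hash_table_size(xml_text: str, filesystem_name: str, entries: int) -> str:
    """Set cacheMaxHashTableEntries on the filesystem element with the given name."""
    root = ET.fromstring(xml_text)
    matches = [fs for fs in root.iter("filesystem") if fs.get("name") == filesystem_name]
    if not matches:
        raise KeyError(f"no filesystem element named {filesystem_name!r}")
    for fs in matches:
        fs.set("cacheMaxHashTableEntries", str(entries))
    return ET.tostring(root, encoding="unicode")


# Reduced, hypothetical sample; only the attributes needed for the sketch.
sample = (
    "<config>"
    '<filesystem name="s3-assets" protocol="s3" cacheFileSystem="s3-cache"/>'
    '<filesystem name="s3-cache" protocol="default"/>'
    "</config>"
)
print(set_hash_table_size(sample, "s3-assets", 200000))
```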
Update from earlier versions
If you update your system from earlier versions to censhare 2019.2 or later, and you already use an Amazon S3 filesystem, you must adjust the XML configuration file. Otherwise, the Amazon S3 file system is no longer accessible, and you cannot work with your production files that are stored there.
To adjust the configuration, do the following:
- On your censhare host server, go to censhare-Custom/censhare-Server/app/services/filesystem.
- Open the config.xml file.
- Search for Amazon S3 filesystem configurations:
<filesystem protocol="S3" url="[URL]" .../>
- Remove the url attribute:
<filesystem protocol="S3" url="" .../>
- Search for version and enter the correct value there. Save the file.
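If a configuration contains many S3 entries, the url change can be checked on a backup copy with a short script first. The following Python sketch finds all <filesystem/> elements whose protocol is S3 (compared case-insensitively) and clears their url attribute. The instruction above says to remove the attribute while the example shows an empty url="", so the sketch supports both through the drop_attribute parameter; which form censhare expects is not confirmed here. The function name, the <config> wrapper, and the sample URL values are assumptions. The version value is not touched.

```python
import xml.etree.ElementTree as ET


def clear_s3_urls(xml_text: str, drop_attribute: bool = False) -> tuple[str, int]:
    """Clear the url attribute of all S3 filesystem elements.

    Returns the changed XML and the number of changed elements.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for fs in root.iter("filesystem"):
        if fs.get("protocol", "").lower() != "s3" or "url" not in fs.attrib:
            continue
        if drop_attribute:
            del fs.attrib["url"]  # attribute removed completely
        else:
            fs.set("url", "")  # attribute kept with an empty value, as in the example
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical sample with one S3 entry and one local entry.
sample = (
    "<config>"
    '<filesystem protocol="S3" url="s3://placeholder"/>'
    '<filesystem protocol="default" url="file:work/asset-s3-cache/"/>'
    "</config>"
)
new_xml, count = clear_s3_urls(sample)
print(count, new_xml)
```

The standard XML parser drops comments and may change the formatting of the file, so compare the result with the original before you copy any change into the live config.xml.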
Result
You have configured the censhare Server to use a bucket on Amazon S3 as the file system. If the censhare Server runs as an EC2 instance, you use an IAM role for authentication. If the censhare Server runs outside Amazon AWS, you use IAM keys for the access.