DataStore Configuration
Configurations within the Satellite Configuration Group.
Element: config
Attributes:
@version [ required | fixed: 2 ]
Children:
- all of these elements:
- [0, 1] '→startup'
» startup related settings (e.g. when should the service register)
- [0, 1] '→sync'
» synchronization pipeline configuration
- [1, 1] '→output-channels'
» Configure content filter
- [0, 1] '→compatibility'
» Backward compatibility settings
- [0, 1] '→storage'
» Configure storage item handling
- [0, 1] '→s3-push'
» Configure AWS s3 usage
- [0, 1] '→s3-storage-items'
» Configure AWS s3 access to storage items
- [0, 1] '→fileservice'
» Configure local storage
- [0, 1] '→query-engine'
» Configure query engine
- [1, 1] '→locales'
» Configure supported fulltext locales
- [1, 1] '→datastore'
» Configure datastore structure
- [1, 1] '→cached-tables'
» Configure db-tables
- [0, 1] '→users'
» Configure system user (server connection)
- [0, 1] '→update'
» Configure update and synchronization behaviour
- [0, 1] '→commit'
» Configure commit behaviour
- [0, 1] '→profiling'
» Configure analytics data
- [0, 1] '→change-detection'
» Configure change detection (checksum)
Element: startup
Configure startup behaviour
Attributes:
@registration [ default: OPEN ] ↦ { OPEN | CACHE_READY | SYNC_DONE }
point in startup when the service is registered and available to OC and other services using it
- OPEN new default value; registers the instance right after instantiation (once the Cached Tables are no longer empty)
- CACHE_READY registers when: Known Values are loaded & Cached Tables not empty
- SYNC_DONE registers when: Known Values are loaded & Cached Tables not empty & initial sync with censhare-Server is done or a timeout occurred due to connection problems
@connection-wait-for-seconds [ default: 60 ] ↦ int
timeout in seconds for registration mode "SYNC_DONE"
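Example (a minimal sketch, not taken from a shipped configuration): delay registration until the initial sync has finished, and stop waiting after two minutes. The value 120 is an arbitrary illustration.

```xml
<!-- Register only after the initial sync with the server is done,
     or after 120 seconds if the connection cannot be established. -->
<startup registration="SYNC_DONE" connection-wait-for-seconds="120"/>
```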
Element: sync
Configure server synchronization pipeline.
Attributes:
@max-queue-size-mb ↦ positiveInteger
Size limit of each queue, in megabytes. Does not affect queues that are already limited to single item. When not specified, default is 25.
@max-queue-length ↦ positiveInteger
Maximum length of each queue (number of update packages queued). Most queues are limited to single item regardless of this value. When not specified, default is 10.
@connection-wait-for-seconds [ default: 60 ] ↦ int
timeout in seconds for registration mode "SYNC_DONE"
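Example (a sketch with illustrative values only): raise both queue limits above the documented defaults of 25 MB and 10 packages. Whether larger queues help depends on the installation.

```xml
<!-- Illustrative values: twice the documented defaults. -->
<sync max-queue-size-mb="50" max-queue-length="20"/>
```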
Element: profiling
Configure profiling
Children:
- all of these elements:
- [0, 1] '→query-time'
» Configure profiling by query execution time
- [0, 1] '→db'
» Configure profiling by db statistics
Element: compatibility
Backward compatibility settings
Attributes:
@xpath-disallow-unanchored-relation-queries-conforming-4.6 ↦ boolean
@disable-search-folder-limit-conforming-4.7.8 ↦ boolean
@verify-result-filter [ default: false ] ↦ boolean
Element: query-time
Profiling by query execution time
Attributes:
@enabled ↦ boolean
Enable this logging
@threshold ↦ decimal
Log queries whose duration exceeds this threshold, in milliseconds
Element: db
Profiling by db statistics
Attributes:
@statistics-logging-interval ↦ nonNegativeInteger
Logging interval for statistics in seconds.
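Example (a sketch; the threshold and interval are assumed values chosen for illustration): a profiling element that enables both kinds of profiling.

```xml
<profiling>
  <!-- Log queries that run longer than 500 ms (illustrative threshold). -->
  <query-time enabled="true" threshold="500"/>
  <!-- Log db statistics every 60 seconds (illustrative interval). -->
  <db statistics-logging-interval="60"/>
</profiling>
```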
Element: output-channels
Restrict content to assets having these output channels
Children:
- sequence of these elements:
- [1, n] '→output-channel'
Element: output-channel
Attributes:
@name [ required ] ↦ string
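Example (a sketch; both channel names are hypothetical and must be replaced by output channels that exist on the server):

```xml
<output-channels>
  <!-- "example-web" and "example-print" are hypothetical channel names. -->
  <output-channel name="example-web"/>
  <output-channel name="example-print"/>
</output-channels>
```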
Element: change-detection
Change detection - checksum based
Attributes:
@hash ↦ { farmhash | siphash | crc32 | adler32 | sha256 | murmur3-32 | murmur3-128 }
Hash algorithm; the default is "farmhash" (recommended).
Some algorithms are slower but have a lower chance of collision (sha256); others are faster but have a higher chance of collision (crc32).
Remember that the checksum size is always limited to the first 64 bits!
Children:
- all of these elements:
- [0, 1] '→asset-xml-exclusions'
» Custom exclusions of asset xml parts.
Element: asset-xml-exclusions
Children:
- sequence of these elements:
- [1, n] '→exclude'
» Attribute ignored when computing checksum. By default, no attribute is ignored and these elements constitute blacklist.
It is possible to exclude all attributes by specifying asterisk (*).
Inner element: asset-xml-exclusions/exclude
Attributes:
@element [ required ] ↦ token
@whole-element ↦ boolean
Children:
- sequence of these elements:
- [0, n] '→condition'
- [0, n] '→exclude'
» Attribute ignored when computing checksum. By default, no attribute is ignored and these elements constitute blacklist.
It is possible to exclude all attributes by specifying asterisk (*).
- [0, n] '→include'
» Attribute that is always used when computing checksum. This element makes sense only when there is "exclude" element with "*" (asterisk) value present;
otherwise, all attributes are included by default!
Inner element: asset-xml-exclusions/exclude/include
Attribute that is always used when computing checksum. This element makes sense only when there is "exclude" element with "*" (asterisk) value present; otherwise, all attributes are included by default!
Attributes:
@attribute [ required ] ↦ token
Name of the attribute to include in checksum. One special value is recognized: * (asterisk) include all possible attributes.
Inner element: asset-xml-exclusions/exclude/condition
Attributes:
@attribute [ required ] ↦ token
@value ↦ token
Inner element: asset-xml-exclusions/exclude/exclude
Attribute ignored when computing checksum. By default, no attribute is ignored and these elements constitute blacklist. It is possible to exclude all attributes by specifying asterisk (*).
Attributes:
@attribute [ required ] ↦ token
Name of the attribute to exclude from checksum. One special value is recognized: * (asterisk) excludes all possible attributes.
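Example (a sketch of the nesting described above; the element and attribute names "example-element", "other-element", "modified-date" and "key" are hypothetical, and the optional condition element is not shown): one exclusion that ignores a single attribute, and one that ignores all attributes except one.

```xml
<change-detection hash="farmhash">
  <asset-xml-exclusions>
    <!-- Ignore one volatile attribute of a hypothetical element. -->
    <exclude element="example-element">
      <exclude attribute="modified-date"/>
    </exclude>
    <!-- Ignore all attributes of another hypothetical element except "key". -->
    <exclude element="other-element">
      <exclude attribute="*"/>
      <include attribute="key"/>
    </exclude>
  </asset-xml-exclusions>
</change-detection>
```

The include element only has an effect in the second exclusion, because that is the only place where an exclude with "*" is present.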
Element: storage
Configure storage item handling
Children:
- all of these elements:
- [0, 1] '→inline'
» XPath to filter storage items which are inlined in the asset xml
- [0, 1] '→index-only'
» XPath to filter storage items which are included in the fulltext index only
- [0, 1] '→delete'
» XPath to filter storage items which are not synchronized from the app server
- [0, 1] '→stream-only'
» XPath to filter storage items which are not downloaded from the app server, but instead stream on every access of the content
Element: inline
Attributes:
@xpath [ required ] ↦ string
Element: index-only
Attributes:
@xpath [ required ] ↦ string
Element: delete
Attributes:
@xpath [ required ] ↦ string
Element: stream-only
Attributes:
@xpath [ required ] ↦ string
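Example (a sketch under stated assumptions: the XPath is assumed to be evaluated against a storage item that carries a "key" attribute, and the key values "text" and "master" are hypothetical; check both against the actual asset xml before use):

```xml
<storage>
  <!-- Assumption: storage items expose a "key" attribute to the XPath. -->
  <inline xpath="@key='text'"/>
  <!-- Hypothetical key value: stream large originals instead of downloading them. -->
  <stream-only xpath="@key='master'"/>
</storage>
```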
Element: s3-push
Children:
- all of these elements:
- [1, 1] '→satellites'
» hint duration estimation with a max speed to assume
- [1, 1] '→server'
Inner element: s3-push/server
Children:
- sequence of these elements:
- [1, n] '→bucket'
Inner element: s3-push/server/bucket
Attributes:
@proxy-protocol ↦ { http | https }
@proxy-host ↦ string
@proxy-port ↦ positiveInteger
@proxy-username ↦ string
@proxy-password ↦ string
@request-timeout [ default: -1 ] ↦ integer
@socket-timeout [ default: 60000 ] ↦ integer
@max-connections [ default: -1 ] ↦ integer
@multipart-threshold [ default: -1 ] ↦ integer
@multipart-minimum-size [ default: -1 ] ↦ integer
@resume-multipart [ default: false ] ↦ boolean
@{group '→s3-bucket'}
Inner element: s3-push/satellites
Attributes:
@default-bucket-name [ required ] ↦ string
@default-bucket-region [ required ] ↦ string
@default-access-key ↦ string
@default-secret-key ↦ string
@url-override ↦ string
override default endpoint url
@accelerated-mode [ default: false ] ↦ boolean
Use accelerated-mode for S3 access
@max-bytes-per-second [ default: -1 ] ↦ integer
hint duration estimation with a max speed to assume
@path-style-access [ default: false ] ↦ boolean
use path style access to bucket
Children:
- sequence of these elements:
- [0, n] '→bucket'
» hint duration estimation with a max speed to assume
Inner element: s3-push/satellites/bucket
Attributes:
@instance-region [ required ] ↦ string
@max-bytes-per-second [ default: -1 ] ↦ integer
hint duration estimation with a max speed to assume
@path-style-access [ default: false ] ↦ boolean
use path style access to bucket
@{group '→s3-bucket'}
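Example (a sketch; bucket names and regions are placeholders, and credentials are left out on purpose): an s3-push element with one server bucket, satellite defaults, and one region-specific satellite bucket. The name and region attributes on the bucket elements come from the s3-bucket attribute group.

```xml
<s3-push>
  <server>
    <!-- Placeholder bucket; socket-timeout shown with its documented default. -->
    <bucket name="example-bucket" region="eu-central-1" socket-timeout="60000"/>
  </server>
  <satellites default-bucket-name="example-bucket"
              default-bucket-region="eu-central-1">
    <!-- Placeholder bucket for satellites in another region. -->
    <bucket instance-region="us-east-1"
            name="example-bucket-us" region="us-east-1"/>
  </satellites>
</s3-push>
```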
Element: s3-storage-items
Configure AWS s3 access to storage backend (deprecated by filesystems/s3-filesystem)
Attributes:
@bucket [ required ] ↦ string
Select bucket to use in @region
@region [ required ] ↦ string
Select the system AWS region.
@access-key ↦ string
Provide api-key for bucket
@secret-key ↦ string
Provide api-secret for bucket
Element: fileservice
Configure local storage
Attributes:
@work-filesystem-url ↦ string
Local path to datastore storage
@skip-filesystem-check [ default: true ] ↦ boolean
Skip check on startup?
Children:
- [0, 1] all of these elements:
- [1, 1] '→filesystems'
Element: filesystems
Configure storage
Attributes:
@default-mode [ default: greedy ] ↦ { greedy | lazy }
- greedy Sync from server immediately
- lazy Sync from server on first access
Children:
- sequence of these elements:
- [1, n] choice of these elements:
- [0, n] '→standard-filesystem'
» Configure local storage
- [0, n] '→s3-filesystem'
» Configure AWS S3 storage
Element: standard-filesystem
Configure local storage
Attributes:
@name [ required ] ↦ NCName
@mode ↦ { greedy | lazy }
- greedy Sync from server immediately
- lazy Sync from server on first access
Element: s3-filesystem
Configure AWS S3 storage shared with server. Remember to always set s3-streaming, "true" is the correct value in 99% of cases!
Attributes:
@name [ required ] ↦ NCName
@mode ↦ { greedy | lazy }
ignored if 's3-streaming' is enabled
- greedy Sync from server immediately
- lazy Sync from server on first access
@s3-region [ required ] ↦ string
Select the system AWS region.
@s3-bucketName [ required ] ↦ string
Select bucket to use in @region
@s3-accessKey ↦ string
Provide api-key for bucket
@s3-secretKey ↦ string
Provide api-secret for bucket
@s3-streaming [ default: false ] ↦ boolean
Use direct S3 access (bypassing the server). The default is impractical; set this to true in 99% of cases!
@url-override ↦ string
override default endpoint url
@accelerated-mode [ default: false ] ↦ boolean
Use accelerated-mode for S3 access
@path-style-access [ default: false ] ↦ boolean
use path style access to bucket
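Example (a sketch; the filesystem names, the bucket, the region, and the file URL form of the work path are assumptions made for illustration): a fileservice with one local filesystem and one S3 filesystem that has s3-streaming set to true, as recommended above.

```xml
<fileservice work-filesystem-url="file:///opt/example/work/"
             skip-filesystem-check="true">
  <filesystems default-mode="lazy">
    <!-- Hypothetical local filesystem, synced immediately. -->
    <standard-filesystem name="assets" mode="greedy"/>
    <!-- Hypothetical S3 filesystem; s3-streaming is set explicitly. -->
    <s3-filesystem name="assets-s3"
                   s3-region="eu-central-1"
                   s3-bucketName="example-bucket"
                   s3-streaming="true"/>
  </filesystems>
</fileservice>
```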
Element: locales
List the locales available to fulltext search
Attributes:
@default [ required ] ↦ string
Children:
- sequence of these elements:
- [0, n] '→locale'
Element: locale
Attributes:
@name [ required ] ↦ string
Element: query-engine
Configure query engine
Children:
- all of these elements:
- [0, 1] '→ignore-relation-sorting'
» Configure relations for which the sorting should be ignored and relevances used instead. First match is used.
Inner element: query-engine/ignore-relation-sorting
Configure relations for which the sorting should be ignored and relevances used instead. First match is used.
Children:
- sequence of these elements:
- [0, n] '→relation'
» Relation type, full and normalized; asterisk at the end is supported.
Inner element: query-engine/ignore-relation-sorting/relation
Attributes:
@type [ required ] ↦ token
Relation type, full and normalized; asterisk at the end is supported.
@direction ↦ { child | parent }
Direction (default = both directions)
@ignore-sorting [ default: true ] ↦ boolean
Meaning of a match: true ignores the sorting for matching relations; false keeps the standard behaviour.
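Example (a sketch; the relation types are hypothetical): because the first match is used, the more specific entry is listed first and keeps the standard sorting, while all other relations under the same prefix ignore it.

```xml
<query-engine>
  <ignore-relation-sorting>
    <!-- Hypothetical type: keep standard sorting for this one relation. -->
    <relation type="user.example-ordered." ignore-sorting="false"/>
    <!-- Hypothetical prefix: ignore sorting for child relations of all other user.* types. -->
    <relation type="user.*" direction="child"/>
  </ignore-relation-sorting>
</query-engine>
```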
Element: datastore
Configure datastore structure
Attributes:
@url ↦ string
Local path to storage
@file-mode [ default: write ] ↦ { read-only | deferred-write | write | write-sync | write-sync2 }
@max-tree-node-size ↦ integer
@tree-node-size-threshold ↦ integer
@node-cache-size ↦ integer
number of nodes in cache; usually 1000000 is a good guess
@node-cache-size-mb ↦ integer
size in MB; needs to be bigger than '@file-size'
@data-cache-size ↦ integer
number of data cache entries; usually 1000000 is a good guess
@data-cache-size-mb ↦ integer
size in mb
@file-sync-interval ↦ integer
sync files to disk every n milliseconds
@file-size ↦ integer
file size in bytes. Must be big enough to hold the biggest index of db in one file. Must be smaller than '@node-cache-size-mb'.
@cleaner-interval ↦ integer
how often should the cleaner run
@cleaner-usage-threshold ↦ integer
target ratio tracked size / filesize, usually 30%
@cleaner-max-files-to-evict-per-run ↦ integer
how many files can be cleaned per run
@deflater-level ↦ integer
compression level
@checkpoint-interval ↦ integer
how often checkpoints are written to the cdb files (ms); after a crash, sync with the app server starts again from these points
@nb-backups [ default: 0 ] ↦ integer
@oc-backup-interval [ default: 0 ] ↦ integer
@oc-group-mode-url ↦ string
@{anyAttribute}
any additional attribute allowed
Children:
- all of these elements:
- [1, 1] '→features'
» Define feature indices and structures
- [0, 1] '→functions'
- [0, 1] '→inlineElements'
» Replace these elements with "" instead of the " " used for block elements (e.g. tags like "italic" can start inside a word and should not lead to splitting the word).
Element: features
Define feature indices and structures
Attributes:
@default-hierarchical-feature-mode [ default: store-flat-and-hierarchical ] ↦ { store-flat | store-hierarchical | store-flat-and-hierarchical }
@index-disabled-by-default [ default: false ] ↦ boolean
if true all indices need to be added and configured manually; if false all features have an index with default configuration
@skip-rebuild-on-feature-config-change [ default: false ] ↦ boolean
if true: on change, either the complete CDB needs to be rebuilt manually or specific indices need to be rebuilt by server command
@add-disabled-features-to-registry [ default: true ] ↦ boolean
if true: features with disabled index won't be added to the index registry, this will make queries using that feature fail instead of returning 0 results
Children:
- sequence of these elements:
- [1, n] '→feature'
» Structure settings
Inner element: features/feature
Attributes:
@key [ required ] ↦ string
Feature key
Children:
- sequence of these elements:
- [1, 1] '→index-config'
» Structure settings
Element: fulltext
The fulltext search uses fulltext indices configured by this element.
The fulltext search splits the search input into terms. Each term is matched against the indices of contained words using prefix and infix expansion.
Terms are ignored if they are shorter than @min-length.
Word matching stops after the limits given by @prefix-expansion-limit and @infix-expansion-limit.
If fewer words are found than @fuzzy-skip-results, a fuzzy expansion with nearby words is done to create more hits.
The search index from text content is by default feature key="censhare:text.content".
The search index from feature values is by default feature key="censhare:text.meta".
Search queries can be configured to use multiple indices. Indices are searched for words matching a term; the words are then expanded to the assets carrying them.
The minimal length of a search term is configured by the attribute @min-length, for example @min-length="3".
Attributes:
@auto-suggest-size ↦ integer
gives the maximum size for AltResults with auto-suggestion on
@min-length ↦ integer
minimum char length of term to start search
@stopword-list ↦ string
Use StopWord list to ignore terms
@fuzzy-max-distance ↦ integer
if search word result is filled with fuzzy hits: define max distance
@fuzzy-min-length ↦ integer
Minimum term length for fuzzy expansion
@fuzzy-ngram-length ↦ integer
N-gram size for prefix and fuzzy candidate lookup. Changing this value requires a database rebuild.
@fuzzy-expansion-limit ↦ integer
maximum number of search words expanded from the known values by fuzzy expansion
@infix-expansion-limit ↦ integer
maximum number of search words expanded from the known values by infix expansion
@prefix-expansion-limit ↦ integer
maximum number of search words expanded from the known values by prefix expansion
@fuzzy-min-alpha ↦ integer
Minimum length of non-numeric sequence to enable fuzzy search.
@max-bitap-errors ↦ integer
Maximum errors for fuzzy expansion.
@fuzzy-skip-results ↦ integer
Create no search words from fuzzy expansion when the search word list is longer than this value.
@fuzzy-end-results ↦ integer
Number of results required to end fuzzy search.
@term-split-chars ↦ string
Split input into terms by these chars
@term-join-chars ↦ string
characters that join terms that were split by the WordBreaker. Changing this value requires a database rebuild.
@term-keep-chars ↦ string
(single) characters to keep in joined terms that were split by the WordBreaker. Changing this value requires a database rebuild.
@altavista-syntax ↦ boolean
Use AltaVista Syntax instead of simple search heuristics.
@scale ↦ float
@relevance-factor-prefix ↦ float
Relevance reduction for prefix vs exact match.
@relevance-factor-distance0 ↦ float
Relevance reduction for infix vs exact match.
@relevance-factor-distance-step ↦ float
Relevance reduction for each step of fuzzy editing distance.
@relevance-mode ↦ { BM25 | BM25FLAT | COSINE | COUNT }
relevance calculation mode
@stemming ↦ boolean
En-/Disable stemming
@strip-diacritics ↦ boolean
strip diacritics from chars; example: à -> a
@expand-umlauts ↦ boolean
expand German umlauts; example: ä -> ae
@storage-item-filter ↦ string
filter storage items to index
@expand-junction-operator [ default: AND_NOT_EMPTY ] ↦ { AND | AND_NOT_EMPTY }
search term operation mode for multiple words contained
- AND combine all words in searchterm by AND; a word without hit leads to empty result
- AND_NOT_EMPTY combine all words in searchterm by AND if hits for word exist; a word without any hit is ignored for search
Children:
- sequence of these elements:
- [0, n] '→feature'
» included feature
- [0, 1] '→character-mapping'
» mapping for characters before indexing / searching
Inner element: fulltext/feature
included feature
Attributes:
@key [ required ] ↦ string
included feature id
@mode [ default: content ] ↦ { lookup | content | fulltext | reference }
include mode
- lookup Feature types 'enumeration' and 'hierarchical' only; do not index the technical keys, but the values from the cached tables
- content Use to include content of feature
- fulltext Use if feature is fulltext index itself
- reference Use if feature is an asset ref feature. You also need a configured fulltext index for censhare:text.name and a not-null index for censhare:resource-key
@relevance ↦ decimal
@scale ↦ decimal
Inner element: fulltext/character-mapping
mapping for characters before indexing / searching
Children:
- sequence of these elements:
- [0, n] '→mapping'
» e.g. map ä to ae
Inner element: fulltext/character-mapping/mapping
e.g. map ä to ae
Attributes:
@source [ required ] ↦ string
@target [ required ] ↦ string
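Example (a sketch; the feature key "example:headline", the limits, and the relevance value are hypothetical illustrations, not recommended settings): a fulltext element that includes one feature and maps one character before indexing and searching.

```xml
<fulltext min-length="3"
          prefix-expansion-limit="100"
          infix-expansion-limit="50"
          strip-diacritics="true"
          expand-junction-operator="AND_NOT_EMPTY">
  <!-- Hypothetical feature whose content is added to this index. -->
  <feature key="example:headline" mode="content" relevance="2.0"/>
  <character-mapping>
    <!-- Map ß to ss before indexing and searching. -->
    <mapping source="ß" target="ss"/>
  </character-mapping>
</fulltext>
```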
Element: index-config
Hints on index build
Attributes:
@disabled ↦ boolean
En-/Disable index (avoids building)
@isContentFulltext ↦ boolean
@default-fulltext-index ↦ boolean
Is this default index to use on fulltext searches?
@content-filter-xpath ↦ string
XPath to restrict index to part of content; example '//website/teaser/teaser-text'
@notnull-column ↦ boolean
create not null index
@isnull-column ↦ boolean
create is null index
@collator-columns ↦ boolean
@numeric-trie-bits ↦ nonNegativeInteger
@sort-value ↦ boolean
Is value used for sorting?
@invert-range ↦ boolean
invert-range kept for compatibility. However it should be deprecated.
@type ↦ { pair | range | split-pair | coordinates | numeric-trie | fulltext | virtual-fulltext | direct-value | hierarchical }
@hierarchical-feature-mode ↦ { store-flat | store-hierarchical | store-flat-and-hierarchical }
Children:
- sequence of these elements:
- [0, n] '→fulltext'
- [0, n] '→hint'
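Example (a sketch; both feature keys are hypothetical): two feature entries as they would appear inside features, one with a range index that is also used for sorting, and one whose index is disabled.

```xml
<features>
  <!-- Hypothetical date feature: range index, usable for sorting. -->
  <feature key="example:publication-date">
    <index-config type="range" sort-value="true" notnull-column="true"/>
  </feature>
  <!-- Hypothetical feature whose index is not built. -->
  <feature key="example:internal-note">
    <index-config disabled="true"/>
  </feature>
</features>
```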
Element: inlineElements
Define XML elements which are considered "inline" e.g. "italic" (replaced before indexing with "", versus block elements which are replaced by a word breaking character)
Children:
- sequence of these elements:
- [1, n] '→inlineElement'
Element: inlineElement
Attributes:
@name [ required ] ↦ string
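Example (a sketch; "italic" is the example named above, "bold" is an assumed additional element name):

```xml
<inlineElements>
  <!-- Removed without inserting a word break, so "w<italic>or</italic>d" stays one word. -->
  <inlineElement name="italic"/>
  <inlineElement name="bold"/>
</inlineElements>
```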
Element: hint
Children:
- sequence of these elements:
- [0, n] '→property'
Inner element: hint/property
Attributes:
@name [ required ] ↦ string
@value [ required ] ↦ string
Element: functions
Children:
- sequence of these elements:
- [0, n] '→function'
Inner element: functions/function
Attributes:
@name [ required ] ↦ string
@expr [ required ] ↦ string
Element: cached-tables
Configure cached tables available to satellite applications
Children:
- sequence of these elements:
- [0, n] '→cached-table'
Element: cached-table
Attributes:
@name [ required ] ↦ string
@writable [ default: false ] ↦ boolean
Element: users
Configure user context to use with server connections.
Attributes:
@defaultuser [ required ] ↦ string
Server user to use by default
Children:
- sequence of these elements:
- [1, n] '→user'
Element: user
Attributes:
@name [ required ] ↦ string
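Example (a sketch; the user name is hypothetical and must match a user that exists on the server):

```xml
<users defaultuser="example-sync-user">
  <!-- Hypothetical server user used for server connections. -->
  <user name="example-sync-user"/>
</users>
```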
Element: update
Configure update and synchronization behaviour
Attributes:
@limit ↦ nonNegativeInteger
package size of id packages while syncing
@timeout ↦ nonNegativeInteger
Connection timeout setting
@ignore-live-tags ↦ boolean
En-/Disable support for live tagging (must be supported by server)
@enforce-tsn-based-updates ↦ boolean
If supported by the server, use map-based updates for data synchronisation.
@enabled [ default: true ] ↦ boolean
Disable the update mechanism (inhibit content updates)
@worker-extended-logging [ default: false ] ↦ boolean
OCDatatStoreService.Worker extended logging (VERBOSE!)
@manager-extended-logging [ default: false ] ↦ boolean
OCDatatStoreService extended logging (VERBOSE!)
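Example (a sketch; the limit and timeout values are illustrative, and the unit of @timeout is not stated above, so treat the number as a placeholder): an update element that keeps updates enabled and leaves the verbose logging switched off.

```xml
<!-- Illustrative values; the unit of "timeout" is an open assumption. -->
<update limit="500"
        timeout="60"
        enabled="true"
        worker-extended-logging="false"
        manager-extended-logging="false"/>
```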
Element: commit
Configure asset commit settings
Attributes:
@fast [ default: false ] ↦ boolean
If enabled, the satellite uses a local transaction but commits to the server asynchronously. For some use cases, race conditions may lead to inconsistent states.
@virus-check [ default: false ] ↦ boolean
If enabled and supported by the censhare-Server, a virus check is executed on the censhare-Server before commit
@virus-check-timeout-in-seconds [ default: 120 ] ↦ positiveInteger
timeout for each file that needs to be virus-checked
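Example (a sketch; the timeout of 300 seconds is an illustrative value): synchronous commits with a server-side virus check.

```xml
<!-- fast="false" keeps commits synchronous and avoids the race conditions noted above. -->
<commit fast="false" virus-check="true" virus-check-timeout-in-seconds="300"/>
```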
AttributeGroup: s3-bucket
@name [ required ] ↦ string
@region [ required ] ↦ string
@access-key ↦ string
@secret-key ↦ string
@url-override ↦ string
override default endpoint url
@accelerated-mode [ default: false ] ↦ boolean
Use accelerated-mode for S3 access
The Datastore Configuration configures an instance of the local datastore.
Multiple Datastore Configurations may be present in one configuration group, but in general sharing one is preferred.
A satellite uses its own datastore at runtime. This datastore is a partial copy of the server's datastore.
At startup, the datastore is synchronized to the state of the server's datastore. At runtime, the satellite is informed of updates on the server by an event system.
The content of the satellite's datastore falls into three classes: required technical assets, assets matching the configured output-channels, and the storage items wanted for those assets.
Asset data and storage item content may use different storage backends.
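Example (a minimal sketch containing only the children marked [1, 1] in the config element; the channel name, locale, datastore path, and table name are placeholders, and a real configuration will need more features and settings):

```xml
<config version="2">
  <!-- Assets are restricted to this placeholder output channel. -->
  <output-channels>
    <output-channel name="example-channel"/>
  </output-channels>
  <!-- Locales available to fulltext search. -->
  <locales default="en">
    <locale name="en"/>
  </locales>
  <!-- Placeholder local path; at least one feature is required. -->
  <datastore url="file:///opt/example/cdb/">
    <features>
      <feature key="censhare:text.content">
        <index-config type="fulltext" default-fulltext-index="true"/>
      </feature>
    </features>
  </datastore>
  <!-- Placeholder cached table made available to satellite applications. -->
  <cached-tables>
    <cached-table name="example_table"/>
  </cached-tables>
</config>
```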