HCMS Request Logging
Explains how to configure custom request logging.
Each HCMS request can be logged as a one or more records in log file or one or more records in the statistic table on the censhare server.
The later option allows application to collect statistics about its usage. For example, HCMS Client framework uses this feature to track number of logged users for audit purposes.
Request logging is, by default, disabled and must be explicitly configured in the HeadlessCMS configuration xml.
Commandline tool supports only one special case (HCMS Client framework user tracking). Generic support for custom logging configuration is planned for one of future releases; until then, direct XML editing is necessary.
Configuring loggers
Request logging is enabled by adding element <request-logging> into element <api>. This element contains one or more loggers, for example:
<api>
<request-logging>
<log level="800" logger="access_log"/>
<statistics service="hcms" group="hcms-client.user-tracking" response-header="true">
<condition value-of="userId"/>
</statistics>
</request-logging>
</api>
- At least one logger must be always specified; empty <request-logging/> is invalid configuration and refused by the XSD schema.
- Loggers are always applied in the order of their definition. This is not important for statistics logger, but it does make a difference for the logfile one.
- Only two loggers are implemented so far:
- <log writes single message to satellite log file via JUL logger.
- <statistics sends data to censhare server, which stored them to statistic table.
Each logger can optionally contain conditions (condition element) that define when it is used. If there are no conditions defined, the logger is always used; otherwise, the request is logged only if at least one of the conditions matches (in other words, the conditons are connected by boolean OR).
Each condition is represented by <condition/> element with following attributes:
- value-of (required): Name of the log-value that is tested. By default, condition match if the value is present.
- value-if-missing: Default value used if there is no real value available in request. This attribute makes sense only in conjunction with attribute regex.
- regex: Regular expression used to evaluate the value. When missing, any value is accepted; this is the same as specifying .* expression.
- Note that even number values are evaluated as string.
- negate: When set to true, the condition is inverted and match if the value is missing or it does not match the regular expression.
Logfile logger
This logger is always available, with no prerequisite. It uses the internal logging infrastructure based on JUL logger that writes to logfile. In case of docker deployment, this log is written to standard output and can be available by standard tools.
Element is named log and its attributes are:
- level (required): Log level as defined by the JDK API. Typical value is 800, which is translated to INFO.
- logger (required): Name of the logger.
- format: String used to format log message. When missing, default format is compatible with standard log format.
- Everything outside curly braces is used as a verbatim text.
- Curly braces mark placeholders that are replaced by log values from request.
- For list of available values, see '→special section'.
- All values are treated as strings and missing value is treated as empty string.
- In the simplest form, placeholder is just value name in curly bracers: {userId} renders to something like 12343 or ``
- Default value can be added after colon: {status:unknown} renders to something like 12343 or unknown
- Second colon can be added to pad the value by spaces (from left) to reach specified number of characters: {status:unknown:10} renders to 12343 or unknown
- Negative size value is used to specify padding from right: {status:unknown:-10} renders to 12343 or unknown
- Backslash is used to escape special values (brackets, backslashes, colons inside placeholder).
- The default format is: {remoteAddr:-} - {userId:-} [{timestamp/access_log}] "{method} {url} HTTP/1.1" {status}
Statistics logger
This sends data to censhare server and has several pre-requisites:
- The censhare server must have service OCStatisticsService enabled.
- This is automatically done by the <strong>hcms</strong> commandline tool commands server enable and configuration create, but only in version 2.0 or higher. Earlier versions did not enable this service. When upgrading existing hcms installation, it is recommended to run server enable again.
- Implementation is part of the com.censhare.oc.statistics bundle, which is part of HCMS installation (both standalone and docker). Special bundle-only installations, however, need to take care of this dependency explicitly.
- Statistics bundle has its own configuration XML assets with <factorypid>com.censhare.oc.statistics.StatisticsService</factorypid>; each configuration can contain any number of groups.
- Group can contain any number of attributes, as allowed by the appropriate xsd.
- Attribute name is very important: it is directly used as a name of the '→log value' and filled by appropriate value.
- Several instances can be configured at once, each being identified by the content of <instanceid><instanceid> element.
Element is named statistics and its attributes are:
- service (required): Instance id of the statistics service. This is the value of instanceid element in the statistics service configuration xml.
- group (required): Group id in the statistics service configuration.
- strict: Boolean value that enables strict error handling mode. In the strict mode, failure to assemble data group and send it to server cause whole HCMS request to fail.
- Default behavior (false) just ignores such error. This is recommended setting in most cases.
- response-header: Boolean value that enables reporting of this logger in the HTTP response.
- Each <statistics response-header="true" logger with enabled adds one instance of response header x-cs-ocstatisticslogger. Note that the header can be present several times.
- Value of the header is in a form service.group=result, with result being one of four values:
- logged: Success.
- unavailable: Specified service/group combination does not exist. Either there is no such statistics configuration, or it does not contain specified group.
- skipped: Condition is specified, and it did not match. Nothing was logged.
- failed: Attempt to write group failed, with exception available in the log. This is usually caused by an invalid configuration or server failure.
- Default value is false, no such header is added.
- This special feature can be used to ensure that the statistics logging is properly configured. Client/frontend application can check this header and react on its absence or unexpected value.
On each request:
- Conditions are evaluated. If there is one or more conditions specified and non of them matches, the logger ends with status skipped.
- Statistics service with given id (service attribute) is looked up. If there is no such service instance, the logger ends with status unavailable.
- Group with given id (group attribute) is found in the statistics bundle instance. If there is no such group, the logger ends with status unavailable.
- Statistics service is used to create new group. Failure is interpreted as unavailable result.
- Start and end timestamps are filled with request start and end.
- Note that request end does not contain response data streaming. This might be significant in case of large data transfer (storage item download).
- Note that request end does not contain response data streaming. This might be significant in case of large data transfer (storage item download).
- For each attribute defined in the group, corresponding value is found and set.
- If the value is not available, attribute is left empty (null).
- If the attribute is long or asset, value is automatically converted to long integer number. String are parsed as decimal number. If the value cannot be converted, it is considered missing.
- Filled group is committed to the statistics service bundle, which adds it to queue and eventually sends to the server.
- Note that the data group is usually not sent immediately and synchronously - that would make HCMS completely unresponsive.
- Note that the data group is usually not sent immediately and synchronously - that would make HCMS completely unresponsive.
When the data group arrives at the server (asynchronously), it is written as single record in the statistic table. This table can be accessed from XSLT by special cs:statistics() function or processed by one of the automations in module statistics.
Available values
There are many values available to be included in the log messages/records. Some of them are hardcoded (name is just a pre-defined string), some of them are dynamic (name contains some dynamic part).
Hardcoded values from HTTP request/response:
- method: HTTP method (GET, PUT, POST, OPTIONS)
- url: Full request URL (including query parameters).
- Note that if the HCMS is behind via reverse proxy (escpecially TLS-terminating one), the WebServer configuration must be configured to trust reverse proxy (forwarded and forwarded-levels). Otherwise, this URL would have wrong (internal) hostname and, in case of TLS-terminating proxy, even wrong protocol (http instead of `https).
- uri: Full request URI; this is basically just an alias for url.
- host: Host (hostname) part of the URL. Port is not included.
- port: Port specified in the URL, if any.
- protocol: Protocol part of the URL - usually https or http.
- baseUri: Base URI of the REST API, without endpoint-specific part. For valid HCMS requests, this always ends by /hcms/vX.Y (for example /hcms/v1.0/, /hcms/v1.11 or /hcms/v2.1/)
- path: Endpoint path, relative to baseUri. Note that it is not full URL path! For example, list of entities has path: entity
- query: Query part of the URL (the part after ?), in its raw form (url encoded). Missing if there is no such part.
- queryDecoded: Query part of the URL (the part after ?), url encoded. This value is more convenient to read, but cannot be directly used to construct url.
- userId: Asset id of authenticated user. Only present if valid JWT token is present and its sub claim contains id of existing asset.
- userName: Name of authenticated user. Unlike userId, this is also present in case of HTTP Basic authentication (in that case, this is the login name).
- remoteAddr: IP address of the remote client.
- Note that if the HCMS is behind via reverse proxy (load balancer), the WebServer configuration must be configured to trust reverse proxy (forwarded and forwarded-levels). Otherwise, this value would always contain just the address of that proxy.
- Warning: Keep in mind that IP address is considered "personal data" for legal reasons (GDPR and similar)! Always make sure records containing it are properly handled (and deleted or anonymized as soon as possible).
- remoteHost: Hostname of the remote client. This is usually the same as remoteAddr, except it might be DNS name instead of the IP address.
- remotePort: TCP port of the remote client. This value is rarely useful.
- status: HTTP response status code (as an integer number). Example: 200.
- statusMessage: HTTP response status message (the string message after status code). Example: OK
- timestamp: Request (start) timestamp; string form is the same as timestamp/utc
- timestamp/<format>: Request date and time, in specified format (see below).
- requestEnd: Request end timestamp; string form is the same as requestEnd/utc
- Note that this timestamp represents moment of sending response status and headers. Message body is sent after this timestamp and it might take a significant amount of time.
- requestEnd/<format>: Request end timestamp, in specified format (see below).
- duration: Request duration, as a <strong>java.time.Duration</strong> (string format is defined by ISO-8601).
- Note that this duration does not include streaming of response body!
- duration/millis: Request duration (without response body), in milliseconds.
- duration/nanos: Request duration (without response body), in nanoseconds. Note that the measurement is not done with nanosecond precision.
Date/time (timestamp) values can be formatted by one of several predefined formats, specific after slasl. Examples are given for 2020-06-09T07:56:48.701007Z in CEST time zone.
- <value>/iso: Date and time, in ISO 8601 format and local time zone. Example: 2020-06-09T09:56:48.701007+02:00
- <value>/utc: Date and time, in ISO 8601 format and UTC time zone. Example: 2020-06-09T07:56:48.701007Z
- <value>/access_log: Date and time, in the format used by standard log format. Example: 09/Jun/2020:09:56:48 +0200
- <value>/rfc1123: Date and time in RFC 1123 format. Example: Tue, 9 Jun 2020 09:56:48 +0200
- <value>/local_date: Date (without time), in ISO 8601 format. Example: 2020-06-09
- <value>/millis: Date and time, represented as number of milliseconds since 1970 epoch ("Java time") 1591689408701
- <value>/unix: Date and time, represented as number of seconds since 1970 epoch ("Unix time")1591689408
Dynamic values from HTTP request and response provides a way to get value of single header or query parameter. If the corresponding header or query parameter is present multiple times, the result string is created by concatenating values by comma (,).
- requestHeader/<headerName>: Request header.
- Header name part is always lowercase, regardless of the case actually used in request.
- Example: requestHeader/host resolves to value localhost:8080
- responseHeader/<headerName>: Response header.
- Header name part is always lowercase, regardless of the case actually used in response.
- Example: responseHeader/content-type resolves to application/json
- query/<paramName>: Query parameter from request URL.
- Unlike headers, query parameters are case-sensitive; the paramName part might contain both uppercase and lowercase characters.
- Example for some listing query: query/query resolves to date>"2020-06-01" and query/order resolves to date
Special dynamic values are available for requests with valid JWT tokens:
- jwt/claim/<claimName>: Value of one particular claim.
- Simple claim value are resolved directly to the value.
- For example: jwt/claim/iat resolves to 1591692041
- If the claim is an array, it's string representation uses standard Java toString.
- For example: jwt/claim/roles resolves to [auth] or [management_full, person_full]
- When used in queries, the value is actually treated as an array (and = operator is effectively IN).
- If the claim is an object, it's string representation uses standard Java toString.
- Since version 4.0, all properties in this object are available as their own variable, with a name having additional /<propertyName>. This is done recursively.
- For example: in Keycloak-generated token, realm roles are available as jwt/claim/realm_access/roles and resource-access roles are available as jwt/claim/resource_access/account/roles (both are arrays).
- Since version 4.0, all properties in this object are available as their own variable, with a name having additional /<propertyName>. This is done recursively.
- Simple claim value are resolved directly to the value.
Special dynamic values are available for requests with valid session (see session support):
- session/id: Session id.
- Available if and only if the request has a valid session.
- session/createdAt: Timestamp of the session begin.
- Optional; available only if the request has a valid session and the webhook result returned createdAt in its response.
- session/expireAt: Expiration timestamps.
- Optional; available only if the request has a valid session and the webhook result returned expireAt in its response.
Some REST API endpoints add their own special values:
- schema: Name of the entity schema.
- id: Entity id.
Note that these values are available only in very limited set of circumstances (pretty much only in case of single entity CRUD).
change