User tracking
What is user tracking in HCMS Client
Each HCMS Client instance continuously tracks concurrently logged-in users. Results are accessible in the Reporting module of the application.
User tracking must always be enabled as explained below. Disabling or removing it is a breach of the license contract. Tracking is needed to check whether the application usage conforms to the acquired license.
The present article explains what is tracked and how.
What is tracked
Personal data collection
Warning: the present section contains technical notes only and is not a legal statement. Compliance with the GDPR and other regulations must always be evaluated on a case-by-case basis.
No sensitive personal data in the usual sense is collected. However, personal data as defined by the GDPR is collected and processed. This should therefore be reflected in the terms of use of your application.
User activity, such as information about visited pages and performed actions in the web application, is recorded and kept for one day. If this is problematic, e.g., because it might be used to reverse-engineer sensitive personal data, please remove all attributes marked as optional in the overview of logged attributes.
User identification and anonymization
Users are identified by the asset id of their account as stored in the censhare system. Removing the account asset is sufficient to make all these records completely anonymous.
As an additional step, the userId attribute can be removed from the detailed XML documents stored in the module.statistics.hcms.daily. assets in the censhare system, without affecting the Reporting module in the HCMS Client itself.
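As an illustration of this additional step, the following minimal Python sketch removes the userId attribute from every session element of a storage document. It assumes the document has the shape of the example storage document in this article; the function name strip_user_ids is hypothetical, and reading the XML from and writing it back to the asset is not covered here.

```python
import xml.etree.ElementTree as ET


def strip_user_ids(xml_text: str) -> str:
    """Return the storage document with userId removed from all <session> elements."""
    root = ET.fromstring(xml_text)
    for session in root.iter("session"):
        # pop() with a default does nothing if the attribute is already absent
        session.attrib.pop("userId", None)
    return ET.tostring(root, encoding="unicode")


# Sample document in the shape used by this article (values are illustrative)
sample = (
    '<portal portalId="release-demo" max-concurrent="1"><sessions>'
    '<session userId="113226" portalId="release-demo" '
    'start="2020-08-04T13:32:56.032Z" end="2020-08-04T14:37:57.456Z" '
    'timeout="2020-08-04T14:52:57Z"/></sessions></portal>'
)
anonymized = strip_user_ids(sample)
```

All other attributes, including the session start and end, are left untouched, so the concurrency figures remain reproducible.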
User sessions
It is important to understand that the implemented mechanism stores user sessions, not just users. Technically, the same user can be logged in to the application more than once at the same time, e.g., from different browsers or devices. These instances are counted as different user sessions, on the assumption that they are different people sharing the same account.
From a technical perspective, a user session is identified by the following parameters in the logging configuration of the respective HCMS instance.
Configuration parameter | Definition |
---|---|
userId | user account id |
jwt/claim/cs:logat | login timestamp |
jwt/claim/iss | HCMS Client instance id |
For the full list of logged attributes please see below.
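The session identity described above can be sketched in Python as a three-part key. This is only an illustration: the dictionary layout of a record, the SessionKey type, and the sample timestamp values are assumptions made for the example, not part of the HCMS.

```python
from typing import NamedTuple


class SessionKey(NamedTuple):
    user_id: str   # userId: user account id
    login_at: int  # jwt/claim/cs:logat: login timestamp
    issuer: str    # jwt/claim/iss: HCMS Client instance id


def session_key(record: dict) -> SessionKey:
    """Build the key that distinguishes one user session from another."""
    return SessionKey(
        record["userId"],
        record["jwt/claim/cs:logat"],
        record["jwt/claim/iss"],
    )


# Hypothetical records: the same account logged in twice at different times
records = [
    {"userId": "113226", "jwt/claim/cs:logat": 1596547976, "jwt/claim/iss": "release-demo"},
    {"userId": "113226", "jwt/claim/cs:logat": 1596547976, "jwt/claim/iss": "release-demo"},
    {"userId": "113226", "jwt/claim/cs:logat": 1596551000, "jwt/claim/iss": "release-demo"},
]
sessions = {session_key(r) for r in records}
```

Here the three records collapse into two sessions, because two of them share the same login timestamp and the third comes from a separate login of the same account.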
User tracking mechanism
User tracking and data processing include the following stages.
Logging stage
User tracking is based on the standard request logging feature of the HCMS and is therefore performed by the HCMS, not by the Client.
Any user action in the web application involves a REST API request to the backend. At the logging stage, those HCMS REST API requests are recorded, and selected attributes are stored in the censhare standard database (PostgreSQL or Oracle). Only requests made by authorized users are logged.
It is possible to use these records to collect statistics about application usage. This is not provided in the current version and must be developed as part of a custom project.
Attributes logged
Optional attributes can be skipped if you have concerns about unnecessarily collecting sensitive personal data. They are deleted anyway at the summary stage of the data processing; disabling them from the very beginning won't break the user tracking.
Attribute | Definition | Required | Used for |
---|---|---|---|
userId | account ID of the logged user (asset ID of the person.account. asset) | yes | necessary to identify users |
jwt/claim/exp | expiration time of JWT token | yes | to compute effective logout |
jwt/claim/cs:logat | login time | yes | to distinguish separate sessions of the same user account |
timestamp/millis | log record timestamp | yes | to ensure that correct timestamps are used even if the satellite is disconnected |
jwt/claim/iss | JWT token issuer; this value actually contains the id of the HCMS Client instance (portalId) | yes | to separate users of different HCMS Client instances; without this value, they would be mixed together |
path | the path component of the request URL | optional | generic usage statistics |
host | the host component of the request URL | optional | useful if the HCMS Client is actually running on different domains |
schema | entity schema used by the HCMS request, if any | optional | - |
id | id of the entity used by the HCMS request, if any | optional | together with schema and path, it can be used to collect usage statistics of various parts of the application |
status | HTTP status of the response to this request | yes | only records with this value present are considered valid by the summary stage, but the value itself is discarded |
Note: Although the HCMS CSK provides out-of-the-box attributes for advanced usage statistics (path, schema, id), such functionality needs to be built as a custom implementation.
Summary stage
At this stage, the individual request records are aggregated into user sessions and then replaced by those summaries. For each session, only the first and the last requests are relevant. Based on them, a session record with the start and end time is created. The log records of individual requests are then no longer required and are deleted.
The whole processing is done by an automation, a configuration feature that is always available in the censhare Server. It is enabled as part of the server preparation and can be found in the censhare Admin Client, in the module "Statistics", named "HCMS Client - User Tracking Cleanup".
The automation is executed once per day, at 2:30 AM by default, and processes all records collected during the previous calendar day.
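The aggregation performed at the summary stage can be sketched in Python as follows. This is a simplified illustration and not the automation's actual implementation: the record layout (a dictionary keyed by the logged attribute names), the summarize function, and the sample values are assumptions. Records without a status are skipped, since only records with this value present are considered valid.

```python
from collections import defaultdict


def summarize(records):
    """Collapse request records into one summary per user session."""
    grouped = defaultdict(list)
    for r in records:
        if r.get("status") is None:
            continue  # records without a status are not considered valid
        key = (r["userId"], r["jwt/claim/cs:logat"], r["jwt/claim/iss"])
        grouped[key].append(r["timestamp/millis"])

    summaries = {}
    for key, stamps in grouped.items():
        # First and last request define the start and end of the session
        summaries[key] = {"start": min(stamps), "end": max(stamps), "requests": len(stamps)}
    return summaries


# Hypothetical log records for two sessions on one HCMS Client instance
log = [
    {"userId": "u1", "jwt/claim/cs:logat": 100, "jwt/claim/iss": "p", "timestamp/millis": 1000, "status": 200},
    {"userId": "u1", "jwt/claim/cs:logat": 100, "jwt/claim/iss": "p", "timestamp/millis": 5000, "status": 200},
    {"userId": "u1", "jwt/claim/cs:logat": 100, "jwt/claim/iss": "p", "timestamp/millis": 3000, "status": 200},
    {"userId": "u1", "jwt/claim/cs:logat": 100, "jwt/claim/iss": "p", "timestamp/millis": 9000},
    {"userId": "u2", "jwt/claim/cs:logat": 200, "jwt/claim/iss": "p", "timestamp/millis": 2000, "status": 200},
]
summaries = summarize(log)
```

Once such summaries exist, the per-request records carry no further information needed for user tracking, which is why they can be deleted.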
Final stage
At this stage, all sessions for the past day are counted and stored in assets. For each calendar day, a special asset of type module.statistics.hcms.daily. is created in the censhare system. This asset is mapped to the entity user_stats_daily in the HCMS. The asset contains data for all HCMS Client instances connected to the given censhare Server. For each instance (portal), the maximum number of concurrently active user sessions is stored as a feature.
The storage file contains an XML document with a detailed list of all sessions, including their beginning and end (login and logout), sorted by login time for convenience. It also contains precomputed data points for drawing a detailed graph of user sessions over the course of the day, with each data point being one login or logout. Data points are generated by an algorithm similar to a sliding window. As a result, the final number of sessions may differ from that in the raw data; this is normal.
Below is an example of a very simple storage document for such statistics, with only one user logged in during the day:
<portal portalId="release-demo" max-concurrent="1">
<sessions>
<session userId="113226" portalId="release-demo" start="2020-08-04T13:32:56.032Z" end="2020-08-04T14:37:57.456Z" timeout="2020-08-04T14:52:57Z"/>
</sessions>
<graph>
<start count="1" at="2020-08-04T13:32:56.032Z"/>
<end count="0" at="2020-08-04T14:37:57.456Z"/>
</graph>
</portal>
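To show how the session list relates to the graph data points and the maximum concurrency, here is a minimal Python sketch that recomputes both from the sessions in such a document. It uses a plain sweep over login and logout events, which is a stand-in chosen for the illustration and not the sliding-window-like algorithm of the actual transformation. Processing logouts before logins at identical timestamps is likewise an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def concurrency_points(xml_text: str):
    """Return (data points, peak) computed from the <session> elements."""
    root = ET.fromstring(xml_text)
    events = []
    for s in root.iter("session"):
        events.append((parse_ts(s.get("start")), +1))  # login
        events.append((parse_ts(s.get("end")), -1))    # logout
    # Sort by time; at equal times, logouts (-1) come before logins (+1)
    events.sort(key=lambda e: (e[0], e[1]))

    points, count, peak = [], 0, 0
    for at, delta in events:
        count += delta
        peak = max(peak, count)
        points.append((at, count))
    return points, peak


document = """<portal portalId="release-demo" max-concurrent="1">
<sessions>
<session userId="113226" portalId="release-demo" start="2020-08-04T13:32:56.032Z" end="2020-08-04T14:37:57.456Z" timeout="2020-08-04T14:52:57Z"/>
</sessions>
</portal>"""
points, peak = concurrency_points(document)
```

For the single-session example, the sketch yields two data points with counts 1 and 0 and a peak of 1, matching the max-concurrent attribute and the graph element of the example document.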
The final processing and storing of data is performed by an XSLT transformation. Its setup and deployment are done as part of the server preparation; it can be found in the module "Transformation", named "Count concurrent active users in HCMS Client".
Enable user tracking for HCMS CSK instance
Usually, there is no need to activate user tracking manually. It is set up automatically in one of the server preparation steps (registering the HCMS instance) by the following part of the general configuration create command:
configuration create <id> .... --user-tracking true ...
Warning: Please do not change the default configuration! HCMS request logging allows you to create custom configurations, but for user tracking to work correctly you have to use the out-of-the-box configuration.
You can check the active configuration at any time by running the configuration inspect command. It lists all included attributes. If you did not activate user tracking as part of the installation, you can do so afterwards using this command:
configuration update <portalid> --user-tracking true
Below you will find more details about the user tracking configuration.
How user tracking configuration is stored in HCMS
One part of the configuration is stored in the HeadlessCMS.xml:
<request-logging>
<statistics service="hcms" group="hcms-client.user-tracking" response-header="true">
<condition value-of="userId"/>
</statistics>
</request-logging>
and the other in the Statistics.xml configuration:
<groups>
<group id="hcms-client.user-tracking">
<attribute name="userId" type="asset" column="asset_id"/>
<attribute name="jwt/claim/exp" type="long" column="value_long_0"/>
<attribute name="jwt/claim/cs:logat" type="long" column="value_long_1"/>
<attribute name="timestamp/millis" type="long" column="value_long_3"/>
<attribute name="jwt/claim/iss" type="string" column="value_string_0"/>
<attribute name="path" type="string" column="value_string_1"/>
<attribute name="host" type="string" column="value_string_2"/>
<attribute name="schema" type="string" column="value_string_4"/>
<attribute name="status" type="long" column="value_long_2"/>
<attribute name="id" type="long" column="value_long_4"/>
</group>
</groups>
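When optional attributes are removed from this group, it can be useful to verify that the required ones are still present. The following Python sketch is one possible check, under stated assumptions: it works on the XML text of the groups fragment shown above, the function name missing_required is hypothetical, and the set of required attributes is taken from the attribute table in this article. How the XML is obtained from the server is not covered.

```python
import xml.etree.ElementTree as ET

# Attributes marked as required in the attribute table of this article
REQUIRED = {
    "userId",
    "jwt/claim/exp",
    "jwt/claim/cs:logat",
    "timestamp/millis",
    "jwt/claim/iss",
    "status",
}


def missing_required(statistics_xml: str, group_id: str = "hcms-client.user-tracking"):
    """Return the sorted names of required attributes absent from the group."""
    root = ET.fromstring(statistics_xml)
    for group in root.iter("group"):
        if group.get("id") == group_id:
            present = {a.get("name") for a in group.iter("attribute")}
            return sorted(REQUIRED - present)
    raise LookupError(f"group {group_id!r} not found")


# A reduced configuration: optional attributes removed, status forgotten
reduced = """<groups>
  <group id="hcms-client.user-tracking">
    <attribute name="userId" type="asset" column="asset_id"/>
    <attribute name="jwt/claim/exp" type="long" column="value_long_0"/>
    <attribute name="jwt/claim/cs:logat" type="long" column="value_long_1"/>
    <attribute name="timestamp/millis" type="long" column="value_long_3"/>
    <attribute name="jwt/claim/iss" type="string" column="value_string_0"/>
  </group>
</groups>"""
missing = missing_required(reduced)
```

In this example the check reports status as missing, which would leave the summary stage without any valid records.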
Possible tracking failures
User tracking may get delayed or fail in the following cases. Please read about possible consequences.
Request logging stage
If the satellite is disconnected, the logged records are collected in its memory and sent to the Server when the connection is available again.
Summary stage
If the automation is not executed for some reason, the user reporting will still work. Normally, the automation reduces the raw log to a smaller number of summary records containing only the information about user sessions and the number of requests made in each session. If it does not run, an unnecessarily large number of records remains in the database, which has two consequences:
- Technical: Performance can degrade, considerably so if this automation never runs at all.
- Legal: These records can be used to reconstruct the complete activity of users over longer periods of time, which might not be covered by the terms of use of the application.