Introduction
Vessel Insight offers self-service configuration of data quality validation so that consumers of data are made aware of any data quality issues. The rules must be configured by authorized individuals in the shipowner's organization.
By default, no rules are configured to validate any sensor data.
Currently, we offer three validation rules, which are evaluated in real time as data is ingested into the Vessel Insight cloud platform:
- rangeLimit reports data range errors; configured with a min/max range tolerance
- timeFreeze reports frozen-timestamp errors; configured with the number of repeated timestamps to tolerate
- dataFreeze reports data freeze errors; configured with a time period tolerance for repeated values
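To make the three rule types more concrete, the following Python sketch shows one way such checks could be expressed locally. It is purely illustrative and is not the platform's evaluation logic: the function names, the parameter names, and the exact counting behavior are assumptions based only on the rule descriptions above.

```python
from datetime import datetime, timedelta


def range_limit_errors(values, minimum, maximum):
    """Count values outside the inclusive [minimum, maximum] range (assumed semantics)."""
    return sum(1 for v in values if v < minimum or v > maximum)


def time_freeze_errors(timestamps, tolerated_repeats):
    """Count samples that repeat the previous timestamp more often than tolerated."""
    errors, repeats = 0, 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        repeats = repeats + 1 if cur == prev else 0
        if repeats > tolerated_repeats:
            errors += 1
    return errors


def data_freeze_errors(samples, tolerance):
    """Count samples whose value has stayed unchanged for longer than `tolerance`.

    `samples` is a list of (timestamp, value) pairs ordered by time.
    """
    errors = 0
    run_start, run_value = None, None
    for ts, value in samples:
        if run_start is None or value != run_value:
            # A new value starts a new run.
            run_start, run_value = ts, value
        elif ts - run_start > tolerance:
            errors += 1
    return errors


t0 = datetime(2024, 12, 2, 9, 0)
frozen = [
    (t0, 5.0),
    (t0 + timedelta(minutes=5), 5.0),
    (t0 + timedelta(minutes=15), 5.0),
    (t0 + timedelta(minutes=20), 6.0),
]
range_count = range_limit_errors([10, 120, -5, 50], minimum=0, maximum=100)
time_count = time_freeze_errors([1, 1, 1, 2], tolerated_repeats=1)
data_count = data_freeze_errors(frozen, tolerance=timedelta(minutes=10))
```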
Once the rules are configured for a vessel, the API can return data quality information with each request for aggregated data from the vesselsensoreventsbypath endpoint, as shown in the following example.
The curl command for the endpoint:
curl --location 'https://api.kognif.ai/galore/v2/timeseries/vesselsensoreventsbypath' \
--header 'Ocp-Apim-Subscription-Key: [YOUR-OCP-APIM-SUBSCRIPTION-KEY]' \
--header 'Authorization: ••••••' \
--data '<body>'
The body:
{
  "from": "2024-12-02T09:00:00.000Z",
  "to": "2024-12-02T11:00:00.000Z",
  "interval": "1h",
  "resamplingMethods": [
    "avg"
  ],
  "paths": [
    "/Fleet/imoXXXXXXX/Aux_Engines/3/Generator_Load"
  ],
  "includeDataQuality": "true"
}
The "includeDataQuality" setting is "false" by default. The same applies when the parameter is omitted.
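For consumers calling the endpoint from code rather than curl, the request can be assembled along these lines. This is a minimal Python sketch using only the standard library. The helper name build_request and its parameters are hypothetical, the Content-Type header is an assumption that does not appear in the curl example, and the credentials are placeholders to be supplied by the caller.

```python
import json
import urllib.request

API_URL = "https://api.kognif.ai/galore/v2/timeseries/vesselsensoreventsbypath"


def build_request(subscription_key, authorization, path, start, end, interval="1h"):
    """Build (but do not send) a request that asks for data quality information."""
    body = {
        "from": start,
        "to": end,
        "interval": interval,
        "resamplingMethods": ["avg"],
        "paths": [path],
        # Must be set explicitly; the default is "false".
        "includeDataQuality": "true",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Authorization": authorization,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    subscription_key="[YOUR-OCP-APIM-SUBSCRIPTION-KEY]",
    authorization="[YOUR-AUTHORIZATION-VALUE]",
    path="/Fleet/imoXXXXXXX/Aux_Engines/3/Generator_Load",
    start="2024-12-02T09:00:00.000Z",
    end="2024-12-02T11:00:00.000Z",
)
```

The request could then be sent with urllib.request.urlopen(request) and the response body parsed with json.load.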
Response:
{
  "intervalInSeconds": 3600,
  "resamplingMethods": [
    "avg"
  ],
  "time": "2024-12-02T09:00:00Z",
  "interval": "1h",
  "columns": [
    {
      "path": "/Fleet/imo9307633/Aux_Engines/3/Generator_Load",
      "unit": "%",
      "samples": {
        "avg": [
          0,
          23.480952380952377,
          0
        ]
      },
      "dataValidationErrors": {
        "rangeLimit": [
          0,
          2,
          0
        ],
        "timeFreeze": [
          0,
          0,
          0
        ],
        "dataFreeze": [
          0,
          0,
          0
        ]
      }
    }
  ],
  "errors": []
}
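The dataValidationErrors section reports the data quality errors observed in each interval of the queried time period. In the example above, each rule has one array entry per interval, alongside the corresponding entries in the samples arrays.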
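As an illustration of reading this section, the Python sketch below lists every interval that reports at least one error. The function name intervals_with_errors is hypothetical, the sample data is a trimmed copy of the response above, and the sketch assumes that array position i refers to the i-th interval of the queried period.

```python
def intervals_with_errors(response):
    """Return (path, rule, interval_index, error_count) for every non-zero entry."""
    findings = []
    for column in response.get("columns", []):
        validation = column.get("dataValidationErrors", {})
        for rule, counts in validation.items():
            for index, count in enumerate(counts):
                if count:
                    findings.append((column["path"], rule, index, count))
    return findings


# Trimmed copy of the example response.
example_response = {
    "intervalInSeconds": 3600,
    "columns": [
        {
            "path": "/Fleet/imo9307633/Aux_Engines/3/Generator_Load",
            "samples": {"avg": [0, 23.480952380952377, 0]},
            "dataValidationErrors": {
                "rangeLimit": [0, 2, 0],
                "timeFreeze": [0, 0, 0],
                "dataFreeze": [0, 0, 0],
            },
        }
    ],
    "errors": [],
}

findings = intervals_with_errors(example_response)
```

For the example response, the only finding is the rangeLimit entry at index 1, which carries the value 2.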
Filter bad data
Currently, no cleansing or filtering is performed on the server side, so consumers must take the necessary steps themselves to keep bad data out of downstream use. The recommendation is to use a reasonably short time interval, e.g. 10 minutes, and to exclude any intervals with data quality errors from further processing.
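The recommended client-side exclusion could look roughly like the following Python sketch. The function name clean_samples is hypothetical, and the sketch assumes that the entries of the samples arrays and the dataValidationErrors arrays line up by position.

```python
def clean_samples(column, method="avg"):
    """Keep only (interval_index, value) pairs whose interval has no errors in any rule."""
    values = column["samples"][method]
    errors = column.get("dataValidationErrors", {})
    kept = []
    for index, value in enumerate(values):
        # An interval is clean when every rule reports zero errors for it.
        if all(counts[index] == 0 for counts in errors.values()):
            kept.append((index, value))
    return kept


# Column taken from the example response.
example_column = {
    "path": "/Fleet/imo9307633/Aux_Engines/3/Generator_Load",
    "samples": {"avg": [0, 23.480952380952377, 0]},
    "dataValidationErrors": {
        "rangeLimit": [0, 2, 0],
        "timeFreeze": [0, 0, 0],
        "dataFreeze": [0, 0, 0],
    },
}

kept = clean_samples(example_column)
```

Note that when dataValidationErrors is absent (for instance because includeDataQuality was not set to "true"), this sketch keeps every sample, so callers may prefer to treat a missing section as an error instead.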
Summary data
A summary endpoint is available for retrieving vessel-level data quality data:
Rules metadata
Rules metadata will be available in a future release.