Data analysis

These schemas specify how the data analysis component functions by defining its configuration, fields, definitions, queries, and source tracking information.

Data analysis BigBoat status fields

https://gros.liacs.nl/schema/data-analysis/bigboat_status.json

bigboat_status

type

object

properties

  • fields

type

object

patternProperties

  • ^.+$

type

object

properties

  • titles

Localization titles for a subgraph of a BigBoat performance status field.

BigBoat status field locales

  • descriptions

Localization texts for a subgraph of a BigBoat performance status field.

BigBoat status field locales

  • unit

Units used by the BigBoat performance status field values.

type

string

enum

bytes, seconds

  • match

Regular expressions that match portions of BigBoat status field names and their normalized field names.

type

object

patternProperties

  • ^.+$

Normalized field name for the fields that match the regular expression.

type

string

pattern

^.+$

BigBoat status field locales

type

object

patternProperties

  • ^[a-zA-Z]{2,3}(-.*)?$

Localization item for a specific language. Valid languages use two-letter ISO 639-1 language codes plus optional BCP 47 subtags, so only a subset of languages is recognized.

type

string
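The following is a minimal sketch of how the match expressions in a bigboat_status document could be used to map a raw BigBoat status field name to a normalized name. It also shows how locale keys can be checked against the schema's language pattern. The group name "disk", the regular expressions, and the helper names are hypothetical, and using re.search (a match anywhere in the name) is an assumption based on the wording "match portions of BigBoat status field names".

```python
import re

# Hypothetical excerpt of a bigboat_status document; the group name, regexes
# and normalized names are illustrative only, not taken from the real file.
BIGBOAT_STATUS = {
    "fields": {
        "disk": {
            "titles": {"en": "Disk usage", "nl": "Schijfgebruik"},
            "unit": "bytes",
            "match": {
                r"^/dev/sd[a-z]+\d*$": "system_disk",
                r"^docker": "docker_disk",
            },
        },
    },
}

# Same pattern as the locale patternProperties in the schema.
LOCALE_KEY = re.compile(r"^[a-zA-Z]{2,3}(-.*)?$")


def normalize_status_field(name, status=BIGBOAT_STATUS):
    """Return (group, normalized name) for the first matching regex, else None."""
    for group, spec in status["fields"].items():
        for pattern, normalized in spec.get("match", {}).items():
            # re.search, since the expressions match portions of field names.
            if re.search(pattern, name):
                return group, normalized
    return None


def valid_locale_keys(locales):
    """Check that every key of a locale object matches the language pattern."""
    return all(LOCALE_KEY.match(key) for key in locales)
```

For example, normalize_status_field("/dev/sda1") yields ("disk", "system_disk"), while a name that matches no expression yields None.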

Data analysis configuration

https://gros.liacs.nl/schema/data-analysis/config.json

data-analysis-config

anyOf

config-org

type

object

patternProperties

  • ^[_a-z]+$

config-org

config-org

type

object

properties

  • db

type

object

properties

  • host

type

string

format

hostname

  • dbname

type

string

  • user

type

string

  • password

type

string

  • primary_source

type

string

enum

jira, jira_version, jira_component_version, tfs

  • fields

type

object

properties

  • jira_url

type

string

format

uri-reference

  • tfs_url

type

string

format

uri-reference

  • vcs_url

type

string

format

uri-reference

  • jenkins_url

type

string

format

uri-reference

  • quality_time_url

type

string

format

uri-reference

  • metric_history_file

type

string

pattern

^[a-zA-Z0-9_.-]+$

  • metric_options_file

type

string

pattern

^[a-zA-Z0-9_.-]+$

  • prediction_combine

oneOf

prediction_combine

type

boolean

enum

false

  • prediction_data

type

string

format

uri

  • prediction_url

type

string

format

uri-reference

  • sprint_patch

type

string

  • weather

properties

  • url

type

string

format

uri

  • api_key

type

string

  • lat

type

number

minimum

-90.0

  • lon

type

number

minimum

-180.0

  • origin

type

string

format

date

  • exclude_domain

type

array

items

type

string

  • teams

type

array

items

type

object

properties

  • name

type

string

  • display_name

type

string

  • board

board_id

  • recent

type

boolean

  • invisible

type

boolean

  • overlap

type

integer

  • projects

type

array

items

oneOf

project_key

project_board

  • components

components

  • arguments

type

array

items

type

string

board_id

type

integer

minimum

1

project_key

Project key from JIRA or team name from TFS.

type

string

project_board

type

object

properties

  • key

project_key

  • board

oneOf

board_id

type

array

items

board_id

  • replace

type

boolean

  • own_board

type

boolean

  • start_date

type

string

format

date

  • end_date

type

string

format

date

  • exclude

type

string

  • include

type

string
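Each entry of a team's projects array is either a bare project_key or a project_board object whose board is a single board_id or an array of them. The minimal sketch below flattens both shapes into (key, board ids) pairs. The function name is hypothetical, and it assumes that every project_board object carries its key.

```python
def normalize_projects(projects):
    """Flatten a team's projects list into (project_key, [board ids]) pairs.

    Hypothetical helper: entries are either a project_key string or a
    project_board object whose optional board is one board_id or a list.
    """
    result = []
    for entry in projects:
        if isinstance(entry, str):
            # Bare project key: no explicit board is configured.
            result.append((entry, []))
            continue
        board = entry.get("board", [])
        boards = [board] if isinstance(board, int) else list(board)
        for board_id in boards:
            # board_id is an integer with minimum 1 (booleans are rejected).
            if isinstance(board_id, bool) or not isinstance(board_id, int) or board_id < 1:
                raise ValueError(f"invalid board_id: {board_id!r}")
        # Assumption: a project_board object always has its key.
        result.append((entry["key"], boards))
    return result
```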

prediction_combine

type

string

enum

mean, median, mode, sum, min, max

components

type

array

items

type

object

properties

  • name

type

string

  • display_name

type

string

  • project

project_key

patternProperties

  • ^(project|metric_history|metric_options|quality_time|quality|sonar|jenkins|jira|vcs|git|gitlab|github|tfs|subversion|prediction)$

component_filter

component_filter

type

object

properties

  • include

oneOf

type

string

type

array

items

type

string

  • exclude

oneOf

type

string

type

array

items

type

string
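Both include and exclude in component_filter accept either a single string or an array of strings. The sketch below normalizes that oneOf into a list and applies it to a list of names. Comparing entries as exact names is an assumption made for illustration, and both function names are hypothetical. The actual component may interpret the strings differently.

```python
def as_list(value):
    """Normalize a component_filter value (string or array of strings) to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def filter_components(names, component_filter):
    """Hypothetical filter: keep included names (all if no include) minus excluded ones."""
    # Assumption: include/exclude entries are compared as exact names.
    include = as_list(component_filter.get("include"))
    exclude = as_list(component_filter.get("exclude"))
    return [
        name for name in names
        if (not include or name in include) and name not in exclude
    ]
```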

Data analysis definitions for queries

https://gros.liacs.nl/schema/data-analysis/definitions.json

definitions

type

object

properties

  • fields

type

object

patternProperties

  • ^[a-z_]+$

type

object

properties

  • description

Description of the definition.

type

string

  • field

field

patternProperties

  • ^(jira|jira_version|jira_component_version|tfs)$

Definition for a specific primary source.

type

object

properties

  • field

field

anyOf

references

anyOf

references

  • conditions

Query filter conditions that may be used in WHERE clauses.

type

object

patternProperties

  • ^[a-z_]+$

type

object

properties

  • description

Description of the condition.

type

string

  • condition

condition

patternProperties

  • ^(jira|jira_version|jira_component_version|tfs)$

Condition for a specific primary source.

type

object

properties

  • condition

condition

anyOf

references

anyOf

references

field

SQL template of the definition, which may contain ${…} for nested expansions.

type

string

condition

SQL template of the condition, which may contain ${…} for nested expansions.

type

string
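Field and condition templates may contain ${…} placeholders that expand into other definitions, and each definition may be overridden for a specific primary source. The sketch below picks the source-specific template if one exists and expands placeholders recursively. It assumes placeholders name either a caller-supplied variable or another definition. The flat DEFINITIONS mapping, its SQL, and the helper names are hypothetical simplifications of the definitions.json structure.

```python
import re

# Hypothetical, flattened definitions: name -> SQL template.
DEFINITIONS = {
    "story_points": "${issue}.story_points",
    "done_condition": "${issue}.status = 'Closed'",
    "done_points": "SUM(CASE WHEN ${done_condition} THEN ${story_points} ELSE 0 END)",
}

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def pick_template(entry, primary_source):
    """Prefer a primary-source-specific field template over the generic one."""
    override = entry.get(primary_source)
    if isinstance(override, dict) and "field" in override:
        return override["field"]
    return entry["field"]


def expand(template, definitions, variables, depth=0):
    """Expand ${name} placeholders from variables, recursing into definitions."""
    if depth > 10:
        raise RecursionError("definition expansion nested too deeply")

    def replace(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        if name in definitions:
            return expand(definitions[name], definitions, variables, depth + 1)
        raise KeyError(f"unknown placeholder: {name}")

    return PLACEHOLDER.sub(replace, template)
```

With the variable issue bound to i, expanding ${done_points} produces a single SQL expression that has the nested condition and column substituted in.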

references

type

object

properties

  • table

Table(s) involved in the query, which would need to be in the FROM clause or JOIN clauses for a successful query.

oneOf

table_name

type

array

items

table_name

  • column

Fields involved in the query, which may be used in, e.g., SELECT or GROUP BY clauses to ensure a successful query.

type

array

items

type

string

pattern

^[a-z_]+$

table_name

type

string

pattern

^[a-z_]+$

Data analysis performance

https://gros.liacs.nl/schema/data-analysis/performance.json

performance

type

object

patternProperties

  • ^[a-z_]+$

type

object

properties

  • old

performance_query

  • new

performance_query

performance_query

Performance metrics for a query.

oneOf

type

array

items

performance_result

performance_result

performance_result

type

object

properties

  • query

Compiled SQL query used during the performance test.

type

string

  • columns

Number of columns in the query result.

type

integer

minimum

0

  • rows

Number of rows in the query result.

type

integer

minimum

0

  • optimize_mean

Average number of microseconds spent in the optimizer pipeline of the database.

type

number

minimum

0.0

  • optimize_std

Standard deviation of microseconds spent in the optimizer pipeline of the database across runs.

type

number

minimum

0.0

  • wall_mean

Average number of seconds between the start and the end of the query, based on wall clock time.

type

number

minimum

0.0

  • wall_std

Standard deviation across runs of the number of seconds between the start and the end of the query, based on wall clock time.

type

number

minimum

0.0

  • run_mean

Average number of microseconds spent on the query before the result could be exported.

type

number

minimum

0.0

  • run_std

Standard deviation across runs of the number of microseconds spent on the query before the result could be exported.

type

number

minimum

0.0

  • ship_mean

Average number of microseconds spent on exporting the result.

type

number

minimum

0.0

  • ship_std

Standard deviation across runs of the number of microseconds spent on exporting the result.

type

number

minimum

0.0

  • load_mean

Average CPU load percentage during query execution.

type

number

maximum

100.0

minimum

0.0

  • load_std

Standard deviation of the CPU load percentage during query execution across runs.

type

number

maximum

100.0

minimum

0.0

  • io_mean

Average percentage of time waiting for I/O.

type

number

maximum

100.0

minimum

0.0

  • io_std

Standard deviation of percentage of time waiting for I/O across runs.

type

number

maximum

100.0

minimum

0.0
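The sketch below shows how per-run wall clock measurements could be reduced to the wall_mean and wall_std fields of a performance_result. It also checks the schema's bounds: every numeric field is non-negative, and the load and I/O percentages do not exceed 100. Using the sample standard deviation (which needs at least two runs) is an assumption, and both helper names are hypothetical. A performance_query may also be an array of such results.

```python
import statistics

PERCENTAGE_FIELDS = {"load_mean", "load_std", "io_mean", "io_std"}


def summarize_runs(query, wall_seconds, columns, rows):
    """Build part of a performance_result from per-run wall clock times."""
    return {
        "query": query,
        "columns": columns,
        "rows": rows,
        "wall_mean": statistics.mean(wall_seconds),
        # Assumption: sample standard deviation over the runs.
        "wall_std": statistics.stdev(wall_seconds),
    }


def check_result(result):
    """Enforce the schema bounds: non-negative numbers, percentages <= 100."""
    for key, value in result.items():
        if key == "query":
            continue
        if value < 0:
            raise ValueError(f"{key} must not be negative")
        if key in PERCENTAGE_FIELDS and value > 100:
            raise ValueError(f"{key} is a percentage and must not exceed 100")
    return True
```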

Data analysis query index

https://gros.liacs.nl/schema/data-analysis/queries.json

queries

type

object

properties

  • path

Directory in which the query files are stored.

type

string

  • files

Queries known to this index.

type

array

items

oneOf

Analysis report query

Sprint event query

Feature query

  • categories

Category names that could be used by the queries in order to group them together, with localization for the category.

type

object

patternProperties

  • ^(?!other$)[a-z]+$

type

object

properties

  • icon

Portions of a FontAwesome icon class that indicates the category.

type

array

items

type

string

minItems

2

patternProperties

  • ^[a-zA-Z]{2,3}(-.*)?$

Localization item for the category name in a specific language. Valid languages use two-letter ISO 639-1 language codes plus optional BCP 47 subtags, so only a subset of languages is recognized.

type

string
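A category entry combines FontAwesome icon class portions with localized names keyed by language tag. The sketch below joins the icon portions into a class string and looks up a name. It tries the exact tag first, then the primary language subtag, then a default language. Joining with spaces and this fallback order are assumptions, and the example entry and helper names are hypothetical.

```python
# Hypothetical category entry; the icon portions and names are illustrative.
CATEGORY = {
    "icon": ["fas", "fa-code-branch"],
    "en": "Version control",
    "nl": "Versiebeheer",
}


def icon_class(entry):
    # Assumption: the portions are joined with spaces into one class attribute.
    return " ".join(entry["icon"])


def localize(entry, language, fallback="en"):
    """Try the exact tag, then the primary subtag, then the fallback language."""
    for key in (language, language.split("-", 1)[0], fallback):
        if key in entry and key != "icon":
            return entry[key]
    return None
```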

Analysis report query

Query for an analysis report.

type

object

properties

  • filename

filename

  • table

Name of the report.

type

string

  • fields

patterns_fields

Sprint event query

Query for a sprint event.

type

object

properties

  • filename

filename

  • type

Name of the event.

type

string

  • display

Whether to show the event in a timeline chart by default.

type

boolean

  • split

Whether to show the event in a separate subchart of a timeline chart.

type

boolean

  • descriptions

Descriptions of the event.

Feature locales

Feature query

Query for one or more features.

type

object

properties

  • table

Name that could be used as a table name if the query were to be placed in a subquery, and in general as a normalized identifier for the query.

type

string

pattern

^[a-z_]+$

  • column

Name(s) of the feature(s) that the query provides.

oneOf

column_name

type

array

items

column_name

minItems

1

  • carry

Whether to use the value(s) of an earlier sample for the feature(s) when the current sample has no result in the query.

type

boolean

  • default

Default value for the feature(s) of samples that had no result in the query.

type

number

  • patterns

patterns_conditions

  • summarize

Indication of how to summarize values of the feature when the query result provides multiple rows per sample.

oneOf

summarize

type

array

items

summarize

minItems

1

summarize_params

  • combine

Operations to use to combine values of the feature(s), either when multiple projects are combined for a team or when concurrent sprints are combined into one.

oneOf

Feature combining

Operations to use to combine values of the features when combining multiple projects for a team. If summarize is an array, then the number of combine operations must match its length, so that there is a combine operation for each summarizing operation.

type

array

items

Feature combining

minItems

1

type

object

properties

  • project

Operation to use to combine values of the feature when combining multiple projects for a team.

Feature combining

  • sprint

Operation to use to combine values of the feature when combining multiple concurrent sprints.

Feature combining

  • prediction

Different methods of predicting the feature.

type

array

items

type

object

properties

  • url

URL template to retrieve prediction data values for this feature. The template may contain ${…} for variable expansions.

type

string

format

uri-reference

  • reference

Feature that can be used as a linear regression over sprints to predict the overall change of the feature.

column_name

  • monte_carlo

Monte Carlo parameters

minItems

1

  • values

type

object

properties

  • type

Format type of the feature.

type

string

enum

fraction, duration, icon

  • denominator

Maximum denominator to use when formatting the value of the feature as a common fraction when it has the fraction type.

type

integer

minimum

1
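For the fraction type, the denominator property bounds the denominator of the displayed common fraction. Python's fractions.Fraction.limit_denominator finds the closest fraction with a denominator of at most the given value, so it can illustrate the idea. This is a sketch only. The function name is hypothetical, and the report may render fractions differently (for example as mixed numbers).

```python
from fractions import Fraction


def format_fraction(value, denominator):
    """Render value as the closest fraction with denominator <= `denominator`."""
    fraction = Fraction(value).limit_denominator(denominator)
    if fraction.denominator == 1:
        return str(fraction.numerator)
    return f"{fraction.numerator}/{fraction.denominator}"
```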

  • intervals

type

array

items

Interval specification

  • icons

Icons to use in place of the value of the feature when it has the icon type.

type

object

patternProperties

  • ^[0-9]+$

Icon value

  • descriptions

Descriptions for the feature(s).

oneOf

Feature locales

Multi-feature locales

  • long_descriptions

Longer descriptions for the feature(s).

oneOf

Feature locales

Multi-feature locales

  • units

Sprintf-compatible format strings that indicate the feature(s) along with longer descriptions of unit(s).

oneOf

Feature locales

Multi-feature locales

  • short_units

Sprintf-compatible format strings that indicate the feature(s) along with unit(s).

oneOf

Feature locales

Multi-feature locales

  • tags

Shorter descriptions for the feature(s) that indicate their presence in or statefulness of the sample.

oneOf

Feature locales

Multi-feature locales

  • predictor

Shorter descriptions for the feature(s) when used as factors in another feature’s prediction, in order to distinguish between prediction strategies.

oneOf

Feature locales

Multi-feature locales

  • measurement

Metadata of the feature(s) regarding the type of units, relations to other features and moments when the query result is (available to be) collected.

oneOf

Feature unit measurment metadata

type

array

items

Feature unit measurment metadata

minItems

1

  • preferred

Whether the feature(s) should be prominently displayed in reports. If this is false, then the feature(s) may be hidden behind expandable sections.

type

boolean

  • source

URL template(s) to human-readable websites that should roughly display the same information as the query result. The template may contain ${…} for variable expansions.

oneOf

source_url

source_type_urls

  • cache

Whether the result of the query should be cached in a database table for reuse.

type

boolean

  • category

Category to group the feature(s) in.

type

string

pattern

^[a-z]+$

  • normalize

Feature to use by default as a divisor of the feature, in order to display a normalized value in a report.

column_name

  • groups

type

array

items

Categories to group the feature in when displaying it in a card-based report. The normalize group makes the feature available as a divisor for other features.

type

string

enum

project, metric_history, metric_options, quality_time, quality, sonar, jenkins, jira, vcs, git, gitlab, github, tfs, subversion, prediction, normalize

minItems

1

oneOf

Definition feature

Expression feature

Query file feature

Metric feature

Definition feature

properties

  • definition

Name of the query definition to use to calculate the feature.

type

string

pattern

^[a-z_]+$

Expression feature

properties

  • expression

R code to generate the feature, evaluated in an environment where the names of at least some requested non-expression features are available.

type

string

  • precompute

Whether to generate the feature from the expression during the selection of non-expression features, rather than together with other expression features, so that it is available in the environment of other expression features and in summarizing operations.

type

boolean

  • window

Parameters for calculating the expression based on samples occurring before the generated sample.

type

object

Query file feature

properties

  • filename

filename

Metric feature

properties

  • metric

Name of the metric(s) to use to calculate the feature.

oneOf

type

string

type

array

items

type

string

minItems

1

  • aggregate

When the metric is measured more than once in the time span that each sample represents, perform an aggregation query to calculate a proper value for each sample:

  • end: Select the last measured value.

  • max: Select the highest value.

  • min: Select the lowest value.

  • avg: Calculate the average value.

type

string

enum

end, max, min, avg
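The aggregate options are applied by an aggregation query. The Python sketch below only illustrates their semantics on (timestamp, value) measurements within one sample's time span. Treating the span as half-open [start, end) is an assumption, and the function name is hypothetical.

```python
from statistics import mean

AGGREGATES = {
    "end": lambda values: values[-1],  # last measured value
    "max": max,
    "min": min,
    "avg": mean,
}


def aggregate_metric(measurements, start, end, operation="end"):
    """Aggregate (timestamp, value) pairs within [start, end) into one value."""
    in_span = sorted((t, v) for t, v in measurements if start <= t < end)
    if not in_span:
        return None
    return AGGREGATES[operation]([v for _, v in in_span])
```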

  • backends

Source types that should not be included when providing a source for the feature.

type

array

filename

Name of the file stored in the path of the query index where the query is stored.

type

string

pattern

^[^/]+$

Feature locales

type

object

patternProperties

  • ^[a-zA-Z]{2,3}(-.*)?$

Localization item for a specific language. Valid languages use two-letter ISO 639-1 language codes plus optional BCP 47 subtags, so only a subset of languages is recognized.

type

string

Multi-feature locales

type

object

patternProperties

  • ^[a-zA-Z]{2,3}(-.*)?$

Localization items for the features in a specific language. Valid languages use two-letter ISO 639-1 language codes plus optional BCP 47 subtags, so only a subset of languages is recognized.

type

array

items

type

string

minItems

1

patterns_fields

type

object

patternProperties

  • ^[a-z_]+$

type

object

patternProperties

  • ^[a-z_]+$

Column name(s) to use in the field in the query for the given table or primary source.

oneOf

column_name

type

array

items

type

string

pattern

^[a-z_]+$

patterns_conditions

type

object

patternProperties

  • ^[a-z_]+$

oneOf

SQL template to use in the field in the query.

type

string

type

object

patternProperties

  • ^[a-z_]+$

SQL template to use in the field in the query for the given primary source. The template may contain ${…} for nested expansions.

type

string

summarize

Summarization operation. Can be one of the combine operations or one of:

  • count: The value becomes the number of (non-empty) rows.

  • count_unique: The value becomes the number of unique (plus empty) rows.

  • end: The last value is used.

  • sum_of_na_avg: The sum of the values (or those of reference) is normalized by the ratio between the number of empty values and the number of non-empty values (or those of reference).

  • sum_of_na_diff: The sum of the difference between the new values (or those of reference) and the old values is taken, where the field needs keys that start with old_ and new_ along with the actual value.

oneOf

Feature combining

type

string

enum

count, count_unique, end, sum_of_na_avg, sum_of_na_diff
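The sketch below applies three of the summarize operations (count, count_unique, and end) to the values of one sample's rows, with None standing for an empty value. It leaves out sum_of_na_avg and sum_of_na_diff. Counting empty rows as one extra distinct value for count_unique is an interpretation of "unique (plus empty) rows", and the function name is hypothetical.

```python
def summarize(values, operation, with_missing=False):
    """Apply count, count_unique or end to one sample's row values."""
    if operation == "count":
        # Number of non-empty rows, unless missing values are included.
        if with_missing:
            return len(values)
        return sum(value is not None for value in values)
    if operation == "count_unique":
        # Assumption: empty rows add one extra distinct value (None).
        return len(set(values))
    if operation == "end":
        return values[-1] if values else None
    raise ValueError(f"operation not covered by this sketch: {operation}")
```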

summarize_params

type

object

properties

  • operation

summarize

  • with_missing

Whether to include missing values in the summarizing operation, for example with count.

type

boolean

  • field

Field(s) from the query result to provide to the summarizing operation. Multiple fields can be provided to sum_of_na_diff.

oneOf

column_name

type

array

items

column_name

minItems

1

  • overlap

Feature(s) whose detailed values should be used to detect and remove overlapping values when using sum or sum_of_na_diff operation.

type

array

items

column_name

minItems

1

  • reference

Feature whose field from its details, or whose value itself (when expression is true), can be used as an additional parameter of the summarizing operation when using sum_of_na_avg (use referenced values for the non-empty divisor) or sum_of_na_diff (use as default value for old values).

column_name

  • details

Field(s) from the query result whose values are retained for each sample, which may be used to track detailed information on how the feature was calculated.

type

array

items

column_name

minItems

1

  • filter

R code using an environment of the query result fields to filter which rows are retained for the details.

type

string

  • expression

When using reference, whether to use the referenced feature’s value itself instead of the field from its details in the summarizing operation.

type

boolean

Feature combining

Combining operation.

  • mean: The average value is taken.

  • median: The middle value (when sorted) is taken.

  • mode: The most common value is taken.

  • sum: All values are added up.

  • min: The lowest value is taken.

  • max: The highest value is taken.

type

string

enum

mean, median, mode, sum, min, max
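The combining operations map directly onto Python built-ins and the statistics module, as the sketch below shows for values from multiple projects or concurrent sprints. Skipping empty (None) values before combining is an assumption, and the function name is hypothetical. Note that statistics.mode returns the first mode it encounters when there is a tie.

```python
import statistics

COMBINE = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "sum": sum,
    "min": min,
    "max": max,
}


def combine(values, operation):
    """Combine feature values of multiple projects or concurrent sprints."""
    # Assumption: empty values are skipped before combining.
    present = [value for value in values if value is not None]
    if not present:
        return None
    return COMBINE[operation](present)
```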

Monte Carlo parameters

Parameters for a Monte Carlo simulation to predict the feature.

type

object

properties

  • name

Name of the simulation.

type

string

  • factors

type

array

items

type

object

properties

  • column

Feature to use for the base factor.

column_name

  • multiplier

Feature to use as multiplication of the column feature.

column_name

  • scalar

Weight to apply to this factor.

type

number

  • prob

Probability density function.

type

string

  • params

Parameters for the probability density function.

type

array

items

type

number / string

  • sample

Whether to use the probability function to select new random data. When this is missing or false, the actual data from the column feature is selected instead.

type

boolean

Interval specification

Interval specification for the feature when it has the duration type. When formatting the value of the feature, each interval specifies the divisor value to apply until the value is small enough.

type

object

properties

  • unit

Interval unit.

time_unit

  • key

Shorthand key for the interval unit.

type

string

enum

s, m, h, d, w, M, y

  • num

Interval size.

type

integer

minimum

1

Icon value

FontAwesome specification for an icon that represents the value.

type

array

items

type

string

minItems

2

time_unit

type

string

enum

seconds, minutes, hours, days, weeks, months, years
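One reading of the interval specification is that the value starts in the unit of the first interval. Each interval's num is then the divisor that moves the value to the next unit, and dividing stops once the value is smaller than num. The sketch below follows that reading, which is an assumption. The example interval list and the function name are hypothetical, although the unit and key values come from the schema's enums.

```python
# Hypothetical interval list using the schema's unit and key enums.
INTERVALS = [
    {"unit": "seconds", "key": "s", "num": 60},
    {"unit": "minutes", "key": "m", "num": 60},
    {"unit": "hours", "key": "h", "num": 24},
    {"unit": "days", "key": "d", "num": 7},
    {"unit": "weeks", "key": "w", "num": 52},
]


def format_duration(value, intervals=INTERVALS):
    """Divide by each interval's size until the value is small enough."""
    for index, interval in enumerate(intervals):
        last = index == len(intervals) - 1
        if abs(value) < interval["num"] or last:
            return f"{value:g}{interval['key']}"
        # Too large for this unit: convert to the next interval's unit.
        value /= interval["num"]
    return None  # no intervals given
```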

Feature unit measurment metadata

type

object

properties

  • unit

Unit(s) of the feature. Either a singular unit or a fractional unit, where the divisor may be a fraction itself.

oneOf

unit

fractional_unit

  • dividend

Feature that corresponds to the dividend of the feature.

column_name

  • divisor

Feature(s) corresponding to the divisor of the feature. For a fractional divisor, either the feature of the leading unit may be described, or the related features may be described as far as possible.

oneOf

column_name

fractional_divisor

  • superset

Feature that corresponds to a larger superset of the feature.

column_name

  • moment

Indicator of when the feature is measured compared to the current sample. post indicates that the value is only complete once the time span of the sample has ended.

oneOf

type

string

enum

post

type

array

items

oneOf

type

number

time_unit

maxItems

2

minItems

2

  • pre

Feature that corresponds to the feature’s value at the start of the current sample, which could be used to compare progress.

column_name

  • post

Feature that corresponds to the feature’s value at the end of the current sample, which could be used to compare progress.

column_name

unit

Unit of a feature.

  • change: Number of changes to some property.

  • commit: Number of commits.

  • issue: Number of issues measured by a quality control system.

  • item: Number of stories and other items tracked by an issue tracker.

  • file: Number of files.

  • line: Number of lines of code.

  • byte: Number of bytes.

  • metric: Number of metrics from a quality control system.

  • person: Number of people involved in a software development project.

  • point: Number of story points awarded to stories.

  • sprint: Number of sprints.

  • time: Number of days.

  • meta: The sample itself.

type

string

enum

change, commit, issue, item, file, line, byte, metric, person, point, sprint, time, meta

fractional_unit

type

array

items

oneOf

unit

fractional_unit

maxItems

2

minItems

2
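A measurement unit is either a unit name or a two-item fractional_unit array whose items may themselves be fractional. The recursive sketch below checks that shape against the unit enum and renders it as readable text. Reading the first item as the dividend and the second as the divisor is an assumption, and both function names are hypothetical.

```python
UNITS = {
    "change", "commit", "issue", "item", "file", "line", "byte",
    "metric", "person", "point", "sprint", "time", "meta",
}


def is_unit(value):
    """Check a unit name or a nested two-item fractional_unit array."""
    if isinstance(value, str):
        return value in UNITS
    if isinstance(value, list) and len(value) == 2:
        return all(is_unit(item) for item in value)
    return False


def describe_unit(value):
    """Render e.g. ["point", ["person", "sprint"]] as "point / (person / sprint)"."""
    if isinstance(value, str):
        return value
    parts = [
        describe_unit(item) if isinstance(item, str) else f"({describe_unit(item)})"
        for item in value
    ]
    return " / ".join(parts)
```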

fractional_divisor

type

array

items

oneOf

column_name

fractional_divisor

maxItems

2

minItems

1

source_url

oneOf

Human-readable website at the source.

type

string

format

uri-reference

Indication that there is no usable human-readable website at the source.

type

null

source_type_urls

type

object

patternProperties

  • ^(project|metric_history|metric_options|quality_time|quality|sonar|jenkins|jira|vcs|git|gitlab|github|tfs|subversion|prediction)$

source_url

column_name

type

string

pattern

^[a-z_]+$

Data analysis source types

https://gros.liacs.nl/schema/data-analysis/source_types.json

source_types

type

object

patternProperties

  • ^(project|metric_history|metric_options|quality_time|quality|sonar|jenkins|jira|vcs|git|gitlab|github|tfs|subversion|prediction)$

Localization for a data source type.

type

object

properties

  • icon

Portions of a FontAwesome icon class that indicates the source type.

type

array

items

type

string

minItems

2

patternProperties

  • ^[a-zA-Z]{2,3}(-.*)?$

Localization title for a data source type in a specific language. Valid languages use two-letter ISO 639-1 language codes plus optional BCP 47 subtags, so only a subset of languages is recognized.

type

string

Data analysis source update trackers

https://gros.liacs.nl/schema/data-analysis/trackers.json

trackers

type

object

patternProperties

  • ^(project|metric_history|metric_options|quality_time|quality|sonar|jenkins|jira|vcs|git|gitlab|github|tfs|subversion|prediction)$

Trackers for a data source type.

type

array

items

type

object

properties

  • file

Filename of the tracker as stored in the database.

type

string

pattern

^[a-zA-Z0-9_.-]+$

  • format

Sprintf-compatible format string that describes the contents of the tracker file, which consist only of a timestamp that can be parsed with the format string. If neither format nor json is provided, then the contents are ignored.

type

string

  • json

Indication that the tracker file is stored as a JSON structure, and the means to parse the contents. This is ignored if format is provided.

  • object: The JSON structure is a shallow object whose keys indicate specific components, repositories, etc., and whose values are timestamps.

type

string

enum

object
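The sketch below interprets a tracker file's contents according to its tracker entry. If format is given, it is used and json is ignored. Otherwise, json set to object yields a shallow mapping. If neither is present, the contents are ignored. Treating the format string as strptime-style directives is an assumption, and the function name is hypothetical.

```python
import json
from datetime import datetime


def parse_tracker(contents, tracker):
    """Interpret tracker file contents following its format/json settings."""
    if "format" in tracker:
        # Assumption: the format uses strptime-style directives such as %Y.
        return datetime.strptime(contents.strip(), tracker["format"])
    if tracker.get("json") == "object":
        # Shallow object: components, repositories, etc. mapped to timestamps.
        return dict(json.loads(contents))
    return None  # neither format nor json: the contents are ignored
```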