Understanding KPIs

Comprehensive guide to all KPIs and metrics in the Reports dashboard, including calculation methods and interpretation.

Every metric in the Reports dashboard is calculated from real data. This guide explains how each KPI is computed and what it means for your testing strategy.

Time Window Filtering

All metrics support 7, 14, or 30-day time windows for historical analysis. The time window changes which data is retrieved, but the calculation methods described below stay the same.

Team Overview KPIs

Team Health Score

  • Definition: A composite metric combining multiple signals to give a holistic view of team health.
  • Calculation: Weighted average of three components:
    • 60% Pass Rate: (passed tests / total tests) × 100 across all team projects
    • 25% Automation Rate: (enabled tests / total tests) × 100 across all team projects
    • 15% Maintenance Health: 100% - (tests needing maintenance / total tests) × 100
  • Formula: Health Score = (Pass Rate × 0.6) + (Automation Rate × 0.25) + (Maintenance Health × 0.15)
  • Color Coding: Green (80%+), Amber (60-79%), Red (<60%)
  • Interpretation: A high health score indicates good pass rates, strong automation coverage, and low maintenance burden across the team.
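
To make the weighting concrete, here is a minimal TypeScript sketch of the formula above. The TeamTestCounts shape and its field names are illustrative assumptions, not the dashboard's actual data model.

```typescript
// Illustrative aggregation shape; the dashboard's actual data model may differ.
interface TeamTestCounts {
  total: number;            // all tests across team projects
  passed: number;           // tests whose latest result is passed
  enabled: number;          // tests enabled for automated execution
  needsMaintenance: number; // disabled tests or tests slower than 1 minute
}

function teamHealthScore(c: TeamTestCounts): number {
  if (c.total === 0) return 0;
  const passRate = (c.passed / c.total) * 100;
  const automationRate = (c.enabled / c.total) * 100;
  const maintenanceHealth = 100 - (c.needsMaintenance / c.total) * 100;
  // Weighted average: 60% pass rate, 25% automation rate, 15% maintenance health.
  return passRate * 0.6 + automationRate * 0.25 + maintenanceHealth * 0.15;
}
```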

Cross-Project Status

  • Pass Rate: (passed tests / total tests) × 100
  • Fail Rate: (failed tests / total tests) × 100
  • Skip Rate: (pending tests / total tests) × 100
  • Data Source: Aggregated test execution results
  • Interpretation: Shows current execution state across all team projects

Automation Coverage

  • Scope: Team-wide (aggregated across all projects in the selected team)
  • Calculation: (enabled tests / total tests) × 100 aggregated across all team projects
  • Growth Tracking: Counts tests created within the selected time window (7/14/30 days)
  • Interpretation: Higher percentages indicate better automation adoption and reduced manual testing overhead

Quality Metrics

  • Success Rate: (passed tests / total tests) × 100
  • Tests Needing Maintenance: Count of disabled tests or tests with execution time > 1 minute
  • Interpretation: Tracks test reliability and identifies maintenance bottlenecks

Project Overview KPIs

Scope note

All project-level KPIs below are scoped to the selected project, and when you apply tags in the Reports header they are further scoped to tests that have ANY of the selected tags. If no tags are selected, these metrics fall back to true project-wide values.
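
As a sketch of the ANY-tag semantics, the helper below keeps a test when it carries at least one of the selected tags; the tags field name is an assumption for illustration.

```typescript
// Keep tests that carry at least one of the selected tags (ANY semantics).
// With no tags selected, the full project-wide set is returned.
// The `tags` field is an illustrative assumption about the test shape.
function scopeByTags<T extends { tags: string[] }>(tests: T[], selectedTags: string[]): T[] {
  if (selectedTags.length === 0) return tests;
  const selected = new Set(selectedTags);
  return tests.filter(test => test.tags.some(tag => selected.has(tag)));
}
```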

Health Score

  • Definition: A composite metric combining multiple signals to give a holistic view of project health.
  • Calculation: Weighted average of three components:
    • 60% Pass Rate: (passed tests / total tests) × 100
    • 25% Automation Rate: (enabled tests / total tests) × 100
    • 15% Data Quality: (tests with duration / total tests) × 100
  • Formula: Health Score = (Pass Rate × 0.6) + (Automation Rate × 0.25) + (Data Quality × 0.15)
  • Color Coding: Green (80%+), Amber (60-79%), Red (<60%)
  • Interpretation: A high health score indicates good pass rates, strong automation coverage, and complete test data. Low scores suggest issues in one or more of these areas.

Test Count

  • Data Source: Total count of tests in the project
  • Scope: Respects the selected tags (only tagged tests are counted). With no tags selected, this is the true project-wide count.
  • Growth Tracking: Real count of tests created in the selected time window
  • Interpretation: Shows project (or tagged segment) scale and recent activity

Automation Rate

  • Calculation: (enabled tests / total tests) × 100 within the current scope
  • Scope: Respects selected tags. With tags applied, both the numerator and denominator are computed only over tagged tests. With no tags, this is the project-wide automation rate.
  • Interpretation: Measures the percentage of tests in the current slice (project or tagged subset) that are enabled for execution, indicating automation adoption

Execution Status

  • Components: Passed, Failed, Pending, Running, Regenerating
  • Calculation: Real-time counts from project status with percentage breakdowns
  • Interpretation: Shows current test execution state and identifies bottlenecks

Performance Metrics

  • Total Duration: Sum of all test execution times
  • Average Duration: Total duration / number of tests with recorded times
  • Fastest Duration: Minimum execution time among tests with recorded times
  • Slowest Duration: Maximum execution time among tests with recorded times
  • Data Source: Tests with recorded execution duration
  • Interpretation: Identifies performance bottlenecks and optimization opportunities. Use "Fastest" to benchmark best-case latency and "Slowest" to highlight worst-case outliers.
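
A minimal sketch of these duration aggregates, assuming each test optionally records its execution time in milliseconds (the durationMs field name is illustrative):

```typescript
// Aggregate duration statistics over tests that have a recorded duration.
function performanceMetrics(tests: { durationMs?: number }[]) {
  const durations = tests
    .map(t => t.durationMs)
    .filter((d): d is number => typeof d === "number");
  if (durations.length === 0) return null; // no recorded durations
  const total = durations.reduce((sum, d) => sum + d, 0);
  return {
    totalDuration: total,
    averageDuration: total / durations.length,
    fastestDuration: Math.min(...durations),
    slowestDuration: Math.max(...durations),
  };
}
```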

Tests Trend

  • Definition: Daily count of updated or newly created tests over the selected time window
  • Scope: Project-scoped, respects the selected time window (7/14/30 days)
  • Learn more: See Tests Trend for details and interpretation

Quality Insights

  • Enabled Tests: Count of tests with enabled=true
  • Disabled Tests: total tests - enabled tests
  • Maintenance Needed: Tests disabled OR execution time > 60 seconds
  • Interpretation: Helps prioritize test maintenance and optimization efforts
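
The maintenance rule can be expressed as a simple predicate. This sketch assumes illustrative enabled and durationMs fields:

```typescript
// A test needs maintenance if it is disabled or its execution time exceeds 60 seconds.
// Field names are illustrative assumptions.
function needsMaintenance(test: { enabled: boolean; durationMs?: number }): boolean {
  const tooSlow = typeof test.durationMs === "number" && test.durationMs > 60_000;
  return !test.enabled || tooSlow;
}

// Quality Insights counts derived from the same predicate.
function qualityInsights(tests: { enabled: boolean; durationMs?: number }[]) {
  const enabledTests = tests.filter(t => t.enabled).length;
  return {
    enabledTests,
    disabledTests: tests.length - enabledTests,
    maintenanceNeeded: tests.filter(needsMaintenance).length,
  };
}
```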

Latest Run Insights

  • Scope: Project-level and tag-aware. Uses the latest run per test within the selected project and tag filters.
  • Data Source: Latest test run per test
  • Median Duration (p50): 50th percentile of execution durations for tests with recorded times
  • P95 Duration: 95th percentile of execution durations (high-tail latency)
  • Median/Mean %: Median duration divided by average duration, as a percentage
  • Hanging Runs: Count of tests whose latest status is running or regenerating and whose startTime is older than 10 minutes
  • Interpretation: Highlights typical vs tail performance and detects potentially stuck executions
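
A sketch of the percentile and hanging-run logic, assuming a simplified latest-run shape (status, startTime, and durationMs are illustrative field names) and a nearest-rank percentile method:

```typescript
interface LatestRun {
  status: "passed" | "failed" | "pending" | "running" | "regenerating";
  startTime?: Date;
  durationMs?: number;
}

// Nearest-rank percentile over sorted durations (one of several common methods).
function percentile(sortedDurations: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedDurations.length);
  return sortedDurations[Math.max(0, rank - 1)];
}

function latestRunInsights(runs: LatestRun[], now = new Date()) {
  const durations = runs
    .map(r => r.durationMs)
    .filter((d): d is number => typeof d === "number")
    .sort((a, b) => a - b);
  const median = durations.length ? percentile(durations, 50) : 0;
  const p95 = durations.length ? percentile(durations, 95) : 0;
  const mean = durations.length
    ? durations.reduce((s, d) => s + d, 0) / durations.length
    : 0;
  // A run is hanging if it is still running/regenerating and started
  // more than 10 minutes ago.
  const tenMinutesMs = 10 * 60 * 1000;
  const hangingRuns = runs.filter(
    r =>
      (r.status === "running" || r.status === "regenerating") &&
      r.startTime !== undefined &&
      now.getTime() - r.startTime.getTime() > tenMinutesMs
  ).length;
  return { median, p95, medianOverMeanPct: mean ? (median / mean) * 100 : 0, hangingRuns };
}
```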

Freshness & Quality

  • Scope: Project-level and tag-aware. Uses the latest run per tagged test (or all tests when no tags are selected).
  • Data Source: Latest test run per test
  • Stale Tests %: Percentage of tests with no endTime or whose endTime is older than 14 days
  • Latest Error Rate: Percentage of tests whose latest run has an error or a non-zero totalFailCount
  • Avg Failed Steps: Average stepFailCount per test (shown alongside average totalFailCount)
  • Median Recency: Median time since the latest run completion (or last update if missing), shown as a duration
  • Interpretation: Measures data freshness and failure signals to guide quality triage
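
A sketch of the staleness and error-rate rules, assuming illustrative endTime, error, and totalFailCount fields on the latest run:

```typescript
interface LatestRunRecord {
  endTime?: Date;
  error?: string;
  totalFailCount: number;
}

function freshnessAndQuality(latestRuns: LatestRunRecord[], now = new Date()) {
  const fourteenDaysMs = 14 * 24 * 60 * 60 * 1000;
  // Stale: no endTime, or endTime older than 14 days.
  const stale = latestRuns.filter(
    r => !r.endTime || now.getTime() - r.endTime.getTime() > fourteenDaysMs
  ).length;
  // Latest error rate: latest run has an error or a non-zero totalFailCount.
  const erroring = latestRuns.filter(r => r.error !== undefined || r.totalFailCount > 0).length;
  const total = latestRuns.length || 1; // avoid division by zero
  return {
    staleTestsPct: (stale / total) * 100,
    latestErrorRatePct: (erroring / total) * 100,
  };
}
```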

Top Lists

Top 5 Slowest Tests

  • Definition: Tests with the longest execution durations over the selected time window
  • Data Source: Project tests with recorded duration, sorted by execution time
  • Time Window: Respects 7/14/30-day selection
  • Usage: Identify candidates for optimization or parallelization

Top 5 Flakiest Tests

  • Definition: Tests with the highest flakiness rate among those that have both passed and failed in the selected time window.
  • Calculation:
    • For each test in the selected project (and matching any selected tags):
      • Let totalRuns be the number of runs in the time window.
      • Let failedRuns be the number of runs with status = 'failed'.
      • Let passedRuns be the number of runs with status = 'passed'.
      • Mark the test as flaky if failedRuns > 0 and passedRuns > 0.
      • Define Flakiness(test) = failedRuns / totalRuns.
    • Sort flaky tests by Flakiness(test) in descending order, with tiebreakers on failedRuns and regression failure steps. Take the top 5.
  • Time Window: Respects 7/14/30-day selection.
  • Scope: Respects selected tags. With tags applied, only tagged tests and their runs are considered.
  • Usage: Highlights tests that are truly unstable (sometimes pass, sometimes fail) rather than those that simply fail often.
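
The steps above translate into a short ranking routine. The run shape below is an illustrative assumption, and the regression-failure-steps tiebreaker is omitted for brevity:

```typescript
interface RunSummary {
  testId: string;
  status: "passed" | "failed" | string;
}

// Rank tests by flakiness: only tests that both passed and failed in the
// window count as flaky; Flakiness(test) = failedRuns / totalRuns.
function topFlakiestTests(runsInWindow: RunSummary[], topN = 5) {
  const byTest = new Map<string, { total: number; failed: number; passed: number }>();
  for (const run of runsInWindow) {
    const entry = byTest.get(run.testId) ?? { total: 0, failed: 0, passed: 0 };
    entry.total += 1;
    if (run.status === "failed") entry.failed += 1;
    if (run.status === "passed") entry.passed += 1;
    byTest.set(run.testId, entry);
  }
  return [...byTest.entries()]
    .filter(([, e]) => e.failed > 0 && e.passed > 0) // flaky: both outcomes seen
    .map(([testId, e]) => ({ testId, flakiness: e.failed / e.total, failedRuns: e.failed }))
    .sort((a, b) => b.flakiness - a.flakiness || b.failedRuns - a.failedRuns)
    .slice(0, topN);
}
```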

Top 5 Most Failing Tests

  • Definition: Tests with the highest number of failed runs over the selected time window.
  • Calculation:
    • For each test in the selected project (and matching any selected tags):
      • Consider only runs where status = 'failed' within the time window.
      • Define Failures(test) = count_failed_runs_in_window(test).
    • Sort tests by Failures(test) in descending order. Take the top 5.
  • Time Window: Respects 7/14/30-day selection.
  • Scope: Respects selected tags. With tags applied, only tagged tests and their runs are counted.
  • Usage: Focus fixes on tests that fail most often in practice, not just those with a single bad run.

Reliability Metrics

Stability & Maintenance

  • Definition: Stability indicators based on test failures and disabled tests.
  • Metrics:
    • Tests with Failed Steps: Count and percentage of tests whose latest run has stepFailCount > 0 or totalFailCount > 0.
    • Avg Failed Steps: Average number of failed steps per test (among tests with failures).
    • High Failure Tests: Tests with totalFailCount > 3 in their latest run.
    • Disabled Tests: Count and percentage of tests with enabled = false.
    • Active Tests: Count of enabled tests.
    • Stability Score: 100% - (tests with failed steps %). Higher is better.
  • Interpretation: High failure rates indicate flaky tests or unstable environments. High disabled counts suggest maintenance debt.
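
A minimal sketch of the stability score, assuming illustrative stepFailCount and totalFailCount fields on each test's latest run:

```typescript
// Stability Score = 100 - percentage of tests whose latest run has failed steps.
function stabilityScore(latestRuns: { stepFailCount: number; totalFailCount: number }[]): number {
  const total = latestRuns.length || 1; // avoid division by zero
  const withFailedSteps = latestRuns.filter(
    r => r.stepFailCount > 0 || r.totalFailCount > 0
  ).length;
  return 100 - (withFailedSteps / total) * 100;
}
```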

Data Quality

  • Definition: Measures completeness and reliability of recorded test data.
  • Metrics:
    • Missing Duration: Count of tests without recorded execution duration.
    • Missing Timestamps: Count of tests without createdAt or updatedAt.
    • Quality Score: Percentage of tests with complete data (no missing duration or timestamps). Calculated as (tests without issues / total tests) × 100.
  • Note: Tests missing both duration and timestamps are counted once (not double-counted).
  • Interpretation: Low quality scores can skew other KPIs. Improve data collection and reporting to get accurate metrics.
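
A sketch of the quality score that counts a test with multiple issues only once, assuming illustrative field names:

```typescript
interface TestRecord {
  durationMs?: number;
  createdAt?: Date;
  updatedAt?: Date;
}

function dataQuality(tests: TestRecord[]) {
  const missingDuration = tests.filter(t => t.durationMs === undefined).length;
  const missingTimestamps = tests.filter(t => !t.createdAt || !t.updatedAt).length;
  // A test with any issue counts once toward the quality score,
  // even if it is missing both duration and timestamps.
  const testsWithIssues = tests.filter(
    t => t.durationMs === undefined || !t.createdAt || !t.updatedAt
  ).length;
  const total = tests.length || 1;
  return {
    missingDuration,
    missingTimestamps,
    qualityScore: ((total - testsWithIssues) / total) * 100,
  };
}
```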

Environment Breakdown

  • Definition: Breakdown of test execution environments extracted from test logs.
  • Metrics:
    • Browser Distribution: Percentage of tests run on each browser type (Chrome, Firefox, Safari, etc.).
    • Runner Distribution: Percentage of tests run on each runner/executor type.
  • Interpretation: Helps identify environment-specific issues and optimize test distribution.

Test Insights KPIs

Scope note

In the Test Insights and project-level sections, KPIs respect the selected tags. When tags are applied, all metrics are computed over the tagged subset of tests. When no tags are selected, they use project-wide data. Team Overview remains strictly team-wide and is never tag-scoped.

Automation Coverage

  • Enabled Tests: Count of enabled tests within the current scope (tagged subset or entire project).
  • Disabled Tests: Count of disabled tests within the same scope.
  • Coverage Rate: (enabled tests / total tests) × 100 within the scope.
  • Fail Rate: Failure percentage computed from tests in the current scope.
  • Interpretation: Automation status for the currently focused slice of tests. Use tags (e.g. @critical, @smoke) to understand coverage for key segments.

Test Health

  • Flakiness Rate: Percentage of tests in the current scope that have both passed and failed at least once in the selected time window (using the same definition as Top Flakiest Tests).
  • Quality Score: Excellent (80%+), Good (60-79%), Poor (<60%) based on pass rate within the scope.
  • Active Tests: Count of currently running tests (or "Idle" if none) within the project.
  • Interpretation: Reliability and health metrics for the current slice of tests.

Test Execution

  • Stability: Excellent (95%+ pass rate), Good (85-94%), Fair (70-84%), Poor (<70%) based on pass rate within the current scope.
  • Running: Count of tests currently executing in the selected project.
  • Regenerating: Count of tests being regenerated in the selected project.
  • Interpretation: Real-time execution status for the focused set of tests.

Data Freshness

All KPIs are calculated in real-time from your current test data. Metrics update automatically when you change team/project selections or tags.

  • Team Overview metrics are always team-wide and ignore tags for consistency.
  • Project-level metrics (Overview, Trends, Insights, Reliability, Top Lists) are tag-aware: when tags are selected, all calculations are performed over tagged tests only.

No cached or stale data is used in any calculations.

Data Limits & Performance

To ensure fast dashboard performance, the Reports page uses the following optimizations:

  • Server-side aggregations: Duration percentiles, averages, and other statistics are computed on the server to minimize data transfer.
  • Data limit: A maximum of 10,000 tests or test runs is processed per query. For projects exceeding this limit, a truncation indicator is shown.
  • Parallel queries: Team-wide metrics fetch data from all projects in parallel for faster load times.

If your project has more than 10,000 tests, consider using tag filters to focus on specific subsets for accurate metrics.

Color Coding System

The Reports dashboard uses consistent color coding across all metrics:

  • Green: Excellent performance (80%+)
  • Amber: Good performance (60-79%)
  • Red: Needs attention (<60%)
  • Blue: Neutral/informational metrics
  • Purple: Active/processing states

This visual system helps you quickly identify areas that need attention and celebrate successes.
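
For reference, a score-to-color mapping consistent with the thresholds above might look like this sketch:

```typescript
// Map a percentage score to the dashboard's traffic-light color.
type ScoreColor = "green" | "amber" | "red";

function scoreColor(score: number): ScoreColor {
  if (score >= 80) return "green"; // excellent
  if (score >= 60) return "amber"; // good, room to improve
  return "red";                    // needs attention
}
```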