J. Eur. Opt. Society-Rapid Publ. 21, 36 (2025)

A previous study, available in the Supplementary Information section, was also conducted to determine how observers assessed colorfulness and contrast in addition to ranking images by preference. It used images (a)–(h) only, without the reference image. For the colorfulness criterion, the observers’ ability to assess colorfulness was influenced by the inherent colorfulness of the original image: highly colorful images facilitated clearer differentiation between gamuts, whereas less colorful images led to more ambiguous rankings. By comparison, contrast assessment appeared to be more consistent across all image types, suggesting that contrast variations were more perceptible to observers regardless of the image’s initial characteristics.

These findings motivated the inclusion of a colorfulness metric in the present model, so that the relationship between perceived image quality and gamut characteristics could be captured more effectively. They also highlighted the need for a metric that categorizes large hue shifts, since such shifts can lead to discrepancies between objective colorfulness and subjective preference. Indeed, while certain gamuts may exhibit higher colorfulness and contrast, excessive color shifts could reduce their perceived quality, so colorfulness and contrast metrics alone would not be sufficient.

However, while some images may appear to favor one gamut over the other, this preference is not consistently evident when all images are considered together.

4 Discussion

In this section, twenty objective metrics, listed in Table 2, were selected to evaluate the quality of each simulated image. The value of each metric was calculated for the 144 simulated images, obtained by simulating the 24 original images (Fig. 3) with the 6 gamuts of Figure 2. Linear regression analysis, performed using RStudio, was employed to estimate the coefficients linking the objective metrics to the observer-based rankings.

Among the selected objective metrics, some were gamut dependent only, while others depended on both the original image and the gamut. All of them were normalized between 0 and 1 in order to ensure the comparability of the coefficients within a linear model. The normalization was done using the formula:

$$M_{\mathrm{normalized}} = \frac{M - \mathrm{MinPossible}}{\mathrm{MaxPossible} - \mathrm{MinPossible}}. \tag{2}$$

For instance, the convex hull area metric was normalized by setting the minimum possible value to 0 and the maximum possible value to the area of the convex hull of the gamut in the a*b* plane, so that the normalization is independent of the image or gamut set used.
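To make the normalization of Eq. (2) concrete, the following Python sketch applies it to a single metric value. The function name and the numerical values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def normalize_metric(value, min_possible, max_possible):
    """Map a raw metric value onto [0, 1] using fixed theoretical bounds (Eq. 2)."""
    if max_possible <= min_possible:
        raise ValueError("max_possible must be greater than min_possible")
    return (value - min_possible) / (max_possible - min_possible)


# Hypothetical numbers: an image whose colors span a convex hull area of 3500
# in the a*b* plane, inside a gamut whose full a*b* convex hull area is 14000.
image_hull_area = 3500.0
gamut_hull_area = 14000.0

# The lower bound is 0 and the upper bound is the gamut's own hull area,
# so the result does not depend on the other images or gamuts in the set.
print(normalize_metric(image_hull_area, 0.0, gamut_hull_area))  # 0.25
```

Because the bounds are theoretical limits rather than the observed minimum and maximum of the dataset, adding or removing images does not change the normalized value of any existing entry.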

Table 3. Optimal split for threshold tuning.

Threshold	R²
0.1	0.945
0.15	0.959
0.18	0.966
0.19	0.973
0.2	0.973
0.21	0.972
0.22	0.970
0.25	0.963
0.3	0.951
0.35	0.948
0.4	0.947
Random group 1	0.944
Random group 2	0.951
Random group 3	0.948
Random group 4	0.950
Random group 5	0.947
Mean of random groups	0.948

Note: The bold font signifies the threshold value that yielded the best R² value for the final model.

Before performing an ANOVA of the metrics used, the dataset was first categorized based on the Hasler colorfulness of the original images. This separates the data into two groups, high and low colorfulness, to ensure that the differences in image color characteristics were accounted for. Then, a separate linear model was applied to each group, rather than a single model for the entire dataset, to better capture potential variations in behavior across different levels of colorfulness.
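As a minimal sketch of the grouping step, the Python snippet below computes the Hasler-Süsstrunk colorfulness of an image and assigns images to a low or a high group. Several things here are assumptions rather than details taken from the study: pixels are RGB triples scaled to [0, 1], the raw score is used without further normalization (the scaling behind the colorfulness values used for grouping is not described in this section), images exactly at the threshold go to the high group, and the function names and pixel values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def hasler_colorfulness(pixels):
    """Hasler-Süsstrunk colorfulness of a sequence of (R, G, B) triples."""
    pixels = list(pixels)
    rg = [r - g for r, g, b in pixels]               # red-green opponent axis
    yb = [0.5 * (r + g) - b for r, g, b in pixels]   # yellow-blue opponent axis
    sigma = math.hypot(pstdev(rg), pstdev(yb))       # spread of the opponent values
    mu = math.hypot(fmean(rg), fmean(yb))            # distance of the mean from gray
    return sigma + 0.3 * mu


def split_by_colorfulness(scores, threshold=0.2):
    """Split {image_name: colorfulness} into (low, high) lists of names."""
    low = [name for name, c in scores.items() if c < threshold]
    high = [name for name, c in scores.items() if c >= threshold]  # tie rule is an assumption
    return low, high


# Hypothetical toy "images" of four pixels each.
images = {
    "grayish": [(0.50, 0.50, 0.50), (0.52, 0.50, 0.49), (0.48, 0.50, 0.51), (0.50, 0.49, 0.50)],
    "vivid": [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.1, 0.2, 0.9), (0.9, 0.9, 0.1)],
}
scores = {name: hasler_colorfulness(px) for name, px in images.items()}
print(scores)
print(split_by_colorfulness(scores, threshold=0.2))
```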

To find the optimal colorfulness threshold, several values were tested for splitting the dataset. These threshold-based splits were also compared with several arbitrary random groupings. To compare the splits, the R² value was computed between the expected rank and the actual rank for the final models, and the split with the highest R² value was selected.

The final R² values for the different splitting cases are shown in Table 3. The threshold value of 0.2 was chosen because it yielded the best R² value of 0.973 for the final model, which is also higher than those obtained by random splitting. Hereafter, the “low colorfulness group” and “high colorfulness group” refer to the sets of rankings and corresponding metrics derived from images with a Hasler colorfulness below 0.2 and above 0.2, respectively.
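The threshold search can be sketched in Python as follows. This is an illustrative simplification, not the authors' RStudio analysis: each group is fitted with a single hypothetical metric instead of the full set, R² is taken to be the coefficient of determination of predicted versus actual rank (the exact definition is an assumption), and the data are synthetic values constructed so that the two groups behave differently. It requires Python 3.10 or later for `statistics.linear_regression`.

```python
from statistics import fmean, linear_regression

# Synthetic rows: (colorfulness of the original image, metric value, observer rank).
DATA = [
    (0.05, 0.1, 1.4), (0.08, 0.3, 2.2), (0.12, 0.5, 3.0), (0.15, 0.7, 3.8), (0.18, 0.9, 4.6),
    (0.22, 0.1, 5.6), (0.27, 0.3, 4.8), (0.32, 0.5, 4.0), (0.38, 0.7, 3.2), (0.45, 0.9, 2.4),
]
THRESHOLDS = [0.1, 0.2, 0.3]


def fit_and_predict(rows):
    """Fit rank ~ metric on one group and return the predicted ranks."""
    xs = [metric for _, metric, _ in rows]
    ys = [rank for _, _, rank in rows]
    slope, intercept = linear_regression(xs, ys)
    return [slope * x + intercept for x in xs]


def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted ranks."""
    mean = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def split_r2(data, threshold):
    """Fit one model per colorfulness group, then score all predictions together."""
    low = [row for row in data if row[0] < threshold]
    high = [row for row in data if row[0] >= threshold]
    actual = [rank for _, _, rank in low + high]
    predicted = fit_and_predict(low) + fit_and_predict(high)
    return r_squared(actual, predicted)


for t in THRESHOLDS:
    print(t, round(split_r2(DATA, t), 3))
print("best threshold:", max(THRESHOLDS, key=lambda t: split_r2(DATA, t)))
```

Scoring the pooled predictions, rather than each group separately, gives one number per candidate split, which is what makes the rows of a table such as Table 3 directly comparable.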

The observer rankings were approximated by fitting a linear model of the objective metrics to the rankings. The Variance Inflation Factor (VIF) [40] was used to handle multicollinearity, which arises when metrics are highly correlated with each other. Metrics with high VIF values, indicating a strong correlation with other metrics, were considered redundant and were excluded from the model. In this case, metrics M14 and M15 were highly correlated.

After removing redundant metrics, the linear regression was recalculated to derive the final coefficients. It was computed for both the low and high colorfulness groups. To assess the importance of each metric in predicting rankings, a one-way ANOVA was performed (Table 4).