Let’s quickly review the previous two posts on NCCA’s “The Color Project”:
- In Part Three, we showed that about 20% of our observers fell into an "extremes" category (i.e., they were either far less critical or far more critical than average); the remaining 80% of the observers were more or less in agreement.
- In Part Four, we concluded that most of the people, most of the time, were not fooled by the identical-pair panels. Observers declared a notable color difference only 9% of the time, and half of the observers saw no difference at all. If you expect to see a color difference, by golly, you will!
Now we move on to one of the most important parts of the data analysis: repeatability. Just as in a Gage R&R study, repeatability is a key factor in our experiment. Our observers looked at 54 pairs of panels over a 20- to 30-minute time frame. They did not know that 15 of those panel pairs were repeated. Ideally, we would have liked to see perfect repeatability (just as with any piece of equipment being evaluated in a Gage R&R), but we all know that never happens.
Recall that we used the following rating scale throughout the experiment:
5 = No color difference
4 = Extremely slight color difference
3 = Slight color difference
2 = Noticeable color difference
1 = Very noticeable color difference
Here’s our method for performing the repeatability data analysis:
- Capture the data from each observer for each panel pair.
- Compare the first rating against the second rating for the repeated pairs.
- Create a chart to help readers visualize the data.
The chart below presents the raw data from the repeatability experiment. A “0” means no change in the rating from the first time the observation was made to the second time. A negative value means that more color difference was seen the second time around, and a positive value means that less color difference was seen. For example, a rating of “4” the first time (extremely slight color difference), and a “3” the second time (slight color difference), results in a difference value of -1.
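The difference calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual analysis script; the observer names and ratings below are hypothetical, not data from the study.

```python
# Minimal sketch of the repeatability difference calculation.
# Difference = second rating minus first rating (scale: 1-5).
# Negative: more color difference seen the second time.
# Positive: less color difference seen the second time.

def rating_difference(first, second):
    """Return the change in rating from the first viewing to the second."""
    return second - first

# Hypothetical ratings for one repeated panel pair:
# observer -> (first viewing, second viewing)
repeated_ratings = {
    "Observer A": (4, 3),  # saw more difference the second time
    "Observer B": (5, 5),  # perfectly repeatable
    "Observer C": (2, 4),  # saw less difference the second time
}

for observer, (first, second) in repeated_ratings.items():
    diff = rating_difference(first, second)
    flagged = abs(diff) >= 2  # the highlighted cells in the raw data chart
    print(f"{observer}: {diff:+d}{' (flagged)' if flagged else ''}")
```

So the "4 then 3" example in the text comes out as -1, which falls within the unhighlighted, one-point band.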
In the chart below, I have highlighted those values that are two or more rating points different between observations. In this chart, the top row shows the Panel Pair ID number, and the Observer ID is shown in the leftmost column.
It’s really hard to draw many conclusions from this table, so let’s try a different approach.
Let’s go back to a graph from an earlier post (see Part Three). Below is the graph that measures each observer’s observations against the average of all observers. Recall that there were four observers who were rather generous in their observations (i.e., on average, they rated the 54 panel pairs as being closer in color difference compared to how all others rated them) and two quite critical observers (i.e., they saw color pairs as being further apart than others did).
Now let’s remove the text and red enclosures and add some green and red stars. The green stars indicate the observers who were unusually consistent; the red stars indicate those who were the most inconsistent. No fancy analytical tool was used to decide who got which star. If an observer had no highlighted (yellow) cells in their row of the raw data chart above, they got a green star, because every second observation fell within one rating unit of the first (which, by the way, is commendable). If, on the other hand, an observer’s data generated a bunch of yellow cells, they got a red star: for whatever reason, they rated the same panel pairs quite differently the second time around.
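The star rule just described can be expressed as a simple function. This is a hypothetical sketch: the post does not say exactly how many yellow cells make "a bunch," so the red-star threshold of three is an assumption.

```python
# Classify an observer's repeatability from their rating differences
# across the 15 repeated panel pairs. A highlighted (yellow) cell is
# any difference of two or more rating points. No yellow cells earns
# a green star; "a bunch" earns a red star. The threshold of 3 for
# "a bunch" is an assumption, not stated in the post.

def star_for(differences, red_threshold=3):
    big_swings = sum(1 for d in differences if abs(d) >= 2)
    if big_swings == 0:
        return "green"  # every repeat within one rating unit
    if big_swings >= red_threshold:
        return "red"    # markedly inconsistent
    return "none"       # normal variation, no star

print(star_for([0, 1, -1, 0, 0, 1, 0]))   # no big swings -> green
print(star_for([2, -2, 0, 3, -1, 0, 2]))  # four big swings -> red
```

Anything between the two extremes gets no star at all, which is exactly the "normal variation" group discussed below.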
As you look at this data, you might conclude that four of the five most generous observers were also the most consistent. But this observation, though correct, might be a simple matter of their overall ratings being on the high side. In the raw data table above, the highlighted cells represent rating variability of two or more points. If an observer was generous, their scores will be on the high side. For example, they might have rated a panel pair a “4” when the average rating was a “3.” The second time they saw that pair, they could not have raised their rating by more than one point (a “4” initially can only go up to a “5”). They could have dropped their rating by two or more points, of course, but these generous observers, by definition, were not inclined to see big color differences.
A summary of the data shows that 30% of the observers were unusually consistent and 20% were unusually inconsistent. This means that 50% of the observers fall into that “normal variation” category. Believe me, when it comes to color, normal variation is, well, normal.
If you would like to get into more of the details, please feel free to contact me at firstname.lastname@example.org. I am more than happy to offer further explanation, and I would be grateful to hear any suggestions you have to offer.
In the next—and final—Color Project update, we will discuss visual color assessment ratings versus actual color difference readings. Man vs. Machine! What could be better?
NCCA Technical Director