By Emma Sage, SCAA Coffee Science Manager
With contributions by Molly Spencer, PhD Candidate, UC Davis
The scientific story behind the new SCAA Coffee Taster’s Flavor Wheel is a fascinating one. Its cast of characters includes World Coffee Research (WCR), Kansas State University (KSU), Texas A&M University (TX A&M), the University of California, Davis (UC Davis), and key SCAA members and volunteers. It is a story of research that SCAA identified a need for, pioneered, and interpreted, and that resulted in the highly anticipated revision of the flavor wheel we have all known and loved for over two decades. SCAA is committed to serving its members, and the advancement of the entire industry, through an increasingly data-driven and scientific approach. That commitment led us to look for a solution grounded in solid scientific research that would stand the test of time. We are very pleased with the outcome and hope that you will be equally enthusiastic about this valuable new resource.
What is a flavor wheel?
Flavor wheels are made up of words arranged in a circle. Why not a flavor tree, a flavor pyramid, or a flavor choose-your-own-adventure? The answer seems to have everything to do with the first flavor wheel, developed for beer in the late 1970s by the chemist Dr. Morten C. Meilgaard (Meilgaard and others 1979). It was followed in the mid-1980s by a wheel for wine developed by Ann C. Noble at UC Davis (Noble and others 1984; Noble and others 1987). From there, many other industries followed suit, including our own. The words in a wheel create a vocabulary around a product or category of products, standardize training, and facilitate discussion of flavor and perception. Wheels provide clear, common communication about products among tasters, facilities, exporters, importers, and consumers, to name just a few.
To create a flavor wheel, you need words, and you need to arrange them. As it turns out, there are many ways to do this. Many existing wheels were developed without scientific research: some are based on consensus within an industry or trade group, and others are created by individuals. The original SCAA wheel was created using methods like these. It would have been very easy for SCAA to revise the wheel some years ago with a special ad hoc committee; after all, that approach is in line with the origins of our wheel and is how many other wheels continue to be created. Based on my research on this topic over the past couple of years, the coffee industry seems to have been rather unique in its wholehearted and universal adoption of one wheel, which has allowed us to do a lot of meaningful work to enhance coffee quality. Perhaps this is because the original SCAA wheel, by Ted Lingle, came out in 1995, relatively early in the history of specialty food and beverage wheels. An immense amount of credit is due to Ted for that. In that same research, I have found very few other wheels rooted in scientific inquiry, although a few notable wheels did stem from sensory lexicons (Lawless and others 2012; Noble and others 1987; Koch and others 2012; Suffet and others 1999; Gawel and others 2000). However, no other wheel has used the approach we detail below, which makes this research groundbreaking.
For the words that make up the wheel, you likely know by now that SCAA adopted the World Coffee Research (WCR) Sensory Lexicon. This groundbreaking research was led by sensory scientists and their trained panels at KSU and TX A&M. You can read about the work, sensory descriptive analysis, and trained panels, and view the published lexicon to learn more. I personally encourage you to read a few informative articles published by WCR on what the WCR Sensory Lexicon is, what it isn’t, and how it can be used to advance coffee research. On a side note, if you like what WCR has accomplished here, I encourage you all to help fund future research. For coffee roasters, there is a fantastically simple way to make a difference: the check-off program.
From this, you know that sensory descriptive analysis is a powerful tool that provides word descriptions of products and a quantitative basis for comparing their sensory similarities and differences (Meilgaard and others 2007; Stone and others 2012). Today it is one of the most powerful, quantitative, sophisticated, and extensively used methods in sensory science. Describing the sensory characteristics of a product enables informed business decisions, guides product development, and allows benchmarking, quality control, and the tracking of product changes over time. It is also valuable in academic research, where it enables correlations with analytical measurements and thus a better understanding of the mechanisms underlying flavor. Creating a lexicon is the first step in this process (Lawless and Civille 2013). Used in well-designed experiments, this method will enable us to relate specific variables to specific changes in flavor (i.e., to establish causality). For me, that means we will finally be able to begin addressing some of the central dogmas of the coffee industry about why coffee tastes the way it does. That is exciting progress.
After understanding the power and potential of the WCR Sensory Lexicon, we knew we wanted to adopt and promote it as an association. Aspects of the lexicon can and should be immediately embraced by the industry, including its vocabulary, its definitions, and basic use of the standard references to calibrate coffee tasters. SCAA therefore saw a great opportunity to revise the Coffee Taster’s Flavor Wheel. The strength of WCR’s research made it clear that it was critical to adapt the SCAA flavor wheel to be compatible with the lexicon and to bring a new tool to the coffee industry. As the WCR Sensory Lexicon was being finalized, we had the words describing the flavor attributes. What we didn’t have was information on how to arrange those words around the wheel. Even within lexicon development, the grouping of attributes into categories is commonly based on panel opinion and, according to a review of the process, is neither a rigorous nor an important part of lexicon development (Lawless and Civille 2013). This left us with many questions. After all, we didn’t want to take the most scientifically based coffee flavor resource and misuse it, or simply decide behind closed doors how the words should be placed. We wanted to treat the arrangement of the lexicon with the same respect, diligence, and science that went into creating it. Thus we set out on a quest to explore the frontiers of sensory science. That’s right: to boldly go where no one has gone before.
In fact, sensory science is a young and rapidly developing discipline. From my own study while completing the UC Davis Applied Sensory and Consumer Science Certificate Program, I have learned that this field seems to be moving just as fast as the tech world, and that is no coincidence. The number of data collection tools, software packages, and analysis techniques available to scientists has grown rapidly as technology, statistics, and information management are used in more effective ways. SCAA needed a research partner willing to think creatively and do solid research to help us understand how the lexicon should be arranged, both in terms of tiers/levels and placement. Given the world-renowned reputation of the sensory science research conducted at UC Davis, and our ongoing relationship with the coffee initiative on campus, we reached out to the Department of Food Science and Technology to see whether any of its sensory science laboratories could work on this project. Dr. Jean-Xavier Guinard and his PhD candidate, Molly Spencer, took on the challenge.
The UC Davis lexicon sorting study
After some discussions with Molly and Dr. Guinard, it was clear that no pre-existing “cookie cutter” sensory science method would be immediately appropriate or feasible for what we wanted to accomplish. So Dr. Guinard and Molly went to work, examining published studies, doing background research, and brainstorming solutions to our very specific question. They determined that a modified free multiple sorting method could be used to understand the associations and relationships between the lexicon flavor attributes. In this way, we could learn how the industry viewed these terms, as well as how highly trained sensory descriptive panelists would group them. To quantify the hierarchical grouping, the researchers at UC Davis created an online program that participants could use remotely and that recorded every instance in which lexicon attributes were associated. In the end, we would know what the main flavor categories (that is, the tiers/levels of the wheel) should be, as well as which flavor categories were related closely enough to be positioned next to each other around the wheel.
The sorting exercise
Sorting is a method of classification; putting things into categories is one of the most common operations in human thinking (Coxon 1999). The free sorting procedure was originally created as a word-sorting tool (Steinberg 1967) but was later adopted for sensory analysis (Lawless and others 1995). In our project, although the items to be sorted (the attributes) came from sensory evaluation (tasting) of coffee, the sorting exercise involved vocabulary only and did not include tasting. In this way, our sorting exercise drew on the participants’ own experience.
In the Free Multiple Sorting (FMS) method, assessors are traditionally asked to sort food or other product samples into groups in whatever way makes sense to them individually, choosing the number of groups/categories themselves (Coxon 1999; Dehlholm and others 2012). When performing the task, an assessor may make additional sortings of the same sample set until they feel they have covered all sorting possibilities (Steinberg 1967). For our study, we modified the method so that panelists sorted the attributes (with definitions) rather than food samples, each panelist performed the sorting task only once, and the sorting was hierarchical: panelists kept sorting attributes into categories and sub-categories until they felt there were no more sub-categories to create. A user-friendly web interface (see Figure 1) was created to allow simple, efficient sorting of the 99 attributes. Users saw instructions and the list of attributes, each with an information bubble at the far right; rolling over the bubble displayed a pop-up with the definition/description of that attribute as defined by KSU and TX A&M, so a panelist who was unsure of the meaning of a lexicon term could check its definition. Participants dragged and dropped the attributes into categories and sub-categories, for as many hierarchical levels as they deemed necessary.
Because of the large number of attributes to be sorted, the large number of expert panelists relative to most sensory descriptive methods (72 panelists), and the end goal of this experiment (a flavor wheel), it was decided that each panelist would complete the sorting task only once. This differs from the typical FMS method, in which a small number of descriptive panelists (8 to 15) may each perform the same sorting task on the same sample set multiple times, until they feel they have exhausted the sorting possibilities. With 99 items to sort, multiple sortings would have fatigued the panelists. Moreover, both sensory expertise and coffee industry expertise needed to be considered, so it was in the best interest of the new flavor wheel to use a large panel that brought in input, experience, and expertise from both fields. After the sorting was completed, we focused on three statistical analyses to help us understand the results.
We had two study groups, which, based on the results, could ultimately be treated post hoc as one population. The first group consisted of 29 trained, experienced sensory panelists who worked on chocolate and wine panels at UC Davis. These panelists were not required to be trained specifically on coffee, but all had participated in sensory studies and had worked with and been exposed to most of the flavor attributes on the coffee list. They received written instructions and performed the free multiple sorting task on the web remotely and individually, from their personal computers.
To make sure the results would accurately reflect the needs of the industry, we knew we had to invite coffee people to contribute to this research and create the data. The SCAA invited hundreds of our nearest and dearest coffee professionals to participate in this work, including our boards, committees and councils, subject matter experts and SCAA instructors, WCR affiliates and stakeholders, Q instructors, colleagues at CQI and ACE, and other industry leaders. In the end, 43 judges recruited by SCAA from the coffee industry performed the same online procedure as the UC Davis panelists.
The point of this work was to understand how closely related the flavor attributes were, as judged by expert sensory panelists and coffee industry members alike. To organize the raw data, a program was written in the Ruby programming language to translate the sorting data into matrices for analysis. For both of the methods detailed below, two binary matrices were first created for each participant (1 if the relationship existed, 0 if it did not): one for “sibling-sibling” relationships, in which two attributes appeared in the same sub-category, and one for “parent-child” relationships, in which one attribute appeared in a sub-category under another attribute. These types of relationships were therefore our similarity criteria. See Table 1 for an example. From all of the individual sorting data, a symmetrical proximity (similarity) matrix was compiled across the 72 participants, containing the count of how many times each pair of attributes appeared together in either a “sibling” or a “parent-child” relationship. This similarity matrix was then used for the following analyses.
Table 1. Example excerpt of a similarity matrix for one study participant, highlighting 7 attributes
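The Ruby program itself is not reproduced in this article, so what follows is only a hypothetical Python sketch of the same bookkeeping, under stated assumptions. It assumes each participant’s sort is stored as a tree (the `Node` class is an invented format), treats “parent-child” as direct nesting only, ignores items left loose at the top level, and counts a pair at most once per participant, so each cell ranges from 0 to the number of participants (0 to 72 in the study).

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional


@dataclass
class Node:
    """One item in a participant's hierarchical sort (hypothetical format).

    label is an attribute name, or None for an unnamed category the
    participant created; children are the items placed directly inside it.
    """
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def leaf(name: str) -> Node:
    return Node(label=name)


def participant_matrices(root: Node, attributes: List[str]):
    """Return the (sibling, parent_child) 0/1 matrices for one participant."""
    index = {a: i for i, a in enumerate(attributes)}
    n = len(attributes)
    sib = [[0] * n for _ in range(n)]
    pc = [[0] * n for _ in range(n)]

    def walk(node: Node, is_root: bool) -> None:
        kids = [c.label for c in node.children if c.label is not None]
        # Sibling-sibling: attributes placed directly in the same category.
        # Assumption: items left loose at the top level are not siblings.
        if not is_root:
            for a, b in combinations(kids, 2):
                i, j = index[a], index[b]
                sib[i][j] = sib[j][i] = 1
        # Parent-child: attributes nested directly under another attribute.
        if node.label is not None:
            p = index[node.label]
            for k in kids:
                c = index[k]
                pc[p][c] = pc[c][p] = 1  # stored symmetrically
        for child in node.children:
            walk(child, False)

    walk(root, True)
    return sib, pc


def similarity_matrix(sorts: List[Node], attributes: List[str]):
    """Count, per pair, how many participants related the two attributes."""
    n = len(attributes)
    total = [[0] * n for _ in range(n)]
    for root in sorts:
        sib, pc = participant_matrices(root, attributes)
        for i in range(n):
            for j in range(n):
                if i != j and (sib[i][j] or pc[i][j]):
                    total[i][j] += 1
    return total


# Hypothetical example: one participant's sort of six attributes.
attrs = ["Fruity", "Citrus fruit", "Lemon", "Lime", "Floral", "Jasmine"]
p1 = Node(children=[
    Node("Fruity", [Node("Citrus fruit", [leaf("Lemon"), leaf("Lime")])]),
    Node(None, [leaf("Floral"), leaf("Jasmine")]),
])
S = similarity_matrix([p1, p1], attrs)  # the same sort counted twice
```

Using the logical OR of the two relationship types, rather than adding them, keeps every cell bounded by the number of participants, which matches the 0 to 72 similarity scale on the dendrogram described below.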
First, we needed to compare the two study groups (panelists versus industry): did the trained panelists arrange the lexicon differently than our industry group? Either result would have been interesting in its own way, but we had hypothesized that the groups would differ because of their different backgrounds and training. To test this, two similarity matrices, one for the UC Davis panelists and one for the industry participants, were used to run two separate five-dimensional Multidimensional Scaling (5D-MDS) analyses. The results of the 5D-MDS analyses were then used to run a Multiple Factor Analysis (MFA), a technique for comparing datasets that describe the same set of objects. These analyses were completed in XLSTAT® 2015. The MFA showed no significant difference between the UC Davis panelists and the industry group: the RV coefficients were greater than 0.70, indicating that the two groups’ configurations were closely related and that the groups could be treated as one population. With this in mind, we moved forward with evaluating the relationships between the flavor attributes based on a single population of 72 panelists and industry participants.
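XLSTAT reports RV coefficients as part of the MFA. As a hedged illustration of what that number measures, here is a plain Python sketch of the RV coefficient for two configurations of the same attributes (for instance, the UC Davis and industry 5D-MDS coordinates). It is not XLSTAT’s code, the toy coordinates are invented, and the 0.70 cutoff comes from the text above, not from the function.

```python
def centered_gram(X):
    """Cross-product matrix Xc Xc^T of the column-centred configuration X."""
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    C = [[row[k] - means[k] for k in range(p)] for row in X]
    return [[sum(a * b for a, b in zip(C[i], C[j])) for j in range(n)]
            for i in range(n)]


def rv_coefficient(X, Y):
    """RV coefficient between two configurations of the same n objects.

    X is n x p and Y is n x q (e.g. attribute coordinates from two separate
    MDS runs). The value lies in [0, 1]; 1 means the configurations agree up
    to translation, rotation, reflection and uniform scaling.
    """
    Sx, Sy = centered_gram(X), centered_gram(Y)
    n = len(Sx)

    def inner(A, B):
        # trace(A @ B) equals this element-wise sum because B is symmetric.
        return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

    return inner(Sx, Sy) / (inner(Sx, Sx) * inner(Sy, Sy)) ** 0.5


# Hypothetical 2-D configurations of four attributes.
X = [[0, 0], [1, 0], [0, 2], [3, 1]]
Y = [[-y, x] for x, y in X]           # X rotated by 90 degrees
Z = [[0, 0], [3, 0], [0, 1], [1, 1]]  # a different arrangement
```

Because each MDS solution is only defined up to rotation, reflection, and scale, comparing two separate runs calls for a measure that ignores those differences; here `rv_coefficient(X, Y)` is 1 even though no coordinate matches, while `rv_coefficient(X, Z)` is below 1.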
Agglomerative Hierarchical Clustering (AHC) analysis was performed using the similarity matrix. AHC groups the attributes into categories and sub-categories at different levels based on the similarity criteria, and its results are typically visualized as a dendrogram, familiar from species taxonomies and genetic linkage diagrams. Agglomerative hierarchical clustering starts with every object in its own cluster. The unweighted pair group average (UPGMA) linkage method was then used to join the attributes together, one pair of clusters at a time, from the bottom (most similar) to the top (least similar). In each successive iteration (linkage), it agglomerates (merges) the closest pair of clusters, each of which is either an individual attribute or a group compared by its average similarity, until all of the data belong to one large category. On a dendrogram, each of these linkages is represented by a horizontal line, and the y-axis shows the similarity (between 0 and 72) at which the clusters were merged. The number of main classes can be specified by the user or determined by the software. Here, the XLSTAT® 2015 software initially indicated four main classes, but this was not adequate for a flavor wheel because it did not separate the large number of attributes distinctly enough. After examining the data, we specified nine classes, as this was determined to be the optimal number of main classes for statistically separating the 99 attributes clearly while keeping the groupings intuitive.
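To make the merging rule concrete, here is a hedged, plain Python re-implementation of average-linkage (UPGMA) clustering on a similarity matrix. It is not XLSTAT’s implementation, the similarity counts for the six attributes are invented, and cutting at 3 classes simply plays the role that truncating at nine classes played in the study.

```python
from itertools import combinations


def upgma(S, labels, n_classes):
    """Average-linkage (UPGMA) clustering driven by a similarity matrix.

    S[i][j] counts how often items i and j were related (higher = closer).
    Returns the merge history as (members, height) pairs, where height is
    the average similarity at which the merge happened, plus the partition
    present when n_classes clusters remain (the truncated dendrogram).
    """
    clusters = [[i] for i in range(len(labels))]  # every item starts alone
    merges, partition = [], None

    def named(groups):
        return [sorted(labels[i] for i in g) for g in groups]

    if len(clusters) <= n_classes:
        partition = named(clusters)
    while len(clusters) > 1:
        # Unweighted average: mean similarity over all cross-cluster item pairs.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            avg = sum(S[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or avg > best[0]:
                best = (avg, a, b)
        height, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        merges.append((sorted(labels[i] for i in merged), height))
        if partition is None and len(clusters) <= n_classes:
            partition = named(clusters)
    return merges, partition


# Hypothetical similarity counts (out of 72) for six attributes.
labels = ["Lemon", "Lime", "Grapefruit", "Jasmine", "Rose", "Cocoa"]
pairs = {("Lemon", "Lime"): 60, ("Lemon", "Grapefruit"): 50,
         ("Lime", "Grapefruit"): 55, ("Jasmine", "Rose"): 58}


def sim(x, y):
    if (x, y) in pairs or (y, x) in pairs:
        return pairs.get((x, y), pairs.get((y, x)))
    return 5 if "Cocoa" in (x, y) else 8  # cocoa vs anything; citrus vs floral


S = [[0 if x == y else sim(x, y) for y in labels] for x in labels]
merges, partition3 = upgma(S, labels, n_classes=3)
```

Working directly on similarities means merging the pair with the highest average; this gives the same tree as UPGMA on the distances 72 minus S, because averaging commutes with that subtraction. The naive search is roughly cubic in the number of items, which is fine for 99 attributes.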
The most common approach for analyzing data from sorting tasks is Multidimensional Scaling (MDS) (Lawless and others 1995). MDS was carried out on the similarity matrix to obtain a two-dimensional representation of the relationships between the flavor attributes. This MDS analysis also used all 72 participants’ data (the full similarity matrix) to create a visual aid showing where the attributes fall in proximity to one another. Specifically, non-metric (ordinal) MDS was performed, meaning that the configuration was fitted so that the rank order of the distances in the representation space (the plot) matched, as closely as possible, the rank order of the dissimilarities in the resemblance matrix, with goodness of fit measured by Kruskal’s stress. This was done to supplement the AHC results and to guide the order of the main classes (clusters) around the circular flavor wheel. All analyses were completed in XLSTAT® 2015.
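Kruskal’s stress is the fit criterion that non-metric MDS tries to minimize. The hedged sketch below only scores a given candidate configuration; a full non-metric MDS would iteratively adjust the coordinates to reduce this value, and that optimization loop is omitted. It is not XLSTAT’s code, and converting similarity counts into dissimilarities (for example, 72 minus the count) is an assumption for illustration.

```python
import math
from itertools import combinations


def pav(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean exceeds the next block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress1(dissim, coords):
    """Kruskal's stress-1 for a candidate non-metric MDS configuration.

    dissim[i][j]: observed dissimilarity (e.g. 72 minus the similarity count).
    coords[i]: coordinates of item i in the candidate plot.
    """
    pairs = [(i, j, dissim[i][j], math.dist(coords[i], coords[j]))
             for i, j in combinations(range(len(coords)), 2)]
    # Rank pairs by dissimilarity; tied pairs are ordered by distance,
    # which is Kruskal's "primary" treatment of ties.
    pairs.sort(key=lambda p: (p[2], p[3]))
    d = [p[3] for p in pairs]
    dhat = pav(d)  # disparities: best fit that is monotone in the ranking
    raw = sum((a - b) ** 2 for a, b in zip(d, dhat))
    return math.sqrt(raw / sum(a * a for a in d))


# Hypothetical example: three points on a line.
line = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
good = [[0, 10, 30], [10, 0, 20], [30, 20, 0]]  # same ranking as the distances
bad = [[0, 30, 10], [30, 0, 20], [10, 20, 0]]   # reversed ranking
print(kruskal_stress1(good, line))  # 0.0
print(kruskal_stress1(bad, line))   # about 0.378, i.e. sqrt(1/7)
```

Only the rank order of the dissimilarities enters through the sort, which is what makes the method non-metric: any monotone rescaling of `good` would still give a stress of 0 for this configuration.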
For all participants together, AHC was truncated at nine main classes (see Figure 2). The MDS plot for the compiled data of all 72 participants is depicted in Figure 3. This led to the main suggested flavor categories and hierarchy for the flavor wheel. However, what a dendrogram does not do is name the horizontal linkages (or larger groups), and thus, certain “umbrella” terms for the 9 main classes were needed.
Figure 2. Dendrogram representing the results of the AHC analysis on the attribute sorting results.
Figure 3. The 2-dimensional 5D-MDS plot.
Designing the wheel
The development of the wheel also necessitated a qualitative approach. The fact is any statistical result must be interpreted. That is how we went from a dendrogram and an MDS plot to a wheel. As you can see, there is no completely accurate way to accomplish that. Interpretation is a human exercise. Luckily, we had a lot of smart humans at our disposal, including those at UC Davis, WCR, SCAA, and sensory scientists at KSU. After our results were finished, we spent many hours examining the possible iterations of a wheel. We waxed poetic on the merits of a four-versus-three-tiered wheel. We considered the dendrogram and the inherent challenges of grouping relationships into pairs. We had to work together to agree with WCR and KSU on the overall (inner ring) flavor categories. And that is just what we did.
To create a template that would fit around the wheel, we used the dendrogram to build a chart for each of the main flavor categories (see Figures 4, 5, 6, and 7 below for examples). Then we went back to the MDS and compared all results, both grouped and individually, to evaluate where each of the 9 flavor classes sat in relation to one another, circularly. This helped us determine which segments of the wheel should be contiguous. As a result, the wheel was designed to show relationships not only among the 9 main flavor categories but also among the individual attributes, down to the order and placement of the third ring. Categories that are near each other on the MDS plot were perceived as similar (based on the 72 participants) and are therefore generally located near each other on the new wheel. For example, within fruity, citrus fruits were often closely associated with the sour attributes on the MDS plot, so those sections of the wheel are contiguous.
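The team arrived at the circular order by comparing the MDS plot and dendrogram and discussing them. Purely as a hypothetical illustration, not the procedure used, a crude first draft of such an order can be read off an MDS plot by averaging each class’s points and sorting the class centroids by angle around the plot’s centre. The category names and coordinates below are invented.

```python
import math


def circular_order(coords, classes):
    """Order classes by the angle of their centroid around the plot centre."""
    groups = {}
    for (x, y), c in zip(coords, classes):
        groups.setdefault(c, []).append((x, y))
    # Centroid of each class in the 2-D MDS plot.
    cent = {c: (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
            for c, pts in groups.items()}
    # Centre of the wheel: mean of the class centroids.
    cx = sum(x for x, _ in cent.values()) / len(cent)
    cy = sum(y for _, y in cent.values()) / len(cent)
    # Sort counter-clockwise; atan2 returns angles in (-pi, pi].
    return sorted(cent, key=lambda c: math.atan2(cent[c][1] - cy,
                                                 cent[c][0] - cx))


# Hypothetical MDS coordinates: two attributes per class.
coords = [(2, 0.5), (2, -0.5), (0.5, 2), (-0.5, 2),
          (-2, 0.5), (-2, -0.5), (0.5, -2), (-0.5, -2)]
classes = ["Fruity", "Fruity", "Floral", "Floral",
           "Sweet", "Sweet", "Nutty/Cocoa", "Nutty/Cocoa"]
print(circular_order(coords, classes))
# ['Nutty/Cocoa', 'Fruity', 'Floral', 'Sweet']
```

Since MDS axes have no fixed orientation, only the cyclic sequence is meaningful, not the starting class or the direction. A real plot with overlapping classes would still need the kind of human judgment described above.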
General words that encompass the broader categories at the top of the dendrogram (for example, “sweet” or “fruity”) were drawn from the lexicon based on the recommendations of the KSU scientists and panelists. In some instances, two lexicon terms were combined into a category heading, with a slash (/) separating them. Because the WCR Sensory Lexicon and this project were being completed simultaneously, a few terms were moved, renamed, or added to the lexicon, and therefore to the flavor wheel, to create the final organization. See Figure 8 (below) for the final iteration of the new Coffee Taster’s Flavor Wheel.
Figure 4. Example of interpretation of the dendrogram in the Nutty/Cocoa flavor category, with colored circles indicating the matching clusters with levels on the charted hierarchy.
Figure 5. Example of interpretation of the dendrogram in the Floral flavor category, with colored circles indicating the matching clusters with levels on the charted hierarchy.
Figure 6. Example of interpretation of the dendrogram in the Fruity flavor category, with colored circles indicating the matching clusters with levels on the charted hierarchy.
Figure 7. Example of interpretation of the dendrogram in the Sweet flavor category, with colored circles indicating the matching clusters with levels on the charted hierarchy.
The ultimate goal of this project was to sort the coffee flavor attributes in a way that simplifies the choice of words for describing coffee, whether in general or in more detailed terms. The categories and sub-categories developed using these sorting methods were used to create a new coffee flavor wheel. AHC analysis provided a hierarchy, and MDS provided a visual representation of how the main flavor categories should be arranged around the wheel. If you are a student of flavor wheels, you may notice many commonalities between the WCR Sensory Lexicon and new flavor wheel on the one hand and wheels describing other products on the other. The original wine wheel, mentioned previously, shares many main flavor categories with our new wheel, including spice, fruity, floral, vegetative, and nutty (Noble and others 1984). This is not because wine and coffee share a multitude of similarities that should be dwelt upon. Rather, it is because what we are actually recording and quantifying is the human sensory experience. Humans, as instruments, perceive many similarities across foods and beverages; we are just capturing that experience.
In summary, we have created a completely revised SCAA Coffee Taster’s Flavor Wheel: a wheel appropriate for coffee cuppers and the industry that is also useful for descriptive panelist training and product developers. Perhaps most importantly, it is a solid tool for communicating with customers and consumers. It represents a true collaboration among descriptive panelists, sensory scientists, industry, WCR, SCAA, and UC Davis, and it is the product of a creative and collaborative approach to problem solving on the frontiers of sensory science methods and analyses. We can all be proud to have contributed to this work, because every single person who has read to the end of this article has been a part of the process. We have created this tool for you. You have inspired it. It is yours to use, share, and grow with.
Figure 8. The final version of the new SCAA Coffee Taster’s Flavor Wheel ©SCAA and WCR 2016.
Coxon APM. 1999. Sorting Data. Thousand Oaks, CA: SAGE Publications, Inc.
Dehlholm C, Brockhoff PB, Meinert L, Aaslyng MD, Bredie WLP. 2012. Rapid descriptive sensory methods – Comparison of Free Multiple Sorting, Partial Napping, Napping, Flash Profiling and conventional profiling. Food Quality and Preference 26(2):267-77.
Gawel R, Oberholster A, Francis IL. 2000. A ‘Mouth-feel Wheel’: terminology for communicating the mouth-feel characteristics of red wine. Australian Journal of Grape and Wine Research 6(3):203-7.
Koch IS, Muller M, Joubert E, van der Rijst M, Næs T. 2012. Sensory characterization of rooibos tea and the development of a rooibos sensory wheel and lexicon. Food Research International 46(1):217-28.
Lawless HT, Sheng N, Knoops SSCP. 1995. Multidimensional scaling of sorting data applied to cheese perception. Food Quality and Preference 6(2):91-8.
Lawless LJR, Civille GV. 2013. Developing Lexicons: A Review. Journal of Sensory Studies 28(4):270-81.
Lawless LJR, Hottenstein A, Ellingsworth J. 2012. The McCormick Spice Wheel: A Systematic and Visual Approach to Sensory Lexicon Development. Journal of Sensory Studies 27(1):37-47.
Meilgaard MC, Civille GV, Carr TB. 2007. Sensory Evaluation Techniques, 4th ed. Boca Raton, FL: Taylor & Francis Group.
Meilgaard MC, Dalgliesh CE, Clapperton JF. 1979. Beer Flavor Terminology. Journal of the Institute of Brewing 85(1):38-42.
Noble AC, Arnold RA, Buechsenstein J, Leach EJ, Schmidt JO, Stern PM. 1987. Modification of a Standardized System of Wine Aroma Terminology. American Journal of Enology and Viticulture 38(2):143-6.
Noble AC, Arnold RA, Masuda BM, Pecore SD, Schmidt JO, Stern PM. 1984. Progress Towards a Standardized System of Wine Aroma Terminology. American Journal of Enology and Viticulture 35(2):107-9.
Steinberg DD. 1967. The Word Sort: An instrument for semantic analysis. Psychonomic Science 8(12):541-2.
Stone H, Bleibaum RN, Thomas HA. 2012. Sensory Evaluation Practices, 4th ed. San Diego, CA: Elsevier Academic Press.
Suffet IH, Khiari D, Bruchet A. 1999. The drinking water taste and odor wheel for the millennium: Beyond geosmin and 2-methylisoborneol. Water Science and Technology 40(6):1-13.