The following is a guest post by David Morrison. Morrison grew up in Australia. He eventually earned a PhD in plant biology, during which he became interested in wine. He started visiting wineries throughout Australia, just at the time when high-quality wineries were beginning to boom. Somewhere along the line he moved to Sweden, and in his semi-retirement he runs a blog called The Wine Gourd (winegourd.blogspot.com). In the interests of doing something different from every other wine blogger, this blog delves into the world of wine data instead. He is particularly interested in the relationship of wine quality to price, and in quantifying value-for-money wines.

See David’s earlier guest post on this site: A Mathematical Analysis of The Judgment of Paris.

Among many other things, I am interested in how to combine quality judgments from different judges into a single score. This is a topic of some interest in the academic literature, with many different opinions on the matter.

It is also of practical importance to all activities in which human judgment plays a part. For example, it appears in all competitions where judges award scores for what we might call artistic merit; this includes gymnastics, ice dancing, and ski jumping, as but three examples. It also arises when assigning quality scores to products, such as vacuum cleaners, hammer drills, and wines.

I thought that the Judgment of Paris wine tasting from 1976 (see my earlier post) might be a suitable data set for studying the different methods of combining quality scores. So, I went in search of the original data.

I didn’t find it; and I am here to tell you why I never will.

The tasting was organized by Steven Spurrier and Patricia Gallagher as a contribution to the Bicentennial of the USA, by introducing the French to Californian wine. As quoted by William Rice (Those winning American wines. The Washington Post, June 13 1976), Spurrier noted that the event was not “a competitive tasting, but an opportunity to acknowledge that a young vineyard area can produce top-quality wines, given the same love, interest, skill, and money that has been lavished on European vineyards for centuries.”

George M. Taber, the only writer present at the tasting, described the production of the data in his 2005 book Judgment of Paris: California vs. France and the Historic 1976 Paris Tasting that Revolutionized Wine (Scribner). Spurrier added up the scores from the individual tasters’ cards by hand immediately after the tasting of the ten white wines, and then again after the tasting of the ten red wines. Spurrier kept the cards and his copy of the totals; and Taber also acquired a copy of the data at the tasting. This implies that at that time there were two independent copies of the original data.

Further copies of the results were sent to various members of the media, so that the data were widely available. For example, Robert Lawrence Balzer wrote about the tasting later in 1976 (Paris tasting results confirmed in California. Balzer’s Private Guide to Food and Wine, Vol. 6 No. 8 pp. 77-84), and referred to “the published and widely distributed individual scorings”. He also noted that he had met Spurrier in Paris, who allowed him to “study the mimeographed four pages of scored results”. Balzer cited some of these scores in his article, but otherwise produced only the average scores for each wine (rounded down to one decimal place). These averages were based on the scores from only the nine French tasters (the other two being Gallagher and Spurrier).

Reporting only the totals was also the approach taken by most of the other media at the time, usually based on only the nine French people. As Taber has noted in his book: “The results sent to the two winning wineries after the event clearly gave the ‘Official Jury Results’ separately from the ‘Results Including Mr Spurrier and Miss Gallagher’. The results Spurrier announced on the day of tasting and the ones he used later in talking to the press were only those of the ‘Official Jury’.”

The first media report was, of course, by Taber himself, a week after the event (Judgment of Paris. Time, June 7 1976, p. 58). No scores were noted in the article, but the four top-ranked wines were named. Incidentally, this brief article apparently coined the name by which the tasting has been known ever since, which is a pun derived from a story in ancient Greek mythology. Unfortunately, this name takes a simple wine tasting and presents it as a panel of judges pronouncing a formal and important decision. In that sense, the name is possibly regrettable.

The next important report was from Frank J. Prial, an influential writer for the large New York wine market, who first discussed the white wines (California labels outdo French in blind test. The New York Times, June 9 1976, p. 27) and then separately the red wines (California reds score high in tasting, but some caveats must be weighed. The New York Times, June 16 1976, p. 39). He listed the totals in each article, based on the nine French tasters for nine of the whites, but all eleven people for the tenth white and all ten of the reds.

Other early reports included that of William Rice (noted above), who actually dismissed the whole thing as “the latest in the continuing, if rather pointless, taste-offs pitting American versus French wines”. He did, however, present the totals based on all eleven people. Also of interest, Mary Blume (Three cheers for red, white and cru. Los Angeles Times, June 13 1976, p. 42) clearly fabricated much of her article, but accurately reproduced a few of the totals based on nine people.

Of most interest for our purposes here, however, is the venerable Connoisseurs’ Guide To California Wine, founded in 1974 by Earl Singer and Charles Olken, and still extant today (unlike most of its contemporaries). In the July 1976 issue (pp. 54-55), it reproduced the entire Judgment of Paris dataset, comprising each score from each taster for each wine. The article is perhaps overly enthusiastic (and spells Spurrier’s first name incorrectly), but the information contained seems otherwise to be accurate. As a historical document this is priceless, because it is the only copy I have been able to find of the results as released at the time. As the Guide noted:

“In this country, a great deal has been written about the tasting. Unfortunately, many of the significant details have fallen by the wayside in our collective rush to self-congratulation.”

If anyone knows of any other contemporary source of the detailed scores, please let me know.

This source of the data is the one that was first used in the academic literature. Dennis V. Lindley wrote a manuscript in 1993, mathematically analyzing the full data set, and this was eventually published in 2006 (Analysis of a wine tasting. Journal of Wine Economics 1: 33-41). Unfortunately, he cited the source of his data as “the Underground Wine Journal, July 1976”, but this newsletter was not founded until 1979 (as The Underground Wineletter). According to Orley C. Ashenfelter, in an email to me, Lindley actually obtained his data from Ashenfelter himself. In 1999, Ashenfelter and Richard Quandt published their own analysis of the red wine scores (Analyzing a wine tasting statistically. Chance 12(3): 16-20), and they cited the source of their data as the Connoisseurs’ Guide.

Three decades after the original tasting, Taber published his own copy of the data, in the 2005 hardcover edition of his book. He presented the scores for each of the nine French tasters only; and these scores are identical to those published in the Connoisseurs’ Guide. This seems to be an independent confirmation of the data from the Guide, as Taber’s information is based on his original personal notes. (In an email to me: “I was there that day and got all the material from the organizers of the event that day. I got the scorecard numbers from the organizers of the event, and that was the basis of my information in my book.”)

Interestingly, Taber’s data do not appear in the 2006 paperback edition of his book. In 2009, Neal D. Hulkower corresponded with Taber about this, and he noted (The Judgment of Paris according to Borda. Journal of Wine Research 20: 171-182) that:

“There is some uncertainty in the individual points … that were obtained by Taber and published in the hardback version of his book. They do not add up to the total points reported and used to determine the winner. As a result, Taber did not include them in the paper-back version.”

And therein lies the problem, for me as well as for history.

For three of the white wines, the sums of the scores as published in the Connoisseurs’ Guide in 1976, and quite independently by Taber in 2005, do not add up to the totals published by all of the other media in 1976!

This matters, both in theory and in practice. Two of the three wines involved in the discrepancies are the Chateau Montelena (1973), for which we were originally given a total of 132 points (from the nine French tasters), and the Meursault Charmes Roulot (1973), for which we were given a total of 126.5 points. However, when we add up the individual scores presented by the Connoisseurs’ Guide and Taber, both wines get 130.5 points (from the nine French tasters). Such a tie hardly marks the Montelena wine as the clear-cut “winner” among the whites, as history has written it.
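To make the size of the problem concrete, the two pairs of figures quoted above can be checked with a few lines of Python. This is only a sketch: the numbers are exactly those given in this post (nine French tasters only), the third discrepant white wine is left out because it is not named here, and the names `compare` and `margin` are illustrative.

```python
# Figures as quoted in this post: the 1976 published totals and the sums of the
# individual scores printed in the Connoisseurs' Guide (and by Taber), both
# for the nine French tasters only. The third discrepant white is not named
# in the text, so it is omitted.
MONTELENA = "Chateau Montelena 1973"
ROULOT = "Meursault Charmes Roulot 1973"
N_TASTERS = 9

published_totals = {MONTELENA: 132.0, ROULOT: 126.5}
summed_scores = {MONTELENA: 130.5, ROULOT: 130.5}


def compare(published, summed, n_tasters):
    """For each wine: (wine, summed - published, published mean, summed mean)."""
    return [
        (wine, summed[wine] - total, total / n_tasters, summed[wine] / n_tasters)
        for wine, total in published.items()
    ]


def margin(totals, first, second):
    """Points by which `first` leads `second` under a given set of totals."""
    return totals[first] - totals[second]


for wine, diff, pub_mean, sum_mean in compare(published_totals, summed_scores, N_TASTERS):
    print(f"{wine}: difference {diff:+.1f}, mean {pub_mean:.2f} (published) "
          f"vs {sum_mean:.2f} (summed)")

print("Montelena lead, published totals:", margin(published_totals, MONTELENA, ROULOT))
print("Montelena lead, summed scores:", margin(summed_scores, MONTELENA, ROULOT))
```

Run as written, it reports differences of -1.5 and +4.0 points, and the Montelena's 5.5-point lead under the published totals shrinks to 0.0 under the summed scores.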

Obviously, we need to revisit the original score cards, but this we apparently cannot do. Taber, in his email to me, has noted: “The actual scores cards were lost when Spurrier sold his business several years later, and the new owners seemingly threw them out. In any case, the original score cards have now been lost.” Neal Hulkower received the same information from Taber in 2009; but I have not been able to confirm this with Spurrier himself.

So, there we have it. The only scores we have do not add up to the only totals we have, for three of the white wines.

Either some of the individual published scores are wrong, or the totals originally calculated by Spurrier are wrong. I favor the first explanation, for at least one reason, which I will now explain.

The discrepancy in the scores was first noted by Dennis Lindley, in his 1993 manuscript; he could hardly have performed a proper mathematical analysis without noticing it. He also noted another odd thing about the data: the white-wine scores for two of the French tasters (Raymond Oliver and Jean-Claude Vrinat) are almost identical (their scores differ by half a point for just one wine). This similarity does not occur between any of the other tasters, nor does it occur for these two people for the red wines.

Lindley suggested that, in this case, possibly “the tasters compared notes”. Taber, in his book, certainly refers to a lot of talking during the tasting of the white wines; and Balzer, in his newsletter, quoted Spurrier as saying “I asked them not to talk, but she [Odette Kahn] and others were constantly comparing notes.” However, this is not the same thing as two people writing down the same scores for each wine. Furthermore, as far as I can tell from the few online photographs of the event, Oliver and Vrinat were not seated next to each other.

So, I have a different suggestion — there was a transcription error when the scores were being compiled. That is, when they were copying the numbers, someone swapped from reading one row or column to another, part way through. We have all done this when reading, for example, and we end up reading the same line of text twice. So, I am suggesting that part (or most) of the white-wine data from one of these two people is simply an inadvertent duplicate of the other one. For one of these two people we do not have their original scores — they have been consigned to the proverbial scrap heap of history.
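The proposed mechanism can be sketched in a few lines of Python, to show how a single copying slip would produce both symptoms at once: two nearly identical rows of scores, and some per-wine totals that no longer match the printed scores. Everything here is hypothetical. The 3 × 5 score table is invented and is not the 1976 data, and the function names are illustrative; the sketch simply assumes the totals were added up from the original cards while the printed table came from the slipped copy.

```python
# A toy illustration of the proposed transcription slip. The table below is
# INVENTED (3 hypothetical tasters x 5 hypothetical wines); it is not the
# 1976 data. Rows are tasters, columns are wines.
original_cards = [
    [15, 12, 10, 14, 9],   # taster A
    [14, 13, 10, 12, 9],   # taster B
    [16, 11, 12, 13, 10],  # taster C
]


def copy_with_row_slip(table, row, wrong_row, slip_at):
    """Copy `table`, but from column `slip_at` onward fill `row` with the values
    of `wrong_row`, as if the copier's eye jumped to the wrong line."""
    copied = [list(r) for r in table]
    copied[row][slip_at:] = table[wrong_row][slip_at:]
    return copied


def wine_totals(table):
    """Column sums: the total score for each wine."""
    return [sum(col) for col in zip(*table)]


def count_differences(a, b):
    """Number of wines on which two tasters' scores differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Totals are added up from the original cards; the printed table is the slipped copy.
published_table = copy_with_row_slip(original_cards, row=1, wrong_row=0, slip_at=1)
true_totals = wine_totals(original_cards)
copied_totals = wine_totals(published_table)

# Wine index -> (sum of copied scores) - (total from the cards), where they disagree.
mismatches = {i: c - t for i, (t, c) in enumerate(zip(true_totals, copied_totals)) if c != t}

print("A vs B differences, cards:", count_differences(original_cards[0], original_cards[1]))
print("A vs B differences, copy:", count_differences(published_table[0], published_table[1]))
print("wines whose copied scores no longer match the totals:", mismatches)
```

In this toy case, tasters A and B differ on three wines on the cards but on only one wine in the copy, and two of the five wine totals no longer match the copied scores ({1: -1, 3: 2}). Wines on which A and B happened to agree originally show no discrepancy at all. This only shows that the explanation is arithmetically possible; it says nothing about which rows were actually involved in 1976.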

For me, this is the simplest explanation for why some of the individual scores do not add up to the published totals. And it is also why I can never have the original data, because the error lies at the source, in the original data compiled at the event. In my scenario, since both Taber and the Connoisseurs’ Guide independently have the same erroneous data, no-one can now possibly have the correct data, in the absence of the original score cards.

This doesn’t change history, of course, but it might change how we commemorate it.

For example, with regard to Stag’s Leap Wine Cellars, whose wine topped the red-wine rankings, the Wine Enthusiast magazine noted last year (The Judgment of Paris turns 40) that “At the new Fay Outlook & Visitor Center, special tastings in May [2016] will showcase both library wines and barrel samples. Memorabilia include a bottle of the 1973 Cab and copies of the judges’ score sheets.” No, I’m sorry, but that latter sort of memorabilia is sadly not possible, in this case.

For their help with various aspects of my search, I gratefully thank Orley C. Ashenfelter, Darrell Corti, Christine Graham, Bob Henry, Aaron Nix-Gomez, Charlie Olken, Steven Spurrier, George Taber, and Becca Yeamans Irwin.


