Stephen Doty
27 November 2019
We just look for correlations.1
–Jeff Gundlach
I hear it’s tight, most every night,
but now I might be mistaken.2
–ZZ Top
Nullius in verba.3
–The Royal Society
Yes, based on this test, it appears to be a valid indicator. The copper/gold price ratio (“C/G”) was found to correlate positively and strongly with changes in the yield of the ten-year Treasury note (“Note”), with a correlation coefficient r = 0.977 (on a scale from –1 to +1).
Soon after Jeff Gundlach publicized the correlation, testing began on April 19, 2019 for six months, using data from each Friday’s close. The purpose was not to see if it was data-mined,4 but if such publicity would diminish the reliability of the correlation going forward. Notice the linear nature of the scatterplot,5 the trendline, and the formula:
1 Quotation from the YouTube video “Jeff Gundlach Highlights” at time 10:07; and at 39:20, Jeff says, “the ratio of the price of copper to the price of gold: directionally, it’s highly predictive of at least the direction of the ten year Treasury.” And J. Mayberry wrote that the copper/gold ratio “can serve as a leading indicator of the direction of the yield on the 10-year U.S. Treasury note… The ratio’s absolute level is irrelevant.” https://doublelinefunds.com/wp-content/uploads/ThePowerofCopper-Gold_Mayberry2019-Fund.pdf.
2 From the song “La Grange.” The same sentiment applies to “tight” correlations, it seems. They change over time and have margins of error.
3 Latin for “On the word of no one.” Motto of The Royal Society in England, founded in 1660, a society dedicated to learning science firsthand by experiment and re-testing, not from dogma. It was an outgrowth of the British empiricist tradition that Francis Bacon pioneered in the early 1600’s. It’s a good motto for investors.
4 A computer could search or mine a database for any two variables that happened to vary in synch by coincidence, with no causal connection between them and no predictive validity going forward. Such a correlation would be unreliable as a guide for future investing.
5 While looking at any scatterplot, remember that you can’t tell the date of any dot. The most recent one could be on the far right or left. Think of time as an invisible third axis coming at you. Every Friday, it signals time to measure coordinates for another dot in two dimensions, x, y. Ask yourself if a move to the right or left along the x axis has resulted in y getting higher, lower, or staying the same. The formula for calculating the correlation r is often forgotten today, given the function “CORREL” on Google Sheets. It’s r = Σ(Zx·Zy)/(n – 1). https://store.fmi.uni-sofia.bg/fmi/statist/education/IntroBook/sbk17m.htm It’s essentially the average of the products of the z ratios for the x and y data pairs (dividing by n – 1 rather than n). So the key is not the magnitudes of x and y, but how they each vary in synch about their own means over time.
For easier charting, I used ten pounds of copper and one ounce of gold, but it had no effect on the value of r.
When the x,y values are converted to z ratios (z ratio = (raw score – mean) / standard deviation), the correlation r becomes equal to the slope of the trendline.6
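The two claims above (rescaling copper leaves r unchanged, and r equals the trendline slope once both variables are in z ratios) can be checked with a minimal sketch. The numbers below are made up for illustration and are not the data from this test; the function names are likewise only illustrative.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw scores to z ratios: (raw score - mean) / standard deviation."""
    m, s = mean(values), stdev(values)  # stdev uses the n - 1 (sample) form
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """r = sum of the products of paired z ratios, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


def slope(xs, ys):
    """Least-squares slope of the trendline of y on x."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical weekly readings, for illustration only.
ratio = [1.70, 1.74, 1.78, 1.81, 1.85, 1.90]
note_yield = [1.62, 1.71, 1.75, 1.84, 1.88, 1.99]

r = correlation(ratio, note_yield)
# Pricing ten times as much copper multiplies every x by 10 but leaves r alone.
r_rescaled = correlation([x * 10 for x in ratio], note_yield)
# With both variables in z ratios, the trendline slope equals r.
slope_standardized = slope(z_scores(ratio), z_scores(note_yield))

print(round(r, 4), round(r_rescaled, 4), round(slope_standardized, 4))
```

All three printed values should agree, apart from floating-point noise.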
Types of correlation
Correlation is dead, some economists suggest, because any two variables may vary similarly by coincidence.7 The price of the Dow stocks once correlated perfectly with the price of butter in Bangladesh, they say. All right, that is one class of cases, but there are others.
Correlation is essentially a tool for learning by induction. Correlation is a necessary condition for inferring a causal relation, but not a sufficient one. In scholarship, it traces back to the early 1600’s, to Francis Bacon’s Table of Comparison. He called it the “rule of differing degrees.”8 Over two centuries later, John Stuart Mill presented four methods for discovering causes via induction. The one derived from Bacon’s differing degrees is taught today in logic textbooks as the Method of Concomitant Variation.9 It’s essentially a test of correlation. It has proven its worth in science. Smoking more cigarettes was initially associated with a greater likelihood of lung cancer, as drinking more alcohol was associated with a worsening liver. After those associations were noted, they were found to correlate mathematically, and then causal processes were later discovered for each.
6 John Phillips, How to Think about Statistics 6th ed. (New York: W. H. Freeman, 2001) 78: “one meaningful interpretation of R is as the slope of the regression of Y on X when both are laid out in standard units.” Other synonyms for the trendline on a two-variable scatterplot: the regression line, line of least squares, and line of best fit. Id. 81. I prefer trendline, because it’s the shortest and least pretentious.
7 Deirdre McCloskey says, “For a long time in Britain the number of ham radio operator licenses granted annually was very highly correlated with the number of people certified insane.” https://www.deirdremccloskey.com/articles/stats/why.php
8 “Francis Bacon” Encyclopedia of Philosophy vol. 1 ed. Paul Edwards (New York: Macmillan, 1967) 239
9 Patrick Hurley, Concise Introduction to Logic 8th ed. (Belmont, CA: Wadsworth/Thomson Learning, 2003) 495.
The strong, positive correlation between C/G and the Note yield seems to be neither a case of direct cause & effect, nor merely coincidental effects, but in a third class of correlation.10 Two variables may correlate as joint effects of many hidden lurking variables.11 Students who do well on the math SAT tend to do well on the verbal section. So, if you want to get good at English, read a math book? No, but proficiency in both subjects seems to co-occur from a hidden set of lurking variables: high intelligence, good study habits, good teachers, good parenting, sound nutrition, etc.
Similarly, in the case of how copper, gold, and Treasuries react to broad investor sentiment, we note that (1) copper is a leading industrial metal, and its price tends to rise when the economy is strong and investors are bullish or risk-on; (2) gold is used in industry, and for jewelry, but is held mainly as a store of value in coins or bars by banks and investors when the mood is risk-off; and (3) Treasuries are bought as a reliable source of income when the outlook for stocks is bad or uncertain.
What advice did the seasoned investment-advisor A. Gary Shilling give in early 2008? He said (1) short copper and (2) buy Treasury bonds.12 He might as well have added buy gold, which was up 4% in 2008,13 while the S&P 500 tumbled 37%, the price of copper dropped, and yields on Treasuries fell (as their prices were bid up). So, the 2008 market affirmed how the copper/gold ratio correlates with the Treasury note yield.
R Squared
Whenever you have a correlation r, it’s wise to square it, because 1 – r² tells you what share of the variation in y is independent of the variation in x.14 Therefore, be cautious with correlations that are .7 or less, because .7 squared = .49. So 1 – .49 = .51, which indicates the majority of the variation in y is independent of x.
But, in our case, r² = 0.955. So less than 5% of the change in the Note yield appears to be independent of change in C/G. (More frequent readings over a longer period of time are needed to determine this more accurately, however. And these results justify further testing.)
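The arithmetic in this section is short enough to verify directly. The sketch below only restates 1 – r² for the two values of r discussed above; the function name is illustrative.

```python
def unexplained_share(r):
    """Share of the variation in y that is independent of x: 1 - r squared."""
    return 1 - r ** 2


for r in (0.7, 0.977):
    print(f"r = {r}: r squared = {r ** 2:.3f}, independent share = {unexplained_share(r):.3f}")
```

For r = .7 the independent share is just over half; for r = .977 it is under 5%.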
10 A fourth class of correlation occurs when two variables vary as co-effects of a single, third variable, as when a car’s muffler and oil pan heat up gradually, not due to thermal conduction between them, but due to the engine heating both.
11 David Moore et al. Basic Practice of Statistics 8th ed. (New York: W.H. Freeman, 2018) 147.
12 http://www.agaryshilling.com/insight
13 https://onlygold.com/gold-prices/historical-gold-prices/
14 Moore, Basic Practice 136-7.
Significance of the statistics
Nowadays, computers can process columns of numbers x,y and determine the likelihood of obtaining a correlation by chance. If the probability of obtaining your value of r by chance is less than one in 20 (less than 5%), which is the customary threshold for statistical significance, you can report your statistic as significant at the p = .05 level, meaning something probably underlies it beyond chance. According to StatCrunch.com, our statistic is significant at the more stringent p = .0001 level, since the probability of getting an r of .977 by chance with 26 pairs of numbers is less than one in 10,000.15 So the test results are significant. But just how valid and reliable they are still takes some showing.
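For readers without StatCrunch, the t formula given in footnote 15 can be run by hand. This is a minimal sketch using only r = .977 and n = 26 from this test; the critical value 2.797 is the p = .01 table value cited in that footnote, and the function name is illustrative.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t = t_statistic(0.977, 26)
critical_t_01 = 2.797  # table value for 24 degrees of freedom at p = .01

print(f"t = {t:.2f}, exceeds the p = .01 critical value: {t > critical_t_01}")
```

The result lands a little above 22, far beyond the critical value.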
A control correlation on two seemingly-different things was also performed to illustrate how easily two assets may move in tandem. A cereal grain and a tech stock were selected with a correlation of zero expected. But after the 26 daily closing prices of 100 bushels of oats and one share of Apple stock were analyzed, Sept to Oct 2019, a correlation r = .84 was found. As two publicly-traded assets, they may have appreciated similarly from inflation, broad economic factors, or coincidence.
How may the trendline formula assist the bond investor?
When a tweedy, Boston blue-nose buys Treasuries, he becomes a reluctant market timer, since the purchase has to occur at some point in time. But, before deciding when, the trendline indicates what market sentiment has said a Treasury note should yield on average given that day’s current copper/gold ratio, which acts like a thermometer, in effect, taking the market’s temperature in terms of risk-on/risk-off.
Suppose the price of copper is $26.00 (for ten pounds) today, and one ounce of gold is $1,485.70. Then the C/G is .0175, which times 100 gives 1.75, the x value (abscissa) on our chart. The formula for our trendline gives us the ordinate value, y (expected mean value) = –1.335 + 1.743x.16 It says the average Note yield should be 1.714% today. You’d then compare 1.714% to today’s yield on the Treasury note, say, 1.80%, and you’d confirm you’re getting an above-average yield (a decent risk premium) given the recent history of market sentiment.17 Conversely, a yield below 1.714% pays you less and is less appealing.
15 On StatCrunch.com, one enters the x,y data on a spreadsheet and selects simple linear regression to get the readout. You can also check it on a table of critical values for your correlation r, given its degrees of freedom, n – 2. In our case n – 2 = 24, and the table indicates that any correlation above .5986, such as ours, has a probability of less than one in 1,000 of happening by chance when the correlation is in fact zero, viz. the slope of the trendline is zero. See Table E in Moore, Basic Practice 701. One could also use the formula for significance of r by finding its t value: t = r × √(n – 2) / √(1 – r²), which gives us a t above 22. See David Voelker et al. Statistics Quick Review 2nd ed. (Boston: Houghton, 2011) 108. And any value of t over 2.797 is significant at the p = .01 level according to the applicable table in a statistics book. Moore, Basic Practice 699.
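The worked example above can be reproduced in a few lines. This sketch uses the rounded coefficients quoted in the text (–1.335 and 1.743), so it returns roughly 1.715% rather than exactly 1.714%; the small gap presumably reflects rounding in the published coefficients. The function and constant names are illustrative.

```python
INTERCEPT = -1.335  # rounded y-intercept quoted in the text
SLOPE = 1.743       # rounded slope quoted in the text


def copper_gold_x(copper_ten_pounds, gold_ounce):
    """x value on the chart: (price of ten pounds of copper / price of one ounce of gold) * 100."""
    return copper_ten_pounds / gold_ounce * 100


def expected_yield(x):
    """Trendline estimate of the mean Note yield (in percent) for a given x."""
    return INTERCEPT + SLOPE * x


x = copper_gold_x(26.00, 1485.70)
y_hat = expected_yield(x)
market_yield = 1.80  # the example yield from the text

print(f"x = {x:.2f}, expected yield = {y_hat:.3f}%, above average: {market_yield > y_hat}")
```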
16 Our x,y data gave us the trendline formula y = -1.34 + 1.74x, an instantiation of y = a + bx, where a is our statistic, an estimate of the unknown population parameter A, the y-intercept; and b is our estimate of B, the slope of the trendline, which is also an unknown population parameter. Moore, Basic Practice 603.
17 To be sure, Gundlach made the softer claim that a change in C/G predicts the Note yield “directionally” in footnote #1. He did not say magnitudinally, but the trendline and its standard error and the prediction interval for a level of confidence, say, 95%, combine to offer data on the reasonable, expected magnitude range of the Note yield, given a C/G, which is of added utility to the bond investor, I submit.
Based on the standard error of the Note yield, StatCrunch tells us the prediction interval, with a 95% level of confidence, is 1.56% to 1.87%. And if you prefer a 99% level of confidence, your prediction interval widens to 1.50% to 1.92%, which is plus/minus 21 basis points from the mean 1.71%.18 Think of the trendline as the centerline of an invisible bell-curve at point x which contains a distribution of potential y values.19
To hold out for a bargain, you could delay your purchase until the Note yield pays at or beyond the high end of your prediction interval. But data must be kept up to date to make sure the daily formula is still current, because the trendline evolves over time.
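The hold-out rule can be written as a simple comparison against the interval. This is only a sketch: it hard-codes the 95% prediction interval reported above for x = 1.75, which would need to be recomputed as the data and the trendline change, and the function name and labels are illustrative.

```python
def classify_yield(market_yield, low, high):
    """Place a quoted Note yield relative to a prediction interval (all in percent)."""
    if market_yield >= high:
        return "at or beyond the high end"
    if market_yield < low:
        return "below the interval"
    return "inside the interval"


LOW_95, HIGH_95 = 1.56, 1.87  # 95% prediction interval from the text, valid only for x = 1.75

for quoted in (1.50, 1.80, 1.90):
    print(f"{quoted:.2f}%: {classify_yield(quoted, LOW_95, HIGH_95)}")
```

Under this rule, the 1.80% note from the earlier example sits inside the interval, so a buyer holding out for a bargain would keep waiting.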
What about the trader?
Could a bond trader program a computer to profit from the trendline? Probably not, since there is no proof of mispricing or inefficiency to exploit yet. Just as stock prices vary, so do prices for copper, gold, and bonds. Contrast that to the mispricing of S&P futures when they were first introduced in 1982. Legendary applied-mathematician Ed Thorp detected it, sold the futures, bought stocks as a hedge, and made $6 million in four months.20 But word spread, and more people with computers soon made the price of futures more efficient, so profits dried up.
In our case, despite widespread knowledge, the correlation remained very high, .977. This does not seem like a mispricing anomaly to exploit, but a longstanding tendency among risk-asset classes, e.g., the price of the Dow index often correlates negatively with the price of gold (the latest weekly data from 8/10/19 to 10/26/19 shows an r = –.66, for DIA and SGOL). These types of correlations seem more relevant to asset-allocation decisions, for minimizing portfolio risk and for optimal overall performance. How the trader can profit from them is an open question. In the case above, where the investor bought a note yielding 1.80%, if interest rates then regressed to the mean,21 the note would have appreciated in value. A trader could sell it for a small profit, in theory. But this seems like saying buy the Dow when its price is a historically low multiple of gold, then sell when its price regresses to the mean. A bromide of dubious utility in practice.
18 StatCrunch.com allows you to select your level of confidence before it prints out the applicable y values. The statistics textbook recommends using such a computer program, instead of a calculator and formula, because otherwise “roundoff error” accumulates to distort your results. Moore, Basic Practice 606.
19 We use “standard error” instead of standard deviation, sigma, for such an invisible bell curve, because sigma remains an unknown population parameter, which we try to estimate with our clumsy statistics, based on our samples, an inexact science, but it’s still the best tool in our drawer for estimating the unknowns, such as the slope of the trendline, its y-intercept, r, and the mean value of y, given x. Phillips, How to Think 96-9. Moore, Basic Practice 603.
20 William Poundstone, Fortune’s Formula (New York: Hill & Wang, 2005) 177.
21 Regression to the mean is a tendency of nature. We get the term from the study of heredity. It was observed that unusually tall parents tended to have children shorter than them, viz. height regressed to the mean. Moore, Basic Practice 129.
* * *
“If you wish to be good, first believe that you are bad,” says the master of ethics.22 And to combat hubris in investing: if you wish to predict something, first believe it is unpredictable. The trendline formula offers the appearance of precision for something imprecise, so it can be hazardous. Once this is realized, however, the formula assists us by telling us more than we’d know without it. And I’d say that’s enough to make it worth knowing.
© Stephen Doty 2019
22 Epictetus, Enchiridion trans. by George Long (Mineola, NY: Dover, 2004) 24.