Replication and Reproduction III: Holmes & Otero (2019), unit root testing of differentials and cointegration analysis

Steve Cook
Swansea University
s.cook at swan.ac.uk
and Duncan Watson
University of East Anglia
Duncan.Watson at uea.ac.uk

Published November 2025

https://doi.org/10.53593/n4411a

Summary
1. Introduction
2. Reproducing Holmes & Otero (2019)
3. Conclusion
References
Footnotes

This case study is the third in a set of materials on the effective incorporation of research in undergraduate econometrics edited by Peter Dawson of the University of East Anglia.

Summary

Cook and Watson (2025a) have recently championed the benefits of using reproduction and replication (R&R) as a means of incorporating empirical research in the teaching of econometrics. By requiring close engagement with published findings, they argue that R&R offers a number of pedagogical benefits. This study provides an illustration of R&R in practice by drawing upon Holmes and Otero’s (2019, Energy Economics) analysis of crude oil spot and futures prices. Using the data from this study, unit root and cointegration analysis are applied and discussed in the course of reproducing published findings.

1. Introduction

Building on the approach set out in Cook and Watson (2025a), the recent case studies of Cook and Watson (2025b, 2025c) have demonstrated how replication and reproduction (R&R) can be used to embed research directly within the teaching of undergraduate econometrics. This study extends that work by shifting the focus from unit root analysis of individual series to the examination of relationships between series – specifically, through the unit root testing of ratios, or differentials, and subsequent cointegration analysis. As in the earlier case studies of Cook and Watson (2025b, 2025c), the analytical framework is anchored in the R&R of a published study, with Holmes and Otero (2019, hereafter HO) providing the empirical context here. Rather than simply presenting econometric techniques, the R&R approach employed challenges students to actively reproduce published findings, thereby fostering deeper understanding of both methodology and interpretation.

2. Reproducing Holmes & Otero (2019)

HO provide an analysis of crude oil spot and futures prices. Their study examines a range of series, considering different maturities for futures contracts as well as alternative data frequencies (daily, weekly, and monthly). For the non-daily observations, both average and end-of-period figures are used, thus further increasing the number of series considered. These series are available from the supplementary data appendix provided with the original paper.[1] To illustrate how this paper can serve as a resource for integrating research into teaching, the present study focuses on daily data for two of the series considered by HO, namely, spot prices and 4-month futures prices. Expressed in natural logarithmic form, these series are denoted here as SPOT and CL4 respectively.

The augmented Dickey-Fuller (1979, hereafter referred to as ADF) test results reported for these series in Table 3 of HO (p. 231) are reproduced in Tables One and Two below. As Cook and Watson (2025a) note, reproducing empirical findings encourages closer engagement with the original research. Here, for example, the structure of the testing equation – as presented in equation (1) below – requires careful consideration, including discussion of its components and their roles. Similarly, the construction of the test statistic – as shown in equation (2) – can be examined in relation to the testing equation itself. The approach to augmentation adopted by HO, which uses the Schwarz Information Criterion (SIC) with a maximum lag of 30, is employed to reproduce their findings. However, this offers a basis for discussion of alternative approaches employing, for example, sequential t-testing or the Akaike Information Criterion (AIC), as well as use of the Schwert (1989) rule, to determine the maximum lag length.[2]

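$$\Delta y_t = \alpha + \beta t + \rho y_{t-1} + \sum_{i=1}^{p} \gamma_i \Delta y_{t-i} + \varepsilon_t \qquad (1)$$

$$\tau = \frac{\hat{\rho}}{\mathrm{se}(\hat{\rho})} \qquad (2)$$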
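To make the link between the testing equation and the test statistic concrete, the following is a minimal sketch of an ADF regression estimated by ordinary least squares. It assumes NumPy is available, the function name `adf_regression` is hypothetical, and the series is simulated rather than taken from the HO data. The sketch returns only the coefficient on the lagged level, its standard error and their ratio. It does not compute critical values or p-values, which require the Dickey-Fuller distribution rather than standard t tables.

```python
import numpy as np


def adf_regression(y, lags=0, trend=True):
    """Estimate the ADF testing equation by OLS.

    Returns the coefficient on the lagged level, its standard error
    and their ratio (the ADF test statistic).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]
    n = len(dy) - lags                    # observations left after lagging

    columns = [np.ones(n)]                # constant
    if trend:
        columns.append(np.arange(1, n + 1, dtype=float))   # linear trend
    columns.append(y[lags:-1])            # lagged level y_{t-1}
    for i in range(1, lags + 1):          # augmentation terms
        columns.append(dy[lags - i:len(dy) - i])
    X = np.column_stack(columns)
    target = dy[lags:]

    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS covariance matrix

    pos = 2 if trend else 1               # column holding the lagged level
    se = np.sqrt(cov[pos, pos])
    return coef[pos], se, coef[pos] / se


# Simulated illustration only: a stationary AR(1) series, not the HO data.
rng = np.random.default_rng(0)
shocks = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]

rho_hat, se_rho, adf_stat = adf_regression(ar1, lags=1, trend=True)
print(f"rho_hat = {rho_hat:.3f}, se = {se_rho:.3f}, ADF statistic = {adf_stat:.2f}")
```

Changing `lags` or setting `trend=False` shows directly how the augmentation and the deterministic terms alter the regressors, and therefore the statistic that is compared with the critical values.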

Table One: Replicating SPOT unit root test results

Null Hypothesis: SPOT has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 0 (Automatic - based on SIC, maxlag=30)

		t-Statistic	Prob.*
Augmented Dickey-Fuller test statistic		−2.654313	0.2560
Test critical values:	1% level	−3.959311
	5% level	−3.410428
	10% level	−3.126974

Table Two: Replicating CL4 unit root test results

Null Hypothesis: CL4 has a unit root
Exogenous: Constant, Linear Trend
Lag Length: 1 (Automatic - based on SIC, maxlag=30)

		t-Statistic	Prob.*
Augmented Dickey-Fuller test statistic		−1.923580	0.6419
Test critical values:	1% level	−3.959311
	5% level	−3.410428
	10% level	−3.126974
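The lag lengths shown in Tables One and Two depend on both the maximum lag considered and the criterion used to choose within it. The sketch below illustrates the two ingredients: the Schwert (1989) rule for the maximum lag and a comparison of SIC and AIC. The function names, the sample size and the residual sums of squares are hypothetical values chosen for illustration and are not drawn from HO. The criterion is written in one common regression-based form; software packages may scale it differently without changing the ranking of models.

```python
import math


def schwert_max_lag(n_obs):
    """Schwert (1989) rule: integer part of 12 * (T / 100) ** (1 / 4)."""
    return int(12 * (n_obs / 100) ** 0.25)


def info_criterion(ssr, n_obs, n_params, kind="SIC"):
    """Regression-based criterion: log(SSR / T) plus a penalty per parameter."""
    penalty = math.log(n_obs) if kind == "SIC" else 2.0   # otherwise AIC
    return math.log(ssr / n_obs) + penalty * n_params / n_obs


def select_lag(ssr_by_lag, n_obs, n_other, kind="SIC"):
    """Return the lag order with the smallest criterion value."""
    return min(
        ssr_by_lag,
        key=lambda p: info_criterion(ssr_by_lag[p], n_obs, p + n_other, kind),
    )


# Made-up residual sums of squares for lags 0, 1 and 2 (illustration only).
ssr_by_lag = {0: 10.00, 1: 9.96, 2: 9.95}
n_obs = 1000      # hypothetical sample size
n_other = 3       # constant, trend and lagged level

print("Schwert maximum lag:", schwert_max_lag(n_obs))
print("Lag chosen by SIC:", select_lag(ssr_by_lag, n_obs, n_other, "SIC"))
print("Lag chosen by AIC:", select_lag(ssr_by_lag, n_obs, n_other, "AIC"))
```

With these made-up figures SIC selects no augmentation while AIC selects one lag, because the SIC penalty of log(T) per parameter exceeds the AIC penalty of 2 for any sample of more than seven observations.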

The results in Tables One and Two lead to non-rejection of the unit root hypothesis at conventional significance levels for both SPOT and CL4. HO then proceed to examine the long-run relationship between spot and futures prices. This is explored through unit root testing of spot-to-futures differentials (or ratios) and the application of the Johansen (1988) procedure to assess potential cointegration. The differential between the SPOT and CL4 series is constructed and referred to here as ‘DIFF’. The ADF test results for this series are reported in Table Three and reproduce the findings of HO (p. 232, Table 4) for this SPOT:CL4 differential.

Table Three: Replicating results for the differential

Null Hypothesis: DIFF has a unit root
Exogenous: Constant
Lag Length: 19 (Automatic - based on SIC, maxlag=30)

		t-Statistic	Prob.*
Augmented Dickey-Fuller test statistic		−5.159380	0.0000
Test critical values:	1% level	−3.431083
	5% level	−2.861749
	10% level	−2.566923

Again, this analysis can be used to challenge and deepen understanding of the application and interpretation of the ADF test. For example, reproducing the test statistic requires careful attention to the approach to augmentation and the inclusion of deterministic terms. Reviewing the full set of results from the ADF test also enables reproduction of the value of the estimate of the coefficient on the lagged differential regressor reported in the original study. These findings can be drawn upon for classroom discussion alongside other issues such as the use of alternative augmentation methods and their effects upon inferences, and the implications of detecting stationarity in a ratio constructed from non-stationary series.
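The final point, that a differential can be stationary even though each underlying series is not, can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than a description of the HO data: two artificial series share a single random-walk component, so each has a unit root, while their difference contains only stationary noise. The variable names and the seed are arbitrary.

```python
import random
import statistics

random.seed(2019)
n = 5000

common = 0.0                 # shared random-walk (unit root) component
spot, futures = [], []
for _ in range(n):
    common += random.gauss(0, 1)
    spot.append(common + random.gauss(0, 1))      # series-specific noise
    futures.append(common + random.gauss(0, 1))

# With both series in logs, the differential is also the log of the ratio.
diff = [s - f for s, f in zip(spot, futures)]


def half_sample_means(series):
    """Means of the first and second halves of a series."""
    mid = len(series) // 2
    return statistics.mean(series[:mid]), statistics.mean(series[mid:])


for name, series in [("spot", spot), ("futures", futures), ("diff", diff)]:
    first, second = half_sample_means(series)
    print(f"{name:8s} variance = {statistics.pvariance(series):10.2f}  "
          f"half-sample means = {first:8.2f}, {second:8.2f}")
```

The two level series wander, so their sample variances are large and their half-sample means can differ markedly, whereas the differential fluctuates around a fixed mean with a variance close to that of its noise components.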

Turning to the consideration of cointegration, the findings of HO for our chosen series are again reproduced – see Tables Four and Five below and Tables 5 and 6 in HO (pp. 233-234). This replication requires learners to engage closely with the original research, supporting the development of a deeper understanding of the Johansen (1988) procedure. A range of issues can be explored through this application, including: the nature and role of the underlying vector autoregressive (VAR) model; the structure and purpose of the vector error correction model (VECM); the impact of alternative information criteria; the connection between unit root testing of the differential and the Johansen analysis of the underlying series; the use of the Johansen approach in bivariate settings; comparisons with the Engle-Granger (1987) procedure; and the interpretation of non-zero eigenvalues in identifying cointegrating relationships.
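As an aid to discussing several of these issues, the connection between the differential and the Johansen framework can be written out for the bivariate case. The notation below is standard textbook notation and is an assumption here, since it is not taken from HO; in particular, the treatment of the deterministic term is illustrative only.

```latex
% Bivariate VECM for y_t = (SPOT_t, CL4_t)'
\Delta y_t = \mu + \Pi y_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t,
\qquad \Pi = \alpha \beta'

% With one cointegrating vector, rank(\Pi) = 1. If \beta = (1, -1)', then
\beta' y_{t-1} = SPOT_{t-1} - CL4_{t-1} = DIFF_{t-1}
```

Seen this way, the unit root test on DIFF imposes the cointegrating vector (1, −1) in advance, whereas the Johansen procedure estimates the vector and tests the rank of Π.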

Table Four: Reproducing Trace statistic results

Unrestricted Cointegration Rank Test (Trace)
Hypothesized No. of CE(s)	Eigenvalue	Trace Statistic	0.05 Critical Value	Prob.**
None *	0.009528	70.54988	15.49471	0.0000
At most 1	0.000232	1.669685	3.841465	0.1963

Trace test indicates 1 cointegrating equation(s) at the 0.05 level
* denotes rejection of the hypothesis at the 0.05 level
** MacKinnon-Haug-Michelis (1999) p-values

Table Five: Reproducing Maximum eigenvalue test results

Unrestricted Cointegration Rank Test (Max-eigenvalue)
Hypothesized No. of CE(s)	Eigenvalue	Max-Eigen Statistic	0.05 Critical Value	Prob.**
None *	0.009528	68.88020	14.26460	0.0000
At most 1	0.000232	1.669685	3.841465	0.1963

Max-eigenvalue test indicates 1 cointegrating equation(s) at the 0.05 level
* denotes rejection of the hypothesis at the 0.05 level
** MacKinnon-Haug-Michelis (1999) p-values
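A useful reproduction exercise is to check that the figures in Tables Four and Five are internally consistent. The sketch below uses the standard definitions of the two statistics, under which the maximum eigenvalue statistic for rank r is −T ln(1 − λ) evaluated at the (r+1)th eigenvalue and the trace statistic sums these terms over the remaining eigenvalues. The function name is hypothetical, and `implied_n` is only an approximation backed out from the rounded eigenvalue, not a sample size reported in the source.

```python
import math

# Figures as reported in Tables Four and Five.
eigenvalues = [0.009528, 0.000232]
max_eigen_reported = [68.88020, 1.669685]
trace_reported = [70.54988, 1.669685]


def johansen_statistics(eigs, n_obs):
    """Trace and maximum eigenvalue statistics from ordered eigenvalues."""
    max_eigen = [-n_obs * math.log(1.0 - lam) for lam in eigs]
    trace = [sum(max_eigen[r:]) for r in range(len(eigs))]
    return trace, max_eigen


# Check 1: each trace statistic is the sum of the max-eigenvalue
# statistics for that rank and all higher ranks.
trace_from_max = [sum(max_eigen_reported[r:]) for r in range(2)]

# Check 2: back out the sample size implied by the first (rounded)
# eigenvalue and rebuild both statistics from the eigenvalues alone.
implied_n = max_eigen_reported[0] / -math.log(1.0 - eigenvalues[0])
trace_rebuilt, max_rebuilt = johansen_statistics(eigenvalues, implied_n)

for r in range(2):
    print(f"r = {r}: trace reported {trace_reported[r]:.5f}, "
          f"from max-eigen {trace_from_max[r]:.5f}, "
          f"rebuilt {trace_rebuilt[r]:.5f}")
```

Small discrepancies in the rebuilt values are to be expected because the eigenvalues are reported to only six decimal places.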

3. Conclusion

This study demonstrates how reproduction and replication can do more than reinforce econometric technique – they can transform the way students engage with research. By working through the empirical structure of HO, learners are exposed to the practical application of unit root testing, the construction and interpretation of differentials, and the logic of cointegration through the use of the Johansen procedure. Each stage prompts methodological reflection: from lag selection and deterministic components to the meaning of stationarity and long-run equilibrium.

Crucially, the study equips students to move beyond formulaic application and to develop a clearer sense of how econometric decisions shape economic conclusions. For lecturers, it offers a ready-made resource that connects abstract concepts with real-world data and published research, encouraging replication not just as a technical exercise but as a tool for critical learning.

References

Cook, S. and Watson, D. 2025a. From provision to understanding: The effective incorporation of research in undergraduate econometrics. Economics Network Handbook for Economics Lecturers. https://doi.org/10.53593/m4412a

Cook, S. and Watson, D. 2025b. Replication and Reproduction I: Leybourne (1995, Oxford Bulletin of Economics and Statistics) and the maximum Dickey-Fuller test. Economics Network Ideas Bank. https://doi.org/10.53593/n4409a

Cook, S. and Watson, D. 2025c. Replication and Reproduction II: Leybourne et al. (1998, Journal of Econometrics) and the Dickey-Fuller test in the presence of breaks under the null. Economics Network Ideas Bank. https://doi.org/10.53593/n4410a

Dickey, D. and Fuller, W. 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427-431. https://doi.org/10.1080/01621459.1979.10482531

Engle, R. and Granger, C. 1987. Co-integration and error correction: Representation, estimation and testing. Econometrica 55, 251-276. https://doi.org/10.2307/1913236

Holmes, M. and Otero, J. 2019. Re-examining the movements of crude oil spot and futures prices over time. Energy Economics 82, 224-236. https://doi.org/10.1016/j.eneco.2017.08.034

Johansen, S. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231-254. https://doi.org/10.1016/0165-1889(88)90041-3

MacKinnon, J. G., Haug, A. and Michelis, L. 1999. Numerical distribution functions of likelihood ratio tests for cointegration. Journal of Applied Econometrics 14, 563-577. https://doi.org/10.1002/(SICI)1099-1255(199909/10)14:5<563::AID-JAE530>3.0.CO;2-R

Schwert, G. 1989. Tests for unit roots: A Monte Carlo investigation. Journal of Business and Economic Statistics 7, 147-159. https://doi.org/10.1080/07350015.1989.10509723

Footnotes

[1] https://www.sciencedirect.com/science/article/pii/S0140988317303018?via%3Dihub#s0030

[2] Note that the sample-based Schwert rule leads to a maximum lag length of 34 for the sample considered, rather than the value of 30 employed by HO.
