Beyond exercises in substitution and direct interpretation: The use of "puzzles" in teaching linear regression
Steve Cook
Swansea University
s.cook at swan.ac.uk
Peter Dawson
University of East Anglia
Peter.Dawson at uea.ac.uk
Duncan Watson
University of East Anglia
Duncan.Watson at uea.ac.uk
Published January 2025
1. Introduction
In their exploration of effective strategies for teaching introductory econometrics, Cook et al. (2025) highlight the advantages of using simple examples based on artificially simulated data. While this approach may initially seem counterintuitive — given the widespread preference, and often a strong demand, for incorporating real-world data — they provide a detailed pedagogical evaluation to address such concerns. They argue that various elements of pedagogical research support the use of this ‘simplified’ approach, particularly when it is situated within a replication-based framework. Notable sources of support include research on cognitive load theory (Sweller et al., 1998, 2019; van Merriënboer and Sweller, 2005) and the Expertise Reversal Effect (ERE; Kalyuga et al. 2003). However, while emphasising the benefits of simplification, Cook et al. (2025) are careful to stress that their advocacy for simple examples does not come at the expense of using real-world data. Instead, they propose that the two approaches should be viewed as complementary, with the choice of method depending on the nature of the material being taught and the learners’ stage in the educational process. By striking a balance between simplicity and real-world complexity, their approach aims to optimise learning outcomes without making an either/or decision between these methods.
This case study examines an alternative approach to designing examples and tasks for teaching linear regression, focusing on exercises that are both simplified and cognitively challenging. Drawing on the ERE discussed in Cook et al. (2025), it considers how the utility of examples depends on a learner’s prior knowledge and experience.[1] The goal is to provide an alternative to traditional methods that balances clarity with intellectual engagement. By ‘simplified’, we refer to examples that are clear-cut and hypothetical, allowing learners to concentrate on econometric methods without being distracted by computational complexity. However, these tasks still pose cognitive challenge due to their design and the way questions are framed.
Two widely used approaches in econometrics education can be labelled as ‘exercises in substitution’ and ‘direct interpretation’. In the ‘exercises in substitution’ approach, learners are tasked with applying provided values to formulae, such as calculating a t-statistic to test a null hypothesis or using sums of squares to derive an R2 statistic. In the ‘direct interpretation’ approach, learners analyse empirical output and interpret results, often by evaluating t-statistics and p-values to determine the statistical significance of coefficients. While both approaches are valuable, this case study seeks to offer a complementary alternative that enhances learning outcomes.
Taking inspiration from the examples in Cook et al. (2025), where significance testing of a coefficient is requested without providing the estimated coefficient, standard error, or t-ratio, we aim to create examples that pose a challenge by presenting a potential puzzle or puzzling content. The questions we design share a common approach: they may initially appear to lack all the information necessary for their completion. We argue that by requiring learners to reconsider familiar material from a different perspective, these tasks can both challenge and enhance learning. This approach is evident in the exercises involving the testing of multiple hypotheses concerning regression coefficients. Instead of, for example, simply asking learners to substitute provided values into a formula and apply a decision rule to determine whether a null can be rejected, the exercises require deeper reflection on the intuition behind hypothesis testing. Specifically, learners are encouraged to understand that the null hypothesis imposes a restriction, and if the penalty associated with this restriction is too great, the data will support rejecting the null.
Given the ‘puzzling’ nature of our examples, we propose that such tasks complement more conventional exercises involving substitution and direct interpretation and be introduced at a later stage in the learning process. Once a foundational understanding has been developed, these tasks can be used to deepen knowledge and explore issues related to specific topics. In the following section, each task’s central question is preceded by a discussion of its underlying motivation and followed by a sketched solution. We conclude with a final section offering reflections on our proposed approach.
2. Example 1
Motivation
This question explores hypothesis testing through the activities of two hypothetical investigators. While their approaches share many similarities (both estimate multiple linear regression models of the same dimension, use samples of the same size, perform a joint test of the significance of two coefficients in their models, and observe that imposing the null hypothesis restriction increases the residual sum of squares by 20 units), they draw very different (yet correct) inferences from their analyses.
Question
Two investigators estimated multiple linear regression models. Investigator A estimated Model A, while Investigator B estimated Model B:

Model A: Yt = α0 + α1X1t + α2X2t + α3X3t + ut

Model B: Yt = β0 + β1X1t + β2X2t + β3X3t + vt
Both models were estimated using samples of 44 observations, resulting in residual sum of squares (RSS) values of 400 for Model A and 50 for Model B. Each investigator performed a joint hypothesis test:
- Investigator A tested the null H0: α1 = α2 = 0.
- Investigator B tested the null H0: β1 = β2 = 0.
Both investigators noted that imposing the restrictions specified by their null hypotheses increased the RSS of their respective model by 20 units.
However, the investigators reached different conclusions: one claimed their null could not be rejected at the 10% significance level, while the other claimed that their null could be rejected at the 1% significance level. Is this possible? If so, which investigator can reject their null, and which cannot? And how can the same increase in RSS lead to such different conclusions?
Answer
This question highlights the importance of the relative change in the RSS following the imposition of a restriction. With 44 observations, 4 estimated parameters and 2 restrictions, the two F-statistics are given as 1 and 8 respectively, as follows:

FA = ((420 − 400)/2)/(400/(44 − 4)) = 10/10 = 1

FB = ((70 − 50)/2)/(50/(44 − 4)) = 10/1.25 = 8
The difference in calculated test statistics reflects the relative impact of the 20-unit increase in RSS. For the first model, the change represents 5% of the unrestricted RSS (20 compared to 400), while for the second model, it represents 40% (20 compared to 50). The relevant critical values at the 10% and 1% significance levels for F2,40 are 2.44 and 5.18 respectively. Therefore, the null cannot be rejected at the 10% significance level for Model A, but it can be rejected at the 1% significance level for Model B.
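These calculations are easily scripted. The sketch below (Python; the function name is ours, and the F(2, 40) critical values quoted above are hard-coded rather than drawn from statistical tables) computes each statistic as ((RRSS − URSS)/q)/(URSS/(n − k)):

```python
def f_stat(urss, rrss, n, k, q):
    """F-statistic for testing q restrictions: the per-restriction
    increase in RSS relative to the unrestricted residual variance."""
    return ((rrss - urss) / q) / (urss / (n - k))

n, k, q = 44, 4, 2  # 44 observations, 4 parameters (so n - k = 40), 2 restrictions
f_a = f_stat(urss=400, rrss=420, n=n, k=k, q=q)  # Model A: RSS rises 400 -> 420
f_b = f_stat(urss=50, rrss=70, n=n, k=k, q=q)    # Model B: RSS rises 50 -> 70
print(f_a, f_b)  # 1.0 8.0

# F(2, 40) critical values quoted in the text
print(f_a > 2.44)  # False: cannot reject at the 10% level
print(f_b > 5.18)  # True: reject at the 1% level
```

The same 10-unit numerator divided by very different denominators (10 versus 1.25) is what drives the opposing conclusions.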
3. Example 2
Motivation
This question requires the calculation of an F-statistic to test multiple hypotheses concerning regression coefficients. However, specific values for the restricted RSS (RRSS) and unrestricted RSS (URSS) are deliberately omitted. This omission is intended to deepen understanding of the F-statistic by highlighting how its value reflects the impact of the null hypothesis restriction, particularly in terms of the relative increase in the RSS. The focus is therefore on the proportional change in the RSS following imposition of a restriction, rather than on specific numerical values. By omitting actual values and highlighting relative changes, the question encourages learners to grasp the conceptual significance of this relationship.
Question
Suppose an investigator estimated the following multiple linear regression model, denoted Model 1, using a sample of 107 observations:

Model 1: Yt = β1 + β2X2t + β3X3t + β4X4t + β5X5t + β6X6t + β7X7t + ut
The investigator then tested the null hypothesis H0: β2 = β3 = β4 = β5 = 0. State the resulting F-statistic associated with testing this null if:
- Imposing the restriction given by the null results in a restricted model with an RSS twice the size of the RSS for Model 1.
- Imposing the restriction given by the null results in a restricted model with an RSS four times the size of the RSS for Model 1.
Answer
The required F-statistic is given as F = ((RRSS − URSS)/URSS) × ((n − k)/q). For this specific exercise, the value of (n − k)/q is given as (107 − 7)/4 = 25. Although specific values for the RRSS and URSS are not provided, their relative sizes are given. Therefore, the term (RRSS − URSS)/URSS takes the value 1 in the first case and 3 in the second. Using these values, the resulting F-statistics are 25 and 75, respectively. This simplified example illustrates how the value of the F-statistic varies based on the penalty associated with the restriction imposed by a null hypothesis.
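The point that only the ratio RRSS/URSS matters can be made concrete in a few lines. A minimal sketch (Python; the helper name is ours, and k = 7 parameters is implied by (n − k)/q = 25):

```python
def f_from_ratio(rss_ratio, n, k, q):
    """F-statistic when only the ratio RRSS/URSS is known:
    F = ((RRSS - URSS)/URSS) * (n - k)/q = (ratio - 1) * (n - k)/q."""
    return (rss_ratio - 1) * (n - k) / q

n, k, q = 107, 7, 4  # (n - k)/q = 100/4 = 25
print(f_from_ratio(2, n, k, q))  # RRSS twice URSS: F = 25.0
print(f_from_ratio(4, n, k, q))  # RRSS four times URSS: F = 75.0
```

Any absolute RSS values with the same ratio (e.g. 10 and 20, or 1000 and 2000) would yield identical F-statistics.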
4. Example 3
Motivation
Here, we build on the previous example by introducing an additional layer of complexity. Once again, the task involves calculating F-statistics for a joint test of regression coefficients without specific values for the RRSS and URSS. However, this time, more detailed relative values for the changes in the RSS are provided, and inferences must also be drawn. The design of the example, with its finer gradation of relative RSS values, facilitates a progression from non-rejection to overwhelming rejection of the null hypothesis. This shift in inferences underscores the key message: evidence against a null hypothesis depends on the penalty associated with the restriction it imposes.
Question
Suppose an investigator estimated the following multiple linear regression model, denoted Model A, using a sample of 64 observations:

Model A: Yt = β1 + β2X2t + β3X3t + β4X4t + ut
The investigator then tested the null hypothesis
H0: β2 = β3 = 0
Imposing the restriction specified by this hypothesis generates a new model, denoted as Model B. What inference would you draw when testing this null hypothesis using an F-test under each of the following hypothetical circumstances?
- Case I: The RSS associated with Model B is 5% greater than that of Model A.
- Case II: The RSS associated with Model B is 10% greater than that of Model A.
- Case III: The RSS associated with Model B is 15% greater than that of Model A.
- Case IV: The RSS associated with Model B is 20% greater than that of Model A.
Answer
As with the previous example, we will use the expression F = ((RRSS − URSS)/URSS) × ((n − k)/q). For this example, (n − k)/q = (64 − 4)/2 = 30. While specific values of the RRSS and URSS are not provided, the relative values for Cases I-IV result in (RRSS − URSS)/URSS taking the values 0.05, 0.1, 0.15 and 0.2, respectively. Using these values, the F-statistics for Cases I-IV are calculated as 1.5, 3, 4.5 and 6, respectively. These test statistics are compared to critical values from the F-distribution with 2 and 60 degrees of freedom (F2,60). The relevant critical values at the 10%, 5% and 1% levels of significance are 2.39, 3.15 and 4.98. The following diagram, although not to scale, summarises this information:
Case I (1.5) < 2.39 (10% critical value) < Case II (3) < 3.15 (5% critical value) < Case III (4.5) < 4.98 (1% critical value) < Case IV (6)
In Case I, the null hypothesis is not rejected, even at the 10% significance level. However, as the penalty imposed by the restriction associated with the null increases in subsequent cases, we observe rejection of the null. This progression demonstrates how increasing penalties associated with the restriction lead to stronger evidence against the null hypothesis, moving from non-rejection to rejection at increasingly stringent significance levels.
5. Example 4
Motivation
This example aims to enhance and challenge students’ ability to recognise the relationships between models and their transformation under imposed restrictions. Three models are presented, but their RSS values are deliberately omitted. Learners are tasked with comparing these unreported RSS values, creating a unique and engaging challenge.
Question
Investigator A estimated the following multiple linear regression model using a sample of 64 observations:

Model A: Yt = α0 + α1X1t + α2X2t + ut

Using the same sample, Investigator B estimated the following model:

Model B: (Yt − X1t) = α0 + α2(X2t − X1t) + ut

Using the same sample, Investigator C estimated the following model:

Model C: (Yt − X2t) = α0 + α1(X1t − X2t) + ut
Investigators A and B compared the RSSs for their models and noted that Model A had a lower RSS than Model B. How does the RSS of Model C compare to the RSSs for Models A and B?
Answer
Models B and C are alternative restricted versions of Model A, derived under the restriction α1 + α2 = 1. Model B arises if the restriction is considered as α1 = 1 − α2, while Model C arises when the restriction is considered as α2 = 1 − α1. Consequently, the RSS values for Models B and C are equal. Since the RSS for the unrestricted Model A is lower than that of the restricted Model B, it must also be lower than the RSS of Model C.
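The equality of the two restricted RSS values can be demonstrated numerically. The sketch below (Python with NumPy) simulates artificial data, assuming Model A takes the generic form Yt = α0 + α1X1t + α2X2t + ut with the restriction α1 + α2 = 1 imposed in the two alternative ways; the variable names and data-generating values are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 0.7 * x1 + 0.6 * x2 + rng.normal(size=n)  # true a1 + a2 != 1

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

ones = np.ones(n)
rss_a = rss(np.column_stack([ones, x1, x2]), y)        # Model A (unrestricted)
rss_b = rss(np.column_stack([ones, x2 - x1]), y - x1)  # impose a1 = 1 - a2
rss_c = rss(np.column_stack([ones, x1 - x2]), y - x2)  # impose a2 = 1 - a1
print(np.isclose(rss_b, rss_c))  # True: the two restricted models share one RSS
print(rss_a <= rss_b)            # True: imposing a restriction cannot lower RSS
```

Both restricted fits minimise the same objective over the same restricted parameter space, so they must agree, whatever data are used.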
6. Example 5
Motivation
This example is designed to deepen understanding of the R2 and adjusted R2 (R̄2) statistics, as well as the specification of linear regression models. Instead of presenting a straightforward task, such as calculating the R2 statistic for an estimated model using provided components (e.g. RSS, ESS and TSS), it moves beyond a simple ‘exercise in substitution’. It offers a more challenging test of understanding, requiring learners to synthesise background knowledge about these statistics and apply it to specific examples.
Question
An investigator collected data on five variables, denoted as Y, X1, X2, X3 and X4. Using a sample of 100 observations, the investigator estimated three models, referred to as Models A, B and C. The estimation results for these models are presented below.
Model A. Dependent Variable: Y; Method: Least Squares; Sample: 1 100; Included observations: 100

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 0.934490 | 0.200060 | 4.671042 | 0.0000 |
| X1 | 0.834145 | 0.220485 | 3.783227 | 0.0003 |
| X2 | 1.764009 | 0.203859 | 8.653083 | 0.0000 |
Model B. Dependent Variable: Y; Method: Least Squares; Sample: 1 100; Included observations: 100

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 0.904417 | 0.192253 | 4.704318 | 0.0000 |
| X1 | 0.841975 | 0.211617 | 3.978767 | 0.0001 |
| X2 | 1.787656 | 0.195799 | 9.130066 | 0.0000 |
| X3 | 0.557781 | 0.182752 | 3.052119 | 0.0029 |
Model C. Dependent Variable: Y; Method: Least Squares; Sample: 1 100; Included observations: 100

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 0.957089 | 0.204137 | 4.688467 | 0.0000 |
| X1 | 0.829754 | 0.221324 | 3.749042 | 0.0003 |
| X2 | 1.769946 | 0.204759 | 8.644029 | 0.0000 |
| X4 | 0.116475 | 0.191846 | 0.607129 | 0.5452 |
Suppose you asked the investigator for the calculated R2 and adjusted R2 (R̄2) statistics for these models and received the following response:
‘The R̄2 for Model B was 53.7% and the R2 for Model C was 50.9%. The remaining four statistics you requested took the values 55.1%, 49.7%, 50.7% and 49.4%. However, I cannot recall which of these four values correspond to the different statistics for the different models.’
Can you correctly assign the four values (55.1%, 49.7%, 50.7% and 49.4%) to the R2 and R̄2 statistics for the models?
Answer
We are given two calculated values but need to assign values to the remaining R2 and R̄2 statistics across the three models. To denote the model being considered, we will use subscripts A, B and C (so that, for example, R2A denotes the R2 statistic for Model A). Using this notation, we know the values of R̄2B (53.7%) and R2C (50.9%) but need to assign values to R2A, R2B, R̄2A and R̄2C.

One approach to solving this is as follows: Model A serves as a base model, while Models B and C are extensions that include an additional variable. Model B adds the variable X3, which has a highly significant t-ratio for its associated coefficient. In contrast, Model C adds a variable with a coefficient that is insignificant at conventionally accepted levels of significance. Given this information and knowing that the inclusion of an additional variable increases the R̄2 statistic only if the variable’s associated coefficient has an absolute t-ratio greater than 1, we can conclude:

(1) R̄2A > R̄2C

We know that the R̄2 statistic for Model A will be lower than the associated R2 statistic for this model. Therefore, we can extend this as:

(2) R2A > R̄2A > R̄2C

Finally, since the R2 statistic cannot decrease when an additional variable is added to a model, R2B will have the highest value among the four remaining statistics. Thus, we assign R2B = 55.1%. We are now left with three values to assign. Using expression (2), we can assign the remaining statistics as follows: R2A = 50.7%, R̄2A = 49.7% and R̄2C = 49.4%. These results can be tabulated as follows:
|  | Model A | Model B | Model C |
|---|---|---|---|
| R2 | 50.7% | 55.1% | 50.9% |
| R̄2 | 49.7% | 53.7% | 49.4% |
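The assignment can be cross-checked using the standard identity R̄2 = 1 − (1 − R2)(n − 1)/(n − k), where k is the number of estimated parameters. A minimal check (Python; the function name is ours, with k = 3 for Model A and k = 4 for Models B and C, as read from the output tables):

```python
def adj_r2(r2, n, k):
    """Adjusted R-squared implied by R-squared, sample size n and k parameters."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

n = 100
r2_and_k = {"A": (0.507, 3), "B": (0.551, 4), "C": (0.509, 4)}
for model, (r2, k) in r2_and_k.items():
    print(f"Model {model}: adjusted R2 = {adj_r2(r2, n, k):.1%}")
# Model A: 49.7%, Model B: 53.7%, Model C: 49.4% -- matching the table
```

That each assigned R2 reproduces the corresponding R̄2 (to rounding) confirms the deduction is internally consistent.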
7. Concluding remarks
Drawing inspiration from Cook et al. (2025), we propose a novel approach to developing exercises that both challenge and enhance understanding of introductory econometrics. Departing from the traditional methods of substitution and direct interpretation exercises, our approach focuses on creating simple yet cognitively demanding tasks. A key feature of these tasks is their puzzling nature: at first glance, it may appear that the questions lack sufficient information to generate solutions. However, this perception arises from the different perspective these questions provide on familiar topics, requiring learners to adopt a similarly different perspective to arrive at solutions. We argue that this approach not only challenges and deepens knowledge but also offers a fresh perspective, highlighting the underlying motivation and intuition behind the econometric methods being considered.
References
Cook, S., Dawson, P. and Watson, D. 2025. Bridging the quantitative skills gap: Teaching simple linear regression via simplicity and structured replication. (forthcoming in The Handbook for Economics Lecturers)
Kalyuga, S., Ayres, P., Chandler, P. and Sweller, J. 2003. Expertise reversal effect. Educational Psychologist 38, 23-31. https://doi.org/10.1207/S15326985EP3801_4
Sweller, J., van Merriënboer, J. and Paas, F. 1998. Cognitive architecture and instructional design. Educational Psychology Review 10, 251-296. https://doi.org/10.1023/A:1022193728205
Sweller, J., van Merriënboer, J. and Paas, F. 2019. Cognitive architecture and instructional design: 20 years later. Educational Psychology Review 31, 261-292. https://doi.org/10.1007/s10648-019-09465-5
van Merriënboer, J. and Sweller, J. 2005. Cognitive load theory and complex learning: Recent developments and future directions. Educational Psychology Review 17, 147-177. https://doi.org/10.1007/s10648-005-3951-0
Notes
[1] In our discussion we use the terms ‘examples’, ‘exercises’ and ‘tasks’ interchangeably. The pedagogical literature often refers to ‘examples’, particularly in the context of distinguishing worked examples from problem solving activities. However, we are considering examples that a lecturer might use to illustrate issues related to linear regression while also engaging learners in active participation. Consequently, we use the terms ‘exercises’ and ‘tasks’ to emphasise the interactive nature of activities where learners are set challenges and required to find solutions.