
IREE Best Paper Presentation 2022

The International Review of Economics Education Best Paper Award for 2022 has been awarded to Daria Bottan, Douglas McKee, George Orlov and Anna McDougall for their paper on "Racial and gender achievement gaps in an economics classroom" published in volume 40.

Online presentation

The authors could not attend the DEE 2023 conference in Edinburgh for the award ceremony, so the award was celebrated in a special online session on 5 October 2023.

Transcript

This has been lightly corrected from an automatic transcription, so may contain errors.

David McCausland (Editor in Chief, IREE): Many thanks to everyone for coming on this Thursday afternoon. It gives me great pleasure to welcome Douglas McKee, whose paper was co-authored with Daria Bottan, George Orlov and Anna McDougall. That paper, Racial and gender achievement gaps in an economics classroom, won this year's IREE best paper award.

And we normally have this presented at our Economics Network Developments in Economics Education Conference, which was held a few weeks ago. Unfortunately, Doug couldn't make it the long, long way across to Edinburgh, so he's very kindly agreed to give an online presentation of his paper. This session is co-hosted by the Economics Network. So without further ado, I'll hand over to Doug. Many thanks.

Douglas McKee: David. Thank you for the kind introduction. I really appreciate it.

This is actually a paper that I think of really fondly, not just because it's won this award, which I really appreciate, but because this is actually one of my very first economics education papers. It's had a long history; you'll know when you see the data, which was actually collected a few years ago. But it's also one of the first papers written using data from a project here at Cornell we call the Active Learning Initiative, where we've invested quite a bit of resources in developing assessments of learning and also incorporating evidence-based pedagogy into the undergraduate classes offered by our department.

We've collected a tremendous amount of data through that project, and we have started doing lots of research with that data that we didn't anticipate doing. This paper, though, focuses on really just 3, I think, fairly straightforward questions. Let me share my slides. So it really focuses on just 3 pretty straightforward questions. The first is: how do race and gender predict student performance? And what we're doing there is we're looking for gaps.

Is it true that male and female students learn different amounts during a class, or come out with different performance? And similarly, how important is race? The race categories we're gonna use are really simple. We're gonna have 2 categories: underrepresented minorities and everyone else. The "everyone else" is White or Caucasian students and students from Asia, who are quite well represented in our classes at Cornell. And then African American students, Hispanic students, and students from other places in the world are the underrepresented students. Okay? So we're gonna look for evidence of those gaps in an economics course.

And then we're going to see if the race gaps - because we're going to see race gaps - differ between male and female students. I'll reveal our answer: they do. We see bigger gaps for female students of color than we do for male students of color.

And then we're gonna propose, and then test, some theories for why those gaps exist. And that's going to inform what we can actually do about it, because the treatment really depends on what we think the root causes of these gaps are. Okay, so that's where we're headed.

Alright. The approach we use is pretty simple. We're going to take one class taught by one instructor, so we're really gonna reduce variation from other sources by focusing on that. And it's a class that was taught in 4 separate terms, so we can pool the data from all 4 classes. And we're gonna look for gaps in student outcomes. So I'm gonna show you lots of regression results, and the regressions are pretty straightforward. Then we're going to see if those racial gaps differ by gender.

We're gonna propose 5 possible explanations, and then we're gonna test 3 of them with other regression models. So that's what we're gonna do. Okay? So now I want to step back and tell you a bit more about the context, the class that we're actually operating in, so you can decide whether we can generalize these results to other courses, instructors, and institutions. And then I'll talk about the data that we have available that we're gonna be analyzing. Okay?

So, one course: it's an applied econometrics course. This course is required by the economics major, and most economics majors take the course in their sophomore or junior year. It requires them to have taken an intro statistics course, and it requires them to have taken the intro microeconomics course.

I would say we get about a third of the students who are not economics majors; they come from other departments. And those students are pretty motivated: they don't have to be there, they're there because they specifically want this content. Okay. The classes range in size, but they're all kind of medium size. None of them are small and none of them are really big: between like 50 and 150 students. Okay? And our sample is the 304 students for which we have all the demographic data, all the assessment data, and all the survey data that we want, and that's a pretty big chunk of the total number of students that were actually registered in these classes. So it's not actually a very weird, select sample at all.

The curriculum's nearly identical in all 4 terms, and the pedagogy is quite similar. I always change something when I teach, but sometimes it's a major thing and other times it's a minor thing, and between Fall 2017 and Spring 2019 there were no major changes across the 4 semesters. The class has lots of active learning. There is a significant amount of lecture, but I also ask the students to answer questions using polling software throughout. There are usually 5 to 10 questions per class.

There are also small group activities in the class. Most classes have one, sometimes 2, but usually one kind of 10 to 15 minute small group activity. It's either a case study, where I'll give them a research question and a bunch of tables that are generated using methods that they've learned, and they have to interpret the results and then critique the study: what could be better about it, what could be worse, are the assumptions valid? Or it's an invention activity. These are activities where they try to figure something out before I've told them how to do it, so they're definitely not deliberate practice. The idea is that these activities prime them for future learning; they don't usually know how to solve the problem. The classic invention activity is to show people data on people throwing baseballs at targets and ask them to come up with a measure of how good the throws are, and they come up with average distance. And then you say, well, how do we distinguish between data where the throws are often quite far away from the target and data where the throws are quite close to the middle of the target? And they basically have to invent variance. So it's things like that, but mostly in a regression model context.

And then, finally, the class has a group project outside class where they come up with their own research question and work on it. So there's a lot of peer interaction in this class, both inside the classroom and outside the classroom. Alright, so that's the class. By the way, I should say:

Feel free to interrupt me with questions at any point. We've got a full hour and I'm only planning to talk for maybe 30 minutes; the rest is all questions. So feel free to interrupt at any point.

Alright, we have quite a bit of data on these students from the 4 semesters. First, we have their final exam data. All 4 of these semesters were pre-COVID, and all the exams that they take are in-person exams. The final exam is something that was handed out and then collected, and they never saw it again, so we feel pretty good about giving that same exam year over year: it's comparable. It's 25% of their course grade, which here is high stakes. I understand anything that's not 100% is low stakes in the UK, but here it's high stakes. And it's several, maybe 6 to 8, I can't remember exactly what the number was, multi-part questions that have short answers. So it's a handwritten exam, and we have a very detailed rubric. We actually had several of the exams regraded by other people and then compared scores, and they're very, very similar, so we feel really good about the reliability of the measure even though it's not multiple choice. There's a lot of human capital that goes into the grading of these exams. It's a long exam. It's not open book and they don't get full access to their notes, but they do get to bring 3 pages, double-sided, to the exam with anything they want. That way they're not focused on memorization for the exam, because they can write down formulas they don't necessarily want to remember.

Okay, during the last week of classes we also give a test called the Applied Econometrics Skills Assessment. It's low stakes in the sense that students get a little bit of extra credit just for taking the test. It's multiple choice, so it's automatically graded. We invested a lot of energy in this test: it's really hard to write good multiple choice tests, and it takes a long time. It's in person, bubble sheets, students have 45 minutes, closed book, closed notes. And if anyone wants to look at the questions or see what the learning goals are that the test evaluates, they can go to https://econ-assessments.org/ . Anybody can actually set up an online version of this test for their students if they want to. It's all automated, and you'll get reports generated when your students are done with the test.

So that's the other measure. These are the 2 measures we're going to use at the end of the term to test learning. We also have a couple of measures of student skills at the beginning of the term. We give a statistics assessment that's similar in form to the applied econometrics assessment. It's multiple choice. It's required, but the grade that students get on this test does not affect their grade for the course. We found it's a pretty good measure of their statistics skills. And then we use GPA as a measure of, like, how good a student is this? And we have their GPA at the beginning of the term.

Okay, then we have administrative data. We have demographics: race, gender, class year, major. We have parental education, so we can calculate whether or not this is a student who's the first in their family to go to college. And we have whether they're a native English speaker. Okay? And then, finally, the last bit of data we're gonna use in this paper is a study survey that was given at the same time as the final exam, where we ask students to report: on average, how many hours did you spend during the semester studying for this class? And then separately, how many hours did you spend specifically studying for this final exam? And we asked some questions about how they studied: did you mostly study by yourself? Did you mostly study with friends?

And then we gave them a list of about 6 or 7 study strategies, and they could check which ones they used and which ones they didn't. And we made some semi-arbitrary decisions about what we considered effective, because we just wanted to distinguish the students that read the book and highlighted (not terribly effective) and the students that looked at old exams and then read the solutions (not very effective) from the students that actually would do things themselves: work through old questions, create new problems. And we called those effective study strategies.
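
To make that coding concrete, here is a minimal sketch of how checkbox responses like these might be turned into an indicator, in Python with pandas. The column names and the effective/ineffective split below are hypothetical stand-ins, not the survey's actual items:

```python
import pandas as pd

# Hypothetical checkbox columns (1 = student reported using the strategy).
# The "effective" split mirrors the semi-arbitrary classification described
# in the talk, not the paper's actual coding.
EFFECTIVE = ["worked_old_questions", "created_new_problems"]
INEFFECTIVE = ["reread_and_highlighted", "read_old_exam_solutions"]

survey = pd.DataFrame({
    "worked_old_questions":    [1, 0, 1],
    "created_new_problems":    [0, 0, 1],
    "reread_and_highlighted":  [1, 1, 0],
    "read_old_exam_solutions": [0, 1, 0],
})

# Flag students who used at least one strategy classified as effective
survey["effective_studier"] = survey[EFFECTIVE].max(axis=1)
print(survey)
```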

Okay, so that's the data we're gonna analyze. Alright, now: are there gaps? If we look at the final exam scores and regress them on just a plain old dummy variable for female, there seems to be a gap, but it's not statistically significant at all. Okay, so average scores for male and female students are not very different. Now, when we control for underrepresented minority status, we see a pretty big gap, quite statistically significant. The URM students are definitely doing worse. Yeah, Carlos.

Carlos Cortinhas: Can you just explain the division you made along race? I suspect it's because of sample size that you want to keep the two groups. What was the reasoning for grouping the Asian students with the white students and, you know, everyone else? What was the reason? Asian students are really over-represented in economics courses in the US.

Douglas McKee: Especially at Cornell. We've got a very high Asian population, and those students tend to come from relatively well-off families and tend to do quite well in the classes. And "underrepresented minorities" is a fairly standard term here, so that's the grouping we used. We have run these analyses separating out the Asian students; they tend to do a little better than the white students. The other thing I didn't say here is that all of our outcomes are standardized, so everything we're looking at here is in terms of standard deviations. And we're seeing URM students half a standard deviation below, which is not great. It's not great at all. Okay.
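
As a rough illustration of the kind of regression being described, here is a minimal sketch in Python with statsmodels. The file name, the column names (final_exam, female, urm) and the robust standard errors are my assumptions, not the paper's actual code:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per student; hypothetical data file and column names
df = pd.read_csv("students.csv")

# Standardize the outcome so coefficients read as standard deviations
df["z_final"] = (df["final_exam"] - df["final_exam"].mean()) / df["final_exam"].std()

# Gender gap alone, then gender and URM status together
gender_model = smf.ols("z_final ~ female", data=df).fit(cov_type="HC1")
urm_model = smf.ols("z_final ~ female + urm", data=df).fit(cov_type="HC1")
print(urm_model.params)  # the urm coefficient is the gap in SD units
```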

But then we were really curious whether the URM gap was different for female and male students. Yes, Amira.

Amira Elasra: Can you give us an idea how big the subsample for the minority students is?

Douglas McKee: Minority students are about 22% of the sample. So of the 300, about 50, somewhere around there. Actually, I have the numbers at the end; I have a whole slide about limitations of the study, and the sample size is a huge one. There aren't very many, but they're quite different from the other students.

Okay, so we're seeing a pretty big gap. Alright. So then we want to know: is that gap different for male and female students? And what we see is that it's a pretty big difference. What this implies is that the URM gap for males is minus .34, and for females it's the sum of these two coefficients: about minus .95. So it's about a full standard deviation. Now, I'm sure you're wondering, well, how many female URM students do you have in your sample? And the answer is 22. It's not very many at all, but it's not none, and those 22 students show quite significant differences from the other students. Okay.
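
Written out, the interaction specification being described has this general form (the notation is mine, not the paper's):

```latex
z_i = \beta_0 + \beta_1\,\mathrm{Female}_i + \beta_2\,\mathrm{URM}_i
      + \beta_3\,(\mathrm{Female}_i \times \mathrm{URM}_i) + \varepsilon_i
```

Here the male URM gap is β₂ (about minus .34 in the talk), and the female URM gap is the sum β₂ + β₃ (about minus .95).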

So the next question is: why are we seeing these URM gaps, and why are these gaps different for male and female students? Before I show that: if we control for college year, first-generation status, and whether you're an econ major, controlling for those doesn't affect the gender and URM gaps at all. Okay, so it's not that URM students are more likely to be first-generation and that's why we're seeing this. The estimates are not very different when we control for those characteristics; we lose some statistical power, but not even all that much.

Alright. So then the question is why. We have 5 potential explanations here, and I'm sure with an audience this big there will be more that you can suggest, but I'm just going to lay out five here. The first one is, maybe it's differences in incoming preparation. Maybe what's happening is the URM students are coming in with weaker math skills, or they just don't have the kind of preparation that the other students have. Okay? And we have decent measures of preparation, so we're gonna look at that.

Another possibility is that it's stereotype threat. Stereotype threat is a theory from psychology: when a group is associated with negative stereotypes and they're in a high pressure situation, they're thinking "I'm representing my group. Maybe other people are right about us. I don't know." That causes stress and distraction, and it causes them to underperform. Okay? And so, even if they're not naturally going to perform any worse, that high stress situation can definitely make them perform worse. And we've got 2 different tests that we can look at: a high pressure test and a low pressure test.

A third possibility is that maybe some students have different, more or less effective study behaviors, or maybe they just study more, and maybe that could explain some of this. Fourth is role model theory, which is: this is a class that's taught by a white guy, and maybe that white guy is not very inspiring to the non-white-male students. We can't test this. Okay, we can't test this because we don't have any variation. I'm not saying that it's not a possibility. All I'm saying is that we only have the one instructor, and I didn't try to vary my gender or ethnicity between these classes. Okay?

And then, finally, it's possible that the students came in with different levels of interest in the subject. If you're more interested in it, you invest more. We can't test this now either, but here the problem is not a lack of variation; the problem is that we don't have measures of intrinsic interest. But we will. We are actually asking students about intrinsic interest in economics at the beginning of the term in classes since this time, and so if we redo this analysis with more current data, we should be able to answer this question.

Okay? Alright. So let's look at preparation. In this model, what we're doing is adding GPA and our measure of statistical skills. So this one is our stats skills, and this one is general school ability. And they're very predictive of performance on the final exam. Each point on the 4-point GPA scale increases your final exam score by 1.23 standard deviations: hugely predictive.

And similarly, each standard deviation increase in your statistics skills at the beginning of the term is associated with a .23 standard deviation increase in the final exam score. Okay, so this is at the same time not surprising and a little disturbing. It would be really great if everybody walked in and the course was structured such that it provided support at the bottom, such that everybody had an equal chance of succeeding in the course. And we don't see that.

You might find it peculiar that econ majors actually do worse. It's not that surprising, right? At least in this context. And that's because for the econ majors the course is required for all of them, while the people that are not econ majors really want to be there. Self-selection: they're a totally self-selected group.

Alright. But for this paper the more interesting piece is over here, which is what happens to the male URM gap and what happens to the female URM gap. We started with a negative, precisely estimated gap. When we control for other demographics, that male gap goes down a little and loses some significance. And what's happened now with the male URM effect is it's gone from being negative to actually being positive and not significant at all. So it does seem that controlling for prior preparation entirely eliminates the male URM gap.

Okay, that's what we're seeing in this course, in this data. The female URM gap, on the other hand - remember, this coefficient is only the difference between the gap for males and females - has gone from the sum of those 2 coefficients, which was minus .9, to the sum of these 2, which is about minus .5. So it's smaller, but it's still pretty big and statistically significant. Okay, so prior preparation explains part of the female URM gap, but not all of it.
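
Continuing the earlier sketch, adding the prior-preparation controls and recovering the female URM gap as a sum of coefficients might look like this; the column names are still hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # same hypothetical data as before
df["z_final"] = (df["final_exam"] - df["final_exam"].mean()) / df["final_exam"].std()

# Interacted model with prior-preparation controls (GPA and the
# standardized incoming statistics score) plus other demographics
prep_model = smf.ols(
    "z_final ~ female * urm + gpa + z_stats + first_gen + econ_major",
    data=df,
).fit(cov_type="HC1")

# The male URM gap is the urm coefficient; the female URM gap adds
# the female-by-urm interaction on top of it
female_urm_gap = prep_model.params["urm"] + prep_model.params["female:urm"]
print(female_urm_gap)  # the talk reports roughly -0.5 after these controls
```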

So our next try is to look for evidence of stereotype threat. And what we're doing there is, we're going to say, well, do we see these same gaps when we look at the low stakes assessments?

Douglas McKee: Yes, Amira.

Amira Elasra: Can you go back to the previous slide, just to look at your female coefficient here? I see it's becoming bigger when you include all these controls. So when we go from right here... are you saying between 3 and 4, or are you saying between 4 and 5? Well, yeah, 4 and 5, but obviously between 3 and 5 as well.

Douglas McKee: So the coefficient you're looking at, the female URM interaction, is telling you the difference between the male URM gap and the female URM gap, and the female URM gap itself is actually the sum of those two. And so if we add those two, it goes from minus .9 to minus .5. So the difference between the gaps is getting a little bit bigger, but the gap itself is getting smaller.

Amira Elasra: Okay.

Douglas McKee: And so the next step is to say: well, this is a very high stakes test; does all of this happen with the low stakes test too? And what we find with the low stakes test is there's a moderately significant URM effect, and that goes away entirely once we control for prior preparation. Actually, once you control for prior preparation on the low stakes assessment, the male URM students are doing better. And the effect for women, the female effect, is now plus .03, which is almost nothing.

Alright, now, are the stakes the only difference between these 2 tests? There are other differences between the 2 tests. We believe that the most important difference is the stakes, but the fact is the low-stakes test is also shorter, and it's given earlier in the semester, before students do their serious studying for the final exam. And so you could argue that maybe what's actually driving it is the timing of the test or the length of the test. We don't think that's true, but it could be true. Alright.

The third explanation that we're going to test is study behavior. What we wanted to see is: how do gender and race predict study behavior? The really interesting thing that comes out of this is, when we look at regular study hours and regress them on female, URM, and our same interaction, the main finding is that the female students study more. To anyone that's ever been in a classroom and looked around, that's not gonna come as a surprise. It's also true that they studied a lot more for the final exam.

We didn't observe much in the way of a gender gap, even though the female students are studying more. Why aren't they doing better than the male students? And that's a great question, and I don't know the answer to that. It tells me that there's still something going on. But when we look at gaps for URM students and specifically gaps for male versus female URM students, we don't see really any difference to speak of.

Ashley Lait: I just wonder whether you collected any information on whether anyone's doing part-time work alongside their study.

Douglas McKee: So this is Cornell. Very few students are doing part-time work, and the students that are, are doing it through their financial aid. It would be nice to have that measure. We don't have it, but I'm not sure how I would interpret it.

Alright, those are all the numbers I want to show you. So to summarize: we're seeing large, significant gaps for male URM students relative to non-URM male students, and among female students, URM students are doing significantly worse than non-URM female students. It seems that prior preparation can explain the whole gap for male students, but it only explains about half the gap for female students. And we do have evidence consistent with stereotype threat, because we don't see those same gaps when we look at a low stakes test given at the end of the semester. I should also say that if you look at the correlation between the two exams, it's very high. So they're both measuring very similar things, but at different times and in different contexts.

Carlos Cortinhas: I'm just wondering about the entry criteria for these URM students. Are they more or less the same as for non-URM students, or do they get bonuses, let's say, and get in with slightly lower entry grades?

Douglas McKee: So that's a really loaded question. The answer is, I believe, and I'm not speaking as a representative of the University here, that the admission standards are different for URM and non-URM students. I will also say, as a representative of the University, that Cornell is abiding by the recent Supreme Court decision of just a few weeks ago, which says that we are not allowed to condition admission separately based on race. So we are abiding by the law. But yeah, I do think that in practice the preparation of the students is pretty different.

Carlos Cortinhas: Now, I don't want to catch you out or anything. Obviously we've got something similar here in England; we've got targets to meet. But my question was more: sometimes students from those backgrounds get additional help once they're in, to ensure that the level comes up, and I was just wondering if they get access to additional support.

Douglas McKee: So that's a great question too. The answer is yes, we do have several programs at Cornell that provide extra support for underrepresented groups. We have no quantitative evidence of effectiveness, although I have a paper with George Orlov where we evaluate a supplemental course: a one-credit course that you can take alongside our introductory microeconomics course that gives you extra support in problem solving and math. That's mostly taken by underrepresented minority students, although not entirely, and it actually does seem fairly effective.

So maybe these gaps would be even bigger otherwise; we don't know. And then study behavior: there are differences, but the differences don't explain the gaps. Alright. So what's the problem with this study? Well, first of all, the sample size is small. Well, we don't have to dwell on limitations. Yeah, Amira.

Amira Elasra: You mentioned the effect of role models as one of your possible mechanisms or explanations, although you only have one instructor, really. So how are you going to capture that, exactly?

Douglas McKee: Oh, we're not, in this paper.

Amira Elasra: I'm just saying it's a possible explanation.

Douglas McKee: But we can't evaluate it here. I mean, I've thought about trying to dress differently for class and seeing if that made a difference, but I don't think that approach would work very well. Could student evaluations, for example, be used to get at how students benefit from a type of instruction or format, or how well they perceive the tutor to have prepared? I think there are 2 problems with using course evaluation data to get at it. One is, I think it's hard to code. You'd have to come up with pretty reliable, repeatable rules for how to code what people said about how the gender or race of the instructor or their teaching assistants affected behavior. And then you'd also have to trust that people were explicitly aware of what was going on, and my feeling is that a lot of what happens is implicit. It's not that people go to class and say, "that's a white male; I feel consciously not inspired." But if I was a black female instructor, and there were black female students in the class, they might be more inspired but not recognize that that was the issue.

I mean, what I would much rather see is variation in instructor race and gender, and then see if, in fact, these gaps go away in response. Now, anecdotally, in data that I have looked at but have not published on, my feeling is that those effects are small. What matters much more is the behavior of the faculty: how inclusive the classroom is, and, I actually think this is really important, what substantive examples are used during the teaching. So if I am a female faculty member of color and I stand up and I'm teaching econometrics, and all my data comes from the stock market or mergers and acquisitions of aircraft manufacturing firms, I don't think that's going to be terribly inspiring. Whereas a white male instructor who talks about a wide range of substantive examples, including examples from healthcare, from development economics, from labor economics: I think that's gonna make a much bigger difference than the race or gender of the instructor. I have a little bit of data to back that up, but it hasn't gone through peer review.

So I'm going to go back to limitations. The big one is that the sample size is too small. There's only so much we can say when we only have 300 students, and honestly, this is the biggest red flag on the whole study, because we've only got 22 female URM students. It's also really hard to generalize. We've got the one institution, the one instructor, the one way of teaching the class, the one content. Everything we're observing here might only happen in this context. We don't know.

So how do we fix it? Collect more data. We need lots of contexts: we want to do this at lots of institutions with lots of instructors, and we want lots of students. And then we want to redo the analysis. If we do that, we get nice, precise, generalizable results, and we can see how the institutional characteristics and the pedagogy characteristics are correlated with these gaps. Now, this criticism and this solution could apply to, I would say, more than half the empirical papers published in economics education. Probably much more than half, actually. And so we need some way to get past this. And here's where I say: I have a way.

So over the summer, some colleagues at Cornell and outside Cornell decided to build something called the Economic Education Network for Experiments, which we call EENE. It's a set of instructors all over the world that cooperate to run synchronized experiments in the classroom. They can be actual randomized experiments or observational studies. We agree on research questions: for example, I wonder if the race gaps we observe differ between male and female students, and I wonder why they exist. We come up with study protocols. So maybe we want to know how important it is to give a speech at the beginning of the term where you lay out explicitly that there are no wrong answers in class and no one should be afraid of giving a wrong answer, and to adopt behaviors that encourage student participation: how do those actually affect behavior? We're going to standardize treatments and protocols, and then we're gonna collect comparable data, agreeing on measures that can be used in lots of different contexts. We're gonna pool our samples and analyze the data. We've applied for National Science Foundation funding, literally as of yesterday, for building this network and running our first study.
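
A pooled analysis of the kind described might, as a rough sketch, use site fixed effects to absorb institution-level differences and cluster standard errors by site. Everything here (names, file, specification) is a hypothetical illustration, not an EENE protocol:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pooled data: one row per student, many institutions
pooled = pd.read_csv("pooled_students.csv")

model = smf.ols(
    # C(institution) adds institution fixed effects
    "z_score ~ female * urm + gpa + C(institution)",
    data=pooled,
).fit(cov_type="cluster", cov_kwds={"groups": pooled["institution"]})

print(model.params[["urm", "female:urm"]])
```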

If you want to participate in this, you can join right here (https://eene.org/): there's a link where you can express interest. I'll tell you right now, it's gonna be hard to get rid of us once you're in. And once we actually have a true web portal off the ground, you'll be able to propose studies, look at ongoing studies, have discussions about this stuff, tell us what you're teaching and what you're interested in, and get matched up with appropriate studies. Okay. I can talk for a long time about this, why it's important, and how we're actually going about it, but I see it as a treatment for this problem, and we all have this problem.

So anyway, I'll stop there. We still have, I guess, 15 minutes for questions, so I'm happy to take any questions you might have about any of this.

Carlos Cortinhas: Well, I've got the obvious one, you know: the policy implications. I mean, what can be done to improve this? I think the result is probably similar in other places, but what can be done?

Douglas McKee: Well, I think there are two. The big one: I think prior preparation is really important, and so anything you can do to have extra support early in the semester for those students that are less prepared is gonna help them. That in itself has two parts. First, you have to identify who those students are, and after their first midterm exam is generally too late. We give this math test at the very beginning of the semester, and I think even here we probably don't do enough with that data. But then: reaching out specifically to the students that do badly on that test, giving them access to online modules, telling them exactly where the tutoring is, having a lot of office hours, and really pushing them to come to office hours and ask questions. I think that's a big part of it.

The other piece is the stereotype threat, and that, I think, is hard. We don't have a lot of evidence for treatments that work. I mean, you can reduce the stakes, you can spread them out. There's actually a fair amount of research that says it's better to just give grades based on low stakes assessments that are given throughout the semester. At some institutions you have the flexibility to do that; at others you don't. But I think that's probably the big one: actually reducing the stakes of individual exams.

Carlos Cortinhas: Okay, thank you.

Tomek Kopczewski: Hello, a question for you, because you were talking about correlation. I wonder if there's a possibility to conduct causal inference procedures, because I am going to publish a paper about foreign languages and their influence on wages, and this approach is, in my opinion, very applicable to your data, I suppose, because it's a mixture of experimental and observational data.

Douglas McKee: So I think causal effects of race and gender are hard, even just philosophically, because when I talk to people who study causal effects, they say: well, what is a causal effect? It's "if I ran the experiment where I changed this variable, what would happen?" And it's hard to even know what it means to change someone's gender or race. On the other hand, I think experiments are critical for understanding the impact of treatments. Observationally, different people do different things in their classrooms. Some people have really inclusive classrooms; other people don't. But those classrooms differ on lots of other dimensions that are unmeasured. And that's a case where, if you can randomize who does what, it takes care of all of these unobserved differences that might be correlated with who's using the particular method. So I think these observational studies can only get you so far.

I'm really curious about the influence of... so you're saying foreign languages. So this is students whose native language is not English, but you're teaching in English?

Tomek Kopczewski: Yes, yes.

Douglas McKee: I mean, I think you can randomize support programs for those people. But I think you're stuck with observational studies if you want to know about differences in student outcomes between people that do and do not have English as a first language. So I'm really curious how you go about it.

Tomek Kopczewski: I have to say it's not about students; the question was about wages, and how one or two foreign languages influence wages. And it was my first attempt to use this Pearl approach and causal graphs, and it's quite a good way, how to say, to find possible sources of disturbance, and above all it helps with the instrumental variables approach.

Douglas McKee: So it is true that, in this exact applied econometrics class that I talked about today, most of the class is actually about how to get causal effects out of observational data. There are lots of methods, and instrumental variables is one. It's actually, I think, one of the big reasons why economists have so much to contribute to education research: we have this really relevant toolbox. And so there are instruments for things like foreign languages that we can take advantage of. I think there's a lot of scope, when you can't randomize something, to randomize encouragements to do something and then use instrumental variables methods to get the causal effect of the thing that you're encouraging people to do.

So I've tried to do this with office hours attendance, where I want to know what the effect of office hours is on student outcomes. I can't randomize and say "you have to go to office hours, and you can't go to office hours," but I can randomize encouragements to go to office hours and then back out the causal effect. Things like that: I think there's a lot of scope for interesting studies by economists using those techniques.
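
As a toy illustration of that encouragement design (a simulation, not his actual study), a randomized nudge serves as an instrument for office-hours attendance, and the Wald ratio backs out the causal effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Randomized encouragement, e.g. a personal email inviting the student
encouraged = rng.integers(0, 2, n)

# Unobserved motivation drives both attendance and outcomes, so a naive
# regression of outcome on attendance would be biased
motivation = rng.normal(size=n)
attends = (0.9 * encouraged + motivation + rng.normal(size=n) > 0.5).astype(float)
score = 0.3 * attends + 0.6 * motivation + rng.normal(size=n)

# Wald / IV estimate: intention-to-treat effect divided by the first stage
itt = score[encouraged == 1].mean() - score[encouraged == 0].mean()
first_stage = attends[encouraged == 1].mean() - attends[encouraged == 0].mean()
print("IV estimate of the office-hours effect:", itt / first_stage)  # close to 0.3
```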

Amira Elasra: Actually, I was going to ask about the attendance element you just mentioned: whether you have information in your model about class or lecture attendance by both genders and the minority students, and how you could maybe use that to predict and explain. Yes, female students study a lot, but is their engagement and attendance in the classes, where they prepare for the exam, as high as other students'? The other question is, I don't know what is going on at Cornell, but do you have, for example, contextual offers, where you give students from certain socio-economic backgrounds offers of admission even though they don't meet, let's say, the academic or GPA criteria for entering that particular course? Is that happening in your school? Because then this could explain a little bit more of the prior preparation element. You include the GPA, but is there another layer to that, because some of the students are actually coming from certain backgrounds and have lower achievement entering university, and so that increases the gap?

Douglas McKee: I can't make any statements about admissions policies that Cornell has had in the past or will have in the future. But I think it would be reasonable for someone to believe that it's exactly what you describe: that the admission standards are lower for underrepresented students. I will also say that that's not legal anymore since the Supreme Court decision from a few weeks ago, and so that will change.

Now, maybe there's scope for exploiting this change in admissions policy moving forward. That would be pretty interesting. But I also think that admissions is really complicated, and now there's going to be a bigger focus on the diversity statements that students submit. So maybe in the end the same students might end up getting admitted, but on paper for quite different reasons. It's hard to know.

Now, going back to your first question about attendance: we don't actually have good attendance data for the students in these specific classes in these specific semesters. But since then we've gathered a lot of data about attendance, and what we know is that female students are way more likely to attend than male students, but that underrepresented minority students attend less than non-underrepresented minority students, and that could very well be part of the problem. Yeah.

Well, thank you all for coming; I really appreciate it. It was great to have this opportunity to share this paper, and I really appreciate the International Review of Economics Education for doing this, and for just doing it all: there's great work in IREE. Many pertinent questions! Yeah, much appreciated.

David McCausland: Alright, thank you. Thank you very much and take care.
