Some initial lessons from using ChatGPT and what I will tell my Macroeconomics students
In November 2022, OpenAI launched ChatGPT, an artificial intelligence chatbot that allows users to have a human-like conversation. The chatbot answers questions and can help with a range of tasks, such as composing emails and checking code, but also writing essays and helping students with homework. This created a big splash in the academic community, and universities are increasingly expressing their concerns about the impact of the technology on academic integrity (although many are also looking at how to make use of the tool to facilitate learning).
Given my interest in assessments, I recently spent some time interacting with ChatGPT, trying to understand how to use it, how it may affect my assessments, and how this new tool could affect students’ engagement with Macroeconomics. I started by asking ChatGPT various questions based on the module content for Macroeconomics 1, which I teach to Year 1 single honours economics students. I considered how I would mark the answers if they had been submitted by a student. I then considered how to explain the tool, its potential uses and its limitations to students.
While I do not claim to be an expert on this matter and am still on my own learning journey, I share here some of my initial learning points in the hope that you too can start your journey with this technology.
1. ChatGPT is very good at providing simple answers to simple questions
You need to register to use ChatGPT. Currently, it is free of charge, although it is not clear whether this will remain the case after the research and feedback collection phase. Once registered, I started by asking basic questions on macroeconomic topics, such as: What is GDP? What is inflation? ChatGPT is very useful if you want a quick explanation of a concept. For instance, I asked:
Question 1: What is GDP?
GPT Answer 1:
Gross domestic product (GDP) is a measure of the economic activity within a country, typically used to estimate the size and growth of its economy. It represents the value of all goods and services produced within a country in a given period of time, usually a year. It is often considered as an indicator of a country's standard of living and economic health.
I followed up with:
Question 2: How can we measure potential GDP and output gap?
GPT Answer 2:
Potential GDP is the level of GDP that would be achieved if all resources, including labor and capital, were being fully utilized. The output gap is the difference between actual GDP and potential GDP.
Measuring potential GDP is difficult, as it requires estimates of the natural rate of unemployment (the rate of unemployment that would exist if the economy were at potential GDP) and the level of output that corresponds to that natural rate of unemployment.
One way to estimate potential GDP is to use a statistical model called a production function, which describes the relationship between inputs (labor and capital) and output (GDP). By estimating the production function and assuming that the economy is at its natural rate of unemployment, we can estimate potential GDP.
Another way to estimate potential GDP is to use a trend-line analysis. This means drawing a line through the historical data of GDP and assuming that the economy will grow along that trend line in the future.
The output gap is calculated by taking the difference between actual GDP and potential GDP. A positive output gap means that the economy is operating above potential GDP and may be in danger of inflation. A negative output gap means that the economy is operating below potential GDP and may be in danger of recession.
It is important to note that these measures are estimates and are subject to uncertainty and revision.
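As a rough illustration of the trend-line approach and the output gap calculation described in the answer above, here is a minimal Python sketch. The GDP figures are hypothetical, made up purely for illustration, and the function names (`fit_log_linear_trend`, `trend_potential_gdp`, `output_gap_percent`) are illustrative rather than taken from any library. Fitting a straight line to the log of GDP, and expressing the gap as a percentage of potential GDP, are assumptions made here for simplicity; this is one simple way to draw such a trend, not a definitive method.

```python
import math

# Hypothetical annual real GDP index values (illustration only, not real data).
years = list(range(2010, 2020))
actual_gdp = [100.0, 102.0, 103.5, 106.0, 109.0, 111.5, 113.0, 116.0, 118.0, 119.0]


def fit_log_linear_trend(gdp):
    """Least-squares fit of log(GDP) on a time index 0..n-1.

    Returns (intercept, slope); the slope approximates the trend growth rate.
    """
    n = len(gdp)
    t = list(range(n))
    log_gdp = [math.log(value) for value in gdp]
    t_mean = sum(t) / n
    y_mean = sum(log_gdp) / n
    covariance = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, log_gdp))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


def trend_potential_gdp(gdp):
    """Use the fitted trend values as a crude estimate of potential GDP."""
    intercept, slope = fit_log_linear_trend(gdp)
    return [math.exp(intercept + slope * i) for i in range(len(gdp))]


def output_gap_percent(actual, potential):
    """Output gap as a percentage of potential GDP: positive means above potential."""
    return [100.0 * (a - p) / p for a, p in zip(actual, potential)]


potential_gdp = trend_potential_gdp(actual_gdp)
gap = output_gap_percent(actual_gdp, potential_gdp)

for year, g in zip(years, gap):
    print(f"{year}: output gap {g:+.2f}% of potential GDP")
```

Because the trend is fitted to the same sample it is then compared with, the estimated gaps are positive in some years and negative in others by construction. This is one reason why, as the answer notes, such measures are uncertain estimates subject to revision.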
I was satisfied with the answers in the sense that, if I asked these questions in an exam and received these answers, I would believe they were written by a student and give a good mark. In general, any question about definitions that relies simply on recalling information can easily be answered by the chatbot, and it would be very difficult for me to recognise that the answer was not written by an actual human being.
Therefore, questions that merely require students to recall basic information, such as Question 1 or 2, may not be good assessment questions. However, even without Artificial Intelligence (AI), students can easily find this information with a quick internet search, or by opening any macroeconomics book if they are sitting a remote examination. I would add that asking this type of question is not a good idea even in an invigilated exam. These questions encourage students to memorise information for the exam, rather than aiming to fully understand the concepts and apply knowledge. If we want students to engage with the subject, we need to create assessments that require more than memorisation of basic concepts.
2. Ask questions related to contemporary issues
Continuing my conversation with the chatbot, I started asking more specific questions based on the material in my syllabus. For instance, I asked:
Question 3: What do you expect to be output gap in the next years in the UK and the Euro area? Explain your assumptions.
GPT Answer 3:
As a language model, I do not have the ability to predict future events or make assumptions about the economy. However, I can tell you that the output gap is affected by a variety of factors, such as economic growth, unemployment, and inflation, and can change over time.
In the UK, the outlook for the economy depends on a range of factors, including the ongoing COVID-19 pandemic and its impact on the economy, the terms of the UK's exit from the European Union, and broader global economic conditions.
In the Euro area, the economic outlook is also uncertain. The COVID-19 pandemic has had a significant impact on the economy, with many countries experiencing sharp contractions in GDP. The recovery has been uneven across the region, with some countries experiencing a faster rebound than others. Additionally, the Euro area faces ongoing challenges such as high unemployment, slow productivity growth and public debt.
I recommend following the developments and analysis of the leading Economic Institutions and experts, they will provide more updated and accurate predictions and assumptions.
Here we start observing some limitations of the chatbot. The first is its inability to use the information available to form an educated opinion (as we can read in the first lines of the answer). The rest of the answer shows a good ability to put together information on general aspects. However, I asked this question in January 2023, when we had far more information about the efficacy of the vaccine roll-out in various countries and the outcome of the Brexit negotiations, and could therefore say more about this. I would have expected any student to be able to elaborate more on these points. The reason for this is that the chatbot’s ‘knowledge’ cut-off is some time in 2021, as stated among the limitations when opening ChatGPT (see Figure 1). Of course, we can expect this to be updated soon, but for now it is a limitation. If I had to mark this answer, I would settle on a mark in the mid 50s (lower if this were a Year 2 or 3 module), and the feedback would emphasise that the answer did not fully address the question asked. There are also no sources cited, but we will come back to this in the next point.
Figure 1: Opening screen of ChatGPT
Notice that another recognised limitation is that it may generate incorrect information. On very few occasions, the answers provided to me were incorrect. For instance, when talking about the 3-equation model (more about this below), one of the iterations stated that there was no graphical representation for it, which is incorrect. This was corrected after re-asking the question, but someone less familiar with the topic (e.g., a student who has not studied enough for the assessment) may not be able to spot incorrect information.
3. Make what I teach relevant for the assessment!
I have previously commented on the importance of ‘Google-ing’ your questions when setting up an assessment, especially if this is to be taken remotely (Paredes-Fuentes, 2020). I was quite interested in checking how the chatbot would answer my exam questions, so I asked ChatGPT to answer questions from a past exam.
Question 4 is a short-answer question, marked out of 10, from a past exam. I usually provide students with a word limit for each question. As this exam is taken remotely, students can consult textbooks, lecture material and the internet more generally. The word limit drives students to be selective with the information they want to include and minimises merely copying and pasting from other resources. More prepared students are likely to select better material for a more focused and relevant answer.
Question 4 (from a real exam question): At the monetary policy meeting in February 2021, the central bank of Mexico cut its policy rate for the first time since September in an effort to stimulate the economy. However, after the United States’ government announced in March 2021 the introduction of a new financial stimulus package to boost the US economy, the Governor of the Mexican central bank announced that the board would consider the impact of the US fiscal stimulus in the Mexican economy, and whether it is necessary to change their monetary response. Explain how a US fiscal stimulus may affect the Mexican economy and why the central bank may respond to it in less than 100 words.
GPT Answer 4:
A US fiscal stimulus, such as a large-scale spending package or tax cuts, can affect the Mexican economy in several ways. It can lead to increased demand for Mexican exports, as US consumers have more money to spend. This in turn can lead to increased economic activity and job creation in Mexico. However, a US stimulus can also lead to inflationary pressures in Mexico, as the demand for goods and services rises, pushing up prices. The central bank may respond by adjusting its monetary policy, such as raising interest rates, to keep inflation in check.
One thing to notice is that the answer is 129 words. I followed up by asking it to shorten the answer to 100 words, to which the chatbot replied: “I apologize for not being able to keep my answer under 100 words, I will try to be more concise in the future. To summarize, a US fiscal stimulus can increase demand for Mexican exports and create jobs, but it can also cause inflationary pressures in Mexico. The central bank may respond by adjusting its monetary policy, such as raising interest rates, to keep inflation in check.”
Both answers are satisfactory enough. If a student submitted this, I would appreciate the effort to explain the various channels through which changes in fiscal policy in a main trade partner affect the domestic economy and potentially influence its monetary policy decisions. The answer, however, is very general and does not take into account the specific time period in which this decision was made (February 2021). A very good student would have at least mentioned this. Even so, this answer could potentially be awarded a good mark in the low/mid 60s, given that students have limited time to answer it. Now that the chatbot has made this task extremely easy, we may need to reconsider the design of this type of question for future assessments.
Question 5 is a long-answer question from a past exam. Again, I asked the chatbot to keep within a limit: this time 300 words. However, the answer given is 373 words; this would be considered well above the word limit and penalised, but as before, we can ask the chatbot to reduce the word count. Setting this point aside, the answer is very general and, again, does not take into account the context provided. During the course, we work on various exercises that require students to make use of the 3-equation model to explain contemporary macroeconomic events and policy responses. They are exposed to these exercises through the weekly problem sets discussed in small groups. Therefore, all students engaged with the module should be aware of what is required to answer questions like Question 5. Students who use this answer in their assessment would not receive a good mark. Engaging with the lectures and problem sets provided during the course is essential for providing a good answer.
Question 5 (adapted from a real exam question): In the Financial Times’ article “Fed signals no rate rise until at least 2024 despite growth upgrade” (17 March 2021), we can read:
“Federal Reserve officials signalled that they expect to keep interest rates close to zero until at least 2024, even as they sharply upgraded their US growth forecasts because of a massive fiscal stimulus and an accelerating vaccine rollout.
[...] The upgrades to the forecasts from Fed officials were significant: whereas in December they predicted 4.2 per cent growth this year, that estimate was bumped up to 6.5 per cent, which would be the fastest economic expansion since 1984.
Meanwhile, the unemployment rate is now forecast to fall to 4.5 per cent by the end of the year instead of 5 per cent. [...] inflation [...], is expected to rise to 2.2 per cent and above the central bank’s 2 per cent target, compared with the smaller rise to 1.8 per cent predicted in December.
[...] The Fed has pledged to maintain rock-bottom interest rates until the US reaches full employment, with inflation hitting 2 per cent and being on track to exceed that target.”
Represent the US economy in a three-equation model and explain, in not more than 300 words, how the fiscal stimulus affects the economy, the potential multiplier effect of this stimulus considering the US current situation, and the potential limitations (if any) of the US policy mix adopted.
GPT Answer 5:
A three-equation model to represent the US economy could include equations for GDP, inflation, and unemployment. The GDP equation would represent the relationship between spending (consumption, investment, government, and net exports) and GDP. The inflation equation would represent the relationship between inflation and various factors such as GDP growth, unemployment, and monetary policy. The unemployment equation would represent the relationship between unemployment and GDP growth, labor force participation, and other factors.
A fiscal stimulus, such as the one announced by the US government in March 2021, would likely have a positive impact on the economy. The stimulus package would increase government spending and consumer spending, which would in turn lead to increased economic activity and job creation. This would be reflected in the GDP equation as an increase in GDP and a decrease in unemployment.
The potential multiplier effect of the fiscal stimulus would depend on a number of factors, including the current state of the economy. In the US, the economy was recovering from the impacts of the COVID-19 pandemic, so the stimulus may have a larger multiplier effect than it would have if the economy were already close to full capacity. The multiplier effect refers to the effect of the stimulus on GDP. For example, an increase in government spending could lead to an increase in consumer spending, which in turn could lead to an increase in business investment and so on. This chain reaction could lead to an increase in GDP that is greater than the initial increase in government spending.
However, there are also potential limitations to the US policy mix adopted. One of the main limitations is that fiscal stimulus may cause inflationary pressures. As the Fed has forecasted, the inflation is expected to rise to 2.2 per cent and above the central bank’s 2 per cent target. The Fed has pledged to maintain rock-bottom interest rates until the US reaches full employment, with inflation hitting 2 per cent and being on track to exceed that target. Therefore, the Fed may need to raise interest rates to combat inflation, which could slow down economic growth and increase unemployment. Additionally, the increase in government debt that results from the stimulus may also have long-term negative effects on the economy.
I tried various ways of rewriting the question to make sure that the chatbot considered the context provided, but none of the answers I got was particularly satisfactory. We use the 3-equation model from Carlin and Soskice, but the answer provided does not consider this in detail, and of course does not provide the graphical representation. In fact, more complex or niche prompts seem to lower the quality of answers. More generally, there are problems with ChatGPT not recognising the limits of its own knowledge when providing answers, and even making up some answers (see Bowman, 2022).
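For readers less familiar with it, the 3-equation model mentioned above is commonly written along the following lines. The notation is an assumption here, following the usual textbook presentation, and may differ from the version used in the module:

```latex
\begin{align}
  y_t &= A - a\, r_{t-1}
      && \text{(IS curve)} \\
  \pi_t &= \pi_{t-1} + \alpha\,(y_t - y_e)
      && \text{(Phillips curve)} \\
  y_t - y_e &= -\alpha\beta\,(\pi_t - \pi^T)
      && \text{(monetary rule)}
\end{align}
```

Here $y_t$ is output, $y_e$ equilibrium output, $r_{t-1}$ the lagged real interest rate, $\pi_t$ inflation, $\pi^T$ the inflation target, $A$ autonomous demand, and $a$, $\alpha$, $\beta$ positive parameters. In this notation, a fiscal stimulus such as the one in Question 5 would typically enter as an increase in $A$, shifting the IS curve. This contrasts with the GDP, inflation and unemployment equations offered in GPT Answer 5.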
For questions similar to Question 5, students must provide sources for any cited work, but the chatbot is not able to provide sources for its answers (see Question 6). This is one of the main limitations of this tool. ChatGPT does not go on the internet, look at potential sources of information and summarise them for you (I initially thought this was the case). Its knowledge comes from the text it was trained on, including the model conversations provided by AI trainers, and it does not consult external sources when answering. Of course, this is under development, and we need to wait to see what happens in the future.
Question 6: can you provide sources for your answers?
GPT Answer 6:
I apologize, but as a language model, I don't have access to external sources. My answers are based on the information and knowledge that I was trained on and don't include external sources. However, the concepts and models that I've mentioned in my answers such as the IS-LM model, the Phillips curve, and the 3-equation model are well-established in macroeconomics. You can find more information on these models by consulting macroeconomics textbooks, academic journals, and online resources such as the websites of central banks and government statistical agencies.
In fact, Williams (2023) identifies current affairs, referencing, and critical analysis as the three main limitations of ChatGPT and invites instructors to include these in any essay prompt for the assessment to remain robust and reliable.
I would argue that, when teaching (macro)economics, a focus on how the module content helps students to understand current affairs, engage in critical analysis, and provide adequate and correct references should have been at the core of setting assessments even before the advent of ChatGPT or similar technologies. Of course, these now have to become a priority for our assessments (see also Stokel-Walker, 2022). After all, if we are setting assessments that can be easily completed to a very high standard by AI technology, we may not be doing a great job as educators (Sharples, 2022).
I am not so naïve as to believe that none of my students are going to use such technologies to try to make their way through Macroeconomics without putting in enough effort, perhaps engaging in poor academic practice. After all, essay mills exist and are a very lucrative business, and there are whole departments at universities dealing with plagiarism and academic misconduct. Nonetheless, I do believe that our teaching practice and assessment methods need to evolve and keep up with the world we live in. Artificial Intelligence tools are not going to disappear, and they are going to change the way we learn (and hopefully teach).
Question 7: Can you provide the sources for this answer?
GPT Answer 7:
ChatGPT is a language model that utilizes machine learning techniques to generate human-like text. Essentially, it is a type of artificial intelligence (AI) that is trained on a large dataset of text. During the training process, the model learns the patterns and structures of language, and is then able to generate new text based on the patterns it has learned.
One way to explain ChatGPT to undergraduate students would be to use the analogy of a student learning a new language. Just like a student, ChatGPT is exposed to a large amount of text in the language it is being trained on, and begins to understand the grammar, vocabulary, and structure of the language. As the student continues to learn and practice, they become better at understanding and producing the language. Similarly, as ChatGPT is exposed to more and more text, it becomes better at understanding and producing text in the language it has been trained on.
Another way to explain ChatGPT would be to break it down into two main components: the encoder and the decoder. The encoder takes the input text and converts it into a numerical representation that the model can understand. This numerical representation is then passed to the decoder, which uses it to generate the output text.
It's important to note that ChatGPT is a statistical model, it is not capable of understanding text like a human does. It lacks of common sense, world knowledge, and the ability to infer context. It works based on patterns and statistics, it is not able to make decisions or have personal opinions, it's only able to generate text based on the patterns it has learned during the training process.
4. A conversation with students about ChatGPT
Rather than “banning” the use of ChatGPT (as if we could actually do that; it would be like trying to ban Google!), let’s engage in a conversation with students about the limitations of this technology but also about ways to use it. Starting may feel daunting. How do we explain how ChatGPT works to undergraduate students? Wait… let’s make use of ChatGPT to help us!
Question 8: How to start a conversation with university students about the use of ChatGPT?
GPT Answer 8: Here are a few conversation starters that you can use to introduce the topic of ChatGPT to university students:
"Have you heard of ChatGPT or other language models like it? What are your thoughts on the use of these technologies in various fields?"
"What do you think are the potential benefits and drawbacks of using language models like ChatGPT in writing and research?"
"How do you think ChatGPT and other language models will impact the future of industries such as finance, healthcare, and customer service?"
"What ethical considerations should be taken into account when using language models like ChatGPT?"
"Have you ever used a language model like ChatGPT for a project or assignment? Can you share your experience and the results you achieved?"
By starting with these open-ended questions, you can encourage students to share their thoughts and ideas on the topic. It will be a good way to gauge the level of knowledge and interest of the students in the topic and to adjust the conversation accordingly.
I like the idea of starting with open-ended questions, but we should be prepared to go further and explain what this technology is. Sometimes we assume that students know everything about new technologies, but this one is so new that some may not have heard about it or used it. It may be good to demonstrate how it works. Question 9 asks the chatbot to explain how it works (there is almost no difference in the answer if you omit ‘intelligent’ from the question).
Question 9: How do you explain ChatGPT to very intelligent students at university?
GPT Answer 9: ChatGPT is a state-of-the-art language model developed by OpenAI. It is a deep learning model that uses a neural network architecture known as a transformer to generate human-like text. The model is trained on a large dataset of text and uses that knowledge to generate text that is similar to the input it has seen during its training.
ChatGPT is particularly useful for tasks that require natural language understanding and generation, such as language translation, text summarization, and question answering. Additionally, it can be fine-tuned for specific use-cases such as language-based customer service, content generation, and automated journalism.
One of the key advantages of ChatGPT over other language models is its ability to generate long-form text, such as articles or essays, and its ability to generate text that is contextually relevant.
To a very intelligent student, you can explain that ChatGPT is a powerful tool that can be used to automate a wide range of language-based tasks and have the potential to improve efficiency and accuracy in many industries. However, it is important to remember that ChatGPT is a machine learning model and its performance is only as good as the data it was trained on, and it may produce biased or incorrect results if the training data is not diverse and representative enough. Therefore, it is important to use ChatGPT and other similar models with caution and to interpret the results critically.
In the answer to Question 9, it is interesting to read the list of tasks that can be completed with ChatGPT, and it is perhaps worth exploring some of these and thinking about how the diverse students in your classroom can benefit from the different functionalities. For instance, students whose first language is not English can use the chatbot to improve their language skills and to check the grammar, vocabulary and punctuation of their writing.
However, in any conversation with students, we need to explain the many limitations of the tool. I asked ChatGPT about this:
Question 10: What are the main limitations of ChatGPT?
GPT Answer 10: ChatGPT, like other language models, has several limitations. Some of the main limitations include:
1. Bias: ChatGPT, like other language models, is trained on a large dataset of text, which may contain biases. This can lead to the model generating biased or unfair text.
2. Lack of common sense: ChatGPT is trained to generate text that is similar to the input it has seen during training, it lacks common sense and may not understand the context and meaning of the text it generates.
3. Lack of understanding: ChatGPT can generate text that is grammatically correct but doesn't make sense semantically or doesn't capture the intended meaning.
4. Data quality: The quality of the data that ChatGPT is trained on is crucial for its performance. If the data is not diverse and representative, the model may produce biased or incorrect results.
5. Privacy and security: ChatGPT may not be able to protect the privacy of the data it's trained on and the data it generates, which may pose a security risk.
6. Lack of creativity: ChatGPT can generate text that is similar to the input it has seen during training. It can not generate truly creative text without human input or guidance.
7. Requires large computational resources: Training and using large language models like ChatGPT requires a significant amount of computational resources such as memory and processing power, which can be expensive.
It is important to keep these limitations in mind when using ChatGPT and other similar models, and to interpret the results critically. It is also important to use the model in a responsible and ethical manner, taking into account the potential impact on individuals and society as a whole.
Some of the limitations are very worrying, and the potential harm from misuse of AI technology goes well beyond academic integrity. The spread of misinformation, privacy and security concerns, and ethical considerations are all real issues that should be discussed with students, and not only in economics.
Finally, no conversation can be based merely on ChatGPT answers! If you want to learn more about how GPT works, look at Hughes (2023) and the developing literature in this area. You may want to investigate what others are already doing to engage their classrooms in a constructive way. For instance, Sharples (2022) explains how AI can offer a very quick way to show students different ways to express their ideas. Instructors can ask students to critically analyse answers generated by ChatGPT and encourage them to write better versions.
Summing up, ChatGPT (or similar) is not going away and it may become as widely used as a search engine. Despite the current serious limitations of this technology, academia cannot avoid engaging with it, and teaching and assessments must adapt. Exploring the limitations of our current assessments, and investigating how the chatbot can be used to increase engagement with the module and course material and how students can benefit from using it, are good starting points.
Bowman, E. (2022) "A new AI chatbot might do your homework for you. But it’s still not an A+ student." NPR, 19 Dec 2022. Available at: https://www.npr.org/2022/12/19/1143912956/chatgpt-ai-chatbot-homework-academia, last accessed on 20 Jan 2023.
Hughes, A. (2023) "ChatGPT: Everything you need to know about OpenAI’s GPT-3 tool." BBC Science Focus. Available at: https://www.sciencefocus.com/future-technology/gpt-3/, last accessed on 20 Jan 2023.
Paredes-Fuentes, S. (2020) "Assessments in the Brave New World: A Reflection on 24/48h Assignments." Economics Network Ideas Bank, Sep 2020. Available at: https://doi.org/10.53593/n3326a
Stokel-Walker, C. (2022) "AI bot ChatGPT writes smart essays – should professors worry?" Nature, 9 Dec 2022. Available at: https://www.nature.com/articles/d41586-022-04397-7, last accessed on 20 Jan 2023. https://doi.org/10.1038/d41586-022-04397-7
Williams, C. (2023) "Hype, or the future of teaching and learning? 3 Limits to AI’s ability to write student essays." LSE Impact Blog, 12 Jan 2023. Available at: https://blogs.lse.ac.uk/impactofsocialsciences/2023/01/12/hype-or-the-future-of-teaching-and-learning-3-limits-to-ais-ability-to-write-students-essays/, last accessed on 20 Jan 2023.