- Suppose that you are interested in determining the effect of the number of hours worked on earnings. The sheet “Question 1” in the dataset “Midterm Data.xlsx” contains the earnings, working hours, and productivity of 40 workers.
wkearns: avg. weekly earnings
wkhours: avg. weekly hours
outphr: output per labor hour
ChatGPT Link:
- [3 Points] Using GPT 4o or GPT 4.5, estimate the naïve regression:
wkearns = 𝑚̃ ∗ wkhours + 𝑏̃ + 𝑒̃ (1)
Test for linearity and heteroscedasticity in the chat. Do not correct for any violations.
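For orientation, the kind of analysis the code interpreter might run for this step can be sketched as follows. This is a minimal sketch under stated assumptions, not the output of any actual chat: the data are simulated rather than taken from “Midterm Data.xlsx”, the function names are hypothetical, heteroscedasticity is checked with the studentized (Koenker) Breusch-Pagan statistic, and linearity is checked with an LM test that adds a squared term. ChatGPT may well choose different tests.

```python
import math
import numpy as np


def ols(X, y):
    """Least-squares fit; returns coefficients, residuals, and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / tss
    return beta, resid, r2


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def diagnose(x, y):
    """Fit y = b + m*x and run two LM diagnostics (each with 1 degree of freedom)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, resid, _ = ols(X, y)

    # Breusch-Pagan (Koenker form): regress squared residuals on the regressors.
    _, _, r2_bp = ols(X, resid ** 2)

    # Linearity check: regress residuals on the regressors plus x squared.
    X_sq = np.column_stack([X, x ** 2])
    _, _, r2_lin = ols(X_sq, resid)

    return {
        "intercept": beta[0],
        "slope": beta[1],
        "bp_pvalue": chi2_1_pvalue(n * r2_bp),
        "linearity_pvalue": chi2_1_pvalue(n * r2_lin),
    }


# Simulated stand-in for the 40 workers (NOT the real midterm data).
rng = np.random.default_rng(0)
hours = rng.uniform(30, 50, size=40)
earnings = 100 + 12 * hours + rng.normal(0, 25, size=40)
print(diagnose(hours, earnings))
```

A small p-value in either test suggests that the corresponding assumption is violated; a large one means the test found no evidence against it.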
- [4 Points] Assuming the estimates to be unbiased, interpret the estimates of 𝑚̃ and 𝑏̃, including the p-value. Does this result align with business intuition? Why or why not? [No AI allowed]
- [3 Points] Using GPT 4o or GPT 4.5, estimate the full regression:
wkearns = 𝑚1 ∗ wkhours + 𝑚2 ∗ outphr + 𝑏 + 𝑒 (2)
Test for linearity and heteroscedasticity. If either of these two assumptions is violated, instruct GPT to correct the model accordingly in the chat linked above. If there are no issues, proceed to question 1d.
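One common correction for heteroscedasticity is to keep the OLS coefficients and replace the standard errors with heteroscedasticity-robust ones. The sketch below illustrates that idea with HC1 (White) standard errors. It is an assumption about what a correction could look like, not the method this course or ChatGPT necessarily uses; other corrections, such as transforming a variable or weighted least squares, are also possible. The data are simulated, the coefficient values are made up, and the function name is hypothetical.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS coefficients with classical and HC1 robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical standard errors assume a constant error variance.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

    # HC1 "sandwich" estimator: each observation is weighted by its own
    # squared residual, with a small-sample factor n / (n - k).
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - k)
    se_robust = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_robust


# Simulated stand-in for the worker data (NOT the real midterm data).
rng = np.random.default_rng(0)
n = 40
wkhours = rng.uniform(30, 50, n)
outphr = rng.uniform(10, 30, n)
noise = rng.normal(0, 1, n) * outphr  # error spread grows with outphr
wkearns = 50 + 5 * wkhours + 20 * outphr + noise

X = np.column_stack([np.ones(n), wkhours, outphr])
beta, se_classical, se_robust = ols_with_robust_se(X, wkearns)
for name, b, s0, s1 in zip(["b", "m1", "m2"], beta, se_classical, se_robust):
    print(f"{name}: estimate={b:.3f}  classical SE={s0:.3f}  robust SE={s1:.3f}")
```

The coefficient estimates are identical under both approaches; only the standard errors, and therefore the p-values, change.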
- [4 Points] Interpret the estimates of 𝑚1 and 𝑚2, including the p-value. [No AI allowed]
- [4 Points] Why do the estimates of 𝑚̃ and 𝑚1 from regressions (1) and (2) differ so drastically? What ‘story’ explains this divergence in the marginal effects of working hours on salary? [No AI allowed]
- Suppose you are interested in estimating the effect of household income on children’s birth weight. The sheet “Question 2” in the dataset “Midterm Data.xlsx” contains the family income, the child’s birth weight, and the daily cigarette consumption of the mother.
Family Income: household income
Birth Weight: the birth weight of the child
Daily Consumption of Cigarettes (Mother): the number of cigarettes consumed by the mother while pregnant each day
ChatGPT Link:
- [3 Points] Using GPT 4o or GPT 4.5, estimate the naïve regression:
Birth Weight = 𝑚̃ ∗ Family Income + 𝑏̃ + 𝑒̃ (3)
Test for linearity and heteroscedasticity in the chat. Do not correct any violations.
- [4 Points] Assuming the estimates to be unbiased, interpret the estimates of 𝑚̃ and 𝑏̃, including the p-value. [No AI allowed]
- [4 Points] Why might the daily consumption of cigarettes by the mother need to be included in the regression? When Daily Consumption of Cigarettes (Mother) is omitted, would regression (3) over- or underestimate the effect of income on birth weight, and why would that be the case?
- [3 Points] Using GPT 4o or GPT 4.5, estimate the full regression:
Birth Weight = 𝑚1 ∗ Family Income + 𝑚2 ∗ Daily Consumption of Cigarettes (Mother) + 𝑏 + 𝑒 (4)
Test for linearity and heteroscedasticity. If either of these two assumptions is violated, instruct GPT to correct the model accordingly in the chat linked above. If there are no issues, proceed to the next question.
- [4 Points] Interpret the estimates of 𝑚1 and 𝑚2, including the p-value. [No AI allowed]
- [2 Points] If we are only interested in estimating the effect of household income on birth weight, what are some other potential omitted variables from regression (4)? Would the proposed omitted variable cause regression (4) to over- or underestimate the effect of income on birth weight? Why is that?
- ChatGPT uses a Python code interpreter to conduct the analysis necessary for this course. As I said before, Python is a free open-source programming language with data analysis capabilities that far exceed the capabilities of Excel. While learning to program isn’t a requirement for this course, it is important to know how to leverage the capabilities of ChatGPT’s data analysis features.
For this question, have ChatGPT do something that is practically infeasible with existing Microsoft Office programs and provide the link. The points that you receive for your response to this question depend on how impressive the task is.
For example, I instructed ChatGPT to create 1,000 Excel spreadsheets, each containing 1,000 coin flips, in the chat link here. I uploaded the 1,000 Excel spreadsheets to Google Docs, which can be found here.
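To give a sense of how little code a task like this takes, here is a minimal sketch of the coin-flip example. It is not the code from the linked chat: the function name and file names are hypothetical, and it writes CSV files (which Excel can open) rather than .xlsx workbooks so that it runs with the Python standard library alone.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_coin_flip_files(out_dir, n_files=1000, n_flips=1000, seed=0):
    """Write n_files CSV files, each holding n_flips simulated coin flips."""
    rng = random.Random(seed)  # fixed seed so the flips are reproducible
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, n_files + 1):
        path = out_dir / f"coin_flips_{i:04d}.csv"
        flips = rng.choices(["H", "T"], k=n_flips)
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["flip", "result"])
            # Each row is (flip number, "H" or "T").
            writer.writerows(enumerate(flips, start=1))
        paths.append(path)
    return paths


# Demo: write into a temporary folder that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    files = write_coin_flip_files(tmp)
    print(len(files), "files written")
```

Doing the same by hand in Excel would mean creating and filling a thousand workbooks one at a time, which is what makes the task practically infeasible there.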
ChatGPT Link:
- [2 Points] What did you have ChatGPT do?