CHAPTER 12

Conclusion: Research Project and Pitfalls

Congratulations! You have reached the last chapter on your initial journey into the fascinating world of econometrics. At this time, we are all talking about how to prepare for our final research paper. This is exciting but also challenging. Taila and Invo both say that they have never written a full econometric paper in their lives.

Prof. Metric tells them that there is nothing to worry about, because this class focuses on empirical study, which is less challenging than theoretical research but equally fun to do. He says that by finishing this last chapter, we will be able to:

1. Master the steps of writing an empirical paper.

2. Collect a dataset and process it for estimations.

3. Avoid the pitfalls in applied econometrics.

Assured that Professors Metric and Empirie will guide us through every step of our research projects, we eagerly look forward to learning how to write a good one.

Writing an Empirical Research Paper

Prof. Metric says that, whether a paper is theoretical or empirical, we will have the passion and desire to pursue the research if we choose a topic that addresses a burning question, one that we very much wish to answer and that connects to a theoretical concept we learned in Economics for Daily Lives. Being able to raise a good question is crucial in sustaining our interest because, once we can raise one question, we will be able to raise many subsequent questions after the first one is solved.

There is no limit on the number of pages of a theoretical paper. In fact, some theoretical papers are only 3 pages long and still very valuable to the professional community, but an empirical paper is often longer and rarely shorter than 10 pages. That is because professional researchers usually expect an empirical paper to follow a certain format so that the research, methods, and results can be replicated by other researchers. We learn that an empirical paper is an applied project and can be divided into five or six sections, excluding the abstract and the references. If we search for a scientific work on the Internet, we will see that the main paper is always published separately from the abstract and the references, which are often free to all readers on a journal’s website.

Writing an Abstract

Once we choose a topic, we need to write an abstract, which is a statement of the problem at hand and which is usually between 100 and 300 words. An abstract should have the following information:

  (i) The question we wish to answer.

 (ii) The economic concepts related to this question.

(iii) The econometric model we choose to carry out the research.

(iv) The data used.

 (v) The results.

(vi) The potential contribution to the existing literature.

It’s important to note that the abstract will help us to focus our thoughts on what we really want to investigate and help us to write a well-organized paper.

Writing an Introduction

Prof. Metric tells us that we should be able to provide a clear description of our topic in the first paragraph of the introduction section. We want to provide some interesting background about our topic and how it relates to an economic concept that we learned. Using the Internet, newspaper, or magazine sources to motivate the readers is acceptable in this section.

In the second paragraph, we need to explain why the topic is challenging and why it is important to answer the question. In this paragraph, we want to attract the reader’s interest by providing further arguments for the relevance of the topic. In order to sustain the reader’s interest, we might want to avoid technical terms or jargon and focus on verbal explanations of why the topic is interesting. For example, is there a puzzle in which the existing empirical results contradict a stipulated theory? What is the main contribution of our paper to the existing literature? Will the results help foster a certain policy? Being able to state the problem well will get the readers to care about our research and wish to continue to read.

In the next paragraphs, we need to provide a brief verbal description of the econometric model, estimation method, and the data used in our project. It is also helpful to provide one or two sentences about how the paper will be organized in the remaining sections.

Literature Review

While conducting research, you might notice that other researchers have already examined some aspect of your topic. For this reason, we should provide a summary of the existing literature related to our paper. This section should be placed after the introduction but before the section on our econometric model. For an empirical paper, this section is typically two to four pages long.

Furthermore, while researching your topic, it is necessary to only survey the papers directly related to the topic. It is better to write a brief description of each paper, instead of quoting them directly, so that we can focus on pointing out the strengths and the weaknesses of the existing papers. If you can find any contradictory results in the previously published papers, this is very beneficial because it justifies the reason we want to pursue the project. Prof. Metric reminds us to highlight how our own paper can shed light on this contradiction and extend the existing knowledge of the topic.

We should avoid using the Internet, newspaper, or magazine sources in this section because their comments are usually based on a writer’s subjective opinions instead of scientific research. For this reason, we should stay with papers published in scholarly journals. There are several sources for finding the scholarly papers:

Once you find a related paper in a journal, you can go to a public library or a university library either to find the paper or to request an interlibrary loan for free. Alternatively, you can join the American Economic Association, which charges students a very affordable fee and gives them free access to all journals in EconLit.

Econometric Model

Prof. Metric reminds us that our econometric model is always related to an economic theory that we might have learned in an economics class. We should be able to explain the theory behind our econometric model clearly and concisely. This is what distinguishes an econometric research project from other statistical research that focuses only on the statistical fit among variables. Economic theory should be the most important factor that guides our selection of dependent and explanatory variables.

Once we finish analyzing an economic framework and pointing out how it is related to our econometric model, we can use common sense and the results of the existing literature to add more control variables to our list of explanatory variables. We should be able to justify our functional form, such as logarithmic or quadratic. A combination of scatter plots, intuition, and discussion of prior research is very helpful in choosing a functional form. If there are competing theories related to our topic, it will be interesting to estimate two or more models instead of one so that we can have fun comparing and contrasting the results.

Estimation Method and Data

We need to state certain assumptions in order to decide on our estimation method. Most of the time, this requires us to perform many preliminary tests before making this decision. For example, if a preliminary test reveals that the classical assumption of homoscedasticity is violated, we will need to perform a Generalized Least Squares (GLS) estimation instead of an Ordinary Least Squares (OLS) estimation. If a modified Hausman test also shows that there is an endogeneity problem, we need to combine this GLS estimation with a Two-Stage Least Squares (2SLS) procedure and thus need to launch a search for an instrumental variable and its corresponding data.

Since finding an appropriate method is usually related to searching for new data, it makes sense to combine the section on the estimation method with the section on the data. Prof. Metric says that many researchers love to start with an OLS estimation to obtain basic results for comparison no matter how many classical assumptions are violated.
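For readers who work outside Excel, the preliminary check for heteroscedasticity can be sketched in Python. The text does not name a specific test, so this sketch assumes the Breusch-Pagan test in Koenker's studentized form for a model with a single explanatory variable. The statistic is the number of observations times the R-squared from regressing the squared OLS residuals on the explanatory variable, and it is compared with the 5% chi-square critical value with one degree of freedom (3.841). The function names and the data are made up for illustration only.

```python
def ols_simple(x, y):
    """OLS of y on a constant and one regressor x; returns intercept, slope, residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, residuals


def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    _, _, residuals = ols_simple(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """Koenker's LM statistic: n * R-squared of squared residuals on x."""
    _, _, residuals = ols_simple(x, y)
    squared = [e ** 2 for e in residuals]
    return len(x) * r_squared(x, squared)


# Hypothetical data: the error spread grows with x, so homoscedasticity fails.
x = list(range(1, 21))
y = [2 + 3 * xi + 0.5 * xi * (-1) ** xi for xi in x]

CHI2_CRITICAL_5PCT_DF1 = 3.841  # 5% critical value, chi-square with 1 degree of freedom
lm = breusch_pagan_lm(x, y)
reject_homoscedasticity = lm > CHI2_CRITICAL_5PCT_DF1
print(f"LM statistic: {lm:.3f}, reject homoscedasticity: {reject_homoscedasticity}")
```

With more than one explanatory variable, the same idea applies, but the auxiliary regression and the degrees of freedom must include every regressor, which is easier with a dedicated statistics package.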

Concerning the data, we need to describe the source and the content of the dataset and justify its use. We then explain how we acquired the data, the nature of the data (cross-sectional, time-series, or panel data), the time period, the frequency (quarterly, annual, etc.), the units, the number of observations, and whether any observations were eliminated and why. Provide a summary of descriptive statistics such as the means, standard deviations, and so on, so that we can show the readers that the dataset is reliable for our estimations. Graphs, charts, and maps of the related regions are also helpful in sustaining the reader’s interest in our paper.
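The summary of descriptive statistics can be produced in Excel, but as a minimal sketch it can also be computed in Python with the standard library. The variable name and the numbers below are hypothetical and serve only to show the shape of the output.

```python
import statistics


def describe(series):
    """Return basic descriptive statistics for one variable."""
    return {
        "n": len(series),
        "mean": statistics.mean(series),
        "std_dev": statistics.stdev(series),  # sample standard deviation
        "min": min(series),
        "max": max(series),
    }


# Hypothetical variable: annual income in thousands of dollars.
income = [42.0, 55.5, 38.2, 61.0, 47.3]
summary = describe(income)
for name, value in summary.items():
    print(f"{name:8s} {value:10.3f}")
```

Reporting the minimum and maximum alongside the mean and standard deviation makes it easier to spot implausible values before running any regression.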

Presenting and Discussing the Results

Prof. Metric says that this section is the highlight of our paper, so it is important to make sure that it is well organized. In a professional paper, we should avoid reporting the results by displaying Excel’s summary output, which was shown in this textbook for learning purposes only. Summarize our results into a table or several tables, reporting the coefficient estimates and the test statistics in each table.

Be very thoughtful in deciding how to combine several estimations into a table. For example, if we perform OLS, GLS, and 2SLS estimations on the same model, placing the results from all three estimations into a single table is better than using three separate tables. This enables the readers to compare and contrast the results and inspires them to read our analysis. Use multiple tables when we change data, from regional to provincial data in a country for example, so that we can compare the regional effect versus the aggregate effect. We also want to use a different table if we change the models, from a model with labor and capital as explanatory variables to a model in per-worker form for example, so that the readers can follow our analysis with ease.

Prof. Metric emphasizes that providing a clear interpretation of our results is crucial in keeping the readers interested in the paper. Three elements, the direction, the statistical significance, and the magnitude of the coefficient estimates, are the most important in our discussion of the results. Point out if a positive or negative direction fits our expectation and why. Explain why some variables are statistically significant or insignificant. Provide an interpretation for the magnitude of the coefficient estimates based on the unit of each variable and the functional form (logarithm or level, etc.).

Conclusion

We should focus on three issues when writing the conclusion. The first is to summarize the results into two or three short sentences, so that the readers can wrap up the main points of the paper. The second is to relate the results with the research question, the economic theory, and the existing literature presented in the introduction section to show the readers that we have achieved the main purpose of the paper. Finally, present the caveats and the suggestions for further research based on these caveats. This is to inspire and open up future opportunities for the next generation of researchers to continue with the topic of our interest.

References

This section does not belong to the main paper and contains a list of the existing papers we cited and the data sources we used in our main paper. The list is in alphabetical order, usually by last name, with the exception of authors from several Asian countries such as China, Korea, Vietnam, and so on, in which the family name comes before the given name. Make sure that you list all articles and books cited in your paper and delete any that are not cited in your research.

To wrap up this section, Prof. Metric reminds us that a journal always uses a certain citation style, such as Chicago or Harvard style. Hence, we need to read the Guidelines for Authors section on the journal website carefully and follow its instructions before submitting our paper for publication consideration.

Empirical Issues in Econometrics

Prof. Empirie says that there are two important issues in applied econometrics. The first is the data and the second is the pitfalls in empirical research. We are going to discuss these two issues in the next two sections.

Data Issue

A dataset is the main ingredient in econometric analyses and needs a great deal of attention so that the results of our estimations are reliable.

Data Sources

There are various sites on the Internet that provide economic data for analyses. The most common sources are listed below:

For U.S. macroeconomic data

The Federal Reserve Bank of St. Louis compiles United States and international time series data at https://fred.stlouisfed.org/

The Bureau of Labor Statistics hosts data on the U.S. labor force, unemployment, and consumer prices at https://bls.gov/

The Bureau of Economic Analysis (BEA) provides data on GDP, personal income, and corporate profits at the national and state levels at https://bea.gov/

The National Bureau of Economic Research (NBER) publishes data on the U.S. economy, industry, and international trade at http://nber.org/data/

For U.S. microeconomic data

The Panel Study of Income Dynamics (PSID) is a panel survey recording many observations for the same people over time at https://psidonline.isr.umich.edu/

The Integrated Public Use Microdata Series (IPUMS) at the University of Minnesota provides historical microdata at https://ipums.org/

The Centers for Medicare & Medicaid Services (CMS) offers extensive data on Medicare, Medicaid, and health insurance at https://cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems.html?redirect=/home/rsds.asp

International Data

The World Bank (WB) provides country-level data on many different economic and demographic indicators at http://world-bank.org/

The International Monetary Fund (IMF) compiles data on financial markets and government finance at http://imf.org/external/index.htm

The World Trade Organization (WTO) offers data on exports and imports of goods and services at https://wto.org/

Oanda is a foreign-exchange company that posts historical data on currency exchange rates on its website at https://oanda.com/

Processing Data

Prof. Empirie reminds us to read the explanations that usually accompany a dataset once the data have been downloaded, so that we understand what each dataset means. We need to know the definitions of the variables, the units, and the scope (cross sections and time periods) to synchronize the dataset. For example, if production of cereal for a province is in kilograms per person but production of maize is in tons, we need to change the units of maize to kilograms and then divide all values in this dataset by the province population to obtain the production of maize in kilograms per person. We also need to arrange the data into appropriate forms for regressions.
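The maize example can be sketched in a few lines of Python. This is only an illustration: the province names, production figures, and populations are hypothetical, and the sketch assumes metric tons (1 ton = 1,000 kilograms).

```python
KG_PER_TON = 1000  # assumption: the source reports metric tons


def tons_to_kg_per_person(production_tons, population):
    """Convert total production in tons to kilograms per person."""
    return production_tons * KG_PER_TON / population


# Hypothetical provincial data.
maize_tons = {"Province A": 1200.0, "Province B": 450.0}
population = {"Province A": 600000, "Province B": 150000}

maize_kg_per_person = {
    province: tons_to_kg_per_person(maize_tons[province], population[province])
    for province in maize_tons
}
print(maize_kg_per_person)
```

After this step, maize and cereal production are both measured in kilograms per person and can enter the same regression.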

To form cross-sectional data or time-series data:

Most datasets are arranged horizontally to save space, and we need to transpose the data from horizontal to vertical by following these steps:

Copy the entire dataset then right click an empty cell.

Under Paste Options choose Paste Special.

A dialog box will appear. Choose Transpose then click OK.
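Outside Excel, the same transposition can be sketched in Python, where each inner list stands for one row of the spreadsheet. The variable names and values here are hypothetical.

```python
# Hypothetical horizontal layout: one row per variable, one column per year.
wide = [
    ["Year", 2019, 2020, 2021],
    ["GDP", 100.0, 98.5, 104.2],
    ["CPI", 1.8, 1.2, 4.7],
]

# zip(*rows) pairs up the i-th element of every row, which turns rows into columns.
long_format = [list(column) for column in zip(*wide)]

for row in long_format:
    print(row)
```

The result has one row per year and one column per variable, which is the vertical arrangement a regression routine expects.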

To form panel data:

Transpose the time-series section of each identity (cross-sectional unit), then copy and paste the time series for each identity into Excel one below another.
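Stacking the time series of each cross-sectional unit one below another can also be sketched in Python. The sketch assumes the data arrive with one row per unit and one column per year; the province names and values are hypothetical.

```python
# Hypothetical horizontal data: one time series per cross-sectional unit.
years = [2019, 2020, 2021]
wide = {
    "Province A": [10.0, 11.0, 12.5],
    "Province B": [7.0, 7.4, 8.1],
}

# Stack the series vertically: one (unit, year, value) row per observation.
panel = [
    (unit, year, value)
    for unit, series in wide.items()
    for year, value in zip(years, series)
]

for row in panel:
    print(row)
```

Each unit's observations stay together and in time order, which is the usual layout for panel estimations.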

Missing Observations

Excel cannot handle missing observations. Touro notes that Vu (2015) offers a solution: calculate the average of the two adjacent values and use this average to fill in the missing value. Booka raises her hand and says, “That is an acceptable solution but only viable if the two adjacent values are not missing. I can offer a better solution that I found in an econometrics book at my company.” We are very eager to hear her solution, and Booka starts to give us the following instructions, assuming that Column A has missing observations:

First, replace the missing observations in column A with zeros.

Next, insert an empty column next to Column A (Column B).

Label Column B with Ds for missing dummy.

In Column B, fill the cells next to the zeros in Column A with ones.

Fill the remaining cells in column B with zeros.

Perform the regression with the dummy Ds added to the original model.

This will solve the problem of missing observations.

We think that it sounds very interesting, but Invo is wondering what the meaning of the coefficient estimate is for Ds. In answering this query, Prof. Empirie first praises Booka for utilizing the available books in her company and then says that it is not required for us to interpret the meaning of Ds. It is just a binary dummy to control for missing observations, a way to tell Excel that these are missing observations instead of zero values in the dataset.
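Both treatments discussed above can be sketched in Python for readers who prepare their data outside Excel. This is a minimal illustration rather than the exact procedure of either source; the function names and the sample column are hypothetical, and None stands for a missing observation.

```python
def fill_with_neighbor_average(column):
    """Fill a missing value with the average of its two neighbors, when both exist."""
    filled = list(column)
    for i, value in enumerate(column):
        if value is None and 0 < i < len(column) - 1:
            left, right = column[i - 1], column[i + 1]
            if left is not None and right is not None:
                filled[i] = (left + right) / 2
    return filled


def add_missing_dummy(column):
    """Replace missing values with zeros and build the dummy Ds (1 = missing)."""
    filled = [0.0 if value is None else value for value in column]
    dummy = [1 if value is None else 0 for value in column]
    return filled, dummy


# Hypothetical Column A with two missing observations.
column_a = [5.0, None, 7.0, 8.0, None]

averaged = fill_with_neighbor_average(column_a)
filled_a, ds = add_missing_dummy(column_a)

print(averaged)  # the last value stays missing because it has no right-hand neighbor
print(filled_a)
print(ds)
```

The averaging approach leaves the last observation unfilled, which is the limitation Booka points out, while the dummy approach produces a complete Column A together with Column B (Ds) for the regression.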

Missing Observations in Logarithmic Model

Taila asks, “If we treat the missing-observation problem with the dummy Ds and then have to take the logarithm of the model, we will run into a problem because the logarithmic values of these zeros are undefined. How can we fix this problem?” Touro says that he remembers this excerpt in Vu (2015),

We can replace the zero with a small number. In order for the number to be small enough so that it will not bias the results, we need to scale up the dataset. For example, if the units are in thousands of dollars and the other values are in the range of 5 through 10, we can change them to dollars so that the values are in the range of 5,000 through 10,000 and then add 1.0 to the whole series so that 5,000 becomes 5,001 and the zero becomes 1.0.

Invo says, “Then we can replace the missing observations in Column A with ones before taking the log of the whole dataset so that these ones become zeros. Then and only then, we insert Column B for the next step.” Prof. Empirie smiles, saying that this is correct and what a terrific class she has. She then guides us to the next issue.
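The ordering of the steps in the logarithmic case can be sketched in Python as follows. The function name and the sample values are hypothetical, None stands for a missing observation, and the sketch assumes that the data have already been scaled so that the observed values are large relative to one.

```python
import math


def log_with_missing_dummy(column):
    """Replace missing values with ones, take logs, then build the dummy Ds."""
    ones_filled = [1.0 if value is None else value for value in column]
    logged = [math.log(value) for value in ones_filled]  # log(1) = 0 for missing cells
    dummy = [1 if value is None else 0 for value in column]
    return logged, dummy


# Hypothetical Column A in dollars, with one missing observation.
column_a = [5000.0, None, 10000.0]
log_a, ds = log_with_missing_dummy(column_a)

print(log_a)
print(ds)
```

Because the logarithm of one is zero, the logged column contains zeros exactly where the observations were missing, and the dummy Ds marks those same rows.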

Pitfalls in Empirical Research

Prof. Empirie says that there are several common mistakes in applied econometrics that we want to avoid.

Failing to Ask Questions

We learn that it is easy to get obsessed with running regressions without asking why a certain exercise is meaningful to the readers. To guarantee that a topic is interesting, we need to ask ourselves how the topic relates to an economic theory that is attractive to the readers. What common-sense reasoning motivates the connection between the explanatory and the dependent variables? Why do we believe that the topic is sensible to other people? These conceptual questions are more important than technical questions arising from the model itself.

Being Careless on Literature Review

One of the common mistakes for beginners in applied econometrics is to ignore the work of other researchers. It often happens when we think that we have found a unique question, but that was in fact already addressed by other researchers. Prof. Empirie reminds us that a paper is only interesting if it is new to the readers. If many researchers have already found various answers for a certain question, our regression exercise is meaningless. To avoid this mistake, we need to review most of the existing papers carefully and state clearly what the contribution of our paper is to the current research community.

Believing that a Dataset Is Perfect

We will fail to provide a meaningful interpretation of the results unless we understand the limitation of the data. Most students assume that the information in a dataset is accurate. This is far from true, especially if the dataset comes from a third-world country where the data compilers are sometimes motivated to change the values of the dataset for certain purposes. For example, many provincial officials direct the data compilers to increase the values of the province’s GDP to receive rewards from the central government. Some compilers also fill in missing observations with numbers to avoid reprimand from the leaders for failing to obtain the data. Finally, the central government might direct the compilers to reduce the number of terrorist attacks in the country to create a false image of a peaceful country hoping to increase tourist arrivals.

Understanding the limitations of a dataset reduces our chances of getting unexpected results, because it prompts us to perform descriptive statistics, charting, and preliminary testing more carefully so that we can pinpoint the measurement errors in the dataset and look for solutions among the econometric methods we learned.

Using an Overly Complicated Model

Prof. Empirie says an econometric model does not need to be overly complicated to predict well. Econometrics students often use difficult methods hoping to demonstrate their econometric skills. They also tend to use more variables than needed for fear of omitted-variable bias. Using a complicated model can make the results difficult to interpret and therefore confuse the readers. Adding too many irrelevant variables inflates the variances of the coefficient estimates and leads to imprecise interpretations of the significance levels. To avoid this mistake, make a careful survey of the existing literature and learn from other researchers’ choices of models and variables.

Giving Up Too Early

When conducting research, we often have expectations about the results of our estimations. When the actual estimation results are contrary to our expectations, we give up. This should not be the case. In fact, an opposite result should inspire us to look for an explanation in economic theory or the existing literature.

Touro offers a story. Last year, he tried to estimate the impact of Hurricane Andrew on the production of manufacturing sectors along the U.S. East Coast. Since a hurricane is a devastating event, he expected the impact to be negative. When the results came out positive and significant, he thought that something was wrong with his model or estimation methods or data, and eventually gave up on the research. He later found that his results were consistent with the “investment-producing destruction” hypothesis by Tol and Leek (1999) who argued that the positive effect on GDP can be explained by the efforts of the government and private sectors to replace the capital stock destroyed by the disasters. Since capital has a positive effect on output in economic theory, this new capital raises the level of production.

Prof. Empirie praises him for a good example and says that different economic models can explain different phenomena. Sometimes, an insignificant coefficient estimate leads to a significant economic discovery and inspires economic theorists to develop new models so that they can explain the new results. She hopes that we will remain persistent in pursuing a topic if we are really interested in it.

Prof. Empirie concludes the course by telling us that this book is only a beginner textbook in econometrics. She hopes that Prof. Metric and she have sparked enough passion in econometrics that we will continue our joyful journey to success by reading more advanced books on data analyses.

We are happy to see that Prof. Metric also shows up and actually casts a tearful “goodbye” to us, emphasizing that he had a great time with the class and will miss us dearly. We all say thank you to the professors and acknowledge that their teaching styles were very inspiring. We promise them we will pass on their love of econometrics to our colleagues and will apply the knowledge we learned in this class into analyzing problems we encounter in our daily lives.
