CHAPTER 13

Sources of Data for Studying the Indian Economy

Introduction: Reliable and comprehensive economic statistics are a public good. They are the bedrock of informed policymaking, besides being a key facilitator for the investment decisions of private firms. They are a vital necessity for informed public discourse in democracies, where citizens seek accountability from their government. The use of scientific methods for their collection and estimation and their timely dissemination, therefore, form a vital public service. Statistical data should be sound. There is no point in offering data that has a weak base. Infirmity, if any, should be rectified first and then data should be released. A concerted effort should be made to maintain the sanctity of all official statistics, especially those related to the economy. The Economic Survey 2018–2019 dedicated an entire chapter to the topic of data, entitled, “Of the People, by the People, for the People,” making a bold call to harness data as a “public good” in the service of the people. This chapter provides an overview of the Indian statistical system and important economic data sources.

Concept of data: Data is defined as “facts or figures from which conclusions can be drawn.” Once data has been collected and processed, it is ready to be organized into information. Indeed, it is hard to imagine reasons for collecting data other than to provide information. This information leads to knowledge about issues, and helps individuals and groups make informed decisions. Statistics represent a common method of presenting information. In general, statistics is “a type of information obtained through mathematical operations on numerical data.”1

Data are plain facts, usually raw numbers. Think of a spreadsheet full of numbers with no meaningful description. In order for these numbers to become information, they must be interpreted to have meaning. Data can come from a government census or organization surveys or research studies.2

Statistical data analysis is a procedure of performing various statistical operations. Statistics is “a collection of methods for collecting, displaying, analyzing, and drawing conclusions from data.”3

It is a kind of quantitative research that seeks to quantify the data and typically applies some form of statistical analysis. Quantitative data basically involves descriptive data such as survey data and observational data. Statistical data analysis generally relies on statistical tools and cannot be performed without some statistical knowledge. Various software packages can be used to perform statistical data analysis, including Statistical Analysis System (SAS), Statistical Package for the Social Sciences (SPSS), and Statsoft. Statistics can be in the form of numbers or percentages, and they are frequently presented in a table or graph. Harnessing data consists of four steps (gathering, storing, processing, and disseminating), each of which has room for improvement in India. Not all types of data are amenable to real-time storage, of course.

Importance of data: The provision of accurate and authoritative statistical information strengthens modern societies. It can also lead to life-saving breakthroughs in medicine and can help conserve the earth’s natural environment. All professional economists, statisticians, and independent researchers in policy, regardless of their political and ideological leanings, seek access to public statistics and assurance of their integrity. A credible statistical system gives investors confidence in the economy. Statistical information also helps those who govern adopt better-informed policies for the benefit of the larger public.

Data must have a long enough time series so that dynamic effects can be studied and employed for policymaking. By being able to retrieve authentic data and documents instantly, governments can improve targeting in welfare schemes and subsidies by reducing both inclusion and exclusion errors. Datasets that utilize information across various datasets can also improve public service delivery. For example, cross-verification of the income tax return with the GST return can highlight possible tax evasion.
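The cross-verification mentioned above can be pictured as a simple join of two datasets on a common identifier. The sketch below is only illustrative: the `Filing` record, the `taxpayer_id` key, the 10 percent tolerance, and the sample figures are all hypothetical and do not describe how the tax authorities actually match returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    taxpayer_id: str  # hypothetical common identifier shared by both datasets
    turnover: float   # turnover declared in this filing, in rupees


def flag_mismatches(income_tax, gst, tolerance=0.10):
    """Return IDs whose declared turnover differs between the two
    filings by more than `tolerance`, relative to the larger figure."""
    gst_by_id = {f.taxpayer_id: f.turnover for f in gst}
    flagged = []
    for f in income_tax:
        other = gst_by_id.get(f.taxpayer_id)
        if other is None:
            continue  # no matching GST filing, so nothing to compare
        larger = max(f.turnover, other)
        if larger > 0 and abs(f.turnover - other) / larger > tolerance:
            flagged.append(f.taxpayer_id)
    return flagged


# Made-up records, for illustration only.
income_tax_returns = [Filing("A", 1_000_000), Filing("B", 500_000), Filing("C", 750_000)]
gst_returns = [Filing("A", 1_020_000), Filing("B", 900_000)]

print(flag_mismatches(income_tax_returns, gst_returns))  # ['B']
```

A flagged identifier is only a pointer for further scrutiny; a gap between two filings can have legitimate explanations.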

The point is, India’s data is and should be sacrosanct, because it is the basis on which much of the nation’s rising economic might has also been recognized internationally. The data is a crucial input to a better understanding of the health of the economy and the consequent policy response to it. Utilizing the information embedded in distinct datasets would inter alia enable the government to enhance ease of living for citizens, enable truly evidence-based policy, improve targeting in welfare schemes, uncover unmet needs, integrate fragmented markets, bring greater accountability in public services, generate greater citizen participation in governance, and so on.

The Indian Statistical System

The growth of India’s vast national statistical infrastructure dates back to its first decade as an independent country. The birth of a new nation led to an explosion of national statistics, driven by the need to plan the economy through five-year plans. These initial years, spanning the late 1940s and the 1950s, saw the establishment of the office of the Statistical Adviser to the Government, bi-annual National Sample Surveys (NSSs), the Central Statistical Organization (CSO), and National Income Committees (which made estimates similar to GDP measurements). The moving spirit behind these developments was Prasanta Chandra Mahalanobis, who had founded the Indian Statistical Institute (ISI) in Calcutta in 1931. The institute’s fingerprints were readily apparent in the creation of India’s National Income Committee, the CSO, the International Statistical Education Centre in Calcutta, and the NSS, all created around the mid-century mark.

The UN General Assembly adopted the Fundamental Principles of Official Statistics (FPOS) in January 2014. This adoption was the culmination of efforts by international agencies and member countries to secure the autonomy and independence of their statistical systems, so that they could produce appropriate and reliable data adhering to professional and scientific standards. The Government of India adopted the UN FPOS in May 2016. Within India, the importance of the statistical system had already become more prominent when the government constituted the National Statistical Commission under the chairpersonship of C. Rangarajan, which submitted its detailed report in 2001. The Rangarajan Commission went into great detail on the data gaps and infrastructure constraints of the national statistical system at both the central and the state government levels.

In pursuance of the recommendations, the government formally constituted the National Statistical Commission (NSC) in 2005 as a regular institution with a mandate to evolve policies, priorities, and standards in statistical matters. The NSC comprises a chairman and five members along with one ex-officio member (CEO, NITI Aayog [erstwhile Planning Commission]) and the chief statistician and secretary, Ministry of Statistics and Programme Implementation (MoSPI), who also serves as secretary to the NSC. The chairman and members of the NSC are leading experts in their respective fields of statistics, economics, demography, and so on. They are selected by a committee constituted by the government.

The NSC has a much larger ambit and remit in terms of improving the national statistical system. The draft National Policy on Official Statistics, 2018, was a step in this direction to strengthen various pillars of the national statistical system. The NSC has been giving strategic directions to the national statistical system at the central and the state levels from time to time. Over a period of time, there has been an increasing demand on the statistical system for production of relevant and quality statistics through its publications, survey reports, and administrative sources. Looking at the gaps in various sectors, in 2017–2018, the ministry had sought additional resources to undertake several new activities like the Economic Census of Establishments, Annual Survey of Services Sector Establishments, Annual Survey of Unincorporated Sector, National Data Warehouse on Official Statistics. The ministry has also initiated processes for introducing new technological interventions in the data collection process as well as in bringing out its analytical reports.

The MoSPI has two wings: one related to statistics and the other to program implementation. The Statistics Wing, redesignated as National Statistical Organization, consists of the CSO, the Computer Centre, and the National Sample Survey Office (NSSO). The government has stated that the National Statistical Office (NSO) will be headed by the chief statistician of India-cum-secretary (Statistics and Programme Implementation). The NSS has introduced several new interventions like the use of handheld devices, rotational panel samples, and changes in the criteria of selection of households.

Globally, statistical institutions are usually bestowed with professional autonomy. For decades, India’s statistical machinery enjoyed a high reputation for the integrity of the data it produced on a range of economic and social parameters. It was often criticized for the quality of its estimates, but rarely for their integrity. The NSC has stood up for the cause of clean and independent statistical data. It is equally important to invest more in data collection. While most economists, foreign and Indian, testify to the CSO’s quality and to the NSSO’s use of the household as its unit of survey, the fact remains that sample sizes need to be increased. With greater availability of information in a digitized world, India’s statistical data can be greatly improved.

Administrative data: The need was felt for collecting administrative data on a large scale to monitor scheme progress. The first attempt to create a 21st century technology-based monitoring system or management information system (MIS) was made with the launch of the National Rural Employment Guarantee Act in 2006. The large-scale adoption of technology tools by the government has accelerated these efforts. Technology has also enabled administrative data to become increasingly sophisticated. GIS maps, data dashboards, and mobile apps are today the new shiny tools for collecting and tracking progress (Aiyar 2019). The NITI Aayog has taken the lead in using this data to develop indices that rank states on health, education, water, and sustainable development–related indicators. The Aayog’s flagship scheme, the Aspirational Districts Programme, is now using administrative data to rank a cohort of backward districts on progress made on key social indicators.

By all accounts, various line departments have created their own rankings, from the Swachh Survekshan to the ease of living index. These rankings are often cited as evidence of India’s progress toward the goal of “competitive” federalism. The growing use of digital transactions by consumers, investors, and taxpayers, as well as the rise of newer forms of data collection, has the potential to revolutionize Indian public policy. It is unlikely that these newer forms of data will completely replace the more traditional numbers derived from surveys, national accounts, and administrative data; the two will more likely complement each other. Government agencies will increase their dependence on big data analytics in the coming years, although the risks to individual privacy should not be underestimated (Rajadhyaksha 2019).

This new data regime is an important step in the right direction. This can serve to improve the quality of administrative decision making, incentivize apathetic bureaucrats to do their job, ensure course correction, and perhaps most crucially, induce transparency. There, however, exist critical credibility challenges with the data. The design of data collection systems is flawed. There is no independent data collection machinery. Data is collected by the very officials entrusted with implementing schemes. This creates perverse incentives for overreporting. And when the stakes increase, so does the incentive to misreport. There is no better example of this phenomenon than the Swachh Bharat Mission data regime. Studies by the Centre for Policy Research’s Accountability Initiative reveal the tendency to overreport. Further, there are no clear data protocols or standards, which makes it difficult to hold the government accountable for claims. For instance, the direct benefit transfer website routinely gives a figure for monies “saved” by the government from using the scheme but with no explanation for how it arrived at this number, rendering it meaningless (Aiyar 2019). This points out the challenges of building credible administrative data systems in India. It serves as a reminder of the dangers and vulnerabilities of data when it is deployed without adequate investments in quality, objectivity, and credibility. Getting this right is now a challenge.

Sources of Data

The following are the large datasets open to the public in India. Such India-based datasets from the public domain and government bodies can come in handy.

  1. RBI Database of Indian Economy: The RBI database is a website launched by Reserve Bank of India and has data on the macroeconomic indicators of the Indian economy. It is loaded with relevant information and data for researchers, analysts, and general users all alike. It has datasets across money and banking, financial markets, national income, savings and employment, and others. The idea is to facilitate contemporary styles of data analysis that can provide important real-time numbers about economic activity, prices, and more.
  2. MoSPI Dataset: This is the dataset provided by MoSPI, a union ministry concerned with the coverage and quality aspects of statistics released. The datasets are collected by conducting large-scale sample surveys across India for various parameters, eventually leading to the creation of the database. The ministry applies standard statistical techniques and extensive scrutiny and supervision to enable this.
  3. Gateway to Indian Earth Observation: An initiative by the Indian Space Research Organization (ISRO), the open data archive provides free satellite data, products download facility, and thematic datasets. It uses a crowdsourcing approach to collect enriching and point-of-interest data. It also acts as a platform to host government data such as those of the forest department. Apart from being a repository of data, it allows users to explore the 2D and 3D representation of the surface of the earth, pest surveillance, disaster services, high-resolution imagery of cities, among others.
  4. National Portal of India: A web portal for Indian citizens, it was developed by the Indian government with the objective of facilitating a single window access to information and services of all government entities. It was designed and developed jointly by the National Informatics Centre (NIC) and the Ministry of Electronics and Information Technology. A single point access to a lot of information, it has a searchable contact directory, a database of the government website, and others.
  5. Survey of India: India’s central engineering agency, the Survey of India, is in charge of mapping and surveying, under the Department of Science and Technology, and is one of the oldest scientific departments. With data centers spread across India, it has user-focused, cost-effective, reliable, and quality geospatial data from across India.
  6. India Weather Data: Datasets for various meteorological indicators, water resource planning, rainfall, and others from across various parts of India are available for users in simple formats. It also contains databases for several other parameters such as temperature, pressure, relative humidity, precipitation amount, wind speed, and solar radiation.
  7. Aadhaar4 Metadata: This provides a huge database generated by the daily count of total registrations, enrolment applications accepted and rejected by state and district. It also contains other details such as Aadhaar generated by age, gender, and so on.
  8. Import Exports Datasets: ICEGATE or the Indian Customs Electronic Commerce/Electronic Data Interchange (EC/EDI) Gateway is a portal with e-filing services for trade and cargo carriers. It also has an exhaustive National Import Database (NIDB) and Export Commodity Database (ECDB) for the Directorate of Valuation. It has information such as documents, messages, and various processes provided by the Indian Customs EDI System (ICES).
  9. Open Government Data (OGD) Platform India: The union government’s OGD platform allows citizens to access a range of government data in machine-readable form in one place. The portal allows union ministries and departments to publish datasets, documents, services, tools, and applications collected by them for public use. Excluding datasets that contain confidential information, all other datasets are made available to the public, ranging from data on welfare schemes to surveys to macroeconomic indicators. The platform also includes citizen engagement tools like feedback forms, data visualizations, application programming interfaces (APIs). Open data not only helps government officials make better decisions but also gets people involved in solving problems. Throwing open government data to the public multiplies the number of people analyzing and deriving insights from the data. The usability of the data itself, consequently, increases.

    To engage people meaningfully in solving problems, the Ministry of Human Resource Development recently initiated the Smart India Hackathon—an open innovation model to discover new, disruptive technologies that could solve India’s most pressing problems. Smart India Hackathons are product development competitions in which participants get a problem statement and relevant data, using which they develop a prototype software or hardware. These competitions crowdsource solutions to improve governance and increase the efficacy of welfare schemes. None of this would be possible, of course, without reliable data.

The Government of India collects four distinct sets of data about people: administrative, survey, institutional, and transactions data.

  1. Administrative datasets include birth and death records, crime reports, land and property registrations, vehicle registrations, movement of people across national borders, tax records. Governments also gather data to evaluate welfare schemes, for example, the Ministry of Drinking Water and Sanitation gathers data on toilet usage to assess the efficacy of the Swachh Bharat Mission. Governments hold administrative data mainly for nonstatistical purposes.
  2. Survey data is data gathered predominantly for statistical purposes through systematic, periodic surveys. For example, the NSSO conducts large-scale sample surveys across India on indicators of employment, education, nutrition, literacy, and so on. Because these data are gathered for statistical analyses, the identity of participants is irrelevant and unreported, although these identities may be securely stored at the back end without violating any legal guidelines on privacy.
  3. Institutional data refers to data held by public institutions about people. For example, a government-run district hospital maintains medical records of all its patients. A government-run school maintains personal information about all its pupils. State-run universities maintain records of students’ educational attainment and the degrees awarded to them. Most such data are held locally, predominantly in paper-based form. This data can be digitized to enable aggregation at the regional or national level.
  4. Transactions data are data on an individual’s transactions such as those executed on the Unified Payments Interface (UPI) or Bharat Interface for Money (BHIM) Aadhaar Pay. This is a nascent category of data but is likely to grow as more people transition to cashless payment services. While the latter two databases (institutional data and transactions data) are in a fledgling state, the first two (administrative data and survey data) are comprehensive and robustly maintained.

    Data collection in India is highly decentralized. Of late, there have been some discussions around the linking of datasets, primarily through the seeding of an Aadhaar number across databases such as the Permanent Account Number (PAN) database, bank accounts, and mobile numbers. But this does not mean that the Unique Identification Authority of India (UIDAI) or the government can now read the bank account information or other data related to the individual. The government has taken steps to eliminate all privacy concerns by utilizing technological advances.

  10. Wildlife Institute of India Dataset: An autonomous institution under the Ministry of Environment, Forest and Climate Change, Government of India, it has datasets on different wildlife species in India. A total of 4,591 specimens are housed at the Wildlife Institute of India (WII) herbarium, of which data related to 4,322 are digitized and published through the Global Biodiversity Information Facility (GBIF) network. The data is mainly used by researchers and field managers from the respective protected areas of the country to prepare management plans and for other research.
  11. National Institution for Transforming India (NITI Aayog): The NITI Aayog has laid out a vision for making available anonymized data across sectors. Data science is still a very nascent field in India, despite the recent surge in interest. From agriculture to health care, there are a plethora of challenges the government faces on a day-to-day basis, and that was the primary reason for founding a data science department under the NITI Aayog initiative.
  12. Indian Statistical Institute: The Indian Statistical Institute Act of 1959 designated ISI as an “Institution of National Importance.” The activities steadily grew, existing interests became more broad based and a number of science units were created in the interest of a live interaction between statistics and the natural and social sciences. The ISI provides a number of statistical services. Over the years, it has played a key role in the development of statistical theory and methods by promoting research and practical applications in different areas of the natural and social sciences.

It is to be noted that governments already hold a rich repository of administrative, survey, institutional, and transaction data about citizens, but these data are scattered across numerous government bodies. Currently, much of the data is dispersed across different registries maintained by different ministries. This is why every time a citizen has to access a new service, they are asked to collect all the documents to prove their identity and prove their claim on the process. The citizen faces the inconvenience of having to retrieve data trapped in paper files within the government system to unlock a benefit they are entitled to. The government can deliver a better experience to the citizen by bringing disparate datasets, scattered across various ministries, together. If the information embedded in these datasets is utilized together, data offers the potential to reduce targeting error in welfare schemes (The Economic Survey 2018–2019).
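Targeting error in a welfare scheme is usually discussed in terms of inclusion and exclusion errors. The sketch below shows one common way of computing them, assuming that an inclusion error is the share of actual beneficiaries who are not eligible and an exclusion error is the share of eligible households that are left out. These definitions, the function name, and the household identifiers are assumptions made for illustration; other conventions for the denominators exist.

```python
def targeting_errors(eligible, beneficiaries):
    """Return (inclusion_error, exclusion_error) as fractions.

    inclusion_error: share of beneficiaries who are not eligible.
    exclusion_error: share of eligible households not receiving the benefit.
    """
    eligible = set(eligible)
    beneficiaries = set(beneficiaries)
    wrongly_included = beneficiaries - eligible
    wrongly_excluded = eligible - beneficiaries
    inclusion_error = len(wrongly_included) / len(beneficiaries) if beneficiaries else 0.0
    exclusion_error = len(wrongly_excluded) / len(eligible) if eligible else 0.0
    return inclusion_error, exclusion_error


# Hypothetical household identifiers, for illustration only.
eligible_households = {"h1", "h2", "h3", "h4"}
beneficiary_households = {"h1", "h2", "h5"}

inclusion, exclusion = targeting_errors(eligible_households, beneficiary_households)
print(f"inclusion error: {inclusion:.2f}, exclusion error: {exclusion:.2f}")
```

Computing either rate requires an eligibility list and a beneficiary list that share an identifier, which is why scattered registries make targeting errors hard even to measure.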

One caveat on data pertaining to the Indian economy is in order: a large part of the economy is informal and uses cash instead of normal banking channels. Apart from these legitimate gaps, there is also the reality of the parallel economy, in which economic activity is not accurately recorded, either to evade taxes or because it is inherently illegal. These constraints are well understood. The government has found ways to overcome them, and these ways are consistent, in that they do not vary from one government’s term to the next. In the absence of consistency, the government itself will become the biggest loser in the process.

International Data

World Bank: As a repository of some of the world’s most comprehensive economic data on countries across the world, the World Bank is a vital source. It also provides access to other datasets listed in its data catalog. World Bank Data is massive: it has 3,000 datasets and 14,000 indicators encompassing microdata, time series statistics, and geospatial data. Finding and accessing the required data is quite easy. All that is needed is to specify the indicator names, countries, or topics, and the relevant data opens up. World Bank Data also allows one to download data in different formats, such as CSV, Excel, and XML.
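Specifying a country and an indicator can also be done programmatically. The sketch below assumes the World Bank's public Indicators API (version 2), in which a request names a country code and an indicator code and the JSON reply is a two-element list of paging metadata followed by yearly records. The helper names are hypothetical, the indicator code `NY.GDP.MKTP.CD` (GDP in current US dollars) is used only as an example, and the sample values are placeholders rather than real statistics; the response layout should be checked against the World Bank's own API documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.worldbank.org/v2"


def indicator_url(country, indicator, start_year, end_year, per_page=100):
    """Build a request URL for one indicator, one country, and a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start_year}:{end_year}", "per_page": per_page},
        safe=":",  # keep the year-range colon readable in the URL
    )
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload):
    """Turn the assumed [metadata, records] reply into {year: value},
    sorted by year and skipping years with no reported value."""
    records = payload[1] or []
    series = {int(r["date"]): r["value"] for r in records if r["value"] is not None}
    return dict(sorted(series.items()))


def fetch_series(country, indicator, start_year, end_year):
    """Download and parse a series (requires network access)."""
    with urlopen(indicator_url(country, indicator, start_year, end_year)) as resp:
        return parse_series(json.load(resp))


# Placeholder reply with made-up values, used to show the parsing step offline.
sample_payload = [
    {"page": 1},
    [
        {"date": "2011", "value": 2.0},
        {"date": "2010", "value": 1.0},
        {"date": "2012", "value": None},
    ],
]

print(indicator_url("IND", "NY.GDP.MKTP.CD", 2010, 2020))
print(parse_series(sample_payload))
```

Calling `fetch_series("IND", "NY.GDP.MKTP.CD", 2010, 2020)` would perform the actual download; it is left out of the example so that the sketch runs without a network connection.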

World Health Organization (WHO): WHO keeps track of health-specific statistics of its 194 member states. The repository keeps the data systematically organized. It can be accessed as per different needs. For instance, whether it is mortality or burden of diseases, one can access data classified under 100 or more categories. The good thing is that it is possible to download whatever data one needs in Excel format. One can also monitor and analyze data by making use of its data portal. The application program interface (API) to WHO’s data and statistics content is also available.

Google Public Data Explorer: Launched in 2010, Google Public Data Explorer helps users explore vast amounts of public interest datasets. It makes data from different agencies and sources available. For instance, one can access data from the World Bank, the U.S. Bureau of Labor Statistics, the Organisation for Economic Co-operation and Development (OECD), the International Monetary Fund (IMF), and others. Different stakeholders access the data for a variety of purposes. With the help of Data Explorer, one can represent the data in various ways, such as line graphs, bar graphs, maps, and bubble charts. These visualizations are dynamic, which means that they show the data changing over time. One can change topics, focus on different entries, and modify the scale. The charts are easily shareable too. As soon as a chart is ready, one can embed it on a website or blog or simply share a link to it.

Problem of Indian statistics: The data is always questionable. The lack of accuracy in the official data makes it much more likely that authorities will miss major swings in activity and be unable to react quickly to head off a crisis. It is also a problem for investors, who may be misled into thinking the economy is more robust than it really is. The proportion of the Indian economy that is based on the unofficial sector, such as household enterprises, makes it a nightmare to assess economic activity. The authorities were open to suggestions for improvement. This is evident from two recent measures that suggest that the government is inclined to reform the system and improve its credibility. In December 2019, it established a Standing Committee on Economic Statistics to get a sense of the scale of the problem. The heartening aspect of the composition of this committee is that it includes statisticians who have been critics of the manner in which inconvenient data has been dealt with by the government. The setting up of the committee signals an acknowledgment on the government’s part that there is a credibility problem with official data, which matters both to investors and to policymaking itself, and that it has to be addressed. Accuracy of data, including its generation and dissemination, is important, especially in today’s context, where the extent or even cause of the economic slowdown in India is not fully clear. The absence of reliable data makes formulation of policy responses difficult.5

Separately, in the same month, a draft legislation to set up a National Statistical Commission (NSC), with an independent secretariat, as the nodal regulatory body for all principal statistical activities of the country was placed in the public domain to invite feedback. The draft legislation has its roots in reforms recommended by an expert group about two decades ago. Some positive features of this draft legislation are that it says the government shall seek the NSC’s advice, implying a level of significance. There is a modicum of financial autonomy through an initial endowment grant by the government, a necessary measure. Yet, there remain doubts on the extent of autonomy. The presence of two government officials and an RBI deputy governor has triggered disquiet among some. There is a case for the chief statistician of India to be a part of the commission, but it’s best to avoid more nominees of the government. If the NSC is to make a difference, it needs to be truly autonomous.6

The need for timely and reliable statistics for policy formulation and planning cannot be overemphasized. There is reason to believe that with the progressive dismantling of the system of economic controls, the quality of data flows has weakened. There are already plans to revamp data compilation and capture the nuanced relationship between prices and real GDP (Kumar 2019). The challenge for the statistical system is not just to produce credible and robust national accounts estimates, but also to insulate itself from the political system. It is easier to correct the discrepancies in statistical data than to restore the credibility of the statistical system (Himanshu 2019). Unless the government has access to objective and robust data, its policy interventions are bound to be ill informed. Understaffed and underfunded statistical services cannot possibly have sufficient domain expertise to undertake substantively informed analyses in all the areas for which statistical data are required.

There have been questions for many years about whether Indian government statistics were telling the full story. The government itself has admitted there are deficiencies in its data collection. The government has been repeatedly telling the citizens that its data collection machinery is inefficient, inadequate, and above all not fast enough to capture the rapidly changing scenarios. A better way of building a robust data infrastructure may be to ensure that each major data collection activity is augmented by an analytical component led by domain experts, recruited from diverse sources, including academia (Desai 2019).

To summarize the above discussion, the Open Government Data initiative is an illustration of the spirit of data as a public good, which implies that the government must redouble its efforts in this direction (Economic Survey 2018–2019). Through Aadhaar, India has been at the forefront of the data and technology revolution that is unfolding. The government needs to view data as a public good and make the necessary investments. Going forward, the data and information highway must be viewed as an equally important infrastructure as the physical highways. Citizens must have a time-bound and easily accessible recourse to any data breaches. India stands at the cusp of a major opportunity, one where data and digital platforms can become an enabler of a meaningful life for every Indian. Maximizing the public good but also safeguarding against harm must be the mantra for the new, digital India.

But lately, Indian statistics and the institutions associated with them have come under a cloud for being influenced, and indeed, even controlled by political considerations. It is imperative that the agencies associated with collection and dissemination of statistics, such as the CSO and the NSSO, are not subject to political interference and their work, therefore, enjoys total credibility. The government must act to address the apprehension of credibility about data that has engulfed Indian economic statistics.

Data can serve the same purpose as the stones that enable one to cross the river. Concurrent with the data explosion of recent years, the marginal cost of data has declined exponentially while its marginal benefit to society has increased manifold; therefore, society’s optimal consumption of data is higher than ever. While the private sector does a good job of harnessing data where it is profitable, government intervention is needed in social sectors of the country, where private investment in data remains inadequate (Economic Survey 2018–2019).

Endnotes

  1. Definitions, Archived Content, Statistics Canada, https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch1/definitions/5214853-eng.htm (accessed on February 4, 2020)
  2. Data & Statistics, https://elon.libguides.com/data (accessed on February 4, 2020)
  3. Basic Definitions and Concepts, https://2012books.lardbucket.org/books/beginning-statistics/s05-01-basic-definitions-and-concepts.html (accessed on February 4, 2020)
  4. Aadhaar or Unique Identity Number (UID) is a 12-digit number based on biometrics-related information. The Unique Identification Authority of India (UIDAI) is the issuer of the Aadhaar card and Aadhaar number, https://www.ndtv.com/business/not-every-12-digit-number-is-aadhaar-says-uidai-how-to-check-aadhaar-card-validity-1761672 (accessed on February 4, 2020)
  5. The Indian Express, December 30, 2019, Problem with figures, https://indianexpress.com/article/opinion/editorials/government-statistics-gdp-nsso-modi-govt-6190590/ (accessed on February 5, 2020)
  6. The Times of India, February 5, 2020, Autonomy for NSC: If proposed National Statistical Commission is to make a difference, its independence is imperative, https://timesofindia.indiatimes.com/blogs/toi-editorials/autonomy-for-nsc-if-proposed-national-statistical-commission-is-to-make-a-difference-its-independence-is-imperative/ (accessed on February 6, 2020)