Big Data Applied | 263
Let us start with statisticians and consider what they do with numbers. Statisticians
collect data, analyse it, run regression and clustering analyses on it and build
sophisticated data models around it. To carry out such tasks, they need extensive
knowledge of Mathematics and pattern recognition, and they must be good with numbers.
Analysing huge volumes of data and presenting the findings also requires computing
knowledge of Big Data Management, Analytics and Data Visualization.
Now let us take the above skills and combine them with a softer skill set: knowledge
of a particular industry or business domain. If one can combine these skills and capabilities,
then one has essentially got what most people mean when they talk about Data Science.
Thus, a longer and more complete definition of Data Science can be framed as follows.
Data Science is the application of techniques from various fields, such as Mathematics,
Statistics, Computing, Pattern Recognition and Visualization, to data (both internal and
external to the organization) to derive meaningful and actionable insights, delivering
greater value for the organization in question.
Data Scientists are essentially people who are very good at handling data and have a good
semantic and contextual understanding of what data means in a given industry. This makes
them good not only at determining the answers to certain questions, but also at formulating
the most important questions in the first place. That is what Data Science is all about:
not just answering questions, but asking the right questions that need to be answered
through patterns and trends in data.
Points to Ponder
Very often, we come across misuse of terms like ‘Data Science’ and ‘Data Scientist’.
We come across Hadoop specialists being referred to as data scientists. That is not really correct.
If somebody knows how to work with Hadoop, that is certainly a very useful skill in very high
demand at present, but being a Hadoop engineer is not the same thing as being a data scientist.
Likewise, developers who work with the R programming language, which is used for computation
and analysis of data, are not necessarily data scientists either. We may definitely find data
scientists who are very good at R, but being an expert in R does not by itself
make somebody a data scientist.
It is the combination of all the skills we talked about that creates a Data Scientist.
10.2.3 How Do We Define ‘Data Science’?
The following are the key steps involved in any Data Science project (Figure 10.2):
Step 1—Organize: Organizing data involves the discovery, ingestion, physical storage
and formatting of data, and incorporating the best practices of data engineering and data
management.
Step 2—Package: Packaging the data involves logically aggregating and correlating the underlying
raw data into a new representation and package.
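To make the first two steps concrete, the following is a minimal Python sketch, not a prescribed implementation. It assumes a small, hypothetical set of raw sales records held in memory as CSV text; the field names (region, product, amount), the cleaning rule and the function names organize and package are all invented for illustration. The organize function ingests and formats the raw rows, and the package function aggregates them into a new representation, here the total sales per region.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data; in practice it would come from files, databases or APIs.
RAW_CSV = """region,product,amount
North,Widget,120.50
South,Widget,80.00
North,Gadget,42.25
South,Gadget,
"""


def organize(raw_text):
    """Step 1 (Organize): ingest raw rows and format them into clean records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        amount = row["amount"].strip()
        if not amount:
            # Assumed cleaning rule: skip rows with a missing amount.
            continue
        records.append({
            "region": row["region"].strip(),
            "product": row["product"].strip(),
            "amount": float(amount),
        })
    return records


def package(records):
    """Step 2 (Package): aggregate raw records into a new representation."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


clean_records = organize(RAW_CSV)
sales_by_region = package(clean_records)
print(sales_by_region)  # {'North': 162.75, 'South': 80.0}
```

Keeping the two steps as separate functions mirrors the separation in the text: the organized records can be stored and reused, while different packages (per region, per product and so on) can be built on top of them.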