S. Kampakis, The Decision Maker's Handbook to Data Science, https://doi.org/10.1007/978-1-4842-5494-3_10

10. Hiring and Managing Data Scientists

Stylianos Kampakis, London, UK

We’ve talked about why you should hire a data scientist and the numerous benefits it will bring your business. We’ve also looked at why it’s important to let them do their job without you trying to guide them. But to do that, you need to hire someone you can trust.

In this chapter, we’ll be looking at precisely how to hire and manage a data scientist to get the best possible results. First, let’s try to understand data scientists a little better.

Into the Mind of a Data Scientist

A competent data scientist needs to have a lot of knowledge and a wide range of skills, but certain skills are more important than others.

As you can see in Figure 10-1, a data scientist needs hacking skills, math and statistics knowledge, and substantive expertise. While each of the skills is highly valuable individually, if they only have two of the three, they are not a data scientist—and in one particular case, it can actually lead to problems.

Figure 10-1. The data science Venn diagram

Code Hacking Skills

Okay, so when I say code hacking skills, I’m not talking about breaking into NASA, but the fact is that a data scientist needs to have excellent computer knowledge and decent programming skills. This doesn’t mean that they need a computer science degree as there are plenty of excellent “hackers” out there who have never taken a single computer course in their life.

However, a data scientist needs to be able to think algorithmically, understand vectorized operations, and work with text files at the command line. Without these skills, it’s not possible to be a data scientist. In most cases, the code that data scientists produce does not have to be production quality. That’s why we use the term code hacking instead of software engineering, as the latter implies that the code should be of a very high standard. Data scientists use code as a tool to reach solutions to problems. That said, it won’t hurt if you find a data scientist who also has very good coding skills.

Mathematical and Statistics Knowledge

Once those hacking skills have been used to get the data into shape, it’s time to analyze it and extract insights. For this, a data scientist needs to have math and statistics knowledge so they can apply the right methods and models.

Again, a data scientist doesn’t need a PhD in statistics to be effective at their job, but they must know how to develop and work with mathematical and statistical models, including how to interpret the results.

Domain Knowledge

Domain knowledge is important for a data scientist: they need to understand the subject matter and not just the technical aspects of it. While this is not a hard requirement, a lack of domain knowledge can cause all sorts of issues (we’ve already seen some examples, and more will follow). Domain expertise allows data scientists to apply their other skills to the data in such a way that they reach the desired goal. Data scientists have to teach themselves the peculiarities of a domain every time they move to solving problems in a field they’ve not encountered before.

Two Is Not Enough

Only having two of these skills/knowledge is insufficient. A capable hacker with knowledge of math and statistics but no substantive expertise would be ideally suited for machine learning. However, their skills won’t make them a data scientist.

Likewise, if someone has substantive expertise and knows math and statistics, then they are probably looking at traditional research. This intersection is basically what much of academia is made up of—though it should be noted that young researchers are choosing to evolve and learn more about tech. However, without those hacking skills, they will never be a data scientist.

The real problem arises when you have someone who is a skilled hacker and has substantive expertise but lacks math and statistics knowledge. Such a person is dangerous because they know just enough to create what seems to be a legitimate analysis, but they have no clue how they reached their conclusions or what they’ve actually created.

Essentially, these are people who have the skills to apply analytics on a problem, but lack the theoretical foundation to make sure they are doing the right thing. So, they might run a statistical test, but could be making erroneous assumptions. However, since they believe they are basing their decisions on data, they have a false sense of confidence which can lead to serious problems.

So, remember, a competent data scientist will be a skilled hacker, have math and statistics knowledge, and also have substantive expertise in your chosen field.

What Motivates a Data Scientist?

Working effectively with someone means understanding who they are and what motivates them. Unfortunately, it can be difficult for people to understand data scientists, precisely because they have such a wide range of skills that they can’t be neatly slotted in a box.

A data scientist isn’t just someone who knows math and statistics and can work a computer like a pro. These are people who also have great communication and visualization skills, enabling them to communicate well with senior management as well as to tell a story effectively. They are also passionate about the business and curious about data. They constantly want to learn more and they want to solve problems and influence things without the authority that might usually accompany this role. They are strategic, proactive, creative, and innovative people who like to collaborate with others.

They definitely love to be mentally stimulated, and solving problems is their bread and butter. And they have an innate curiosity and drive to grow, which means they love learning new things and acquiring new skills.

The fact is that learning new skills is an integral component of today’s market. It’s the Red Queen’s race or hypothesis in action, which essentially says that an organism must constantly adapt, evolve, and proliferate just to survive.

What Will Disengage a Data Scientist?

If you give your data scientist a lot of repetitive work, you are going to end up with a very disengaged person on your hands. So, if they have to spend ridiculous amounts of time cleaning up data or doing anything else that is the equivalent of counting grains of sand, you are going to be dealing with someone who really doesn’t want to be there. And the results will suffer.

Likewise, if they get bored because the problems you are asking them to solve are too simple, you aren’t going to win any points. Of course, sometimes simple problems need to be solved, but if you are going to make the leap and hire a data scientist, make sure to take full advantage of their capabilities. Otherwise, it would be like hiring a Michelin star chef to make you a grilled cheese sandwich.

The peril is that the chef, out of boredom, will go off on a tangent and serve you a grilled cheese sandwich that’s been deconstructed and reconstructed until it has little in common with that simple sandwich that you wanted.

Another problem stems from poor communication. Friction often arises when upper management doesn’t understand what the data scientist is doing.

A simple example is that the data scientist ends up doing a lot of data wrangling, which is utterly boring, because of bad communication with the developer in terms of how the data should have been stored.

To return to our Michelin star chef analogy, Heston Blumenthal is a world-renowned chef and his main “thing” is molecular gastronomy. He uses a range of tools and ingredients that are not always the norm in a kitchen. Say you hire him to cater for an event and he tells you what he needs, but because you don’t understand his process, you just stock the kitchen with all the usual tools and ingredients.

When he arrives to do the cooking, he ends up having to cook a traditional meal that is completely boring for him, resulting in food that might be decent but nowhere near the quality of what he could have done.

When a Data Scientist Is Looking for a Job

Right now it’s a seller’s market for good data scientists above junior level where the data scientists are the sellers—you know, seeing as they sell their skills and expertise. The fact is that there is a high demand for skilled people but the supply is nowhere near sufficient to meet it.

For example, an average data scientist in London will receive between one and ten recruiter messages or calls every week. This means they are in a position of power when choosing what projects to work on.

So, to improve your chances of hiring the right person, you need to understand what data scientists want.

What Does a Data Scientist Want?

The first thing is compensation, of course. After all, data scientists are human—even if their skillset and availability can make them seem akin to a mythological beast at times—and they not only need money to live, but they want their value to be appropriately compensated.

However, offering a mind-blowing paycheck alone will not be enough because other things matter too, such as the team they’ll be working with, the problem they’ll be tackling, the technology stack they’ll be using, and the relationship to academia.

The Team

Some data scientists prefer to work in a team where they like the people they are working with. This usually means a geekier, less formal culture or even working from home.

Of course, this won’t necessarily work for every company as you can’t exactly change your whole culture overnight for the data scientist you are hiring, but you can try to make concessions where possible to improve the working relationship.

The Problem

While we’ve already discussed the dangers of giving a data scientist problems that are too simple, you’ll also find that many data scientists have a bias for particular types of problems. For example, some might prefer working on problems that have a social impact, while others might stick to problems in fields they like, such as medicine.

Other times, they might have preferences related to the problem itself. For example, some people prefer working with text data.

Often, though, you will find that it is closely connected to doing and learning something new, which is a trait that defines data scientists.

The Technology Stack

The technology being used will also play an important role. Some data scientists will prefer working with stacks they are already familiar with, while others will want to learn new things. However, you will find that most data scientists will avoid old technologies and legacy code like the plague.

Sometimes learning new technologies can be extremely important to a data scientist. It could be because some up-and-coming technologies are dominating the landscape, like Python, and they might want to transition from MATLAB to Python and R, for example.

Or maybe the data scientist wants to expand their skillset because some technologies offer better pay and are more interesting, such as those to do with deep learning.

Relationship to Academia

Many data scientists come out of academia and are used to reading/writing papers and going to conferences. This is an advantage for everyone because conferences can be an essential part of being on top of their game. So, one way to motivate a data scientist is to provide a stipend for conferences.

And, as I already said, it will help you too because your data scientist will learn about the up-and-coming developments in data science, they will learn about fields or applications they weren’t previously aware of, and they will catch up with other people in their field.

Some of the data science conferences you can expect a PhD-level data scientist to want to attend include

ICML¹—The International Conference on Machine Learning
NeurIPS²—Neural Information Processing Systems
SIGKDD³—ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ECML PKDD—European Conference on Machine Learning and Knowledge Discovery in Databases
AI Stats⁴

Avoiding Traditional Limitations

One frequent problem with hiring data scientists is that employers often take the same approach they would with hiring for any other role in the business. And that usually involves asking if the person has experience in a particular domain.

However, many data scientists, myself included, specifically seek out domains they have not worked in before. It’s all about the drive to learn new things and stave off boredom because something you haven’t done before will certainly prove more interesting than something you are experienced in.

The important thing that you, as a potential employer, need to understand is that the techniques a data scientist uses have general applicability and the domain itself is of secondary importance. In other words, don’t miss out on hiring a great data scientist because they don’t have experience in your particular domain.

Data Science Is a General Toolbox

The techniques in data science can be applied to a variety of problems. Experience in one thing can translate to experience in something else. So, for example, a data scientist might be able to use the same algorithms to predict disease outbreaks but also how many people might click on an ad.

Think of it like a pastry chef. They learn a wide range of techniques, including working with different kinds of dough, working with sauces, making creams, and so on. And they have a lot of experience making cakes. However, because they are highly experienced with the techniques, they can easily switch from making cakes to making pies.

For example, a common mistake I see made by human resources goes along the lines of

“We’re looking for someone who has 5 years of experience in TensorFlow,” says the HR department.
“Okay, but TensorFlow has only been around for 2 years,” replies the candidate or even the recruitment firm.

When it comes to data scientists, breadth of knowledge is generally a better indicator of skill than years of experience with a particular tool.

The problem is that many employers don’t understand a data scientist’s skillset—not you, of course, because you’re reading this book and now have a much greater understanding of data science and what a data scientist does. The result is that many data scientists get stuck in a particular domain because of this lack of understanding.

Let’s look at a data scientist who works in finance as an example. In finance, data science employs tools like time series analysis and predictive modeling. However, these tools aren’t limited to finance. They are equally effective in a number of other fields, such as sports betting or even retail. In retail, for example, you can use time series analysis to predict demand.
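
To make the retail example a little more concrete, here is a minimal sketch of a time series demand forecast based on a simple moving average. The weekly sales figures, the function name moving_average_forecast, and the window size are all assumptions made up for illustration; a real project would likely use a richer model. The point is only that the same few lines would work unchanged on a series of prices, match scores, or product sales.

```python
from statistics import mean


def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    return mean(history[-window:])


# Hypothetical weekly unit sales for one product (made-up numbers).
weekly_sales = [120, 135, 128, 140, 150, 145]

# The forecast only looks at the three most recent weeks.
next_week = moving_average_forecast(weekly_sales, window=3)
print(f"Forecast for next week: {next_week}")
```

Nothing in the function knows that the numbers are retail sales, which is exactly why the skill transfers between domains.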

However, a large number of employers will see that the data scientist has experience in finance and will automatically assume his or her skillset is only applicable to finance. And this means they could be missing out on an amazing data scientist.

So, don’t make the same mistake. Give these people a chance, even if they don’t conform to the strict model you have in your head of what the perfect hire would be.

Discovering Young Talent

Data science is a little like sports in that talent plays a huge role, especially because it requires a combination of technical skills and soft skills, as we’ve previously discussed. This person needs to be able to read other people, to make connections others can’t see, and to communicate well, beyond the technical aspects of their skillset. They also have to be able to smoothly integrate their soft and technical skills, which isn’t always as easy as it sounds.

So, when you find someone with a clear talent for this work, grab onto them with both hands, even if they are at the beginning of their career. Remember, someone who is a junior data scientist today can be your future senior data scientist if you give them a chance and nurture them.

How can you find young data scientists with talent? First, you want to check out the top universities teaching the subject. Then, once you’ve identified potential candidates, look beyond their university work. You can look on sites like GitHub⁵ to discover any outside work they’ve done.

You should also consider creating a mentorship program. This will allow you to identify people with the talent for data science, some of whom you might not have realized have the aptitude for it until you work with them up close and personal.

A Few Typical Data Scientist Dilemmas

When it comes to data scientists, they are faced with a few dilemmas in terms of whom to work for. The options are a startup vs. a bigger company. Each comes with its own set of pros and cons.

First of all, there’s the startup. When it comes to a startup, things will certainly be more interesting for a data scientist. They’ll be dealing with new technologies and new problems, and they’ll have a lot more flexibility. However, the downside is that they’ll likely get paid less and stability can be an issue because, after all, many startups don’t succeed.

Then you have big companies. The main advantages are that the paycheck will be much more impressive, as will the benefits package. There’s also the matter of stability. A big company is unlikely to go belly up overnight and leave a data scientist in the lurch.

On the opposite end of the spectrum, though, the fact is that a data scientist might end up pulling out their hair in frustration because the work will be boring and as far from stimulating as you can get because they’ll be dealing with old technology and legacy code, which they’ll have to clean up. So, no, it’s not going to be fun. At all.

Freeze Your Data Scientist Recruitment Drive Now

If your company isn’t ready for data science, you need to stop trying to hire a data scientist. What does it mean to be ready for data science?

Well, first of all, you need to make sure your company has a data-driven culture in place. Everyone in your organization needs to understand the value of data, and they need to be willing to use it effectively.

If everyone thinks data is a waste of time or just something you’re doing because it’s the latest “fad,” then you are going to waste your money and time hiring a data scientist.

Don’t worry though, because a little later we’re going to talk about how to build a data science culture and you’ll know exactly what you need to do.

The next consideration is whether or not you have the data available. If you haven’t collected the data, then you might as well wait until you have it. Otherwise, you’ll be paying a data scientist to sit around and look pretty. It will also antagonize the data scientist because, as I’ve already mentioned, boredom is one of the things they hate the most. And a data scientist with nothing to work on is a bored data scientist who will likely end up leaving your company faster than you can blink.

Lastly, you need to make sure the data scientist’s work will have an impact. Hiring someone just to make data look pretty so you can show it off is a waste of resources. You need to have a clear goal in mind, an important question that needs answering, or you’ll be wasting both your time and the data scientist’s.

When you do hire a data scientist, don’t use them just as a highly sophisticated reporting tool. They are a lot more than that and you need to listen to their ideas and trust their recommendations. Don’t make the mistake of thinking you know it all.

After all, would you tell your electrician how to rewire your house? Or would you tell your surgeon how to operate on you? Of course not. You’d let them do their jobs and take their advice, because they’re the experts. So, why would you do anything differently with your data scientist?

You also need to let them spread their wings. Don’t freak out if they want to tackle the problem using a completely new approach. Just because it’s not in line with what you had in mind doesn’t mean it’s not effective. In fact, trying something new is the best way to find the most effective solution.

Remember, doing the same thing over and over again and expecting different results is the definition of insanity. So, you need to give your data scientist the freedom to try something new.

To put things in perspective regarding why you shouldn’t hire a data scientist if you’re not ready, I remember a situation a while ago where a company hired a data scientist. The data scientist resigned after only 2 months, and they weren’t happy when they did so.

What happened was that the company didn’t have any data for the scientist to work on. They were still collecting it, but they jumped the gun, somehow thinking they needed a data scientist on board to look pretty while they were still gathering the information.

Even worse, though, was the fact that they didn’t have a clear goal. They just wanted insights, which is way too vague a goal for anyone to do anything with. Of course, with no clear goal, you can imagine that the data collection process was also not up to standards.

To be honest, I’m pretty sure the data scientist didn’t quit just out of boredom—though that was an issue. He probably knew he’d have to deal with people who had no understanding of what he did. He’d have to educate them, then force them to set a goal, and then try to turn a bunch of random data into something he could work with. It would have been more hassle than it was worth.

Data Science Tribes

Now we’re going to look at the different types of data scientists. I call them tribes because I feel it’s a very good word to describe the different groups of people within data science. You don’t have just one type of person because you don’t have just one way to become a data scientist.

It’s not like becoming a doctor where your only path is to go to med school, then do your residency, and get certified. When it comes to data scientists, there are several ways to get to the same end result.

This is why I believe we can basically split data scientists into three major tribes and three smaller tribes.

In terms of the major tribes, we have the computer scientist, the statistician, and the quantitative specialist who hails from some other field.

The smaller tribes consist of the self-taught data scientist, the software platform user, and the domain specialist.

It’s also important to note here that data scientists and data engineers are now considered to be completely different roles. In the early days the two were frequently mixed up, and data scientists with a computer science background usually do have some data engineering skills, but the roles are distinct.

The Major Tribes

So, let’s take a quick look at the three major tribes, namely, computer scientists, statisticians, and other quantitative specialists.

Computer Scientists

Computer scientists are the people who have degrees in computer science and then did a master’s or PhD in machine learning. The advantage is that they usually have very good skills in coding, databases, and software. The drawback is that they often overlook traditional statistical techniques and theory, which can be useful in some particular domains and problems.

Computer scientists are very good at machine learning tasks such as predictive modeling. Quite often they’ll have experience in Kaggle competitions, which are a prime example of this kind of problem.

Note that Kaggle is a platform that runs predictive modeling and analytics competitions.⁶ Companies and users upload datasets and then statisticians, data scientists, and/or other specialists compete to create the best possible models for predicting and describing those datasets. It’s a form of crowdsourcing and relies on the idea that countless strategies can be employed in a predictive modeling task and one can’t know ahead of time which analyst or approach will be the best. Kaggle is now part of Google Cloud.⁷

In a predictive modeling problem, you’re given a dataset and you just want to find a good algorithm to predict something. The fact that most computer scientists have solid coding skills means it’s easier to integrate their work with the rest of the platform, especially if you’re a small company or a startup. The majority will be using Python and won’t have any problem writing their own APIs. So, essentially, you can use their skills to do many things at once, as they can write the code and integrate it as well.

Also, they have solid database skills, so they can do a lot of data engineering work. If you have your own data engineers, it’s still worth it because it’s easy for these people to retrieve the data, wrangle it into something usable, and so on. This will save your developers time. They might also be able to make suggestions pertaining to the structure of the database, making your life even easier.

However, they often lack solid knowledge of statistics, so for problems where statistics is the appropriate tool, such as research design, these people can’t really help you.

Statisticians

Statisticians are people who usually have a degree in statistics. They might even have a master’s or PhD, which can be in statistics or machine learning. They generally have a good theoretical grounding in the field, but they don’t have the coding or database skills.

Statisticians have very solid knowledge of statistics and theory. They are the best people if what you are interested in is research design or statistical modeling because you want to know the driving factors behind something.

They’re also the best option if you want to model advanced complicated problems while at the same time making sure that the modeling process is transparent, and you understand what’s happening.

They are also excellent at taking a critical look at the work of other data scientists because of the rigorous training in math and theory.

The problem is, though, that most of the time they don’t really have very good skills in coding and databases. The most likely language they’ll be using is R, which they will use as a tool. They probably don’t have any experience in other languages or in databases, which will make it more difficult and time-consuming to integrate their work within the system.

Also, some statisticians don’t have much training in predictive modeling. On its own, predictive modeling is rarely considered to be important in the context of statistics and is more aligned with machine learning.

So, check if they have any Kaggle experience. It also pays to ensure that they’re familiar with some sort of machine learning notions, such as cross-validation, if you’re interested in using their skills for predictive modeling.
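
For readers who have not met cross-validation before, the following is a minimal sketch of the k-fold idea: split the data into k parts, hold each part out in turn, fit on the rest, and average the error on the held-out parts. The target values, the deliberately trivial "predict the training mean" model, and the names k_fold_indices and cross_validate_mean_model are all assumptions made for illustration, not something taken from a particular library.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal, contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def cross_validate_mean_model(targets, k=3):
    """Average held-out mean absolute error of a 'predict the training mean' model."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(targets), k):
        # "Fit" the model using only the training fold.
        prediction = mean([targets[i] for i in train_idx])
        # Score it only on data it has not seen.
        fold_errors.append(mean([abs(targets[i] - prediction) for i in test_idx]))
    return mean(fold_errors)


# Made-up target values, purely for illustration.
targets = [10, 12, 11, 13, 12, 14]
cv_error = cross_validate_mean_model(targets, k=3)
print(f"Cross-validated mean absolute error: {cv_error:.2f}")
```

A candidate who is comfortable with predictive modeling should be able to explain why the model is scored only on the held-out fold, and why real data would usually be shuffled before splitting.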

Other Quantitative Specialists

Finally, we have what I call other quantitative specialists. These are people who come from disciplines heavy in math, such as physics, mathematics, actuarial science, econometrics, and so on. These people might have a master’s degree or a PhD, but this won’t be in machine learning or statistics.

These are basically people who figured out that there are more (and better) jobs in data science than in their field of study, and they are trying to change careers. They are trying to capitalize on their knowledge of math and/or coding in order to make that happen. You will see a huge variation in terms of skills and knowledge in this tribe.

These quantitative specialists are a bit special because they often have some solid skills and bring a diversity of thinking that’s very important in data science. Furthermore, if the problem you’re dealing with is in their specific domain, they are definitely amazing people to hire.

However, the problem is that they often lack rigorous training in machine learning or statistics, which can pose a serious problem in certain situations.

So, when it comes to other quantitative specialists, it tends to be subjective in the sense that it depends on the person. However, the main drawback is that these people often lack formal education in the field.

They tend to be self-taught and you need to understand this fact and check to see how effective they are. You can likely find examples of their work on GitHub or Kaggle, for example. And you really want to check beforehand. Please note that being self-taught is not a bad thing—far from it. However, you need to check to ensure their skills are on par with your needs.

These are often people who studied one field and discovered the pay isn’t that brilliant or finding a job was virtually impossible. Then they discover being a data scientist comes with a hefty paycheck, so a chemist might decide, for example, to create a few mini projects, upload them to GitHub, or take part in a few competitions on Kaggle in an attempt to find a job as a data scientist.

In my experience, of all the specialties these people have, I’ve found that the people who are the most effective are physicists. Physics is full of applied math, which makes it easy for them to read machine learning papers, especially since it’s, in essence, the same type of math. Furthermore, physicists need to be able to do a little coding themselves, so the skillset is similar to the one required in machine learning.

Something important to note is that domain specialists will have expertise in their respective fields, which might prove useful to you. For example, you might want to hire someone who has some experience in econometrics and a little bit of machine learning knowledge because your problem is related to econometrics in some way. Or, you might want to hire a physicist because you are working with sensor data or radar data.

However, you have to be careful with these other quantitative specialists because some of them will try to fake it. They’ll go through a few tutorials and online courses for 2 weeks and, suddenly, they believe they’re data scientists. And that’s just because they want to get the job. You have to be really cautious of these people because they can derail your whole project and cause a lot of problems.

This doesn’t mean that you shouldn’t consider people who have little experience but are honest about it and prove they are smart and willing to learn. They want to gain more experience and are intelligent people who have the potential to become outstanding data scientists.

However, in my opinion, it’s best to take these people on in a junior position with someone more experienced to oversee them. Of course, the person overseeing them should be a senior data scientist. If you don’t have a senior data scientist, hire one first. Then, you can follow their progress as a junior and see how they grow. Depending on their evolution, you can choose to promote them or not.

Convergence Point

It’s important to note that after about 5 or 6 years of experience, there’s usually a convergence point in all the tribes. In other words, when someone has gained a lot of years of experience and has participated in quite a few projects—let’s say between 10 and 30—and they’ve worked for different companies in various fields, they’ve usually picked up a wide range of skills. If they have a PhD, that’s just an added bonus.

So, they’ll have done some coding in both R and Python, and they’ll have picked up a little of this, a little of that, and, pretty much, a little of everything. They also have a very good awareness of what’s out there. This means that they know all the methods and techniques that are available, even if they don’t know how to implement them themselves.

This is important because they know how a problem should be solved. And even if they don’t know how to do it themselves, they’ll know who to speak to and who to hire to get the work done. Sometimes, it’s not important to have the skill per se, but to be aware of one’s limitations and have the knowledge of what solutions are available and how to access the people who can implement them.

So, if you see someone who started out as a physicist but has been in data science for 10 years and has worked for a lot of companies in a wide range of fields, then this is the type of person who will likely fall in this category.

Thus, when you find someone who has a lot of experience, you shouldn’t worry too much about their background. What’s more important is what they’ve learned over the years and whether they actually have the skills and the awareness of all the techniques and methods that are available.

What you do need to be careful about is when someone has a lot of experience, but they’ve worked on the same one or two projects for years and years. These are not the type of people I’m referring to. I’m talking about people who have actively tried to learn new things over the years, such as gaining experience with R, with Python, maybe a little research design, and so on and so forth.

The Smaller Tribes

So, now let’s take a look at the smaller tribes. When I say smaller tribes, I’m referring to the fact that there aren’t as many people in these tribes as there are in the larger tribes. Also, these people tend not to really be data scientists but mainly people with some analytical skills.

Thus, we have the self-taught people. These are usually people who studied something random or maybe even software development. They might have participated in a few Kaggle competitions and that’s about the limit of their experience.

While the quantitative specialists from the larger tribes might also be self-taught in certain respects, their discipline will have already taught them some skills which carry over to data science.

In this particular situation, I’m referring to someone who studied humanities, for example. Think of an art historian who discovers that their field doesn’t offer a lot of professional opportunities, or an archaeologist who’s discovered that digging through the dirt isn’t quite as fun or financially rewarding as they had hoped, and who then suddenly decides to become a data scientist.

Then we have the software platform user, who is someone who just knows how to use a specific tool, such as a dashboard, or software such as Weka⁸ (which you see in Figure 10-2) or RapidMiner.⁹ These people can offer real value for money if you only have a simple problem. For example, if all you need is some reporting, then these people are the best option because they’ll definitely be cheaper.

../images/490014_2_En_10_Chapter/490014_2_En_10_Fig2_HTML.jpg — Figure 10-2
The Weka Explorer graphical user interface

They’ll also be a lot happier with the work you give them compared to someone with a PhD in deep learning, for example. Otherwise, it would be like hiring a master chef to fry two eggs when a line cook can do just as good a job without being irritated by the work.

However, someone like this won’t be able to help you with more complicated problems because they might actually fall into the danger zone we discussed previously, which is shown in the Venn diagram at the beginning of this chapter. In other words, they lack a proper background in math and statistics, so they’ll be able to apply the tool to certain situations, but they aren’t the people you want to trust with building a model or predictive tool.

Last but certainly not least, we have the domain specialist. This is someone who has very advanced knowledge of machine learning, but it is limited to a very small niche. One of the most common examples is someone who is specialized in deep learning and computer vision. These are people who have done undergrad work in computer vision or have a master’s or PhD in computer vision.

Now, if your problem is computer vision-related, then they are definitely the ones to hire because they will be amazing. But if you have a different issue, these aren’t always the best people to turn to. Obviously, it won’t be difficult for them to pick up new skills if they are doing machine learning or statistics on an advanced level. However, it might take them a few months to get up to speed with the knowledge required to solve different types of problems.

So, you have to be careful because if they fall in the danger zone, these domain specialists could cause more problems than you realize. It’s really a good time to keep in mind the saying that if all you have is a hammer, then everything looks like a nail. For example, someone who is an expert in deep learning and has been doing it their whole life might believe that they can solve every problem using deep learning, which is not accurate. This is why you need to be careful. After all, sometimes you just need a fly swatter to get rid of that pesky buzzing.

However, as I previously mentioned, you should consider being more open to people who are aware of their limitations and are looking beyond their usual domains because they want to try something new. These people know it will take them a few months to learn the particulars of the new field, but they are more than willing to put the work in. Sometimes, taking this route pays off in more ways than one because you’ll also earn their loyalty in the process as there aren’t as many employers out there who are as enlightened as you and willing to give them a chance.

Example: How To Evaluate A Data Scientist?

Whom would you rather hire to become your data scientist?

1. Someone who has graduated from a top 10 university with an MSc in Machine Learning?
2. A developer who taught himself how to do data science?
3. Someone with a degree in statistics and 10+ years of experience?

The truth is that there is no correct answer. Job performance can depend on multiple factors. A quick search on Google Scholar returns a huge number of results. For example, the search term “IQ and job performance” returns more than 200,000 studies, most of them citing a correlation between IQ and job performance of around 0.5, which is significant. The term “years of experience and job performance” returns more than 4 million results, and the term “job performance predictors” returns more than 1 million results, with papers citing factors from emotional intelligence to personality.
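To put the correlation figure quoted above in perspective, here is a minimal sketch. It assumes the common reading (added here as an illustration, not taken from the studies themselves) that squaring a correlation coefficient gives the share of variation in one variable that is linearly associated with the other. The function name `explained_variance` is hypothetical.

```python
def explained_variance(r: float) -> float:
    """Return r squared: the share of variance linearly associated with a predictor."""
    # Assumption: r is an ordinary (Pearson) correlation coefficient.
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r


iq_correlation = 0.5  # the figure quoted in the text
share = explained_variance(iq_correlation)
print(f"{share:.0%} of the variation in job performance")
```

Under that reading, a correlation of 0.5 leaves roughly three quarters of the variation to other factors, which fits the point that no single credential settles the hiring question.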

The truth is that every case is different. In data science and technology, it is easy to find cases of employees with great credentials who fail to perform in their role. Cultural fit, personality factors, and personal circumstances can all play a role in an employee’s performance.

However, what you can do to make sure that employees perform at their best is to create the right environment. This is why in the appendix you are going to find some tools that can help you structure a data science project in a way that lets you define clear outcomes and understand what skills someone should have to help you out with this project.

As a rule of thumb, when you are hiring for projects that are not time-critical, cognitive abilities, IQ, and drive might be the most important factors.¹⁰ If, for example, you have a data science team in place that can help nurture someone with the right drive, then you might be able to hire someone on a modest salary who will grow alongside the rest of your data science team.

On the other hand, when you are under time constraints, the best approach is to hire someone with lots of experience and credentials. If the challenge you are facing is very well-defined, then hiring someone who has solved this exact challenge in the past can be a great idea.
