Chapter 10


Building your team

Building a good team is always difficult. Three factors make it even more difficult for big data and data science:

  • severe shortages of qualified talent;
  • lack of recruiter experience scoping and recruiting these roles; and
  • the difficulty of staffing innovative projects with which few or no candidates have experience.

The skill shortage for big data and data science has lasted several years. I’ve heard in-house recruiters complain about it, and I’ve seen it highlighted as a major theme at industry events. Service providers and consultants have noticed the mismatch between supply and demand, and many are rebranding existing staff to sell services they are, in fact, unable to deliver.

In this chapter, I’ll cover key roles related to big data and data science, as well as considerations for hiring or outsourcing those roles. To start, consider the mystically titled role of ‘data scientist’.

Data scientists

This relatively new job title has enveloped a dozen traditional job titles and taken on a life of its own. Whereas accountants may be ‘chartered’, physicians may be ‘licensed’ and even first-aid workers are ‘certified’, anyone can call themselves a ‘data scientist’.

The term ‘scientist’ has traditionally referred to creative individuals who apply any available tool to observe and interpret the world around them, whereas an ‘engineer’ was someone trained in a specific application. With the changes in available data sources and methodologies, such as the application of AI to unstructured big data stores, we found ourselves needing to move beyond our predefined (engineering) analytic methods, such as statistics and numerical optimization, and to creatively apply a wide range of tools to a wide range of data sources. These tools include neural networks, support vector machines, hidden Markov models, calculus-based optimization, linear and integer programming, network flow optimization, statistics, and other methods that have proven useful within the broad fields of data mining and artificial intelligence. We apply these methods to any data we can find: not only the familiar data within corporate files but also web logs, email records, machine sensor data, video images and social media data. Thus, the term ‘science’ became more appropriate for a field where practitioners were creatively moving beyond traditional methodologies.

Today we use the term ‘data scientist’ to encompass not only those experts who are creatively expanding the use of data but also anyone who ten years ago might have been called a statistician, a marketing analyst or a financial analyst. We have created a term so rich in meaning that it has become almost meaningless.

The Harvard Business Review wrote in 2012 that data scientists had ‘the sexiest job of the twenty-first century’.73 Glassdoor, a career portal, listed data scientist as the best job in America for both 2016 and 2017.74 It is thus not surprising that recent years have seen a flood of semi-qualified job candidates entering the field, further muddying the recruitment waters. Data from the job portal Indeed.com shows a levelling out of data science positions over the past few years (Figure 10.1), while the number of job candidates for such positions grew steadily (Figure 10.2), which is not to say that the number of qualified candidates has increased. This surge in job seekers emphasizes the importance of properly screening candidates.

Figure 10.1 Percentage of job postings including the term ‘Data Scientist.’

Figure 10.2 Percentage of candidates searching for ‘Data Scientist’ positions.75

Despite its inherent vagueness, you’ll want to include the term ‘data scientist’ in your analytic role descriptions for purposes of keyword search, followed by concrete details of what you really need in the candidate. You may see this job title connected to any of the specific roles I’ll describe in the next section. Internally, focus your recruitment efforts on the specific competencies you need, rather than on the term ‘data scientist’.

Let’s look now at the specific job roles you’ll want to fill for your big data and data science initiatives.

Data and analytics roles you should fill

Platform engineers

If you’re not using Infrastructure as a Service (IaaS) or Platform as a Service (PaaS) offerings, you’ll need staff to get your specialized computer systems up and running, particularly the distributed computing clusters. Some common position titles related to these functions are ‘systems engineer’, ‘site ops’, and ‘DevOps’.

Data engineers

Preparing data for analysis is more time-consuming than doing the analysis itself. You’ll need to Extract the data from source, Transform/clean the data, and Load it into tables optimized for retrieval and analysis (the ETL process). Specialized software will help, but you’ll still waste a lot of time and see huge performance losses if you don’t have specially trained people for this task.
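To make the three ETL steps concrete, here is a minimal sketch in Python using only the standard library. The CSV content, the column names, the `orders` table and the assumption that the source mixes ISO and DD/MM/YYYY dates are all hypothetical; real pipelines would use dedicated ETL tooling, but the extract, transform and load structure is the same.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system: stray whitespace,
# inconsistent capitalization, mixed date formats and a missing amount.
RAW_CSV = """order_id,customer,order_date,amount
1001, Alice ,2017-03-01,19.99
1002,bob,01/03/2017,5.00
1003,Carol,2017-03-02,
"""


def extract(text):
    """Extract: read rows from the source (here, an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize_date(value):
    """Convert DD/MM/YYYY (an assumed source format) to ISO YYYY-MM-DD."""
    value = value.strip()
    if "/" in value:
        day, month, year = value.split("/")
        return f"{year}-{month}-{day}"
    return value


def transform(rows):
    """Transform/clean: trim names, standardize dates, drop rows without an amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records rather than loading bad data
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            normalize_date(row["order_date"]),
            float(row["amount"]),
        ))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a table designed for querying."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Even in this toy case, most of the code is cleaning rather than analysis, which is the point the paragraph above makes about where data preparation time goes.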

Your data engineers should:

  • Have expertise in using multi-purpose ETL tools as well as data manipulation tools for the big data ecosystem (tools with names such as Pig, Storm, etc.).
  • Have expertise in designing data warehouse tables. Depending on your tooling, this may include OLAP cubes, data marts, etc. If the database tables are poorly designed, your reports and data queries may become unusable due to instability and lag. These specialists in database design will also be able to help colleagues write optimized data queries, saving development effort and reducing query execution time.
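The effect of table design on query performance is visible even in a toy database. The sketch below uses Python’s built-in `sqlite3` module (chosen only for illustration; a data warehouse would use its own engine and tooling) and asks SQLite for its query plan before and after adding an index. The `sales` table, its columns and the index name `idx_sales_customer` are hypothetical, and the exact wording of SQLite’s plan text varies between versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query-plan description lines for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i % 50)) for i in range(100_000)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"

# Without an index, SQLite must scan every row to find one customer's sales.
before = query_plan(conn, QUERY, (42,))

# An index on the filter column lets SQLite jump straight to the matching rows.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
after = query_plan(conn, QUERY, (42,))

print("before:", before)
print("after: ", after)
```

The first plan reports a full scan of `sales`; the second reports a search using `idx_sales_customer`. On real data volumes, that difference is often what separates a usable report from one that times out.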

The role of data engineer has become very difficult to fill in some geographies, but if you don’t get specialist data engineers, others in your team will waste time covering this critical but specialized task, with at best mediocre results. I’ve seen it before, and it’s not pretty.

Algorithm specialists

Your most innovative projects will be done by experts using mathematics, statistics and artificial intelligence to work magic with your data. They are writing the programs that beat the world champion at Go, or recommend your next favourite movie on Netflix, or understand that now is the right time to offer the customer a 10 per cent discount on a kitchen toaster. They are forecasting your Q2 revenue and predicting the number of customers you’ll see next weekend.

The people you hire for these tasks should have a strong background in mathematics, usually a degree in maths, statistics, computer science, engineering or physics, and they should have experience writing and coding algorithms in a language such as Java, Scala, R, Python or C/C++. They should preferably be experienced in object-oriented programming. If you are developing a highly specialized algorithm, such as for image or speech recognition, you will probably want someone who has completed a PhD in that area.

There are a few key skills I look for in building the team of algorithm specialists. These skills may not all be present in one person, but they should be covered within your team.

  • Expertise in statistics. You’ll use statistics in A/B testing and in forecasting, and there are statistical models and techniques you’ll want to consider for many other applications. Most team members will have a basic knowledge of statistics, but it’s good to have someone with specialized knowledge.
  • Expertise in mathematical optimization. You’ll want to cover the bases of multivariate calculus-based methods (e.g. quasi-Newton and gradient descent), linear and integer programming, and network flow algorithms. These are important tools for certain applications, and without them you’ll eventually end up pounding in screws with a hammer.
  • Expertise with a general algorithm prototyping tool. You’ll want someone who is trained on a tool such as KNIME, RapidMiner, H2O.ai, SAS Enterprise Miner, Azure ML, etc. and who can leverage the modelling and data processing libraries to rapidly experiment with a variety of diverse models, possibly throwing together a few ensembles (collections of models that together ‘vote’ on a result). For a certain classification problem, for example, they might compare results from a statistical regression vs results from a support vector machine vs results from a decision tree, quickly determining the most promising model for future development and eventual deployment.
  • Strong algorithmic coding skills. The code you eventually put in production should be well-designed and efficient. An algorithm can run very slowly or very quickly depending on how it is coded. For this reason, you want some team members who are especially proficient at coding production-ready algorithms in your production language. Someone on the team should also have a good understanding of computational complexity, which relates to the scalability of algorithms. If doubling the problem size makes your technique 100 times slower, then your technique will not remain usable as the problem size grows.
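As an example of the statistics behind an A/B test, here is a minimal two-proportion z-test in standard-library Python. The visitor and conversion counts are invented for illustration, and a real test would also fix the sample size and significance level before the experiment starts.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z statistic, p-value), using the pooled-proportion standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical results: variant B converts 5.5% of visitors vs 5.0% for A.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=550, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented numbers, z is about 1.6 and p is roughly 0.11, so the apparent uplift is not significant at the 5 per cent level. This is exactly the kind of judgement call where specialized statistical knowledge pays off.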
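Gradient descent, one of the calculus-based optimization methods mentioned above, fits in a few lines. The sketch below minimizes a hypothetical two-variable quadratic function. The step size and iteration count are arbitrary choices for this example, and production work would normally rely on a tested optimization library (including quasi-Newton methods such as BFGS) rather than hand-rolled code.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient to approach a local minimum."""
    x = list(start)
    for _ in range(steps):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical objective: f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimum at (3, -1).
def f(point):
    x, y = point
    return (x - 3) ** 2 + 2 * (y + 1) ** 2


def grad_f(point):
    """Analytic gradient of f."""
    x, y = point
    return [2 * (x - 3), 4 * (y + 1)]


minimum = gradient_descent(grad_f, start=[0.0, 0.0])
print([round(v, 4) for v in minimum], round(f(minimum), 6))
```

The method only needs the gradient, which is why it scales to models with millions of parameters; the trade-off is sensitivity to the step size, which is one reason a specialist should choose the method for each problem.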
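The scaling argument above can be made concrete by counting operations. If an algorithm’s running time grows like n^k, doubling n multiplies the work by 2^k, so a 100-fold slowdown corresponds to k = log2(100), roughly 6.6. The sketch below uses a hypothetical duplicate-detection task to compare a quadratic pairwise approach with a linear hash-set approach.

```python
def count_pairwise_comparisons(items):
    """Quadratic approach: compare every pair. Returns (has_duplicate, comparisons)."""
    comparisons = 0
    found = False
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                found = True
    return found, comparisons


def count_hash_lookups(items):
    """Linear approach: one set lookup per item. Returns (has_duplicate, lookups)."""
    seen = set()
    lookups = 0
    found = False
    for item in items:
        lookups += 1
        if item in seen:
            found = True
        seen.add(item)
    return found, lookups


for n in (500, 1_000):
    data = list(range(n))  # no duplicates, so both methods examine everything
    _, pairwise = count_pairwise_comparisons(data)
    _, hashed = count_hash_lookups(data)
    print(f"n={n}: pairwise={pairwise:,} hash={hashed:,}")
```

Doubling n from 500 to 1,000 roughly quadruples the pairwise count (124,750 to 499,500) but only doubles the hash-set count (500 to 1,000). Both give the same answer, but only one remains practical as data volumes grow.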

For the algorithm specialist role, look closely at the candidates’ degrees and alma maters. Some universities are much stronger than others. Be aware that countries differ in the effort required to earn a degree. To further complicate the matter, some universities may not be top-ranked overall but are world leaders in specific fields. You may be surprised to see the computer science programme at the University of Washington ranked above the programmes at Princeton and Harvard. Finally, keep in mind that the difference between two PhD graduates from the same school can still be wide enough to drive a truck through.

Keep in mind

Educational background and experience with well-known companies can be strong signals of candidate strength, but they should not dictate your hiring decisions.

For some roles related to algorithm development, particularly those requiring extreme innovation, we value high intelligence and creativity more than relevant experience. Several years ago, a friend interviewed at one of the world’s top hedge funds. The entire interview process consisted of five to six hours of solving brain teasers, with almost no questions related to the financial markets or even coding. This company was looking for raw creative intelligence in their algorithm developers, and they trusted that the right people could learn any relevant subject matter as needed. Although this may be a viable tactic when hiring algorithm developers, it’s not appropriate for roles such as data engineers and business analysts.

Business analysts

Most of the ‘data scientists’ that you hire will probably be what I would call ‘business analysts’. These analysts are business thought partners and answer basic but important data questions asked by your business units. They typically use basic technologies to gather data and then spreadsheets to analyse the data and deliver results. In other words, these guys are great with Microsoft Excel.

There are various schools of thought as to where these analysts should be positioned within the organization, with some companies grouping them in a centralized team and some embedding them within business units.

Centrally located analysts can more easily share knowledge and can be allocated on demand to the highest priority projects. Dispersed analysts can leverage the insights and quick feedback available as part of a business team. The decentralized model probably occurs more often in small to mid-sized enterprises, as it does not require executive sponsorship but is funded at department level and justified by an expressed business need for data insights.

In either case, encourage the business analysts to keep strong lines of communication among themselves, with the algorithm developers and especially with the data engineers. The business analysts will provide valuable business insights to the algorithm developers, who in turn can propose innovative solutions to front-line challenges. The data engineers should actively assist the business analysts with data extraction, or else the analysts will waste time writing sub-optimal queries.

Web analyst(s)

Customer online behaviour is a very important data source. You can choose from a broad selection of mature web analytics products, but whichever tool(s) you choose should be managed by a trained specialist who keeps current on developments in web analytics and related technologies (including browser and mobile OS updates).

Your web analyst will oversee web and app tagging and make sure that online customer activity is collected effectively. Some web analytics tools can also collect data from any connected digital device, not only browsers and apps, and the web analyst can assist with this data consolidation. The web analyst will create conversion funnels and implement custom tagging, and will monitor and address any implementation problems that may arise, such as data errors related to browser updates. They will also assist with merging internal data with web analytics data, which may be done within the organization’s databases or on the web analytics server.
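A conversion funnel counts how many visitors reach each step of a defined path and what share of them continue to the next step. The sketch below computes one from a hypothetical event log in plain Python; the step names, visitor IDs and events are invented. Web analytics tools broadly build funnels from tagged page and app events, but this simplified version counts a visitor at a step only if they also reached every earlier step, and it ignores event order and timing.

```python
from collections import defaultdict

# Hypothetical tagged events: (visitor_id, step_name).
EVENTS = [
    ("v1", "landing"), ("v1", "product"), ("v1", "cart"), ("v1", "purchase"),
    ("v2", "landing"), ("v2", "product"), ("v2", "cart"),
    ("v3", "landing"), ("v3", "product"),
    ("v4", "landing"),
    ("v5", "product"),  # skipped the landing page, so excluded from the funnel
]

FUNNEL_STEPS = ["landing", "product", "cart", "purchase"]


def build_funnel(events, steps):
    """Return (step, visitors, conversion from previous step) for each step."""
    steps_by_visitor = defaultdict(set)
    for visitor, step in events:
        steps_by_visitor[visitor].add(step)

    results = []
    previous = None
    for i, step in enumerate(steps):
        # A visitor counts at this step only if they completed all earlier steps too.
        required = set(steps[: i + 1])
        count = sum(1 for seen in steps_by_visitor.values() if required <= seen)
        if previous is None:
            rate = 1.0  # the first step is the funnel's baseline
        elif previous == 0:
            rate = 0.0
        else:
            rate = count / previous
        results.append((step, count, rate))
        previous = count
    return results


for step, count, rate in build_funnel(EVENTS, FUNNEL_STEPS):
    print(f"{step:<9} {count:>2} visitors  {rate:.0%} of previous step")
```

The step with the lowest continuation rate shows where customers drop out, which is where the web analyst and the business teams would focus their investigation.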

Your web analyst will also be an expert in extracting data, creating segments, and constructing reports using available APIs and interfaces. For this reason, this person may be actively involved with A/B testing, data warehousing, marketing analysis, customer segmentation, etc.

Reporting specialists

You’ll benefit greatly if you hire or train staff skilled at creating top-notch graphs and tables. This requires a mixture of art and science and should be done by people who excel in, for example:

  • Selecting the table or graph most suited to the use-case. For example, trends will jump out from graphs much more quickly than from tables, but tables are better for sequential review.
  • Selecting the layout and format most appropriate to the data. For example, reports with time series data shown vertically are not intuitive.
  • Reducing visual clutter, freeing the recipient to focus on the most important data. This is rarely done well.
  • Leveraging principles of gestalt and pre-attentive processing.
  • Selecting shapes and colours that minimize confusion.

Stephen Few has written multiple books covering best practices for data visualization.60, 76–79

On a technical level, the reporting specialists should be comfortable writing database queries to extract data from source systems, and they should be trained on your BI tool(s).

Leadership

Leadership is key to the success of your analytics programme. In the CapGemini survey referenced previously, close to half the organizations were already engaged in organizational restructuring to exploit data opportunities, and a third were appointing senior big data roles, recognizing that data opportunities spanned the breadth of their businesses.

My clients sometimes ask me to help scope requirements for and recruit analytics leadership. This ‘lead data scientist’ role is typically opened by the company for one of two reasons:

  1. The company is looking to launch a new department to leverage data science and/or big data, or
  2. The company has tried to launch such a department using existing management and has realized (the hard way) their need for fresh, specialized leadership.

I’ve conducted several hundred interviews for analytics roles over the nearly 20 years that I’ve worked in financial and business analytics, and I’ve screened even more CVs. The candidates with whom I’ve spoken have come from across the world, many having completed world-class technical graduate programmes or MBA programmes at schools such as Wharton, Chicago Booth or Oxford. It’s been a real privilege to find and hire many excellent people over the years.

Filling a lead analytics role, however, is particularly challenging because of the complex requirements the candidate must satisfy.

Possession of three unrelated skill sets

The lead role requires a strong blend of technical, business and communication skills, which often correlate negatively. Individuals excelling technically often have proportionately less interest in mastering communication with non-technical business colleagues and may prioritize technical innovation above business value.

Breadth and depth of technical skills

From an analytics perspective, the leadership role requires both familiarity with a broad range of tools and techniques and an experience-based understanding of what is involved with in-depth technical implementations. There is certainly space in an organization for specialists in areas such as statistics, deep learning, NLP, or integer programming, but for the lead role, the right candidate must have an overview of the entire analytic tool chest, allowing them to select techniques that best address business problems and to recruit specialized talent as needed.

The leader must also be familiar with relevant tooling, including database technologies, programming frameworks, development languages and prototyping tools, examples of which were given above. The technology space is already quite broad, and it continues to expand. Properly leveraging existing technologies can easily save months or years of in-house development.

Ability to deliver results

Initiatives will almost certainly fail if the analytics leader cannot:

  • understand tangible business drivers and KPIs;
  • identify appropriate data science techniques, tools, and applications, typically drawn from cross-industry studies;
  • execute the analytics projects in a lean manner; and
  • communicate vision so as to win buy-in from peers.

The hiring process for the lead role

There are three phases through which I typically progress alongside a company recruiting the lead role.

  1. Aligning with the recruitment team. The internal recruiters are usually a pleasure to work with and are typically eager to learn about new profiles. The lead analytics role is almost always new to them in its skill sets, technologies, background and business experience, and so we work closely over multiple sessions to scope the role, identify appropriate distribution channels, and review candidates.
    It’s important to think about salary early in the process, as you may not realize the high premium this role commands in the job market. You’ll lose qualified candidates if it takes too long to bring salary expectations to market levels.
  2. Finding strong candidates. This is perhaps the most challenging part. You are looking for someone to take complete ownership of your analytics programme, and, depending on your organizational structure, possibly of data governance. Create a set of general and detailed questions spanning the competencies you feel are most important for the position and give the candidate space in the interview to communicate their own passions, ambitions and experience.
    You’ll find it difficult or impossible to probe the candidates’ analytics expertise yourself, but you can focus on past achievements and the candidate’s vision for this position. Bring in your technology team to assess the candidate’s understanding of technology, and your business leaders to make sure they are comfortable with the candidate’s communication skills and business acumen.
  3. Landing the candidate. The top candidates will have many job options. Offer a competitive salary and follow up closely with the candidate to quickly address any ancillary concerns.
    For lead data science roles, my experience is that strong candidates will be drawn most by the opportunity to work with interesting and abundant data and by the opportunity to contribute in creative and meaningful ways without heavy-handed interference.

Recruiting the data team

Because big data roles have only existed for a few years, many external recruitment firms struggle to understand the profiles they are being asked to fill. Some third-party recruiters I’ve spoken with are not able to distinguish between a data engineer and an algorithm developer. They are not familiar enough with the rapidly changing technology landscape to match skills and experience on a CV with requirements for a posting, let alone to assist you in writing specifications that best describe your needs. They may struggle to present the role in a way that is attractive to top talent and may end up recycling old job postings, demonstrating to candidates a disconnect with modern technology.

Generalist recruitment firms compete with internal recruiters at technology companies, who are actively poaching specialist recruiters. You should rethink your traditional methods of sourcing candidates, broaden your network of third-party recruiters and make conscious efforts to help internal recruiters understand the nature of the new roles as well as the preferences and quirks of target candidates. Send your recruiters to a good data conference to get them up to speed with concepts and terminology and to help them in networking.

Case study – Analytics staffing at ‘the most promising company in America’

Instacart, an online company providing same-day grocery deliveries, was founded in Silicon Valley in 2012 by a former Amazon employee. In 2015, Forbes called it ‘the most promising company in America’. By 2017, it had grown to over 1000 employees and a market valuation of several billion dollars.

Instacart uses machine learning for several important applications, such as to decrease order fulfilment time, plan delivery routes, help customers discover relevant new products, and balance supply with demand.

In a recent interview, Jeremy Stanley, Vice President of Data Science, elaborated on analytics staffing within Instacart. Their data people are divided into two categories:

  1. Business analysts, who use analytic methods to help guide strategy and business decisions.
  2. Machine learning engineers, who are embedded within functional teams to build software that will be deployed into production.

They only hire ML engineers with solid experience, but they have also trained internal software engineers to be ML engineers, a process that typically takes about one year. Although none of their business analysts have transitioned to the role of ML engineer, they estimate it would take two to three years of training to teach these business analysts the development skills necessary to write production-ready ML software.

They feel recruitment is most difficult at the top of the funnel (finding the candidates), but is helped by:

  1. Tapping the networks of current employees.
  2. Publicly talking about interesting projects (they recently wrote a blog post about a deep learning application).
  3. Giving back to the community by open-sourcing projects and data sets and by hosting competitions.

Their decentralized model pushes much of the hiring and mentoring to the data science VP, who estimates his time is evenly split between hiring, mentoring and hands-on project work.

Hiring at scale and acquiring startups

The hiring challenge is compounded when it needs to happen at scale. You may want to staff up rapidly after you’ve established the value of an analytics effort through a proof of concept. According to a recent McKinsey survey of 700 companies, 15 per cent of operating-profit increases from analytics were linked to hiring experts at scale.80

You can fill some positions by re-allocating internal resources, particularly those positions that require only general software development skills or a general analytics background. For more specialized skill sets, particularly within AI, companies often fill staffing needs by acquiring smaller, specialized companies, particularly startups. We saw this at eBay in 2010, when eBay quickly scaled its pool of mobile developers by purchasing Critical Path Software. We see it still within AI, with Google’s acquisition of DeepMind (75 employees at the time) and Uber’s acquisition of Geometric Intelligence (15 employees). Salesforce, which is pushing its AI offering in its Einstein product, acquired key AI staff in 2016 through its acquisition of the Palo Alto-based AI startup MetaMind, with the expressed goal to ‘further automate and personalize customer support, marketing automation, and many other business processes’ and to ‘extend Salesforce’s data science capabilities by embedding deep learning within the Salesforce platform.’81

Figure 10.3 Rate at which AI companies have been acquired 2012–2017.84

GE, a company with over 10,000 software developers and architects, recently launched an IoT software platform called Predix. They grew the Predix team from 100 employees in 2013 to 1000 employees in 2015, with plans to retrain their entire global software team on the new platform.82 This rapid growth was also fuelled by acquisition. They hired the co-founder of key technology provider Nurego as Predix general manager, subsequently acquiring the entire company.83

Figure 10.3 illustrates the increasing rate at which AI companies have been acquired over the last few years.

Outsourcing

You can bring external resources to supplement your in-house staff or you can outsource entire projects or services.

Outsourcing projects facilitates agile development and allows you to focus on your core strengths. In terms of agility, outsourcing allows you to quickly secure very specific expertise in technologies and data science applications. A third party may be able to start work on a project within a few days or weeks, rather than the several months sometimes needed for internal resources that would need to be re-allocated or recruited (both of which are difficult for proofs-of-concept).

Owing to their specialized experience, a small team of externals might complete a proof of concept within a few weeks, whereas an internal team without comparable experience could easily take several months and would be more likely to fail. This tremendous boost in speed allows you to quickly determine which analytic initiatives bring value and to start benefiting as soon as possible.

The daily cost of external resources may be several times higher than internal salaries, but when you consider the difference in development time, they may well be more cost-effective. When you move the technology from proof of concept to production, you will want to move the expertise in-house but will then have the business case to support the long-term investment.
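A back-of-the-envelope comparison shows why a higher day rate can still cost less overall. Every figure below (day rates, team sizes, durations) is a hypothetical assumption chosen only to illustrate the arithmetic, not market data; substitute your own estimates.

```python
def project_cost(day_rate, team_size, working_days):
    """Total labour cost for a team working a given number of days."""
    return day_rate * team_size * working_days


# Hypothetical assumptions, not market data.
internal = project_cost(day_rate=500, team_size=3, working_days=80)   # roughly 4 months
external = project_cost(day_rate=1500, team_size=2, working_days=15)  # roughly 3 weeks

print(f"internal: {internal:,}  external: {external:,}")
```

Under these assumptions the external team costs 45,000 against 120,000 internally, despite a day rate three times higher, and delivers its answer months sooner. The comparison flips if the external team needs as long as the internal one, which is why the speed advantage, not the rate, drives the decision.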

Many organizations hire externals to supplement in-house staff, putting externals within their internal teams. Supplementing staff with externals serves three purposes.

  1. It provides quick access to otherwise difficult-to-hire talent.
  2. It gives you the flexibility to cut headcount when necessary (this is particularly valuable in countries with strong labour laws, such as those in Europe).
  3. It impacts your financials, lowering headcount and providing options to move OpEx to CapEx, both of which may be interesting for investors.

Keep in mind

Bringing in external experts may be the best way to jump-start a project or do a proof of concept.

A word of caution on outsourcing: it can be quite difficult to find high-quality data science consultants. Quality varies significantly even within the same company. Since your projects will by nature be R&D efforts, there is always a chance they will result in little or no tangible benefit, regardless of the strength of the analyst. Thus, it is especially important to maximize your odds of success by bringing in the right people. If possible, look for boutique consulting firms, where the company owners are involved in monitoring each project.

In the end, if you’ve managed to assemble a strong internal team and a reliable set of externals to call on when needed, you’ve probably done better than most of your peers.

For small companies

If you are leading a smaller company or working alone, you probably won’t have the resources or the requirements for a full data team. With only a few end users, you won’t be as reliant on the skills of specialized data engineers. You also won’t have enough consumers of reports and dashboards to justify hiring a reporting specialist, and you’ll probably not have the resources to commit to a full machine learning project.

Your ‘minimum viable product’ for a data team in a small company would be to place the web analytics responsibility within your marketing team and to hire an analyst who can cover business analytics and reporting. The minimum skills for this analyst are:

  • A strong mathematical background, including an understanding of basic statistics.
  • Database skills, including experience working in SQL (Structured Query Language).
  • Good communication skills, including the ability to create clear graphs and tables.
  • The ability to be a thought partner in solving business problems.

Although you typically won’t launch internal machine learning projects, at this stage you can still take advantage of the pay-per-use offerings of some of the larger vendors without needing to understand how they work. Examples include the image and text recognition software of Google Cloud Vision API, Salesforce Einstein and Amazon AI.

Takeaways

  • The term ‘data scientist’ is too broad to be useful in recruiting.
  • There are 6–7 key skills you should have in your team for big data and data science projects.
  • Recruiting analytics leadership is difficult, but important.
  • Traditional recruiters may lack the expertise to recruit the roles you need.
  • Consultants can be extremely helpful in starting new initiatives, but carefully check whether they have the relevant skills.
  • Larger companies are increasingly scaling their analytics talent through acquisitions.

Ask yourself

  • Which of your recruiters (in-house or external) understand the requirements for each of the seven data roles described in this chapter? If none, start speaking with new agencies.
  • Who is the most senior person in your organization with a vision for data and analytics? Many companies are appointing C-level leadership in data and analytics. How would such a role fit within your organization?
  • If you were to acquire a smaller, specialized company to quickly build your analytics capacities, what would that company look like? Think of location, size and skill set.