Chapter 8

Developing and Accessing Big Data Competencies

In This Chapter

arrow Understanding the skills shortage in big data – and how this may affect your business

arrow Building or beefing up the necessary big data skills in-house

arrow Turning to external providers for your big data needs

Demand for big data expertise is growing every day, as more and more companies become aware of the benefits of collecting and analysing data. Unfortunately, the number of people trained to analyse this data isn’t growing in line with the demand. This creates a challenge for companies looking to hire expert people.

warning For companies of all sizes looking to unlock big data’s potential, there’s one big hurdle to overcome: stiff competition in hiring the necessary staff.

However, hiring in-house staff isn’t the only way to access big data skills. You can train up your existing people, work with external providers or partner with other interested parties. In this chapter, I look at all the options for tapping into big data skills. But first, I explore the skills shortage in a little more detail to see why it’s such a big deal.

Big Data and the Skills Shortage Challenge

At the end of 2015 there were expected to be 4.4 million big data jobs globally in governments and every sector of industry. Combine this with a shortage of people trained to carry out the analysis needed and that’s a lot of unfilled vacancies.

remember Big data skills are in high demand, which drives up wages and makes it difficult to attract good people without breaking the bank. Small and medium-sized companies are usually unable to compete with the big corporations when it comes to wages and benefits. This means you may need to get creative in order to access the skills you need – more on this in the section, ‘Thinking outside the box’ later in the chapter.

Data scientists are currently such hot property that the Harvard Business Review even called data scientist the sexiest job of the 21st century! Perhaps this means kids will grow up with dreams of becoming data scientists instead of pilots or astronauts?

Unfortunately the role of a data scientist is often ill-defined within the field and even within a single company. People throw the term around to mean everything from a data engineer (the person responsible for creating the software that collects and stores the data) to statisticians who merely crunch the numbers.

warning Because there aren’t enough true data scientists out there to fill the need, less qualified (or unqualified!) candidates make it into the ranks. Many calling themselves data scientists are lacking the full skill set I’d expect.

For example, I’ve seen people who don’t have any understanding of big data technology or big data programming languages call themselves data scientists. At the opposite end of the spectrum are programmers from the information technology (IT) function who understand programming but lack the business skills, analytics skills or creativity needed to be a true data scientist. I believe a really valuable data scientist should possess the six skills set out in the next section.

Six Key Big Data Skills Any Business Needs

One question I get asked a lot is: What are the key skills required to work with big data? Usually my clients are asking me this question so they can select the right candidates for their organisation. I believe there are six key skills required so, ideally, you’re looking for data scientists who offer these skills.

tip Keep in mind that, rather than hiring people with these skills, you may be able to build on your existing employees and their skills. For example, an IT person who already covers the computer science side of things may be super keen to learn about analytics. Pair her up with a creative, strategic thinker who understands the business’s needs (this might be you), and you’re well on your way to having the skills you need without hiring anyone new. In smaller businesses on a tight budget, it’s a good idea to try to develop your existing people wherever possible.

Analysing data

Perhaps the most obvious skill needed is to be able to make sense of the reams of data that your newly deployed data collection strategy (see Chapter 10 for more on this) is piling up for you. Analytics involves the ability to determine which data is relevant to the question that you’re hoping to answer and interpreting the data in order to derive those answers.

You may have some of these abilities in-house already – business analysts, accountants and IT people are usually skilled at making sense of data in one way or another. This is a great starting point, but you really need people who have (or are willing to develop) a strong understanding of big data.

remember Key analytical skills include:

  • A knack for spotting patterns and establishing links between cause and effect
  • A thorough understanding of interpreting reports and being able to make sense of masses of data, both structured and unstructured
  • The ability to build simulated models that can be warped and tweaked until they produce the desired results, such as answers to your strategic questions (see Chapter 11)
  • A sound knowledge of industry-standard analytics packages, such as SAS Analytics, IBM Predictive Analytics and Oracle Data Mining, and a firm idea of how to use them to spot the answers to the questions you’re asking
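As a rough illustration of pattern spotting, the sketch below measures how closely two series move together using the Pearson correlation coefficient. The ad spend and sales figures are made up for the example, and the function name is hypothetical. Keep in mind that a strong correlation only hints at a link; it doesn't prove that one thing causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalised; the scaling cancels out below)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative figures: weekly ad spend and weekly sales
ad_spend = [100, 150, 200, 250, 300]
sales = [1100, 1400, 2100, 2400, 3100]

r = pearson(ad_spend, sales)
print(f"Correlation between ad spend and sales: {r:.2f}")
```

A value close to 1 or -1 suggests the two series move together (or in opposite directions); a value near 0 suggests no straight-line relationship.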

Being creative

Creativity is vital when working with big data. After all, it’s an emerging science and there are no hard-and-fast rules about what a company should use big data for.

In this sense, creativity is the ability to apply technical skills and use them to produce something of value (such as an insight) in a way other than following a pre-determined formula. Anyone can be formulaic – you should be aiming for innovation that will set your business apart from the pack.

Creativity is especially important for any business hoping to make sense of unstructured data – data that doesn’t fit comfortably into tables and charts, such as human speech and writing.

remember Valuable creative skills include:

  • The ability to look beyond a particular set of numbers, beyond even the company’s own datasets to discover answers to questions – and perhaps even pose new questions.
  • A knack for pulling out key insights and solving problems. This may include solving problems the company doesn’t even know it has – for example, the insights spotted can highlight bottlenecks or inefficiencies in the production, marketing or delivery processes that the company was unaware of.
  • The ability to come up with new methods of gathering, interpreting, analysing and – crucially – profiting from data. This is especially important in smaller companies where first-choice options may not be possible due to budget constraints.

Applying mathematics and statistics

Ah, good old-fashioned number crunching. It may not be sexy, but it’s still got a big role to play. Despite the growing amount of unstructured data available, much of the information being gathered and stored for analysis still takes the form of numbers.

And even when dealing exclusively with unstructured data like emails, social media messages and so forth, the objective of the exercise is often to reduce that data into easily quantifiable information: Hard numbers are good. This means a person with a strong background in maths or statistics has a good grounding for big data-related work.
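To make the idea of reducing unstructured text to hard numbers concrete, here is a minimal sketch that counts how many customer messages mention certain keywords. The messages, the keywords and the function name are all invented for illustration; real projects typically use far more sophisticated text analysis.

```python
import re
from collections import Counter


def count_mentions(messages, keywords):
    """Count how many messages mention each keyword (case-insensitive, whole words)."""
    counts = Counter()
    for message in messages:
        # Split the message into a set of lowercase words
        words = set(re.findall(r"[a-z']+", message.lower()))
        for keyword in keywords:
            if keyword in words:
                counts[keyword] += 1
    return counts


# Made-up customer messages
messages = [
    "Delivery was late again, very disappointed.",
    "Great service and fast delivery!",
    "The refund took weeks. Late and unhelpful.",
]

mentions = count_mentions(messages, ["late", "delivery", "refund"])
for keyword, total in mentions.items():
    print(f"{keyword}: mentioned in {total} message(s)")
```

Once the text has been turned into counts like these, the usual statistical tools can be applied to them.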

remember The following maths and analytic skills are particularly important:

  • At least a basic grasp of statistics. For data scientists involved with operational data science as opposed to strategic data science (I explain the difference later in the chapter, in the section ‘Understanding Two Very Different Types of Data Scientist’), a more in-depth knowledge of statistics and mathematics is desirable.
  • The ability to define appropriate populations and sample sizes at the start of a project, based on the goals set out in the data strategy, and to clearly report the results at the end.
  • The ability to wrangle messy, unstructured data into figures that can be quantified, so that definite conclusions can be drawn from them.
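As an example of what defining a sample size can involve, the sketch below uses one common textbook formula for estimating a proportion (such as the share of customers who are satisfied): n = z² × p × (1 − p) / e², where z is the z-score for the chosen confidence level, p is the expected proportion and e is the margin of error. The optional adjustment for small populations is the standard finite population correction. The function name and default values are assumptions made for this illustration, and other study designs call for other formulas.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin_of_error, confidence=0.95, proportion=0.5, population=None):
    """Estimate the sample size needed to measure a proportion.

    proportion=0.5 is the most cautious assumption (it gives the largest sample).
    """
    # z-score for a two-sided confidence interval, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: smaller populations need fewer responses
        n = n / (1 + (n - 1) / population)
    return ceil(n)


print(sample_size(0.05))                   # large population, +/-5% at 95% confidence
print(sample_size(0.05, population=1000))  # only 1,000 customers in total
```

With a 5 per cent margin of error at 95 per cent confidence, this gives 385 responses for a large population and 278 for a population of 1,000.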

Understanding computer science

Computers are the workhorses behind every big data strategy. If an eager, fresh-faced graduate from university had any exposure to the world of data science before throwing herself into the workforce, it probably was in the university’s computer science lab.

Such a broad category covers a whole range of subfields, such as machine learning, databases and cloud computing. It could cover everything from plugging together the cables to creating sophisticated machine learning and natural language processing algorithms.

remember The core computer science skills that relate to big data are:

  • A solid understanding of database technology, cloud computing and distributed computing
  • A firm grasp of key open-source technologies such as Hadoop, Spark, Java and Python, which make up the foundations of most big data enterprises
  • The ability to design and program the algorithms that process data into insights

Grasping the business side of things

The idea that a company hires an egghead data scientist, who’s then locked away in a basement lab to work her magic on data fed to her through a slot in the door, is flat-out wrong and should occur only in television sitcoms.

Instead, a good data scientist should have a firm grasp of the company’s goals and objectives as well as an understanding of whether the business is heading in the right direction.

remember I think the following business skills are especially important:

  • An understanding of business objectives and the underlying processes which drive profit and business growth (for example, what makes the business tick and what makes it grow).
  • A thorough appreciation of the key performance indicators (KPIs) and metrics used to evaluate every aspect of the company, from financial measurements to people and performance.
  • The ability to evaluate what it is that makes the business thrive and stand out from its competitors. If the business doesn’t stand out, then you need an understanding of why it doesn’t and what needs to improve.

tip If you’re looking to hire a data scientist and your best candidate doesn’t have much business experience, you can always pair her with someone in the company who does.

Communicating insights

Of course, communication skills are important across all disciplines, but they’re especially important in extracting the maximum amount of value from big data.

You can have the best analytical skills in the world, but unless you’re able to present findings in a clear way and demonstrate how they will help to improve performance and drive success, all that analysis will go to waste.

remember The following communication skills are absolutely vital:

  • Great interpersonal and written communication skills. It’s important to be able to clearly communicate the results of the analysis to other people in the company, including key decision makers. Those people need to be able to understand the key messages quickly and easily.
  • The ability to add significant value to data. Simply presenting the data is what a statistician does. You need someone who can add value to that data through insights and analysis.
  • A good working knowledge of data visualisation and reporting data. Anyone can make a chart or graph; it takes someone who understands visual communications to create a representation of data that tells the story the audience needs to hear (see the next point).
  • A knack for storytelling. Because, in the end, data is useless without context – you need to tell the story behind the data in order to make it really valuable. For example, if your data shows an increase in sales over a five-month period, the underlying story is what caused that increase and what other factors were at play.

There’s more in Chapter 7 on focusing on insights and adding value to data.

Understanding Two Very Different Types of Data Scientist

In reality, of course, there are as many types of data scientist as there are people working in data science. I’ve worked with a lot and have yet to meet two who are identical.

But when I think about the similar skills, methods, outlooks and responsibilities required of data scientists, then group those together, I’m left with two quite distinct groups. I call them strategic data scientists and operational data scientists.

Just to be clear, individuals who fall into either of these groups doubtless have a lot in common. But in order to best examine who these two types of data scientists are, and how they bring value to an organisation, it’s useful to focus on the differences.

Broadly speaking, a strategic data scientist has a firm understanding of business performance and growth, strategic thinking and communication skills, but is less well versed in the technical, nitty-gritty of setting up database systems and defining or selecting algorithms.

On the other hand, the operational data scientist is more likely to come from a background of programming, statistics or mathematics and can use these skills to implement systems to probe and interpret the data and draw out the most relevant results.

remember In other words – and here you get to the crux of the difference and see why both are essential – the strategic data scientist sets the questions (or works with management to set the questions), and the operational data scientist provides the answers. Asking the right questions and arriving at the correct answers are both essential parts of the process. Each is worthless without the other.

Pair a great strategic data scientist with a great operational data scientist and you have an unstoppable team, capable of crunching its way to the most useful and innovative insights. You might occasionally stumble into someone who has the qualities to fill both roles exceptionally well – but in my experience this is rare!

remember Of course, it isn’t always essential to break down data scientists into these two types. Especially in smaller companies looking to employ just one data scientist, the distinctions become much harder to make. Here it’s particularly important to ensure any data scientist has the strategic business understanding as well as the data crunching skills. If she doesn’t, she needs to work closely with someone in the company who does.

Building Big Data Skills In-House

You may be thinking, ‘Crikey, wherever do I find someone with all these skills?’ It’s not as impossible as it may at first seem.

I think developing your existing people is a brilliant place to start. So, the first thing to consider is whether your existing people have the potential to meet some or all of these needs, with a little extra training and knowledge, of course. Over the past couple of years, a raft of big data-related courses have sprung up and some are even available for free. I give plenty of examples in the next sections.

If you’re heading down a recruitment path, then hiring a data scientist can seem daunting if you don’t have any experience in the tech field. But, with the advice and questions I set out in these sections, you’ll be better equipped for the task.

Developing the people you already have

Not every business can afford to spend a fortune retraining its staff. Luckily, there are alternatives.

remember Increasingly colleges and universities are putting courses online where they can be studied for free. Some of the courses offer certificates of completion or other forms of accreditation; some don’t. But the skills learned should be more important than a piece of paper.

If you’re a very small business, you could take one of these courses yourself. There’s no reason you couldn’t use that knowledge to develop your own data strategy and reap insights.

The next sections provide an overview of what’s available online from various schools, colleges and universities.

Data science

The University of Washington’s Introduction to Data Science is available online at Coursera (www.coursera.org/course/datasci). The course can be completed in 8 weeks if you put in 10 to 12 hours’ study per week, and covers the history of data science, key techniques and technologies such as MapReduce and Hadoop as well as traditional relational databases, designing experiments using statistical modelling and visualising results. Some basic programming knowledge is needed, but don’t worry, there are plenty of free courses where you can pick that up too, if you don’t already have it (read on).

tip Coursera’s courses usually run between set dates – if you want accreditation or certificates, you have to register before a set date and complete the course before a final deadline. However, if you’re just interested in the knowledge, you can download all the course materials – which come as videos and reading material – to browse at your leisure.

Harvard also makes its Data Science course available for free online. All lectures are uploaded as videos shortly after they take place, and materials and homework assignments are uploaded to the code-hosting site GitHub (http://cs109.github.io/2014/). Some basic Python knowledge is required.

Statistics

Stanford has a Statistics One course, which is also available on Coursera (www.coursera.org/course/stats1). The course assumes very little background knowledge and describes itself as ‘a comprehensive yet friendly’ introduction to the subject.

Those looking for slightly more in-depth or specialist knowledge may be interested in Stanford’s Algorithms: Design and Analysis course (www.coursera.org/course/algo). The course covers the fundamental principles behind algorithmic design – design paradigms, randomised algorithms and probability, graph algorithms and data structures. Programming knowledge is essential – you’ll be expected to know at least one language, such as C, Java or Python.

Programming

Speaking of programming, a basic level of familiarity with at least one language is recommended for anyone really interested in data. Python is a good choice as it’s relatively easy to learn, has a rich set of libraries for working with data and is widely used in big data enterprise.

The following all offer free courses in Python designed for absolute beginners with no programming experience. There’s also a Beginning Programming with Python For Dummies by John Paul Mueller (Wiley) if you’re looking for a little bedtime reading!

Visualisation

University of California, Berkeley makes its Visualization course available for free online. The course covers the techniques and algorithms used to create effective and well-designed graphical representations of data. You’ll need some familiarity with one popular graphics toolkit (such as OpenGL or GDI+) as well as one data application (Excel will do). Which you choose is up to you, as the assignments can be submitted in any format. The course is available at http://vis.berkeley.edu/courses/cs294-10-sp11/wiki/index.php/CS294-10_Visualization.

Recruiting new talent

If data is going to be a core part of your business and you have a little wiggle room in your recruitment budget, then hiring a data scientist is a worthwhile investment.

The skills I set out in ‘Six Key Big Data Skills Any Business Needs’ earlier in the chapter can help you put together a list of what sort of person you should be looking for. If you can find a candidate with all six traits – or someone who has most of them along with the ability and desire to grow – then you’ve found someone who can deliver incredible value to your company. You may also want to partner a data scientist with other employees who really excel in certain skills (such as communication or business acumen).

tip Data scientists are in high demand, so when your company is ready to make the leap into hiring one, it pays to make sure you get a good one, not someone piggybacking on the hype. The following questions (which loosely tie in with the key big data skills) can help you make sure you get the right person for the job:

  • Does the candidate have solid programming skills? A data scientist needs the skills to not just view and analyse the data, but to manipulate it as well. A statistician who reviews and interprets a set of data is very different from a data scientist who can change the code that collects the data in the first place.
  • Does the candidate excel at producing analytics for computers or humans? (And which do you need?) If your end result is a machine learning algorithm to, for example, choose which ads to show on a website or automatically top up your stock when it reaches certain levels, your analytics are for computers. There’s more on using data this way in Chapter 12. If, on the other hand, a human will make a choice based on the analytics, your analyst needs a different set of skills – chiefly, being able to tell a story through data and providing good visualisation of that data. Chapter 11 gives more information on using data to make better decisions.
  • Can the candidate provide concrete examples of when she’s improved a business process through her work? As with any position, you hope to see real-world examples of successfully implemented improvements to a business process.
  • Is the candidate a good communicator? Stereotypes would have you believe that it’s okay for scientists and techy types to be introverts with poor communication skills, but that’s not really an option with a data scientist. She needs to be able to communicate effectively with people who don’t speak the same language, tell a story through data and use visual communications effectively.
  • Can the candidate be creative and open-minded? Big data is a rapidly changing and expanding field that requires a certain open-mindedness. To innovate, a good data scientist must be able to look beyond what came before. If a candidate has implemented the same processes or procedures at multiple companies, ask yourself seriously if she’s able to innovate and try something new.
  • Does the candidate have a scientific mind-set? As the name suggests, data scientists should be scientists that apply the scientific model to data. This means being able to experiment with data to find models and algorithms that are useful for businesses and can be used to predict future events.
  • Does the candidate have a solid business understanding? It’s one thing to understand the science and mathematics behind analysing huge datasets; it’s another thing entirely to truly understand how that data affects profitability, user experience and employee retention – or any other factors important to the business. Someone with a background in business will be better at spotting trends that will benefit your business.

remember I’ve seen many companies try to narrow their recruiting by searching for only candidates who have a PhD in mathematics, but, in truth, a good data scientist could come from a variety of backgrounds – and may not necessarily have an advanced degree in any of them. Focusing on these questions and the six core skills will help you find someone who can help you turn data into actionable results for your business.

Thinking outside the box

Because demand outstrips supply (for the time being at least), it can be hard for smaller businesses to find good data scientists. This means you may need to consider alternative ways of tapping into big data skills.

Consider unusual sources where you might be able to recruit help, either on a permanent basis (for example, recruiting talent) or on a temporary basis (such as getting help to analyse data for a one-off project).

tip A university with a data science department, or any kind of data institute for that matter, is a good place to start. You could offer an internship, taking on some students to help with an analysis project, or you could see if the university is open to a joint project of some kind. If you have data to crunch, the university may very well be up for crunching it! In return you could mentor students on the key skills needed to survive in business or offer interview training and practice.

Remember too that your focus should always be on the skills I outline in ‘Six Key Big Data Skills Any Business Needs’ earlier in the chapter. It may be easy to find someone with statistical and analytical skills who falls short on business insights, but your own people could help supplement those skills. Thinking outside the box is about finding creative ways to pull the necessary skills together in whichever way works for you.

Sourcing External Skills

If training your staff or hiring new people aren’t viable options, you can still make the leap into big data. A great way to supplement missing skills – particularly when it comes to the statistical, analytical and computer science aspects – is to turn to external providers who can handle your data and analytics needs.

When it comes to third-party providers, hiring a big data contractor is the most common option. But there are alternatives, such as partnering with other interested parties or crowdsourcing analytic work. I look at each of these options in turn in the next sections.

Tapping into service providers

There are more and more big data providers and contractors springing up who are able to source or capture data on your behalf and analyse it (or work with data you already have). Some big data providers are household names, like Facebook and IBM, but you certainly aren’t limited to big blue-chip companies. There are tons of smaller providers out there adept at working with small and medium-sized firms.

Unfortunately finding a big data service provider is nothing like finding a plumber; you can’t just go on Checkatrade.com for a list of big data tradespeople in your area. Nor are there any real professional accreditations to look out for, like the Gas Safe accreditation for plumbers.

remember So, how do you find a good provider? Like many things in business, networks and contacts can be a huge help. If you have contacts who have worked with a data firm, ask your contact whom she worked with and whether she’d recommend the firm. If not, it’s a good idea to look at some big data case studies online and in books like this to get a feel for who’s doing excellent, innovative work in the field. Failing that, take a look online and start sifting through big data company websites to find one that piques your interest.

You may prefer a provider who has knowledge and experience of working with similar companies in your specific industry. In fact, I’d say industry-specific providers are becoming the norm as opposed to generalists. While the big blue-chip providers may have enormous datasets and impressive armies of analysts, they aren’t necessarily the best option if you’re looking for very specific information. In the last couple of years lots of smaller, more affordable providers have popped up and many of them have in-depth, industry-specific knowledge.

tip The six key big data skills and the recruitment questions I set out in ‘Six Key Big Data Skills Any Business Needs’ earlier in the chapter are just as helpful in finding external contractors as in-house staff. They’ll give you a good grounding for discussions with potential contractors and should help you narrow down your choice. But here are some extra tips to help you find the ideal firm for you:

  • Wherever possible, it’s a good idea to already have a draft data strategy (see Chapter 10) before you approach contractors. This helps you identify what you’re trying to achieve, which in turn feeds into your initial discussions with providers.
  • If you really don’t know where to start, either in terms of developing a strategy or finding providers, then a big data consultant like me will be able to help you devise your big data strategy and find the right company to carry out the data capture and analysis.
  • You need to work with someone who understands exactly what you’re trying to achieve in the business. A contractor with a good understanding of your goals, your unanswered questions and the challenges you’re facing is much more likely to get you the data and insights you really need.
  • Ask for very specific examples of whom a contractor has worked with, how the projects unfolded and, crucially, what results those clients saw.

example Dickey’s Barbecue Pit restaurant provides a useful example of an excellent partnership between a business and its big data contractor. (I talk about Dickey’s in more depth in Chapter 7, and if you’ve not read that chapter yet, take a quick trip there to grab some food from the barby!) The company has a full-time IT staff of 11 people, including two dedicated analytical staff, but also works closely with a data partner, iOLAP. iOLAP delivered the data infrastructure behind Dickey’s big data operation, which runs on a Yellowfin business intelligence platform combined with Syncsort’s DMX data integration software, hosted on Amazon Redshift servers. Dickey’s CIO Laura Rea Dickey explains: ‘Even though our team is probably a bit larger than the traditional in-house team for a restaurant business – because it’s where our focus is – it requires a partner. We have been very lucky in choosing the right partner. We have an account contact in our office at least 20 hours a week and we’re working very closely with them at all times – it’s closed the gap of what would have been a skills shortage for us if we didn’t have a partnership like this.’ The nearby sidebar on women’s cycling offers a completely different spin on setting up analytics.

Partnering to succeed

Earlier in the chapter I mention universities as a great source of big data talent. There are other ways to creatively partner with third parties.

When it comes to big data, budget is usually the number one sticking point – the thing that stops companies from embarking on a big data strategy. Partnering can offer ways around the budget/resources issue and still help you achieve your goals.

tip Consider whether there’s an opportunity to create an industry group with other companies facing similar challenges to your own. You may not be keen to share detailed data with these companies (nor would they want to with you in all likelihood), but you can certainly pool resources to get data analysis done on a large scale without necessarily sharing your private data with competitors. Remember that data can always be aggregated or made anonymous to remove specifics that you don’t want shared.
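To show what aggregating or disguising data before sharing might look like in practice, here is a minimal sketch. The record layout, field names, salt value and function names are all hypothetical. Note that replacing IDs with salted hashes is strictly pseudonymisation rather than full anonymisation, so it's worth taking proper legal and technical advice before sharing anything derived from personal data.

```python
import hashlib
from collections import defaultdict


def pseudonymise(customer_id, salt):
    """Replace a customer ID with a short salted hash so the raw ID isn't shared."""
    digest = hashlib.sha256((salt + customer_id).encode("utf-8")).hexdigest()
    return digest[:12]


def aggregate_by_region(records):
    """Roll individual orders up into per-region totals, dropping customer details."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for record in records:
        bucket = totals[record["region"]]
        bucket["orders"] += 1
        bucket["revenue"] += record["amount"]
    return dict(totals)


# Made-up order records
orders = [
    {"customer_id": "C001", "region": "North", "amount": 20.0},
    {"customer_id": "C002", "region": "North", "amount": 35.5},
    {"customer_id": "C003", "region": "South", "amount": 10.0},
]

# Option 1: share only regional totals
print(aggregate_by_region(orders))

# Option 2: share row-level data with customer IDs replaced (keep the salt private)
shared_rows = [
    {**order, "customer_id": pseudonymise(order["customer_id"], salt="keep-this-secret")}
    for order in orders
]
print(shared_rows[0]["customer_id"])
```

Sharing only the regional totals reveals the least; the row-level option keeps more detail for analysis but carries more risk of someone being re-identified.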

Crowdsourcing talent

If none of the options in the preceding sections work for you, then you might consider crowdsourcing your big data project. Crowdsourcing is a way of using the power of a crowd to complete a task. (If you haven’t heard of crowdsourcing before, you’ve probably heard of crowdfunding platforms, like Kickstarter, which operate on a similar basis – using the power of a crowd to achieve a funding goal.)

remember A few crowdsourcing platforms now allow thousands of data scientists to sign up for big data projects. Businesses can then upload the data they have, say what problem they need solving and set a budget for the project. It’s a great option for companies with a small amount to spend or those that want to test the waters. But it’s also a regular resource for big firms like Facebook and Google. Some firms are even known to recruit full-time analysts from crowdsourcing platforms. This gives you an idea of the quality of talent you could tap into.

example Kaggle is one such crowdsourcing platform. The San Francisco-based business awards cash prizes to its teams of citizen scientists that compete to untangle big data challenges of all shapes and sizes. Chief scientist at Google and Kaggle investor Hal Varian describes it as ‘a way to organise the brainpower of the world’s most talented data scientists and make it accessible to organisations of every size.’

At a time when demand for data scientists far outstrips supply, Kaggle has an estimated 150,000 data scientists ready to go to work for businesses like yours. It also offers the Kaggle In Class service – an academic spin-off of the main brand that offers free data processing tools and simulated challenges. It’s intended for use in schools and colleges struggling to meet the challenges of training the first generations of professional data scientists.

As it stands today, Kaggle is one of the more forward-thinking innovations in big data and has done much to raise awareness of the power that crowdsourcing data analysis can bring to businesses and organisations of all sizes.
