Chapter 4

Unpacking the Many Types of Data

In This Chapter

Understanding the difference between structured and unstructured data

Delving into internal and external data

Deciding what’s right for your business

Historically, data used to be a very ordered and neat thing. Even before computers, humans used data to impose order on business actions and processes – think of accounting ledgers or stacks of paper transaction records. Computers, and particularly spreadsheets and databases, gave you a way to store and organise data like this on a large scale in an easily accessible way; instead of ploughing through paper archives, business information was available at the click of a mouse. This absolutely revolutionised business processes (imagine doing your tax return or sending out a customer mailing with only paper records to help you). For a long time this form of structured data reigned supreme; anything that wasn’t easily organised into rows was simply too difficult to work with and was ignored.

Now, though, advances in storage and analytics mean that the masses of messy, unstructured data out there can finally be harnessed to the advantage of businesses big and small. It’s no longer all about the humble spreadsheet and database.

But what exactly do I mean by structured and unstructured data? And what sort of data is typically available in-house in the average business? What can you source externally? In this chapter, I explain the key features of structured and unstructured data (semi-structured too, for that matter), plus internal and external data. I also give examples of the different types of data and weigh up the pros and cons of each.

Deploying Order: Structured Data

Structured data refers to any data or information located in a fixed field within a defined record or file. This includes data contained in relational databases and spreadsheets. As the name suggests, structured data refers to data that’s organised in a predetermined way, usually in rows and columns.

Structured data gives names to each field in a database or spreadsheet and defines the relationships between the fields. For example, if you look at a standard customer database, the defined fields include name, address, contact telephone numbers, email address and so on. Within those fields, conventions may also be set (for example, the telephone number field will only accept numeric information). These conventions can also include drop-down menus that limit the choices of the data that can be entered into a field, thus ensuring consistency of input (for example, a Title field may only give you certain options to choose from: Mr, Ms, Miss, Mrs, Dr and so on).
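As a minimal sketch of how such field conventions might be enforced in code, the Python below checks a customer record against the two rules mentioned above. The class name, field names and the `is_valid` helper are illustrative assumptions, not part of any particular database product.

```python
from dataclasses import dataclass

# The drop-down options for the Title field (taken from the example above).
ALLOWED_TITLES = {"Mr", "Ms", "Miss", "Mrs", "Dr"}


@dataclass
class CustomerRecord:
    """A hypothetical customer record with named, fixed fields."""
    title: str
    name: str
    telephone: str
    email: str

    def __post_init__(self):
        # Convention 1: the title must be one of the drop-down choices.
        if self.title not in ALLOWED_TITLES:
            raise ValueError(f"title must be one of {sorted(ALLOWED_TITLES)}")
        # Convention 2: the telephone field accepts digits only.
        if not self.telephone.isdigit():
            raise ValueError("telephone must contain digits only")


def is_valid(**fields):
    """Return True if the fields form a record that meets the conventions."""
    try:
        CustomerRecord(**fields)
        return True
    except ValueError:
        return False
```

Rejecting bad input at the point of entry is what keeps every row comparable later, which is why structured data is so easy to query.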

Despite its fixed nature, this data can be queried and used in lots of different ways, such as understanding how many units of a product you sold in a given month and how that compares to the same month last year. Or, in the case of customer records, if you pull all the names and email addresses of customers who bought a certain product (if you wanted to offer a discount on a related product, for example), then you’re already mining structured data very successfully.

Structured data is often managed using Structured Query Language (SQL) – a programming language originally created by IBM in the 1970s for managing and querying data in relational database management systems.
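The two queries described above (monthly units sold compared with the same month last year, and the contact details of everyone who bought a given product) can be sketched with Python’s built-in `sqlite3` module. The table layout, column names and sample rows here are invented purely for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables: customers and sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE sales (
        customer_id INTEGER REFERENCES customers(id),
        product TEXT,
        units INTEGER,
        sold_on TEXT
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Ana Lopez", "ana@example.com"),
    (2, "Ben Shaw", "ben@example.com"),
    (3, "Cara Osei", "cara@example.com"),
])
# sold_on holds ISO dates (YYYY-MM-DD) so SQLite's date functions can read them.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "barbecue", 2, "2023-06-10"),
    (2, "barbecue", 1, "2024-06-03"),
    (3, "barbecue", 4, "2024-06-21"),
    (3, "tent", 1, "2024-06-22"),
])


def units_sold(product, year_month):
    """Total units of a product sold in a month given as 'YYYY-MM'."""
    row = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales "
        "WHERE product = ? AND strftime('%Y-%m', sold_on) = ?",
        (product, year_month),
    ).fetchone()
    return row[0]


def buyers_of(product):
    """Names and emails of every customer who bought the product."""
    return conn.execute(
        "SELECT DISTINCT c.name, c.email FROM customers c "
        "JOIN sales s ON s.customer_id = c.id "
        "WHERE s.product = ? ORDER BY c.name",
        (product,),
    ).fetchall()


june_this_year = units_sold("barbecue", "2024-06")
june_last_year = units_sold("barbecue", "2023-06")
print(june_this_year, june_last_year)
print(buyers_of("barbecue"))
```

Because every row has the same named fields, each question becomes a short query rather than a trawl through paper records.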

SQL represented a huge leap forward over paper-based data storage and analysis, but not everything in business fits neatly into a predefined field.

At present, structured data provides most business insights. Despite this, it’s often considered old hat and a bit dull, especially in comparison to its rock star cousin, unstructured data. Like the Eagles sang, there’s a new kid in town – unstructured data is the Johnny-come-lately, and everybody loves him. Good old structured data can be overlooked as a result (I can’t help but picture an unloved spreadsheet in the corner of a party, miserably mainlining Twiglets and hoping someone will talk to it). But I think it’s a big mistake to ignore structured data as it still has plenty to offer businesses – particularly when it’s combined with unstructured data.

Digging into the pros and cons of structured data

Think about a library for a moment. A library has a database of all the books it holds with everything neatly categorised into fields such as book title, author name, ISBN (International Standard Book Number, the identification number given to a book by the publisher) and the library’s own identification number (pinpointing where the book can be found). The database will also record when the book is checked out and by whom. This system is very easy to use and maintain: When a new book arrives, its details can be easily added to the system; when a customer asks about a particular book, the librarian can look up the information without specialist analytical help. Nor does the system require specialist storage or a huge data warehouse, instead sitting happily on the library’s computers.

This is the beauty of structured data – anything that can be put into rows and columns is incredibly easy to categorise, store and access. Most of the time it can be used by anyone in the organisation without any in-depth technical knowledge.

The main advantages of structured data are:

It’s easy to input.
It’s easy to store.
It’s simple to analyse and mine for information.
It’s often the least expensive data option.

It’s not all rosy, though. The downsides of structured data are as follows:

It provides a rather limited view that rarely tells the whole story.
It’s only a small proportion of the total data available in the world.

The fact is, structured data is less rich than unstructured data (which I talk about in the upcoming section, ‘Coping With Messy Data: Unstructured or Semi-Structured Data’), and, used alone, it’s difficult (if not impossible) to get a really full picture of what’s going on. Often you need to use other data sources to get a better understanding. For example, structured transactional data will tell you that there was a 40 per cent increase in online sales in June, but it won’t tell you why that happened or how happy customers were with their purchases. To get a fuller picture, you’d need to combine that data with other information, such as a customer survey, demographic data, social media conversations and maybe even weather data.

And, while structured data is (for now) still the most commonly used, it represents just a fifth of all the data available in the world: 20 per cent of the world’s data is structured; everything else is unstructured. Therefore, if you use only structured data as the basis of your business decisions, you’re missing out on a lot of information!

Examples of structured data

If you know that structured data is all the data that can be easily categorised into rows and columns, you can build a picture of the many examples of structured data that the average business has access to. Examples include:

Customer data
Sales data and transactional records
Financial data, such as cash flow
Number of website visits
Any kind of machinery data points, such as temperature logs in a refrigerated storage unit or number of products manufactured on a production line

Here’s an example of the power of structured data: Walmart handles more than a million customer transactions each hour and imports those into databases estimated to contain more than 2.5 petabytes of data – that’s equivalent to 2,500 terabytes or 2.5 million gigabytes of data. (To put that in perspective, it’s estimated that all the content from US academic research libraries equals just two petabytes, which makes Walmart’s databases look pretty astonishing.) The company is able to combine this structured customer data (particularly who bought what, when) with a variety of sources including customers’ mobile phone location data, Walmart’s internal stock control records and external weather data to create tailored sales promotions. So, if you bought any BBQ-related goods from Walmart, happen to be within a three-mile radius of a Walmart store that has the BBQ cleaner in stock, and the weather is sunny, you might receive a voucher for money off the cleaner delivered to your smartphone!

Your own structured data may not be nearly as impressive as Walmart’s massive databases, but it can still provide an excellent starting point for gathering insights, especially if you combine that data with other sources to get a more detailed picture of what’s going on.
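To make the barbecue promotion described above concrete, here is a minimal sketch of how the four conditions might be combined into a single rule. It is only an illustration of mixing data sources; the names, fields and three-mile threshold as a parameter are assumptions, and it says nothing about how Walmart’s own systems are built.

```python
from dataclasses import dataclass

RADIUS_MILES = 3  # the three-mile radius from the example


@dataclass
class Shopper:
    """A hypothetical view of one customer, drawn from two data sources."""
    bought_bbq_goods: bool    # from transaction records (structured, internal)
    miles_from_store: float   # from mobile phone location data


def should_send_voucher(shopper, cleaner_in_stock, weather):
    """Apply every condition from the example; all of them must hold."""
    return (
        shopper.bought_bbq_goods
        and shopper.miles_from_store <= RADIUS_MILES
        and cleaner_in_stock            # from internal stock control records
        and weather == "sunny"          # from external weather data
    )


nearby_griller = Shopper(bought_bbq_goods=True, miles_from_store=1.2)
print(should_send_voucher(nearby_griller, cleaner_in_stock=True, weather="sunny"))
```

No single dataset can answer the question alone; the rule only works once transactions, location, stock and weather are brought together.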

Coping With Messy Data: Unstructured or Semi-Structured Data

Unstructured data is all the data you can’t easily store and index in traditional formats or databases. It represents all the data that can’t be so easily slotted into columns, rows and fields. It includes email conversations, social media posts, video content, photos, voice recordings, sounds and so on. It’s usually text heavy, but may also contain data such as dates, numbers and facts or different types of data such as images. These inconsistencies make it difficult to analyse using traditional computer programs.

Up until relatively recently, technology just didn’t have the grunt to store, never mind analyse, anything other than structured data. Everything that didn’t fit into databases or spreadsheets was usually either discarded or stored on paper or microfiche in filing cabinets or storage facilities. Now, thanks to massive increases in storage capabilities and the ability to tag and categorise such data, not to mention advances in analytical tools, you can finally make use of this data.

Semi-structured data is a cross between unstructured and structured data; it’s data that may have some structure that can be used for analysis but lacks the strict structure found in databases or spreadsheets. In semi-structured data, tags or other types of markers are used to identify certain elements within the data, but the data itself doesn’t have a rigid structure. For example, a Facebook post can be categorised by author, date, length and even sentiment, but the content is generally unstructured. Another example is a Microsoft Word document; as I draft this chapter, the document metadata details my name as the author and when it was created and amended, but the content of the document is still unstructured. It might be possible to automatically analyse the content of the document, but not using traditional analytical methods – I would need a specialist text analysis tool.
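A short sketch may help show what that mix looks like in practice. The post below is invented, and the field names are assumptions rather than the format of any real social network: the tagged fields behave like structured data, while the content remains free text.

```python
import json

# A hypothetical social media post: the tagged fields are structured,
# while the "content" field is free text with no fixed shape.
raw_post = json.dumps({
    "author": "sam_j",
    "posted_on": "2024-06-21",
    "content": "Loving the new barbecue, shame about the rain though!",
})

post = json.loads(raw_post)

# The tags can be read directly, just like fields in a database.
metadata = {key: value for key, value in post.items() if key != "content"}

# Simple derived tags, such as length, can be computed from the content...
metadata["length"] = len(post["content"])

# ...but the meaning of the text itself still needs specialist text analysis.
print(metadata)
```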

The process of turning unstructured data into semi-structured data used to be quite laborious. For example, in the case of a video of a cat playing with a ball of string, the video would have to be watched and heavily tagged according to certain terms: cat, cute, ball, funny and so on so that people searching for funny cat videos could find it easily. Now videos can be automatically categorised (no pun intended!) using algorithms; computer programs can watch the video, automatically detect what’s in it (sometimes even who is in it, thanks to face recognition software) and produce tags automatically.

Unstructured and semi-structured data are like the popular kids at school: Everyone is talking about them, and they represent the sexy new frontier in big data. There’s no denying that this type of data and the advances accompanying it are truly exciting for businesses. The trick is to not get swept up in the excitement and lose sight of the value that traditional structured data still holds.

Many companies are starting to use unstructured data analytics to complement their traditional data analysis in order to get richer and improved insights and make smarter decisions. I always advise clients that combining this messy and complex data with other more traditional structured data is where a lot of the value lies.

Understanding the pros and cons of unstructured or semi-structured data

It’s estimated that 80 per cent of business-relevant information originates in unstructured or semi-structured data. Think about that for a second: The overwhelming majority of data out there relevant to the average business is unstructured or semi-structured in nature. It massively outweighs structured data when it comes to sheer volume.

Not only is there more of it, but it tends to be richer in insights, too. So, while structured data – all those neat rows and columns of information – usually tells you the who, what and when, unstructured or semi-structured data can help you get to the heart of trickier insights, like why, what do they have in common or when will it happen again.

The advantages of unstructured and semi-structured data include:

There’s absolutely loads of it – significantly more than structured data.
It provides a richer picture than structured data.

Exciting though unstructured and semi-structured data is, there are some downsides (pretty big ones too):

It’s harder to store.
It’s much more complicated to analyse and, therefore, to extract insights from.
Because of both these factors, it’s usually more expensive to use than structured data.

Messy data like this is complex stuff to work with, usually requiring specially designed software and systems. For one thing, it tends to be bigger than structured data, meaning you need bigger and better storage. It’s also trickier to organise and mine for insights – not impossible at all, just harder to do. As a result, the costs can add up. It can also be easy to fall prey to mission creep, getting so excited by the possibilities of data that you lose sight of the value for your business. It’s therefore really important to come up with a clear and robust data strategy before you start delving into the data possibilities. Head to Chapter 10 for more information on creating a data strategy.

Examples of unstructured or semi-structured data

Thousands of examples of unstructured and semi-structured data are out there, but, broadly speaking, they fall into the following categories:

Photos and images
Videos, including CCTV
Audio conversations
Website text
Text – including emails, documents, blog posts, social media posts and so on.

I take a closer look at many of these in Chapter 5.

Brands are starting to mine these new types of data as part of their everyday marketing activities. An example of this comes from a friend of mine who runs conferences for a living. One of the conferences he ran was for a well-known electronics manufacturer. Just before the conference started, he shared a picture on Twitter of the main stage, ready for the first speaker. The picture included the brand’s sign and logo behind the stage, but he didn’t mention the company explicitly using a hashtag or their Twitter handle. The next week he kept seeing targeted ads online for that particular brand. The company knew he was talking about them because their analytical software searches for text and photos that are related to the company and their products. In this case it didn’t result in him buying a hoover, but you can see the very powerful potential for businesses looking to get closer to their customers.

These aren’t just whizzy marketing tools; the wider applications are enormous. US border control is experimenting with a new avatar system at its border control points. After getting off a plane, visitors are greeted by a virtual border agent who asks questions such as ‘Where have you arrived from?’, ‘What is your destination?’ and ‘How long are you staying for?’ Your answers are then cross-checked against information already in the system (for example, from the flight operator) to spot any inconsistencies. The system also monitors factors such as eye movement, pupil dilation, gestures, changes in your voice pattern and so on to see if you’re telling the truth.

Don’t fall into the trap of thinking tools like this are for big corporations only. Plenty of analytical tools are relatively inexpensive and easy to use. Twitrratr, for example, helps you monitor how people talk about your company on Twitter, separating out the positive and negative tweets that reference your brand or product.
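To give a feel for what separating positive from negative mentions involves, here is a deliberately naive sketch that scores a message by counting words from two small lists. This is a toy illustration under stated assumptions (the word lists and the brand name are made up) and is not a description of how Twitrratr or any commercial tool works; real tools use far more sophisticated text analysis.

```python
# Tiny, hand-picked word lists; a real tool would use much richer models.
POSITIVE = {"love", "great", "brilliant", "happy"}
NEGATIVE = {"hate", "awful", "broken", "disappointed"}


def classify(message):
    """Label a message positive, negative or neutral by counting keywords."""
    # Lower-case each word and trim common punctuation from its edges.
    words = {word.strip(".,!?").lower() for word in message.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


mentions = [
    "I love my new Acme kettle!",
    "Acme kettle arrived broken, really disappointed.",
    "Just bought an Acme kettle",
]
for mention in mentions:
    print(classify(mention), "-", mention)
```

Even this crude approach turns unstructured text into a tag you can count, which is the first step towards treating it like structured data.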

Discovering the Data You Already Have (Internal Data)

Internal data accounts for everything your business currently has or could access. This could be structured in format (for example, a customer database) or it could be unstructured (for example, conversational data from customer service calls). Internal data is your private or proprietary data that is collected and owned by your business – crucially, you control access to the data; no one else does.

Internal data includes data that you already have, plus any data that you don’t yet have but have the potential to collect. For example, you could run a customer survey in order to gather information about customer habits and what they think of your product. This would be internal data, owned and controlled by your company.

Again, like structured data, internal data isn’t considered very sexy or innovative and a lot of businesses run giddily towards external data that they currently don’t have. But I think that’s a big mistake.

There’s real value in your internal data because it’s naturally tailored to your business or industry. Sure, you may need to look at some external data alongside it (in fact, I’d encourage you to) but never overlook internal data altogether.

Weighing up the pros and cons of internal data

The main advantages of internal data are:

There are no access or ownership issues to contend with.
It’s cheap to use (maybe even free).

Because you own your internal data, you’re never at the whim of a third party that can stop supplying it any time or charge the earth for access. You can use the data without problems whenever you need to. For really business-critical information, access and ownership are key issues and not something to take lightly.

Another feather in the cap of internal data is the price – it’s usually cheaper to work with than buying access to external data. That’s not always the case (say, if you had lots of information held on microfiche, the costs of digitising that data could be pretty hefty) but it’s generally true. For this reason, internal data is a good place to start when you’re weighing up your data possibilities.

The downsides of internal data are:

It may not provide a full enough picture to achieve your strategic goals (although it might!).
You have to maintain and look after this data.
You are legally obliged to make sure personal data is secure.

It costs money to properly maintain and secure data, keeping it up-to-date and protected from criminals. This is particularly true of personal data, which is a big legal issue these days. And of course laws do change, so you’ll need to keep abreast of your changing legal obligations. Therefore, while using internal data is generally cheaper than buying in external data, you do need to factor in costs for maintenance and protecting that data in a responsible way. There’s more on data governance in Chapter 9.

Examples of internal data

Examples of internal data include:

Customer survey data
Conversational data from calls to your customer service team
Employee satisfaction survey data
HR data
Sales data
Financial data, such as cash flow and profit/loss statements
CCTV (closed-circuit television) video data
Customer record data
Internal documents
Website data, such as number of visitors, or how customers move around the site
Stock control data
Sensor data from company machinery or vehicles

This is by no means an exhaustive list, and there are many, many more examples of internal data. Some of these will not be relevant to your business, and some factors not listed here may be absolutely critical to your industry (imagine, for example, the data gathered by a company that calibrates engineering equipment).

You may think big data bras are a figment of the imagination, but you’d be wrong. Online retailer True&Co is using data to help women find better fitting bras. Statistics show that most women wear the wrong bra size, and so the website has stepped up to try to solve that problem. Customers fill out a fit questionnaire on the site and, based on the responses, an algorithm suggests a selection of bras to choose from. The company also uses customer feedback and the data it collects to influence the design and development of its own in-house brand of bras. This is a brilliant example of a company generating and mining its own data to gather insights that improve its product, increase sales and create happy customers.

With all this internal data within touching distance, how do you know where to start? Are some forms of internal data better than others? In short, no (sorry!). There’s no pecking order as each business has different needs. Instead, to work out which internal data is most useful to you, you’ll need to work out what it is you’re trying to achieve in your business. Once you know your goals, you can work out which data can help you get there. There’s more on this in Part 4.

Accessing the Data That Is Out There (External Data)

External data is the infinite array of information that exists outside your business. This can be publicly available or privately held. It can also be structured or unstructured in format.

Public data is data that anyone can obtain – some of this might be available for free (for example, from government websites), and some you might need to pay for. Private data is data owned by a third party and that isn’t available for public consumption – you usually have to source and pay for this data from another business or third-party data supplier.

There are plenty of ready-made datasets, both public and private, that are available to suit a range of needs (census data being a good example). Sometimes, though, what you need isn’t available as an off-the-shelf solution. In this case, you can pay a third-party provider to gather the data for you.

Delving into the pros and cons of external data

The pros of external data are:

It offers access to information much wider and richer than anything you could create yourself.
It’s often fresher and more up-to-date than anything you could replicate in-house.
Someone else is responsible for storing and maintaining that data and keeping it up-to-date.
You don’t have to worry about security and data protection issues.

Companies like Walmart, Amazon and Facebook have the ability to generate (and manage) huge amounts of data. And that’s great for them but, generally speaking, it’s way beyond the capability of the average business. External data sources, however, give any business the capability to access and mine big data – without many of the hassles that come with storing and managing that data on a day-to-day basis.

For a small business, it’s often nice not to have the burden of maintaining data and worrying about keeping it secure. By buying (or accessing free) data from an external provider, the provider bears that responsibility so you don’t have to.

And the cons of using external data are:

You don’t own the data.
You may have to pay for access.

The major disadvantage of not owning the data you use is that you’re reliant on an external source. There’s a risk that the provider could stop supplying the data or put their costs up significantly. This is an especially big risk if you’ve become reliant on that data for key business functions, such as marketing activity.

Ultimately, the risks and the costs of accessing external data need to be weighed up against the risks and costs of not using that data. Would you have to go to the trouble of creating it yourself? Would your business suffer if you didn’t use that data? Would it stop you meeting your strategic goals? You may find that, overall, the benefits far outweigh the risks.

Examples of external data

Examples of external data include:

Weather data
Government data, such as census data
Economic data
Social media profile data
Social media text and activities, such as tweets, likes and shares
Google Trends or Google Maps data

A beautiful example of using external data comes from a website called FallingFruit.org. The site aims to remind urbanites that agriculture and natural foods do exist in the city. The site combined public information from the US Department of Agriculture, municipal tree inventories, foraging maps and street tree databases to create an interactive map of trees. City folk can use the site to see which fruit trees might be dropping fruit right now in their own neighbourhoods.

When it comes to external data, it’s hard to know what is actually available and where to look. As a starting point, I recommend checking out Chapter 15, which lists my top ten free data sources. I also strongly recommend working with a good data consultant. A consultant can help you home in on the best external datasets and providers to suit your business needs. This doesn’t need to cost the earth and it’s absolutely money well spent, saving a lot of time and potentially wasted resources.

What Type of Data is Best for Me?

A lot of the big data hype focuses on unstructured data and the allure and promise of external data, often at the expense or dismissal of internal or structured data.

The truth is, no type of data is really ‘better’ than any other type. Unstructured data isn’t necessarily better or more valuable than structured. What’s best for one business may not be best for yours. The key is to start with a strong data strategy (see Chapter 10), establish your key strategic questions (see Chapter 11), and let those guide you to the best data for you, whether it’s structured, unstructured or a combination.

The same goes for internal versus external data – there’s no rule as to what’s right or wrong. There will always be internal data that’s easier to collect and analyse but that may not provide you with everything you need to grow your business.

A taxi company can collect data on where its drivers are around the city. This is easy to do and it’s really useful information, so it makes perfect sense to collect it and use it. But, if you’re the owner of the company, knowing where drivers are isn’t the whole picture. You may also want to know where most people want to be picked up from when it’s raining so you can position your drivers accordingly. For this, you’ll need to combine your own information on pickups with weather data. You could create your own weather data that you control but that would probably be a waste of resources because weather data is so easy to access externally for free (see Chapter 15 for the best free sources). This external weather data is also pretty low risk to use, since it’s government data – the UK and US governments have made strong commitments to data transparency so it’s highly unlikely that they will withdraw access or start charging for it. In this example, there’s a really strong case for making use of this external data.
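The taxi example can be sketched in a few lines: internal pickup records are matched to external weather data by date, and pickups on rainy days are counted by zone. All of the records, zone names and weather values below are invented for illustration.

```python
from collections import Counter

# Internal data: hypothetical pickup records as (date, pickup zone).
pickups = [
    ("2024-06-01", "Station"),
    ("2024-06-01", "Station"),
    ("2024-06-01", "High Street"),
    ("2024-06-02", "Airport"),
    ("2024-06-02", "High Street"),
    ("2024-06-03", "Station"),
]

# External data: hypothetical daily weather, keyed by date.
weather_by_date = {
    "2024-06-01": "rain",
    "2024-06-02": "sunny",
    "2024-06-03": "rain",
}


def pickups_by_zone(condition):
    """Count pickups per zone on days with the given weather condition."""
    return Counter(
        zone
        for date, zone in pickups
        if weather_by_date.get(date) == condition
    )


rainy = pickups_by_zone("rain")
print(rainy.most_common())
```

The shared date field is what lets the two datasets be combined; without a common key like this, internal and external data cannot easily be linked.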

The waters get a little murkier when you start looking at purchasing external data from a commercial provider. There are big advantages to buying in external data versus creating your own, since the provider bears all the burden of securing and maintaining that data, saving you valuable time and resources. But you need to consider the risk of price hikes or the withdrawal of that data, particularly if the data is absolutely critical to your business. Some companies prefer to create their own data, for example by collecting more detailed personal information on customers. But the risks then become about maintaining and securing that data, rather than ensuring access.

Ultimately, whatever type or combination of data you choose, there will always be some risks and pros and cons. You need to weigh up what’s right for your business.

Finally, a word on not putting all your eggs in one data basket. I always advise my clients that a combination of datasets is better than relying on one for business decisions; otherwise you can end up with a very limited picture. As a rule, two datasets are better than one … but three is ideal. With three different datasets, you can create a much richer picture by looking at the data from different perspectives. This means you can verify insights rather than plunging ahead with decisions based on false assumptions.

With this in mind, the ideal data for you may actually be a combination of different types of data. To meet your strategic goals (which I talk about in Chapter 10), you may well need some structured internal data (like sales data), plus some structured external data (for example, demographic data), alongside some unstructured internal data (such as customer feedback) and unstructured external data (for example, social media analysis). Really smart businesses, the ones that will thrive in the future, are those that combine data to get the most useful insights for them.
