5

Big Data Privacy, Ethics, and Security

With every new disruptive technology comes new issues. With Big Data, these issues are privacy, ethics, and security. As more and more data is collected, stored, and analyzed by companies, organizations, consumers, and governments, regulations are needed to address these concerns. Big Data brings big responsibility.

In a TED talk at TEDGlobal in 2012, Malte Spitz stated that if the Stasi had known as much about people's activities as our governments know today, the Berlin Wall might never have come down.1 With the information now available, governments can find the mavens and leaders within society, and with that information a society can be controlled.

The PRISM leak by Edward Snowden in the summer of 2013 showed that privacy in the Big Data world is indeed an endangered species. In the United States, the National Security Agency (NSA) used PRISM as a source for raw intelligence in its analytics reports. The NSA had direct access to information on the servers of many sites, including those of Google, Apple, and Facebook. In addition, Snowden showed that the U.S. government had access to millions of private text messages from Chinese citizens, that British Intelligence spied on foreign delegates during the G20 summit in the United Kingdom in 2009, and that British Intelligence had secretly gained access to the fiber-optic cables that connect Europe with the United States.2,3,4 On a daily basis, this adds up to 39,000 terabytes of data.5 These scandals woke up the public, and privacy suddenly had a whole new meaning.

No matter how many advantages Big Data brings to organizations and governments, businesses cannot overlook the privacy factor involved in collecting all that information. They will have to map their Big Data privacy needs in time and ensure that they are addressed before any problems arise. In terms of privacy, the problem is not so much the technology or the possibilities it offers, but rather the vast amounts of (anonymized) data. By nature, Big Data is not privacy friendly: even anonymized data can be used to reidentify individuals, as long as enough data is available.

Traditional privacy regulations will prove insufficient for protecting the consumer in the Big Data era. New policies related to privacy, security, intellectual property, and data ownership will need to be developed to meet the changing needs of businesses that are developing a Big Data strategy. Companies and governments will have to develop such Big Data policies to protect consumers and ensure organizations are not vulnerable to serious data breaches.

Closely linked to privacy is the ethical side of Big Data. Everyone creates data using a plethora of different devices, products, or applications. Who owns your data and what is done with that data are important areas of discussion. Most of the time, consumers are not aware of what organizations do with their data, and this can have disturbing consequences for both consumers and companies. To ensure that consumers understand what is done with the data and what they can expect, I will propose four ethical guidelines for organizations to adopt.

Finally, security issues are another important aspect of Big Data. Big Data can keep a country safe, as the U.S. government claimed regarding the PRISM program, or prevent customers or employees from performing fraudulent actions, but Big Data can also, in and of itself, be a security threat.6 When massive amounts of data are created, stored, and analyzed, criminals will want to obtain that data illegally for various reasons. In 2013, many large organizations experienced distributed denial-of-service (DDoS) attacks and went offline for some time. Even more serious, some had their customer data stolen. Organizations will have to implement the necessary security measures and have a crisis plan ready in case they do get hacked.

BIG DATA PRIVACY

The 2006 Data Retention Directive of the European Union states that all telecom and Internet service organizations need to store data about their customers for a minimum of six months and a maximum of two years.7 This includes who calls whom, who texts whom, who sends whom an email, which websites are visited, which apps are used, and where you are. They know where you sleep. They know everything about you.

So, will Big Data mean the end of privacy? In an interview in 2009, then Google CEO Eric Schmidt said: “If you have something that you don't want anyone to know, maybe you shouldn't be doing it in the first place.”8 He meant that in the future, privacy as we know it might cease to exist. Already, Big Data is causing some serious privacy issues that are important to address. I want to focus on how Big Data affects the privacy of consumers and how organizations have to deal with these issues to survive. How governments can and will deal with our privacy is beyond the scope of this book.

Big Data is everywhere and, as more products incorporate sensors, even more data will be collected. Thanks to the quantified-self movement, many (free) apps are already collecting a lot of information about users. It is estimated that 60 percent of Americans track their weight, diet, or exercise routine.9 However, in the world of Big Data, nothing is “for free.” Individuals who use free products or services pay by giving out data about themselves; for many consumers, it is unclear who owns that data.

Many services consumers use today began as free and innocent. Consumers did not see any harm in using services like Google or Facebook or other online applications that appeared in the last decade. Consumers are so used to the idea that everything online is “for free,” that they are unwilling to pay for these services. They are so addicted to these services that they cannot relinquish them despite potential security issues.

In the past decade, however, organizations have slowly but surely moved to storing and using more and more data to profile consumers, while constantly adding additional “free” services. So, organizations that consumers thought were connecting them to their friends and providing information as a free utility are now targeting them with highly personalized advertisements using data collected about them from these very services.

The full effect on privacy and ethical issues is still unclear, but consumers are becoming aware of the problem. The result is a movement in which users prefer to pay for a service with money instead of data. A great example is App.net, where users pay a monthly fee to use the service without advertisements.10

What can, and do, organizations do with all that data? Each field filled out, each click, each piece of information on how often consumers use a product or service, when they use it, or how they use it gets translated into data-driven product and organization improvements or is used to serve up increasingly targeted advertisements. As services become more expensive to build and maintain, the advertising grows. Investors expect a return on their investments, especially in the case of public companies. A good example is Facebook, where the advertising space in the news feed is becoming larger and larger.11 As long as users do not feel that this is an intrusion into their private space, Facebook can continue to show more advertising. Facebook's Graph Search uses even more personal information from its users, and Facebook buys information from third parties to improve its targeted advertising.12,13

It is fine for consumers to pay for a service with data. It is a valid business model that has been used by many organizations for decades. As consumers become more aware of it and are able to protest against it, organizations need to educate customers about how the data is used. Consumers have to become more careful about their online Big Data footprints. They need to understand the costs of these “free” services.

Consumers should actually start to think of each data point as an economic transaction between the user and the service provider, and organizations will have to be transparent about this. Consumers, in turn, need to pay more careful attention to the terms and conditions and privacy statements of each website, and should check their privacy settings on social networks on a regular basis.

Fortunately, an increasing number of websites are advising users about how to adjust privacy settings to appropriate levels.14,15 On the other hand, many organizations still make it difficult for consumers to understand what happens with their data. Some companies may think that having a privacy policy is sufficient, but many consumers do not understand what a privacy policy actually means, if they read it at all. Organizations should not only inform customers of their rights, but should also protect these digital immigrants from signing a privacy policy without being aware of its meaning or accepting its terms.

Therefore, organizations should be very clear about what data is collected and what they do with it. They should educate their customers in a clear and concise manner. Who reads a privacy statement of 11,000 words that requires about 45 minutes to read?16 Even the average privacy policy, which is only 2,500 words long, is almost never read.17 Companies need to be transparent about what data they collect, why they collect it, and what they do with it, so that users can decide whether they want to use the service or the product.

In addition, companies could give users the opportunity to use a service without any data being collected or stored. In that case, the user would pay for the service with money instead of data. This is a validated approach, as shown by the success of App.net, which had over 100,000 paying users in 2013.18 Give consumers a choice, and they will choose your organization.

Reidentification of Anonymous People

Another threat that Big Data poses for consumers is the reidentification of individuals from anonymous data. This could drive people away from your organization if not addressed correctly. To reidentify individuals in large datasets, all you need is a laptop, Wi-Fi, and various datasets that can be linked to each other. With these tools, anyone can start digging for personally identifiable information (PII) hidden in the dataset. It sounds simple; in practice it is rather difficult, but unfortunately not impossible, as research from the Whitehead Institute showed19 when it reidentified 50 individuals who had submitted personal DNA information to genomic studies such as the 1000 Genomes Project.20

The researchers of the Whitehead Institute noted that both surnames and the Y chromosome are passed on from father to son, so they started analyzing public databases that housed Y-STR data and surnames.21 They linked public datasets to the dataset collected by the Center for the Study of Human Polymorphisms (CEPH) to identify 50 men and women from data that had been deidentified.22 With more and more public datasets becoming available, could the reidentification of individuals pose a real threat to the use of Big Data and open datasets? What does it do to your organization if anonymous persons are reidentified using datasets from your organization?

Reidentification of individuals could lead to privacy issues because information that should not have been released can become publicly available. The reidentification of Massachusetts Governor William Weld, who had collapsed on stage while receiving an honorary doctorate from Bentley College, caused a stir.23 Using a dataset released by the Massachusetts Group Insurance Commission to improve healthcare and control costs, MIT graduate student Latanya Sweeney was able to reidentify Weld with some simple tactics and a voter list.24,25 Eventually this study led to the development of the deidentification provisions in the American Health Insurance Portability and Accountability Act (HIPAA).26
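The mechanics of such a linkage attack can be sketched in a few lines. The data below is entirely fabricated; the point is only that a handful of quasi-identifiers (ZIP code, birth date, sex), each harmless on its own, can uniquely match a “deidentified” record to a named public record:

```python
# Fabricated example of a linkage attack: quasi-identifiers shared between
# a "deidentified" dataset and a public record can uniquely identify a person.

deidentified_medical = [
    {"zip": "02138", "dob": "1950-01-15", "sex": "M", "diagnosis": "collapse"},
    {"zip": "02139", "dob": "1972-03-02", "sex": "F", "diagnosis": "flu"},
]

public_voter_list = [
    {"name": "A. Smith", "zip": "02138", "dob": "1950-01-15", "sex": "M"},
    {"name": "J. Doe",   "zip": "02139", "dob": "1980-11-15", "sex": "F"},
]

def link(records, voters, keys=("zip", "dob", "sex")):
    """Reidentify every record whose quasi-identifiers match exactly one voter."""
    reidentified = []
    for rec in records:
        hits = [v for v in voters if all(v[k] == rec[k] for k in keys)]
        if len(hits) == 1:  # a unique match ties the record to a name
            reidentified.append({"name": hits[0]["name"], **rec})
    return reidentified

print(link(deidentified_medical, public_voter_list))
```

Only the first record is reidentified; the second matches no voter on all three keys. Sweeney's actual attack worked on the same principle, at the scale of an entire state.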

Reidentification of individuals can have serious consequences if, for example, private health information is recovered that could lead to discrimination, embarrassment, or even identity theft. Or, imagine how medical records could influence a child custody battle. That is why HIPAA specifies 18 types of identifiers that must be removed prior to data release. Unfortunately, that does not stop people from trying to reidentify individuals in large datasets.
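In code, this kind of deidentification amounts to stripping the identifier fields before release. A minimal sketch, with hypothetical field names standing in for a few of the 18 categories (the full HIPAA list also covers geographic subdivisions, dates, device identifiers, and more):

```python
# Hypothetical field names; HIPAA lists 18 identifier categories,
# of which only a few are represented here.
IDENTIFIER_FIELDS = {"name", "address", "phone", "email", "ssn", "dob"}

def deidentify(record):
    """Drop every direct identifier field before the record is released."""
    return {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}

patient = {"name": "A. Smith", "ssn": "000-00-0000", "dob": "1970-01-01",
           "zip": "02138", "diagnosis": "flu"}
print(deidentify(patient))  # {'zip': '02138', 'diagnosis': 'flu'}
```

Note that the released record still contains a ZIP code, exactly the kind of quasi-identifier a linkage attack exploits; stripping direct identifiers is necessary but, as this section argues, not sufficient.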

Another well-known example is the reidentification of a dataset from Netflix done by Arvind Narayanan.27 The study used public datasets released as part of a contest that Netflix organized to improve its movie recommendation engine.28 Narayanan and his team were able to reidentify people in the anonymized database. The study led to a privacy lawsuit against Netflix, which subsequently canceled a second contest in 2010.29 There are more examples of researchers reidentifying individuals in large datasets.30 As long as it is done by researchers with good intentions, the harm seems limited. Imagine, however, what happens if hackers with bad intentions start doing the same thing. It could be very harmful for your customers and catastrophic for your organization.

Organizations would do well to perform a threat analysis on a dataset before releasing it to the public; that means checking for datasets available online that can be used to reidentify the people included. However, as Narayanan explains, this is not a 100 percent secure solution, as future datasets could still compromise anonymity.31 To solve this problem in situations such as the Netflix contest, Narayanan proposed two rules: (1) use a small, fabricated dataset in the first round, so that contenders can develop their code and algorithms without exposing real data, and (2) have the finalists sign a nondisclosure agreement (NDA) before releasing the full dataset to them.

With all this said, however, how likely is it, and how much effort does it take, to reidentify individuals in these datasets? Dr. Latanya Sweeney reported in 2007 that 0.04 percent (4 in 10,000) of individuals in the United States who appear in datasets that have been anonymized according to HIPAA standards can be reidentified.32 To put that in perspective, this risk is of the same order as the lifetime odds of being struck by lightning (1 in 6,250).33
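As a quick sanity check on those two figures (the numbers are simply the ones quoted above), both risks are tiny and within a small factor of each other:

```python
# Comparing the two quoted probabilities.
reidentification_risk = 4 / 10_000  # 0.04% under HIPAA-compliant anonymization
lightning_risk = 1 / 6_250          # quoted lifetime odds of a lightning strike

print(f"{reidentification_risk:.4%}")                     # 0.0400%
print(f"{lightning_risk:.4%}")                            # 0.0160%
print(round(reidentification_risk / lightning_risk, 2))   # 2.5
```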

So, perhaps consumers should not worry too much about reidentification, as long as the necessary precautions defined by HIPAA are taken. Perhaps consumers should see it as a risk that is part of life. If we do not accept this risk, should we abandon the use of public datasets completely? As Daniel Barth-Jones (an epidemiologist and statistician at Columbia University) explains, however: “If we stop using and analyzing deidentified data, important social, commercial, and educational benefits, as well as innovation opportunities, might be lost.”34,35

Apart from the small risk of being reidentified, it is also rather difficult to determine the characteristics of individuals in public datasets. As Barth-Jones wrote in a 2011 study, “each attack must be customized to the particular deidentified database and to the population as it existed at the time of data-collection.”36 In addition, Paul Ohm, associate professor of law at Colorado Law School, assures us that trustworthy reidentification is labor-intensive.37,38 It is time consuming, requires serious data management and statistics skills, and simply lacks the easy transmission and transferability seen in computer viruses.

Of course, this does not mean that organizations should stop paying serious attention to reidentification risks. Technology is always improving, including techniques for reidentification. As consumers leave more data traces online, it will become easier to reidentify individuals if countermeasures are not taken. Measures such as the European regulators' 2012 decision forcing Facebook to shut down its facial recognition feature will be necessary.39 There will always be companies trying to push against privacy regulations, and hackers will always do their best to find information they can use. Therefore, organizations should constantly reassess and strengthen their deidentification and reidentification-management techniques to ensure that public datasets can also be used in the future to drive innovation and develop great services for the public.

Why Big Data Privacy Will Be Self-Regulating

Technological progress has always led to heated discussions about the threats it poses to society.40 When the printing press was invented in the fifteenth century, monks viewed easily available books that were unapproved by the church as a threat to their control over learning. The esteemed Swiss scientist Conrad Gessner feared that the information overload caused by the printing press could confuse people and prove harmful to them.41 Comparable worries were expressed when newspapers became more available in the eighteenth century.42 The French statesman Malesherbes feared that newspapers would isolate readers.43 When education became more generally accessible in the nineteenth century, some saw it as a risk to mental health.44 Similarly, critics thought radio would distract children, while television was expected to hurt radio, conversation, reading, and the patterns of family living.45

Since the appearance of the Internet, we have heard more such doubts: Email would damage our intelligence, Twitter could harm our moral values, Facebook might even increase the risk of cancer, and Google would make us stupid.46,47,48 Now, the era of Big Data brings fears that we will lose our privacy.

Although Big Data technology unquestionably makes it possible to follow our every move at any time and place, that does not mean organizations can do whatever they want with the data. We know that Gen Xers and baby boomers are very careful about their privacy. Gen Y is also conscious of privacy; “just because they want to be in public, does not mean they want to be public,” as danah boyd put it in her talk at the TechKnowledge conference in 2013.49,50

So, how can the privacy of consumers be protected? Well, three different groups are needed to achieve this goal: governments, organizations, and consumers. Let's first look at governments. In most countries, existing privacy laws date from the 1970s and 1980s, when the World Wide Web did not exist, and we were still using landlines to call each other.51 These outdated privacy laws contain many ambiguities, so governments around the world are in the process of updating them.52 This is a positive step, but unfortunately writing new laws takes time and, more often than not, they do not achieve their goals. A good example is the cookie law in The Netherlands, which completely missed its goal and had to be amended within one year.53 The original law prohibited companies from placing cookies on a visitor's computer unless the visitor had explicitly agreed to it. Many companies did not adhere to it, and consumers got tired of the number of additional clicks needed to enter a website. Understandably, governments cannot keep up with the speed at which technology changes. As such, laws are outdated the moment they are passed. Still, we also cannot, and should not, stop technological progress to match the speed of law making. In addition, PRISM has shown us that governments themselves do not protect consumer privacy very strictly.

So, we cannot rely on governments, but fortunately two other options are available for regulating Big Data privacy. Organizations cannot survive without consumers, but consumers can survive without organizations. People are becoming increasingly creative and independent of organizations, as shown by the website instructables.com, which enables users to create their own products.54 In addition, the developing 3D-printing market will eventually enable consumers to print whatever they want in their living rooms. Therefore, organizations will have to observe guidelines that ensure the privacy of consumers, or consumers will simply walk away.

An organization that decides not to play by the rules and respect the privacy of its customers could go bankrupt if things go wrong. The power of consumers increased with the growth of social media networks because a protest can be so easily organized. Within days, a large group of consumers can become connected and decide to boycott any organization that does not abide by the rules. Switching costs are low nowadays, so consumers can simply change to companies that better protect their privacy. If no alternative is available, there will always (eventually) be an entrepreneur who will fill that gap.

The real problem is organizations that misuse collected data and are not caught, because they can continue to invade the privacy of their customers. This could result in major catastrophes that cause damage to the lives of consumers. However, the moment a catastrophe occurs and an organization does not respond correctly, it is likely that the organization will be out of business rather soon. In the end, it is up to the consumers to control organizations and demand that they stick to ethical guidelines.

New technologies are always the result of trial and error, so we will unfortunately face a few catastrophes in the future. The challenge is to limit these as much as possible. Consumers and organizations should ensure this, and governments should assist with regulations when needed. In the end, however, consumers will decide how society looks and which organizations are allowed to participate. The first organization that goes out of business due to privacy violations will serve as a warning for others. It will have a self-regulating effect on other companies. So, if an organization wants to survive in the future, it will have to adopt some ethical Big Data guidelines.

BIG DATA ETHICS

Big Data enables a company to check, control, and know everything. But to know everything entails an obligation to act on behalf of, and to protect, the customer. Such an obligation requires that organizations do everything possible to protect (sensitive) datasets and be open and clear about what is done with that data. And, although it is possible to know everything, not every person within an organization is entitled to access sensitive information. In The Netherlands, it became clear that sensitive electronic health records could be accessed by anyone in a hospital: an administrative clerk could check what his or her neighbor was doing in the hospital, and an intern could look up why a fellow student was treated in a psychiatric institution.55 Such privacy breaches can be prevented if the right ethics are in place within an organization.

Big Data Ownership

With the digital universe expanding so rapidly, such issues become even more apparent. We need to know how the data is protected, as well as who owns it. Everyone is responsible for this growth; every day, consumers like, tweet, comment, share, blog, and publish information on the web. They do this via smartphones, tablets, laptops, and computers. Often, they are unaware that the information is shared.

Many people use apps that automatically share data on social networks, such as data on exercise, sleeping, and diet.56,57,58 However, not many people ask who owns all that data. The same goes for everything shared on Facebook, Twitter, Foursquare, or any other social network, or in e-mails using Google or Live, or in personal documents in the cloud.

The moment something is placed online, be it in the cloud or on social networks, it is copied, re-tweeted, cached, backed up, and almost impossible to remove again. It will remain there forever and, over time, the ownership of that data fades. And, although Google gives users the option of deleting data, it still owns that data or, as is the case with Facebook, might not really delete the data at all.59

Big Data is often described as the oil of the future. If that is the case, we can make a comparison with the “old oil.” Most countries with large oil deposits have earned a lot of money by bringing that oil to the surface and selling it to the world. They own the oil that is within their boundaries. The same might be said for data collected by large companies, such as Google, Facebook, or Twitter, that store data in their own data warehouses and, thereby, own that data. Consumers may forget that they gave all that data to these companies voluntarily. After all, nothing in this world is free, and users pay for these “free” tools with their data.

Do consumers nowadays know exactly who accumulates what data? Do they know what it is used for? Are they told explicitly when this data is sold and to whom? Do they know who has access to all this data? Although it might be stated in a company's terms and conditions, these are hardly ever read.60 So, consumers have no clue about what happens with their data. As a result, international laws, guidelines, and awareness campaigns are necessary to protect consumers.

The Data Portability Project embodies this approach.61 Its objective is to give consumers the ability to reuse their data across interoperable applications, while controlling their privacy and respecting that of others. The project aims to make consumers aware of what happens to their data and to direct them to organizations that respect data rights and privacy.

Perhaps, the big question is not so much who owns the data, but who has the capacity to analyze, visualize, and resell it. Raw data is, after all, not very usable. The ability to turn data into information and knowledge is the crux. It is all about who can put data to work.

Much is still unclear in the world of data and information ownership. Unless international standards and laws resolve this, organizations have to ensure that consumers know what to expect. Clearly, we need some ethical guidelines.

Ethical Guidelines

The possibilities of Big Data are enormous, but a company that moves into the Big Data era carelessly will face massive ethical and privacy issues. Organizations should be master of the technology, not the other way around. The goal is to develop better ways to use this unprecedented computing power to our advantage without intruding on the privacy of others or violating ethical standards.

Ethics, however, is not the only concern that needs to be discussed and resolved. According to Kord Davis, a former analyst at Cap Gemini and author of Ethics of Big Data, it is important that we also understand and agree on rules regarding privacy, identity, ownership, and reliability of Big Data.62,63 Davis believes that it will be a long and evolutionary process of trial and error. Companies will push the boundaries, and governments will also go too far (think PRISM).

In addition, each government will create its own laws regarding Big Data; some will be stricter than others. These differing national regulations and privacy laws will become an expensive hassle for organizations.64 Some countries may impose no restrictions, while others will impose strict ones.

The best solution would be a broad-based, global set of privacy and ethical Big Data guidelines, but this will be a difficult and long process. So, together, organizations will have to learn the limits and understand how much privacy consumers want to keep and how much they want to give up in exchange for the free stuff.

Therefore, I am proposing the following four Big Data ethics guidelines:

  • Radical Transparency
  • Simplicity by Design
  • Preparation and Security
  • Privacy as Part of the DNA

Radical Transparency

Organizations should tell their customers in real-time what sort of data is being collected and stored, and what they will do with it. Consumers want to be kept informed and to have at least the feeling of being in control. Always allow customers to delete data if it is not stored anonymously or if they simply want to remove it. If you want to offer a free service, be honest and transparent about it, so that your (potential) customers know what they can expect and what happens when they use the service. If possible, also create a paid version that does not collect any data but still provides access to the service. It could even mean an additional revenue stream for your organization.

Simplicity by Design

Customers should be able to simply adjust privacy settings, so they can determine what they want to share, when they want to share it, and with whom. This process should be simple and understandable, even for digital immigrants.65 Do not hide the information about how to change privacy settings; instead, guide consumers on how to adjust them to their liking instead of yours. Privacy regulations should be simple, straightforward, and understandable. A good example of how not to do it is Facebook, which changes its privacy policy every few weeks or months. Although any setting can be adjusted, it is not always easy to find your way around. In addition, in 2013, Facebook's privacy policy contained more words (5,830) than the U.S. Constitution (4,543, not counting the amendments).66

Figure 5-1 Privacy and Ethics Framework


Preparation and Security

As more data is collected and stored, your organization becomes more valuable to criminals who want a share of it for illegal purposes. Organizations need to develop a crisis strategy in case the company gets hacked and data is stolen, which seems to be happening quite regularly lately. Just look at how Facebook, Evernote, and LinkedIn were hacked in 2013.67,68,69 Or, even better, test your data scientists and IT personnel with fake hacks, as explained in the next section.

Privacy as Part of the DNA

When your organization embraces transparency, simplicity, and security, your customers will embrace you. Ignore these principles, and your customers will eventually ignore you. It is a simple fact. So hire a Chief Privacy Officer or a Chief Data Officer who is responsible for data privacy and ethics. Make this officer accountable for whatever data you collect, store, share, sell, or analyze, as well as for how you collect it. Big Data privacy and ethics are too important not to be discussed at the C-level.

Proper usage of Big Data strategies, including combining and analyzing the correct datasets and using them in decision making, will help grow your organization. Doing it the correct way will help you sustain that growth for the long term. Therefore, when starting to develop a Big Data strategy, devote a large part of your time and energy to these four principles, and it will pay off in the end.

If we review all the different aspects of ethics discussed, we see that they are all connected to each other. If you want to take Big Data privacy and ethics seriously, they cannot exist independently (see Figure 5-1).

BIG DATA SECURITY

In the book SuperFreakonomics, Steven Levitt and Stephen Dubner state that if suicide bombers want to go undetected, they should buy life insurance.70,71 As the authors guide readers through terrorism profiling, they report that an absence of life insurance is a predictor of terrorism; smart terrorists will therefore buy a policy to help avoid discovery. British intelligence uses such criteria, some known and others unknown, to identify possible terrorists.72 This kind of analysis of numerous dissimilar criteria across the entire population can only be done with Big Data technologies and algorithms. Thus, thanks to Big Data, countries are more secure. In addition, the U.S. government reportedly uses Big Data to analyze the online behavior of millions of Americans and non-Americans. According to officials, data from PRISM helped prevent 50 possible terrorist attacks.73 In 2013, The Guardian revealed that the NSA collects data directly from servers belonging to American companies such as Google, Apple, Microsoft, and Facebook.74 According to the article, the PRISM program allows the government to collect emails and search histories, review file transfers, read live chats, and so on. Whether this is true, or to what extent, is beyond the scope of this book, but it shows that Big Data technologies have the capability to help governments analyze what happens online in order to protect their countries. Of course, this is nothing new, as governments have been collecting data about citizens for many years.

There are two main areas in which Big Data can improve security: organizational security, and national security and public safety. In the coming years, it will have a big impact on the way security is managed and handled worldwide. Some methods will be logical and others might be controversial, but Big Data will definitely change the way we look at security.

Organizational Security

Organizations are swimming in security data. In 2012, at an RSA Conference panel discussion, Ramin Safai, Chief Information Security Officer at Jefferies & Co., said his investment bank has 5,000 employees and captures 25 GB of security-related data every day.75,77 Buried in that data, they usually find 50 items that require closer examination, two of which eventually demand real attention. According to a whitepaper by EMC, 47 percent of enterprises collect, process, and analyze more than 6 TB of security data on a monthly basis.78 Collecting the data is not the problem, and Big Data impacts organizational security in several different ways.

It can be used to detect fraud or criminal activities and monitor risks among the employees of an organization. Within large corporations, it is especially difficult to monitor all employees' actions. However, with the right Big Data tools, organizations can watch for problems without infringing on employees' privacy. Tools can analyze full-text emails or scrape communication channels, looking for anomalies or patterns that indicate fraudulent actions. Only when the tool flags an issue needing real attention should managers become involved. After all, organizations do want to protect their (intellectual) property and prevent an aggrieved employee from making sensitive data public.
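As a minimal sketch of this idea, the filter below scans a batch of messages against a watchlist of suspicious phrases and surfaces only the matches for human review. The pattern list and message format are hypothetical; a real deployment would use trained models and proper data-governance controls rather than a static keyword list.

```python
import re

# Hypothetical watchlist of phrases worth a closer look.
SUSPICIOUS_PATTERNS = [
    r"off\s+the\s+books",
    r"delete\s+this\s+email",
    r"wire\s+the\s+funds",
]

def flag_messages(messages):
    """Return only messages matching a suspicious pattern, so managers
    review a handful of flagged items instead of reading all mail."""
    flagged = []
    for msg in messages:
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, msg["body"], re.IGNORECASE):
                flagged.append({"sender": msg["sender"], "pattern": pattern})
                break  # one match is enough to flag the message
    return flagged

mail = [
    {"sender": "alice", "body": "Lunch at noon?"},
    {"sender": "bob", "body": "Keep this off the books and delete this email."},
]
print(flag_messages(mail))  # only bob's message is flagged
```

The point of the `break` is proportionality: the tool records that a message needs attention, not everything the employee wrote.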

Big Data can also help prevent fraudulent actions by customers. Criminals always try to cheat and make money or receive services without paying for them. Examples include insurance, tax, and unemployment benefit fraud. Take insurance as an example. With Big Data, organizations can prevent, predict, identify, investigate, report, and monitor attempts at insurance fraud. Using massive amounts of historical data, organizations can determine what is normal activity and what is not, and then match that data against actions happening in real time. Combined with pattern analytics, this helps identify outliers that require (immediate) action. The fraud prevention industry is big business itself. The Insurance Information Institute estimated that insurance fraud accounts for $30 billion in annual losses in the United States.79
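The "historical baseline versus real-time activity" approach described above can be sketched with a simple statistical outlier test. This is an illustrative toy, not an actual fraud engine: the claim amounts are invented, and real systems combine many signals, but the principle, flagging values far from the historical norm, is the same.

```python
import statistics

def find_outlier_claims(historical, incoming, threshold=3.0):
    """Flag incoming claim amounts that deviate more than `threshold`
    standard deviations from the historical baseline."""
    mean = statistics.mean(historical)
    stdev = statistics.stdev(historical)
    return [claim for claim in incoming
            if abs(claim - mean) / stdev > threshold]

# Hypothetical claim amounts in dollars.
past_claims = [900, 1100, 1000, 950, 1050, 980, 1020]
new_claims = [1010, 25000, 990]
print(find_outlier_claims(past_claims, new_claims))  # [25000]
```

A flagged claim is not proof of fraud; it is a candidate for investigation, which matches the prevent-predict-identify-investigate chain the text describes.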

Preventing hacks and keeping the collected data secure is one of the most important tasks of any organization. Unfortunately, there will always be criminals who are after sensitive data, such as credit card information, bank accounts, and passwords, or who want to steal digital money.80 In 2013, a lot of attention was paid to the hacking of organizations such as Facebook, Adobe, LinkedIn, and Evernote, in which massive numbers of passwords were stolen. With the right Big Data tools, organizations can become much better at detecting abnormalities on the network and finding intruders.

Organizations should create an intelligence-driven security model that incorporates a 360-degree view of the organization and all the risks it faces. Together with the right Security Information and Event Management (SIEM) solutions, organizations can receive real-time analysis of security alerts generated by network hardware and applications.81 Several security intelligence and analytics measures can be used to ensure that all data within an organization is secure.

In order to stop a cyberattack, it first needs to be noticed. One of the benefits of Big Data security technology is that it allows organizations to monitor exactly which files, applications, documents, and users are traveling through the company network. It also allows organizations to monitor what data is going out and what is coming in, as well as from where and when. All this data can be used to find potential cyber threats active on the company network in real time. Technology can identify applications or users that access the network without having been approved by your organization. The right tools let you spot abnormal and inconsistent communications to and from unknown sources, occurring at irregular times or transferring unusual amounts of data.
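A toy version of such monitoring might look like the following: each transfer record is checked against an approved-destination list and a volume ceiling. The destinations, log format, and 500 MB limit are all assumptions for illustration; production systems derive these thresholds from learned baselines rather than hard-coded constants.

```python
# Hypothetical whitelist of approved internal destinations.
APPROVED_DESTINATIONS = {"backup.internal", "mail.internal"}

def flag_transfers(log, volume_limit_mb=500):
    """Flag transfers to unapproved destinations, or approved transfers
    moving an unusual amount of data. Each log entry is
    (user, destination, megabytes, hour_of_day)."""
    alerts = []
    for user, dest, megabytes, hour in log:
        if dest not in APPROVED_DESTINATIONS:
            alerts.append((user, dest, "unapproved destination"))
        elif megabytes > volume_limit_mb:
            alerts.append((user, dest, "unusual volume"))
    return alerts

log = [
    ("alice", "backup.internal", 120, 14),    # normal backup traffic
    ("mallory", "drop.example.net", 900, 3),  # large upload, unknown host
    ("bob", "mail.internal", 750, 11),        # approved host, odd volume
]
print(flag_transfers(log))
```

In practice the interesting part is the baseline: what counts as "unusual" volume or timing is learned from the organization's own historical traffic, exactly the kind of analysis Big Data tooling enables.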

If the system detects any anomalies, it is important to take action immediately and determine the size of the attack. If necessary, shut down systems to prevent further attacks. It is important to respond fast, as every additional second lost could result in more damage. Once the attack is stopped, it is time to assess the damage. What security measures have been breached? What data was targeted? What data was lost?

Finally, inform your customers about what happened. It is important to be open and transparent, to explain in simple and clear language what transpired, what actions have been taken, and what plans are underway to prevent such attacks from recurring.

Being under cyberattack can be extremely harmful to an organization, especially because customers need to be able to trust companies to secure their data. Therefore, if a cyberattack is not dealt with correctly, it could result in a loss of customers. Train your employees how to deal with security threats, test them with fake attacks, and ensure that you have a crisis plan ready in case something does go wrong. If it does, try to document as much as possible and learn from it to prevent future attacks. Evaluate your actions and communicate that as well to your customers. If the organization deals with an attack correctly, trust might be restored afterward.

National Security and Public Safety

In 2012, the World Economic Forum identified Big Data as a very powerful tool for public safety and national security.82,83 The hyper-connected world poses more and more risks that could have serious political, social, and economic implications.84 The key is to address the ongoing arms race between cyber criminals on the one hand and the corporations, lawmakers, and governments that oppose them on the other, as stated by Rod A. Beckstrom, President and Chief Executive Officer of the Internet Corporation for Assigned Names and Numbers (ICANN) at the 2012 forum.85,86

Governments are responsible for ensuring that civilians are safe, especially during large events, and Big Data offers many possibilities for reaching this goal. Using different tools, and with the cooperation of different organizations via a Big Data solution, it is easier to keep crowds under control and make events safer. Social media analysis is a great tool for this. For example, governments use different Twitter analytics tools to scan and analyze tweets for security threats, and then take action accordingly. Big Data can also be used to monitor the movement of the crowd during an event and to prevent too many people from gathering in one place. In this way, they can prevent disasters such as the one at the Love Parade in Germany in 2010.

In addition to crowd control management systems, CCTV cameras are being used more and more. It is estimated that over 300 different cameras might record an individual over the course of a single day.87 The United Kingdom is a big fan of security cameras: with more than 4.2 million CCTV cameras in place, it has more cameras than China.88 There are approximately 100 million security cameras worldwide at the moment, used to monitor important economic areas, buildings, highways, and events.89 Smart cameras can even notify organizations in real time when a security breach is noticed.

Internet protocol (IP) cameras that are directly connected to the Internet will account for approximately 60 percent of all camera sales in 2016, and the percentage of HD security cameras will increase to 50 percent in 2014.90,91 The percentage of HD CCTV smart cameras is still small, but it is anticipated that in 2016 the number of HD CCTV cameras will reach 3.7 million in the United Kingdom.92 These are not ordinary cameras, but cameras that can hear and detect dangerous situations, isolate and follow movements, and identify who is recorded in a split second.93 In The Netherlands, such cameras are already protecting the border.94 These cameras register each and every vehicle that crosses the border and check it against a database of wanted vehicles. Within a split second, a signal indicates whether or not to stop the car. Big Data technologies enable simultaneous monitoring of all those cameras, requiring a response only when an incident is noticed.
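The split-second vehicle check described above is, at its core, a fast set-membership lookup. The sketch below uses invented license plates and a hard-coded register; real systems match camera reads against national police databases, but the reason a verdict arrives in a split second is the same constant-time lookup.

```python
# Hypothetical register of wanted license plates.
WANTED_PLATES = {"XX-123-Y", "ZZ-999-A"}

def check_vehicle(plate):
    """Return 'STOP' if the recognized plate is in the wanted register,
    else 'PASS'. A set lookup is O(1) on average, which is what makes
    a split-second signal feasible at border-crossing volumes."""
    return "STOP" if plate in WANTED_PLATES else "PASS"

print(check_vehicle("XX-123-Y"))  # STOP
print(check_vehicle("AB-456-C"))  # PASS
```

The hard part in practice is not this lookup but the camera-side plate recognition feeding it; once the plate is read, the database check is trivial to scale.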

CROWD CONTROL MANAGEMENT95

Big Data will also have a major impact on the way public services, such as police forces, health organizations, and fire departments, operate. In The Netherlands, a remarkable, and for The Netherlands unique, initiative took place in December 2012. For the last nine years, during the week before Christmas, the Dutch radio station 3FM has organized Serious Request, an annual charity event held in a different location each year. In 2012, the event took place in Enschede, in the Twente region. That year, the Twente police and the Safety Region Twente developed a Crowd Control Management tool to ensure the safety of all visitors. In six days, approximately 500,000 visitors came to the center of Enschede, and thanks to the Crowd Control Management tool, no incidents occurred.

 

What Did They Do?

Three different tools monitored what was going on in real time in the center of Enschede.

Twitcident: Developed in conjunction with the Delft University of Technology, Twitcident is a tool that can sift through massive amounts of local tweets to find information about emergencies happening in real time.96 The tool detects, filters, and analyzes tweets during massive public events and presents the data in a structured way so emergency first responders can use it. Twitcident provided fast and reliable information about the real-time situation in the center of Enschede, including the mood of the crowd and information about people in the crowd.

During Serious Request, Twitcident worked with a list of 533 search terms that resulted in 113,000 different combinations that were monitored by the system. In total, around 1.1 billion tweets were scanned. This resulted in 12,000 tweets that were marked suspicious and were checked manually in the Crowd Control Room.
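The filtering step that reduced roughly 1.1 billion scanned tweets to 12,000 for manual review can be sketched as a simple term match. The search terms and tweets below are invented; Twitcident's actual pipeline is more sophisticated (it combines term combinations, filtering, and semantic analysis), but the funnel shape, machines narrow the stream so humans review only a tiny fraction, is the essence.

```python
def mark_suspicious(tweets, search_terms):
    """Keep only tweets containing at least one monitored search term,
    so operators in the control room review a small fraction of the
    full stream instead of every message."""
    terms = [term.lower() for term in search_terms]
    return [tweet for tweet in tweets
            if any(term in tweet.lower() for term in terms)]

# Hypothetical monitored terms and event tweets.
terms = ["fight", "pickpocket", "fire"]
stream = [
    "Great music at the square tonight!",
    "Someone just tried to pickpocket me near the stage",
    "Love this crowd",
]
print(mark_suspicious(stream, terms))  # only the pickpocket tweet remains
```

At Serious Request scale, the same funnel turned billions of tweets into thousands of candidates, which is what made manual checking in the Crowd Control Room feasible at all.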

UrbanShield System: This system provides a real-time situational overview of a complete area within a city.97 This system is based on a Geographical Information System and uses GPS to show the real-time location of all first responders in an area. All police officers, firefighters, and city and private security guards who are part of the system are shown on a map. If a potentially threatening situation is noticed via the cameras on the street or via Twitcident, the closest first responder can be alerted to take immediate action.

Blue Mark: This tool counts the crowd. During large public events, it is important to know the size of the crowd in specific locations to ensure that not too many people gather at one square within the city.98 Blue Mark uses sensors that pick up signals from people's smartphones to monitor the number of people and how they move through town. Each smartphone regularly broadcasts a digital signature that can be counted using Bluetooth or Wi-Fi. No private information, such as account or phone identity, was collected, so privacy was protected.
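One common way to count devices without retaining their identities, consistent with the privacy-preserving counting described above, is to store only salted hashes of the broadcast identifiers. The identifiers and salt below are invented, and this is a generic sketch of the technique, not Blue Mark's actual implementation.

```python
import hashlib

def count_unique_devices(observed_ids, salt="event-2012"):
    """Count distinct devices from broadcast identifiers while storing
    only salted one-way hashes, so no raw identifier is retained."""
    hashes = {
        hashlib.sha256((salt + device_id).encode()).hexdigest()
        for device_id in observed_ids
    }
    return len(hashes)

# Hypothetical identifiers; one device is seen twice.
sightings = ["aa:bb:cc:01", "aa:bb:cc:02", "aa:bb:cc:01", "aa:bb:cc:03"]
print(count_unique_devices(sightings))  # 3
```

Because the hash is one-way and salted per event, the stored values still support counting and movement analysis, but cannot easily be linked back to a specific phone.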

 

Crowd Control Room

Together, these three tools provided a multiangle, real-time, high-level picture of the situation in Enschede around the event and on the different city squares.99 From the Crowd Control Room located in the city hall, officials managed the situation and, when necessary, moved into action. When messages about active pickpockets came in via Twitter, the cameras located the thieves, and via the UrbanShield system the nearest police officers were alerted to take action. Within no time, the criminals were arrested and removed from the scene without anyone noticing. During this event, the tools were not automatically integrated, but that could change in the future.

TAKEAWAYS

Big Data comes with a big responsibility to guarantee the privacy of customers and the protection of their data. With the vast amounts of data available to organizations and governments nowadays, anything and anyone can be monitored, traced, and analyzed without customers even knowing it. Consumers, however, are becoming more vocal, and organizations that do not respect their privacy or take data security seriously will be significantly affected. Switching costs between organizations have never been so low; if consumers do not like how they are treated by an organization, they will move to a competitor.

For organizations, it is therefore important to comply with Big Data ethical guidelines. They should be transparent about what data is collected, when, and why, as well as what use they make of the data and how customers can delete data if they choose. Simplicity should become the standard when developing privacy policies, making them easy for everyone, even the digital immigrants, to understand, and making the processes for changing privacy settings within the organization easy to follow and readily available. They should ensure that the data is kept secure and anonymous, so that digital criminals do not get a chance to steal information. Data should be kept as secure as money in banks. Finally, an organization's Big Data strategy should specifically state that the privacy of customers will be protected at all times. Everyone within the organization should breathe privacy and do what it takes to protect the privacy of the customers. Privacy should become part of the DNA of the company culture.

Although Big Data has the potential to become Big Brother, this can be averted if regulations are in place, organizations stick to ethical guidelines, and governments do not spy full-time on all citizens. It does not have to become like George Orwell's 1984. That outcome is not guaranteed, however, and governments, organizations, and consumers will have to pay constant attention and work hard to ensure Big Data privacy, ethics, and security.
