Chapter 2

Drilling into the Big Data Gold Mine

Data Fusion and High-Performance Analytics for Intelligence Professionals

Rupert Hollin

Abstract

Threats to local, national, and global public security are continually evolving, and for those tasked with preventing and responding to these threats, the amount of potentially useful data can often seem overwhelming. What compounds this Big Data issue is the fact that we are living in a time of global economic austerity in which national security and law enforcement agencies need to become better at exploiting information while managing the demands of ever-shrinking budgets. This chapter looks at how, by using the latest software tools and techniques for data fusion and high-performance analytics, agencies can automate traditional labor-intensive tasks, gain a holistic view of information that originates from multiple sources, and extract valuable intelligence in a timely and more efficient manner.

Keywords

Big Data; Fusion; High-performance analytics; Visualization

Introduction

We are living in the age of Big Data. The volume, velocity, and variety of data to which agencies have access can often exceed their ability to store, process, and analyze that data to support accurate and timely decision making.
Across the world, law enforcement and intelligence agencies are facing an increasingly complex range of threats from a myriad of different sources: nation states, groups, and individuals with completely different motivations and a growing arsenal of modus operandi. Whether tackling cross-border organized crime or lone wolf terrorist threats, agencies must continually adapt their countermeasures. Agencies have invested heavily in collection platforms and technologies to capture and analyze information, the volume of which is increasing exponentially. When faced with the many types of data now available, the potential for information overload is high. This is especially true for intelligence professionals charged with turning this data into actionable intelligence to help counter these threats. Overload increases the risk of missing key pieces of information, leading to reputational damage at the least and, at worst, failure to prevent a real and credible threat.
Since the attacks on New York and Washington in September 2001, we have seen growing urgency across the Western world and beyond to tackle the ever-present threat of terrorism, a need that has been underlined in the intervening years by high-profile incidents such as the London suicide attacks of July 7, 2005, Anders Breivik’s twin attacks in Norway in 2011, and the Boston Marathon bombings in the United States in April 2013. Developments in Syria and Iraq indicate that this trend is likely to continue. Add to this the increase in organized criminal activities such as arms trading, human trafficking, and drug smuggling, coupled with tough financial constraints, and it is clear that agencies need to improve their effectiveness whilst driving economic efficiencies.

The Age of Big Data and High-Performance Analytics

Terrorism, crime, and other threats to national security can be properly addressed only through the availability of timely, reliable intelligence. Successful operations need to be able to use information from a range of sources including human intelligence, signals intelligence, and open-source intelligence. The growing volume and unstructured nature of available data require committed investments in data exploitation (see Figure 2.1). Gaining value from these data is critical to ongoing success.
Figure 2.1 Anticipated growth of data, 2009–2020.
In the past, most agencies worked with data that they held internally within their organization and the intelligence they were generating themselves in the form of intelligence reports or information gathered from the investigations they were carrying out. In this tightly controlled environment, data growth was largely manageable.
The situation today has changed radically because data volumes have increased in all elements of our lives. The same applies to agencies, which now have data available to them that increasingly originate from or are held outside their own organization. In the Internet age, to gain a truly holistic view of activities, data from new and emerging sources such as social media, automatic number plate recognition, telecommunications, and financial and other sensor systems have an increasingly important role in agency life.
Analysts need to call on all of the information available to them and assess it to obtain a clear picture of how a situation is evolving over time. This is increasingly difficult in an environment where adversaries are exploiting the Internet, social media networks and other digital channels to great effect. Analysts often have to review huge volumes of information, looking for that golden nugget of relevant information that could bring an investigation to a positive conclusion.
At the same time, much of the data being generated no longer take the form of easy-to-manage structured data saved in a tabular format within relational databases. Instead, a significant proportion of data are unstructured, in the form of Word documents, transcripts, witness statements, or Internet content, which presents another key Big Data challenge. These data can be extremely valuable, especially given the latest advances in text analytics to automatically generate keywords and topics, categorize content, manage semantic terms, unearth sentiment, and put all of that into context (see Chapters 11–13).

Technology Challenges

The downside to all this is that, given the vast range of types, quantities, and formats of available data, it is unlikely that users know what is relevant or what should be searched for. What about all the important information hidden in the data that has not been questioned because it appears unrelated: the “unknown unknowns”?
Overwhelmed by the scale and complexity of the data being generated, most agencies’ legacy databases simply cannot cope. There is a growing acceptance that siloed systems are a thing of the past and that such silos prevent users from seeing the bigger picture that their data could potentially reveal. To help address this challenge, agencies must consider using the latest data storage and processing technology, such as Hadoop, which uses lower-cost commodity hardware to reliably store large quantities of data (see also Chapters 9–11) and supports parallel processing, enabling rapid access to the data. The inability of an organization to access the entire Big Data store in a timely manner and identify useful and relevant data inevitably leads to missed opportunities and potentially poor intelligence.
Once the data is available in a “data lake,” the ability to carry out comprehensive searches across data sources is a critical part of an analyst’s Big Data arsenal, but searching assumes that there is a starting point. This might be a known organization or individuals who can lead the analyst to further valuable sources of intelligence. When faced with organized threats, national security agencies are frequently in a position to ask such targeted questions about an organization, its methods of operation, who its members are, and with whom they interact. Exploring the data available and gathering new data is the key to success. However, in these examples, that success is achieved through asking specific questions about the vast quantities of data available to build the intelligence picture, identifying the known unknowns, and then gathering further data to fill in the gaps.
The next step to truly unlocking the Big Data gold mine requires the adoption of high-performance analytics (HPA) solutions, creating the ability to process vast volumes of data quickly and efficiently, to uncover hidden patterns, anomalies, and other useful information that can be used to make better decisions. High-performance analytics has the potential to enable law enforcement or national security agencies to integrate and analyze huge volumes and varieties of data, structured and unstructured, to review and anticipate changes in criminal and terrorist activity. For example, Figure 2.2 shows an analysis that was performed on casualties inflicted by different types of explosive attack. The figure indicates a gradual reduction in numbers of wounded or killed people in all types of attack owing to better preventative measures, but also that improvised devices were still responsible for several hundreds of wounded people. These data can be graphically presented to users in a matter of seconds and manipulated to predict future possibilities.
High-performance analytics delivers the potential to help agencies quickly access and analyze every bit of relevant data and thereby move from a “pay and chase” approach, in which agencies put in place technology to react to events that have already happened, to a more proactive “predict and prevent” environment.
The real benefit of applying HPA to Big Data is that agencies do not need to know what they are looking for before they start. Instead, analytical techniques will model the data and push information of interest back to them, drawing attention to relevant content, effectively pushing the needle out from inside the haystack. This information can then be processed through standard analysis and investigation processes to determine whether it is viable intelligence, effectively converting Big Data into actionable intelligence.
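As a minimal sketch of what “pushing information of interest back” to an analyst can mean in practice, the example below flags values that deviate strongly from the rest of a series using a median-based (robust) z-score. The daily counts, the threshold of 3.5, and the function name are illustrative assumptions; they do not describe any particular product's method.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical daily counts of some monitored event; one day stands out.
daily_counts = [12, 14, 11, 13, 12, 58, 13, 12]
suspicious_days = flag_anomalies(daily_counts)
```

The analyst never specifies which day to look at; the outlier surfaces from the data and can then be passed to standard investigation processes.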

Building the Complete Intelligence Picture

Although human intervention is always needed to provide the necessary element of domain knowledge and expertise, advances in text analytics capabilities help analysts by pre-sifting data. Sophisticated linguistic rules and statistical methods can evaluate text, minimizing inconsistency and ambiguity. The latest text analytics technology can automatically determine keywords and topics, categorize content, manage semantic terms, unearth sentiment, and put things into context. By applying text analytics, agencies can start to extract intelligence from unstructured data and turn it into a more structured format that can then be analyzed in conjunction with existing structured data.
Figure 2.2 SAS visual analytics illustrating preferred attack types deployed by a terrorist cell and related casualties/deaths (2008–2011).
This is a critical element of the Big Data processing cycle for agencies, because they can exploit all of the data they have, not just the structured content. As all investigators will highlight, often it is the text, such as witness descriptions, that contains the most valuable intelligence. It is all about approaching Big Data holistically with a full suite of capabilities to achieve the best results.
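The idea of turning unstructured text into a more structured format can be sketched very simply. The example below counts the most frequent non-trivial words in a free-text statement and stores them in a record that could sit alongside structured data. The statement, the stop-word list, and the field names are hypothetical, and plain word counting is only a stand-in for the linguistic rules and statistical methods that real text analytics tools apply.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real systems use far larger ones.
STOP_WORDS = {"a", "an", "and", "at", "in", "near", "of", "on",
              "the", "then", "to", "was", "were", "with"}

def extract_keywords(text, top_n=2):
    """Return the top_n most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)

# Hypothetical witness statement.
statement = ("The suspect left the warehouse in a white van. "
             "The van was seen near the warehouse again at night, "
             "and the van then drove to the harbour.")

# Structured record derived from the unstructured text.
record = {"source": "witness_statement",
          "keywords": extract_keywords(statement)}
```

Once keywords are held in a structured field like this, they can be searched, joined, and counted together with existing structured data.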
Advanced analytics extracts insights from Big Data, supporting the analyst beyond the ability just to ask specific questions or run specific queries. Multiple analytical techniques can be applied to large data volumes to uncover the key nuggets of intelligence (see Chapters 8 and 10–13).
Front-end operational technology also has a key role in addressing all of these issues and challenges. Investigators assigned to specific cases need to be working from a single integrated information technology (IT) platform that provides excellent visibility into all the critical information, eliminates double entry, and provides a streamlined process workflow, helping save time and drive faster responses to threats.
Figure 2.3 SAS for fusion centers: using link charts to visualize relationships between persons, vehicles, and information reports.
Agencies sharing information in a fusion center environment, for example, need to be able to search and exploit data effectively using analytic techniques and also to leverage technology to reveal patterns, anomalies, key variables, and relationships in the data, such as those shown in Figure 2.3, ultimately leading to new insights and better answers, faster.
They urgently require systems that present teams with the relevant information in one place and then allow them to use analytics to effectively pinpoint and evaluate the information that is critical to the case.
This is where providers such as SAS (SAS, 2014) can help deliver solutions that allow investigators to identify and share intelligence more effectively, analyze data, uncover hidden patterns and networks, and ultimately reduce threats.
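The relationships that a link chart such as Figure 2.3 displays can be thought of as a graph. As a rough sketch, assuming entities are linked whenever they appear in the same information report, the example below builds those links and then finds everything connected to a starting entity. The report identifiers, entity labels, and function names are invented for illustration and do not reflect how any specific fusion center product works.

```python
from collections import defaultdict, deque

def build_links(reports):
    """Link every pair of entities that appear in the same report."""
    links = defaultdict(set)
    for entities in reports.values():
        for a in entities:
            for b in entities:
                if a != b:
                    links[a].add(b)
    return links

def connected_entities(links, start):
    """Breadth-first search for everything reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in links.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}

# Hypothetical information reports and the entities each one mentions.
reports = {
    "IR-001": ["person:A", "vehicle:V1"],
    "IR-002": ["person:B", "vehicle:V1"],
    "IR-003": ["person:C", "vehicle:V2"],
}
network = connected_entities(build_links(reports), "person:A")
```

Here person A and person B never appear in the same report, yet the shared vehicle connects them, which is the kind of indirect relationship a link chart makes visible.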
It is critical that any system that is implemented be flexible enough to be tailored to fit an agency’s operational processes, and not the other way around. One of the biggest risks and greatest hidden costs involved in the purchasing of any IT solution is the need to change existing processes to fit around a new system.
It is vital that the chosen system be able to be tailored on an ongoing basis to meet the changing and future needs of a particular organization. This ensures that the agency can mitigate risk by evolving and adapting to address new or emerging legislation, ultimately improving the overall return on investment.

Examples

Scenario 1: Fusion and Michigan State Police

The Michigan State Police (MSP) use SAS in support of statewide information sharing and to support the operations of the Michigan Intelligence Operations Center (MIOC).
The Michigan State Police Department is working closely with the Department of Homeland Security to establish the MIOC for Homeland Security in East Lansing, Michigan, with an operational node in the Detroit area. It is required that all MIOC personnel, regardless of physical location, have full access to MIOC software solutions.
The MIOC for Homeland Security provides information to patrol officers, detectives, management, and other participating personnel and agencies on specific criminals, crime groups, and criminal activity. The MIOC for Homeland Security supports antiterrorism and other crime-specific objectives by collecting, analyzing, and disseminating information to a number of communities of interest. The success of the MIOC for Homeland Security is based on the secure collection and sharing of criminal intelligence information among concerned partners within the state regardless of the type of threat.
Using the SAS solution for fusion centers, MIOC serves as the primary criminal intelligence processing system in the state of Michigan and provides access to over 600 law enforcement agencies as well as 21,300 certified police officers and numerous state and federal departments as they identify and prevent criminal acts directed toward citizens from both domestic and international threats.
Information can be provided on specific criminals, crime groups, and criminal activities, and criminal intelligence information can be shared among concerned partners within the state, regardless of the type of threat.

Scenario 2: National Security and Intelligence Solution in the Middle East

Digital information was stored in databases on different unlinked systems with limited search capabilities, which meant that finding information was labor-intensive and time-consuming.
SAS provided an intelligence solution that migrated several national systems into one centralized security information warehouse. This provided a single unified view of information from different agencies including:
• Immigration data: 25 million records per year
• Policing: 9 million records per year
• Traffic, driver, and vehicle details: 20 million records per year
• Hotel reservations: 3 million records per year
This system proved particularly successful in identifying and tracking down suspects based on their hotel booking history. The national security agency concerned now has a unified view of information from all participating agencies, reduced time needed to find information, an improved response rate to threats, reduced training costs, and the ability to take in more information from further agencies in the future.
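A single unified view of this kind can be illustrated with a small sketch that groups records from separate source systems under one shared identifier. The source names, field names, and sample records below are hypothetical, and the sketch assumes every source already carries a clean common identifier; real deployments typically need data quality and entity resolution steps before records can be matched this way.

```python
from collections import defaultdict

def fuse_records(sources, key="id_number"):
    """Group records from several sources under a shared identifier."""
    unified = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            # Keep each source's fields together, minus the shared key.
            details = {k: v for k, v in record.items() if k != key}
            unified[record[key]].setdefault(source_name, []).append(details)
    return dict(unified)

# Hypothetical extracts from two previously unlinked systems.
sources = {
    "immigration": [
        {"id_number": "P123", "entry_date": "2014-03-01"},
        {"id_number": "P456", "entry_date": "2014-03-02"},
    ],
    "hotel_reservations": [
        {"id_number": "P123", "hotel": "Hotel X", "check_in": "2014-03-01"},
    ],
}
unified_view = fuse_records(sources)
```

Looking up `unified_view["P123"]` then returns that person's immigration and hotel records together, rather than requiring a separate search in each system.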

Conclusion

Faced with vast amounts of data in a variety of formats, agencies struggle to transform them into usable, consistent intelligence. They struggle to see the bigger picture because of information silos and a lack of analytical and visualization tools at their disposal.
However, agencies have become increasingly clear about the value of Big Data in preventing and solving crime and threats to national security. We have seen an ongoing shift in mindsets, in which data are increasingly seen as an opportunity rather than a problem, and the latest technologies are available now to enable agencies to start taking advantage of that opportunity.
Without the right tools, pinpointing potentially useful data within Big Data can be resource-intensive and unaffordable. With the right solutions, irrelevant information can be sifted out and areas of interest highlighted.
Today’s latest analytics technologies enable faster, better decision making by improving analysis of the huge and ever-growing volume of data. The ability to scour Big Data using high-performance analytics and data management will become increasingly crucial in enabling intelligence professionals to reveal hidden insights and produce better decisions. A prerequisite for processing and analyzing all available information is the ability to extract it, preferably automatically, from many different sources and formats and to apply suitable data quality processes. The final step is then to use the insight gained in support of operational effectiveness.