Chapter 4

Big Data and Law Enforcement

Advances, Implications, and Lessons from an Active Shooter Case Study

Kimberly Glasgow

Abstract

What has been called “Big Data,” particularly social media data, has been heralded as fundamentally transformative in all spheres of human activity from economic, social, political, and legal processes to the basic workings of scientific research. Working from a case involving crisis response to a threat to public safety, this chapter explores key issues and complexities in harnessing Big Data to meet law enforcement needs and to inform and protect the populace. From supporting real-time situational awareness during an active shooter event to mining responses and supporting investigatory or intelligence activities, the promise of Big Data must be coupled with the right science, analytic methods, technologies, and social knowledge to understand and shape this potential information deluge.

Keywords

Active shooter; Crisis management; Information sharing; Law enforcement; Situational awareness; Social media

The Intersection of Big Data and Law Enforcement

We live in an age in which the challenge of protecting the public from crimes, disasters, and other dangers remains ever-present. Whereas violent crime rates in the United States (US) have dropped in the past decade (Federal Bureau of Investigation, 2012), active shooter and mass casualty incidents appear to be trending upward (Blair et al., 2014). Law enforcement agencies have regularly pursued new methods, data sources, and technologies that hold promise to improve public safety, such as public surveillance cameras (La Vigne et al., 2011).
More recently, Big Data sources and analytics are beginning to be explored in the public safety arena. Computer science, physics, bio-informatics, economics, and political science are among fields that have already seen progress through adopting Big Data, but have encountered pitfalls as well (Boyd and Crawford, 2012; Lazer et al., 2009), particularly if the data they are engaged with are “digital traces” of online activity. Many businesses have embraced Big Data as critical to gaining market advantage, yet still struggle with developing analytics that provide actionable insights (LaValle et al., 2011). Law enforcement can learn from these experiences as it seeks to adapt the use of Big Data to its unique challenges and constraints.
What do we mean by “Big Data” in a law enforcement or public safety context? The sheer size of the data in question—gigabytes, terabytes, even petabytes of data—could be sufficient to deem it “big”. However, a more nuanced and useful definition might be “data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time” (Jacobs, 2009). Furthermore, data that are complex, heterogeneous, or ambiguous in nature may demand moving beyond tried-and-true methods sooner than a larger but well-understood, well-structured, and predictable dataset. The velocity at which data arrive can pose another problem.
Potential sources of Big Data for law enforcement and public safety are varied. Some are familiar sensor-based feeds, such as public surveillance cameras or traffic cameras that can produce huge amounts of video data. Technological innovations such as computer-aided dispatch and other electronic record management efforts produce volumes of data that can be mined in hopes of reducing crime (Byrne and Marx, 2011), perhaps in conjunction with relevant data from the U.S. Census, the Federal Bureau of Investigation, the National Institute of Justice, or comparable sources. Such data sources have well-understood structures and properties and may have been intentionally built to support law enforcement needs. Both the sensors and the databases in question are likely to be law enforcement assets. This greatly simplifies the challenges of working with this type of Big Data. Forensic analyses of computer hard drives or cloud data stores as part of criminal investigations can also involve extremely large quantities of data (Garfinkel, 2010). In this case, the focus is on gathering evidence pertinent to a specific investigation of a known suspect from systems used by that suspect.
Another potential source of Big Data for law enforcement is social media (see Chapter 11). Social media has been widely adopted in the US, with nearly three-quarters of online adults reporting they use one or more types of social media regularly (Duggan and Smith, 2013). Through social media, people can freely and easily create, post, and share online content in many forms, including text, images, audio, and video. They can converse with others, build and maintain social networks, plan, organize and execute events, exchange knowledge and commentary, rate and recommend, interact in educational or scientific endeavors, and engage in a host of other social activities. Although thousands of social media platforms exist, a far smaller number have been widely adopted and are likely to be broadly relevant as information sources in a public safety context.
Social media can be an unparalleled real-time source of information about the thoughts, feelings, behaviors, perceptions, and responses to events for large numbers of individuals. In particular, the microblogging platform Twitter has been observed to provide a timely source of direct observations and immediate reactions to events such as natural disasters (Starbird et al., 2010), human-caused disasters such as the London riots (Glasgow and Fink, 2013), as well as campus shootings and other violent crises (Heverin and Zach, 2012) with strong public safety implications.
Social media have been recognized as a potential tool for local governments during crisis events both as a way of keeping the public accurately informed and as a source of situational awareness. Some law enforcement agencies have begun employing social media actively. One notable example is the Boston Police Department. Shortly after the initial explosions near the finish line of the Boston Marathon in April 2013, during the early stages of the investigation of the bombing and the manhunt and throughout the following weeks, the Boston Police Department used Twitter to communicate with the public. They provided updates on police activities and the status of the investigation, announced road closures, requested public assistance with the investigation, and expressed sympathy for the victims (Davis et al., 2014). In general, police departments that use Twitter have been observed to predominantly tweet information on recent crimes or incidents, department-related activities, traffic problems, and crime prevention tips (Heverin and Zach, 2011).
Law enforcement has significant experience and familiarity with sensors such as cameras. Social media, the output of humans as social sensors in their communities, may seem arcane and unfamiliar in comparison. The scale of publicly shared social media and the inherent technical complexities of acquiring, processing, and interpreting it can seem daunting. A social media post containing a keyword of interest such as “shooting” could be an accurate eyewitness text description of a crime accompanied by a photograph of the event, global positioning system coordinates, and a precise timestamp. Alternatively, it could be a sarcastic comment, a joke, song lyrics, an uninformed opinion, a different meaning of the term (“shooting hoops”), a falsehood, hearsay, or some other form of self-expression unrelated to a crime. Billions of social media messages are posted each day, which further complicates the challenge of finding the right information for law enforcement.
Beyond searching for specific relevant posts in a sea of data, it may also be important to uncover trends or patterns in large collections of social media data, to detect anomalies or understand connections between individuals or groups.

Case Example and Workshop Overview

These examples show that law enforcement already uses social media, but they do not tap the broader analytic potential of social media Big Data. To examine the issues of using Big Data to support law enforcement and public safety, this chapter describes a focused case example. The case was an active shooter event in an enclosed public space, a suburban shopping mall, during business hours.
On January 25, 2014, authorities received reports of shots fired at a shopping mall in Columbia, Maryland. A young man had entered the mall, shot and killed two employees of a skate shop, and fired on additional patrons of the mall before taking his own life. He clearly met the Federal Emergency Management Agency definition of active shooter, “one or more suspects who participate in an ongoing, random or systematic shooting spree, demonstrating the intent to harm others with the objective of mass murder” (FEMA, 2013).
Law enforcement personnel arrived at the mall within 2 minutes. In all, hundreds of officers from Howard County, Maryland and allied agencies, special weapons and tactics teams from throughout the region, and explosives experts from several agencies responded to the event. They effectively secured the large and complex scene of roughly 1.6 million square feet, over 200 stores, multiple floors, and numerous entry and exit points. They searched the facility, evacuated thousands of mall patrons safely, ensured medical attention was provided to the injured, confirmed there was only one perpetrator, and identified and removed an improvised explosive device left by the perpetrator at the scene. During and after the incident, the Howard County Police Department (HCPD) actively used social media, particularly Twitter, to communicate directly with the public, providing informational updates and guidance and correcting misinformation (Police Executive Research Forum, 2014). Law enforcement handling of the incident was viewed positively, a sentiment reflected in this Twitter message:

Again. I cannot reiterate this enough. If you are a police department follow @HCPDNews to learn how to manage a crisis. #ColumbiaMall

A few months later, the Johns Hopkins University Applied Physics Laboratory (JHU/APL) sponsored a workshop on social media as a data source during emergencies and disasters. The event was a collaboration between two groups: 17 expert participants from HCPD, the Division of Fire and Rescue Services, the Office of Emergency Management, the Public Information Office, and the National Institute of Justice, and a team of JHU/APL researchers, data scientists, engineers, and computer scientists. It explored how social media Big Data could provide insights during a crisis and how these insights could be applied in incident response and other law enforcement and public safety contexts. In addition, methods for measuring the effectiveness of official messaging in incident response were examined. Based on gaps and needs identified by experts in the course of response to the mall shooting or developed through professional experience in policing and public safety, JHU/APL staff developed prototype analytics and tools to illustrate potential approaches to resolving these gaps. This exercise advanced the art of the possible and illuminated potential challenges.
Initial sessions of the workshop focused on information sharing. A panel discussion on the mall shooting incident and response was conducted and a timeline was presented. Brainstorming sessions explored high-level topics regarding response to the mall shooting:
• What would you want to do that you could not?
• What would you want to know that you did not?
• What would you want to convey/communicate that you did not?
• Were the tools you used limiting in any way?
These discussions brought forth both incident-specific observations and broader needs of the law enforcement and public safety community. To help spur creativity for how social media and Big Data approaches could contribute to these challenges, JHU/APL demonstrated a small set of social media tools and technologies that helped illustrate the art of the possible.
The output of the brainstorming and discussion sessions was synthesized and used to prioritize goals for quick-turnaround prototyping of potential analytics and tools. These approaches were applied to large-scale social media data gathered for the incident and demonstrated to law enforcement and public safety participants at the end of the workshop. Feedback was collected after the initial information sharing and brainstorming sessions and at the end of the workshop. We worked through the active shooter case guided by the experience of law enforcement and other public safety officials who responded to the incident. We examined actual social media data from that local area during the time frame of the incident. This combination generated unique and powerful insights into key issues, promising strategies and potential pitfalls in using Big Data to help meet law enforcement and public safety needs.
At a high level, desirable features of a system to support the use of social media for law enforcement included:
Usability and accessibility-oriented features:
• Easily tailored, flexible, or customizable
    ◦ Searching, filtering
    ◦ User roles
• Available when and where needed
• Easy to use
    ◦ Consistent with existing concepts of operations
    ◦ Support a variety of audiences
• Enable communication and outreach to the community
• Enable proactive monitoring of official social media communications and their effectiveness
Information-oriented features:
• Have mechanisms for assessing accuracy or validity of data
• Provide actionable information
    ◦ Alerting
• Include both real-time and historical data
• Able to handle multiple media and data types
    ◦ Video and images, as well as text
    ◦ Maps and geographic information
• Support analyses of social information
    ◦ Groups, networks, or organizations
Implicit in such a system are numerous technical challenges to be faced if hopes of making sense of Big Data or developing situational awareness are to be realized. These challenges go beyond coping with the scale and velocity of the data. They may require approaches drawn from machine learning or other fields to identify relevant signals in the noise of social media Big Data. Aggregations of large amounts of social media data may enable significantly different approaches or novel analytics that would not be part of the typical law enforcement repertoire.
Although crisis response to an active shooter incident was the motivation for the workshop, law enforcement and public safety experts quickly uncovered additional needs and opportunities for leveraging social media Big Data. The days after the incident were times of elevated risk. There was real potential for additional copycat attacks or other disturbing and potentially violent public incidents to be triggered or influenced in some way by the mall shooting. Two such incidents did occur at the mall during the following week. The need for timely, effective communications with the local populace spiked shortly after the first 911 call and continued long after the physical incident was resolved. Besides providing updates on the situation at the mall and the status of the investigation, public information officers had to monitor social media to manage and mitigate rumors and false information, such as persistent inaccurate claims of a romantic relationship between the victims and the shooter. Investigators digging into the background of the shooter found indicators of troubling online behavior more than a year before the shooting. Building and maintaining situational awareness in the present, monitoring and interacting with the public, examining past activities in investigative or intelligence contexts, and alerting or predictive capabilities can all contribute to crisis response and to a broader spectrum of law enforcement and public safety situations. Each of these thematic areas was explored in the workshop and will be described further. Other potential areas of interest, such as policy considerations for the use of social media by law enforcement, were outside the technical scope of this workshop, but have been addressed elsewhere (Global Justice Information Sharing Initiative, 2013). Budget and other resource constraints are also out of scope, but affordability and sustainability are practical considerations, particularly for smaller departments.

Situational Awareness

I need to see the battlefield, not just the fight I’m in.

Law enforcement and other first responders need to ascertain critical factors in their environment to guide appropriate decision making and response to a dynamic situation. Failures to attain situational awareness during incident response can have catastrophic consequences (Endsley, 1995). Information overload and lack of awareness of key information are major impediments to attaining situational awareness. The scale of social media data is part of the problem. For example, the social media platform Twitter has tens of millions of new posts created every hour. The difficulties of finding the right data in this Big Data—data that are geographically relevant, topically relevant, temporally relevant, and associated with relevant individuals or groups—are even greater.

Looking into the Past

For homicides [in particular], we need to go back historically.

Investigations are conducted by law enforcement to establish the elements of an offense, identify the guilty party, and provide evidence of guilt (O’Hara and O’Hara, 1988). Determining what happened and who did it is different from maintaining situational awareness based on interpreting live information as it streams by. Analysis of Big Data from social media could identify potential witnesses, victims, and persons or locations of interest. It can surface leads or contribute to evidence collection or criminal network identification. Used appropriately as another tool in the toolbox for investigative and criminal intelligence work, social media data can contribute to public safety (Global Justice Information Sharing Initiative, 2013).

Interacting with the Public

There was a tipping point, where we became the credible source of information, not the media.

Community policing has always recognized the importance of partnership and communication with the community. Members of the community are valuable sources of information and insight and can help solve problems of crime and social disorder (Community Policing Consortium, 1994). With the advent of social media, law enforcement communication with the public has moved from broadcast mode, typically mediated by the news media, to interactive dialog and engagement. It requires listening, answering, and monitoring the public’s understanding as well as sharing information. The luxury of having until the 5 pm news to prepare an official statement has been replaced by expectations of timely, even near-instantaneous social media updates. Events can quickly jump from local to regional to national importance. The ability to track and assess official social media communications and their effectiveness across large and varied audiences, and to inform the public or correct inaccurate information in a focused fashion, is important in times of emergency as well as during day-to-day operations.

Alerting and Prediction

Can we identify threats before they become a reality?

The desire to predict criminal behavior is powerful in law enforcement and in the general public. Significant efforts have been made to predict who is likely to commit crimes in the future: for example, who will re-offend if granted parole (Berk and Bleich, 2013). Other work has focused on predicting where crime is likely to happen (geographic hot spots) or predicting identities of perpetrators (Perry et al., 2013). Crime data, demographic data, economic data, and geographic information are commonly used in these efforts, which have met with mixed success and tactical utility, with effectiveness varying depending on specific circumstances. Using social media data in a predictive or alerting capacity also poses challenges. However, detecting indicators in social media of threats or disturbances related to an upcoming public event or large gathering, or more generally identifying relevant anomalies in baseline usage of social media, could help law enforcement and public safety efforts to respond faster or potentially intercede before an actual incident.
Tackling the Big Data generated by social media can contribute to each of these themes, but no single tool, technology, method, or algorithm is sufficient. Over the course of the workshop, a variety of methods and techniques were applied, individually or in concert with others, to build prototype capabilities. Such capabilities can help transform unwieldy quantities of information into manageable sources of information, insight, and opportunities for action. Multiple capabilities were integrated into a dashboard for ease of use and interaction with analytics. A selection of these capabilities will be described to illustrate how social media Big Data could be marshaled in support of public safety needs. First, background information on the social media data used in the workshop will be presented.

Twitter as a Social Media Source of Big Data

Twitter is a popular and widely used social media platform for microblogging, or broadcasting short messages. Twitter has hundreds of millions of users worldwide, and they broadcast over 500 million messages, known as tweets, per day. Tweets may include text, images, and links. A public tweet can be seen by anyone with Internet access, not just followers of the sender or people with Twitter accounts.
Twitter users have developed the convention of hashtags, a type of marker included in a tweet. Hashtags are words, phrases, or abbreviations that are preceded by the hash symbol “#,” such as #mallshooting. Users may choose to incorporate well-established hashtags in their tweets to provide a topical label, or they may spontaneously invent a new hashtag for a new event or idea. Use of a hashtag makes a tweet easily discoverable by anyone interested in that topic. Hashtags can be used to express emotion (#anguished) or evaluation (#ridiculous).
In addition, there are several distinct tweet-based social behaviors common in Twitter. “Retweeting” is directly quoting and rebroadcasting another user’s tweet, often an indication that the message was considered noteworthy or important enough to share. Other behaviors include mentioning another user in one’s tweet (that is, talking about that user) or directly addressing one’s tweet to another user, as if talking to that person, albeit in a public forum. Thus, people can carry on conversations in Twitter involving two to many dozens of individuals. Twitter provides additional affordances, such as the ability to follow other users or “favorite” specific tweets.
Tweets are complex objects. In addition to the message content of the tweet, each tweet has many pieces of associated metadata, such as the username of the sender, the date and time the tweet was sent, the geographic coordinates the tweet was sent from (if available), and much more. Most metadata are readily interpretable by automated systems, whereas tweet message content may require text processing methods for any automated interpretation of meaning.
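The split between machine-readable metadata and free-text content can be illustrated with a minimal sketch. It assumes a simplified tweet record whose field names (`user`, `created_at`, `coordinates`, `text`) are hypothetical and do not reproduce Twitter's actual schema, and the username is invented. The message text is the tweet quoted earlier in this chapter.

```python
import re

# Simple patterns for the conventions described above.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def describe_tweet(tweet):
    """Summarize a simplified tweet record (hypothetical field names)."""
    text = tweet["text"]
    return {
        "sender": tweet["user"],
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": MENTION.findall(text),
        # By convention, a retweet begins with "RT @user".
        "is_retweet": text.startswith("RT @"),
        # A tweet addressed to someone begins with "@user".
        "is_addressed": text.startswith("@"),
        "has_coordinates": tweet.get("coordinates") is not None,
    }


example = {
    "user": "example_user",  # invented for illustration
    "created_at": "2014-01-25T12:00:00",  # invented for illustration
    "coordinates": None,
    "text": (
        "Again. I cannot reiterate this enough. If you are a police "
        "department follow @HCPDNews to learn how to manage a crisis. "
        "#ColumbiaMall"
    ),
}

summary = describe_tweet(example)
```

Pattern matching of this kind recovers the structural conventions (hashtags, mentions, retweets) but says nothing about what the message means, which is why content analysis needs the text processing methods discussed later.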

Social Media Data Analyzed for the Workshop

Twitter was a clear choice for social media to analyze for the mall shooting. It was actively used by the local police and fire departments during the incident, as well as by those in the mall and surrounding areas. Twitter is widely used, and usage surges during crises and major events. Querying Twitter’s Search Application Program Interface (API) for public tweets originating within a 5-mile radius around the location of the shooting from January 25 to February 25 returned 3.7 million tweets from 24,000 unique users, amounting to 4 terabytes of data. During just the hour of the shooting, over 10,000 tweets were sent from this small area. Clearly this scale of data is beyond law enforcement capability to monitor without assistance from tools and technology. Because Twitter’s free API returns only a sample of the actual tweet activity, additional methods were employed to retrieve more tweets. Nonetheless, the data retrieved almost certainly are less than the actual amount. Over 300,000 of these tweets were tagged with geographic coordinates, allowing the location they were sent from within Howard County to be precisely pinpointed. Twitter data from this focused location-based search were complemented by data from the Twitter Decahose, a 10% feed of all tweets.
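The radius search described above was performed through Twitter's API. For tweets that already carry coordinates, a comparable local filter can be sketched with the haversine great-circle distance. The center point below is only an approximate location for Columbia, Maryland, chosen for illustration, and the tweet records use hypothetical field names.

```python
import math

EARTH_RADIUS_MILES = 3958.8

# Approximate, illustrative center point; not taken from the workshop data.
CENTER_APPROX = (39.2157, -76.8613)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def within_radius(tweets, center, radius_miles=5.0):
    """Keep tweets whose coordinates fall inside the search radius."""
    kept = []
    for tweet in tweets:
        coords = tweet.get("coordinates")
        if coords is None:
            continue  # no coordinates, so the tweet cannot be placed
        if haversine_miles(center[0], center[1], coords[0], coords[1]) <= radius_miles:
            kept.append(tweet)
    return kept


# Invented records: at the center, about 3.5 miles north,
# about 6.9 miles north, and one without coordinates.
sample = [
    {"id": 1, "coordinates": (39.2157, -76.8613)},
    {"id": 2, "coordinates": (39.2657, -76.8613)},
    {"id": 3, "coordinates": (39.3157, -76.8613)},
    {"id": 4, "coordinates": None},
]
nearby_ids = [t["id"] for t in within_radius(sample, CENTER_APPROX)]
```

A filter like this silently drops tweets without coordinates, which is one reason the location-enrichment step described later in the chapter matters.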
Like many social media platforms, Twitter allows users to share images, either by embedding the image in the tweet or by including a link to the image. From a sample of tweets that contained links to images, over 1000 images were retrieved. These images included photos taken by survivors sheltering in place within the mall. The images provide a basis for determining whether advanced methods could be applied to images themselves and not just to text. Automatic identification of photos containing objects such as firearms is one application.

Tools and Capabilities Prototyped during the Workshop

Social media matters for law enforcement because it enables instantaneous, unmediated connection and communication with the public, serves as a source of information and leads for situational awareness or investigations, and can contribute to measures of effectiveness and outcomes relating to public safety. A number of capabilities to support law enforcement use of social media Big Data were prototyped during the workshop. Highlights of this work will be described.

Word Cloud Visualization

Our public safety experts advised that it was essential to have an easy way to attain a big picture or summary view of what was happening in social media, tailored to the specific situation faced by law enforcement. For the workshop, a word cloud visualization capability was developed that summarized the content of tweets. Word clouds are a simple, appealing way to show the most frequent words in a body of text (Feinberg, 2009). More frequent words are drawn larger, and layout and color can be used to provide more visual interest. Rather than generate a single, static display, we created a word cloud visualization that was updated on the fly based on the set of tweets that met the user’s search query for the geographic region the user had zoomed in on. After applying standard natural language text processing techniques such as tokenization (rendering the content of the tweet into distinct words or symbols), stemming (reducing words to their more basic forms by removing suffixes, etc.), and stopword removal (eliminating common but uninformative words such as “and” and “the”), the resulting word cloud provided a simple snapshot of what people in that area were saying about a topic of interest. Because the most frequent words can be far more frequent than the next few words, it is often necessary to scale the sizes of words in the visualization: for example, by computing a weighted average of a word’s raw count and the logarithm of that count (see Figure 4.1 for an example).
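The processing steps and the size scaling can be sketched as follows. This is a minimal illustration, not the workshop implementation: the stopword list is abbreviated, the suffix stripper is a crude stand-in for a real stemmer, the sample tweets are invented, and the choice to normalize both terms before averaging (with weight `alpha`) is an assumption.

```python
import math
import re
from collections import Counter

# Abbreviated stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Split lowercased text into word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(tweets):
    """Count stemmed, non-stopword terms across a set of tweets."""
    counts = Counter()
    for text in tweets:
        for token in tokenize(text):
            if token not in STOPWORDS:
                counts[crude_stem(token)] += 1
    return counts


def font_sizes(counts, alpha=0.5, min_pt=10, max_pt=48):
    """Weighted average of normalized count and normalized log count."""
    top = max(counts.values())
    sizes = {}
    for word, n in counts.items():
        linear = n / top
        logged = math.log(1 + n) / math.log(1 + top)
        score = alpha * linear + (1 - alpha) * logged
        sizes[word] = round(min_pt + (max_pt - min_pt) * score)
    return sizes


# Invented sample tweets.
sample_tweets = [
    "Police at the mall",
    "mall is on lockdown",
    "Shooting at the mall",
]
counts = term_counts(sample_tweets)
sizes = font_sizes(counts)
```

Blending in the logarithm compresses the gap between a dominant term and the rest, so less frequent words stay legible instead of shrinking to the minimum size.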

Dynamic Classification of Tweet Content

Finding social media data about a topic of interest may seem as simple as typing a term into a search box, but experience shows that such an approach is riddled with false positives: hits that contain the term but are about something else. Given the scale of social media data, public safety officers could easily be swamped attempting to review search results full of irrelevant social media posts, and the output of analytics based on such inaccurate data would no longer be credible. For example, in a sample of English-language tweets from Howard County in the days preceding the mall shooting, tweets containing forms of the term “shoot” were more likely to be about other topics (basketball, photo shoots, drug use, etc.) than about actual shootings. Roughly three-quarters of these tweets were false positives.
Figure 4.1 Word cloud visualization of social media from an active shooter event.
To address this problem, we applied a machine learning technique to automatically classify tweets that were genuinely about a shooting. Using machine learning, a classifier can be automatically built for a category, given a set of labeled training examples (for example, “shooting” and “not shooting” tweets). Presented with a new unseen text, the classifier will predict to which category the text belongs (Sebastiani, 2002). We created classifiers to identify shooting-related and fire-related tweets. These classifiers used a support vector machine implemented through LIBSVM (Chang and Lin, 2011). Based on results from testing data, both classifiers were accurate. For the workshop, we performed dynamic classification on tweets returned by a search, to improve the relevance of results. Such an approach helps separate wheat from chaff from a user’s perspective and can improve the usefulness of any follow-on analytics or visualizations that use search results as their input, such as a word cloud. A classification approach can be particularly useful to support situational awareness, investigative, or alerting needs.
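The idea of learning a category from labeled examples can be illustrated with a small sketch. It does not use LIBSVM and is not the workshop's classifier: it is a linear support vector machine without a bias term, trained by plain subgradient descent on the hinge loss over bag-of-words features. The training tweets, stopword list, and hyperparameters are all invented for illustration.

```python
import re

# Abbreviated stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "at", "in", "near", "on", "the", "this", "to"}


def features(text):
    """Binary bag-of-words features: the set of non-stopword tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {tok for tok in tokens if tok not in STOPWORDS}


def train_linear_svm(examples, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on L2-regularized hinge loss (labels are +1/-1)."""
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = features(text)
            margin = label * sum(weights.get(f, 0.0) for f in feats)
            # Regularization step: shrink every weight slightly.
            for f in weights:
                weights[f] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin contribute.
            if margin < 1.0:
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + lr * label
    return weights


def score(weights, text):
    """Signed distance-like score; unseen words contribute nothing."""
    return sum(weights.get(f, 0.0) for f in features(text))


def classify(weights, text):
    return "shooting" if score(weights, text) > 0 else "not shooting"


# Invented training examples: +1 is about a shooting, -1 is not.
TRAIN = [
    ("shots fired at the mall police on scene", 1),
    ("shooting hoops at the gym tonight", -1),
    ("active shooter reported near the food court", 1),
    ("photo shoot downtown this weekend", -1),
    ("gunman shooting inside the mall", 1),
    ("great basketball game shooting threes", -1),
]

weights = train_linear_svm(TRAIN)
label_a = classify(weights, "police say shots fired")
label_b = classify(weights, "hoops at the gym tonight")
```

Note that the word “shooting” appears in both categories, so the classifier has to rely on the surrounding words, which is exactly what a plain keyword search cannot do. A production system would need far more training data and a held-out test set to estimate accuracy.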

Content-Based Image Retrieval

We have kneejerk tendency to want a guy in blue to put eyes on.

Trained officers with a camera in hand might be the ideal source for photographs of crisis or natural disaster events to aid in developing situational awareness and to support mobilization and deployment of appropriate resources. However, they cannot be everywhere they might be needed at a moment’s notice. Tapping into the social media image output of people who are in the vicinity of an event, whether they are eyewitnesses, bystanders, passers-by, or victims, multiplies the sensors available to public safety dramatically. The challenge lies in culling the relevant images. During the active shooter event, a tweet describing people “in the [store name] stockroom because the malls on lockdown” was posted accompanied by a photo. It is equally possible for a relevant image to be posted with ambiguous text (“oh my god”) or with no text at all. Because social media users publicly share millions of images and videos each day, automated approaches to handling these data are needed. Content-based image retrieval methods analyze the image itself, commonly identifying features such as colors, shapes, and edges. They may be used to detect the presence of objects within the image, such as vehicles or people, discriminate between photos of indoor and outdoor scenes, or perform similar tasks.
Our case example involved an active shooter. To test the viability of identifying relevant images in social media for this case, we trained a classifier to detect social media images containing firearms. A convolutional neural network pretrained on images from ImageNet, a large image corpus (Deng et al., 2009), was used to extract features from the social media images (Sermanet et al., 2013). Many of these social media images are lower-resolution or poorer-quality photos than those typically used in image classification tasks. GentleBoost (Friedman et al., 2000), a type of machine learning algorithm, was then applied to predict the probability that an image contained a firearm, given its features. Trained on images labeled as containing AK47s, the classifier successfully identified previously unseen social media images with firearms. After sorting 1000 images based on the classifier score, those containing firearms were far more likely to rank highly, whereas low-scoring images were extremely unlikely to contain firearms. Eighteen of the top 20 highest-scoring images included firearms, whereas none of the bottom-ranking 450 images contained a firearm. Included among the top 30 images was a photo taken by the shooter of himself with his weapon, shortly before he began his attack. Although the photo did not actually appear in social media until after the attack was over (the shooter had set a delayed publication time for the post), and thus in no way could have helped predict or prevent the attack, the potential for image classification techniques to help law enforcement seems clear. Similar to text classification, image classification can support situational awareness, investigative, or alerting needs when dealing with Big Data.
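The ranking result reported above (18 of the top 20 images containing firearms) is a precision-at-k measurement. A minimal sketch of that calculation follows. The scores and labels in the example are invented and are not the workshop data.

```python
def precision_at_k(scored_items, k):
    """Fraction of the k highest-scoring items that are truly relevant.

    scored_items is a list of (classifier_score, is_relevant) pairs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(scored_items, key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(1 for _, relevant in top if relevant) / len(top)


# Invented (score, contains_firearm) pairs for illustration.
scored_images = [
    (0.95, True),
    (0.10, False),
    (0.80, True),
    (0.60, False),
    (0.05, False),
]

p_at_2 = precision_at_k(scored_images, 2)
p_at_4 = precision_at_k(scored_images, 4)

# The figure reported in the text, 18 relevant images in the top 20:
reported = 18 / 20
```

For an analyst reviewing images by hand, high precision at small k matters most, because only the top of the ranked list is likely to be examined during an incident.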
This type of application differs considerably from the Next-Generation 911 system, which will modernize existing 911 capabilities to handle photos, videos, and other media types in addition to calls (Research and Innovative Technology Administration (RITA)—United States Department of Transportation, 2014). In the Next-Generation 911 context, images would be submitted to a public safety access point, much as phone calls are placed to 911 now.

Maximizing Geographic Information

Knowing where a social media post was sent from, and thus where the sender was located, can be critical for interpreting the relevance and utility of the information and sender for crisis response. Knowing which tweets were coming from inside the mall during the active shooter event had obvious value. Although tweets containing latitude and longitude information can easily be placed on a map, most tweets do not contain this information. Leveraging other information in the tweet, whether that information appears in tweet content or other metadata associated with the tweet, such as user location information, can provide a way to approximate location when coordinates are not explicitly stated. Translating a location description, such as a street address or place name, into a position on a map is known as geocoding. We enriched tweets that lacked latitude and longitude information with the results of a geocoding service and used this information to plot and visualize tweet density, finding hot spots of social media activity during and after the shooting. Each of the four thematic areas can benefit from geographic information, although it may be particularly valuable for situational awareness and investigative applications.
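The enrichment step described above can be sketched as follows. This is an illustration under stated assumptions: the small in-memory gazetteer stands in for a real geocoding service, and the field names (`coords`, `user_location`), the place name, and the coordinates are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]

# Hypothetical gazetteer standing in for a real geocoding service.
GAZETTEER: Dict[str, Coord] = {
    "example mall": (39.2150, -76.8610),
    "example town": (39.2673, -76.7983),
}


def geocode(description: Optional[str]) -> Optional[Coord]:
    """Translate a place description into coordinates, if it is known."""
    if not description:
        return None
    return GAZETTEER.get(description.strip().lower())


def enrich(tweet: dict) -> dict:
    """Return a copy of the tweet with coordinates filled in when possible."""
    out = dict(tweet)
    if out.get("coords") is not None:
        out["coords_source"] = "geotag"
    else:
        out["coords"] = geocode(out.get("user_location"))
        out["coords_source"] = "geocoded" if out["coords"] else None
    return out


def density(tweets: List[dict], cell: float = 0.01) -> Counter:
    """Count located tweets per grid cell (cell size in degrees)."""
    counts: Counter = Counter()
    for t in tweets:
        if t["coords"] is not None:
            lat, lon = t["coords"]
            counts[(math.floor(lat / cell), math.floor(lon / cell))] += 1
    return counts


tweets = [
    {"text": "on lockdown", "coords": (39.2151, -76.8612), "user_location": None},
    {"text": "hearing sirens", "coords": None, "user_location": "Example Mall"},
    {"text": "what is going on", "coords": None, "user_location": "somewhere"},
]
enriched = [enrich(t) for t in tweets]
print(density(enriched).most_common(1))
```

Keeping a `coords_source` field lets later analysis separate device-reported positions from approximate, geocoded ones, which matter differently when deciding who is actually at the scene.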

Detecting Anomalies

Anomalies are aberrations, exceptions, or unexpected surprises in data. Detecting anomalies translates to opportunities for action in a broad range of domains, from credit card fraud to cyber intrusions to malignant tumor diagnosis (Chandola et al., 2009). In law enforcement and public safety contexts, we examined two types of anomalies: anomalous changes in specific topics of known relevance, and generic, nonspecific changes in overall activity.
A number of established hashtags in the county are commonly used in public communications and public safety contexts. We created a visualization to summarize how many tweets contained relevant hashtags over time. In addition, we developed a capability to find contextual anomalies, large changes in frequency that are outside expected daily, weekly, monthly, or other patterns (see Figure 4.2). This method was also applied to the output of the “shooting” and “fire” text classifiers, in which it successfully detected actual shooting and fire events being discussed in social media. Applied to raw counts of tweets within the geographically bounded region, anomalous shifts in generic tweeting frequency can be detected. These could be indicators of events of an unspecified or unanticipated nature. In summary, basic monitoring and situational awareness can be enhanced with the potential to alert when anomalies are detected.
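A contextual anomaly of the kind described above can be illustrated by comparing each new count with the history for the same time slot. The sketch below is one simple approach (a per-slot z-score) and is not necessarily the method used in the prototype. The slot labels, the threshold, the `min_std` floor, and the toy counts are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Tuple

Observation = Tuple[Hashable, int]  # (time slot, tweet count)


def build_baseline(history: List[Observation]) -> Dict[Hashable, Tuple[float, float]]:
    """Mean and standard deviation of past counts for each time slot."""
    by_slot = defaultdict(list)
    for slot, count in history:
        by_slot[slot].append(count)
    return {slot: (mean(c), pstdev(c)) for slot, c in by_slot.items()}


def find_anomalies(baseline, observations, threshold=3.0, min_std=1.0):
    """Flag counts far above what is expected for their own time slot.

    min_std is an assumed floor that avoids dividing by a near-zero spread.
    """
    flagged = []
    for slot, count in observations:
        if slot not in baseline:
            continue  # no history for this context, so no judgment
        mu, sigma = baseline[slot]
        z = (count - mu) / max(sigma, min_std)
        if z > threshold:
            flagged.append((slot, count, round(z, 2)))
    return flagged


# Hypothetical history: busy Saturday late mornings, quiet Saturday nights.
history = [(("Sat", 11), c) for c in (40, 44, 36, 40)]
history += [(("Sat", 3), c) for c in (5, 7, 6, 6)]
baseline = build_baseline(history)

new_counts = [(("Sat", 11), 40), (("Sat", 3), 40), (("Sat", 3), 7)]
flags = find_anomalies(baseline, new_counts)
print(flags)
```

The same count of 40 tweets is unremarkable late on a Saturday morning but is flagged at 3 a.m., which is what distinguishes a contextual anomaly from a simple fixed threshold.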

Influence and Reach of Messaging

The public is bypassing the media, and talking to us directly.

In the hours after the shooting, individual tweets from @HCPDNews, the police department’s official Twitter account, were retweeted (shared or propagated by others) hundreds or even thousands of times. These tweets spawned additional responses, such as mentions. Each of the retweets or mentions can trigger a cascade of further social media activity. Accurately detecting and measuring the influence, spread, or “contagion” of information or users who are sources of information in social media is complex (Romero et al., 2011). For incident managers, it is essential to make sense of the flurry of activity surrounding their social media communications to the public, to determine whether their messaging is effective, and to shape future actions based on this knowledge. For the workshop, we explored two approaches to illuminate influence and spread of messaging. We used a dynamic graph visualization capability to show the network of activity that emerged in response to tweets from @HCPDNews. A heat map of these tweets was also plotted on Google Earth. This showed that the incident and @HCPDNews’ messaging about it were not of purely local interest but had spread outside the region, attracting national and global attention. An important consideration in this work is determining the criteria for inclusion. Retweets are clearly relevant and relatively easy to identify, whereas tweets that paraphrase the originals have murkier provenance. We also prototyped mechanisms for focused interactions with sets of social media users. One example is the ability to send a tweet proactively to Twitter users known to be inside the mall, providing them with clear information on the actions of the police, what to expect, and how to respond.
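To make the idea of a cascade concrete, the sketch below measures the size, audience, and depth of the activity that follows one original tweet. It is an illustration only: the record layout (`id`, `user`, `parent`) is a hypothetical simplification of social media metadata, the user names other than @HCPDNews are invented, and paraphrased tweets without an explicit link are simply not captured.

```python
from collections import defaultdict, deque
from typing import Dict, List


def cascade_stats(tweets: List[dict], root_id: int) -> Dict[str, int]:
    """Breadth-first walk over retweets and mentions linked to one tweet."""
    by_id = {t["id"]: t for t in tweets}
    children = defaultdict(list)
    for t in tweets:
        if t.get("parent") is not None:
            children[t["parent"]].append(t["id"])

    seen = {root_id}
    users = set()
    max_depth = 0
    queue = deque([(root_id, 0)])
    while queue:
        node, depth = queue.popleft()
        max_depth = max(max_depth, depth)
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                users.add(by_id[child]["user"])
                queue.append((child, depth + 1))

    return {"size": len(seen) - 1, "unique_users": len(users), "depth": max_depth}


# Hypothetical records; "parent" is the tweet being retweeted or mentioned.
tweets = [
    {"id": 1, "user": "@HCPDNews", "parent": None},
    {"id": 2, "user": "user_a", "parent": 1},
    {"id": 3, "user": "user_b", "parent": 1},
    {"id": 4, "user": "user_c", "parent": 2},
    {"id": 5, "user": "user_a", "parent": 4},
    {"id": 6, "user": "user_d", "parent": None},
]
print(cascade_stats(tweets, root_id=1))
```

Counting unique users separately from total responses helps distinguish a message that reached many people from one that was repeated many times by a few.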
Figure 4.2 Detection of anomalies in social media activity.

Technology Integration

No single technology, technique, or approach is enough to meet these varied needs. We developed numerous methods and used them synergistically. A suite of open source technologies was leveraged to create the social media crisis response dashboard (see Figure 4.3). Used together, they could help support public safety needs. For example, a graphical user interface used a set of REST services to access a Lucene-indexed SOLR database of tweet data that could be queried geographically (by zooming in on a map), temporally, and through text. These results could be displayed as pins on a map (given geographic coordinates), as a word cloud, or in a heat map. Dynamic classification tags were used to improve query results. Anomaly detection time series information could also be displayed. We also explored using cloud data stores for social media data management.
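A combined geographic, temporal, and text query of the kind described above might be expressed as a single request to a Solr select handler. The sketch below only builds the request URL. The host, core name, and field names (`text`, `location`, `created_at`) are assumptions about a schema that the chapter does not specify, and the coordinates and timestamps are placeholders.

```python
from urllib.parse import urlencode


def build_tweet_query(base_url: str, text: str, lat: float, lon: float,
                      radius_km: float, start: str, end: str,
                      rows: int = 100) -> str:
    """Build a Solr select URL filtering by text, distance, and time."""
    params = [
        # Free-text match against an assumed "text" field.
        ("q", f"text:({text})"),
        # Solr spatial filter: documents within radius_km of the point.
        ("fq", f"{{!geofilt sfield=location pt={lat},{lon} d={radius_km}}}"),
        # Date range filter on an assumed "created_at" field.
        ("fq", f"created_at:[{start} TO {end}]"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = build_tweet_query(
    "http://localhost:8983/solr/tweets",  # hypothetical core
    text="lockdown",
    lat=39.2, lon=-76.8, radius_km=2,
    start="2020-01-01T00:00:00Z", end="2020-01-02T00:00:00Z",
)
print(url)
```

Expressing the map extent and time window as filter queries keeps them separate from the text match, so a dashboard can change one dimension without rebuilding the others.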
Figure 4.3 Prototype dashboard for social media in crisis response.

Law Enforcement Feedback for the Sessions

Law enforcement and public safety experts who participated in the workshop were uniformly positive in their evaluation of the effort and its results, particularly given the short time frame (less than a week). They found the workshop worthwhile, well organized, clear in its goals, a good use of their time, and a valuable learning experience. A number of them commented on how effectively social media information could be refined and presented to support their work, and on the desirability of future collaborations to help bring such capabilities into practice.

Discussion

The potential for social media Big Data and affordances to reshape law enforcement and public safety, as it has been shaping business, politics, science, and basic human social interaction, has been explored in the context of crisis response to an active shooter event. We explored key issues and needs of law enforcement and public safety. Situational awareness, monitoring and interacting with the public, investigation and criminal intelligence, and alerting or predictive capabilities emerged as major themes.
The prototyped dashboard illustrated capabilities in all of these areas. Integrating open source technologies and libraries of algorithms, we parsed, enriched, classified, summarized, and visualized social media text and images. Through analytics tailored to law enforcement needs, we helped tame potential torrents of Big Data into focused, manageable, interpretable information to promote understanding and help guide action.
These efforts are best thought of as an initial foray into this space, not a turnkey solution. To meet public safety needs, Big Data must be tackled at many levels. Key concerns include:
• Access, storage, and management of large, heterogeneous datasets
• Development, use, and evaluation of analytics and metrics
• Exploration: the ability to query, sort, filter, select, drill down, and visualize social media information
• Linkage to action, including interaction with the public
We encountered a range of potential challenges. First is the challenge of geography. Knowing where someone was or where something happened can be essential in public safety; yet, most items in the social media Big Data source we used, Twitter, are not geotagged. Advances in methods are critical to associate or infer location information, potentially from mentions of landmarks or locations (geocoding) in the text (Fink et al., 2009), associated images, or past patterns of activity.
This ties into the challenge of relevance. To get the right information from Big Data, one must ensure the data not only come from the right location, but that they are about the right thing. Keyword-based methods must be augmented by more advanced language or image processing techniques to improve precision and recall, capturing more wheat while discarding the chaff. Both supervised and unsupervised machine learning methods can contribute to this challenge.
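Precision and recall can be made concrete with a small worked example. The sketch below scores a plain keyword filter against hand-labeled posts. The posts, labels, and keyword list are invented for illustration and are not drawn from the study data.

```python
from typing import Iterable, List, Tuple


def keyword_filter(text: str, keywords: Iterable[str]) -> bool:
    """Naive relevance test: does the post contain any keyword?"""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled posts: (text, is_relevant_to_the_incident).
posts = [
    ("Active shooter at the mall, stay away", True),
    ("we are in the stockroom because the mall is on lockdown", True),
    ("shooting hoops with my brother all afternoon", False),
    ("great sale at the food court today", False),
    ("police say the shooting scene is secure", True),
]
keywords = ("shooting", "shooter")

predicted = [keyword_filter(text, keywords) for text, _ in posts]
actual = [label for _, label in posts]
precision, recall = precision_recall(predicted, actual)
print(round(precision, 2), round(recall, 2))
```

In this toy set the keyword filter admits an unrelated post about basketball and misses a relevant post that never mentions shooting, which are the two failure modes that more advanced language processing aims to reduce.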
Law enforcement experts expressed the perspective that exhaustive data from their jurisdiction were more important than a larger dataset sampled over a broader area. This highlights another challenge: the challenge of completeness. Social media providers allow various degrees of access to the publicly shared information they host. They may limit the amount of data that can be accessed through their APIs or the types of queries that can be asked. It can be difficult to gauge how complete or representative a dataset is. For some types of law enforcement or public safety applications, however, the data do not need to be exhaustive to be informative. A sample may still successfully provide tips or leads or inform about trends.
Big Data derived from social media is leading to “a revolution in the measurement of collective human behavior” (Kleinberg, 2008), requiring advances in theory, computation, modeling, and analytics to cope. For law enforcement, this final challenge holds tremendous promise for improving our ability to serve and protect the populace. Further partnerships and collaborations among researchers, technologists, and public safety professionals will hold the key to meeting this challenge.

Acknowledgments

The author would like to thank the dedicated members of the Howard County Police Department, Division of Fire and Rescue Services, the Office of Emergency Management, and the Public Information Office, who generously contributed their knowledge, insight, and experiences. Without them, none of this work would have been possible. The author would also like to acknowledge the tremendous efforts and contributions of the other members of the APL team: C.M. Gifford, C.R. Fink, J.M. Contestabile, M.B. Gabriele, S.C. Carr, B.W. Chee, D. Cornish, C. Cuellar, Z.H. Koterba, J.J. Markowitz, C.K. Pikas, P.A. Rodriguez, and A.C. Schmidt.

References

Berk R.A, Bleich J. Statistical procedures for forecasting criminal behavior. Criminology & Public Policy. 2013;12(3):513–544.

Blair J.P, Martaindale M.H, Nichols T. Active shooter events from 2000 to 2012. FBI Law Enforcement Bulletin. January 7, 2014.

Boyd D, Crawford K. Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society. 2012;15(5):662–679. doi: 10.1080/1369118X.2012.678878.

Byrne J, Marx G. Technological innovations in crime prevention and policing. A review of the research on implementation and impact. Journal of Police Studies. 2011;20(3):17–40.

Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Computing Surveys. 2009;41(3):15:1–15:58. doi: 10.1145/1541880.1541882.

Chang C.-C, Lin C.-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2(3):27:1–27:27. doi: 10.1145/1961189.1961199.

Community Policing Consortium. Understanding Community Policing: A Framework for Action. Washington, DC: Bureau of Justice Assistance; 1994.

Davis III. E.F, Alves A.A, Sklansky D.A. Social Media and Police Leadership: Lessons from Boston. New Perspectives in Policing. March 2014 Available at: http://www.ncdsv.org/images/HKS_Social-media-and-police-leadership-lessons-learned-from-Boston_3–2014.pdf.

Deng J, Dong W, Socher R, Li L.-J, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009. 2009:248–255. doi: 10.1109/CVPR.2009.5206848.

Duggan M, Smith A. Social Media Update 2013. Pew Research Center’s Internet & American Life Project. 2013 Available at: http://www.pewinternet.org/2013/12/30/social-media-update-2013/.

Endsley M.R. Toward a theory of situation awareness in dynamic systems. Human Factors: The Journal of the Human Factors and Ergonomics Society. 1995;37(1):32–64. doi: 10.1518/001872095779049543.

Federal Bureau of Investigation. Crime in the United States 2012: Violent Crime. Uniform Crime Report Crime in the United States, 2012 (Online). Retrieved: July 16, 2014. 2012 Available at: http://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2012/crime-in-the-u.s.-2012/violent-crime/violent-crime.

Feinberg J. Wordle-Beautiful Word Clouds (Online). 2009 Available at: http://www.wordle.net.

FEMA. Fire/Emergency Medical Services Department Operational Considerations and Guide for Active Shooter and Mass Casualty Incidents (Online). 2013 Available at: http://www.urmc.rochester.edu/MediaLibraries/URMCMedia/flrtc/documents/active_shooter_guide.pdf.

Fink C, Piatko C, Mayfield J, Chou D, Finin T, Martineau J. The geolocation of web logs from textual clues. In: International Conference on Computational Science and Engineering, 2009. CSE ‘09. vol. 4. 2009:1088–1092. doi: 10.1109/CSE.2009.584.

Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics. 2000;28(2):337–407.

Garfinkel S.L. Digital forensics research: the next 10 years. Digital Investigation. 2010;7:S64–S73. doi: 10.1016/j.diin.2010.05.009.

Glasgow K, Fink C. Hashtag lifespan and social networks during the London riots. In: Social Computing, Behavioral-Cultural Modeling and Prediction. 2013:311–320 Springer, Berlin, Heidelberg.

Global Justice Information Sharing Initiative. Developing a Policy on the Use of Social Media in Intelligence and Investigative Activities: Guidance and Recommendations. Global Justice Information Sharing Initiative; 2013 Available at: https://www.iadlest.org/Portals/0/Files/Documents/DDACTS/Docs/DevelopSocMediaPolicy.pdf.

Heverin T, Zach L. Twitter for city police department information sharing. Proceedings of the American Society for Information Science and Technology. 2011;47(1):1–7. doi: 10.1002/meet.14504701277.

Heverin T, Zach L. Use of microblogging for collective sense‐making during violent crises: A study of three campus shootings. Journal of the American Society for Information Science and Technology. 2012;63(1):34–47.

Jacobs A. The pathologies of big data. Communications of the ACM. 2009;52(8):36–44. doi: 10.1145/1536616.1536632.

Kleinberg J. The convergence of social and technological networks. Communications of the ACM. 2008;51(11):66–72. doi: 10.1145/1400214.1400232.

LaValle S, Lesser E, Shockley R, Hopkins M.S, Kruschwitz N. Big Data, analytics and the path from insights to value. MIT Sloan Management Review. 2011;52(2):21–32.

La Vigne N.G.L, Lowry S.S, Markman J.A, Dwyer A.M. Evaluating the Use of Public Surveillance Cameras for Crime Control and Prevention Technical Report. Retrieved: July 16, 2014. 2011 Available at: http://www.urban.org/publications/412403.html.

Lazer D, Pentland A.(S.), Adamic L, Aral S, Barabasi A.L, Brewer D, Christakis N, Contractor N, Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M. Life in the network: the coming age of computational social science. Science (New York, N.Y.). 2009;323(5915):721–723. doi: 10.1126/science.1167742.

O’Hara C.E, O’Hara G.L. Fundamentals of Criminal Investigation. fifth ed. Springfield, IL: Charles C Thomas; 1988.

Police Executive Research Forum. Critical Incidents in Policing Series: The Police Response to Active Shooter Incidents. Washington, DC: Police Executive Research Forum; 2014 Available at: http://www.policeforum.org/assets/docs/Critical_Issues_Series/the%20police%20response%20to%20active%20shooter%20incidents%202014.pdf.

Perry W.L, McInnis B, Price C.C, Smith S.C, Hollywood J.S. Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations. Rand Corporation; 2013 Available at: https://www.ncjrs.gov/pdffiles1/nij/grants/243830.pdf.

Research and Innovative Technology Administration (RITA)—United States Department of Transportation. Next Generation 9-1-1 Research Overview (Online). Retrieved July 31, 2014. 2014 Available at: http://www.its.dot.gov/ng911/.

Romero D.M, Meeder B, Kleinberg J. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. In: Proceedings of the 20th International Conference on World Wide Web. 2011:695–704. doi: 10.1145/1963405.1963503 New York, United States.

Sebastiani F. Machine learning in automated text categorization. ACM Computing Surveys. 2002;34(1):1–47. doi: 10.1145/505282.505283.

Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat: Integrated Recognition, Localization and Detection Using Convolutional Networks. 2013 arXiv Preprint arXiv:1312.6229.

Starbird K, Palen L, Hughes A.L, Vieweg S. Chatter on the red: what hazards threat reveals about the social life of microblogged information. In: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work. 2010:241–250.
