Chapter 19. Information Control and Big Data Visualization

To some, the phrase Big Data results in a knowing nod. To others, a roll of the eyes. Regardless of your viewpoint or the hype, Big Data itself poses big problems, and big urgency, for businesses and organizations that need to harness complex datasets in a manner that brings about genuine, usable insights. In this chapter we explore several examples of how groups have leveraged the strengths of immersive virtual reality systems to gain increased understanding from large, often disparate pools of information.

What Is Big Data?

Our physical world contains an incomprehensible amount of digital information, and it is growing at an equally incomprehensible rate. The 2014 IDC/EMC Digital Universe study, which quantifies and forecasts the amount of data produced annually, indicates that by 2020 the digital universe will grow by a factor of 10, from 4.4 zettabytes to 44 zettabytes. This represents roughly a doubling of the amount every two years (Turner et al., 2014).
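The two figures in this forecast can be checked against each other with a little arithmetic. The following is a minimal sketch that assumes the forecast window spans seven years, from 2013 to 2020 (the starting year is an assumption here, since the passage above gives only the end year), and computes the doubling time implied by tenfold growth.

```python
import math

# Figures quoted in the text, in zettabytes.
start_zb = 4.4
end_zb = 44.0

# Assumption: the forecast window runs from 2013 to 2020 (seven years).
span_years = 7

growth_factor = end_zb / start_zb        # tenfold growth
doublings = math.log2(growth_factor)     # number of doublings in the window
doubling_time = span_years / doublings   # years per doubling

print(f"growth factor: {growth_factor:.1f}")
print(f"doublings:     {doublings:.2f}")
print(f"doubling time: {doubling_time:.2f} years")
```

Under that assumption the implied doubling time comes out slightly above two years, which is consistent with the study's rounded statement.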

Equally impressive is the growing number of sources of this data. These include massive retail databases; machine log data; sensors in our environment (the Internet of Things); web traffic; health data; social media sites such as Facebook, Twitter, and Instagram; mobile devices; traditional media; business applications; scientific research projects; utilities and smart grids; and many more.

Big data is popularly characterized using three defining properties or dimensions known as the 3Vs: volume, velocity, and variety. Volume obviously refers to the amount of data being generated within an enterprise, organization, or research project. Velocity refers to the speed at which the data is being generated. Variety refers to the number of different types of data. According to the 3V characterization model, the great challenges of big data management and analysis stem from the expansion of all three properties, as opposed to just volume alone (Laney, 2001).

With all of this information being generated and stored, the great challenge now is how to carefully exploit the data in a manner enabling the organization or researchers to extract value to increase efficiency, grow profits, or otherwise uncover hidden trends, important features, or interrelationships. The result can aid corporate decision-making (assuming a company understands how to use the resulting data), help defense and intelligence organizations identify emerging threats, determine which ads you are most likely to click on, and provide insights into various science and engineering problems (Cukier, 2010).

This challenge of exploiting big data can be broken down into several areas, including curation (organization and integration of data from multiple sources, annotation, presentation, and preservation), storage, querying, and sharing, all of which directly impact efficient analysis.

Big Data Analytics and Human Vision

Big data analytics is the process of collecting, organizing, and analyzing large sets of data to better understand the information contained therein. Typically this is carried out using specialized software tools and applications for predictive analytics, data mining, text mining, forecasting, and data optimization (Beal, 2014), the totality of which represents the computational side of the process. But there are aspects to data analytics where humans greatly excel beyond the capabilities of computers, most notably in perceiving and interpreting patterns across multiple variables and groups, identifying anomalies, and interpreting the content of images. This means visualization.

Currently it is standard practice in big data analysis to present results in the form of colorful graphs, charts, plots, and pseudo-volumetric representations. These draw on fundamental cognitive psychology principles, such as using color and size to denote differences and importance, and using connections to identify patterns and similarities. Such output can be highly useful, but what it can effectively communicate is still limited by the preset visualization types.

As has been shown throughout this book, the human brain is capable of extraordinary information processing tasks as long as the data is in a form that leverages the strengths of the human perceptual mechanisms. Of these mechanisms, vision is our dominant sense: roughly a quarter of our brain is devoted to processing visual stimuli, and vision provides the highest-bandwidth perceptual channel into our cognitive systems (Reda et al., 2013). Developing the tools and methods to harness this pathway more effectively is therefore a major goal.

In the remainder of this chapter we will look at several impressive examples of applying immersive virtual reality to the analysis of big data problems. It is important to point out that these examples differ in a number of ways from general scientific visualization applications for immersive displays (which are widespread).

Foremost, scientific visualization generally refers to analysis of large amounts of data produced by numerical simulation of physical processes. These examples deal with real or measured data, as well as data from multiple sources.

Visualization of Longitudinal Study Data

In 2015, Epic Games (makers of Unreal Engine 4) and the Wellcome Trust biomedical research charity launched a contest known as the Big Data VR Challenge. The goal of the challenge was to develop methods of using immersive virtual reality systems to unlock new ways to manipulate and interrogate the huge datasets that are now generated by many scientific studies and to facilitate greater understanding of that information (Cowley, 2015).

Competing teams from around the world were paired with live scientific research projects for a four-month period. In the winning effort, two London-based companies, Masters of Pie and Lumacode, Ltd (together known as team LumaPie), were partnered with the University of Bristol’s Avon Longitudinal Study of Parents and Children (ALSPAC), also known as “Children of the 90s.” Within that study, which has lasted more than 20 years, researchers have intensively tracked a broad selection of variables for more than 14,000 pregnant women, their spouses, and offspring. The variables include diet, lifestyle, socioeconomic status, parent-child contact, BMI, pulse, waist size, weight, and more. Tens of thousands of biological samples were also collected from the participants over the years, including urine, blood, hair, toenails, milk teeth, and DNA. The totality of this enormous scientific dataset represents the most detailed study of its kind in the world investigating the environmental and genetic factors that affect a person’s health and development (ALSPAC, 2015).

In developing the visualization layout, the application designers leveraged the inherent ability to spread the data out around the user, as depicted in Figure 19.1. Data elements were represented by 3D primitives, such as elongated pyramids or spheres textured with heat maps, whose size, orientation, color, and position could convey field values at a glance. Those data elements were used to populate circular and arc-shaped DNA-like coils, allowing more data to be represented in a given area than would have been possible on simple straight lines. The coils themselves could be set in motion to rotate around the user, allowing researchers to scan the data far more efficiently just by moving their heads rather than scrutinizing spreadsheets of alphanumeric data (Masters of Pie, 2015).
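The general idea of mapping record fields onto the attributes of 3D primitives arranged along a coil can be sketched in a few lines. This is an illustrative sketch only, not the LumaPie implementation: the Glyph class, the layout_on_coil function, the "bmi" and "pulse" field names, the coil dimensions, and the sample values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    """One 3D primitive standing in for a single data record."""
    position: tuple  # (x, y, z) in meters, with the user at the origin
    size: float      # scale of the primitive
    color: tuple     # (r, g, b), each component in 0..1


def normalize(value, lo, hi):
    """Map value from [lo, hi] to [0, 1], clamping out-of-range input."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def heat_color(t):
    """Simple blue-to-red ramp for t in [0, 1]."""
    return (t, 0.0, 1.0 - t)


def layout_on_coil(records, radius=2.0, rise_per_turn=0.5, per_turn=24):
    """Place records along a helix that wraps around the user.

    Hypothetical mapping: "bmi" drives glyph size, "pulse" drives color.
    """
    if not records:
        return []
    bmi_lo = min(r["bmi"] for r in records)
    bmi_hi = max(r["bmi"] for r in records)
    pulse_lo = min(r["pulse"] for r in records)
    pulse_hi = max(r["pulse"] for r in records)

    glyphs = []
    for i, rec in enumerate(records):
        angle = 2.0 * math.pi * i / per_turn   # position around the user
        x = radius * math.cos(angle)
        z = radius * math.sin(angle)
        y = rise_per_turn * i / per_turn       # the coil climbs as it turns
        size = 0.05 + 0.10 * normalize(rec["bmi"], bmi_lo, bmi_hi)
        color = heat_color(normalize(rec["pulse"], pulse_lo, pulse_hi))
        glyphs.append(Glyph((x, y, z), size, color))
    return glyphs


# Made-up sample values for illustration; not ALSPAC data.
records = [
    {"bmi": 18.5, "pulse": 60},
    {"bmi": 24.0, "pulse": 72},
    {"bmi": 31.0, "pulse": 95},
]
glyphs = layout_on_coil(records)
for g in glyphs:
    print(g)
```

Because every glyph sits at the same distance from the vertical axis through the user, rotating the coil only changes which records are in front of the viewer, not how far away they appear.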

Image

Credit: Image courtesy of Masters of Pie, Ltd

Figure 19.1 This image depicts the information layout of the LumaPie Avon Longitudinal Study of Parents and Children (ALSPAC) data visualization application. Note how the space was effectively utilized by laying out data in a manner that completely surrounds the user.

Filters and modifiers were developed to sort data and facilitate pattern and trend recognition (see Figure 19.2), as well as to easily export the results of a particular study back into raw alphanumeric form for sharing with other researchers. During investigations, the user employs a virtual laser (driven via commercial handheld controllers) to point to specific data elements. When the laser hits one of the 3D primitives, a label appears displaying the alphanumeric values of that particular piece of data (Masters of Pie, 2015).
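Pointing a virtual laser at a data element is, in general terms, a ray-picking problem. The sketch below shows one common way to solve it, under the assumption that each primitive is approximated by a bounding sphere; the source does not describe how the application actually performs hit testing, and the function names and labels are hypothetical.

```python
import math


def ray_sphere_distance(origin, direction, center, radius):
    """Return the distance along the ray to the first hit, or None on a miss.

    direction must be a unit-length vector.
    """
    # Vector from the sphere center to the ray origin.
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None                 # the ray's line never touches the sphere
    root = math.sqrt(disc)
    t = -b - root                   # nearer intersection
    if t < 0.0:
        t = -b + root               # ray starts inside the sphere
    return t if t >= 0.0 else None  # a negative value means it is behind us


def pick(origin, direction, elements):
    """Return the label of the nearest element hit by the ray, or None."""
    best_label, best_t = None, math.inf
    for el in elements:
        t = ray_sphere_distance(origin, direction, el["center"], el["radius"])
        if t is not None and t < best_t:
            best_label, best_t = el["label"], t
    return best_label


# Hypothetical scene: three data elements with made-up labels.
elements = [
    {"center": (0.0, 0.0, -2.0), "radius": 0.2, "label": "record A"},
    {"center": (0.0, 0.0, -5.0), "radius": 0.2, "label": "record B"},
    {"center": (3.0, 0.0, -2.0), "radius": 0.2, "label": "record C"},
]

controller_position = (0.0, 0.0, 0.0)
print(pick(controller_position, (0.0, 0.0, -1.0), elements))  # pointing forward
print(pick(controller_position, (0.0, 1.0, 0.0), elements))   # pointing up
```

Choosing the nearest hit matters when elements overlap along the line of sight: the label shown should belong to the primitive the user actually sees under the laser.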

Image

Credit: Image courtesy of Masters of Pie, Ltd

Figure 19.2 This image is an end-to-end data flow diagram detailing filter and modifier functions for the ALSPAC visualization and analytical application.

An interesting aspect of the overall solution was the multiuser system design. While the main user drives a particular investigation and has complete access to all the raw and highly confidential data, a second user, either local or remote, can view a filtered version of the visualization underway that shows only the key data relevant to that user's query, protecting ALSPAC participant privacy. Direct interaction between the two users is facilitated using a live-chat capability (Masters of Pie, 2015).
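One simple way to think about the second user's restricted view is as a field-level allowlist applied before any data leaves the main user's session. The sketch below illustrates that idea only; the source does not describe the actual filtering mechanism, and the function name, field names, and records are hypothetical.

```python
def filtered_view(records, allowed_fields):
    """Return copies of the records containing only the allowed fields."""
    return [
        {key: rec[key] for key in allowed_fields if key in rec}
        for rec in records
    ]


# Made-up records; the field names are hypothetical.
full_records = [
    {"participant_id": "P001", "postcode": "XX1", "bmi": 22.4, "pulse": 68},
    {"participant_id": "P002", "postcode": "XX2", "bmi": 27.9, "pulse": 81},
]

# Only the fields the second user's query needs; identifying fields are left out.
query_fields = ["bmi", "pulse"]

shared = filtered_view(full_records, query_fields)
print(shared)
```

An allowlist is used here rather than a blocklist so that any field added to the records later stays hidden from the second user unless it is explicitly shared.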

Results

LumaPie’s solution has received rave reviews for its ability to rapidly facilitate discovery of trends and patterns in the data even by untrained analysts. The system is a fully functional immersive virtual environment built from the data itself. Users have full control within the VR space, enabling intuitive interaction and manipulation of the data (Cowley, 2015). The application leverages the unique human ability to quickly recognize patterns in color, size, movement, and 3D spatial position to produce tangible outputs (ALSPAC, 2015).

Visualization of Multidisciplinary Mining Data

The establishment of an underground mining operation is a massive, complicated, and expensive undertaking that generates tremendous amounts of multidisciplinary information. Consider what is involved in just the exploration stage. Everything starts with detailed studies and surface reconnaissance, including aerial photos, aeromagnetic and gravimetric surveys, geological surface mapping, sampling, geochemical studies, and more. Once a deposit of interest is located (referred to as discovery), the next stage is the mapping of underground structures and dimensions of the deposits, as well as the content and distribution of the ore. This is then followed by drilling to further investigate and sample the mineralization in depth. It is only after this extensive analysis that a project then moves into the development and production phases, which themselves produce even greater quantities of data.

Throughout each phase of a mining project, from initial geoscience and exploration to development and production, significant time and expense are invested in attempting to understand exactly what is below the surface, as well as how to safely and cost-effectively recover the desired material. Unfortunately, despite many advances in the mining industry, so-called “Big Data” in this field is very different from that of other industries. Project data is often deposited in both hard and soft copy formats spread across multiple locations without structure or standards. Additionally, although much of this information describes complex 3D structures and phenomena, it is often represented in two dimensions. This creates significant challenges in performing detailed, integrated evaluations, and it impedes discovery and establishment of relationships between different data types (Suorineni, 2015).

In plain English, the result is highly inefficient data interpretation and less than optimal mining operations.

To facilitate increased understanding of exactly what this varied data represents below the surface, as well as to more accurately plan and execute recovery operations, researchers with the University of New South Wales (UNSW) in Sydney, Australia, are applying immersive visualization technologies as a means of supplementing traditional mining data interpretation, to facilitate mine planning and infrastructure layout.

In one simple example, Figure 19.3 shows the integration of mapping, geophysical (surface and inversion models), drilling, and resource models to help engineers visualize the full potential of deposits within a prospective mining site (Vasak and Suorineni, 2010). As can be inferred from these model images, significantly greater understanding of the overall geometry of the complex underground structures, as well as the location of existing sample drillholes (blue lines on the left), can be acquired by combining multiple datasets and viewing within an immersive, interactive 3D setting. Further, the planning of new drillholes (red lines on the right) to explore the extent of the formation of recoverable materials (blue areas on right) becomes a far more intuitive 3D positioning task compared to making such determinations based on 2D representations.

Image

Credit: Image courtesy of Fidelis T Suorineni - UNSW Australia

Figure 19.3 This image shows the integration of mapping, geophysical, drilling, and resource models to help engineers more effectively visualize the full potential of ore deposits within a prospective mining site.

In a more complex example, Figure 19.4 shows a detailed snapshot from UNSW’s Block Cave Mining Visualizer, an interactive 3D application designed to allow users to visualize multiple combined datasets associated with the “block cave” underground mining process.

Image

Credit: Image courtesy of Fidelis T Suorineni – UNSW Australia

Figure 19.4 This image shows a detailed, multi-dataset model within UNSW’s Block Cave Mining Visualizer.

In block cave mining, once a recoverable ore body is identified, a large network of tunnels is dug beneath the ore formation. Large upward-facing funnels are blasted into the rock above these tunnels. Ultimately, the ore formation is then undercut, creating an artificial cavern that fills with rubble as it collapses under its own weight. This broken ore falls into the large rock funnels and then into the tunnels, where it is recovered and transported to the surface.

Of particular note with this visualization application is the fact that, in addition to simultaneously displaying multiple tightly correlated datasets, including those representing geologic formations, numerical stress estimation results, seismic sensor positions, and actual mine development geometry (tunnels, undercuts, and so on), the application allows the playback of actual seismic events and other time-sequenced data (Vantage, 2015).
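Playing back time-sequenced events alongside static geometry generally comes down to selecting, for each displayed frame, the events that fall inside a moving time window. The following is a minimal sketch of that selection step; it is not the Block Cave Mining Visualizer's implementation, and the event format, function names, and sample values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def events_in_window(events, t_start, t_end):
    """Return the events with t_start <= time <= t_end.

    events must already be sorted by their "time" value (seconds).
    """
    times = [e["time"] for e in events]
    lo = bisect_left(times, t_start)
    hi = bisect_right(times, t_end)
    return events[lo:hi]


def playback_frames(events, frame_seconds, trail_seconds):
    """Yield (clock, visible_events) for each playback frame.

    trail_seconds controls how long an event stays visible after it occurs.
    """
    if not events:
        return
    clock = events[0]["time"]
    end = events[-1]["time"]
    while clock <= end:
        yield clock, events_in_window(events, clock - trail_seconds, clock)
        clock += frame_seconds


# Made-up seismic events: time in seconds, a magnitude, and an (x, y, z) position.
events = [
    {"time": 0, "magnitude": 0.8, "position": (10.0, -350.0, 4.0)},
    {"time": 5, "magnitude": 1.1, "position": (12.0, -352.0, 6.0)},
    {"time": 12, "magnitude": 0.5, "position": (30.0, -340.0, 1.0)},
    {"time": 30, "magnitude": 1.6, "position": (11.0, -355.0, 5.0)},
]

for clock, visible in playback_frames(events, frame_seconds=10, trail_seconds=10):
    print(clock, [e["magnitude"] for e in visible])
```

Keeping a short trail of recent events on screen, rather than showing only the instant under the playback clock, makes spatial clusters easier to spot as the sequence advances.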

This application was developed for use with UNSW’s innovative AVIE (Advanced Visualization and Interactive Environment) display shown in Figure 19.5, a custom-designed, standalone, cylindrical silvered screen measuring 10 meters in diameter by 4 meters in height. Active stereo imagery is supplied by six overhead projectors, the display fields of which are seamlessly blended to create a continuous 360° immersive viewing area. Users of the system are given active stereo shutter glasses.

Image

Credit: Image courtesy of iCinema Centre for Interactive Cinema Research, UNSW Australia

Figure 19.5 This illustration shows the design of the 10-meter diameter 360° AVIE (Advanced Visualization and Interactive Environment) display located at the University of New South Wales (UNSW) in Sydney, Australia.

In standard operating mode, real-time markerless motion tracking and gesture recognition allow the operator to control movement through the models. The large size of the display, together with the application, provides a high degree of spatial and temporal context, a compelling sense of immersion, and ample space for multidisciplinary collaboration.

Conclusion

Immersive virtual reality systems appear to hold significant potential as another means through which to conduct Big Data analytics, although this application area is extremely young and filled with questions. Just as there are ongoing struggles encountered in attempting to figure out effective ways to produce useful analytical products based on 2D graphics, charts, and other representations, similar challenges exist with this new medium, albeit with the benefit of an additional dimension and interactivity with which to work. In one sense, this could be considered an expansion of an artist’s palette. Specifically, what is the best way to represent data (often abstract, or of high dimensionality, or of great variety and quantity) in a manner that facilitates greater insight and understanding? This is made even more challenging by the fact that in most instances you do not know what you are looking for; otherwise, you would likely be able to automate the search. Could the visual representation be expanded to include an audio or tactile component? The technologies are certainly there. Now all that remains is the creative application.
