C H A P T E R  10

Where Do We Go From Here?

Working on this book has meant a lot to each of us. In the Kinect, we see a wonderful device that can be used for creative expression. We also see a device and technology capable of changing the way we put technology to work in our daily lives. At first, the Kinect and its skeleton-tracking technology are all about games and cool art projects. But it doesn't end there. The underlying technology has profound implications for how people and technology interact. In this Afterword, we each offer a few final thoughts on the impact of the technology and where it might take us in the future.

Sean Kean

I hope this introduction to development with the Microsoft Kinect has provided you with a solid foundation from which to execute new ideas that redefine our relationship with technology. You now have the building blocks for creating experiences that can help us move past the limited means of interacting with machines from the past and pave the way to a more humane relationship between people and devices in the future. As someone who initially became interested in technology for artistic and social expression, I've always felt the mouse and keyboard were a legacy of office environments that fell short of capturing the ways I wanted to play with machines. Innovations such as the Kinect, as well as the software that you will now go forth and develop, will write a new chapter of how society and technology evolve with one another.

Roughly one year after the Kinect's debut in November 2010, we've seen this device put to use in breathtaking ways; it has been hard to keep track of, let alone classify, all the different uses. Seeing the public's imagination captured by the "Kinect hacks" that have flooded the web, it's clear that body-gesture-based control of software is something people are eager to have integrated into their lifestyle once they've witnessed it. However, one application of this technology hasn't received quite as much attention as the others, and it's the one I've been most excited about since I first saw Oliver Kreylos demonstrate it in a post to YouTube last year.

In a video entitled "3D Video Capture with Kinect" (http://youtu.be/7QrnwoO1-8A), posted just ten days after the device hit stores, Oliver was the first to show volumetric 3D video that lets a viewer move a virtual camera 360 degrees around a live scene as it is being recorded. This still blows my mind, and I think it's the sleeping giant of the Kinect, one that will mark a fundamental shift in the way motion pictures, photography, and live video are experienced in the very near future. Once the tools for creating, sharing, and viewing volumetric 3D video mature enough for the general population to consider, I'm confident we will see widespread adoption of it for everything from video conferencing to sports, feature films, and truly 3D game systems. This is the dawning of the volumetric age.

I have feared for some time that, with so much exposure to flat-screen media on televisions, computers, and phones, we have eroded some of our innate ability to decode the physical 3D world around us. I believe this shift from 2D to volumetric 3D experiences has the potential to reignite a spatial awareness that has lain dormant since screen media became ubiquitous. Things will truly get interesting once digital media can more closely match the depth perception we were born with but have had no way to reconcile with 2D media and traditional software interfaces.

Let's take a look at what's involved in bringing about the volumetric 3D video revolution. By examining how 2D video is created and experienced by consumers today, we can see what needs to be put in place to do the same for volumetric 3D. Along the way, I hope you see a number of exciting opportunities to develop technology that meets the needs that will arise as creators move from shooting 2D motion pictures to pioneering the voxies: video that can be viewed from any angle, built on the voxel point cloud imagery generated by devices such as the Kinect.
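To make that point cloud imagery concrete, here is a minimal sketch of the back-projection that turns a Kinect-style depth map into 3D points. The focal length and principal point below are assumed, round-number values; a real pipeline would use calibrated intrinsics for the specific device:

import numpy as np

# Nominal Kinect depth-camera intrinsics (assumed values; calibrate your own device).
FX, FY = 580.0, 580.0      # focal lengths in pixels
CX, CY = 320.0, 240.0      # principal point of a 640x480 depth map

def depth_to_point_cloud(depth_m):
    """Back-project a 640x480 depth map (in metres) into an Nx3 point cloud.

    Each pixel (u, v) with depth z becomes the 3D point
        x = (u - CX) * z / FX,  y = (v - CY) * z / FY,  z = z
    which is the standard pinhole back-projection behind voxel/point-cloud video.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    points = np.dstack((x, y, depth_m)).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth reading

# Example: a synthetic depth map with everything two metres from the camera.
cloud = depth_to_point_cloud(np.full((480, 640), 2.0))
print(cloud.shape)   # (307200, 3)

Every frame of a voxie is essentially one of these clouds plus its color samples, which hints at why the data volumes quickly dwarf ordinary 2D video.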

For a consumer, recording HD video with an iPhone, trimming it on the device, and uploading it to YouTube or Facebook to share with the world is remarkably effortless. A professional may choose a more elaborate SLR camera, which requires the additional steps of connecting it to a computer and editing the footage in a program such as Final Cut or iMovie before uploading it to the web, perhaps to an alternative video-sharing site such as Vimeo.com. Refined over many years, the tools for capturing, editing, and viewing shared video are simple, affordable, and accessible. We will need comparable devices, services, and software to bring volumetric 3D to the mainstream market. Luckily, millions of people now have a device that can capture basic volumetric 3D video: the Kinect.

If you followed through to the end of Chapter 1, you've already seen yourself captured in primitive volumetric 3D and spun the view around with a synthetic camera. The latest versions of Microsoft's SDK, as well as OpenNI, let developers make use of multiple Kinects, which could be arranged to fill in the empty shadows left by a single camera. With the KinectFusion project, Microsoft Research shows us that there is a bright future ahead for reconstructing full 3D models of scenes in realtime (see Figure 1-25) using just one Kinect and software running on standard graphics hardware. The only problem with the Kinect as a video recording device is that the user is tethered to a computer and a wall outlet. As a result, Kinect videos tend to contain roughly the same subject matter: people sitting at their computers.
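Filling in those shadows comes down to expressing each camera's point cloud in a common coordinate frame. The sketch below is not the Microsoft SDK or OpenNI API; it simply assumes you already have two point clouds and a 4x4 extrinsic transform produced by a calibration step:

import numpy as np

def merge_clouds(cloud_a, cloud_b, b_to_a):
    """Merge point clouds from two Kinects into camera A's coordinate frame.

    cloud_a, cloud_b : (N, 3) arrays, each in its own camera's frame.
    b_to_a           : 4x4 rigid transform mapping camera B's frame into camera A's,
                       produced by an extrinsic calibration step (assumed to exist).
    """
    ones = np.ones((cloud_b.shape[0], 1))
    cloud_b_in_a = (b_to_a @ np.hstack([cloud_b, ones]).T).T[:, :3]
    return np.vstack([cloud_a, cloud_b_in_a])

# Example: pretend camera B sits one metre to the right of camera A.
b_to_a = np.eye(4)
b_to_a[0, 3] = 1.0   # translation only; a real calibration would include rotation
merged = merge_clouds(np.random.rand(100, 3), np.random.rand(100, 3), b_to_a)
print(merged.shape)  # (200, 3)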

I'd much rather shoot active video running around outdoors, adventuring in remote locations, and everything else we've come to expect to be possible from portable electronics today. Prior to the introduction of Sony's Portapak in 1967, video was pretty much immobile, just as we are today with the Kinect. TV studio equipment was so large and power hungry that it had to stay in the studio. After the Portapak's introduction, video art flourished with artists such as Nam June Paik and Bill Viola, who strapped on battery-powered equipment and used the medium to explore visual expression in ways that were previously only possible with film. Today, every smartphone is far more capable than a Portapak, yet we will likely return to specialty hardware devices to take advantage of volumetric 3D's promise. This will create exciting opportunities for those who wish to design and manufacture novel cinematic tools.

For higher production quality capture, there is an exciting array of possibilities that go beyond the Kinect. The structured-light approach used by the PrimeSense solution cannot work outdoors, where bright light interferes with the infrared laser pattern. Time-of-flight sensors offer one alternative that can go where the Kinect cannot; however, their depthmap resolution is currently much lower than PrimeSense's, and they still rely on emitted light with a limited sensing range. A remarkable new imaging technology, the light field (or plenoptic) camera, debuted this year from Lytro (www.lytro.com) and may eventually be embedded into a tool for the volumetric cinematographer.

This unique imager uses a microlens array to capture the light rays entering the camera from many angles, and from them it can compute a depthmap similar to that of 3D sensors such as the Kinect. While not yet a realtime video solution, keep your eye on how this technology develops. An array of Lytro cameras surrounding a scene from different angles would not only gather multiple depthmaps without interfering with one another, but would also allow refocusing on any point in the scene at viewing time. The result would be the kind of depth of field we've come to expect from high-quality SLR cameras, but conceivably in realtime, driven by the viewer's perspective into the image. Lytro's consumer-focused product may turn out to be just as disruptive, and as accessible to hack, as the Kinect. Its breakthrough price point of US$399 is astonishing when you consider that cameras from Raytrix (http://raytrix.de), its only competitor, start at around US$20,000.

Before the Kinect, there was a great deal of research into computing 3D scene information from stereo cameras and multi-camera arrays, a field known as photogrammetry. With the advent of more mature cloud computing environments, a number of solutions are cropping up that handle this type of image processing on remote servers, reducing the requirements on users' machines. Still imagery from an array of cameras at different perspectives can now be processed with tools such as Autodesk's 123D Catch (http://123dapp.com/catch) and Hypr3D (www.hypr3d.com), which return photomapped 3D models of a scene after you upload a series of images. For use on your local machine, AgiSoft's PhotoScan (http://www.agisoft.ru/) is a desktop photogrammetry solution available for Windows and Mac OS X. Using a number of inexpensive HD cameras, such as those available from GoPro (www.gopro.com), it's conceivable to assemble a large rig with dozens of units that captures video from an assortment of angles and then breaks each frame from each camera into a series of photogrammetry batch-processing jobs. Combined with 3D sensors such as the Kinect to aid in depth mapping, we are bound to see some very interesting solutions for generating high-quality volumetric 3D imagery from a blend of these techniques.
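Preparing those batch jobs is mostly plumbing. A minimal sketch, assuming the camera-array videos are already synchronized; the folder layout, sampling rate, and use of OpenCV here are illustrative, not a requirement of any particular photogrammetry tool or service:

import os
import cv2   # OpenCV

def frames_to_photogrammetry_jobs(video_paths, out_dir, every_nth=30):
    """Split synchronized camera-array videos into per-frame photogrammetry jobs.

    Each output folder frame_<n>/ holds one still from every camera, ready to be
    handed to a photogrammetry tool or uploaded to a service as a single batch.
    """
    captures = [cv2.VideoCapture(path) for path in video_paths]
    frame_index = 0
    while True:
        frames = [capture.read() for capture in captures]
        if not all(ok for ok, _ in frames):
            break   # stop as soon as any camera runs out of frames
        if frame_index % every_nth == 0:
            job_dir = os.path.join(out_dir, "frame_%06d" % frame_index)
            os.makedirs(job_dir, exist_ok=True)
            for cam_id, (_, image) in enumerate(frames):
                cv2.imwrite(os.path.join(job_dir, "cam_%02d.png" % cam_id), image)
        frame_index += 1
    for capture in captures:
        capture.release()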

What's the difference between producing a movie in 2D and a voxie in volumetric 3D? To start, movie directors are accustomed to having absolute control over the viewer's perspective on a story through a single camera view. In the world of the voxie, the budding volumetric cinematographer must choreograph performers, lighting, and camera rigs during production in a way that accounts for an audience that may gaze into the scene from any angle, whether by moving their heads, using a controller, or simply walking around a volumetric display. But that's only the start. We'll need entirely new software to handle post-production editing, transmission, storage, and display of this truly new medium.

The good news is that this software is being actively developed right now. The first live Internet stream of volumetric 3D video took place during the Art&&Code 3D (http://artandcode.com/3d) event in Pittsburgh in October 2011, transmitting a 360-degree view of the speakers straight to web browsers tuning in around the world. It was a significant technical accomplishment, and it will no doubt inspire more robust solutions that move beyond the limitations of sharing depth-enabled media on systems such as YouTube and Vimeo, which currently have no capacity to store the complete volumetric data in their 2D file formats.
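To see why an ordinary 2D container falls short, consider the crudest workaround: packing the registered depth map beside the color frame in a single 2D image and pushing it through a standard encoder. This is only a sketch of that compromise; the 10-metre depth range and side-by-side layout are arbitrary assumptions:

import numpy as np
import cv2   # OpenCV

def pack_depth_beside_color(color_bgr, depth_mm, max_depth_mm=10000):
    """Pack a color frame and its registered depth map side by side in one 2D image.

    Squeezing the sensor's millimetre-precision depth into an 8-bit grayscale
    channel, and then running it through a lossy codec, is exactly the kind of
    compromise that makes 2D video formats a poor container for volumetric data.
    """
    scaled = np.clip(depth_mm.astype(np.float32) / max_depth_mm * 255.0, 0, 255)
    depth_bgr = cv2.cvtColor(scaled.astype(np.uint8), cv2.COLOR_GRAY2BGR)
    return np.hstack([color_bgr, depth_bgr])   # feed this to any standard encoder

# Example with synthetic data at the Kinect's resolution.
color = np.zeros((480, 640, 3), dtype=np.uint8)
depth = np.full((480, 640), 2000, dtype=np.uint16)   # everything 2 m away
print(pack_depth_beside_color(color, depth).shape)   # (480, 1280, 3)

A true volumetric format would need to preserve exactly the precision and structure that this kind of workaround throws away.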

A YouTube for volumetric 3D video, or free viewpoint video (FVV) as it is also referred to, could act as a repository for large voxel datasets in video form, datasets that could later be analyzed and reprocessed with more sophisticated 3D reconstruction algorithms, such as KinectFusion, as they become available. Many people may choose to upload all of their raw volumetric video to the cloud and use web-based editing services to finish their videos, minimizing the processing requirements on their own equipment. For more sophisticated directors, there will be demand for professional-grade workstation software for local editing and post-production effects. Back in the cloud, machine vision middleware similar to PrimeSense's NITE could provide novel features based on user segmentation, skeletal tracking, and pattern recognition, generating structured information for categorizing videos, the objects within them, and even the semantic analysis of storylines. Once the videos are online with interactive and embeddable viewers, we can expect to see them shared within Facebook streams and linked into the same places where 2D photos and video are used now. The mashups and remixes of user-submitted volumetric video will be fascinating to watch unfold as clever artists and programmers leverage the capabilities of depth-enabled video in ways that are hard to predict.

Yet there isn't much use in capturing volumetric video if you are just going to look at it on a plain old 2D screen. Interim solutions will use head tracking to simulate motion parallax, letting you look around volumetric 3D on a 2D screen, but the driving reason to develop this type of content will be the availability of true volumetric 3D displays that mature from the techniques documented in Chapter 9. As more compelling voxel-based video content and services are created, along with games and professional 3D applications, volumetric displays will usher in a whole new era of entertainment and spatial computing. As recently demonstrated by Microsoft Research's work on touch interfaces for both Vermeer, a true 360-degree volumetric 3D display (http://research.microsoft.com/en-us/projects/vermeer/), and its head-tracked Holodesk (http://research.microsoft.com/apps/video/default.aspx?id=154571), interacting with touchable imagery that occupies real 3D space opens up a realm of opportunities that were previously considered science fiction.
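That head-tracked interim solution is surprisingly simple at its core: move a virtual camera to follow the viewer's head so the captured scene appears to sit behind the screen. A minimal sketch, assuming a tracker that reports the head position in metres; the gain parameter and lookAt-style output are illustrative:

def virtual_camera_from_head(head_xyz_m, gain=1.0):
    """Map a tracked head position to a virtual-camera pose for motion parallax.

    head_xyz_m is the viewer's head in metres in the sensor's coordinate frame,
    for example the head joint from a skeleton tracker. The virtual camera simply
    follows the head (scaled by `gain`) while continuing to look at the captured
    scene's origin. A full implementation would also use an off-axis projection
    so the screen behaves like a window onto the volumetric scene.
    """
    x, y, z = head_xyz_m
    eye = (gain * x, gain * y, gain * z)   # virtual camera position
    target = (0.0, 0.0, 0.0)               # keep looking at the scene's center
    up = (0.0, 1.0, 0.0)
    return eye, target, up                 # hand these to a lookAt-style call

# Example: the viewer leans 20 cm to the left, 0.9 m from the sensor.
print(virtual_camera_from_head((-0.2, 0.0, 0.9)))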

What kind of content and applications will consumers desire when the ability to reach out and touch volumetric video displays is priced within reach? We'll soon find out, and that's where I'll be going from here. Join the volumetric age by getting involved in the community at volumetric.org and by following updates to this book at meetthekinect.com.

--Sean Kean

Phoenix Perry

Sages of the future often look foolish in hindsight. Frequently, they overstate the speed of immediate developments and underestimate the huge changes coming in the long term. That said, I am writing this prediction on the day of the death of Steve Jobs. The era of mouse-based computing has come to a close. The doors of Apple stores across America are covered in candles, and the playing field for the future of computing is wide open. Gesture-based computing is the future of interface design. This revolution has been developing for 20 years, and its time has finally come. Visual recognition systems, touch screens, gesture-based interfaces, and voice control will combine to replace remotes and mice over the next five years, particularly in casual computing experiences. User experiences will become more organic and biocentric. The wave of natural interfaces is the next big boom coming in design technology.

My disenchantment with the mouse began in 1999, when I developed an extreme case of carpal tunnel syndrome. The interface of my personal computer broke my body through bad design. I couldn't comb my hair. My boyfriend brushed my teeth. The tool that had allowed me to become a thriving creator had destroyed my body. As a result, I've spent the last 10 years healing and exploring alternative modes of computer control that allow for long-term use without harming the human body. With these new modes of interactivity, we can safely develop computing experiences that match our bodies and work for the span of a human lifetime. The computing experience is being wildly rethought. Designers and DIY makers are pushing the market forward by creating new experiences. Users hunger for richer, more personalized, tactile experiences. We are rethinking the digital experience and integrating it into the human experience. From reactive signage that integrates facial recognition with mobile shopping experiences, to smart living rooms, to new ways to heal the mind and body, there is no end to the immersive experiences waiting to be created.

Culturally, music and art making are being torn wide open. Your instrument can be anything you can imagine, even something you draw with your fingers in the air. Media artists can map video and images directly onto the body, including the face, with precision. Motion capture can happen in your living room. Artists can draw in 3D in the physical world with just their hands and then print the results on a desktop fabrication machine bought for under $3,000 from MakerBot Industries. Research is being done on brain-wave control that might allow artists to work by simply closing their eyes. The future has arrived. It just looks different than we expected, and fortunately it's not the pristine, corporate, molded-plastic interface of the past but one that integrates seamlessly into the human landscape. The future of design is open source and in the hands of the makers.

--Phoenix Perry

Jonathan C. Hall

If you're reading this, I can assume that you at least find Microsoft's Kinect and other Kinect-like sensors to be intriguing. If you were born in the last millennium and don't take every technological feat for granted, you might even agree that these devices are pretty amazing. But are they revolutionary? I don't have the answer, but I can tell you where I'm looking for this technology to support social, cultural, and economic change—for better and worse—and it's not in the living room. It's in public and quasi-public spaces.

A touch-free computer interface has a certain utility that's inherent in its touchlessness. For example, a touch-free interface is more hygienic and therefore offers clear advantages in hospitals and doctors' offices, in clean rooms, operating rooms, and rest rooms. A touch-free interface can also empower even vertically challenged people like me (I'm 5′9″… okay 5′8″… on my tippy toes) to intuitively manipulate arbitrarily large media for experiences in immersive entertainment, art, education, or marketing. A touch-free interface can even initiate "passive" interaction by responding to where and how many people are situated in a given space and providing intelligent, contextual feedback, much like the "ubiquitous computing" scenarios envisioned by the legendary Xerox PARC scientist Mark Weiser, among others.

There remain, of course, significant barriers to realizing these benefits. For example, I was mortified by my very first Kinect experience when, after a vigorous round of Kinect Adventures, I was presented with pictures of myself caught in compromising poses. As my Xbox threatened to post them to Facebook, I shrieked, “Noooooooooo!” and dove to yank the plug out of the wall. Who's going to be caught dead gesticulating like a moron anywhere but their living room?

Ten years ago, I might've asked, equally incredulous, “Who's going to be caught dead having a messy breakup with their significant other over the phone on a crowded train?” And yet, this genre is a staple in the soundtrack of commuter life in major metropolitan areas. The point is our cultural rules and habits do change in the wake of technological innovation and adoption: witness the mobile phone.

I believe that people will grow accustomed to a certain constrained repertoire of motion-controlled interactions with public screens over time. Part of that evolution is cultural, but part of it is in the technology itself or, more specifically, in the design of applications. Applications for touch-free interfaces in public spaces will necessarily be less physically demanding than Kinect Adventures or most Xbox games, and will be more like the Xbox dashboard, intended for quick, casual, mostly utilitarian interaction. My work on Sensecast (see Chapter 3) is designed to support just this level of engagement:  check in for a meeting in the lobby, browse some information relevant to your health at the doctor's office, grab the full text of a news story on your phone, and go. (Of course, it's far too early in the lifecycle of this work to say that we are doing it right.)

Like our willingness to post our “status” publicly on Facebook or to “check in” at a Starbucks, our interactions with public screens have the potential to create whole new ecosystems of cultural and economic value, as well as exploitation, as we'll see below. My hope is that we can steer this potential toward the good:  to transform public spaces into more sociable places through shared media that orchestrates our interaction not only with computers but with each other. Our collective habit today is one of passive, solitary media consumption. Smart filters, niche blogs, and micro-blogging let us tailor our media diets to only our own interests. So-called social and mobile apps, meanwhile, isolate us from our geographic communities by channeling our attention away from them. Imagine Kinected applications that get us on our feet in common spaces, meeting our neighbors, permeating our day-to-day lives. Imagine:

  • 8:00 a.m. On the train platform, commuters gather around a display that bears headlines and photos from a town council meeting the night before. One reads: “Youth Center to Go to Referendum.” The display polls the surrounding audience for a literal thumbs-up or thumbs-down on this decision, records their gestures, and collects/displays the aggregate town sentiment. Before you board the train, you can beam the full story to your mobile phone.
  • 3:00 p.m. High school student council members meet in the public arcade with signs urging action on the town's stalled youth center project. They hold the signs up to a community display, where an onboard Kinect recognizes their activity and snaps a photo, distributing the image across a town-wide network.
  • 7:00 p.m. A chime sounds in a crowded café, and a ceiling-mounted digital display starts showing quiz questions about local data: Did the crime rate go up or down this year? What percent of the town budget goes to education? How much does the average family pay in property taxes? Onlookers are able to "buzz in" by mimicking a game-show push-button with two hands. The display then selects and follows whoever buzzed in first, allowing him/her to choose an answer on screen (a minimal sketch of such a buzz-in check follows this list).
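A gesture like that café buzz-in needs nothing exotic; a few skeleton joints are enough. Here is a minimal sketch, assuming a tracker that reports head and hand positions per player; the joint names and data layout are illustrative, not a specific API:

def buzzed_in(skeletons):
    """Return the ids of players currently holding both hands above their heads.

    `skeletons` is assumed to map a player id to a dict of joint positions
    (x, y, z) with y pointing up, roughly how most skeleton trackers report data.
    """
    winners = []
    for player_id, joints in skeletons.items():
        head_y = joints["head"][1]
        if joints["left_hand"][1] > head_y and joints["right_hand"][1] > head_y:
            winners.append(player_id)
    return winners   # the display would poll every frame and remember who was first

# Example frame: only player 2 has both hands raised.
frame = {
    1: {"head": (0.0, 1.6, 2.0), "left_hand": (-0.3, 1.1, 2.0), "right_hand": (0.3, 1.2, 2.0)},
    2: {"head": (1.0, 1.5, 2.5), "left_hand": (0.8, 1.8, 2.4), "right_hand": (1.2, 1.9, 2.4)},
}
print(buzzed_in(frame))   # [2]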

While I've given my examples a decidedly civic cast to make a point, a much broader set of applications and games will no doubt be unleashed upon our public spaces by creative technology companies, advertisers, non-profits, and government entities in the years to come. Some will be good and some bad. But the potential is there to create real value for people by delivering rich experiences, critical information, and spontaneous play around the shared interests and spaces of real, not virtual, communities. By designing for public and quasi-public spaces, developers of Kinected applications can explore a new era of real, not virtual, social and location-based media.

The first application for Sensecast was a news browser placed just outside a Columbia Journalism School café in a semi-public building with high foot traffic. It encouraged passers-by to read a given story lede, and if so moved, to “like” it with a thumbs-up gesture. Why? Our historian colleagues at the J School note that before 1900, people didn't read newspapers alone but rather aloud with friends and strangers gathered around. If philosopher Jürgen Habermas is to be believed, this socio-political dimension of public life, now lost to history, can support a more vital democracy. Perhaps with shared, Kinected news displays that persuade us to also connect with each other, we can resuscitate it.

Maybe. But maybe not. Privacy is a holy term in the American and European lexicons and publicity a suspect one (consider the words “publicity stunt,” “publicity whore,” etc.). The humanist geographer Yi-Fu Tuan points out that, in the ancient Greek world, these poles were reversed: privacy is related to the Greek word for idiot, as purely private folk were considered to be like shut-ins not fit for any role in society. Meanwhile, the lofty peaks of human flourishing were reserved for those willing to roll out to the agora, to make themselves known, to act on a public stage. In most of the modern world, however, privacy is king.

Still, the jury on publicity is not yet in. We tack between obliviousness to the tools of surveillance (security cameras, browser cookies, social networks, etc.) and a justified paranoia about them. As I wax euphoric about the potential of Kinect-like cameras to transform public space for the better, no doubt some of you are growing duly uncomfortable with the level of surveillance that's entailed—or, at least, enabled.

I consider these concerns, as I've said, "justified paranoias." While I freely refer to the Kinect as a "camera," you will note that Microsoft and the device manufacturers in this space explicitly do not. They diligently assert their preferred term: "sensor." That choice is a conscious marketing decision intended to keep vague exactly what data the device collects. As you've seen throughout this book, these "sensors" that we've willingly invited into our homes are powerful cameras capable of passively collecting quite a bit of intimate detail about us, our dimensions, our homes, and our families. We know from patent filings that as Microsoft rolls out live TV service on its Xbox platform, replacing cable's set-top box, the company is integrating the Kinect into systems for parental control and advertising. The Kinect not only provides you a convenient remote control that you will never lose again; it provides Microsoft and its partners a rich profile of you and realtime data on who's watching. We're all Nielsen families now!

This all may seem creepy. Do we simply accept as true the dystopian aphorism promulgated by Napster creator Sean Parker in a recent talk at the 2011 Web 2.0 Summit in San Francisco: "Today's creepy is tomorrow's necessity"?

Again, I don't have the answer. I've chosen to focus my work on Kinected applications for public spaces, a domain that seems less fraught with privacy concerns than whatever the likes of Microsoft, Apple, Google, and Facebook might be doing with our "private" data. This domain is not free of concern, of course. Consider that 3D-assisted facial recognition algorithms are probably an order of magnitude more robust than their straight 2D counterparts. Deploying Kinects widely in public space could conceivably spell the end of anonymity in public. Of course, that outcome is a ways off and possibly intractable, as ownership of physical space is not nearly as consolidated as ownership of, say, mobile platforms, which prevents any one party from owning all the data. But is it technologically possible? Yes.

In any case, there is clearly a non-trivial trade-off to be made when weighing the values of privacy and publicity. Companies and individuals have built amazing products and made them available to us for free or at low cost in exchange for a share of our privacy. And indeed, like the ancient Greeks, we may stand to gain something ourselves by living more public lives. We also stand to be exploited and sold as “eyeballs,” or now “skeletons.” No doubt the Kinect and the ecosystem of companies and developers building with it will stretch concepts of privacy and publicity in new directions. You, by picking up this book and doing with it whatever you do with it, are part of that vanguard. Please Kinect responsibly.

--Jonathan Hall
