Chapter 3. Case Studies in Data Collection

Data collection is hard—even experienced engineering teams sometimes fail to build systems that express their intentions correctly. Here are a few real-world examples of high-profile systems that didn’t adequately account for the privacy considerations inherent in data collection.

Google Street View WiFi: Inadvertent Over-Collection of Data

Google’s Street View uses information gathered by specially outfitted cars to produce extremely detailed maps of city streets around the world. Along with street photography tied to GPS data, the Street View cars also recorded WiFi signals. The resulting database of the locations of the world’s wireless networks helped mobile phones determine their position faster than a GPS satellite fix alone could.

However, the Street View cars were not just mapping the location of WiFi networks but actually recording and storing snippets of network traffic. Any time the WiFi antennas on the Street View cars picked up unsecured WiFi traffic, individual 802.11 frames were captured. These recorded frames included not just the headers that specified the network’s name, or SSID (which was all the information Google needed to map the network), but also the full contents of each frame, meaning any and all data being transmitted. Depending on how users accessed the Internet, this may have included things like passwords and full email messages.

Google became aware of this problem in 2010 after a data-protection authority in Germany asked to audit the data they were collecting on WiFi networks.

“As soon as we became aware of this problem, we grounded our Street View cars and segregated the data on our network, which we then disconnected to make it inaccessible,” Google wrote on its blog. “We want to delete this data as soon as possible, and are currently reaching out to regulators in the relevant countries about how to quickly dispose of it. Maintaining people’s trust is crucial to everything we do, and in this case we fell short.”

Google then outlined the steps it would be taking in asking a third party to review the relevant software, confirming deletion of the data, and reviewing internal procedures to address similar problems in the future. They also ended the practice of collecting WiFi network data via Street View cars.1

Google acknowledged that the mistake stemmed from experimental code written in 2006, and stated that it had no intention to collect or use such payload data. Presumably, if Google had been aware of exactly what the Street View cars were collecting, it would not have collected the full frames at all, or it would have immediately performed a minimization process to purge all the captured data except for the MAC address, the SSID name, and the corresponding GPS location.
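The minimization step described above can be sketched as an allow-list: copy only the fields the mapping use case needs and discard everything else before storage. The following Python sketch is illustrative only. The record layout and the names CapturedFrame, NetworkObservation, and minimize are assumptions made for this example, not a description of Google’s software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturedFrame:
    """Hypothetical record of one frame as picked up by a survey vehicle."""
    bssid: str        # MAC address of the access point
    ssid: str         # network name
    latitude: float   # GPS position at capture time
    longitude: float
    payload: bytes    # frame body; not needed to map the network


@dataclass(frozen=True)
class NetworkObservation:
    """The only fields the mapping use case requires (assumed layout)."""
    bssid: str
    ssid: str
    latitude: float
    longitude: float


def minimize(frame: CapturedFrame) -> NetworkObservation:
    """Copy the allow-listed fields; the payload is never carried forward."""
    return NetworkObservation(
        bssid=frame.bssid,
        ssid=frame.ssid,
        latitude=frame.latitude,
        longitude=frame.longitude,
    )
```

Using an allow-list rather than deleting known-sensitive fields means that any field added to the capture format later is dropped by default instead of being stored by accident.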

As of mid-2015, there is still pending litigation against Google for this mistake. Google’s error underscores the importance of aggressive oversight of data-collection processes to ensure data intake is consistent with the goals of the program and with privacy law and policy. It’s never safe to assume a collection system is going to perform flawlessly.

iPhone Location Database

When Apple released iOS 4 in June 2010, it included a new, silent feature on all iPhones running the newest version of its mobile operating system: a list of every location visited by each iPhone was now recorded in a file named consolidated.db. Alasdair Allan and Pete Warden discovered the file and presented their findings at the Where 2.0 conference in April 2011. From their original blog post on the subject:

“This contains latitude-longitude coordinates along with a timestamp. The coordinates aren’t always exact, but they are pretty detailed. There can be tens of thousands of data points in this file, and it appears the collection started with iOS 4, so there’s typically around a year’s worth of information at this point. Our best guess is that the location is determined by cell-tower triangulation, and the timing of the recording is erratic, with a widely varying frequency of updates that may be triggered by traveling between cells or activity on the phone itself.”2

According to Apple, this was not exactly a record of the locations the phone had been. Instead, it was a copy of crowd-sourced location information that helped the phone geolocate itself faster than it could using a GPS satellite signal alone (which sometimes takes minutes to compute). Using anonymous data submissions from everyone’s iPhones, Apple had created a large database recording the location of cell towers and WiFi networks. Each individual phone would download a subset of this database as a cache, reducing the time needed to get a location fix to a matter of seconds.3

The completeness of the database, which appeared to hold about 10 months of location data, was due (according to Apple) to a software bug that never removed data downloaded into the cache. When operating correctly, the phone was only supposed to hold seven days of location database cache rather than a seemingly permanent record of every location the phone had visited.
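A retention limit like the seven-day window described above only protects users if something actually enforces it. As a minimal sketch, the following Python function prunes cache entries older than the window. The entry layout (a list of timestamp and record pairs) and the names RETENTION and prune_cache are assumptions made for illustration and do not reflect Apple’s implementation.

```python
from datetime import datetime, timedelta, timezone

# The seven-day window comes from the text; everything else is hypothetical.
RETENTION = timedelta(days=7)


def prune_cache(entries, now):
    """Return only the entries downloaded within the retention window.

    `entries` is a list of (downloaded_at, record) tuples, where
    downloaded_at is a timezone-aware datetime.
    """
    cutoff = now - RETENTION
    return [(ts, record) for ts, record in entries if ts >= cutoff]


# Usage: run on every cache write or on a schedule, so that a single
# missed cleanup cannot turn a transient cache into a permanent record.
now = datetime.now(timezone.utc)
cache = [
    (now - timedelta(days=300), "tower-A"),
    (now - timedelta(days=2), "tower-B"),
]
cache = prune_cache(cache, now)  # only the "tower-B" entry remains
```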

Apple fixed the perceived problems with this feature in a subsequent iOS update:

“Software Update

Sometime in the next few weeks Apple will release a free iOS software update that:

  • reduces the size of the crowd-sourced Wi-Fi hotspot and cell tower database cached on the iPhone,

  • ceases backing up this cache, and

  • deletes this cache entirely when Location Services is turned off.

In the next major iOS software release the cache will also be encrypted on the iPhone.”

This case is notable for several reasons:

  • Apple made the decision to leak information containing personal identifiers—the names of all the observed WiFi networks in an area—in order to optimize the location calculations. By contrast, Google, which operates a similar service for Android phones, chose to go a different route that better preserved privacy: rather than send individual phones portions of its crowd-sourced databases, each phone sent Google the list of networks it was currently seeing, and Google would send back its server-calculated location based on that data. The result is the same—the current location of the phone—but Google’s architecture didn’t leak information collected by other phones.

    While it’s true that SSID information is broadcast in the clear and, as such, could be considered public information, the Apple model leaked historical information about WiFi networks. Anyone in range can see a WiFi network when it’s turned on, but the Apple model made any network ever observed in that area available to any phone that visited that area, and in a larger area than the phone’s radios could actually pick up. The result was that any visit to a geographical location with an iPhone was enough to retrieve wide-area wireless survey data as crowd-sourced and recorded by Apple.

  • Apple’s architectural decision to cache the data on the phone, along with a software bug that made that cached data persistent rather than transient as originally intended, inadvertently created a location-tracking database on the phone. This collection of cached information created a privacy risk for the phone owner.

In this case, Apple failed to properly minimize the data that lived on the phone, and created a system where the collection and storage of data was disproportionate to the ultimate need for it. In addition, Apple failed to secure that information from easy unauthorized access, thereby creating the risk of exposure of highly sensitive information about the movement of its customers.
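The server-side approach attributed to Google above can be sketched as follows: the phone reports the networks it currently sees, and the server returns only a computed position, so survey records gathered by other phones never leave the server. This Python sketch is an assumption-laden illustration. The SURVEY_DB contents, the locate function, and the simple centroid estimate are all hypothetical and are not how either company actually computes location.

```python
# Hypothetical server-side survey database: BSSID -> (latitude, longitude).
SURVEY_DB = {
    "aa:aa:aa:aa:aa:01": (37.7750, -122.4190),
    "aa:aa:aa:aa:aa:02": (37.7760, -122.4180),
    "aa:aa:aa:aa:aa:03": (40.7128, -74.0060),
}


def locate(observed_bssids):
    """Estimate a position from the networks a phone reports seeing.

    Returns a (latitude, longitude) tuple, or None when no reported
    network is known. Only this estimate is sent back to the phone;
    the underlying survey records stay on the server.
    """
    known = [SURVEY_DB[b] for b in observed_bssids if b in SURVEY_DB]
    if not known:
        return None
    # Assumed estimator: the centroid of the known access points.
    lat = sum(p[0] for p in known) / len(known)
    lon = sum(p[1] for p in known) / len(known)
    return (lat, lon)
```

The trade-off is that the phone must disclose what it currently sees to the server on each lookup, whereas the cached model exposes historical, wide-area survey data to every phone that visits the area.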

Conclusion

The examples above illustrate how privacy protections can be thwarted from within, simply through small lapses in data-collection practices. However, there are myriad ways for external attackers to compromise your data and threaten its privacy. Systems designed with privacy and security in their architecture from the very beginning have a much better chance of protecting and upholding these values.

1 “WiFi Data Collection: An Update”. Official Google Blog. May 14, 2010.

2 Allan, Alasdair. “Got an iPhone or 3G iPad? Apple Is Recording Your Moves”. O’Reilly Radar. April 20, 2011.

3 “Apple Q&A on Location Data”. Apple Press Info. April 27, 2011.
