The glass slabs are everywhere, and they seem to want to obnoxiously and rudely isolate us from the rest of society. We stare at our smartphone screens, texting someone afar while neglecting the warmth of an in-person conversation with friends who are next to us. The dopamine hit from our phones buzzing in our pockets has become far too difficult to ignore. We must know what fresh notifications are waiting for us—it doesn’t matter if they’re a result of someone we hardly know on Facebook merely “liking” an insignificant photograph. Admittedly, first-world societies have noticed how the glass-slab display of the smartphone is making our interactions soulless and less human. It is negatively influencing our behavior and respect of one another’s presence, and we are taking notice. It is increasingly becoming frowned upon to play with our smartphones in meetings, on dates, and during important conversations. There are areas of interaction that seem permanently obsolete, however. Look around the next time you are in an elevator or a neighborhood bar and notice the number of people with their heads down, staring at the glaring glass slabs of their smartphones. The romanticism of striking up a meaningful conversation with a stranger seems diminished.
The smartphone is only a recent example of how the glass display can influence society and our interactions with one another. We will pick on these devices a little later, but the award for the most influential and distracting display of all goes to the television. It is the TV, nicknamed “the idiot box,” that has shaped the influence of technology on our society for the last few decades. Pervasive as it is, an element of disdain is evident in the nickname. Try to start a conversation about a recent TV show at a cocktail party, and you will quickly run into someone in the group who will claim ignorance of the topic because he is proud not to own a TV. Some of this disdain is with merit. There are far too many instances of parents abusing TV to distract their children with content that dulls their intellectual capacities. There is little argument against the hypothesis that children who watch TV for hours a day are being robbed of valuable time that could be spent in more productive pursuits, or perhaps furthering paternal and maternal bonds. We can also imagine how adults who are glued to TV for hours, with no emphasis on curated content, are likely to learn misinformation and dwell on superficial content targeted toward the entertainment of the mass audience.
The television deserves as much praise as it deserves criticism, though. Aside from popular entertainment, people around the planet depend on the television for information that furthers their understanding of the world around them. We get to hear different opinions, and watch debates and documentaries that are truly educational. Television also allows us to share in worldwide events. Ask anyone alive in the US in July 1969 how profound an event it was to watch the Apollo 11 mission landing the first human beings on the moon. An estimated 600 million people watched Neil Armstrong and Buzz Aldrin walk on the surface of the moon, demonstrating the triumph of humankind’s success in harnessing technology. The coverage of the moon landing in the US and across the world brought societies together to appreciate the spirit of collaboration and the sense of humility gained by comprehending the vast distances in space; reaching our nearest neighbor, the moon, was no small feat. Even though the US was responsible for the mission, the world watched in awe, and credit was given to the entire human race.
The television has brought us live coverage of events that have forever changed our lives and impacted our opinions. The soul-crushing events of September 11, 2001 in New York City left a scar in the hearts of almost everyone who watched the live footage of the terrorist-hijacked airplanes smashing into the twin towers of the World Trade Center, followed by clips of innocent victims jumping out of windows, and the buildings collapsing, to the horror of people around the world.
No matter where you stand on the cumulative contribution of the television, we know these devices aren’t going anywhere any time soon. Families around the globe, in their billions, own TVs and watch the content broadcast on them on a regular basis. In recent times we’ve seen huge improvements in these devices, with TVs sporting larger screen sizes and greater resolutions, resulting in stunning picture quality. High-definition (HD) televisions offer resolutions of up to 2,073,600 pixels per frame. The 4K and 8K standards are upcoming ultra-high-definition successors, with 4K offering four times this resolution and 8K rumored to offer resolutions up to 7,680 x 4,320 (33.2 million) pixels.
The new wave of “Smart” televisions in the market today is focused on providing us with much more than improved resolutions. These devices connect to our WiFi networks to serve us in ways we might never previously have imagined a TV could or would. These TVs offer services such as streaming video, videoconferencing, social networking, and instant messaging. In the IoT landscape, this “thing” we’ve known as the traditional TV is morphing into a display that serves us in a variety of new ways, in addition to displaying regular content.
Smart TV displays are becoming increasingly popular in households for the added purposes they serve. The current generation of Smart TVs is expensive and available only to the relatively affluent. However, given the general track record of how quickly technology becomes cheaper, the feature sets of Smart TVs will be available to the masses in the coming years. It is likely that the next incident contributing to global triumph or heartbreak will be viewed by millions of individuals on their Smart TVs.
Given that they plug into our WiFi networks, on which many of our other important computing and IoT devices reside, it becomes important that we evaluate the secure design of the Smart TV devices that are in the market currently. In this chapter, we will take a look at actual research in the area of attack vectors against Smart TVs to understand how we can improve them and securely enable an IoT future that is likely to continue to include these devices in one way or another.
Many of the popular Smart TVs, particularly from Samsung, run the Linux operating system. They are essentially similar in design to desktop or laptop computers, the only difference being that their user interface design is tailored toward displaying video content from various sources. Using a powerful operating system like Linux also gives Smart TVs the ability to run various applications such as Skype and a web browser. We will discover details of the underlying architecture as we analyze some well-publicized attacks against Smart TVs in this chapter. Let’s start with a basic attack vector called Time-of-Check-to-Time-of-Use (TOCTTOU), publicized by researchers Collin Mulliner and Benjamin Michéle.
The TOCTTOU attack targets one of the most basic security capabilities in consumer electronics: the ability for the device to ensure that a software update is legitimate and created by the manufacturer or a trusted third party. This enables the manufacturer to protect its intellectual property and secures the device against malware that can violate the integrity of the software or compromise the privacy of the consumer. A good example is the jailbreak community surrounding Apple’s iOS operating system, which powers the iPhone and the iPad. Apple continuously builds new security mechanisms to prevent others from being able to modify the core functionality of its devices, to preserve ownership of the experience of the products and to prevent malicious applications from infecting the devices. The jailbreak community, on the other hand, strives to find loopholes in Apple’s security mechanisms so it can modify the functionality of the devices to install customized tweaks and software not authorized by Apple. In the case of Smart TVs, manufacturers want to protect their devices from running unauthorized code to protect their intellectual property, to avoid warranty issues caused by users uploading buggy code, and to protect digital content such as online rental movies from being recorded. Smart TV users, on the other hand, may want to break the security mechanisms enforced by manufacturers so they can enable additional tweaks, fix software issues on devices that are no longer supported by the manufacturer, and perhaps engage in theft by permanently recording rental-based media content.
Mulliner and Michéle’s research focuses on the Samsung LExxB650 series (Figure 5-1) of Smart TVs, even though the concept of the TOCTTOU attack vector can be applied to other consumer electronic devices that may be similarly vulnerable.
In the case of Smart TVs and other electronics, the USB port is often used to read and write files that can comprise media content, applications, and software updates. A storage device, such as a USB memory stick, can be plugged into the TV to watch content stored on the memory stick, as well as to install Smart TV apps and upgrade firmware.
Apps specifically written for the Samsung LExxB650 series of TVs can be of two types: Adobe Flash and native binaries. Mulliner and Michéle’s research targets the native binary approach. These binaries end with the .so extension, which means that the binaries are able to share code with other binaries and are loaded at runtime. The advantage of this is that other modules can use code and applications written using this approach, which reduces the size of executables and also allows developers to change shared code in one file and not have to recompile other dependencies. The Samsung TVs use Linux, so this approach makes sense. In the world of Microsoft Windows, these files are known as dynamic link libraries (DLLs).
Samsung uses BusyBox, which combines tiny versions of many common Linux utilities into a single executable. The BusyBox system is useful for powering consumer devices because it offers an easy way to include or exclude commands, making it extremely modular.
The Samsung TVs run a binary called exeDSP that basically controls the entire functionality of the system. It is responsible for the user interface navigation, allowing the user to change settings, and for accessing the applications. The exeDSP binary runs as the root user; i.e., with full privileges.
The apps written for Samsung TVs contain a minimum of three files: the executable code (Adobe Flash or a shared object), a bitmap (the icon for the app), and the package description in a file called clmeta.dat. Here is an example of a clmeta.dat file:
<?xml version="1.0" encoding="utf-8"?>
<contentlibrary>
  <contentpack id="tocttou">
    <category>Wellness</category>
    <title language_id="English">tocttou</title>
    <startpoint language_id="English">tocttou.so</startpoint>
    <thumbnailpath>tocttou.bmp</thumbnailpath>
    <totalsize>1</totalsize>
  </contentpack>
</contentlibrary>
The startpoint tag specifies the actual binary, which in this case is tocttou.so. The category tag specifies the type of app, which in this case is Wellness. Other common categories recognized by Samsung are Game and Children. Mulliner and Michéle noted that applications of type “Game” are in the form of shared objects, while other categories are typically Adobe Flash applications.
In the case of shared objects, the Game_Main function call is invoked by the exeDSP executable, which is coded using the C programming language. The following is some simple shared object code:
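#include <stdlib.h>   /* for system() */

/* Entry point that exeDSP invokes when the app is launched */
int Game_Main(char *path, char *udn)
{
    system("telnetd &");   /* start the Telnet daemon in the background */
    return 0;
}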
In this case, the application starts up the Telnet service (assuming it is installed on the system). However, the LExxB650 series of Samsung TVs does not allow the installation of additional applications that are shared libraries. This severely limits the ability of a third party to modify the functionality of the TVs, or to install malicious code that could infect the devices (for example, letting an attacker invade the owner’s privacy by viewing video from a camera attached to the TV or stealing any credentials that may be stored on the TV). The goal of the research was to test and demonstrate whether there is a way to override this limitation.
Recall that the exeDSP executable runs with root privileges. The exeDSP process is also responsible for starting up applications that are shared libraries. Since exeDSP does not lower the privileges of shared libraries that it executes, the ability to install additional third-party applications is immensely attractive to an attacker, as well as to users who want to extend or modify the functionality of their TVs. Therefore, the goal of the attack is to somehow get the TV to allow installation of an external application that is of the Game category, which corresponds to shared library code.
Mulliner and Michéle used a Gumstix expansion board to set up the attack. The Gumstix board is equipped with a USB OTG port, which allows other USB devices to connect to it as clients (for example, USB memory sticks and digital cameras). USB OTG also allows the Gumstix board to function as a client (i.e., to connect to other USB hosts as a storage device, like a USB memory stick).
The Gumstix board is basically a mini computer. The manufacturer’s instructions on how to connect to a new Gumstix board are useful in understanding the functionality and capability of the board.
The g_file_storage.ko module is part of the Linux USB stack. By using this module and presenting the Gumstix board as a USB storage device, it is possible to analyze what files the TV reads when presented with an application. In the case of the Samsung TV, non-shared library applications (i.e., Adobe Flash applications) are copied from the USB device to the TV’s internal storage and executed. Each application should be in its own directory, which includes a bitmap file, the clmeta.dat file, and the actual binary as listed in the startpoint tag in clmeta.dat.
The g_file_storage.ko utility takes the filename of a filesystem image as a parameter and exports it as a USB device. When connected to a host, each block request is read and sent over. The researchers modified the utility to also track every block read request in order to ascertain exactly what information the TV is reading when presented with a new application. The following is a sample output from the modified version of g_file_storage.ko when the TV is presented with an Adobe Flash application:
11:18:56 TOCTTOU (DIR)
11:18:56 CLMETA.DAT (471b) [/TOCTTOU]
11:18:56 CLMETA.DAT -> read completed!
11:18:56 CACHE (DIR)
11:18:56 CLMETA.DAT (450b) [/CACHE]
11:18:56 CLMETA.DAT -> read completed!
11:19:10 CACHE.BMP (843758b) [/CACHE]
11:19:10 CACHE.BMP -> read completed!
11:19:10 TOCTTOU.BMP (490734b) [/TOCTTOU]
11:19:10 TOCTTOU.BMP -> read completed!
11:19:56 TELNETD (1745016b) [/TOCTTOU]
11:19:56 TELNETD -> read completed!
11:19:56 TOCTTOU.SO (4608b) [/TOCTTOU]
11:19:56 TOCTTOU.SO -> read completed!
In this case, the g_file_storage.ko module running on the Gumstix board plugged into the Samsung TV included two applications in directories of their own: TOCTTOU and CACHE. For each application, the TV requests the clmeta.dat file (at the 11:18:56 mark). The user is then presented with the categories of applications that are available to be installed. Let’s assume the TOCTTOU application is of type Wellness and the user selects this using the TV remote. At this time, the entire contents of the TOCTTOU directory are copied to the TV’s internal storage, including the bitmap image, the telnetd binary executable, and the TOCTTOU.SO executable. Note that applications of the Game category will not be installed by the TV since externally coded shared library code is prohibited.
Notice that the clmeta.dat file is only read once (11:18:56). When the user installs the TOCTTOU application, the TV does not reread the clmeta.dat file. This is because the TV runs Linux, which includes a block cache. Reads from a storage device are slow, so the block cache keeps recently read blocks in the TV’s RAM, which is much faster to access than the device itself.
The idea behind the TOCTTOU attack is to initially provide the TV with an application directory in which the corresponding clmeta.dat is of the Wellness category. Once the TV verifies this, the user is able to select the application, and the TV will copy the entire contents of the application directory into its local storage and execute it. The TOCTTOU attack changes the clmeta.dat category to Game after the initial verification, allowing for shared library code to be installed. In order to do this, Mulliner and Michéle further extended the functionality of g_file_storage.ko to be able to track how many times a file (the trigger file) has been requested for read. Furthermore, g_file_storage.ko was extended to switch to another image once the read count for the trigger file had reached a certain value (the trigger count).
The researchers created two filesystem images for the attack. The first image, called B (for Benign), includes two applications, TOCTTOU and Cache. Each of these applications contains a clmeta.dat file with a category of Wellness and corresponding files for icons and executables. The TOCTTOU application includes the telnetd executable. The second image, called M (for Modified), includes the exact same files, but with the clmeta.dat file in the TOCTTOU directory modified to the Game category.
The researchers then used their modified g_file_storage.ko code to attach to the TV as a USB stick and serve the B image. When the TV reads the clmeta.dat file in the directory of the Cache application, g_file_storage.ko switches to the M image in the background. Now, when the user elects to install the TOCTTOU application, the files from image M are served to the TV. The problem is then that even though the malicious image M contains the clmeta.dat file with category of Game, it is not reread by the TV upon installation because it is in the TV’s memory, thanks to its block caching functionality. The researchers got around this by making the size of the clmeta.dat file in the Cache application greater than 260 MB (by padding it with extra spaces). This exhausts the RAM allocated to block caches and makes the TV reread clmeta.dat, which is now of category Game.
This attack succeeds because the TV only checks the category of the clmeta.dat file initially and not when it is reread (therefore the name: Time-of-Check-to-Time-of-Use). Here is the output of g_file_storage.ko as this attack is played out:
1 TOCTTOU (DIR)
2 CLMETA.DAT (471b) [/TOCTTOU]
3 CLMETA.DAT -> read completed!
4 CACHE (DIR)
5 CLMETA.DAT (272630223b) [/CACHE]
6 CLMETA.DAT -> read completed! [device switched!]
7 CACHE.BMP (843758b) [/CACHE]
8 CACHE.BMP -> read completed!
9 TOCTTOU (DIR)
10 TOCTTOU.BMP (490734b) [/TOCTTOU]
11 TOCTTOU.BMP -> read completed!
12 TELNETD (1745016b) [/TOCTTOU]
13 TELNETD -> read completed!
14 TOCTTOU.SO (4608b) [/TOCTTOU]
15 TOCTTOU.SO -> read completed!
16 CLMETA.DAT (471b) [/TOCTTOU]
17 CLMETA.DAT -> read completed!
When the Gumstix board is first plugged into the TV, g_file_storage.ko serves up files from image B. The TV reads the clmeta.dat files and makes sure they are not categorized as Game. Notice that the Cache application’s clmeta.dat file is about 270 MB, which fills up the cache memory allocation in the TV. This will make the TV reread previously cached files from the Gumstix board. At this point, the g_file_storage.ko utility switches to image M (signified by device switched! in line 6). The TV is satisfied that none of the applications is of type Game and allows the user to pick an application to install. The user selects the TOCTTOU application, and the TV copies all files in the TOCTTOU directory to its local storage, including an additional binary for the Telnet service (telnetd).
Notice that the TV rereads the clmeta.dat file in step 16, which is served from image M and is categorized as Game. Since the TV doesn’t double-check the categorization upon rereading the file, the application is copied onto local storage and executed by exeDSP with root privileges. In this way, the researchers were able to trick the TV into running a shared library application with the highest privileges. In this case, they used the Game_Main function in tocttou.so to invoke the telnetd binary. Assuming this binary is modified not to ask for a password, an attacker can use this method to log in to the TV (using a Telnet client) with no password and directly obtain a root shell.
This is a great example of how a simple attack can be used to bypass restrictions and security functionality designed into popular Smart TVs. Even though this attack requires physical access to the TV, it is still interesting because it exploits a simple vulnerability: the TV doesn’t check the categorization of the application when rereading the clmeta.dat file.
We shouldn’t discount the probability of an attack because it requires physical access. A specific family could indeed be targeted via a social engineering attack. This could take the form of a modified board (such as the Gumstix) being physically mailed to the family in the guise of an official update from the manufacturer. Because many Smart TVs include cameras for video calls (or allow third-party cameras to be plugged in), falling victim to this ploy can result in loss of privacy in addition to the risk of the Smart TV being compromised and abused to launch further attacks into the home network.
The countermeasure for this attack is quite simple. The TV should first copy any third-party application onto local storage and then check the categorization. If the categorization check fails, the TV should discard and reject the application. This is also true for other types of IoT devices that allow users to install only certain types of applications. This will help ensure that the IoT devices users depend on for their privacy are safe and not vulnerable to simple attacks like TOCTTOU.
The field of cryptography is alive and thriving. Advances in encryption algorithms and computational power are helping to protect our data and the integrity of our software and hardware. IoT devices are and will continue to be dependent on encryption to make sure the privacy of the user is protected and their own integrity is not compromised. Encryption algorithms are great tools to leverage to promote secure design, but ultimately, the architects and developers must have a proper understanding of how the algorithms work to be able to apply them securely. Lack of comprehension of the fundamentals of encryption algorithms can and will make the end product vulnerable to flaws and attacks.
In this section, we will take a look at how the lack of understanding of basic encryption algorithms led a Samsung Smart TV to become vulnerable to a local (physical access required) attack that allowed the user to modify the TV’s firmware. This is a similar outcome to the TOCTTOU scenario, but the attack vector exploits an implementation flaw that uses XOR encryption. We will quickly recap the XOR algorithm and analyze how the attack works.
XOR (eXclusive OR) is a Boolean algebra function. Quite simply, it will return true if one, and only one, of the two operands is true. With this logic, the following table holds true:
1 XOR 1 is 0
1 XOR 0 is 1
0 XOR 1 is 1
0 XOR 0 is 0
Let us write a simple C program to XOR a string cat with the key KEY:
#include <stdio.h>

int main()
{
    char string[4] = "cat";
    char key[4] = "KEY";

    /* XOR each character of the string with the matching key character */
    for (int x = 0; x < 3; x++)
    {
        string[x] = string[x] ^ key[x];
        printf("%c", string[x]);
    }

    printf("\n");

    return 0;
}
Note that ^ represents an XOR operation in the C programming language.
Now let’s compile it:
$ gcc xor.c -o xor
And run it to see the output:
$ ./xor
($-
The XOR operation of cat and KEY results in the output ($-. This is because the program performs an XOR operation of c with K, a with E, and t with Y. Let’s analyze one of these operations, c with K. The ASCII value of c is 99, which is represented in binary as 01100011. The ASCII value of K is 75, which is represented in binary as 01001011. Now let us XOR these two values:
         01100011
(XOR)    01001011
         --------
         00101000
         --------
The result is 00101000 in binary, which is the decimal 40, the ASCII code for the character (. This explains why the program output is ($-. (Feel free to repeat this manual exercise for the remaining two characters: you should come up with $ and -.)
In our case, the encryption key was KEY and the clear-text data was the word cat, resulting in the cyphertext ($-. Anyone who knows the cyphertext and is in possession of the key KEY can decrypt ($- back to the clear-text cat. Let us make sure this works:
#include <stdio.h>

int main()
{
    char string[4] = "($-";
    char key[4] = "KEY";

    /* XOR the cyphertext with the same key to recover the clear text */
    for (int x = 0; x < 3; x++)
    {
        string[x] = string[x] ^ key[x];
        printf("%c", string[x]);
    }

    printf("\n");

    return 0;
}
Let’s compile and run the program:
$ gcc xor2.c -o xor2
$ ./xor2
cat
This is a simple and easy description of how XOR works. Of course, in our case, we used a key of the same length as the clear-text data so that the example is easy to understand. In real life, it is important to use a longer key; otherwise, it becomes easy for an attacker to guess the key with brute force. If the data is longer than the key, the key is repeated to match up with the data. XOR is a very strong encryption algorithm when the key is a one-time pad (i.e., if the key is random, is never reused, and is as long as or longer than the data).
Samsung allows users to download firmware that can be placed on a USB stick and connected to its Smart TVs in order to perform upgrades. We will download the firmware for the PN58B860Y2F model. In this case, we will analyze the firmware upgrade issued on September 22, 2009 (version 1013; see Figure 5-2).
Even though the firmware upgrade file is in the Windows executable format of .exe, it is also a ZIP file that can be uncompressed using the unzip tool:
$ unzip 2009_DTV_2G_firmware.exe
Archive:  2009_DTV_2G_firmware.exe
  inflating: T-CHE7AUSC/crc
  inflating: T-CHE7AUSC/ddcmp
   creating: T-CHE7AUSC/image/
  inflating: T-CHE7AUSC/image/appdata.img.enc
  inflating: T-CHE7AUSC/image/exe.img.enc
 extracting: T-CHE7AUSC/image/info.txt
  inflating: T-CHE7AUSC/image/validinfo.txt
  inflating: T-CHE7AUSC/image/version_info.txt
  inflating: T-CHE7AUSC/MicomCtrl
  inflating: T-CHE7AUSC/run.sh.enc
The important firmware image files appear to be T-CHE7AUSC/image/appdata.img.enc and T-CHE7AUSC/image/exe.img.enc. Let’s see what happens when we inspect these files using the strings tool, which is used to output the printable parts of binary files:
$ strings T-CHE7AUSC/image/exe.img.enc
ct-KLG7CUQC, KHM7@USCT-CHE7AUz'r ausct dect
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-CHE7AUSCT-
CHE7AUSCT-CHE7AUSCT-CHE7AUSC
[rest of output removed for brevity]
Isn’t it interesting to see the string T-CHE7AUSC repeat in a file that is supposedly encrypted? It is especially notable because it is also the name of the root directory, which is created when the firmware download is unzipped. If the image files are truly encrypted, this string should not be showing up in clear text. What is going on here? Well, let’s take a moment to consider what happens when a character is XOR’d with the null ASCII character of decimal value 0. Null characters are often used to signify the ends of strings in memory and represented with the escape sequence of