Lesson 25. Working with binary data

After reading lesson 25, you’ll be able to

Use the ByteString type to efficiently work with binary data
Treat ByteStrings as regular ASCII strings by using ByteString.Char8
Glitch JPEG images by using Haskell
Work with binary Unicode data

In this lesson, you’ll learn about working with binary file data by using Haskell’s ByteString type. ByteString allows you to treat raw binary data as though it were a regular string. To demonstrate the use of ByteString, you’ll focus on a fun project requiring you to manipulate binary file data. You’ll create a simple command-line tool that will allow you to create glitch art, like that in figure 25.1.

Figure 25.1. A scene from Michael Betancourt’s glitch art video “Kodak Moment” (2013)

Glitch art is the practice of deliberately corrupting binary data in order to create visual artifacts in an image or video. You’ll work on the relatively simple task of “glitching” JPEG images. You’ll also take a look at some of the issues around working with binary Unicode data.

Consider this

You have the name of the Japanese author Tatsuhiko Takimoto represented in Japanese Kanji using T.Text:

tatsuhikoTakimoto :: T.Text
tatsuhikoTakimoto = ""

You need to know the number of bytes in this text. For ASCII text, this would be the length of the text, but in this case, using T.length gives you only the number of characters (5). How can you find the number of bytes?

25.1. Working with binary data by using ByteString

So far in the unit, you’ve only looked at working with text in files. You started working with the basic String type and then learned that Text is a better type for working with textual data. Another important type that’s similar to String and Text is called ByteString. The interesting thing about ByteString is that it’s not specifically for text, as the name String might imply. ByteString is an efficient way to deal with any streams of binary data. Like Data.Text, you almost always import Data.ByteString with a qualified import by using a single letter:

import qualified Data.ByteString as B

Even though ByteString is an array of bytes and not a type of text, you can always use characters to represent strings of bytes. A byte holds 8 bits, so there are 2⁸ = 256 possible byte values, and each one can be written as a Char with a code from 0 to 255 (codes 0 to 127 are ASCII). As long as you’re using the OverloadedStrings extension ({-# LANGUAGE OverloadedStrings #-} at the top of your file), you can use literal ASCII strings to represent vectors of bytes.

Listing 25.1. `ByteString` defined by using the `OverloadedStrings` extension

sampleBytes :: B.ByteString
sampleBytes = "Hello!"

But you quickly run into a problem if you try to convert your ByteString into an ordinary String by using B.unpack. The following code won’t compile; GHC reports a type error.

Listing 25.2. Trying to unpack a `ByteString` into a `String` causes an error

sampleString :: String
sampleString = B.unpack sampleBytes

As you can see by the type signature, B.unpack attempts to convert the ByteString into a list of bytes (of type Word8):

B.unpack :: B.ByteString -> [GHC.Word.Word8]
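To make this concrete, here is a small self-contained sketch (sampleWords is a hypothetical name; sampleBytes is listing 25.1). With OverloadedStrings enabled, the literal is stored as raw bytes, and B.unpack returns each byte as a Word8 number rather than as a Char:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Word (Word8)

-- The string literal is stored as one byte per character.
sampleBytes :: B.ByteString
sampleBytes = "Hello!"

-- B.unpack gives the raw byte values, not characters.
sampleWords :: [Word8]
sampleWords = B.unpack sampleBytes
```

In GHCi, sampleWords evaluates to [72,101,108,108,111,33], the ASCII codes for H, e, l, l, o, and !.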

By default, Data.ByteString doesn’t let you treat bytes as Chars, so instead you use Data.ByteString.Char8. The Char8 stands for 8-bit Chars: each byte is read as a Char with a code from 0 to 255, and when you pack a Char, only its lowest 8 bits are kept. You need to import Char8 separately, and usually use the qualifier BC:

import qualified Data.ByteString.Char8 as BC

You can see the difference between plain ByteString and ByteString.Char8 by looking at the types of their unpack functions:

B.unpack :: BC.ByteString -> [GHC.Word.Word8]
BC.unpack :: BC.ByteString -> [Char]

You can see that ByteString.Char8’s unpack works just like Data.Text’s unpack. ByteString.Char8 allows you to use the same core functions for working with text as Data.Text does. The careful reader will also notice that B.unpack is now shown taking a BC.ByteString. That’s because B.ByteString and BC.ByteString are the same type: Char8 doesn’t introduce a new representation, it only provides functions that interpret each byte as a Char. This means you’re free to mix B and BC functions on the same value and to treat your ByteStrings as plain ASCII text from here on out.
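Because the two modules share a single ByteString type, functions from both can be applied to the same value without any conversion. A minimal sketch (asNumbers, asString, and sameLength are illustrative names, not part of either library):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

sampleBytes :: BC.ByteString
sampleBytes = "Hello!"

-- Same value, two views: bytes as numbers (B) or as 8-bit Chars (BC).
asNumbers :: [Int]
asNumbers = map fromIntegral (B.unpack sampleBytes)

asString :: String
asString = BC.unpack sampleBytes

-- A value built for BC can be passed straight to a B function.
sameLength :: Bool
sameLength = B.length sampleBytes == BC.length sampleBytes
```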

Like Text, ByteString shares a common API with String. As you’ll see in the next section, many of the functions you use with Text and String, such as length, splitAt, drop, reverse, and sort, have ByteString counterparts that work on binary data. This makes it easy to reason about efficiently stored binary data just as you would a normal list.
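As a quick illustration of that shared API, the sketch below applies a few Char8 functions that mirror list functions of the same name (greeting, firstFive, and the other names are hypothetical):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)

greeting :: BC.ByteString
greeting = "Hello, bytes!"

-- Each call mirrors a familiar list function.
firstFive :: BC.ByteString
firstFive = BC.take 5 greeting        -- "Hello"

backwards :: BC.ByteString
backwards = BC.reverse firstFive      -- "olleH"

shouting :: BC.ByteString
shouting = BC.map toUpper greeting    -- "HELLO, BYTES!"

hasComma :: Bool
hasComma = BC.elem ',' greeting       -- True
```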

Quick check 25.1

Q1:
Write a function that takes numbers in ASCII character form and converts them to Ints. For example, make the following an Int:
bcInt :: BC.ByteString
bcInt = "6"

QC 25.1 answer

bcInt :: BC.ByteString
bcInt = "6"

bcToInt :: BC.ByteString -> Int
bcToInt = read . BC.unpack
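One caveat: read throws an exception when the bytes aren’t a valid number, for example "6a". A more defensive sketch uses BC.readInt, which parses a leading (optionally signed) Int and returns whatever bytes follow it (safeBcToInt is a hypothetical name):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

-- Accept the input only if the whole ByteString is a number;
-- any leftover bytes after the parsed Int mean failure.
safeBcToInt :: BC.ByteString -> Maybe Int
safeBcToInt bytes = case BC.readInt bytes of
  Just (n, rest) | BC.null rest -> Just n
  _                             -> Nothing
```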

25.2. Glitching JPEGs

Now that you’ve covered the basic use of ByteString, let’s dive into creating glitch art. All the code for your program will be put in a file called glitcher.hs. You’ll be working with an image that can be downloaded from Wikipedia (https://en.wikipedia.org/wiki/H._P._Lovecraft#/media/File:H._P._Lovecraft,_June_1934.jpg). For this exercise, you’ll name this file lovecraft.jpg, shown in figure 25.2.

Figure 25.2. The target of your glitching is the lovecraft.jpg image.

To get started with your program, let’s see how to create the basic functionality you want for reading and writing your image. Here’s the basic structure of the program:

Take a filename argument from the user.
Read in the binary data for the image file.
Randomly alter bytes in the image data.
Write a new file containing the glitched image.

You’ll use both Data.ByteString and Data.ByteString.Char8 in the program to work with the image’s binary data. Because you’re working with binary data, you want to read in your file by using BC.readFile. Here’s the basic outline of your program without any of the glitching code.

Listing 25.3. Basic layout for your glitcher.hs file

import System.Environment
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

main :: IO ()
main = do
   args <- getArgs                                              1
   let fileName = head args                                     2
   imageFile <- BC.readFile fileName                            3
   glitched <- return imageFile                                 4
   let glitchedFileName = mconcat ["glitched_",fileName]        5
   BC.writeFile glitchedFileName glitched                       6
   print "all done"

1 Using getArgs to access the filename
2 The first and only argument should be your file.
3 Next you want to read your file in by using the BC version of readFile.
4 You’re using return because eventually this will be replaced with an IO action that will alter the binary data.
5 Glitching can often create broken files; you definitely want to write to a new file.
6 Finally, you use the BC version of writeFile to write your altered data to the new file.

With this bit of code, you can run your program, pass in a file, and get a new file that should be your glitched JPEG. The only thing missing is the code to glitch your image!

Quick check 25.2

Q1:
At this point, the glitched variable in your main doesn’t need to be an IO type. Change that line so that glitched is a regular variable.

QC 25.2 answer

let glitched = imageFile

25.2.1. Inserting random bytes

Part of the aesthetic of glitch art is to try different approaches to corrupting the data and see what works. You’ll start with replacing a random byte in the file with another byte you pick at random. Creating a random number requires an IO action. But it’s always best to separate as much code from IO actions as you possibly can, because your non-I/O code is pure and predictable. You can also easily test your code by loading it into GHCi and trying it out on a range of data samples.

Before you make your IO action, you’ll start with a function that will convert an Int to a Char. Because Char is a member of Enum, you can use toEnum. You could use toEnum by itself, but if you do, you have no way to enforce the constraint that your Char is between 0 and 255. To enforce this, you’ll take the Int modulo 256 before passing it to toEnum (modulo 255 would never produce 255). You’ll put all this together in an intToChar function.

Listing 25.4. `intToChar` creates a valid byte from an `Int`

intToChar :: Int -> Char
intToChar int = toEnum safeInt
 where safeInt = int `mod` 256
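You can check the wrap-around behavior directly in GHCi. The sketch below repeats intToChar using `mod` 256, so every byte value from 0 to 255 is reachable (modulo 255 would never produce 255). Because Haskell’s mod takes the sign of the divisor, negative inputs also land in that range:

```haskell
-- Wrap any Int into the byte range 0..255 before converting to Char.
intToChar :: Int -> Char
intToChar int = toEnum safeInt
  where safeInt = int `mod` 256

-- In GHCi:
--   intToChar 255    == '\255'
--   intToChar 256    == '\0'     (wraps around)
--   intToChar (-1)   == '\255'   (mod is never negative for a positive divisor)
```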

Next you need a function that will convert this Char into a ByteString. You can do this by using BC.pack to take your Char and make it a BC.ByteString. Because BC.pack requires a string, you need to put your Char inside a list.

Listing 25.5. `intToBC` takes an `Int` and gives you a single-character `ByteString`

intToBC :: Int -> BC.ByteString
intToBC int = BC.pack [intToChar int]
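Data.ByteString.Char8 also provides BC.singleton :: Char -> ByteString, which builds a one-byte ByteString without the intermediate list. A sketch of intToBC written that way (the result is the same as BC.pack [c]; intToChar is repeated, with mod 256, so the block compiles on its own):

```haskell
import qualified Data.ByteString.Char8 as BC

intToChar :: Int -> Char
intToChar int = toEnum (int `mod` 256)

-- BC.singleton wraps one Char as a one-byte ByteString.
intToBC :: Int -> BC.ByteString
intToBC = BC.singleton . intToChar
```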

Now that you have a way to make an Int into a single byte represented as a BC.ByteString, you can write the code to replace a byte with this value. You still don’t need to use IO actions yet.

Your replaceByte function is the deterministic version of your random goal. This function will take the location of the byte to be replaced, the Int value of the new Char/Byte to go there, and the bytes of the image file. You’ll use BC.splitAt to split your bytes around the target byte. BC.splitAt will give you a pair of values representing the first part of the data and then the rest (just like calling take and drop at the same time). Then you’ll drop one byte from the rest to make room for your new byte. Finally, you’ll concatenate the new byte in the middle of these two sections.

Listing 25.6. `replaceByte` removes a byte and replaces it with a new one

replaceByte :: Int -> Int -> BC.ByteString -> BC.ByteString
replaceByte loc charVal bytes = mconcat [before,newChar,after]           1
  where (before,rest) = BC.splitAt loc bytes                             2
        after = BC.drop 1 rest                                           3
        newChar = intToBC charVal                                        4

1 Your new ByteString is just the parts before and after the byte you’re replacing concatenated with your newChar.
2 BC.splitAt will give you a pair of values, like drop and take at the same time. You can use pattern matching to assign these to two variables at once.
3 You use BC.drop 1 here to remove the byte you’re going to replace.
4 You’re representing your byte as an ASCII Char.
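A worked example helps confirm how the locations count. The sketch below repeats replaceByte so it compiles on its own (example and pastTheEnd are illustrative names). Note that the index is 0-based, and that a location equal to the length appends a byte instead of replacing one:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

intToBC :: Int -> BC.ByteString
intToBC int = BC.pack [toEnum (int `mod` 256)]

replaceByte :: Int -> Int -> BC.ByteString -> BC.ByteString
replaceByte loc charVal bytes = mconcat [before, newChar, after]
  where (before, rest) = BC.splitAt loc bytes
        after = BC.drop 1 rest
        newChar = intToBC charVal

-- Index 1 of "XYZ" is 'Y'; 97 is 'a'.
example :: BC.ByteString
example = replaceByte 1 97 "XYZ"     -- "XaZ"

-- At loc == length, rest is empty, so nothing is dropped: the byte is appended.
pastTheEnd :: BC.ByteString
pastTheEnd = replaceByte 3 97 "XYZ"  -- "XYZa"
```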

Now you’re ready for your IO action. You’ll be using randomRIO from System.Random, which comes from the random package, so add import System.Random to the top of glitcher.hs. randomRIO takes a pair of values in a tuple and randomly gives you a number in that range, including both endpoints. Your IO action will be named randomReplaceByte. All randomReplaceByte needs to do is pick two random numbers: one for the Char, and one for the location.

Listing 25.7. `randomReplaceByte` applies random numbers to `replaceByte`

randomReplaceByte :: BC.ByteString -> IO BC.ByteString
randomReplaceByte bytes = do
  let bytesLength = BC.length bytes
  location <- randomRIO (0,bytesLength - 1)  -- splitAt counts from 0, so the last byte is at bytesLength - 1
  charVal <- randomRIO (0,255)
  return (replaceByte location charVal bytes)
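If you want to test the random version the same way you test pure code, one option is to pass the random generator in explicitly instead of calling randomRIO. This is a sketch assuming the random package’s StdGen, mkStdGen, and randomR; replaceByteWithGen and glitchWithSeed are hypothetical names. It draws the location from 0 to one less than the length, so every existing byte can be chosen and nothing is appended:

```haskell
import qualified Data.ByteString.Char8 as BC
import System.Random (StdGen, mkStdGen, randomR)

replaceByte :: Int -> Int -> BC.ByteString -> BC.ByteString
replaceByte loc charVal bytes = mconcat [before, newChar, after]
  where (before, rest) = BC.splitAt loc bytes
        after = BC.drop 1 rest
        newChar = BC.singleton (toEnum (charVal `mod` 256))

-- Pure: the caller supplies the generator and gets the updated one back.
replaceByteWithGen :: StdGen -> BC.ByteString -> (BC.ByteString, StdGen)
replaceByteWithGen gen bytes
  | BC.null bytes = (bytes, gen)   -- nothing to replace
  | otherwise     = (replaceByte location charVal bytes, gen2)
  where (location, gen1) = randomR (0, BC.length bytes - 1) gen
        (charVal, gen2)  = randomR (0, 255) gen1

-- Same seed, same glitch.
glitchWithSeed :: Int -> BC.ByteString -> BC.ByteString
glitchWithSeed seed = fst . replaceByteWithGen (mkStdGen seed)
```

With a fixed seed, glitchWithSeed always returns the same bytes, which makes it easy to compare results in GHCi.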

Now you can use this IO action in your main to modify your image file:

main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- randomReplaceByte imageFile
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"

You can compile your program and run it at the command line:

$ ghc glitcher.hs
$ ./glitcher lovecraft.jpg

These results are okay but not as dramatic as you’d hoped, as shown in figure 25.3.

Figure 25.3. The underwhelming effect of changing a single byte

Let’s try something a bit more sophisticated to see if you can get better results.

Quick check 25.3

Q1:
Write an IO action that returns a random Char.

QC 25.3 answer

randomChar :: IO Char
randomChar = do
   randomInt <- randomRIO (0,255) -- could also use max and min bound
   return (toEnum randomInt)

25.2.2. Sorting random bytes

Another common technique for image glitching is to take a subsection of bytes and sort them. You can achieve this by splitting your ByteString at a point by using BC.splitAt, and then splitting the second half of this into a chunk of a fixed size; you sort the chunk, reverse it so the bytes end up in descending order, and then put it all back together with mconcat. Here’s your sortSection function, which takes a starting point of the section, a size of the section, and the byte stream.

Listing 25.8. `sortSection` sorts a section of bytes in your file

sortSection :: Int -> Int -> BC.ByteString -> BC.ByteString
sortSection start size bytes = mconcat [before,changed,after]
  where (before,rest) = BC.splitAt start bytes
        (target,after) = BC.splitAt size rest
        changed =  BC.reverse (BC.sort target)

To use this in your main, all you need is an IO action that picks a random starting point.

Listing 25.9. Randomizing your `sortSection` by using an `IO` action

randomSortSection :: BC.ByteString -> IO BC.ByteString
randomSortSection bytes = do
  let sectionSize = 25                                       1
  let bytesLength = BC.length bytes
  start <- randomRIO (0,bytesLength - sectionSize)           2
  return (sortSection start sectionSize bytes)

1 Here you’re somewhat arbitrarily picking the size of the section you’re going to sort.
2 Using randomRIO to figure out where you should sort your data from

You can replace randomReplaceByte with randomSortSection and try a revised approach.

Listing 25.10. Your `main` revised to use `randomSortSection` to glitch your file

main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- randomSortSection imageFile
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"

With this trick, you get much more interesting results, as you can see in figure 25.4.

Figure 25.4. A much more interesting result, achieved with `randomSortSection`

But you could probably do better if you could combine these approaches!

25.2.3. Chaining together IO actions with foldM

Suppose you want to use randomSortSection twice on your data and randomReplaceByte three times. You could rewrite your main like this.

Listing 25.11. A cumbersome approach to applying multiple actions

main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched1 <- randomReplaceByte imageFile
  glitched2 <- randomSortSection glitched1
  glitched3 <- randomReplaceByte glitched2
  glitched4 <- randomSortSection glitched3
  glitched5 <- randomReplaceByte glitched4
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched5
  print "all done"

This works, but it’s clearly cumbersome to write code this way, and it’s easy to make a simple typo with all the names you have to keep track of. Instead, you can use foldM from Control.Monad (add import Control.Monad to your imports). Just as mapM generalizes map to monads (at this point, just code using do-notation), foldM does the same for folding. With foldM, you take your original imageFile as the initial value, followed by a list of IO actions that will transform your file. The only thing missing is a function that applies each action to the current bytes; a simple lambda does the job. Here’s your main rewritten by using foldM.
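foldM works in any monad, not just IO. To see the folding pattern in isolation, here is a sketch in the Maybe monad, where the results are easy to inspect (applyAll, halve, and addTen are hypothetical names). Each step receives the previous result, exactly like the lambda in the listing that follows, and a Nothing stops the chain:

```haskell
import Control.Monad (foldM)

-- Thread a value through a list of monadic steps, in order.
applyAll :: Monad m => a -> [a -> m a] -> m a
applyAll = foldM (\acc step -> step acc)

-- Fails on odd numbers, so it can stop a chain.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

addTen :: Int -> Maybe Int
addTen n = Just (n + 10)

-- applyAll 8 [halve, addTen, halve] : 8 -> 4 -> 14 -> 7, so Just 7
-- applyAll 6 [halve, halve]         : 6 -> 3 -> Nothing
```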

Listing 25.12. An improved way to use multiple actions with `foldM`

main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- foldM (\bytes func -> func bytes) imageFile
                                                [randomReplaceByte
                                                ,randomSortSection
                                                ,randomReplaceByte
                                                ,randomSortSection
                                                ,randomReplaceByte]
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"

Now you can compile your program one last time and see what kind of glitches you can make! Figure 25.5 shows an example.

Figure 25.5. Now your beloved author looks more like a resident of Innsmouth!

There’s still probably more you could do to make this image even more interesting, but now you have a setup that allows you to easily chain together whatever strange mutations you can come up with.

Quick check 25.4

Q1:
Create a variable glitchActions outside your main that includes all your actions in a list. Don’t forget to give it the correct type.

QC 25.4 answer

glitchActions :: [BC.ByteString -> IO BC.ByteString]
glitchActions = [randomReplaceByte
                ,randomSortSection
                ,randomReplaceByte
                ,randomSortSection
                ,randomReplaceByte]

25.3. ByteStrings, Char8, and Unicode

As you’ve seen in our glitch art example, ByteString.Char8 is a helpful tool for treating binary data as though it were text. But it’s important to be careful when using ByteString, ByteString.Char8, and Unicode data. Here’s an example of setting a BC.ByteString to a Unicode string (for this Unicode, you’re using the Devanagari script for the famous philosopher Nagarjuna).

Listing 25.13. Creating a Unicode `BC.ByteString`

nagarjunaBC :: BC.ByteString
nagarjunaBC = ""

If you load this into GHCi, you see that the Unicode isn’t preserved:

GHCi> nagarjunaBC
"(>\ETB0M\FSA("

This isn’t too surprising: Char8 keeps only the lowest 8 bits of each Char, so any character outside the range 0 to 255 is mangled. But you may want to transform text to just bytes for a variety of reasons, the primary one being writing Unicode to a file as ByteStrings. Suppose you have your Unicode safely represented as the Text type.
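The mangling is easy to reproduce. BC.pack keeps only the lowest 8 bits of each Char, so a code point such as U+0928 (Devanagari NA) becomes 0x28, which is '('. The sketch below uses escape codes instead of Devanagari literals so it survives any editor encoding (devanagari and packed are illustrative names):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- U+0928 (NA) and U+093E (vowel sign AA), written as escapes.
devanagari :: String
devanagari = "\x0928\x093E"

-- BC.pack truncates each Char to 8 bits:
-- 0x0928 -> 0x28 '(' and 0x093E -> 0x3E '>'.
packed :: B.ByteString
packed = BC.pack devanagari
```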

Listing 25.14. Same Unicode example, properly represented as `Text`

nagarjunaText :: T.Text
nagarjunaText = ""

To convert nagarjunaText to a vector of bytes, you can’t simply use BC.pack because BC.pack is of type String -> ByteString, so first you’d need to use T.unpack and then BC.pack.

Listing 25.15. Attempting to transform `Text` into a `ByteString`

nagarjunaB :: B.ByteString
nagarjunaB = (BC.pack . T.unpack) nagarjunaText

Judging by the type signature, your Unicode should now be safely represented as bytes. But if you convert back, you see this isn’t the case. Note that you need to do a qualified import of Data.Text.IO (as TIO) to ensure that you’re printing the text properly:

GHCi> TIO.putStrLn ((T.pack . BC.unpack) nagarjunaB)
"(>ETB0MFSA("

You’re stuck with the same problem! If you had written nagarjunaB to a file, you’d have ultimately lost your Unicode. What you need is a way to convert Text directly into bytes without passing through Char8’s one-byte-per-Char conversion along the way. For this, you use Data.Text.Encoding, and you’ll do another qualified import:

import qualified Data.Text.Encoding as E

This module contains two essential functions that allow you to perform this direct transformation:

E.encodeUtf8 :: T.Text -> BC.ByteString
E.decodeUtf8 :: BC.ByteString -> T.Text

Now you can safely convert Unicode text to raw bytes and back again.
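As a sketch of both functions (sample, encoded, charCount, byteCount, and roundTrips are illustrative names), the block below encodes three Devanagari code points, written as escapes. Each of them takes 3 bytes in UTF-8, so B.length on the encoded bytes also answers the question of how many bytes a piece of Text occupies:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

-- U+0928, U+093E, U+0917, written as escapes.
sample :: T.Text
sample = T.pack "\x0928\x093E\x0917"

encoded :: B.ByteString
encoded = E.encodeUtf8 sample

-- Characters vs bytes: each code point here is 3 bytes in UTF-8.
charCount, byteCount :: Int
charCount = T.length sample     -- 3
byteCount = B.length encoded    -- 9

-- Decoding the bytes gives back exactly the original text.
roundTrips :: Bool
roundTrips = E.decodeUtf8 encoded == sample
```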

Listing 25.16. Converting between `Text` and `ByteString` with `de/encodeUtf8`

nagarjunaSafe :: B.ByteString
nagarjunaSafe = E.encodeUtf8 nagarjunaText

GHCi> TIO.putStrLn (E.decodeUtf8 nagarjunaSafe)

To be safe, never use the convenience of Data.ByteString.Char8 if you’re working with data that may contain Unicode. If you’re working with purely binary data, as in the case of this lesson’s example, the combination of regular ByteStrings and Char8 works great. For anything else, stick to ByteString, Text, and Text.Encoding. In this unit’s capstone, you’ll see an extended example of this latter case.
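One more caution when going from bytes back to text: E.decodeUtf8 throws an exception if the bytes aren’t valid UTF-8, as can happen with glitched or purely binary data. Data.Text.Encoding also exports decodeUtf8', which returns an Either instead. A sketch (safeDecode, validBytes, and invalidBytes are illustrative names):

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as E
import Data.Text.Encoding.Error (UnicodeException)

-- Report invalid UTF-8 as a Left value instead of throwing.
safeDecode :: B.ByteString -> Either UnicodeException T.Text
safeDecode = E.decodeUtf8'

validBytes, invalidBytes :: B.ByteString
validBytes = E.encodeUtf8 (T.pack "ok")
invalidBytes = B.pack [0xFF]   -- 0xFF never appears in well-formed UTF-8
```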

Summary

In this lesson, our objective was to teach you about working with binary data in Haskell. The ByteString type allows you to treat raw binary data similarly to the way you would ordinary strings. This can greatly simplify how you write programs editing binary data. But it’s essential to remember not to mix single-byte representations of binary data (Char8) with Unicode text. Let’s see if you got this.

Q25.1
Write a program that reads in a text file and outputs the difference between the number of characters in the file and the number of bytes in the file.

Q25.2
Add another glitching technique, randomReverseBytes, that randomly reverses a section of bytes in your data.
