After reading lesson 25, you’ll be able to
In this lesson, you’ll learn about working with binary file data by using Haskell’s ByteString type. ByteString allows you to treat raw binary data as though it were a regular string. To demonstrate the use of ByteString, you’ll focus on a fun project requiring you to manipulate binary file data. You’ll create a simple command-line tool that will allow you to create glitch art, like that in figure 25.1.
Glitch art is the practice of deliberately corrupting binary data in order to create visual artifacts in an image or video. You’ll work on the relatively simple task of “glitching” JPEG images. You’ll also take a look at some of the issues around working with binary Unicode data.
You have the name of the Japanese author Tatsuhiko Takimoto represented in Japanese Kanji using T.Text:
tatsuhikoTakimoto :: T.Text
tatsuhikoTakimoto = "滝本 竜彦"
You need to know the number of bytes in this text. For ASCII text, this would be the length of the text, but in this case, using T.length gives you only the number of characters (5). How can you find the number of bytes?
So far in the unit, you’ve only looked at working with text in files. You started working with the basic String type and then learned that Text is a better type for working with textual data. Another important type that’s similar to String and Text is called ByteString. The interesting thing about ByteString is that it’s not specifically for text, as the name String might imply. ByteString is an efficient way to deal with any streams of binary data. Like Data.Text, you almost always import Data.ByteString with a qualified import by using a single letter:
import qualified Data.ByteString as B
Even though ByteString is an array of bytes and not a type of text, you can always use ASCII to represent strings of bytes. A byte has 256, or 2^8 (8 bits), possible values, and each one maps to one of the first 256 characters (ASCII plus its 8-bit extension), so every possible byte can be represented as a character. As long as you’re using the OverloadedStrings extension, you can use literal ASCII strings to represent vectors of bytes.
sampleBytes :: B.ByteString
sampleBytes = "Hello!"
But you quickly run into a problem if you try to convert your ByteString into an ordinary String by using B.unpack. The following code fails to compile with a type error.
sampleString :: String
sampleString = B.unpack sampleBytes
As you can see by the type signature, B.unpack attempts to convert the ByteString into a list of bytes (of type Word8):
B.unpack :: B.ByteString -> [GHC.Word.Word8]
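To make the type signature concrete, here is a minimal sketch of what B.unpack gives back for the lesson’s sampleBytes. The name sampleWords is an assumption introduced only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Word (Word8)

-- The same literal used in the lesson, stored as raw bytes.
sampleBytes :: B.ByteString
sampleBytes = "Hello!"

-- sampleWords is a hypothetical name for this sketch.
-- B.unpack yields the numeric value of each byte, not characters.
sampleWords :: [Word8]
sampleWords = B.unpack sampleBytes

main :: IO ()
main = print sampleWords  -- prints [72,101,108,108,111,33]
```

Each number is the ASCII code of the corresponding character, which is why the result can’t be used directly as a String.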
By default, Data.ByteString doesn’t allow you to treat bytes just like Char, so instead you use Data.ByteString.Char8. The Char8 stands for 8-bit Chars (ASCII characters). You need to import Char8 separately, and usually use the qualifier BC:
import qualified Data.ByteString.Char8 as BC
You can see the difference between plain ByteString and ByteString.Char8 by looking at the types of their unpack functions:
B.unpack :: BC.ByteString -> [GHC.Word.Word8]
BC.unpack :: BC.ByteString -> [Char]
You can see that ByteString.Char8’s unpack works just like Data.Text’s unpack. ByteString.Char8 allows you to use the same core functions for working with text as Data.Text does. The careful reader will also notice that the type signature of B.unpack appears to have changed! B.unpack is now displayed with the BC qualifier because Data.ByteString and Data.ByteString.Char8 share the same underlying ByteString type; only the functions for viewing its bytes differ. This means you’re free to treat your ByteStrings as plain ASCII text from here on out.
Like Text, ByteString shares a common API with String. As you’ll see in the next section, you can use all of the same functions you would when working with Text and String with binary data. This makes it easy to reason about efficiently stored binary data just as you would a normal list.
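As a small illustration of that shared API, the sketch below applies a few list-style functions from Data.ByteString.Char8 to a literal. The value greeting is a made-up example, not part of the lesson’s program.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

-- greeting is a hypothetical sample value for this sketch.
greeting :: BC.ByteString
greeting = "hello bytes"

main :: IO ()
main = do
  print (BC.length greeting)   -- number of bytes: 11
  print (BC.take 5 greeting)   -- "hello"
  print (BC.reverse greeting)  -- "setyb olleh"
  print (BC.words greeting)    -- ["hello","bytes"]
```

The function names mirror their list and Text counterparts, so code written against String usually needs only a qualifier change.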
Write a function that takes numbers in ASCII character form and converts them to Ints. For example, make the following an Int:
bcInt :: BC.ByteString
bcInt = "6"
bcInt :: BC.ByteString
bcInt = "6"

bcToInt :: BC.ByteString -> Int
bcToInt = read . BC.unpack
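The read-based answer crashes at runtime if the bytes aren’t a valid number. As a hedged alternative, the sketch below uses BC.readInt, which returns a Maybe along with any unparsed remainder. The helper name bcToIntSafe is an assumption for this example and isn’t part of the lesson.

```haskell
import qualified Data.ByteString.Char8 as BC

-- bcToIntSafe is a hypothetical helper: parse the whole ByteString as an Int,
-- returning Nothing instead of crashing on malformed or trailing input.
bcToIntSafe :: BC.ByteString -> Maybe Int
bcToIntSafe bytes =
  case BC.readInt bytes of
    Just (n, rest) | BC.null rest -> Just n   -- everything was consumed
    _                             -> Nothing  -- no digits, or leftovers

main :: IO ()
main = mapM_ (print . bcToIntSafe . BC.pack) ["6", "-12", "42abc", "abc"]
```

Requiring the remainder to be empty is a design choice of this sketch; dropping that check would accept inputs such as "42abc" as 42.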
Now that you’ve covered the basic use of ByteString, let’s dive into creating glitch art. All the code for your program will be put in a file called glitcher.hs. You’ll be working with an image that can be downloaded from Wikipedia (https://en.wikipedia.org/wiki/H._P._Lovecraft#/media/File:H._P._Lovecraft,_June_1934.jpg). For this exercise, you’ll name this file lovecraft.jpg, shown in figure 25.2.
To get started with your program, let’s see how to create the basic functionality you want for reading and writing your image. Here’s the basic structure of the program:
You’ll use both Data.ByteString and Data.ByteString.Char8 in the program to work with the image’s binary data. Because you’re working with binary data, you want to read in your file by using BC.readFile. Here’s the basic outline of your program without any of the glitching code.
import System.Environment
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- return imageFile
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"
With this bit of code, you can run your program, pass in a file, and get a new file that should be your glitched JPEG. The only thing missing is the code to glitch your image!
At this point, the glitched variable in your main doesn’t need to be an IO type. Change that line so that glitched is a regular variable.
Part of the aesthetic of glitch art is to try different approaches to corrupting the data and see what works. You’ll start with replacing a random byte in the file with another byte you pick at random. Creating a random number requires an IO action. But it’s always best to separate as much code from IO actions as you possibly can, because your non-I/O code is pure and predictable. You can also easily test your code by loading it into GHCi and trying it out on a range of data samples.
Before you make your IO action, you’ll start with a function that will convert an Int to a Char. Because Char is a member of Enum, you can use toEnum. You could use toEnum by itself, but if you do, you have no way to enforce the constraint that your Char is between 0 and 255. To enforce this, you’ll take the Int modulo 256, which always yields a value from 0 to 255, before passing it to toEnum. You’ll put all this together in an intToChar function.
intToChar :: Int -> Char
intToChar int = toEnum safeInt
  where safeInt = int `mod` 256
Next you need a function that will convert this Char into a ByteString. You can do this by using BC.pack to take your Char and make it a BC.ByteString. Because BC.pack requires a string, you need to put your Char inside a list.
intToBC :: Int -> BC.ByteString
intToBC int = BC.pack [intToChar int]
Now that you have a way to make an Int into a single byte represented as a BC.ByteString, you can write the code to replace a byte with this value. You still don’t need to use IO actions yet.
Your replaceByte function is the deterministic version of your random goal. This function will take the location of the byte to be replaced, the Int value of the new Char/Byte to go there, and the bytes of the image file. You’ll use BC.splitAt to split your bytes around the target byte. BC.splitAt will give you a pair of values representing the first part of the data and then the rest (just like calling take and drop at the same time). Then you’ll drop one from the rest of the bytes to make room for your new byte. Finally, you’ll concatenate the new byte in the middle of these two sections.
replaceByte :: Int -> Int -> BC.ByteString -> BC.ByteString
replaceByte loc charVal bytes = mconcat [before,newChar,after]
  where (before,rest) = BC.splitAt loc bytes
        after = BC.drop 1 rest
        newChar = intToBC charVal
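Because replaceByte is pure, you can check it on a tiny ByteString before pointing it at an image. The sketch below restates the helper functions so it is self-contained. As an assumption of this sketch, intToChar wraps with `mod` 256 so that all byte values from 0 through 255 are reachable.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Assumption of this sketch: wrap with 256 so every value 0..255 is possible.
intToChar :: Int -> Char
intToChar int = toEnum (int `mod` 256)

intToBC :: Int -> BC.ByteString
intToBC int = BC.pack [intToChar int]

replaceByte :: Int -> Int -> BC.ByteString -> BC.ByteString
replaceByte loc charVal bytes = mconcat [before, newChar, after]
  where (before, rest) = BC.splitAt loc bytes
        after          = BC.drop 1 rest
        newChar        = intToBC charVal

main :: IO ()
main = do
  let sample = BC.pack "hello"
  print (replaceByte 1 65 sample)  -- 65 is 'A': "hAllo"
  print (replaceByte 0 90 sample)  -- 90 is 'Z': "Zello"
  -- A location equal to the length has nothing to drop, so the new byte
  -- is appended and the data grows by one byte.
  print (BC.length (replaceByte 5 65 sample))  -- 6
```

The last case shows that valid locations for a true replacement run from 0 to one less than the length of the data.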
Now you’re ready for your IO action. You’ll be using randomRIO from System.Random. randomRIO will take a pair of values in a tuple and randomly give you a number in that range. Your IO action will be named randomReplaceByte. All randomReplaceByte needs to do is pick two random numbers: one for the Char, and one for the location.
randomReplaceByte :: BC.ByteString -> IO BC.ByteString
randomReplaceByte bytes = do
  let bytesLength = BC.length bytes
  -- valid byte positions run from 0 to bytesLength - 1
  location <- randomRIO (0,bytesLength - 1)
  charVal <- randomRIO (0,255)
  return (replaceByte location charVal bytes)
Now you can use this IO action in your main to modify your image file:
main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- randomReplaceByte imageFile
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"
You can compile your program and run it at the command line:
$ ghc glitcher.hs
$ ./glitcher lovecraft.jpg
These results are okay but not as dramatic as you’d hoped, as shown in figure 25.3.
Let’s try something a bit more sophisticated to see if you can get better results.
randomChar :: IO Char
randomChar = do
  randomInt <- randomRIO (0,255) -- could also use max and min bound
  return (toEnum randomInt)
Another common technique for image glitching is to take a subsection of bytes and sort them. You can achieve this by splitting your ByteString at a point by using BC.splitAt, and then splitting the second half of this into a chunk of a fixed size; you sort the chunk and then put it all back together with mconcat. Here’s your sortSection function, which takes a starting point of the section, a size of the section, and the byte stream.
sortSection :: Int -> Int -> BC.ByteString -> BC.ByteString
sortSection start size bytes = mconcat [before,changed,after]
  where (before,rest) = BC.splitAt start bytes
        (target,after) = BC.splitAt size rest
        changed = BC.reverse (BC.sort target)
All you need to use this in your main is an IO action that picks a random starting point.
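To see what sortSection does to a small input, here is a worked sketch that restates the function and runs it on a seven-byte sample. The sample string is made up for illustration.

```haskell
import qualified Data.ByteString.Char8 as BC

sortSection :: Int -> Int -> BC.ByteString -> BC.ByteString
sortSection start size bytes = mconcat [before, changed, after]
  where (before, rest)  = BC.splitAt start bytes
        (target, after) = BC.splitAt size rest
        changed         = BC.reverse (BC.sort target)

main :: IO ()
main =
  -- before = "ab", target = "dcfe", after = "g"
  -- sort gives "cdef"; reverse gives "fedc"
  print (sortSection 2 4 (BC.pack "abdcfeg"))  -- "abfedcg"
```

Note that because the sorted chunk is reversed, the section ends up in descending byte order, and the overall length of the data is unchanged.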
randomSortSection :: BC.ByteString -> IO BC.ByteString
randomSortSection bytes = do
  let sectionSize = 25
  let bytesLength = BC.length bytes
  start <- randomRIO (0,bytesLength - sectionSize)
  return (sortSection start sectionSize bytes)
You can replace randomReplaceByte with randomSortSection and try a revised approach.
main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- randomSortSection imageFile
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"
With this trick, you get much more interesting results, as you can see in figure 25.4.
But you could probably do better if you could combine these approaches!
Suppose you want to use randomSortSection twice on your data and randomReplaceByte three times. You could rewrite your main like this.
main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched1 <- randomReplaceByte imageFile
  glitched2 <- randomSortSection glitched1
  glitched3 <- randomReplaceByte glitched2
  glitched4 <- randomSortSection glitched3
  glitched5 <- randomReplaceByte glitched4
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched5
  print "all done"
This works, but it’s clearly cumbersome to write code this way, and it’s easy to make a simple typo with all the names you have to keep track of. Instead, you can use foldM from Control.Monad. Just as mapM generalizes map to monads (at this point, just code using do-notation), foldM does the same for folding. With foldM, you can take your original imageFile as the initial value, and then a list of IO actions that will transform your file. The only thing missing is a function that will apply each of these actions to the accumulated bytes. In this case, you can use a simple lambda. Here’s your main rewritten by using foldM.
main :: IO ()
main = do
  args <- getArgs
  let fileName = head args
  imageFile <- BC.readFile fileName
  glitched <- foldM (\bytes func -> func bytes) imageFile
                    [randomReplaceByte
                    ,randomSortSection
                    ,randomReplaceByte
                    ,randomSortSection
                    ,randomReplaceByte]
  let glitchedFileName = mconcat ["glitched_",fileName]
  BC.writeFile glitchedFileName glitched
  print "all done"
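To watch foldM thread a value through a list of actions without any randomness, here is a minimal sketch. The actions reverseAll, upperAll, and dropFirst, and the wrapper applyAll, are hypothetical stand-ins with the same type as the glitch actions; they aren’t part of the lesson’s program.

```haskell
import Control.Monad (foldM)
import Data.Char (toUpper)
import qualified Data.ByteString.Char8 as BC

-- Hypothetical deterministic stand-ins for the random glitch actions.
reverseAll :: BC.ByteString -> IO BC.ByteString
reverseAll = return . BC.reverse

upperAll :: BC.ByteString -> IO BC.ByteString
upperAll = return . BC.map toUpper

dropFirst :: BC.ByteString -> IO BC.ByteString
dropFirst = return . BC.drop 1

-- applyAll (also hypothetical) feeds the result of each action to the next.
applyAll :: BC.ByteString -> [BC.ByteString -> IO BC.ByteString] -> IO BC.ByteString
applyAll = foldM (\bytes func -> func bytes)

main :: IO ()
main = do
  -- "glitch" -> "hctilg" -> "HCTILG" -> "CTILG"
  result <- applyAll (BC.pack "glitch") [reverseAll, upperAll, dropFirst]
  print result  -- "CTILG"
```

The actions run left to right, so reordering the list changes the result just as reordering the lines of the longer do-block would.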
Now you can compile your program one last time and see what kind of glitches you can make! Figure 25.5 shows an example.
There’s still probably more you could do to make this image even more interesting, but now you have a setup that allows you to easily chain together whatever strange mutations you can come up with.
Create a variable glitchActions outside your main that includes all your actions in a list. Don’t forget to give it the correct type.
glitchActions :: [BC.ByteString -> IO BC.ByteString]
glitchActions = [randomReplaceByte
                ,randomSortSection
                ,randomReplaceByte
                ,randomSortSection
                ,randomReplaceByte]
As you’ve seen in our glitch art example, ByteString.Char8 is a helpful tool for treating binary data as though it were text. But it’s important to be careful when using ByteString, ByteString.Char8, and Unicode data. Here’s an example of setting a BC.ByteString to a Unicode string (for this Unicode, you’re using the Devanagari script for the famous philosopher Nagarjuna).
nagarjunaBC :: BC.ByteString
nagarjunaBC = "नागर्जुन"
If you load this into GHCi, you see that the Unicode isn’t preserved:
GHCi> nagarjunaBC
"(>\ETB0M\FSA("
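The garbled output comes from BC.pack keeping only the low 8 bits of each character’s code point. The sketch below illustrates this on the first two Devanagari code points of the name, written as escapes. The helper names truncatedCodes and packedBytes are assumptions for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: what truncating each code point to 8 bits looks like.
truncatedCodes :: String -> [Int]
truncatedCodes = map ((`mod` 256) . ord)

-- Hypothetical helper: the bytes BC.pack actually stores.
packedBytes :: String -> [Word8]
packedBytes = B.unpack . BC.pack

main :: IO ()
main = do
  let devanagari = "\x0928\x093E"      -- code points 2344 and 2366
  print (truncatedCodes devanagari)    -- [40,62]
  print (packedBytes devanagari)       -- [40,62]
  print (BC.pack devanagari)           -- "(>", the ASCII for 40 and 62
```

Two different three-byte UTF-8 characters collapse into the single bytes for ( and >, so the original text can’t be recovered from the packed result.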
This isn’t too surprising, as Char8 ByteStrings are only for ASCII. But you may want to transform text to just bytes for a variety of reasons, the primary one being writing Unicode to a file as ByteStrings. Suppose you have your Unicode safely represented as the Text type.
nagarjunaText :: T.Text
nagarjunaText = "नागर्जुन"
To convert nagarjunaText to a vector of bytes, you can’t simply use BC.pack because BC.pack is of type String -> ByteString, so first you’d need to use T.unpack and then BC.pack.
nagarjunaB :: B.ByteString
nagarjunaB = (BC.pack . T.unpack) nagarjunaText
If you look at the type signature, you should have your Unicode safely represented as bytes. But if you convert back, you see this isn’t the case. Note that you need to do a qualified import of Data.Text.IO to ensure that you’re printing the text properly:
GHCi> TIO.putStrLn ((T.pack . BC.unpack) nagarjunaB)
"(>\ETB0M\FSA("
You’re stuck with the same problem! If you had written nagarjunaB to a file, you’d have ultimately lost your Unicode. What you need is a way to convert Text directly to a ByteString without passing through Char8’s pack and unpack along the way. For this, you use Data.Text.Encoding, and you’ll do another qualified import:
import qualified Data.Text.Encoding as E
This module contains two essential functions that allow you to perform this direct transformation:
E.encodeUtf8 :: T.Text -> BC.ByteString
E.decodeUtf8 :: BC.ByteString -> T.Text
Now you can safely convert Unicode text to raw bytes and back again.
nagarjunaSafe :: B.ByteString
nagarjunaSafe = E.encodeUtf8 nagarjunaText

GHCi> TIO.putStrLn (E.decodeUtf8 nagarjunaSafe)
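As a rough comparison of the two routes, the sketch below round-trips a short piece of Devanagari text both ways and compares character and byte counts. The names sampleText, lossyRoundTrip, and utf8RoundTrip are assumptions for this example, and the text is written with escapes to keep the source ASCII-only.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as E

-- Hypothetical sample: three Devanagari code points.
sampleText :: T.Text
sampleText = T.pack "\x0928\x093E\x0917"

-- Going through Char8 truncates each character to a single byte.
lossyRoundTrip :: T.Text
lossyRoundTrip = (T.pack . BC.unpack . BC.pack . T.unpack) sampleText

-- Going through UTF-8 preserves the text exactly.
utf8RoundTrip :: T.Text
utf8RoundTrip = (E.decodeUtf8 . E.encodeUtf8) sampleText

main :: IO ()
main = do
  print (utf8RoundTrip == sampleText)                -- True
  print (lossyRoundTrip == sampleText)               -- False
  print (T.length sampleText)                        -- 3 characters
  print (B.length (E.encodeUtf8 sampleText))         -- 9 bytes in UTF-8
  print (B.length (BC.pack (T.unpack sampleText)))   -- 3 bytes, data lost
```

Each of these characters takes three bytes in UTF-8, which is why the byte count and the character count differ.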
To be safe, never use the convenience of Data.ByteString.Char8 if you’re working with data that may contain Unicode. If you’re working with purely binary data, as in the case of this lesson’s example, the combination of regular ByteStrings and Char8 works great. For anything else, stick to ByteString, Text, and Text.Encoding. In this unit’s capstone, you’ll see an extended example of this latter case.
In this lesson, our objective was to teach you about writing binary data in Haskell. The ByteString type allows you to treat raw binary data similarly to the way you would ordinary strings. This can greatly simplify how you write programs editing binary data. But it’s essential to remember not to mix single-byte representations of binary data (Char8) with Unicode text. Let’s see if you got this.
Write a program that reads in a text file and outputs the difference between the number of characters in the file and the number of bytes in the file.
Add another glitching technique, randomReverseBytes, that randomly reverses a section of bytes in your data.