Frequency analysis

A useful way to tell whether a set of data is encrypted, encoded, or obfuscated is to analyze how often each character appears in it. In a cleartext message, such as a letter, the ASCII characters in the printable range (32 to 126), and letters, digits, and spaces in particular, will have a much higher frequency than slashes or nonprintable characters such as Escape (27) or Delete (127).

On the other hand, one would expect an encrypted file to show a very similar frequency for every byte value from 0 to 255.
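The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names byte_frequencies and printable_ratio are made up for this example, the sample sentence is arbitrary, and os.urandom is used as a stand-in for well-encrypted data rather than real ciphertext.

```python
import os
from collections import Counter


def byte_frequencies(data: bytes) -> list:
    """Return a 256-element list with the count of each byte value."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes in the printable ASCII range (32 to 126)."""
    if not data:
        return 0.0
    freqs = byte_frequencies(data)
    return sum(freqs[32:127]) / len(data)


# Arbitrary cleartext sample (assumption, not taken from a real file)
sample_text = b"Dear Alice, the meeting has been moved to Friday at 10:00."
# Random bytes stand in for the output of a good cipher (assumption)
random_bytes = os.urandom(4096)

print("text:  ", round(printable_ratio(sample_text), 2))
print("random:", round(printable_ratio(random_bytes), 2))
```

The cleartext sample falls entirely inside the printable range, while the random bytes should land only a little over a third of the time there, since 95 of the 256 possible values are printable.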

This can be tested by preparing a simple set of files to compare. Let's take a plaintext file as the baseline and compare it with two other versions of that file: one obfuscated and the other encrypted. First, create the plaintext file. Use dmesg to send the kernel messages to a file:

dmesg > /tmp/clear_text.txt  

Now create the obfuscated version using a technique called rotation, which replaces one letter with another in a circular manner around the alphabet. We will use ROT13, which rotates 13 places in the alphabet (that is, a changes to n, b changes to o, and so on). This can be done through programming or by using sites such as http://www.rot13.com/.
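As one way of doing this through programming, the following minimal sketch relies on the rot_13 codec in Python's standard library. The function names rot13 and rot13_file are hypothetical helpers written for this example, not part of any tool mentioned here.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


def rot13_file(src_path: str, dst_path: str) -> None:
    """Write a ROT13 copy of src_path to dst_path."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        obfuscated = rot13(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(obfuscated)


print(rot13("Hello, kernel!"))  # Uryyb, xreary!
```

Calling rot13_file("/tmp/clear_text.txt", "/tmp/rot13_text.txt") would produce the obfuscated file. The output name /tmp/rot13_text.txt is only an assumed example. Because the rotation is by 13 of 26 letters, applying rot13 twice returns the original text.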

Next, encrypt the cleartext file using the OpenSSL command-line utility with the AES-256 algorithm and CBC mode:

openssl aes-256-cbc -a -salt -in /tmp/clear_text.txt -out /tmp/encrypted_text.txt  

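Because of the -a option, OpenSSL's output is base64 encoded. You will need to take that into account when analyzing the results: the file must be decoded before counting, otherwise only base64 characters will appear in the plot.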
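To see why the encoding matters, consider this short sketch. It assumes random bytes from os.urandom as a stand-in for raw ciphertext, and printable_only is a hypothetical helper name. It shows that base64 text stays within printable characters and line breaks, while the decoded bytes do not.

```python
import base64
import os


def printable_only(data: bytes) -> bool:
    """True if every byte is printable ASCII (32 to 126) or a line break."""
    return all(32 <= b <= 126 or b in (10, 13) for b in data)


raw = os.urandom(1024)             # stand-in for raw ciphertext (assumption)
encoded = base64.encodebytes(raw)  # base64 text, wrapped with line breaks

print(printable_only(encoded))           # base64 text looks like plain text
print(printable_only(raw))               # the underlying bytes do not
print(base64.b64decode(encoded) == raw)  # decoding recovers the raw bytes
```

A histogram of the undecoded file would therefore resemble a text file with an unusual alphabet, not encrypted data.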

Now, how is a frequency analysis performed on those files? We will use Python and the Matplotlib (https://matplotlib.org/) library, preinstalled in Kali Linux, to graphically represent the character frequency for each file. The following script takes two command-line parameters: a file name and an indicator of whether the file is base64 encoded (1 or 0). It reads the file and decodes it if necessary. Then, it counts the occurrences of each byte value (0-255) and plots the counts:

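import base64
import sys

import matplotlib.pyplot as plt

# A file name is required; the base64 indicator is optional
if len(sys.argv) < 2:
    print("Usage: file_histogram.py <source_file> [1|0]")
    sys.exit(1)

print("Reading " + sys.argv[1] + "...")
# Read raw bytes so that every value from 0 to 255 is counted as is
with open(sys.argv[1], "rb") as s_file:
    data = s_file.read()

# Decode first if the second parameter is 1
if len(sys.argv) > 2 and sys.argv[2] == "1":
    data = base64.b64decode(data)

# Count how many times each byte value appears
chars = [0] * 256
for b in data:
    chars[b] += 1

# Plot the count (y axis) for each byte value (x axis)
plt.plot(chars)
plt.show()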

When comparing the frequency of the plaintext (left) and ROT13 (right) files, you will see that there is no big difference—all characters are concentrated in the printable range:

On the other hand, when viewing the encrypted file's plot, the distribution is much more chaotic, with counts spread across the whole range from 0 to 255:
