In this section, we are going to build a script that connects to a website, downloads all the images found on the requested page, and then checks them for Exif metadata.
For this task, we are using the urllib package from Python 3, which provides the parse and request modules:
https://docs.python.org/3.0/library/urllib.parse.html
https://docs.python.org/3.0/library/urllib.request.html
You can find the following code in the exif_images_web_page.py file in the exiftags folder.
This script contains the functions that find the images on a web page with BeautifulSoup and the lxml parser (the page itself is fetched with the requests package) and download them into an images folder:
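import os
import optparse
from os.path import basename
from urllib.parse import urlsplit
from urllib.request import urlopen

import requests
from bs4 import BeautifulSoup

def findImages(url):
    # Fetch the page and return every <img> tag found in it
    print('[+] Finding images on ' + url)
    urlContent = requests.get(url).text
    soup = BeautifulSoup(urlContent, 'lxml')
    imgTags = soup.find_all('img')
    return imgTags

def downloadImage(imgTag):
    # Save the image referenced by the tag in the images directory.
    # Note: src is assumed to be an absolute URL; urlopen() fails on relative paths.
    try:
        imgSrc = imgTag['src']
        print('[+] Downloading in images directory...' + imgSrc)
        imgContent = urlopen(imgSrc).read()
        imgFileName = basename(urlsplit(imgSrc)[2])
        # Create the target directory if it does not exist yet
        os.makedirs('images', exist_ok=True)
        with open('images/' + imgFileName, 'wb') as imgFile:
            imgFile.write(imgContent)
        return imgFileName
    except Exception as e:
        print(e)
        return ''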
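The src attribute of an img tag is often a relative path, which urlopen() cannot fetch on its own. The following is a minimal sketch, not part of exif_images_web_page.py, of how such a path could be resolved against the page URL with urllib.parse.urljoin before downloading. The helper names resolveImageUrl and imageFileName, and the example URLs, are hypothetical.

```python
from os.path import basename
from urllib.parse import urljoin, urlsplit


def resolveImageUrl(pageUrl, imgSrc):
    # Hypothetical helper: turn a relative src into an absolute URL.
    # urljoin() returns imgSrc unchanged when it is already absolute.
    return urljoin(pageUrl, imgSrc)


def imageFileName(imgUrl):
    # Use only the path component so a query string does not end up in the file name
    return basename(urlsplit(imgUrl).path)


if __name__ == '__main__':
    pageUrl = 'http://www.example.com/gallery/index.html'
    for src in ('img/photo.jpg', '/static/logo.png?v=2', 'http://cdn.example.com/a.jpg'):
        absolute = resolveImageUrl(pageUrl, src)
        print(absolute, '->', imageFileName(absolute))
```

With a helper like this, downloadImage would need to receive the page URL as well as the tag, so that the resolved URL can be passed to urlopen().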
This is the function that extracts metadata from the images inside the images directory:
def printMetadata():
    # get_exif_metadata() is expected to be defined elsewhere in the script
    print("Extracting metadata from images in images directory.........")
    for dirpath, dirnames, files in os.walk("images"):
        for name in files:
            imgPath = os.path.join(dirpath, name)
            print("[+] Metadata for file: %s " % imgPath)
            try:
                exif = get_exif_metadata(imgPath)
                for metadata in exif:
                    print("Metadata: %s - Value: %s " % (metadata, exif[metadata]))
            except Exception:
                # Report the error and continue with the next file
                import sys, traceback
                traceback.print_exc(file=sys.stdout)
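Not every downloaded file will carry Exif data, so it can help to check for it before attempting extraction. The following is a simplified sketch, not taken from the original script, that scans the header segments of a JPEG file for an APP1 segment starting with the Exif identifier. The function name hasExifSegment is hypothetical, the sample bytes are hand-built for illustration, and only JPEG is handled; other formats are reported as having no Exif data.

```python
import struct


def hasExifSegment(data):
    # A JPEG file starts with the SOI marker 0xFFD8
    if data[:2] != b'\xff\xd8':
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # not positioned on a marker: malformed or unsupported layout
        marker = data[pos + 1]
        if marker == 0xDA:
            return False  # start of scan: image data begins, no Exif segment seen
        # The segment length is big-endian and includes its own two bytes
        (length,) = struct.unpack('>H', data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b'Exif\x00\x00':
            return True
        pos += 2 + length
    return False


if __name__ == '__main__':
    # Hand-built samples: SOI, one header segment, EOI
    withExif = b'\xff\xd8\xff\xe1' + struct.pack('>H', 8) + b'Exif\x00\x00' + b'\xff\xd9'
    withoutExif = b'\xff\xd8\xff\xe0' + struct.pack('>H', 16) + b'JFIF\x00' + bytes(9) + b'\xff\xd9'
    print(hasExifSegment(withExif))
    print(hasExifSegment(withoutExif))
```

In the loop over the images directory, the file contents could be read in binary mode and passed to this check so that files without an Exif segment are skipped.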
This is our main function, which gets a URL from the -u command-line parameter and calls the findImages(url), downloadImage(imgTag), and printMetadata() functions:
def main():
    parser = optparse.OptionParser(usage='%prog -u <target url>')
    parser.add_option('-u', dest='url', type='string', help='specify url address')
    (options, args) = parser.parse_args()
    url = options.url
    if url is None:
        # No URL given: show how to call the script and stop
        parser.print_usage()
        return
    # Find and download the images, then extract their metadata
    imgTags = findImages(url)
    print(imgTags)
    for imgTag in imgTags:
        imgFileName = downloadImage(imgTag)
    printMetadata()

if __name__ == '__main__':
    main()
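The same command-line handling can also be written with argparse, the standard library's newer argument parser. The following is a minimal sketch under that assumption, not the version used in exif_images_web_page.py; the function name parseArguments and the --url long option are hypothetical additions.

```python
import argparse


def parseArguments(argv=None):
    # Hypothetical argparse equivalent of the optparse code in main()
    parser = argparse.ArgumentParser(
        description='Download the images of a web page and print their Exif metadata')
    # required=True makes argparse print the usage and exit when -u is missing
    parser.add_argument('-u', '--url', dest='url', required=True,
                        help='specify url address')
    return parser.parse_args(argv)


if __name__ == '__main__':
    # An explicit argument list is passed here only for demonstration;
    # calling parseArguments() without arguments reads sys.argv instead.
    options = parseArguments(['-u', 'http://www.example.com'])
    print(options.url)
```

Because the option is marked as required, the explicit check for a missing URL is no longer needed.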