Extracting metadata from web images

In this section, we are going to build a script that connects to a website, downloads all the images found on the given page, and then checks them for Exif metadata.

For this task, we use the urllib package from Python 3, which provides the parse and request modules:

https://docs.python.org/3.0/library/urllib.parse.html

https://docs.python.org/3.0/library/urllib.request.html
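
As a quick illustration of what urllib.parse contributes to this task, the following minimal sketch splits a URL into its components and derives a local file name from the path. The helper name filename_from_url and the example.com URL are assumptions made for illustration and are not part of the original script:

```python
from os.path import basename
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last component of the URL path, ignoring query and fragment."""
    return basename(urlsplit(url).path)


# Hypothetical URL used only for demonstration
sample_url = "https://example.com/static/img/photo.jpg?size=large#top"
parts = urlsplit(sample_url)
print(parts.netloc)                   # host part of the URL
print(parts.path)                     # path part, without query or fragment
print(filename_from_url(sample_url))  # file name derived from the path
```

Indexing the result of urlsplit with [2], as the script does, is equivalent to reading its path attribute.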

You can find the following code in the exif_images_web_page.py file in the exiftags folder.

This script contains the functions that find the images on a web page with BeautifulSoup and the lxml parser, and download them into an images folder:

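# Imports used by the functions in this script
import optparse
import os
from os.path import basename
from urllib.parse import urlsplit
from urllib.request import urlopen

import requests
from bs4 import BeautifulSoup

def findImages(url):
    # Fetch the page and return every <img> tag found in it
    print('[+] Finding images on ' + url)
    urlContent = requests.get(url).text
    soup = BeautifulSoup(urlContent, 'lxml')
    imgTags = soup.find_all('img')
    return imgTags

def downloadImage(imgTag):
    # Save the image referenced by the tag's src attribute into images/
    try:
        imgSrc = imgTag['src']
        print('[+] Downloading in images directory...' + imgSrc)
        imgContent = urlopen(imgSrc).read()
        # Use the last component of the URL path as the local file name
        imgFileName = basename(urlsplit(imgSrc)[2])
        os.makedirs('images', exist_ok=True)
        with open('images/' + imgFileName, 'wb') as imgFile:
            imgFile.write(imgContent)
        return imgFileName
    except Exception as e:
        # A missing src, a relative URL, or a network error ends up here
        print(e)
        return ''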
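
Note that downloadImage passes the src value straight to urlopen, so it only succeeds when the attribute holds an absolute URL. One possible way to handle relative paths is to resolve each src against the page URL with urljoin. The sketch below is an assumption, not the book's code: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without third-party packages, and the names ImgSrcCollector and absolute_image_urls are hypothetical:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:  # skip <img> tags that have no src attribute
            self.sources.append(urljoin(self.base_url, src))


def absolute_image_urls(html, base_url):
    """Return the absolute URLs of all images referenced in the HTML text."""
    collector = ImgSrcCollector(base_url)
    collector.feed(html)
    return collector.sources


# Hypothetical page content and address used only for demonstration
sample = '<img src="/img/a.jpg"><img alt="no source"><img src="b.png">'
print(absolute_image_urls(sample, "https://example.com/gallery/index.html"))
```

The same urljoin call could be applied to imgTag['src'] in the BeautifulSoup version, provided the page URL is made available to the download function.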

This is the function that extracts metadata from the images inside the images directory:

def printMetadata():
    print("Extracting metadata from images in images directory.........")
    for dirpath, dirnames, files in os.walk("images"):
        for name in files:
            imgPath = os.path.join(dirpath, name)
            print("[+] Metadata for file: %s " % imgPath)
            try:
                exif = get_exif_metadata(imgPath)
                for metadata in exif:
                    print("Metadata: %s - Value: %s " % (metadata, exif[metadata]))
            except Exception:
                # Report the error and continue with the next file
                import sys, traceback
                traceback.print_exc(file=sys.stdout)
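
The get_exif_metadata helper called above is not shown in this section. As a rough idea of what such a helper could look like, the following sketch assumes the Pillow library is installed and maps numeric Exif tag IDs to readable names. The function names decode_exif and get_exif_metadata_sketch are hypothetical, and the actual helper in the script may differ:

```python
def decode_exif(raw, tag_names):
    """Replace numeric tag IDs with names where known; keep unknown IDs as-is."""
    return {tag_names.get(tag_id, tag_id): value for tag_id, value in raw.items()}


def get_exif_metadata_sketch(path):
    """Read the Exif data of an image file and return it as a name -> value dict."""
    # Pillow is imported lazily so the rest of the sketch works without it
    from PIL import ExifTags, Image

    with Image.open(path) as img:
        raw = dict(img.getexif())
    return decode_exif(raw, ExifTags.TAGS)


# Demonstration with hand-written data: 271 and 306 are the standard
# Exif tag IDs for Make and DateTime; the values are made up
print(decode_exif({271: "Canon", 306: "2020:01:01 10:00:00"},
                  {271: "Make", 306: "DateTime"}))
```

Returning a plain dictionary keeps the helper compatible with the loop in printMetadata, which iterates over the keys and looks up each value.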

This is our main function, which takes a URL as a command-line parameter and calls the findImages(url), downloadImage(imgTag), and printMetadata() functions:

def main():
    parser = optparse.OptionParser('-u <target url>')
    parser.add_option('-u', dest='url', type='string', help='specify url address')
    (options, args) = parser.parse_args()
    url = options.url
    if url is None:
        print(parser.usage)
        exit(0)
    else:
        # Find and download the images, then extract their metadata
        imgTags = findImages(url)
        print(imgTags)
        for imgTag in imgTags:
            imgFileName = downloadImage(imgTag)
        printMetadata()
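
The script parses its arguments with optparse. For comparison, here is a minimal sketch of the same command-line interface written with argparse, a commonly used alternative from the standard library. The build_parser name, the long --url option, and the description text are assumptions added for illustration:

```python
import argparse


def build_parser():
    """Create a parser that requires the target URL via -u/--url."""
    parser = argparse.ArgumentParser(
        description="Download images from a web page and print their Exif metadata"
    )
    parser.add_argument("-u", "--url", required=True,
                        help="specify url address")
    return parser


# Parse an explicit argument list here; a real script would call
# parse_args() with no arguments to read sys.argv instead
args = build_parser().parse_args(["-u", "https://example.com"])
print(args.url)
```

Because the option is marked as required, argparse prints the usage message and exits on its own when the URL is missing, which would replace the manual check for None.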