Extracting metadata from web images

In this section, we build a script that connects to a website, downloads all the images referenced by the page, and then checks them for Exif metadata.

For this task, we use the urllib package from Python 3, which provides the parse and request modules. The page itself is fetched with requests and parsed with BeautifulSoup:

https://docs.python.org/3.0/library/urllib.parse.html

https://docs.python.org/3.0/library/urllib.request.html
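
As a quick illustration of how urlsplit from urllib.parse behaves, the following sketch splits an image URL into its components and takes the last path segment as a file name, which is the same approach the download function in this section relies on. The example URL is made up for illustration.

```python
from os.path import basename
from urllib.parse import urlsplit

# Hypothetical image URL, used only for illustration
img_url = 'http://www.example.com/static/img/photo.jpg?size=large'

parts = urlsplit(img_url)
# parts is a 5-tuple: (scheme, netloc, path, query, fragment)
print(parts.netloc)   # www.example.com
print(parts[2])       # /static/img/photo.jpg (index 2 is the path)

# basename() on the path gives a file name without the query string
file_name = basename(parts[2])
print(file_name)      # photo.jpg
```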

You can find the following code in the exif_images_web_page.py file in the exiftags folder.

This script contains the functions that find the images on a web page with BeautifulSoup and the lxml parser, and download them into an images folder:

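import os
from os.path import basename
from urllib.parse import urlsplit
from urllib.request import urlopen

import requests
from bs4 import BeautifulSoup

def findImages(url):
    # Fetch the page and return every <img> tag found in it
    print('[+] Finding images on ' + url)
    urlContent = requests.get(url).text
    soup = BeautifulSoup(urlContent, 'lxml')
    imgTags = soup.find_all('img')
    return imgTags

def downloadImage(imgTag):
    # Save the image referenced by the tag's src attribute into images/
    try:
        imgSrc = imgTag['src']
        print('[+] Downloading in images directory...' + imgSrc)
        imgContent = urlopen(imgSrc).read()
        # Use the last component of the URL path as the local file name
        imgFileName = basename(urlsplit(imgSrc)[2])
        os.makedirs('images', exist_ok=True)
        with open('images/' + imgFileName, 'wb') as imgFile:
            imgFile.write(imgContent)
        return imgFileName
    except Exception as e:
        # Missing src, relative URL, or network error: report and skip
        print(e)
        return ''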
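
Many pages reference their images with relative paths such as /img/logo.png, which urlopen cannot fetch on its own. One possible way to handle this, sketched here, is to resolve each src value against the page URL with urllib.parse.urljoin before downloading. The helper name resolveImageUrl and the example URLs are assumptions made for illustration; they are not part of exif_images_web_page.py.

```python
from urllib.parse import urljoin

def resolveImageUrl(pageUrl, imgSrc):
    # Hypothetical helper: turn a possibly relative src into an absolute URL.
    # urljoin leaves imgSrc unchanged when it is already absolute.
    return urljoin(pageUrl, imgSrc)

page = 'http://www.example.com/gallery/index.html'
print(resolveImageUrl(page, '/img/logo.png'))
print(resolveImageUrl(page, 'photos/cat.jpg'))
print(resolveImageUrl(page, 'http://cdn.example.org/a.jpg'))
```

The resolved URL could then be passed to urlopen in place of the raw src attribute.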

This is the function that extracts metadata from the images inside the images directory:

def printMetadata():
    print("Extracting metadata from images in images directory.........")
    for dirpath, dirnames, files in os.walk("images"):
        for name in files:
            print("[+] Metadata for file: %s " %(dirpath+os.path.sep+name))
            try:
                # get_exif_metadata() is not shown in this section
                exif = get_exif_metadata(dirpath+os.path.sep+name)
                for metadata in exif:
                    print("Metadata: %s - Value: %s " %(metadata, exif[metadata]))
            except Exception:
                # Print the full traceback and continue with the next file
                import sys, traceback
                traceback.print_exc(file=sys.stdout)
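
os.walk yields one (dirpath, dirnames, files) tuple for every directory under the starting point, so the full path of each file is dirpath joined with the file name. The following is a minimal sketch of how those paths could be collected up front while skipping files that are unlikely to carry Exif data. The helper name listImageFiles and the list of extensions are assumptions for illustration, not part of the original script.

```python
import os

# Assumed set of extensions that commonly carry Exif data
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.tif', '.tiff')

def listImageFiles(directory):
    # Hypothetical helper: return the full path of every image file
    # found under directory, including its subdirectories.
    paths = []
    for dirpath, dirnames, files in os.walk(directory):
        for name in files:
            # Compare case-insensitively so photo.JPG is also matched
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)

if __name__ == '__main__':
    for path in listImageFiles('images'):
        print('[+] Found image: %s' % path)
```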

This is our main function, which takes a URL from the command line and calls the findImages(url), downloadImage(imgTag), and printMetadata() functions:

import optparse

def main():
    parser = optparse.OptionParser('-u <target url>')
    parser.add_option('-u', dest='url', type='string', help='specify url address')
    (options, args) = parser.parse_args()
    url = options.url
    if url is None:
        print(parser.usage)
        exit(0)
    else:
        # Find and download the images, then extract their metadata
        imgTags = findImages(url)
        print(imgTags)
        for imgTag in imgTags:
            imgFileName = downloadImage(imgTag)
        printMetadata()

if __name__ == '__main__':
    main()
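
The standard library also offers argparse, which can express the same command-line interface. The following is a minimal sketch of that alternative, not the script's own code: the helper name parseArguments and the long option --url are assumptions introduced here for illustration.

```python
import argparse

def parseArguments(argv=None):
    # Hypothetical helper: build the parser and return the parsed options.
    # With argv=None, argparse reads the arguments from sys.argv.
    parser = argparse.ArgumentParser(
        description='Download the images of a web page and print their Exif metadata')
    # required=True makes argparse print the usage and exit when -u is missing
    parser.add_argument('-u', '--url', dest='url', required=True,
                        help='specify url address')
    return parser.parse_args(argv)

# Passing an explicit list makes the parser easy to try out interactively
options = parseArguments(['-u', 'http://www.example.com'])
print(options.url)
```

Because the option is marked as required, the explicit check for a missing URL is no longer needed in the calling code.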
