Exploring image data

Let's begin by looking at the number of images included with each story. We'll run a value count and then plot the numbers:

dfc['img_count'].value_counts().to_frame('count') 

This should display an output similar to the following:
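To see the shape of that result without the original dataset, here is a minimal sketch on a small made-up frame. The name toy and its values are assumptions for illustration only; with the real data you would use dfc as above.

```python
import pandas as pd

# Hypothetical stand-in for dfc: one row per story, with its image count.
toy = pd.DataFrame({'img_count': [5, 5, 5, 3, 1, 0, 5, 2]})

# Tally how many stories have each image count, most frequent first.
counts = toy['img_count'].value_counts().to_frame('count')
print(counts)
```

The index of the resulting frame holds the distinct image counts, and the single count column holds how many stories fall under each.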

Now, let's plot that same information:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 6))
# Frequency of each image count, ordered by the count itself
y = dfc['img_count'].value_counts().sort_index()
x = y.index
ax.bar(x, y, color='k', align='center')
ax.set_title('Image Count Frequency', fontsize=16, y=1.01)
ax.set_xlim(-.5, 5.5)
ax.set_ylabel('Count')
ax.set_xlabel('Number of Images')

This code generates the following output:

Already, I'm surprised by the numbers. The vast majority of stories have five pictures in them, while stories with only one picture or none at all are quite rare.
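One way to put a number on "vast majority" is to look at shares rather than raw counts. The following is a minimal sketch on made-up data; toy is a hypothetical stand-in for dfc, so the printed proportions say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for dfc: one row per story, with its image count.
toy = pd.DataFrame({'img_count': [5, 5, 5, 3, 1, 0, 5, 2]})

# normalize=True returns the fraction of stories per image count
# instead of the raw tally; sort_index orders the rows by image count.
shares = toy['img_count'].value_counts(normalize=True).sort_index()
print(shares.round(3))
```

With the real data, replacing toy with dfc would show directly what fraction of stories carry five images.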

This suggests that people tend to share content with lots of images. Now, let's take a look at the most common colors in those images:

mci = dfc['main_hex'].value_counts().to_frame('count') 
 
mci 

This code generates the following output:

I don't know about you, but this isn't especially helpful, given that I can't read hex values as colors. We can, however, use a pandas feature called conditional formatting (available through DataFrame.style) to help us out:

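mci['color'] = ' '

def color_cells(x):
    # x is the 'color' column; its index holds the hex strings
    return ['background-color: ' + c for c in x.index]

# style.apply returns a Styler, so keep it as the last expression to render it
mci.style.apply(color_cells, subset=['color'], axis=0)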

The preceding code generates the following output:
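Since the styled table only renders inside a notebook, a minimal sketch on made-up data can make the mechanism concrete. The names toy, mci_toy, and style_colors are hypothetical, and the sketch assumes the hex strings carry a leading # so that they are valid CSS colors.

```python
import pandas as pd

# Hypothetical stand-in for dfc['main_hex'].
toy = pd.DataFrame({'main_hex': ['#ffffff', '#000000', '#ffffff', '#ff0000']})

# Same shape as mci: hex values in the index, a count column, an empty color column.
mci_toy = toy['main_hex'].value_counts().to_frame('count')
mci_toy['color'] = ' '

def color_cells(col):
    # col is the 'color' column; its index holds the hex strings.
    # Return one CSS declaration per cell, in row order.
    return ['background-color: ' + hex_value for hex_value in col.index]

def style_colors(frame):
    # Styler.apply returns a new Styler; the DataFrame itself is unchanged.
    return frame.style.apply(color_cells, subset=['color'], axis=0)

# Inspect the CSS that would be attached to each cell of the color column.
css = color_cells(mci_toy['color'])
print(css)
```

In a notebook, making style_colors(mci_toy) the last expression in a cell renders the table with each color cell filled in. Displaying mci_toy on its own shows the plain, unstyled frame.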
