152 ◾ Simple Statistical Methods for Software Engineering
A brilliant case in point is the video signature of Liu et al. [5]:
The explosive growth of information technology and digital content
industry stimulates various video applications over the Internet. Duplicate
detection and measurement is essential to identify the excessive content
duplication. There are approximately two or three duplicate videos among
the ten results on the first web page. Finding visually similar content is
the central theme in the area of content-based image retrieval; histogram
distributions of similar videos are with much likeness, while the dissimilar
ones are completely different. The video histogram is used to represent the
distributions of videos’ feature vectors in the feature space. This approach
is both efficient and effective for web video duplicate detection.
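The idea in the passage above, that near-duplicate videos have closely matching histograms while unrelated ones do not, can be illustrated with a minimal sketch. This is not the signature scheme of Liu et al. [5]: the scalar feature values, the eight equal-width bins, and the use of histogram intersection as the similarity measure are all assumptions made here for illustration.

```python
def normalized_histogram(values, bins=8, lo=0.0, hi=1.0):
    """Bin frequencies (summing to 1) for feature values assumed to lie in [lo, hi]."""
    if not values:
        raise ValueError("need at least one feature value")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so out-of-range values and v == hi land in an end bin.
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical histograms, 0 for non-overlapping ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Made-up scalar features for three hypothetical videos (assumption, not real data).
original = [0.10, 0.15, 0.40, 0.45, 0.80, 0.85]
re_encoded = [0.11, 0.16, 0.41, 0.44, 0.81, 0.86]  # near-duplicate of `original`
unrelated = [0.30, 0.32, 0.60, 0.62, 0.95, 0.97]

h_original = normalized_histogram(original)
sim_duplicate = histogram_intersection(h_original, normalized_histogram(re_encoded))
sim_unrelated = histogram_intersection(h_original, normalized_histogram(unrelated))
print(sim_duplicate, sim_unrelated)
```

In practice a detector would flag a pair as duplicates when the similarity exceeds some cutoff; the choice of cutoff is left open here because the quoted passage does not state one.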
Histogram Shapes
Histograms are empirical distributions (or density functions). They can be smoothed
by nonparametric methods, as is performed in machine intelligence algorithms.
Alternatively, they can be fitted to mathematical models.
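The two options can be contrasted in a short sketch. It assumes a Gaussian kernel density estimate as the nonparametric smoother and a normal distribution as the mathematical model; the sample values and the bandwidth of 0.4 are made up for illustration and carry no meaning beyond that.

```python
from statistics import NormalDist


def gaussian_kde(data, bandwidth):
    """Return a smoothed density: the average of Gaussian bumps centred on each point."""
    kernels = [NormalDist(mu=x, sigma=bandwidth) for x in data]

    def density(x):
        return sum(k.pdf(x) for k in kernels) / len(kernels)

    return density


# Made-up measurements, for illustration only.
sample = [2.1, 2.4, 2.9, 3.0, 3.2, 3.3, 3.8, 4.1, 4.4, 5.0]

smoothed = gaussian_kde(sample, bandwidth=0.4)  # nonparametric smoothing
fitted = NormalDist.from_samples(sample)        # parametric model fit

# Compare the two density estimates at a few points.
for x in (2.0, 3.0, 4.0, 5.0):
    print(f"x={x:.1f}  kde={smoothed(x):.3f}  normal={fitted.pdf(x):.3f}")
```

The smoothed curve follows local bumps in the data and depends on the chosen bandwidth, whereas the fitted model is summarized by two parameters (mean and standard deviation) and imposes a symmetric bell shape whether or not the data support it.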
BOX 10.3 DETECTING BRAIN TUMOR WITH HISTOGRAM
Brain cancer can be counted among the most deadly and intractable diseases.
Tumors may be embedded in regions of the brain forming more tumors too
small to detect using conventional imaging techniques. Malignant tumors
are typically called brain cancer. These tumors can spread outside of the brain.
Brain tumor detection is a serious issue in medical science. Imaging plays a
central role in the diagnosis and treatment planning of a brain tumor.
The image of the brain is acquired through MRI technique. If the histograms
of the images corresponding to the two halves of the brain are plotted,
a symmetry between the two histograms should be observed due to the symmetrical
nature of the brain along its central axis. On the other hand, if any
asymmetry is observed, the presence of the tumor is detected. After detection
of the presence of the tumor, thresholding can be done for segmentation of
the image. The differences of the two histograms are plotted and the peak of
the difference is chosen as the threshold point. Using this threshold point, the
whole image is converted into a binary image providing the boundary of the
tumor. The binary image is now cropped along the contour of the tumor to
calculate the physical dimension of the tumor. The whole of the work has
been implemented using MATLAB® 2010. (Kowar and Yadav [6])
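The steps described in Box 10.3 can be sketched on a toy image. This is not the MATLAB implementation of Kowar and Yadav [6]: the 4 × 6 image with eight intensity levels is invented, the image is assumed to split into halves down its middle column, and treating pixels at or above the threshold as tumor is an assumption, since the box does not state the direction of the comparison.

```python
def intensity_histogram(pixels, levels):
    """Count how many pixels fall on each intensity level 0..levels-1."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def split_halves(image):
    """Flatten the left and right halves of the image (split at the middle column)."""
    mid = len(image[0]) // 2
    left = [p for row in image for p in row[:mid]]
    right = [p for row in image for p in row[mid:]]
    return left, right


def asymmetry_threshold(image, levels):
    """Intensity level where the two half-histograms differ most, or None if identical."""
    left, right = split_halves(image)
    h_left = intensity_histogram(left, levels)
    h_right = intensity_histogram(right, levels)
    diff = [abs(a - b) for a, b in zip(h_left, h_right)]
    peak = max(diff)
    return diff.index(peak) if peak > 0 else None


def binarize(image, threshold):
    """Binary image: 1 where the pixel is at or above the threshold (assumed direction)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


# Invented toy "scan": a bright 2 x 2 blob (level 6) sits in the right half only.
scan = [
    [1, 2, 1, 1, 2, 1],
    [1, 2, 1, 6, 6, 1],
    [1, 2, 1, 6, 6, 1],
    [1, 2, 1, 1, 2, 1],
]

threshold = asymmetry_threshold(scan, levels=8)
mask = binarize(scan, threshold) if threshold is not None else None
area_in_pixels = sum(map(sum, mask)) if mask is not None else 0
print(threshold, area_in_pixels)
```

Converting the pixel count into a physical dimension would additionally require the scanner's pixel spacing, which is not given in the box.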