Chapter 2. Multimedia Basics

Introduction

One of the first issues facing the application developer working with rich media data is the sheer size of the objects—images, audio, video, mapping data, and documents (in binary or XML format). The second problem is that the application may need to store more than one version of the media. For example, in the case of images we may need replica images of different sizes and quality for different purposes. Media data can be stored in many different file types—data formats—that meet various user requirements, so decisions need to be made about replication and storage options. In this chapter we will start by looking at the characteristics of the rich media itself and how some of these characteristics are stored as metadata. Then we will look at the relationship between file format and compression, using images as the main example.

The richness of information stored in a media file, such as an image, provides many more opportunities for change. If we were dealing with a text string such as “Yosemite National Park,” the potential changes are strictly limited:

  • Change style (e.g., font, bold, or italics).

  • Change text (e.g., uppercase, lowercase, or initial capitals).

Also we would not store these different formats because transforming the data can be achieved simply and efficiently by using standard SQL functions. But if we were dealing with an image from Yosemite, we would need to use a format with as small a size as possible for delivery and display on the Web but a much larger file of dense information for printing. In the next section we look at the differences between multimedia data and normal “database” data.

What Is Different about Multimedia Data?

There are several challenges to using multimedia data:

  • The first challenge is size—a single color image can require 24 MB. In traditional SQL databases the data types used for storing data are of strictly limited size. The format might be very restricted as well (e.g., DATE). Even for string data the limit for VARCHAR2 is currently 32,767 bytes (in PL/SQL; only 4,000 bytes in SQL). This can cause difficulties for storing rich media within the database.

  • The second challenge is time—the “real-time nature” of media such as video, where the frame sequence matters. The components of a video have to be maintained in the right order even if they have been processed separately.

  • The third challenge is that the semantic nature of multimedia is much more complex, so data retrieval is much harder. With traditional structured data, we can search for a specific value in a straightforward manner using SQL and retrieve exact matches or use Boolean operators. Rich media can contain a great deal of information, but different people would describe that information in words in different ways. This makes indexing and retrieving rich media much more difficult and means we may be looking for near matches instead of exact matches.

Despite these challenges electronic multimedia can offer users a richer experience. Multimedia can engage the senses to inform, persuade, and entertain, but, used poorly, multimedia can annoy and confuse an audience. The reduction in hardware cost and the increase in hardware performance have enabled the assembly of large heterogeneous collections of raw image and video data. Initially, development has been driven almost entirely by individual applications, so that although an application “works,” its functionality is not transferable to any other domain. Examples of this type of development are interactive games and WWW encyclopaedias.

Multimedia refers to the combination of two or more media. For the Web this would include:

  • Spoken word

  • Text

  • Music

  • Audio

  • Images

  • Video

  • Animation

Therefore, we will need to deal with a range of multimedia file formats:

  • Specific to different media types—image, audio, video

    • Audio file formats such as .wav and .mp3

    • Video file formats such as .avi, .mov, and .mpg

  • Specific to different compression algorithms

    • MPEG, JPEG, LZW

Multimedia Metadata

Metadata literally means data about data. It has always been an essential component of databases. For example, in a relational database there is metadata relating to every table that stores its structure in the data dictionary. There is information about every user and their access rights to data within the database. Therefore, metadata is any data that is required to interpret other data as meaningful information, and it is an extremely important aspect of multimedia databases since it is used for retrieving and manipulating the data. It can be based on the interpretation of information held within the media, or alternatively, it can be based on the interpretation of multiple media and their relationships. In the case of multimedia, metadata deals with the content, structure, and semantics of the data. In a straightforward relational database we would have metadata about the table, associated views, and indexes. In a rich media database we would need metadata about each row of the table and certainly for every media object. For example, in Table 2.1 we can see an image and the metadata associated with it (a short sketch for reading this kind of metadata from a file follows the table).

Table 2.1. Example of Metadata

 

Metadata         Example of Metadata
File Name        Graduation.jpg
File Format      JFIF
Compression      JPEG
MIME Type        image/jpeg
Size             73,874 bytes
Image Height     360 pixels
Image Width      564 pixels
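
As a concrete illustration, the short sketch below reads the same kind of metadata shown in Table 2.1 directly from an image file. It assumes the third-party Pillow imaging library is installed and uses a hypothetical file name; it is a sketch, not part of any database API discussed later.

# Minimal sketch: extracting basic metadata (format, MIME type, size,
# dimensions) from an image file, assuming the Pillow library is available.
import os
from PIL import Image

path = "Graduation.jpg"   # hypothetical file, as in Table 2.1

with Image.open(path) as img:
    width, height = img.size
    print("File Name   :", os.path.basename(path))
    print("File Format :", img.format)                       # e.g., JPEG
    print("MIME Type   :", Image.MIME.get(img.format, "?"))  # e.g., image/jpeg
    print("Size (bytes):", os.path.getsize(path))
    print("Image Width :", width)
    print("Image Height:", height)
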

In the next sections we are going to look at digital data attributes in detail, such as the metadata shown in Table 2.1:

  • File format

  • Size

  • Compression

  • Ownership

  • Content

In terms of rich media databases we are virtually always dealing with the storage of digital data; even an image created with vector graphics, which consists of mathematical formulas or equations that represent lines and curves, would be stored in a digital file. We will start by using image data as an example of the general features of rich media. We are going to look first at general image properties, then at image features and resolution. We will digress to look at the way the Internet has influenced image-processing requirements before looking in more detail at compression. Then we will look at MIME types and metadata so that we can interpret the data shown in Table 2.1.

Image Data

We will nearly always be storing image data for later use by humans, so human capabilities and preferences have to be considered in the design of the system. Human vision is fast, has high resolution, and detects a large range of colors. Images used in information and communication technology (ICT) are now displayed on a wide range of hardware, such as:

  • Computer monitors

  • LCD screens

  • Mobile phones

  • PDAs

  • Printers

  • Data projectors

  • Plasma screens

  • Interactive TVs

These devices deal with digital images. However, many of the instruments that capture images are analog. These analog signals are continuously variable in their description of color, brightness, or loudness. Digital images result from a process called sampling, which results in the conversion of analog data into digital data. The resulting digital image, often called a raster graphic image, is a mosaiclike grid of picture elements known as pixels. In the process of sampling the image a measurement is taken at a given position that records the color and brightness of the image. A binary number holds the information for specifying a pixel.

History of the Pixel and Digital Images

The pixel first appeared in New Jersey in 1954 when mathematicians and engineers created the first computer graphic at Princeton’s Institute for Advanced Study.

This was the first instance of digital typography and required a computer the size of a Manhattan apartment to generate the digital image. This produced a primitive graphic—a small matrix of glowing vacuum tubes that was used to spell out letters in the first-ever instance of computer memory being mapped directly to dots in a display.

A number of technologies contributed to the development of digital images. Medical staff started working with images from X-ray machines from 1900 and ultrasound scanners were invented in the 1950s with the display of images on cathode ray tubes. At the same time printers were being used to display digital images. Peterson of Control Data Corporation (CDC) utilized a CDC 3200 computer and a “flying-spot” scanner to create a digital transposition/representation of da Vinci’s Mona Lisa in 1964. This can be seen at http://www.digitalmonalisa.com/. The production process took 14 hours to complete the image, which contained 100,000 pixels that were plotted using numerals, sometimes overprinted, to approximate the required density. In the thirtieth year of the pixel, the Macintosh arrived, the first commercially successful personal computer that used a graphical user interface that treated on-screen text as just another graphic.

Multimedia Data Acquisition

Digital media is created by a diverse range of processes, but there are common stages that are carried out, summarized as:

  • Capture the multimedia data (e.g., by exposing a photographic film).

  • Sample the analog signal at double the bandwidth and convert to a digital value (e.g., scanner).

  • Predict how much redundant data is present that can be removed.

  • Transform the raw data by compression to reduce size.

The more sampling that takes place, the more information is held about the image, and the more storage space the image requires. Analog sources could be photographic film, prints, sketches, etc. The whole sampling process can be carried out by an analog-to-digital converter (ADC), which could be hardware or software based.
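
The sketch below illustrates the sampling idea in miniature: a continuous signal is measured at regular intervals and each measurement is quantized to an integer. The 1 kHz tone and the 8-bit quantization are illustrative assumptions, not a description of any particular device.

# Minimal sketch of analog-to-digital conversion: sample a continuous
# signal at a fixed rate and quantize each measurement to an integer.
import math

def sample(signal, duration_s, rate_hz, bits=8):
    """Sample signal(t) for duration_s seconds at rate_hz samples per second,
    quantizing values (assumed to lie in -1..1) to an unsigned integer."""
    levels = 2 ** bits - 1
    samples = []
    for n in range(int(duration_s * rate_hz)):
        t = n / rate_hz                                      # time of this measurement
        samples.append(round((signal(t) + 1) / 2 * levels))  # quantize to 0..levels
    return samples

# A 1 kHz tone sampled at 8 kHz, comfortably above the Nyquist rate of 2 kHz.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
digital = sample(tone, duration_s=0.01, rate_hz=8000)
print(len(digital), "samples; first few:", digital[:5])
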

Consider a simple application such as the Family Picture Book described in Chapter 3. Users may want to store images at as high a resolution as possible to ensure no details are lost; later the images may need to be displayed on a website, where an efficiently compressed image is needed. If we want the image printed we will need a high-resolution image. In addition, media-specific applications have been developed that require data in their own formats, such as Adobe Photoshop. The diversity of media devices—digital cameras, inkjet printers, etc.—has led to a minefield of compatibility problems.

Multimedia Data Transformation

Manipulating multimedia involves some operations that would never arise in traditional applications. Most of these operations can be summarized as concerned with

  • Manipulation (editing and modifying data)

  • Presentation

  • Analysis (indexing and searching)

Table 2.2 shows the range of operations supported by different media types in digital form, grouped by the concepts of manipulation, presentation, and analysis. Many of these operations could in theory be carried out within the database as an alternative to retrieving the media data from the database and then manipulating it with specialist software. A number of these operations are currently supported by Oracle interMedia; the operations connected with presentation tend not to be supported, for obvious reasons.

Table 2.2. Multimedia Operations

Manipulation

  • Text: character manipulation, string manipulation, editing

  • Audio: sample manipulation, waveform manipulation, audio editing

  • Image: geometric manipulation, pixel operations, filtering

  • Animation: primitive editing, structural editing

  • Video: frame manipulation, pixel operations

Presentation

  • Text: formatting, encryption, rendering

  • Audio: synchronization, compression, mixing, conversion

  • Image: compositing, compression, sorting

  • Animation: synchronization, compression, conversion

  • Video: synchronization, compression, video effects, conversion

Analysis

  • Text: indexing, searching

  • Audio: indexing, searching

  • Image: indexing, searching

  • Animation: indexing, searching

  • Video: indexing, searching

Multimedia data introduces different kinds of relationships between data items. For example, the relationships between the data items may be both spatial and temporal. Temporal relationships describe

  • When an object should be presented.

  • How long an object is presented.

  • How one object presentation relates to others (audio with video).

Image data can be stored in many different file types known as formats. These formats allow the user to prepare and store the data in specific ways for future use in specialized applications. In Table 2.3 we can see a brief comparison of common file formats, some of which are described in more detail later. This information can help us decide which format should be the main vehicle for the media storage.

Table 2.3. File Formats

File Type

Data Compression

DTP Use

Internet Use

Layers

Saved Selections

Saved Paths

Adobe Photoshop (.psd)

X

X

X

GIF (.gif)

X

X

X

PNG (.png)

X

X

JPEG (.jpg)

X

X

X

JPEG2000

 

 

X

 

Photoshop EPS (.eps)

X

X

X

X

PICT (.pct)

X

X

X

X

X

TIFF (.tiff)

X

Here are three common reasons for wanting to change media:

  • Resolution

  • Delivery

  • Compression

Resolution

Digital image resolution refers to the quantity of visible information and the number of colors present. The quality of a digital image is described by two independent measurements:

  • The pixel dimensions (e.g., 640 × 480 pixels).

  • The color depth, from 1 bit to 48 bits; most devices operate with either 8 bits (a grayscale of 256 tones) or 24 bits (about 16 million colors).

Pixels are approximately square in shape, and a grid of pixels is called a bitmap. Digital images are therefore bitmapped to square or rectangular shapes. The exact position of each pixel in the grid can be mapped using x (horizontal) and y (vertical) coordinates. This gives the specific address of each pixel. Black-and-white images, which are known as grayscale, are created from a limited palette of 256 tones that range from black (0) to white (255). These files are much smaller than red, green, and blue (RGB) images.

We have said that we may need to store images of different quality. Resolution is a way of assessing image quality but it can be measured in different ways by different technologies. One way of comparing this quality in digital images is pixel count resolution. The higher the resolution, the more pixels in the image. Higher resolution allows for more detail and subtle color transitions in an image. A printed image that has a low resolution may look pixelated or made up of small squares, with jagged edges and without smoothness. Image spatial resolution refers to the spacing of pixels in an image within a physical measurement range and is measured in pixels per inch (ppi), sometimes called dots per inch (dpi). Therefore it is not a property of the image itself and is variable—it only becomes fixed when the image takes physical form (e.g., when it is printed). Size versus resolution is often an issue of compromise. In Figure 2.1(a) it is possible to see the individual pixels that give the image a jagged appearance, while Figure 2.1(b) would be the normal resolution.


Figure 2.1. Resolution and pixels.

Monitors have a smaller dynamic range than printers and are not able to display the deepest tones. The term dynamic range refers to the variations that can exist between the brightest and darkest parts of an image. Analog devices—cameras with photographic film, the human visual system—produce a much wider dynamic range than most digital scanners and display devices. The spatial resolution of a monitor is usually 72 to 75 dots per inch (dpi). The pixel dimension of a monitor is also often quoted as the number of horizontal pixels (x dimension) by the number of vertical pixels (y dimension), for example, 1152 × 870. LCD screens have less dynamic range than cathode ray tube (CRT) monitors. A letter-size, 300-dpi RGB image will be about 24 MB, while a 200-dpi image will require about 10 MB of storage. The digital image file contains three color values for every RGB pixel, or location, in the image grid of rows and columns. The data is also organized in the file in rows and columns. File formats vary, but the beginning of the file contains numbers specifying the number of rows and columns (which is the image size, like 800 × 600 pixels), and this is followed by huge strings of data representing the RGB color of every pixel.
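
The storage figures quoted above follow directly from the pixel dimensions; the small sketch below reproduces the arithmetic for an uncompressed letter-size RGB image.

# Worked version of the storage estimate: pixels = (inches x dpi) in each
# direction, and an uncompressed RGB image needs 3 bytes per pixel.
def rgb_image_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bytes_per_pixel

for dpi in (300, 200):
    size_mb = rgb_image_bytes(8.5, 11, dpi) / (1024 * 1024)
    print(f"{dpi} dpi letter-size RGB image: about {size_mb:.0f} MB")
# Prints roughly 24 MB at 300 dpi and 11 MB at 200 dpi.
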

Color Depth

The computer monitor is the oldest device for displaying digital images. Therefore, it was an important technology for setting the original standards and methods for digital images. The computer monitor is based on a cathode ray tube whose flat face is coated with phosphors that glow and produce light of different colors when hit by an electron beam. Three color phosphors are present: red, green, and blue (RGB). These are arranged in triads—clusters of three—with each triad corresponding to a pixel on the screen. The effect of the color is to change the pixel from the original black. Each pixel’s color sample has three numerical RGB components to represent the color. These three RGB components are three 8-bit numbers for each pixel. Three 8-bit bytes (one byte for each of RGB) are called 24-bit color. If all three values are at their maximum (255, 255, 255), the result is white; equal values lower down the scale give shades of gray. 24-bit RGB color images use 3 bytes of storage per pixel and can have 256 shades of red, 256 shades of green, and 256 shades of blue. This is 256 × 256 × 256 = 16.7 million possible combinations, or colors, for 24-bit RGB color images. Screen images are seen as transmitted light (radiating from the surface). This is completely different from a printed image, which is reflected light from the surface. The visible color gamut, which is the range the eye can see, includes many more colors than the RGB gamut.

In the SI_StillImage set of objects described in Chapter 8 there is an object type, SI_Color, that encapsulates the color values of an image in terms of its RGB values as integers in the range 0 to 255.

Image Channels

Each separate pixel in the RGB color image derives its color from a combination of three separate values. The separate colors are often called channels, each with a brightness value from 0 to 255. At present the 24-bit RGB is the most common image type but 30-, 36-, and 48-bit color are available.

Using individual image channels with a large image file could be an important option if the main image is too large to be processed in memory. As pixel dimensions are set when a digital image is captured, enlarging an image means adding new pixels to the mosaic. This process is known as interpolation or resampling. It makes the pixel grid larger but does not add any detail. New pixels that are added are assigned a color value that is derived from their nearest neighbors. This can result in a loss of sharpness and produce a jagged-edge effect. Reducing an image is the opposite and means taking pixels away. Cropping is an operation that reduces image size, and it is beneficial as it makes for efficient storage. In this way unnecessary background information can be removed from the image.
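
A minimal sketch of the nearest-neighbor idea follows: each pixel of the resized grid simply copies the color of the closest pixel in the source grid, which is why enlarging adds pixels but no new detail. The tiny 2 × 2 test image is purely illustrative.

# Minimal sketch of resampling by nearest-neighbor interpolation.
def resize_nearest(pixels, new_w, new_h):
    """pixels is a list of rows, each row a list of (r, g, b) tuples."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]   # 2 x 2 test image
big = resize_nearest(tiny, 4, 4)          # enlarge to 4 x 4: no new detail, just copies
print(big[0])                             # first row: two red pixels, then two green
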

Contrast Correction

Contrast correction is another operation that is used to manipulate digital images. We could simply brighten an image by adding an adjustment to every pixel. Gamma is a term used for the extent of contrast in the midtone gray areas of an image. In Chapter 8 we cover the way interMedia object types include operators to carry out this kind of image processing, including cropping and gamma correction.
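
The difference between the two adjustments can be seen in a few lines of code: adding a constant shifts every tone equally, while a gamma curve mainly lifts or darkens the midtones. The gamma value of 1.8 below is just an illustrative choice.

# Minimal sketch contrasting a simple brightness shift with gamma correction
# on single 8-bit tone values (0 = black, 255 = white).
def brighten(value, amount):
    return min(255, max(0, value + amount))

def gamma_correct(value, gamma):
    return round(255 * (value / 255) ** (1 / gamma))

for tone in (0, 64, 128, 192, 255):
    print(tone, "->", brighten(tone, 30), "(add 30)   ",
          gamma_correct(tone, 1.8), "(gamma 1.8)")
# Adding a constant moves every tone by the same amount (and would clip near
# white); gamma 1.8 leaves black and white unchanged and lifts the midtones.
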

Dithering

A problem arose with displaying images on the Web because the originator has no idea what hardware/software combination a user will be employing. There is a standard system for color reproduction on the Internet but Mac and PC operating systems use two different sets of 256 colors with only 216 common values. The 216 colors form the browser-safe palette. If there are unsafe colors in the image they can be converted to an approximate value by a process called dithering, which breaks up complex colors into a pattern of dots but can create disappointing results with bright colors.
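
As a small illustration of the browser-safe palette, the 216 “safe” colors use only the six channel values 0, 51, 102, 153, 204, and 255. The sketch below simply snaps each channel to the nearest safe value; true dithering would additionally scatter the rounding error into a pattern of dots, which is not shown here.

# Minimal sketch: snap an RGB color to the nearest browser-safe value.
SAFE_STEPS = (0, 51, 102, 153, 204, 255)   # 6 values per channel -> 216 colors

def to_web_safe(r, g, b):
    snap = lambda v: min(SAFE_STEPS, key=lambda s: abs(s - v))
    return snap(r), snap(g), snap(b)

print(to_web_safe(200, 120, 40))   # -> (204, 102, 51)
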

Layers

Layers have become increasingly important for some media applications for animated and digital images. Layers are individual overlay cells in an animated sequence. Different source images can be floated over each other, combined, and set down as required. A designer may want to store an image in layers so that he can easily backtrack from a change and make fine alterations to the image until the desired effect is achieved. Some image software, such as Photoshop, permits the use of layers. Digital image creators find it easier to maintain and manage images stored as layers. As well as original developers, third-party developers produce software plug-ins (see later) that add extra features and functions.

Delivery

Internet and ICT Applications

Most Web pages do not consist of a single file, but often contain embedded images and graphics in specific formats. The Web browser needs to know what to do with these specialist files if the Web page is to be displayed correctly. This may involve the use of a plug-in, which we can look upon as a helper application that can deal with files of a specified format. The plug-in tends to be deployed either

  • Incorporated within the browser so that the media data will be displayed in the same browser window (e.g., SVG graphic image), or

  • Called so that the media data is displayed in a separate window opened by the plug-in application with a separate set of controls (e.g., Real Player).

For example, IE5 supports GIF, PNG, and JPEG (see Table 2.3), while SVG, Acrobat, Shockwave, Flash graphics, and CGM require plug-ins. The current version of Firefox supports SVG without a plug-in. Currently WBMP is not that widely supported because it is a very limited file format.

If the digital data is then used in an ICT system, other processes are needed to send the media data as packets across a network, deal with packet loss, reorder packets, receive media data, restore redundant data, and display data using required output technology.

These processes may involve several transformations of the data. It is worth noting the bandwidth issue. We already appreciate that the accuracy of the digital representation of the analog original depends on the rate at which it is sampled. The term bandwidth originally meant the difference between the minimum and maximum frequencies in the analog signal. If the signal varies a lot then we need to take more samples—music is much more variable than human speech. Nyquist’s theorem tells us the signal must be sampled at double the bandwidth. Bandwidth will influence factors such as buffer size and real-time delay.

MIME Types

MIME types are very important in terms of any transmission of files across the Internet. MIME stands for Multipurpose Internet Mail Extensions, a protocol that was developed in the early 1990s. It was devised to provide a way of specifying and describing the format of Internet messages. The original MIME Request for Comments (RFC) defined a message representation protocol that specified considerable detail about message headers, but which left the message content, or message body, as flat ASCII text. This was then amended by later RFCs to specify the format of the message body so as to allow multipart textual and nontextual message bodies to be represented and exchanged without loss of information. In particular, this was designed to provide facilities to include multiple objects in a single message, to represent body text in character sets other than US-ASCII, to represent formatted multifont text messages, and to represent nontextual material such as images and audio fragments. Generally the intention was to facilitate later extensions for new types of Internet mail for use by cooperating mail agents. As explained previously, a browser will look for a plug-in to display the component files of a Web page, especially the rich media files. This is where the MIME type is vital, as this information is provided by the HTTP protocol, as illustrated by the following fragment of HTML:

<image width="600" height = "400" data ="mygraphic.svg"
type="image/svg+xml">
<img src=-"mygraphic.png" width ="600" height = "400"/>
</image>

The type attribute gives the MIME type of the document requested. The MIME type consists of a major type and a minor type: a particular MIME type is a pair of elements delimited by a slash (/). The first element describes the “type” of data, for example, text or image. Examples include, but are not limited to:

  • Application

  • Audio

  • Image

  • Text

  • Video

The second element describes the format of the type such as:

  • msword

  • GIF

  • JPEG

WWW server applications come configured to handle most MIME types.

The browser looks at the MIME type and decides what to do. If the browser supports the MIME type it can download the document file and process it. If it does not support the MIME type, the browser can try to load the appropriate plug-in into memory and hand the data over to it. If the plug-in required in this instance is not available, it can look for the alternative file, “mygraphic.png.” The file extension can also be used by the browser to deduce the MIME type. When a file is downloaded from a Web server the MIME type is supplied automatically in the HTTP response headers. If the image file is located in the local file system the MIME type is not available. Plug-ins will have a number of file extensions registered, as shown in Table 2.4. The MIME standard was flexible enough that it could be easily incorporated into HTTP. Consequently, MIME provides the mechanism for seamlessly transferring nontext data to the browser.
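
Deducing a MIME type from a file extension can be illustrated with Python’s standard mimetypes module, which holds a registry similar in spirit to Table 2.4 (the exact mappings depend on the platform, so this is only a sketch).

# Minimal sketch: guess a MIME type from a file extension, the same kind of
# fallback a browser uses for local files.
import mimetypes

for name in ("mygraphic.png", "graduation.jpg", "video01.rm"):
    mime, _ = mimetypes.guess_type(name)
    if mime is None:
        print(f"{name}: extension not registered")
    else:
        major, _, minor = mime.partition("/")
        print(f"{name}: {mime} (major type {major}, minor type {minor})")
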

Table 2.4. MIME Type Plug-ins

MIME Type                      Extensions          Plug-in
application/asf                .asf                Media Player
application/pdf                .pdf                Acrobat
application/windowsmedia       .wma                Media Player
application/x-pn-realaudio     .ram, .rm, .rpm     RealPlayer
application/x-shockwaveflash   .spl, .swf          Flash
audio/mp3                      .mp3                Crescendo, Koan, Liquid MusicPlayer, Media Player
audio/wav                      .wav                Apple Quicktime, Beatnik, Liquid MusicPlayer, Media Player
image/bmp                      .bmp                Innomage, Prizm
image/jpeg                     .jpg, .jpeg         Innomage, Prizm, RapidVue
image/gif                      .gif                Innomage, Prizm
image/png                      .png, .ptng         Apple Quicktime, Innomage
image/tiff                     .tif, .tiff         Apple Quicktime, Innomage, Prizm, RapidVue
video/avi                      .avi                Apple Quicktime, Media Player, NET TOOB Stream
video/mpeg                     .mpe, .mpeg         Apple Quicktime, Media Player, NET TOOB Stream
video/x-pn-realvideo           .ram, .rm, .rpm     RealPlayer
x-world/x-svr                  .svr, .vrt, .xvr    Superscape e-Visualizer
x-world/x-vrt                  .svr, .vrt, .xvr    Superscape e-Visualizer
audio/x-pn-RealAudio           .ram                RealPlayer
video/mpg                      .mpg                RealPlayer, Media Player
video/mp4                      .mp4                RealPlayer, Media Player

The format indicates which plug-in is required for the registered format type.

When the browser (or e-mail application) receives a MIME type in the form application/msword it is expected to know what application to run (Microsoft Word) and handle it accordingly. Similarly, when your browser receives data of type text/html (the most common WWW MIME type), then the browser knows to interpret the incoming data as HTML and display it accordingly.

Compression

The most significant media transformations are those involving compression. The main issues associated with compression can be summarized as:

  • Remove any redundant data in the media, particularly before transmission.

  • Lossless compression can be fully restored to the original data.

  • Lossy compression will permanently lose data.

  • Lossy compression will reduce media size more than lossless compression.

Computer engineers are always trying to develop better compression methods but these methods may have different objectives, for example:

  • Reduce bandwidth

  • Ability to restore to original

  • Robustness

  • Scalability

  • Extensibility

Compression allows the image to be reduced in size for efficient transport and storage. JPEG, TIFF, and GIF files can be used to reduce storage size. There are hundreds of different ways of compressing images, called compression algorithms, which have been developed by computer scientists. Compression and decompression of video and audio are performed by codecs. The compression ratio is the number of bytes in the original image compared with the number of bytes after compression. The first stage is to estimate how much redundant data exists that could be removed and to select the best compression algorithm. Although there are many different ways in which this can be achieved, the principles are the same—removing information that is duplicated and abbreviating information whenever possible. For example, one way of compressing an image is by abbreviating any repeated information in the image and eliminating information that is difficult for the human eye to see. MP3 compression, for example, does not keep information about sound outside of what is audible by humans. Restoring the digital data is called decompression. If the data can be decompressed in such a way that none of the original information is lost, this is known as lossless compression. However, if some information is lost by the process, it is known as lossy compression.

Even lossy compression can be applied to images of text documents without any significant loss and may result in a much smaller binary object. However, because the object cannot be restored to the original, lossy compression may not be appropriate for some applications, such as medical imaging. Lossless compression techniques typically achieve ratios of 2:1 or 3:1 on medical images, whereas lossy compression can accomplish ratios of 10:1 to 80:1. In the case of text, compression could be achieved by building up a dictionary or look-up table in which frequently used words are mapped to a symbol; the file is then coded so that its size is very much reduced. This is called the dictionary method, and it can also be applied to image and audio files. Alternatively, a group of pixels with the same color values can be replaced by a single symbol, as shown in Table 2.5, which demonstrates run-length encoding (a short sketch of this encoding follows the table). Run-length encoding is based on long runs of identical symbols. LZW compression is more suitable for data files with similar sequences (e.g., graphics), while Huffman coding suits data sets where some symbols have much higher frequencies than others (e.g., text). Most images have sequential pixel patterns that occur frequently throughout the file, which makes them particularly suitable for LZW compression. Fortunately we will not be concerned with the details of how this is achieved; for image, audio, and video there are codecs (coder/decoders) that encode, decode, compress, and decompress all together. However, we will be concerned with the results of the process.

Table 2.5. Example of Run-length Encoding

Original Pixels                      Compressed Pixels
2211333334333222 (16 characters)     22 21 53 14 33 32 (i.e., two 2s, two 1s, five 3s, etc.) (12 characters)
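
The sketch below carries out the run-length encoding of Table 2.5 and checks that decoding restores the original exactly, which is what makes the scheme lossless.

# Minimal sketch of run-length encoding: each run of identical symbols is
# replaced by a (count, symbol) pair, and decoding reverses the process.
from itertools import groupby

def rle_encode(data):
    return [(len(list(group)), symbol) for symbol, group in groupby(data)]

def rle_decode(runs):
    return "".join(symbol * count for count, symbol in runs)

original = "2211333334333222"
encoded = rle_encode(original)
print(encoded)                           # [(2, '2'), (2, '1'), (5, '3'), (1, '4'), (3, '3'), (3, '2')]
assert rle_decode(encoded) == original   # lossless round trip
print("compression ratio:", len(original), ":", len(encoded) * 2)   # 16:12, as in Table 2.5
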

The amount of abbreviation can be significant. With voice coding, a codec system can send just the differences between adjacent speech samples, reducing the size by 50%. This can be very effective because, even if information were lost, humans would interpolate speech to make gaps comprehensible. However, because musical input is much more varied than speech, this can be more of a problem, as significant loss of information can be very noticeable. TIFF files are generally better at preserving information than the JPEG format. To some extent there is a minefield of incompatibility at present, with many applications requiring a specific format and compression. Therefore, there is a good reason for incorporating common software codecs within the Database Management System (DBMS).

Let’s look at some common file types and contrast the way they deal with compression and MIME type.

Computer graphics metafile (CGM) was the main vector drawing format available before SVG. It is an ISO standard for transmitting 2D-vector graphics data, widely used in engineering. Vector graphics images have a number of advantages for Web use: they are smaller and download faster, and they can be interacted with and zoomed without losing quality, unlike raster images. The main problem is that there are many different implementation profiles. One Web profile, WebCGM, has been agreed upon, based on the original profile for the air transport industry. interMedia does not have codecs for this format.

Graphic interchange format (GIF) can display 256 colors for Web use and is particularly useful as separate frames can be stored together to provide a simple animation. These files are produced with lossless compression processes. The format was based on LZW compression, which was subject to a patent held by Unisys, so creating a GIF encoder required the payment of a royalty fee. Therefore, although this is a very popular format, others have been developed that are free from patents. The image consists of a set of LZW-compressed blocks. GIF images can be interlaced; this means that the GIF image need not be transmitted row by row in order, but instead rows of the image are output so that the image is gradually built up on the screen. However, for images less than 10 Kbytes, interlacing will not improve performance. There are two specific variants of the GIF format, called 87a and 89a; interMedia reads both variants but writes the 87a variant. All GIF images are compressed using a GIF-specific LZW compression scheme, which interMedia calls GIFLZW. interMedia can read GIF images that include animations and transparency effects, but only the first frame of an animated GIF image is made available, and there is no support for writing animated GIF images. The MIME type is image/gif.

Encapsulated PostScript (EPS) image files are specialized for desktop publishing applications and can be used to create irregular shapes or cutouts. A related file format is DCS, desktop color separation. interMedia does not have codecs for this format.

The Joint Photographic Experts Group (JPEG) created the ISO standard in 1993. It was designed to transmit continuous-tone still images efficiently. JPEG users can trade quality against compression ratio, and the standard includes 29 different coding methods to achieve compression. JPEG is based on a 24-bit palette. It can be used to compress photographic images without causing them to posterize, a distortion that tends to make the image look less natural (i.e., more like a poster). JPEG uses a lossy compression process that works on blocks of 8 × 8 pixels. This produces a deterioration in image quality that is most noticeable in images with smooth gradient areas. Sequential storage processes can actually cause further deterioration. The main compression method is based on using a transform, which means the inverse process does not result in the same image. A process called quantization reduces the accuracy of some terms, reducing the information that needs to be transmitted. However, quantization can cause errors in the image, such as 2–3%, which in a photograph would not be noticed by the average viewer. The high compression ratios are achieved by the transform function picking out major features in the image distinct from the noise, and the quantizer reducing the accuracy of the less important values. This results in an image that is similar to but not identical with the original. Hard edges are likely to become fuzzy. For these reasons PNG is a viable alternative. The JPEG-compression format is very complex, but most images belong to a class called “baseline JPEG,” which is a much simpler subset. Oracle interMedia supports only baseline JPEG compression. The MIME type is image/jpeg.

JPEG-Progressive is a variation of the JPEG-compression format in which image scanlines are interlaced, or stored in several passes, all of which must be decoded to compute the complete image. This variant is intended to be used in low-bandwidth environments where users can watch the image take form as intermediate passes are decoded, and terminate the image display if desired. While the low-bandwidth requirement is not typically relevant anymore, this variant sometimes results in a smaller encoded image and is still popular. Oracle interMedia provides read, but not write, support for this encoding.

JPEG2000 is a new version of JPEG intended to provide an image coding system using state-of-the-art compression based on wavelets, especially for digital cameras, mobile phones, and medical imaging. It is ideal for processes where the main characteristics of the image are transmitted followed by successive refinements. This can appear similar to interlacing, but in this case fundamental parts of the image can be transformed and transmitted. A low-resolution image can be delivered to the user first, followed by successively higher resolution images. JPEG2000 also allows the user to add encrypted copyright information. Though JPEG2000 is a new format that is not yet widely supported by image-editing and Web-browsing applications, it can already be of major benefit to photographers who learn to incorporate it into their workflow routines. JPEG2000 offers

  • Completely lossless image compression.

  • Transparency preservation in images.

  • Use of masks (alpha channels) to specify an area of the image that should be saved at a lower rate of data compression (loss of image information) than other areas of the image that are of less interest to the viewer.

  • EXIF (exchangeable image file format) data preservation in images. EXIF is a standard for storing interchange information in image files, especially those using JPEG compression. Most digital cameras now use the EXIF format to support interoperability.

  • User options as to the size, quality, and number of image-preview thumbnails on a website.

PICT is Apple’s own high-quality compressed image file. The Macintosh PICT format was developed by Apple Computer, Inc., as part of the QuickDraw toolkit built into the Macintosh ROM. It provides the ability to “record” and “playback” QuickDraw sequences, including both vector and raster graphics painting. Oracle interMedia supports only the raster elements of PICT files. Both Packbits and JPEG-compressed PICT images are supported. The MIME type is image/pict.

The portable network graphics (PNG) format was the result of an industry-based working group seeking an alternative to GIF that would not be bound by patents. It is aimed at the transmission of computer graphics images but improves on GIF, including better support for color, transparency, and interlacing. It combines the sharpness of GIF with the subtle color reproduction of JPEG and provides lossless compression of natural images. File sizes are larger than JPEG for 24-bit and 8-bit images, but the PNG process involves massaging the image to achieve as much compression as possible. PNG refers to samples of a particular type (e.g., green) as a channel. PNG images can be used for progressive display: the original image is converted to a sequence of smaller images so that the first image in the sequence defines a coarse view and the last image completes the original source image. The set of reduced images is called an interlaced PNG image. This allows the image to be seen more quickly but may make it more difficult to compress. PNGF is the Oracle interMedia designation for the portable network graphics (PNG) format. All PNG images are compressed using the DEFLATE scheme. The MIME type is image/png.

PSD, PSP, etc., are proprietary formats used by graphics programs. Adobe Photoshop’s files have the PSD extension, while PaintShop Pro files use PSP. These are the preferred working formats for editing images in the software, because only the proprietary formats retain all the editing power of the programs. These packages use layers, for example, to build complex images, and layer information may be lost in nonproprietary formats such as TIFF and JPEG. However, proprietary formats are not wise choices for long-term storage, as you may not be able to view the image in a few years when the software has changed. Therefore, images should also be saved as standard TIFF or JPEG.

RAW is an image output option available on some digital cameras. RAW images are extremely high-quality images that are not degraded by compression algorithms when recorded. However, they are not currently supported by most image-editing programs in their native format, so they must be converted before use. Though lossless, a RAW file is a factor of three or four smaller than a TIFF file of the same image. Even though the TIFF file only retains 8 bits/channel of information, it will take up twice the storage space because it has three 8-bit color channels versus one 12-bit RAW channel. RAW is popular with professional photographers since it preserves the original color bit depth and image quality and saves storage space compared to TIFF. Some cameras offer nearly lossless compressed RAW. The disadvantage is that there is a different RAW format for each manufacturer, so you may have to use the manufacturer’s software to view the images.

RPIX, or Raw Pixel, is a format developed by Oracle for storing simple raw pixel data without compression and using a simple well-described header structure. It was designed to be used by applications whose native image format is not supported by interMedia but for which an external translation might be available. It flexibly supports N-banded image data (8 bits per sample) where N is less than 256 bands, and can handle data that is encoded in a variety of channel orders (such as RGB, BGR, BRG, and so forth); a variety of pixel orders (left-to-right and right-to-left); a variety of scanline orders (top-down or bottom-up); and a variety of band orders (band interleaved by pixel, scanline, and plane). The flexibility of the format includes a data offset capability, which can allow an RPIX header to be prepended to other image data, thus allowing the RPIX decoder to read an otherwise compliant image format. The extension is .rpx, and the MIME type is image/x-ora-rpix.

Tagged image file format (TIFF) is used for bit-mapped images and provides lossless compression. It is often used for printing. TIFF is a flexible and adaptable file format. It can handle multiple images and data in a single file through the inclusion of “tags” in the file header. Tags can indicate the basic geometry of the image, such as its size, or define how the image data is arranged and whether various options are used. Unlike standard JPEG, TIFF files can be edited and resaved without suffering a compression loss. Other TIFF file options include multiple layers or pages.

Oracle interMedia supports the “baseline TIFF” specification and also includes support for some TIFF “extensions,” including tiled images and certain compression formats not included as part of the baseline TIFF specification. “Planar” TIFF images are not supported. TIFF images in either big-endian or little-endian format can be read, but interMedia always writes big-endian TIFFs. (Note this refers to the type of computer system—in a big-endian system, the most significant value in the sequence is stored at the lowest address of the computer memory; in a little-endian system, the least significant value in the sequence is stored first. Most PCs are little endians.)

One final important difference between TIFF and most other image file formats is that TIFF defines support for multiple images in a single file. In Chapter 8 when we deal with image-processing methods of ORDImage we need to note that although the TIFF decoder in interMedia includes support for page selection using the “page” verb in the process() and processCopy() methods, the setProperties() method always returns the properties of the initial page in the file. It is important to note that this initial page is accessed by setting page=0 in the process command string. Oracle interMedia currently does not support writing multiple page TIFF files.

Wireless bitmap (WBMP) format is a simple image format used in the context of WAP and mobile phones. (WAP is an open international standard for applications that use wireless communication, e.g., Internet access from a mobile phone.) Currently, the only type of WBMP file defined is a simple black-and-white image file with one bit per pixel and no compression.

For a list of interMedia file formats see Oracle interMedia Reference 10g Release 2 (10.2).

Color Perception

In the human visual system cells called cones found at the center of the retina perceive three colors. There are about six million cones that can distinguish eight million colors. Color sensations arise because there are three types of color receptors:

  1. Blue cones peak at 445 nm.

  2. Green cones peak at 535 nm.

  3. Red cones peak at 570 nm.

The different cones cover three overlapping portions of the visual spectrum. The spectral sensitivity of human vision varies: eyes are most sensitive to the green range and least sensitive to the blue range. All the other colors are perceived through tristimulus color mixing, which is covered in more detail later in this chapter. The average person can distinguish one to two million colors. However, color perception is very individual, with about 8% of the population being color blind. It is impossible to consistently describe a particular color without using a color model. A color model is a representative system, and there are a number of different ones in use.

CIE_XYZ Color-space

CIE (Commission Internationale de l’Eclairage) created the CIE_XYZ color-space as early as 1931. The color vision of a group of people was tested and a model for human visual perception called the CIE Standard Observer was created based on those tests. The CIE_XYZ color-space was then created by combining

  • Well-known physical properties of light, and

  • The characteristics and restrictions/boundaries of the human visual perception according to the CIE Standard Observer.

In other words the CIE_XYZ color-space

  • Defines the light radiation exactly as it appears in real life.

  • Is weighted by the color-matching abilities of the average human eye.

  • Is restricted to the radiation spectrum that is visible for the average human eye.

  • Is a physical axis system that closely simulates the human visual perception.

For these reasons, CIE_XYZ is the fundamental basis of all color management, such as calibration, color-space conversions, and color matching. It is also the foundation for all other color-spaces. CIE is different from the RGB standard and needs to be converted by using a formula.

RGB Additive Color from Glowing Bodies, Lights, TVs, and Monitors

This color model describes colors as a mixture of red, green, and blue (RGB).

This is an additive color model, so adding all three colors at full intensity gives white. If we look at a computer screen with a magnifying glass we can see separate pixels of only one of these three colors. At a distance the human eye blends these primaries to produce all the different colors. Colors on monitors look brighter in a darkened room and vice versa.

  • RED + GREEN = YELLOW.

CMYK: Subtractive Color from Reflecting Objects—Color Printing

Most objects reflect light. A blue shirt will have absorbed the other light colors—they have been subtracted so blue is left. In a subtractive system, mixing two colors gives a darker color (e.g., painting and printing). CMYK derives from cyan, which is the complement of red; magenta, which is the complement of green; yellow, which is the complement of blue; and the K in CMYK confusingly stands for black because originally this was called the key color.

If we printed cyan and yellow on top of each other, the result would be green: cyan subtracts red and yellow subtracts blue, leaving green.

CYAN + YELLOW = GREEN.

If we printed cyan and magenta on top of each other, the result would be blue (cyan subtracts red and magenta subtracts green).

CYAN + MAGENTA = BLUE.

These are the standard inks used in the lithographic printing industry to reproduce color images. Photographic images are divided into four channels, one for each print ink color. This is why the printed image will look different from the screen display. The conversion of RGB images into CMYK involves the storage of extra data corresponding to the fourth channel, so the image file could increase by 25%.
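
As an illustration of the relationship between the two models, the sketch below applies the usual naive RGB-to-CMYK conversion formula; real print workflows use ICC profiles rather than this simple arithmetic.

# Minimal sketch of a naive RGB-to-CMYK conversion: the black (K) channel
# absorbs the darkness shared by all three inks.
def rgb_to_cmyk(r, g, b):
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:                              # pure black
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))    # bright red -> (0.0, 1.0, 1.0, 0.0): magenta + yellow
print(rgb_to_cmyk(0, 128, 0))    # mid green  -> roughly (1.0, 0.0, 1.0, 0.5)
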

HSB System (Hue, Saturation, and Brightness)

This is similar to traditional paint-based systems for describing color; a small conversion sketch follows the list below.

  • Hue—the name of the color (e.g., red). The numeric value for hue is given in degrees around a circle, so that, for example, 120 is green and 240 is blue, while red is zero.

  • Saturation—the purity of the color (e.g., pure green, hue 120, is fully saturated, while pale green contains a lot of white so it has low saturation). White, black, and gray all have zero saturation. Numeric values vary from 0 to 100%, with 100% for pure colors.

  • Brightness—white has the highest brightness, black the lowest. Again there is a numeric scale from 0 to 100%. As brightness decreases there is a smaller range in saturation values because humans are less able to sense differences in hue and saturation in darker colors. This leads to the system being presented as a cone.
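
A minimal conversion sketch follows, using Python’s standard colorsys module (which calls the model HSV rather than HSB); it reports hue in degrees and saturation and brightness as percentages.

# Minimal sketch: convert RGB values to hue/saturation/brightness terms.
import colorsys

def rgb_to_hsb(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)   # degrees, %, %

print(rgb_to_hsb(255, 0, 0))      # bright red -> (0, 100, 100)
print(rgb_to_hsb(0, 255, 0))      # pure green -> (120, 100, 100)
print(rgb_to_hsb(128, 255, 128))  # pale green -> (120, 50, 100): low saturation
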

RYB (Red, Yellow, and Blue)

This is the system used traditionally by artists to understand color and it is often presented as a color wheel:

  • Primary colors are red, yellow, and blue.

  • Secondary colors are made by mixing equal quantities of primary colors to give orange, violet, and green.

  • Tertiary colors result from mixing equal quantities of a primary color with a secondary color to give six colors: red-violet, blue-violet, yellow-green, yellow-orange, blue-green, and red-orange.

These 12 colors are the basis of color harmony design.

Comparing the Four Models

Not all colors can be expressed in all color systems. The designer needs to be aware that

  • RGB is hardware-oriented but is also used by CBIR methods (see Chapter 11).

  • CMYK is technology-oriented (i.e., printers). It is a system that cannot produce colors of high saturation and brightness.

  • HSB is an attempt to be more usable for users to describe the color they want.

  • HSB and RYB use a circular arrangement of colors that is useful for creating and analyzing color harmonies.

For example, a bright red on a screen would have values (255, 0, 0) in RGB and (0, 100, 100) in HSB, but will be outside the range for description by CMYK.

However, although we know roughly the frequencies of the red, green, and blue color receptors in the human visual system, we may not be sure what “red,” “green,” and “blue” mean for different computer hardware. What frequency should we consider to be pure red, pure green, and pure blue? A cheaper monitor with less range will generally use slightly different frequencies for red, green, and blue than a more expensive model. To get truly accurate color on devices that define their color differently, we need to refer to the ICC/ICM profiles that describe the characteristics of how an image or device reproduces color using primaries. The ICC/ICM profile is a set of measurements that describes the colors of a particular imaging device. It contains:

  • Transfer function of the device. It is either a measured or mathematical function by which the particular device codes the intensities, in the simplest form a single gamma value.

  • Trichromatic coordinates of the device. It describes the device’s color gamut in the CIE_XYZ color-space.

  • The white-point coordinates of the device. This describes the hue that is considered to be colorless (gray, achromatic).

Together these fully describe the colors that a digital imaging device uses and allow accurate color conversion from device to device. Color conversion is more of an issue with printers than with monitors.

Color gamut is another term used for a subset of the CIE_XYZ color-space; in other words, the color gamut describes the variety or range of colors that a device is able to reproduce. Another sense, less frequently used but no less correct, refers to the complete set of colors found within an image at a given time. In this context, digitizing a photograph, converting a digitized image to a different color space, or outputting a digitized image to a given medium using a certain output device generally alters its gamut, in the sense that some of the colors in the original are lost in the process. Note that the generic traditional photography and film term for the complete range of colors that can be captured on a particular medium is dynamic range.

Real-time Media

Most of the previous discussion has focused on image media. When we consider video and audio we need to deal with what is termed their real-time nature. This usually refers to the need for synchronization and order. Since digital images are specified by the coordinates of pixels, there is no need to be concerned about the order of delivery of the pixels to the display device. With audio and video samples, the order is crucial.

Seeing Video

Temporal and spatial attributes will be important. Other cells in the retina, called rods, react to light in 25 ms, significantly faster than cones. Our ability to fuse images into a continuum depends on the image size and brightness. Cinema displays at 24 frames per second (fps), while TV is refreshed at 60 Hz in the United States and 50 Hz in the United Kingdom.

Brightness is a subjective reaction to levels of light. It is affected by the luminance of an object, which in turn is a physical property, the result of the amount of light falling on an object and its reflective properties. Visual acuity increases with brightness but flicker also increases. The eye will perceive a light turned rapidly on and off as continuous as long as the speed of switching is more than 50 Hz, otherwise it will be seen to flicker.

Video can be processed to extract audiovisual features, such as

  • Image-based features

  • Motion-based features

  • Object detection and tracking

  • Speech recognition

  • Speaker identification

  • Word spotting

  • Audio classification

There is no single best format for delivering audio and video at present. The choice of format is related to the way the media is going to be delivered to the users. There are three alternatives:

  1. Download. The materials must be downloaded and can only be played once they are fully loaded. The media data remains stored on the user’s computer, available for replaying whenever they wish, sharing with others, or transferring to other devices.

  2. Progressive download. This is similar in all ways to download, except that the user’s download software is set up to start playing the materials when enough has been downloaded to ensure a continuous listening/viewing experience. The media is stored on the local computer.

  3. Streaming. The user does not actually download any materials to his or her computer; rather, the streaming software on his or her computer makes a semi-real-time connection to a streaming server, which sends a stream of “moving images” of compressed audio/video over the Internet that are displayed as they arrive. Once playing is completed, no materials are available on the user’s computer, so if he or she wishes to play the material again, he or she must repeat the streaming process. No media is stored on the local computer.

To take an example, the RealVideo delivery system can download or stream media. A RealVideo clip is a file or live broadcast containing sound and video encoded in RealVideo format. These formats are highly compressed to deliver good quality sound and video over a limited-bandwidth connection. The RealVideo system provides several formats that are optimized differently for different kinds of content. For example, you would use a different format to deliver speech over a 14.4 Kbps modem than you would to deliver a music video over an ISDN connection.

The .rm files can contain multiple streams, including audio, video, image maps, and events. If we want the user to download the file from the Web, we would insert into the HTML file a reference to a video file for format type .rm:

<a href="video01.rm">Click here to download video</a>

However, if we want the user to stream the same video, we actually need to create another file, a simple text file with a .ram extension. This is known as a metafile—a file that contains data about another file. The metafile is a simple text file that includes the URL of the media data source, such as

http://www.mycollege.com/media/video/video01.rm

We would then change the HTML reference to the video metafile with the extension .ram:

<a href="video01.ram">Click here to view streaming video</a>

When the user clicks this hyperlink, the metafile opens and in turn opens the video file at the specified URL. This time the video is streamed instead of downloaded. The process is transparent to the user; as far as he or she is concerned, the hyperlink simply starts the streaming.

There are two metafile formats, .ram and .rpm (RealPlayer Plug-in metafile). The .rpm metafile is the same as a .ram metafile but is used with the RealPlayer Plug-in for Netscape Navigator and Internet Explorer 3.0 and later:

  • .ram file—browser launches RealPlayer

  • .rpm file—browser launches RealPlayer Plug-in (see below)

For files with .rpm file extension (RealPlayer Plug-in), the Web server sets the MIME type of the file to audio/x-pn-RealAudio-plugin.

In order for RealMedia files to stream from the website, the user’s host server must recognize the MIME types associated with the .ra, .ram, .rm, and .rpm file extensions. The streaming process then works as follows:

  1. The Web server delivers the RealVideo metafile to the Web browser, setting the MIME type of the file based on the .ram file extension.

  2. The Web browser looks up the MIME type of the RealVideo metafile and, based on that MIME type, starts RealPlayer as a helper application and passes it the metafile.

  3. RealPlayer reads the first URL from the metafile and requests it from RealServer.

  4. RealServer begins streaming the requested RealVideo or RealAudio clip to RealPlayer.

A related concept is a container data type, such as the RealMedia File Format (RMFF) or ASF, that can contain other data types. Each container data type is identified by a unique MIME type. interMedia supports the RMFF data format for the file extension .rm with the MIME type video/x-pn-realvideo.
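As a rough sketch of how such a clip might sit in the database (the table and column names are hypothetical; the interMedia object types themselves are described in Chapter 3), the MIME type recorded for each stored clip can be read back from an ORDVideo column:

-- Hypothetical table holding video clips as interMedia ORDVideo objects.
CREATE TABLE video_clips (
  clip_id NUMBER PRIMARY KEY,
  clip    ORDSYS.ORDVideo
);

-- List the MIME type recorded for each clip, e.g., video/x-pn-realvideo for RMFF.
SELECT v.clip_id,
       v.clip.getMimeType() AS mime_type
FROM   video_clips v;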

Table 2.6 outlines common video file formats.

Table 2.6. Common Video File Formats

Video Format   File Extension   Originator     Streaming   Additional Player
AVI            .avi             Microsoft      Yes         No
QuickTime      .qt              Apple          Yes         No
MPEG-4         .mpg             MPEG           Yes         No
RealVideo      .rm              RealNetworks   Yes         Yes

Audio

There are two categories of audio: streaming and nonstreaming. In the case of nonstreaming, the entire file is downloaded and saved to disk before playing. In the case of streaming, the file is not saved to disk, and the advantage is that the user can listen while the data is being delivered. The formats for nonstreaming audio files include .wav, .au, and .midi. Formats for streaming audio include .ra and .mp3. However, streaming audio may not be played by the Web browser without a plug-in being available, such as RealAudio, and may require a helper application, such as RealPlayer. Compression is available for the MP3 and RealAudio formats.

Another issue with audio is that it is considered good practice to include a transcript of any spoken audio files so that users can read as an alternative to listening. This can be catered to in the ORDAudio object type (see Chapter 3).
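As a minimal sketch of this practice (the table and column names are hypothetical), a spoken-audio clip might be stored in an ORDAudio column alongside a CLOB column holding its transcript:

-- Hypothetical table pairing each spoken-audio clip with its transcript.
CREATE TABLE audio_clips (
  clip_id    NUMBER PRIMARY KEY,
  clip       ORDSYS.ORDAudio,   -- the audio object itself (see Chapter 3)
  transcript CLOB               -- full text of the spoken content
);

The transcript column can then be presented to users as an alternative to playback and, if required, indexed for text retrieval.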

Video and Audio Streaming

It has been possible to download and play back high-quality audio and video files from the Internet for some years. This was achieved by full-file transfer but it meant very long, unacceptable transfer times and playback latency. Ideally, video and audio should be streamed across the Internet from the server to the client in response to a client request for a Web page containing embedded videos. The client plays the incoming multimedia stream in real time as the data is received.

The most important video codec standards for streaming video are H.261 and MPEG-1, MPEG-2, and MPEG-4. Codecs designed for the Internet are more complex because of the problems of latency. In addition, the codecs must be tightly linked to network delivery software to achieve the highest possible frame rates and picture quality. None of the existing codecs are ideal for Internet video.

What Is Available?

H.261 was targeted at teleconferencing applications and is intended for carrying video over ISDN—in particular for face-to-face videophone applications and for videoconferencing. H.261 needs substantially less CPU power for real-time encoding than MPEG. However, because of the nature of the application it does not usually involve database storage so the format is not supported by interMedia.

MPEG-1, -2, and -4 are standards for the bandwidth-efficient transmission of video and audio. MPEG-1 does not offer resolution scalability and the video quality is highly susceptible to packet losses. MPEG-2 extends MPEG-1 by including support for higher-resolution video and increased audio capabilities. The targeted bit rate for MPEG-2 is 4–15 Mbits/sec, providing broadcast-quality full-screen video. For the same reasons as MPEG-1, it is also prone to poor video quality in the presence of packet losses. However, both MPEG-1 and MPEG-2 are well suited to the purposes for which they were originally developed. For example, MPEG-1 works well for playback from CD-ROM, and MPEG-2 is fine for high-quality archiving applications and for TV broadcast applications. However, for existing computer and Internet infrastructures, MPEG-based solutions are too expensive and require too much bandwidth; they were not designed with the Internet in mind. interMedia supports video MPEG formats—MPEG-1, MPEG-2, and MPEG-4—with the MIME types given in Table 2.4.

MPEG-7 has not yet been widely implemented, but it has been promoted as a standard whose objective is to provide a common interface for audiovisual content description in multimedia environments. This would provide interoperability between different MPEG-7 systems and modules. The standard is based on the previous MPEG standards but in addition includes the notions of descriptors (D) and description schemes (DS). The former (D) represents a model for specific high- or low-level features that can be annotated for a given media object. The latter (DS) represents a grouping of a series of descriptors or further description schemes in a particular functional area.

The definition of the MPEG-7 standard relies on other standards of the MPEG family and, heavily, on the XML language and XML Schema, which are used in its representation and definition. MPEG-7 itself is provided in the form of an extensible XML schema defining an object-oriented type hierarchy that delivers a set of predefined descriptors grouped into its functional description schemes. For example, the standard defines an Agent DS that can represent data for persons, groups, or organizations.

However, from a database viewpoint the MPEG-7 standard does not define how searching or indexing should be implemented on the media data. It also does not make any assumptions about the internal storage format. The terms used in the standard include the following:

  • FileFormat (MPEG-7 FileFormat CS or MIME)

    • System (MPEG-7 System CS)

    • Bandwidth (Hz)

    • BitRate (attributes: minimum, average, maximum)

  • VisualCoding

    • Format (MPEG-7 VisualCoding CS; attribute: color domain)

    • Pixel (attributes: resolution, aspectRatio, bitPer [accuracy])

    • Frame (attributes: height, width, aspectRatio, rate)

  • AudioCoding

    • Format (MPEG-7 AudioCoding CS)

  • AudioChannels

    • Sample (attributes: rate, bitPer)

    • Presentation (MPEG-7 AudioPresentation CS)

  • Classification DS

    • Form (recommend LC’s migfg)

    • Genre (recommend LC’s migfg)

    • Subject (recommend LCSH or other)

    • Language (also SubtitleLanguage, ClosedCaptionLanguage, etc.)

    • Release (country and date)

    • Target (market, age [e.g., audience, such as higher education])

  • RelatedMaterial DS

    • PublicationType (MPEG-7 PublicationType CS)

    • MaterialType (recommend Ruth Bogan list of MaterialTypes)

    • MediaLocator

    • MediaInformation

    • CreationInformation

    • UsageInformation

Some MPEG-7 implementations have been achieved; for example, Wust & Celma (2004) reported an MPEG-7 database for the content-based retrieval of music using Oracle 9. In addition, Fedora (flexible extensible digital object and repository architecture; http://www.fedora.info/) was created to implement the sharing and preservation of digital library objects using a profile of the METS metadata scheme. This has also been based on Oracle Database.

Another evolving standard is MPEG-21. As noted in ISO/IEC, “The vision for MPEG-21 is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities” (2001, p. 5).

Work on the new standard MPEG-21, “Multimedia Framework,” started in June 2000 with the aim of defining a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain. This open framework is intended to provide content creators, producers, distributors, and service providers with equal opportunities in an MPEG-21 enabled open market. It will also benefit content consumers by providing them with access to a large variety of content in an interoperable manner. MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction (the digital item) and the concept of users interacting with digital items. The digital items can be considered the “what” of the Multimedia Framework (e.g., a video collection, a music album), and the users can be considered the “who.” The goal of MPEG-21 can be rephrased as defining the technology needed to support users to exchange, access, consume, trade, and otherwise manipulate digital items in an efficient, transparent, and interoperable way.

However, MPEG-21 has even fewer implementations as yet, although one is in development at Los Alamos National Laboratory (Bekaert, 2004). It has been estimated from previous experience that it takes about ten years for a new MPEG standard to be fully adopted.

Despite the open standards of MPEG, most people use one of the big three proprietary formats: RealMedia, QuickTime, and Windows Media. All three have specific advantages that have allowed them to gain ground in the market, mainly because they are free and support the Real-Time Streaming Protocol (RTSP).

New solutions are appearing that use Java to eliminate the need to download and install plug-ins or players. Such an approach will become standard once the Java Media Player APIs being developed by Sun, Silicon Graphics, and Intel are available. This approach will also ensure client platform independence.

3GP Standards

3G stands for third generation, a generic wireless industry term for high-speed mobile data delivery over cellular networks. 3G networks allow users to send and receive bandwidth-intensive information, such as video, video conferencing, high-quality audio, and Web data on demand, virtually anytime and anyplace.

3GPP and 3GPP2 are the new worldwide standards for the creation, delivery, and playback of multimedia over third-generation, high-speed wireless networks. Defined by the 3rd Generation Partnership Project and 3rd Generation Partnership Project 2, respectively, these standards seek to provide uniform delivery of rich multimedia over newly evolved, broadband mobile networks (third-generation networks) to the latest multimedia-enabled wireless devices. 3GPP and 3GPP2 take advantage of MPEG-4, the standard for delivery of video and audio over the Internet, but are tailored to the unique requirements of mobile devices. These formats are supported by interMedia and the MIME types are audio/3gpp or video/3gpp. The extensions are as follows:

  • .3gp 3GPP standard, GSM network

    • Video: MPEG-4, H.263

    • Audio: AAC, AMR

  • .3g2 3GPP2 standard, CDMA2000 network

    • Video: MPEG-4, H.263

    • Audio: AAC, AMR, QCELP

What Is Metadata?

In the earlier sections of the chapter we introduced a number of different forms of metadata:

  • In database schemas and constraints, setting up value sets and domains.

  • In distributed DBMS, the location and distribution of data.

  • In multimedia databases, descriptive data about each stored media object (data about data).

  • On the Web, provenance (i.e., origin), quality, and integration of data.

We have already met several examples of metadata and its use, for example, the RealVideo metafiles needed to stream video. There are many uses of metadata, including:

  • Administrative: managing data collection process.

  • Descriptive: describing for retrieval purposes and creating indexes.

  • Preservation: managing data refreshing and migration.

  • Technical: used to describe a media object in a technical sense (formats, compression, scaling, encryption, authentication, and security).

  • Usage: users, users’ level and type of use, and user tracking.

Generating and Extracting Metadata

The generation of metadata can be achieved in a number of ways, for example:

  • Analysis of raw media data

  • Implicit metadata generation

  • Semi-automatic generation

  • Manual augmentation

The creation and management of metadata can become a complex issue when we wish to provide metadata for text and content-based retrieval. Unlike text documents, images make no attempt to tell us what they are about, and often they are used for purposes not anticipated by their originators. In terms of manual augmentation, it is very difficult to express in words what a work of art is about when it is based on a wordless medium. Annotating digital images with additional metadata is a common practice in photographic and news-gathering applications, in image archiving, and at the consumer level. However, metadata based on the manual addition of keywords to image objects is very time-consuming.

Multimedia objects may acquire layers of metadata as they move through their lifecycle, so when we design the structure of the metadata we need to consider how it will be updated. Where will the metadata be stored in the database? There is a range of ways in which metadata can be associated with the media object:

  • Contained within the same envelope as the media, for example, in the header of an image file or as part of the object definition in Oracle interMedia.

  • Bundled with the media object, for example, universal preservation format (UPF).

  • Attached to the information object through bidirectional pointers and hyperlinks.

  • Stored separately in a metadata registry, which is a special kind of data dictionary for metadata.
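As a minimal sketch of the last option, a metadata registry can be as simple as a separate table keyed by an identifier that points back to the media object (all names here are hypothetical):

-- Hypothetical metadata registry held separately from the media itself.
CREATE TABLE media_metadata_registry (
  media_id      NUMBER        NOT NULL,  -- identifies the stored media object
  element_name  VARCHAR2(80)  NOT NULL,  -- e.g., 'Title', 'Creator', 'Subject'
  element_value VARCHAR2(4000),
  CONSTRAINT media_metadata_pk PRIMARY KEY (media_id, element_name)
);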

Image Metadata

Storing metadata together with image data in the same container provides encapsulation. With encapsulation, both types of data can be shared and exchanged reliably as one unit. Metadata that is stored in the image file format is referred to as embedded metadata. Metadata can be stored in image files using a variety of mechanisms. Digital cameras and scanners automatically insert metadata into the images as they are created. Digital photograph processing applications like Adobe Photoshop allow users to add or edit metadata to be stored with the image.

For a large number of image file formats, Oracle interMedia can extract and manage a limited set of metadata attributes. These attributes include height, width, contentLength, fileFormat, contentFormat, compressionFormat, and MIME type. All these are included in the ORDImage data type described in Chapter 3.
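For example, if images were held in a hypothetical photos table with an ORDImage column named image, these attributes could be read back through the object's accessor methods (covered in Chapter 3):

-- Read back the attributes interMedia maintains for each stored image.
SELECT p.photo_id,
       p.image.getHeight()            AS height,
       p.image.getWidth()             AS width,
       p.image.getContentLength()     AS content_length,
       p.image.getFileFormat()        AS file_format,
       p.image.getCompressionFormat() AS compression_format,
       p.image.getMimeType()          AS mime_type
FROM   photos p;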

For a limited number of image file formats, interMedia can also extract a rich set of metadata attributes. This metadata is represented in schema-based XML documents. These XML documents can be stored in a database, indexed, searched, updated, and made available to applications using the standard mechanisms of Oracle database.

interMedia supports metadata embedding for the GIF, TIFF, and JPEG file formats using the methods described in Chapter 8. The application provides the metadata as a schema-based XML document. interMedia then processes the XML document and writes the metadata into the image file. The metadata must conform to the Adobe XMP format and may also be required to conform to one of the standards for the interoperability of systems, such as the data shown in Table 2.7.

Table 2.7. ISO/IEC 11179 Attributes

Attribute                Description
Name                     The label assigned to the data element
Identifier               The unique identifier assigned to the data element
Version                  The version of the data element
Registration authority   The entity authorized to register the data element
Language                 The language in which the data element is specified
Definition               A statement that clearly represents the concept and essential nature of the data element
Obligation               Indicates if the data element is required to always or sometimes be present (contains a value)
Data type                Indicates the type of data that can be represented in the value of the data element
Maximum occurrence       Indicates any limit to the repeatability of the data element
Comment                  A remark concerning the application of the data element

Resource Description Framework (RDF)

The World Wide Web Consortium is developing a standard for metadata, the Resource Description Framework (RDF). RDF allows multiple metadata schemes to be read by humans as well as parsed by machines. RDF is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web.

RDF will specify a framework for detailed descriptions of all kinds of objects stored on the Web, allowing search engines to identify relevant content with much greater precision than is at present possible. The specification allows users to define attribute types and values relevant to their own needs, with the objective of providing sufficient extensibility to meet a whole range of specialist needs. RDF uses XML to express structure, thereby allowing metadata communities to define the actual semantics.

Image Metadata Format

The term image metadata format refers to the standard protocols and techniques used to store image metadata within an image file. The embedded image metadata formats supported by interMedia are:

  • EXIF. The standard for image metadata storage for digital still cameras. It can be stored in TIFF, JPEG, and JPEG2000 format images. interMedia supports the extraction of EXIF metadata from TIFF, JPEG, and JPEG2000 file formats.

  • IPTC-IIM. The International Press Telecommunications Council—Information Interchange Model (IPTC-IIM) Version 4 is a standard developed jointly by the International Press Telecommunications Council and the Newspaper Association of America. This metadata standard is designed to capture information that is important to the activities of news gathering, reporting, and publishing. These information records are commonly referred to as IPTC tags. IPTC metadata can be stored in TIFF, JPEG, and JPEG2000 format images. The use of embedded IPTC tags in image file formats became widespread with the use of Adobe Photoshop’s tool for image editing. interMedia supports the extraction of IPTC metadata from TIFF, JPEG, and JPEG2000 file formats.

  • XMP. The extensible metadata platform (XMP) is a standard metadata format, developed by Adobe, for the creation, processing, and interchange of metadata in a variety of applications. XMP uses Resource Description Framework (RDF) technology for data modeling. XMP also defines how the data model is serialized (converted to a byte stream), and embedded within an image file. interMedia supports the extraction of XMP metadata from GIF, TIFF, JPEG, and JPEG2000 file formats. interMedia also supports writing XMP data packets into GIF, TIFF, JPEG, and JPEG2000 file formats.

Once metadata has been extracted from the binary image file, the next step is to represent the metadata in a form that can be easily stored, indexed, queried, updated, and presented. interMedia returns image metadata in XML documents. These documents are based on XML schemas that interMedia registers with the database. Each type of image metadata has a separate XML schema, and these schemas are used by the metadata methods of the ORDImage object type. The schemas are registered in Oracle Database when Oracle interMedia is installed and may be examined by querying the dictionary view ALL_XML_SCHEMAS.
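For example, a query along the following lines lists the registered schemas; the filter pattern simply matches the Oracle-supplied metadata namespace used in the XMP example later in this section:

-- List the metadata XML schemas registered in the database.
SELECT schema_url
FROM   all_xml_schemas
WHERE  schema_url LIKE '%ord/meta%';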

The following XML schemas are available:

  • XML schema for DICOM metadata

  • XML schema for EXIF metadata

  • XML schema for IPTC-IIM metadata

  • XML schema for ORDImage attributes

  • XML schema for XMP metadata

Users may store the returned metadata documents in metadata columns of type XMLType, which are bound to the corresponding metadata XML schemas that interMedia provides. An example of an XML instance document follows.

<xmpMetadata xmlns="http://xmlns.oracle.com/ord/meta/xmp"
    xsi:schemaLocation="http://xmlns.oracle.com/ord/meta/xmp
       http://xmlns.oracle.com/ord/meta/xmp"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-
ns#">
<rdf:Description about="" xmlns:dc="http://purl.org/dc/
elements/1.1/">
      <dc:title>A Winter Day</dc:title>
      <dc:creator>Frosty S. Man</dc:creator>
      <dc:date>21-Dec-2004</dc:date>
      <dc:description>a sleigh ride</dc:description>
      <dc:copyright>North Pole Inc.</dc:copyright>
    </rdf:Description>
  </rdf:RDF>
</xmpMetadata>

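Such a document can be held in an XMLType column bound to the corresponding interMedia schema. The sketch below reuses the hypothetical photos table introduced earlier, adding a metadata column bound to the XMP schema that interMedia registers at installation:

-- Hypothetical table combining the image object with a schema-bound metadata column.
CREATE TABLE photos (
  photo_id     NUMBER PRIMARY KEY,
  image        ORDSYS.ORDImage,
  xmp_metadata XMLTYPE
)
XMLTYPE COLUMN xmp_metadata
  XMLSCHEMA "http://xmlns.oracle.com/ord/meta/xmp"
  ELEMENT "xmpMetadata";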

There are specialized ORDImage methods available for dealing with image metadata. ORDImage has methods to extract metadata into schema-based XML documents and to write metadata from such documents back into the image file, as described in Chapter 8.
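A rough sketch of how extraction might look in PL/SQL follows. The method names and signatures are covered in Chapter 8; here we assume a getMetadata method that returns a collection (XMLSequenceType) of XMLType documents, and we continue to use the hypothetical photos table:

-- Extract the embedded XMP metadata from one image and store it in the
-- schema-bound XMLType column of the same (hypothetical) photos table.
DECLARE
  img     ORDSYS.ORDImage;
  meta    XMLSEQUENCETYPE;  -- collection of XMLType metadata documents
  xmp_doc XMLTYPE;
BEGIN
  SELECT p.image INTO img FROM photos p WHERE p.photo_id = 1;

  meta := img.getMetadata('XMP');       -- request just the XMP document

  IF meta IS NOT NULL AND meta.COUNT > 0 THEN
    xmp_doc := meta(1);
    UPDATE photos SET xmp_metadata = xmp_doc WHERE photo_id = 1;
  END IF;
END;
/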

Note that DICOM (Digital Imaging and Communications in Medicine), whose metadata schema is listed above, specifies a file format as well as a communication protocol.

Summary

In this chapter we have looked at the format, compression, and delivery requirements for rich media. It was clear that the standards for interoperability, quality, and metadata are vital. In the next chapter we will look at the options for storing rich media in a database.
