3.1.3 Input Devices
Raster images have to come from somewhere, and any image that wasn’t com-
puted by some algorithm has to have been measured by some raster input device,
most often a camera or scanner. Even in rendering images of 3D scenes, pho-
tographs are used constantly as texture maps (see Chapter 11). A raster input
device has to make a light measurement for each pixel, and (like output devices)
it is usually based on an array of sensors.
Figure 3.7. The operation of a digital camera.

A digital camera is an example of a 2D array input device. The image sensor
in a camera is a semiconductor device with a grid of light-sensitive pixels. Two
common types of arrays are known as CCDs (charge-coupled devices) and CMOS
(complementary metal–oxide–semiconductor) image sensors. The camera's lens
projects an image of the scene to be photographed onto the sensor, and then each
pixel measures the light energy falling on it, ultimately resulting in a number that
goes into the output image (Figure 3.7). In much the same way as color displays
use red, green, and blue subpixels, most color cameras work by using a color-filter
array or mosaic to allow each pixel to see only red, green, or blue light, leaving
the image processing software to fill in the missing values in a process known as
demosaicking (Figure 3.8).
Figure 3.8. Most color digital cameras use a color-filter array similar to the
Bayer mosaic shown here. Each pixel measures either red, green, or blue light.
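
To make demosaicking concrete, here is a minimal sketch of the simplest approach,
bilinear interpolation, assuming an RGGB Bayer layout and using NumPy/SciPy for
brevity. The layout, kernels, and function name are illustrative assumptions, not
the book's algorithm; real cameras use considerably more sophisticated methods.

    import numpy as np
    from scipy.ndimage import convolve

    def demosaic_bilinear(raw):
        """Reconstruct an RGB image from an RGGB Bayer mosaic by bilinearly
        interpolating each color channel's known samples."""
        h, w = raw.shape
        rows, cols = np.mgrid[0:h, 0:w]

        # Which pixels carry which color in an RGGB layout.
        r_mask = (rows % 2 == 0) & (cols % 2 == 0)
        b_mask = (rows % 2 == 1) & (cols % 2 == 1)
        g_mask = ~(r_mask | b_mask)

        # Kernels that average a pixel's known neighbors; dividing by the
        # convolved mask normalizes the weights, including at the borders.
        k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 4.0
        k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], float) / 4.0

        def interpolate(mask, kernel):
            known = np.where(mask, raw, 0.0)
            weights = convolve(mask.astype(float), kernel, mode="mirror")
            return convolve(known, kernel, mode="mirror") / weights

        return np.stack([interpolate(r_mask, k_rb),
                         interpolate(g_mask, k_g),
                         interpolate(b_mask, k_rb)], axis=-1)
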
Other cameras use three separate arrays, or three separate layers in the array, to
measure independent red, green, and blue values at each pixel, producing a usable
color image without further processing. The resolution of a camera is determined
by the fixed number of pixels in the array and is usually quoted using the total
count of pixels: a camera with an array of 3000 columns and 2000 rows produces
an image of resolution 3000 × 2000, which has 6 million pixels, and is called a
6 megapixel (MP) camera. (People who are selling cameras use "mega" to mean
10^6, not 2^20 as with megabytes.) It's important to remember that a mosaic
sensor does not measure a complete color image, so a camera that measures the
same number of pixels but with independent red, green, and blue measurements
records more information about the image than one with a mosaic sensor.
A flatbed scanner also measures red, green, and blue values for each of a grid
of pixels, but like a thermal dye transfer printer it uses a 1D array that sweeps
across the page being scanned, making many measurements per second. The
resolution across the page is fixed by the size of the array, and the resolution
along the page is determined by the frequency of measurements compared to
the speed at which the scan head moves. (The resolution of a scanner is
sometimes called its "optical resolution" since most scanners can produce
images of other resolutions, via built-in conversion.) A color scanner has a
3 × n_x array, where n_x is the number of pixels across the page, with the three
rows covered by red, green, and blue filters. With an appropriate delay between
the times at which the three colors are measured, this allows three independent
color measurements at each grid point. As with continuous-tone printers, the
resolution of scanners is reported in pixels per inch (ppi).
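
To make the along-page resolution concrete, here is a small worked example with
hypothetical numbers (the rates below are illustrative, not from the book):

    # Along-page resolution = measurement rate / scan-head speed.
    samples_per_second = 1200.0              # hypothetical rate of the 1D sensor array
    head_speed_inches_per_second = 2.0       # hypothetical scan-head speed
    print(samples_per_second / head_speed_inches_per_second)   # 600.0 ppi along the page
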
With this concrete information about where our images come from and where
they will go, we’ll now discuss images more abstractly, in the way we’ll use them
in graphics algorithms.
Figure 3.9. The operation of a flatbed scanner.
3.2 Images, Pixels, and Geometry
We know that a raster image is a big array of pixels, each of which stores informa-
tion about the color of the image at its grid point. We’ve seen what various output
devices do with images we send to them and how input devices derive them from
images formed by light in the physical world. But for computations in the com-
puter we need a convenient abstraction that is independent of the specifics of any
device, that we can use to reason about how to produce or interpret the values
stored in images.
When we measure or reproduce images, they take the form of two-dimensional
distributions of light energy: the light emitted from the monitor as a function of
position on the face of the display; the light falling on a camera's image sensor as
a function of position across the sensor's plane; the reflectance, or fraction of
light reflected (as opposed to absorbed), as a function of position on a piece of
paper. ("A pixel is not a little square!" —Alvy Ray Smith (A. R. Smith, 1995).)
So in the physical world, images are functions defined over two-dimensional
areas—almost always rectangles. So we can abstract an image as a function

    I(x, y) : R → V,

where R ⊂ ℝ^2 is a rectangular area and V is the set of possible pixel values.
(Are there any raster devices that are not rectangular?) The simplest case is an
idealized grayscale image where each point in the rectangle has just a brightness
(no color), and we can say V = ℝ^+ (the non-negative reals). An idealized color
image, with red, green, and blue values at each pixel, has V = (ℝ^+)^3. We'll
discuss other possibilities for V in the next section.
How does a raster image relate to this abstract notion of a continuous image?
Looking to the concrete examples, a pixel from a camera or scanner is a measure-
ment of the average color of the image over some small area around the pixel. A
display pixel, with its red, green, and blue subpixels, is designed so that the aver-
age color of the image over the face of the pixel is controlled by the corresponding
pixel value in the raster image. In both cases, the pixel value is a local average
of the color of the image, and it is called a point sample of the image. In other
words, when we find the value x in a pixel, it means "the value of the image in the
vicinity of this grid point is x." The idea of images as sampled representations of
functions is explored further in Chapter 9.
A mundane but important question is where the pixels are located in 2D space.
This is only a matter of convention, but establishing a consistent convention is
important! In this book, a raster image is indexed by the pair (i, j) indicating the
column (i) and row (j) of the pixel, counting from the bottom left. (In some APIs,
and many file formats, the rows of an image are organized top-to-bottom, so that
(0, 0) is at the top left. This is for historical reasons: the rows in analog
television transmission started from the top.) If an image has n_x columns and
n_y rows of pixels, the bottom-left pixel is (0, 0) and the top-right is pixel
(n_x - 1, n_y - 1). We need 2D real screen coordinates to specify pixel positions.
We will place the pixels' sample points at integer coordinates, as shown by the
4 × 3 screen in Figure 3.10.

Figure 3.10. Coordinates of a four pixel × three pixel screen. Note that in some
APIs the y-axis will point downwards.
The rectangular domain of the image has width n_x and height n_y and is centered
on this grid, meaning that it extends half a pixel beyond the last sample point on
each side. So the rectangular domain of an n_x × n_y image is

    R = [-0.5, n_x - 0.5] × [-0.5, n_y - 0.5].

(Some systems shift the coordinates by half a pixel to place the sample points
halfway between the integers but place the edges of the image at integers.)
Again, these coordinates are simply conventions, but they will be important
to remember later when implementing cameras and viewing transformations.
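
As a concrete illustration of this convention, here is a minimal sketch mapping
between pixel indices and the continuous coordinates above; the function names
are illustrative, not from the book.

    import math

    def pixel_center(i: int, j: int) -> tuple[float, float]:
        """Sample point of the pixel in column i, row j (counting from the bottom left)."""
        return (float(i), float(j))

    def image_domain(nx: int, ny: int) -> tuple[float, float, float, float]:
        """(xmin, xmax, ymin, ymax) of the continuous image rectangle R."""
        return (-0.5, nx - 0.5, -0.5, ny - 0.5)

    def pixel_containing(x: float, y: float) -> tuple[int, int]:
        """Index (i, j) of the pixel whose unit square contains the point (x, y)."""
        return (math.floor(x + 0.5), math.floor(y + 0.5))

    # For the 4 x 3 screen of Figure 3.10: the domain is [-0.5, 3.5] x [-0.5, 2.5],
    # and the point (3.2, 1.9) falls in the top-right pixel (3, 2).
    assert image_domain(4, 3) == (-0.5, 3.5, -0.5, 2.5)
    assert pixel_containing(3.2, 1.9) == (3, 2)
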
3.2.1 Pixel Values
So far we have described the values of pixels in terms of real numbers, represent-
ing intensity (possibly separately for red, green, and blue) at a point in the image.
This suggests that images should be arrays of floating-point numbers, with either
one (for grayscale, or black and white, images) or three (for RGB color images)
32-bit floating-point numbers stored per pixel. This format is sometimes used,
when its precision and range of values are needed, but images have a lot of pixels,
and memory and bandwidth for storing and transmitting images are invariably
scarce. Just one ten-megapixel photograph would consume about 115 MB of
RAM in this format. (Why 115 MB and not 120 MB?)
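
The arithmetic behind that figure also answers the parenthetical question: "mega"
in megapixel means 10^6, while the megabytes of RAM are counted in units of
2^20 bytes.

    pixels = 10_000_000           # ten megapixels
    bytes_total = pixels * 3 * 4  # three 32-bit floats (4 bytes each) per pixel = 120,000,000 bytes
    print(bytes_total / 2**20)    # ~114.4, i.e., about 115 MB rather than 120
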
Less range is required for images that are meant to be displayed directly.
While the range of possible light intensities is unbounded in principle, any given
device has a decidedly finite maximum intensity, so in many contexts it is per-
fectly sufficient for pixels to have a bounded range, usually taken to be [0, 1] for
simplicity. For instance, the possible values in an 8-bit image are 0, 1/255, 2/255,
..., 254/255, 1. (The denominator of 255, rather than 256, is awkward, but being
able to represent 0 and 1 exactly is important.) Images stored with floating-point
numbers, allowing a wide range of values, are often called high dynamic range
(HDR) images to distinguish them from fixed-range, or low dynamic range (LDR),
images that are stored with integers. See Chapter 23 for an in-depth discussion
of techniques and applications for high dynamic range images.
Here are some pixel formats with typical applications:

- 1-bit grayscale—text and other images where intermediate grays are not
  desired (high resolution required);
- 8-bit RGB fixed-range color (24 bits total per pixel)—web and email
  applications, consumer photographs;
- 8- or 10-bit fixed-range RGB (24–30 bits/pixel)—digital interfaces to
  computer displays;
- 12- to 14-bit fixed-range RGB (36–42 bits/pixel)—raw camera images for
  professional photography;
- 16-bit fixed-range RGB (48 bits/pixel)—professional photography and
  printing; intermediate format for image processing of fixed-range images;
- 16-bit fixed-range grayscale (16 bits/pixel)—radiology and medical
  imaging;
- 16-bit "half-precision" floating-point RGB—HDR images; intermediate
  format for real-time rendering;
- 32-bit floating-point RGB—general-purpose intermediate format for
  software rendering and processing of HDR images.

Reducing the number of bits used to store each pixel leads to two distinc-
tive types of artifacts, or artificially introduced flaws, in images. First, encoding
images with fixed-range values produces clipping when pixels that would other-
wise be brighter than the maximum value are set, or clipped, to the maximum
representable value. For instance, a photograph of a sunny scene may include re-
flections that are much brighter than white surfaces; these will be clipped (even if
they were measured by the camera) when the image is converted to a fixed range
to be displayed. Second, encoding images with limited precision leads to quan-
tization artifacts, or banding, when the need to round pixel values to the nearest
representable value introduces visible jumps in intensity or color. Banding can
be particularly insidious in animation and video, where the bands may not be
objectionable in still images but become very visible when they move back and
forth.
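
A minimal sketch of the conversion that produces both artifacts, assuming pixel
values normalized to [0, 1] and an 8-bit target (NumPy for brevity; the function
names are illustrative, not from the book):

    import numpy as np

    def to_8bit(values):
        """Clip to the representable range, then quantize to one of 256 levels."""
        clipped = np.clip(values, 0.0, 1.0)               # clipping: anything brighter than 1 saturates
        return np.round(clipped * 255).astype(np.uint8)   # quantization: rounding can cause banding

    def to_float(values8):
        """Map 8-bit values back to [0, 1]; 0 and 255 map exactly to 0.0 and 1.0."""
        return values8.astype(np.float32) / 255.0
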
3.2.2 Monitor Intensities and Gamma
All modern monitors take digital input for the “value” of a pixel and convert this
to an intensity level. Real monitors have some non-zero intensity when they are
off because the screen reflects some light. For our purposes we can consider this
"black" and the monitor fully on as "white." We assume a numeric description
of pixel color that ranges from zero to one. Black is zero, white is one, and a
gray halfway between black and white is 0.5. Note that here “halfway” refers to
the physical amount of light coming from the pixel, rather than the appearance.
The human perception of intensity is non-linear and will not be part of the present
discussion; see Chapter 22 for more.
There are two key issues that must be understood to produce correct images
on monitors. The first is that monitors are non-linear with respect to input. For
example, if you give a monitor 0, 0.5, and 1.0 as inputs for three pixels, the
intensities displayed might be 0, 0.25, and 1.0 (off, one-quarter fully on, and
fully on). As an approximate characterization of this non-linearity, monitors are
commonly characterized by a γ ("gamma") value. This value is the degree of
freedom in the formula

    displayed intensity = (maximum intensity) a^γ,        (3.1)
where a is the input pixel value between zero and one. For example, if a monitor
has a gamma of 2.0, and we input a value of a = 0.5, the displayed intensity
will be one fourth the maximum possible intensity because 0.5^2 = 0.25. Note
that a = 0 maps to zero intensity and a = 1 maps to the maximum intensity
regardless of the value of γ.
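
As a concrete illustration of Equation (3.1), here is a minimal sketch assuming
intensities normalized to [0, 1]; the inverse mapping is the usual way to
compensate for this non-linearity, and the function names are illustrative, not
from the book.

    def displayed_intensity(a, gamma, max_intensity=1.0):
        """Intensity a monitor with the given gamma produces for input pixel value a."""
        return max_intensity * a ** gamma

    def gamma_correct(a, gamma):
        """Input value to send so the displayed intensity is proportional to a."""
        return a ** (1.0 / gamma)

    # A gamma-2.0 monitor displays an input of 0.5 at one quarter of maximum
    # intensity; sending 0.5 ** (1 / 2.0) ≈ 0.707 instead would display one half.
    assert abs(displayed_intensity(0.5, 2.0) - 0.25) < 1e-9
    assert abs(gamma_correct(0.5, 2.0) - 0.5 ** 0.5) < 1e-9
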
Describing a display's non-linearity using γ is only an approximation; we do not
need a great deal of accuracy in estimating the γ of a device. A nice visual way
to gauge the non-linearity is to find what value of a gives an intensity halfway
between black and white.