So much for our own account of these things. But in a more fitting place we shall attempt to show by quotations from the ancients, what others have said.
—Eusebius
We postulate volume projection metrics as CAM neurons, a biologically plausible model for clusters of low-level neurons describing an object region, which take input from image regions assembled in the LGN. CAM neurons produce low-level features to feed into the visual cortex V1–Vn regions, following the Hubel and Wiesel model. A CAM neuron is modeled as a three-input, one-output neuron, which composes an output address from its three inputs. The output address is referred to as a CAM feature, which enables volume projection metrics. The CAM neuron encodes the pixel inputs provided to the neuron, and the output is a concatenation of all three inputs into an address (see Figure 6.1). The CAM address is an entry in a content-addressable memory (CAM). The CAM address is the feature. The number of CAM features found per genome is the metric.
The CAM address feature can be understood and visualized as a volumetric projection metric, where the CAM address is decomposed into (x,y,z) axis values used to record feature presence in a volume, as shown in the upper right of Figure 6.1. The CAM address is a volumetric projection metric.
Each input to the CAM neuron represents a low-level magno or parvo feature from triples, or a 3x1 matrix of adjacent pixels from oriented lines within LGN images at 0, 45, 90, and 135 degree orientations, and is assembled into a CAM address as shown in Figure 6.2. For some input spaces, the 3x1 triple is assembled from oriented lines, and for other input spaces, the 3x1 is assembled from Z columns from a single RGB pixel, as explained in the next section and summarized in Table 6.1 in the “CAM Feature Spaces” section.
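The four oriented triples can be sketched as follows. This is our own illustration, not the VGM implementation: the (row, col) indexing convention and the function name `oriented_triples` are assumptions.

```python
# Hypothetical sketch: extract the four oriented 3x1 triples (A=0, B=90,
# C=135, D=45 degrees) passing through the center of a 3x3 neighborhood.
# Indexing is (row, col); this convention is assumed, not the book's code.

def oriented_triples(n):
    """n is a 3x3 list-of-lists of pixel values; returns the four
    [p-1, p, p+1] triples through the center pixel n[1][1]."""
    return {
        "A_0":   [n[1][0], n[1][1], n[1][2]],   # horizontal line, 0 degrees
        "B_90":  [n[0][1], n[1][1], n[2][1]],   # vertical line, 90 degrees
        "C_135": [n[0][0], n[1][1], n[2][2]],   # top-left to bottom-right
        "D_45":  [n[2][0], n[1][1], n[0][2]],   # bottom-left to top-right
    }
```

Each triple then becomes the three inputs of one CAM neuron for that orientation.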
As shown in Figure 6.2, the 8-bit quantized volume projection space is contained in four 16M feature segments (2^24 addresses), one for each orientation A, B, C, and D. Quantization spaces of 8, 5, 4, 3, and 2 bits are supported (see Figure 6.6 for volume renderings).
Why use 3x1 features instead of 3x3 or some other shape? Here we provide some discussion, trade-offs, and future plans.
As shown in Figure 6.2, the CAM feature memory address is created by extracting four oriented 3x1 lines into four separate features oriented at 0, 45, 90, and 135 degrees; the pixel values comprise the CAM address feature. The idea is to represent each visual pixel impression as a memory feature and preserve all visual information. The set of 3-byte concatenated memory address representations shown in Figure 6.2 is a simple way to preserve all the information in the 3x3 regions. Why not 3x3? A 9-byte (72-bit) CAM address using all nine pixels from a 3x3 pixel region was originally considered, and seems like a good idea, but 72-bit addressing is impractical for desktop computers as shown in Figure 6.3. While it is possible to reduce the pixel resolution to 5 bits and still retain good accuracy, the resulting 3x3 memory space of 2^45 addresses is about 35 terabytes (35,184,372,088,832), which is still too large for common computers today. Even a 4-bit quantization (2^36) yields a 64GB address space (68,719,476,736 addresses), and the resolution would be too low for many metrics. Intel provides a 128-bit set of math operations, but even so the memory address space is 32GB for current XEON processors (which apparently use only 34 address lines).
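The address-space arithmetic above can be checked directly; this is only a back-of-envelope sketch of the sizes discussed:

```python
# Address-space sizes for the candidate CAM address layouts.
addresses_3x3_5bit = 2 ** (9 * 5)   # nine 5-bit pixels -> 45-bit address
addresses_3x3_4bit = 2 ** (9 * 4)   # nine 4-bit pixels -> 36-bit address
addresses_3x1_8bit = 2 ** (3 * 8)   # three 8-bit pixels -> 24-bit address

# 2^45 = 35,184,372,088,832 (~35 TB), 2^36 = 68,719,476,736 (64GB),
# 2^24 = 16,777,216 (the 16M feature segments of Figure 6.2)
```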
The current VGM implementation uses a trade-off to segment the address space into four oriented 3x1 regions, as shown in Figure 6.2. Most desktop computers using 32-bit and 64-bit memory addressing with commercial operating systems support at least 2GB of address space per process, so 24-bit addresses are fine (8 bits x 3). NOTE: For practical reasons, desktop computers and operating systems do not use all 64 bits of the CPU address lines to map against a contiguous 64-bit addressed memory space.
However, in the future when computers provide much larger address spaces, it is desirable to add a larger CAM neuron type into the system, capable of 72-bit addressing using a 3x3 kernel to compute the address. Or eventually perhaps even a 5x5 kernel region to produce a 25-byte address with 200 bits.
CAM addresses describe a primitive combined color, texture, and shape metric in an oriented quantization space, implemented as variable bit precision values. The quantization space provides a form of blur-sharp and scale invariance analogous to an image pyramid as used with SIFT, ORB, and other feature descriptors [1]. Thus, the content of the feature forms the CAM address.
Note that there are several types of input spaces taken by CAM neurons; to describe it differently, there are several types of CAM neurons in the VGM metrics spaces as shown in Figure 6.4:
Table 6.1: The CAM Neuron Input Spaces

CAM Neuron Input Spaces | # Input Features for Each Space
3x1 matrix of adjacent values [p-1, p, p+1] from oriented lines | Spaces: R, G, B, I (4 orientations ∗ 5 quantizations per space)
  R 3x1 [p-1,p,p+1] [A_0, B_90, C_135, D_45] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  G 3x1 [p-1,p,p+1] [A_0, B_90, C_135, D_45] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  B 3x1 [p-1,p,p+1] [A_0, B_90, C_135, D_45] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  I 3x1 [p-1,p,p+1] [A_0, B_90, C_135, D_45] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
1x3 matrix Z-column from the components of single pixels, no orientation | Spaces: RGB, LBP, MIN, MAX, AVE (5 quantizations per space)
  RGB -> 3x1 [R, G, B] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  LBP -> 3x1 [R_LBP, G_LBP, B_LBP] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  RANK-MIN -> 3x1 [R_MIN, G_MIN, B_MIN] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  RANK-AVE -> 3x1 [R_AVE, G_AVE, B_AVE] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
  RANK-MAX -> 3x1 [R_MAX, G_MAX, B_MAX] [8-bit, 5-bit, 4-bit, 3-bit, 2-bit]
All the CAM addresses in a genome feed into a set of summary CAM neural clusters, which record all CAM features from each input space (Figure 6.4) into a set of 3D histogram volumes, summing the occurrence of each feature in the genome for each input space (see Figure 6.5).
As shown in Figure 6.5, the CAM neurons feed into a CAM neural cluster to sum all the CAM features in the genome—one cluster for each specific metric input space. Since there are 25 CAM input spaces (Figure 6.4), there are 25 corresponding CAM neural clusters per genome, one per each of the five pre-processed images (raw, sharp, retinex, histeq, blur), for a total of 125 CAM cluster neurons per genome. For well-segmented genomes representing homogeneous bounded regions, the CAM neural clusters are regular shapes centered about the axis, usually with very few outliers. In other words, the feature counts are concentrated in a smaller area revealing similar features, rather than spread out in the volume revealing something more like a noise distribution of unlike features.
The magnitude (corresponding to size) of the CAM cluster neuron emulates biologically plausible neural growth [1] each time a CAM neuron feature impression increments the corresponding (x,y,z) cell in the CAM neural cluster. The CAM cluster neuron is a memory device. Each cluster represents related features from an input space. The size of each neuron follows plausible neuroscience findings and is determined by (1) how often the visual impression is observed and (2) the number of neural connections. Thus, CAM cluster neuron size is a function of the frequency with which a visual impression is observed.
As an alternative to the 3x1 pixel mappings to generate the CAM cluster addresses, the VGM supports various other methods as discussed in Table 6.1; for example, RGB volume clustering uses each RGB pixel component to compose an (x,y,z) address by assigning x = R, y = G, z = B, so for each pixel in the genome we increment the neural cluster at that address.
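A minimal sketch of the RGB Z-column accumulation, assuming 8-bit (R, G, B) tuples as input; the function name and the sparse-dictionary volume representation are our own illustration, not the VGM code:

```python
from collections import defaultdict

def accumulate_rgb_volume(pixels):
    """pixels: iterable of (R, G, B) tuples from one genome region.
    Returns a sparse 3D histogram keyed by (x, y, z) = (R, G, B)."""
    volume = defaultdict(int)
    for r, g, b in pixels:
        volume[(r, g, b)] += 1   # one impression per pixel
    return volume
```

A dense 256x256x256 array would serve equally well; the sparse form simply reflects that most cells stay empty for natural images.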
CAM neural clusters can be rendered as a simple volume rendering as shown in Figure 6.6. The volume is the feature; another way to say it is the neural memory is the feature. CAM neural clusters are used for correspondence using various distance functions discussed in this chapter. The number of times a CAM feature is discovered over the entire genome region is recorded or summed in the volume, so the volume projection is a feature metric.
The metric projection concept is often employed in statistics; for example, the support vector machine (SVM) approach of representing metrics in higher-dimensional spaces is commonly used to find a better correspondence (see Vapnik [77][78][79][80], and also [1]). Likewise, we find insights via multivariate volumetric projection metrics. The basic volume projection is based on simple Hubel and Wiesel style edge information over RGBI + LBP color spaces taken within a quantization space, as discussed below, emulating varying levels of detail across the low levels of the magno and parvo pathways.
As shown in Figure 6.6, the volumetric projection metrics contain a range of color, shape, and texture metrics. The false coloring in the renderings represents impression counts (magnitude) across the genome for each CAM feature, so the volume rendering is a 4D representation. The volume renderings in Figure 6.6 are surface renderings in this case, obscuring the volume internals, and use a color map to represent magnitude at each voxel. Other volume rendering styles can be used to view the internals of the volume as shown later in this chapter. Volume metrics are often rendered using the familiar 3D scatter-plot for data visualization (see [81][82][83]). However, we use volumetric projections as a native metric for feature representation and correspondence. Several distance functions have proven to be useful as discussed later in this chapter. Note that the CAM neural clusters are accessed by the visual processing centers V1–Vn of the visual cortex and used for correspondence as texture, shape, and color features.
Quantization space pyramids are used to represent visual data at various levels of detail and can be used to perform first-order comparisons of genome metrics to narrow down the best matches within the genome space by using progressively more bits to increase resolution. We use 8-bit, 5-bit, 4-bit, 3-bit, and 2-bit quantization. The bit-level quantization simulates a form of attentional level of detail, which is biologically plausible [1].
As shown in Figure 6.7, different bit resolutions per color yield different levels of detail, and quantization to 6 or 7 bits seems unnecessary, given that 8-bit quantization results are perceptually close to 5-bit results. Based on testing, we have found that 8-bit and 5-bit quantization yield similar correspondence results, so 5 bits are used for some of the color and volume metrics, while 8-bit color is better suited for many metrics. For color, using 5 bits instead of 8 coalesces similar colors into a common color, which is desirable for some metrics. The quantization input to the CAM neuron shown earlier in Figure 6.1 can be used to shape the memory address by masking each pixel, coalescing similar memory addresses, which focuses and groups similar features together. Even so, the full 8-bit resolution is still preserved in the genome and used when needed for various metrics.
For each image, a strand containing a summary of 2-bit quantizations of CAM clusters for each genome can be created to assist in optimizing correspondence, similar to an image pyramid used by SIFT at various resolutions [1], where SIFT correspondence is measured across the image pyramid to find features even when the image scale changes. In an analogous manner, 2-bit quantized CAM neural cluster features can be created for each genome in 128 bits, which can be evaluated natively in most Intel processor instruction sets today. Therefore, by searching 128-bit strands, it is possible to quickly narrow down candidate target genomes to follow up with higher-resolution correspondence at 8- or 5-bit quantization. Using quantization spaces larger than 2 bits exceeds 128 bits, is more complicated, and is not supported natively in the CPU instruction set. We illustrate the concept with the example below.
Imagine we use a 2-bit (four unique values) resolution CAM volume, with 4x4x4 = 64 cells in the (x,y,z) volume. We reduce the resolution of each cell counter to 2 bits, scaling the input magnitudes using floats and clamping to the range 0..3. Then the total number of unique 2-bit genomes is 4^64 = 2^128, or about 3.4 x 10^38.
*By coincidence, the Intel XEON processor provides 128-bit arithmetic, so a 2-bit quantized genome fits into a single 128-bit register.
So, an address composed of all 64 counters in a 2-bit quantized volume, each a 2-bit counter in base 4 (0,1,2,3), can be represented in a 128-bit value and compared in a 128-bit Intel ALU register.
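A sketch of the packing and comparison, using Python arbitrary-precision integers to stand in for the 128-bit registers; the function names `pack_strand` and `strand_distance` are hypothetical, not the VGM API:

```python
def pack_strand(counters):
    """Pack 64 base-4 cell counters (each clamped to 0..3) into one
    128-bit integer 'strand', 2 bits per counter."""
    assert len(counters) == 64
    strand = 0
    for i, c in enumerate(counters):
        strand |= (min(c, 3) & 0x3) << (2 * i)
    return strand  # fits in 128 bits

def strand_distance(a, b):
    """Count how many of the 64 2-bit counters differ between strands."""
    diff = a ^ b
    return sum(1 for i in range(64) if (diff >> (2 * i)) & 0x3)
```

On real hardware the XOR and population count would run directly on 128-bit SIMD registers; the Python version only illustrates the layout.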
Typically, a 20MP image sequences into perhaps 3,000 unique genome regions, so a strand for each 20MP image would contain a set of 3,000 2-bit quantized genomes, each of 128 bits or 16 bytes, which is supported by the current Intel architecture.
In this section we provide some discussion on the details of volume projection metrics, including the definitions, distance functions used, and memory size requirements.
Each time a CAM feature address is detected in the image, the count for the address is incremented in the CAM neural cluster volume, corresponding to feature commonality. The method for computing the feature addresses and counts is simple and relies on the quantization input value, an 8-bit hexadecimal mask; for 5-bit quantization the mask is 0xF8 (binary 1111 1000). Each pixel value in the address is bit-masked into the desired quantization space to ignore the bottom three bits.
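A sketch of the masking and address composition just described, assuming 8-bit input pixels and the 5-bit mask 0xF8; the helper name `cam_address` and the sparse-dictionary cluster are our own illustration, not the VGM code:

```python
from collections import defaultdict

QUANT_MASK_5BIT = 0xF8  # binary 1111 1000: drop the bottom three bits

def cam_address(p0, p1, p2, mask=QUANT_MASK_5BIT):
    """Concatenate three masked 8-bit pixels into one 24-bit CAM address."""
    return ((p0 & mask) << 16) | ((p1 & mask) << 8) | (p2 & mask)

cluster = defaultdict(int)              # sparse CAM neural cluster volume
for triple in [(0x12, 0x34, 0x56), (0x17, 0x33, 0x51)]:
    cluster[cam_address(*triple)] += 1  # similar triples coalesce
```

Note how the two example triples, which differ only in their low bits, map to the same masked address 0x103050, illustrating how quantization groups similar features together.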
CAM neurons are built for each of the five types of images (raw, sharp, blur, retinex, histeq) and at five quantization levels (8, 5, 4, 3, 2 bits). So, taking the 25 different types of CAM neurons for each image as illustrated previously in Figure 6.4, there are a total of 5 ∗ 25 = 125 CAM neuron types per genome.
Currently, we define a set of 25 distance metrics for CAM features as shown in Table 6.2. Note that some of the metric functions are volume intersection metrics, and others are total volume metrics. The intersection metrics m_i are computed if and only if both volume values are nonzero (i.e., an impression count exists in both volumes at the same (x,y,z) coordinate, so the volumes intersect), while the volume total metrics m_t are computed over the entire volumes regardless of full or empty cells.
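In our notation (not necessarily the notation of Table 6.2), the two metric families over volumes V_1 and V_2 can be sketched as follows, where c ranges over the (x,y,z) cells and f is the particular distance function:

```latex
% Intersection metrics use only cells occupied in both volumes;
% total metrics run over every cell in the quantized volume.
m_i = f\big(\{\, (V_1(c), V_2(c)) : V_1(c) \neq 0 \ \wedge\ V_2(c) \neq 0 \,\}\big)
\qquad
m_t = f\big(\{\, (V_1(c), V_2(c)) : c \in \text{volume} \,\}\big)
```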
Table 6.2: The volume metrics and distance functions
We apply the volumetric metrics primarily to texture and shape features as discussed in Chapters 8 and 9. However, the volumetric projections do contain color information as well, so the volumetric projections combine shape, texture, color, and quantization.
Recall that a genome is a 2D segmented region of pixels, typically containing a few thousand pixels. Each genome is computed for each type of possible input image from the eye/retinal model for color channel and pre-processing, and also over either an orientation space or a volume space. Finally, each of the metric combinations discussed in the preceding section is computed within a quantization space. Here is a laborious illustration of the memory requirements for volumetric CAM features.
PARVO feature spaces
4C_P = 4 color channels: red, green, blue, intensity
5I_P = 5 pre-processed versions of each image: raw, sharp, retinex, histeq, blur
4O_P = 4 genome orientations: 0, 90, 135, 45 degrees
5Q_P = 5 quantization channels: 8, 5, 4, 3, 2 bits
Parvo oriented spaces: 4C_P ∗ 5I_P ∗ 4O_P ∗ 5Q_P = 4000
5Z_P = 5 Z-column spaces, *unoriented: RAW, RANK-MIN, RANK-MAX, RANK-AVE, LBP
Parvo Z-column spaces: 4C_P ∗ 5I_P ∗ 5Q_P ∗ 5Z_P = 500
MAGNO feature spaces
1C_M = 1 color channel: luma (intensity)
5I_M = 5 pre-processed versions of each image: raw, sharp, retinex, histeq, blur
4O_M = 4 genome orientations: 0, 90, 135, 45 degrees
5Q_M = 5 quantization channels: 8, 5, 4, 3, 2 bits
Magno oriented spaces: 1C_M ∗ 5I_M ∗ 4O_M ∗ 5Q_M = 1000
5Z_M = 5 Z-column spaces, *unoriented: RAW, RANK-MIN, RANK-MAX, RANK-AVE, LBP
Magno Z-column spaces: 1C_M ∗ 5I_M ∗ 5Q_M ∗ 5Z_M = 125
Total feature spaces = total CAM neural clusters
4000 + 500 + 1000 + 125 = 5625
*Clusters can be compared 29 ways via distance metrics; see Table 6.2.
Each quantization space determines the amount of memory required to contain the volume metric space, since the (x,y,z) coordinate range is determined by the quantization (see Figure 6.5 earlier in the chapter). Since each of the CAM neural cluster volumes consume a different amount of memory based on the quantization space, the total memory required to store all CAM feature volumes for a given image is worked out here for a 20MP image.
*NOTE: each volume cell is a 4-byte counter, range 0 - 0xffffffff (4,294,967,295)
Parvo individual volumes
4000/5 = 800 oriented spaces
500/5 = 100 Z-column spaces
800 + 100 = 900 spaces at each quantization (8,5,4,3,2)
Magno individual volumes
1000/5 = 200 oriented spaces
125/5 = 25 Z-column spaces
200 + 25 = 225 spaces at each quantization (8,5,4,3,2)
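The per-volume memory arithmetic can be sketched as follows, assuming (2^q)^3 cells of 4 bytes each per q-bit volume (per the note above) and the 900 parvo / 225 magno spaces per quantization; the function name is our own:

```python
def volume_bytes(q_bits, n_spaces):
    """Bytes needed for n_spaces volumes at q-bit quantization:
    (2^q)^3 cells, 4-byte counter per cell."""
    cells = (2 ** q_bits) ** 3
    return cells * 4 * n_spaces

parvo_8bit = volume_bytes(8, 900)   # ~60GB per genome at 8 bits
parvo_5bit = volume_bytes(5, 900)   # ~118MB per genome at 5 bits
magno_5bit = volume_bytes(5, 225)   # magno spaces are a quarter the count
```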
For practical reasons, the individual volume projection spaces are kept in files stored with lossless compression, which usually provides a 100:1 compression ratio, since most of the volume data is zeros and compresses very well. The volume projection spaces are loaded into memory on demand from files and uncompressed for correspondence. Assuming a 4000x3000 12Mpixel image is sequenced into 2,000 genome regions, total volumetric space storage is shown below. Note that 8-bit quantization spaces are computed for base metric computations, but not stored, reducing storage. So, total compressed volumetric storage space is usually 1–2GB per 12Mpixel image.
Parvo volume memory structure totals = ~120.27TB : 1–2GB actually stored (compressed)
8-bit quantization: 2000 ∗ 60GB = 120TB (not stored, to save space)
5-bit quantization: 2000 ∗ 118MB = 236GB
4-bit quantization: 2000 ∗ 15MB = 30GB
3-bit quantization: 2000 ∗ 2MB = 4GB
2-bit quantization: 2000 ∗ 230KB = 460MB
Magno volume memory structure totals = ~26TB : ~10MB actually stored (compressed)
8-bit quantization: 2000 ∗ 13GB = 26TB (not stored, to save space)
5-bit quantization: 2000 ∗ 26MB = 52GB
4-bit quantization: 2000 ∗ 3MB = 6GB
3-bit quantization: 2000 ∗ 410KB = 820MB
2-bit quantization: 2000 ∗ 51KB = 102MB
The magno and parvo feature tiles are composed of groups of low-level 3x1 pixel gradient information, following the basic Hubel and Wiesel observations. Tiles are low-level features, summed into the higher-level CAM neural clusters discussed earlier in this chapter. We note that Hubel and Wiesel define primal shapes [1] for the lowest-level receptive fields as oriented edge-like features, which CAM neurons model in a similar manner as edge orientations A, B, C, and D. Higher-level feature shapes are represented in strands of segmented regions resembling corners, blobs, and arbitrarily shaped regions. The primal features are recorded over time by experiential learning (see [1, ref. 552]). Magno and parvo tiles are illustrated in Figures 6.8 and 6.9.
The parvo CAM features are computed from five input image spaces: raw, sharpened, blurred, local contrast enhanced, and global contrast enhanced, broken into 3 RGB channels, for a total of 15 input images composed into the four CAM orientations A, B, C, D for each RGB color. Figure 6.9 shows magno luminance channel input from the five input image spaces at the four magno CAM orientations A, B, C, D.
It should be noted that for realistic images, the finer the quantization (the more bits), the more sparsely the volume will be populated. For example, at 8-bit quantization most of the volume will be empty, with features clustered around the center axis, while at 2-bit quantization most of the volume will be populated, though likely still most densely around the center axis. Several volume renderings are provided to illustrate the point in the next section, “Quantized Volume Projection Metric Renderings.”
Note that maximally or widely diverging adjacent pixel values do not often occur in natural images; rather, adjacent pixels are usually close together in value. Widely diverging adjacent pixel values are more characteristic of very sharp edge transitions, noise, and saturation effects; moderate divergence corresponds to texture; and no divergence corresponds to no texture, i.e., a flat surface. So the extremes of the volume address space will likely never be populated for visual genome features of natural images, which resemble sparse volumetric shapes clustered about the center axis.
The Visual Genome Project will be able to determine the most popular CAM clusters by sequencing millions of images and recording a master volume for all CAM neural clusters to record all known genomes. Or for a specific application domain, a master volume can be recorded as well.
The following volume renderings are made using the ImageJ Fiji Volume Viewer Plugin (http://fiji.sc/Volume_Viewer) to illustrate the detail provided by different quantization spaces. The renderings that are primarily reddish hues are surface renderings that include opaque surface lighting and shading but do not show internal details of the distributions. The bluish renderings do not use lighting and shading but rather use transparency effects to reveal the internal details of the volumes. The false coloring represents CAM neuron feature count.
The following renderings are provided:
Fig. 6.10. Renderings from a high-texture genome region from the Sequoias image
Fig. 6.11. Renderings from a medium-texture image of kids playing in a room
Fig. 6.12. An ambiguous rendering made without ignoring the 0-values in the addresses
Fig. 6.13. 4-bit quantization renderings of F18 ceiling region
Fig. 6.14. 5-bit quantization renderings of F18 ceiling region
Fig. 6.15. 8-bit quantization renderings of F18 ceiling region
Fig. 6.16. 8-bit rendering of F18 ceiling region under bright light, showing saturation effects in the addresses which bleed outside the volume
In this chapter we discussed volume projection metrics, which are rendered into an (x,y,z) volume for purposes of visualization and correspondence, and represent CAM neuron clusters. We discussed how CAM neurons implement CAM address features within the 125 input spaces including RGB, pre-processed images, and low-level 3x1 Hubel & Wiesel style primal gradient features. The concept of clustering CAM features summed into CAM neural clusters was discussed in detail, as well as the associated volume metrics and distance functions available in the synthetic model. Details on memory size for all the volumetric memory features were enumerated, along with some discussion on the trade-offs for creating volumetric metric structures from different sized micro regions. Finally, volume projections were presented as volume renderings across a representative range of quantization spaces to provide insight.