Chapter 10. Image Metadata Support

Introduction

Metadata, to put it simply, is data about data. The metadata for an image is data about the image. This metadata may be used to store information such as:

  • Camera settings

  • Who created the image

  • When the image was taken

  • Where the image was taken

  • Copyright information

  • Contact information

  • Legal use information

  • Relevant keywords for the image

  • User-defined metadata (workflow data, etc.)

Additionally, there is metadata associated with DICOM medical images. This information may include:

  • Patient information

  • Study information

  • Equipment information

  • Series information

This metadata is stored within the image file itself, typically at the beginning of the file. Most imaging tools such as Adobe Photoshop® include the functionality to view and modify this information.

The metadata within the image file is either in binary format (IPTC, EXIF) or XML (XMP). When extracted by interMedia, all metadata formats are returned as XML. If the format is a binary format, it is converted to XML. This XML format is easier to use with database features such as XML DB and to search with Oracle Text. The kinds of metadata that can be extracted are

  • EXIF—exchangeable image file format

  • IPTC-IIM—International Press Telecommunications Council, information interchange model

  • XMP—extensible metadata platform

  • interMedia metadata

The EXIF standard focuses on camera setting information, such as camera settings, what kind of camera, and the time the picture was taken, and is inserted by most digital cameras when the photograph is taken.

The IPTC-IIM standard is used mostly to add “file information” useful in press applications. This information includes data like caption, news category, author, legal use, keywords, copyright, or dateline. IPTC-IIM is a frozen legacy standard that has been superseded by XMP.

XMP is the newest standard that is flexible. It can contain any kind of information and encompasses both EXIF and IPTC-IIM standards. It is also used in job processing and workflow applications. It is seen as data that may be updated by applications along a workflow.

The interMedia image metadata can be obtained from all interMedia supported image formats. It includes format metadata, such as height, width, and file format. For other embedded metadata, the image metadata that can be extracted from or inserted into images using interMedia is shown in Table 10.1.

Table 10.1. interMedia Metadata Insert Extraction Capabilities

File Type

EXIF Metadata

IPTC-IIM Metadata

XMP Metadata

JPEG

Extract

Extract

Extract/insert

JPEG2000

  

Extract/insert

GIF

  

Extract/insert

TIFF

Extract

Extract

Extract/insert

Insertion of metadata is supported using interMedia, however, interMedia can only put XMP, the newest and most flexible format, into images. This fits in nicely with the view of XMP that it is updated by the applications that use it. When XMP metadata is put into an image by interMedia, the non-XMP metadata already in the image is preserved.

Bear in mind that if an image is processed, with either the process or processCopy ORDImage methods, the resulting image will not have metadata attached. Since the format or size may change, the original metadata may not be correct, so it is intentionally not preserved.

This metadata in images is of great importance in finding images or having your images found. It is used in image searches, including Web and desktop image search engines. Typically, most image Web search engines use the HTTP <IMG> tag ALT attribute. This attribute indicates the text to display if the image is not displayed because of an error or the browser cannot display it (as in a text-based browser), or is set not to display images. Image search is also done using HTTP text near the image. An image website would be well advised to make image metadata available so their images can be found by potential customers. Keywords from the metadata stored in the images can be extracted and placed into the ALT attribute to make sure Web search engines can find the image on keyword searches.

Because of the increased use of embedded image metadata by applications, it is an area that most image applications need to consider for use in finding images in the database or on the Web.

Metadata Schemas

The XML schemas, in DTD format, for all of the XML metadata types are registered with XML DB by interMedia. This registration allows for schema validation, which is important when setting the metadata, and binding XMLTYPE columns to a particular XML representation. Validation for registered schemas is quicker since the schema is already parsed.

These XML schemas are registered with XML DB using the following schema namespace names for use with interMedia:

The following database table is defined by binding each XMLTYPE in the table to one of these schemas:

create table ImagesWithMetadata (item_id number primary key,
                                 exifMeta XMLTYPE,
                                 iptcMeta XMLTYPE,
                                 xmpMeta XMLTYPE,
                                 image ordsys.ordimage)
XMLTYPE COLUMN EXIFMETA
  XMLSCHEMA "http://xmlns.oracle.com/ord/meta/exif"
  ELEMENT "exifMetadata"
XMLTYPE COLUMN IPTCMETA
  XMLSCHEMA "http://xmlns.oracle.com/ord/meta/iptc"
  ELEMENT "iptcMetadata"
XMLTYPE COLUMN XMPMETA
  XMLSCHEMA "http://xmlns.oracle.com/ord/meta/xmp"
  ELEMENT "xmpMetadata"
;

The schemas define the structure of the XML data, and also implicitly, the structure of the storage in the database. This structure is in the form of an object-relational type. By default, the XML data is stored in structured storage.

To see what XML types were created by the table above we can describe them as follows.

SQL> describe ImagesWithMetadata;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 ITEM_ID                                   NOT NULL NUMBER
 EXIFMETA                                           SYS.XMLTYPE(XMLSchema "http:
                                                    //xmlns.oracle.com/ord/meta/

                                                    exif" Element "exifMetadata"

                                                    ) STORAGE Object-relational
                                                    TYPE "exifMetadata192_T"
 IPTCMETA                                           SYS.XMLTYPE(XMLSchema "http:

                                                    //xmlns.oracle.com/ord/meta/

                                                    iptc" Element "iptcMetadata"

                                                    ) STORAGE Object-relational
                                                    TYPE "iptcMetadataType94_T"
 XMPMETA                                            SYS.XMLTYPE(XMLSchema "http:

                                                    //xmlns.oracle.com/ord/meta/

                                                    xmp" Element "xmpMetadata")
                                                    STORAGE Object-relational TY

                                                    PE "xmpMetadataType100_T"
 IMAGE                                              ORDSYS.ORDIMAGE

If you want to go further, you can then describe the types that XML DB has created for the interMedia schema definition.

SQL> describe ORDSYS."iptcMetadataType94_T";
 ORDSYS."iptcMetadataType94_T" is NOT FINAL
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------

 SYS_XDBPD$                                         XDB.XDB$RAW_LIST_T
 recordVersion                                      NUMBER(38)
 objectName                                         VARCHAR2(4000 CHAR)
 editStatus                                         VARCHAR2(4000 CHAR)
 urgency                                            NUMBER(38)
 category                                           VARCHAR2(4000 CHAR)
 supplementalCategory                               ORDSYS.supplementalCategory9
                                                    5_COLL
 fixtureIdentifier                                  VARCHAR2(4000 CHAR)
 keyword                                            ORDSYS.supplementalCategory9
                                                    5_COLL
 contentLocation                                    ORDSYS.contentLocation97_COL
                                                    L
 instructions                                       VARCHAR2(4000 CHAR)
 dateCreated                                        DATE
 timeCreated                                        VARCHAR2(4000 CHAR)
 digitalCreationDate                                DATE
 digitalCreationTime                                VARCHAR2(4000 CHAR)
 byline                                             ORDSYS.byline99_COLL
 city                                               VARCHAR2(4000 CHAR)
 subLocation                                        VARCHAR2(4000 CHAR)
 provinceState                                      VARCHAR2(4000 CHAR)
 country                                            VARCHAR2(4000 CHAR)
 location                                           VARCHAR2(4000 CHAR)
 transmissionReference                              VARCHAR2(4000 CHAR)
 headline                                           VARCHAR2(4000 CHAR)
 credit                                             VARCHAR2(4000 CHAR)
 source                                             VARCHAR2(4000 CHAR)
 copyright                                          VARCHAR2(4000 CHAR)
 contact                                            ORDSYS.supplementalCategory9
                                                    5_COLL
 caption                                            VARCHAR2(4000 CHAR)
 captionWriter                                      ORDSYS.supplementalCategory9
                                                    5_COLL
 languageId                                         VARCHAR2(4000 CHAR)

You can go as far as you like down the tree, which resembles an XML tree, as some of the elements are expressed as types, collections, or collections of collections. For example, if we look at the keyword element, we see it is of type “ORDSYS.supplementalCategory95_COLL.” We can describe this type:

SQL> describe ORDSYS."supplementalCategory95_COLL"
 ORDSYS."supplementalCategory95_COLL" VARRAY(2147483647) OF
VARCHAR2(4000 CHAR)

This is a VARRAY type. Using XML schema definition XML DB annotations allows specification of the storage parameters for any of the types above. For example, if the captions in your images are greater than 4,000 bytes, you could specify storage of the caption in a CLOB instead of VARCHAR2(4000 CHAR).

To access information, you could use XML syntax as specified in a following section, or (not recommended) object-relational DOT notation used in types. It is best to refer to the XML elements in XML XPath format so that the XML schema storage can be changed without forcing the application to change. For example, the following SELECT statement is preferred.

SQL> select extract(iptcMeta,'/iptcMetadata/city/text()',
  2 'xmlns=http://xmlns.oracle.com/ord/meta/iptc')
  3  from ImagesWithMetadata;
Malibalipuram

The following is the nonpreferred DOT notation way to select the information (using the pseudo-type XMLDATA as the root element of the document). For example, if the storage of the XML document was changed to CLOB, this example SELECT would not work.

SQL> select t.iptcMeta."XMLDATA"."city" from
     ImagesWithMetadata t;
Malibalipuram

If you would like to override the storage of the XML data type, this can be done by modifying the schema provided by interMedia and registering this XML DB schema as a local schema in your own database schema. This schema would have to be compatible with the XML document returned from getMetadata(). The idea here would be to change the XML DB parameters to better suit your application, not to change the XML document structure itself. For example, to replace the schema definition with the contents of myOrdImage.xsd that will be used for future table creations that use the “http://xmlns.oracle.com/ord/meta/ordimage” schema, the following may be used:

BEGIN
  DBMS_XMLSCHEMA.registerSchema(
    SCHEMAURL => 'http://xmlns.oracle.com/ord/meta/ordimage',
    SCHEMADOC => bfilename('IMAGEDIR','myOrdImage.xsd'),
    LOCAL => TRUE,
    GENTYPES => TRUE,
    GENTABLES => FALSE,
    CSID => nls_charset_id('AL32UTF8'));
END;
/

The “STORE AS CLOB” clause can also be used to bypass the default-structured storage. This will cause the XML document to be stored in a simple CLOB. For example, if we wanted the IPTC data stored in a CLOB we would use the following syntax:

XMLTYPE COLUMN IPTCMETA STORE AS CLOB
  XMLSCHEMA "http://xmlns.oracle.com/ord/meta/iptc"
  ELEMENT "iptcMetadata"

The general idea of XML DB structured storage is to store part, or all, of the XML document in object-relational fields rather than embedded into a CLOB as raw XML data. By doing this, XML data can be changed, or used as an index, without repeatedly parsing the entire XML document, an expensive operation. The data can be retrieved at any time as an XML document by incorporating these fields into the output when the XML document is requested.

There are many differences and advantages and disadvantages to storing XML data as either structured or unstructured, too many to cover here. Please refer to the XML DB Developers Guide for more information.

The biggest advantages of using structured data are to be able to extract information, update the type and index more efficiently, as well as being able to index using a B*Tree or Bitmap index. Unstructured storage provides the most raw throughput of input and output of the entire document and is most flexible in what kind of document can be stored.

Extracting Image Metadata

All interMedia metadata is extracted in the form of XML database objects. All of the metadata present in the image is returned as a database XMLSequence type. This sequence is like an array of XMLType. Each XMLType in the sequence contains the XML for one of the metadata sources. To identify what the XMLType contains, use the XMLType getRootElement() method.

Image metadata is extracted with the interMedia getMetadata() method. This method returns an XMLSequenceType with the following possible metadata:

  • interMedia metadata (always returned, metadata from the interMedia object)

  • EXIF metadata

  • IPTC-IIM metadata

  • XMP metadata

If you want to use the non-interMedia object metadata for searching purposes, it must be placed in the database row. For interMedia metadata, you can either use the XML representation from getMetadata(), or create an index on the interMedia object attributes. The following example shows metadata being extracted from an image while the image is being loaded. The metadata is then placed into the row that is being created.

create or replace procedure load_img_and_metadata(id number,
                                                dir varchar2,
                                                  file_name
varchar2)
as
    imgobj     ORDImage;
    ctx        raw(64):=null;
    thisrowid urowid;
    meta       xmlType := null;
    xmeta      xmlType := null;
    imeta      xmlType := null;
    emeta      xmlType := null;
    metas      xmlSequenceType;
BEGIN
    insert into ImagesWithMetadata (item_id, image) values
         (id, ORDSYS.ORDImage.init())
        returning rowid, image into thisrowid, imgobj;

    imgobj.setsource('FILE', dir, file_name);

    imgobj.import(ctx);
     --
     -- Get the metadata, skipping interMedia metadata
     --
     metas := imgobj.getMetadata('exif'),
     if metas.count > 0 then emeta := metas(1); end if;
     metas := imgobj.getMetadata('iptc-iim'),
     if metas.count > 0 then imeta := metas(1); end if;
     metas := imgobj.getMetadata('xmp'),
     if metas.count > 0 then xmeta := metas(1); end if;

     --
     -- Put the metadata and the image into the row
     --
     update ImagesWithMetadata t set t.image=imgobj,
       t.exifMeta=emeta, t.iptcMeta=imeta, t.xmpMeta=xmeta
     where rowid = thisrowid;
     commit;

END;
/

Keywords and other information in the metadata can be selected from the metadata. Here are examples of selecting metadata information from XML data types.

  1. This example selects all the keywords for each row in the database. The keywords are returned as a concatenated string for each row:

    select t.xmpMeta.extract(
      '/xmpMetadata/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li/text()',
      'xmlns="http://xmlns.oracle.com/ord/meta/xmp"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:dc="http://purl.org/dc/elements/1.1/" ').getStringVal()
         "Keywords" from
      ImagesWithMetadata t;
  2. This example selects all the keywords in the database row as individual rows in the result set for the image with the ID 1:

    select value(li).extract('rdf:li/text()',
      'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
       ).getStringVal() "keywords" from ImagesWithMetadata t,
       TABLE(xmlsequence(extract(t.xmpMeta,
       '/xmpMetadata/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li',
       'xmlns="http://xmlns.oracle.com/ord/meta/xmp"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc="http://purl.org/dc/elements/1.1/" '))) li where t.id=1;
  3. This anonymous procedure extracts each keyword and prints it to the console:

    declare
       doc xmltype;
       cnt number;
       keyword varchar2(100);
    begin
    
       select xmpmeta into doc from ImagesWithMetadata where id = 1;
    
    
       select count(*) into cnt from TABLE(xmlsequence(extract(doc,
        '/xmpMetadata/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li',
        'xmlns="http://xmlns.oracle.com/ord/meta/xmp"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/" ')));
    
       for i in  1..cnt loop
         keyword := doc.extract(
        '/xmpMetadata/rdf:RDF/rdf:Description/dc:subject/rdf:Bag/rdf:li['
         || i || ']/text()',
               'xmlns="http://xmlns.oracle.com/ord/meta/xmp"
                xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                xmlns:dc="http://purl.org/dc/elements/1.1/"
               ').getStringVal();
         dbms_output.put_line('keyword [' || i || '] = ' || keyword);
       end loop;
    end;
    /
    show errors;

Inserting Image Metadata

As well as extracting metadata, interMedia can insert metadata or modified metadata into the image. Only XMP metadata, the most flexible and extensible metadata standard, can be placed within the image binary using interMedia. After the image metadata is modified, the new values will be seen by users that access the image.

It is quite easy to insert XML metadata into an image, and in most cases there is existing metadata that is to be modified, so we will not have an example of inserting metadata into an image that has none.

The following example takes image XMP metadata from an original image and if it exists, places it in the thumbnail as well. This preserves the metadata.

DECLARE
   Img ORDSYS.ORDImage;
   Thumbnail ORDSYS.ORDImage;
   xmeta xmltype := NULL;
   metas xmlsequencetype;
BEGIN
  load_image1('t.img'),
  SELECT pict, thumb INTO Img, Thumbnail
       FROM imagemeta WHERE item_id = 1 for update;

  metas := img.getMetadata('xmp'),
  if metas.count > 0 then xmeta := metas(1); end if;

  Img.ProcessCopy('fileFormat=JFIF maxScale=800 800',
      Thumbnail);

  if not xmeta is null then
    Thumbnail.putMetadata(xmeta);
  end if;

  commit;

END;
/

A procedure can be written to add keywords to the XMP metadata. The following routine does this. It is a bit complicated, as it creates the XML fragment necessary for the keyword insertion. That is, it would be much simpler if we knew all the parent XML elements existed or at least one keyword was already present.

create or replace procedure add_keyword(id number, keyword varchar2)
as
    imgobj     ORDImage;
    thisrowid urowid;
    xmeta      xmlType := null;
    element    xmlType := null;
    prefix     varchar2(500) := '';
    suffix     varchar2(500) := '';
    elementXpath varchar2(500);
    tag       varchar2(20);

    dbg varchar2(2000);

    xmpNameSpaces varchar2(200) :=
       'xmlns="http://xmlns.oracle.com/ord/meta/xmp" ' ||
       'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ' ||
       'xmlns:dc="http://purl.org/dc/elements/1.1/"';

    ns_declarations varchar2(200) :=
          ' xmlns:dc="http://purl.org/dc/elements/1.1/" ' ||
          'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" ';
BEGIN


    select image,  xmpMeta, rowid     into
           imgobj, xmeta,    thisrowid
    from ImagesWithMetadata where item_id=1 for update;


    elementXpath :=
              '/xmpMetadata/rdf:RDF/rdf:Description';

    -- Does the "rdf:Description node" exist?
    if  xmeta.existsNode(
          '/xmpMetadata/rdf:RDF/rdf:Description' ||
          '[namespace::* = "http://purl.org/dc/elements/1.1/"]'
          , xmpNamespaces) = 0
    then -- No, Start XML fragment starting at rdf:Description
         prefix := prefix || '<rdf:Description ' ||
                   ns_declarations || '>';
         suffix :=  '</rdf:Description>' || suffix;
         ns_declarations := ''; -- Namespaces have now been declared.
         if tag is null then tag := 'rdf:Description'; end if;
    else -- Yes rdf:Description exists, add to insert xpath
        elementXPath := elementXpath ||
            '[namespace::*="http://purl.org/dc/elements/1.1/"]';
    end if;

    if  xmeta.existsNode(
          '/xmpMetadata/rdf:RDF/rdf:Description' ||
          '[namespace::* = "http://purl.org/dc/elements/1.1/"]' ||
          '/dc:subject'
         , xmpNamespaces) = 0
    then -- No, Add XML dc:subject XML fragment
        prefix := prefix || '<dc:subject ' || ns_declarations || '>';
        suffix :=  '</dc:subject>' || suffix;
        ns_declarations := ''; -- Namespaces have now been declared.
        if tag is null then tag := 'dc:subject'; end if;
    else -- Yes, add to insert xpath
        elementXpath := elementXPath || '/dc:subject';
    end if;

    if  xmeta.existsNode(
          '/xmpMetadata/rdf:RDF/rdf:Description' ||
          '[namespace::* = "http://purl.org/dc/elements/1.1/"]' ||
          '/dc:subject/rdf:Bag'
         , xmpNamespaces) = 0
    then -- No, Add XML rdf:Bag XML fragment
         prefix := prefix || '<rdf:Bag ' || ns_declarations || '>';
         suffix :=  '</rdf:Bag>' || suffix;
         ns_declarations := ''; -- Namespaces have now been declared.
         if tag is null then tag := 'rdf:Bag'; end if;
     else -- Yes, add to insert xpath
         elementXpath := elementXPath || '/rdf:Bag';
     end if;
     --
     -- Finally put in keyword (but only if it does not already exist)
     --
     if xmeta.existsNode(
            '//dc:subject/rdf:Bag[rdf:li="' || keyword || '"]',
            xmpNamespaces) = 0
     then
         -- assemble XML fragment that needs to be inserted
         element := xmltype(prefix || '<rdf:li ' ||
             ns_declarations || ' >' ||
             keyword || '</rdf:li>' || suffix);
         if tag is null then tag := 'rdf:li'; end if;
         -- insert XML fragment
         select insertChildXML(xmeta,
              elementXpath,
              tag,
              element,
              xmpNamespaces)
         into xmeta from dual;
     end if;

     -- dbms_output.put_line(xmeta.getClobVal());
     imgobj.putMetadata(xmeta);

     update ImagesWithMetadata t set t.xmpMeta=xmeta,
           t.image=imgobj
     where rowid = thisrowid;
     commit;

END;
/
show errors;

Indexing Image Metadata

One of the advantages of using a database to store images is to be able to search for images in many ways, as well as having a centralized, manageable repository.

Indexing the XMLType may depend on how the XML table type is stored. The XMLType may be structured, that is shredded into components, or unstructured, that is any kind of XML data that can be described with an XML schema. Unstructured XMLType data is stored in a CLOB.

For structured XMLType data, indexes can be B*Tree, Bitmap, function-based, or text-based indexes. Text-based indexes can be either a more limited XML DB context index or a full-text index. For unstructured XMLType data, the indexes that can be used are function-based or text-based indexes (context and full-text indexes).

The IPTC-IIM and EXIF interMedia schemas are well structured, and have well-defined XML schemas, provided by interMedia, associated with them. They would be good candidates for XML DB-structured storage and indexing of structured fields.

The XMP standard allows for almost anything to be placed into the XML. The interMedia XMP schema definition allows all the endless possibilities to be stored in this schema. Because of this, the interMedia schema definition defines the XML data to be stored in a CLOB. To use the standard database indexes on XMP would require a user-based local schema definition to define structured storage for the fields to be indexed.

One side effect of storing XML in a CLOB, rather than structured fields, is that the XML will have to be parsed and the index value obtained from the XML data on a row insert/update. If the XML field is in structured storage, the value is simply obtained from the database field, and although the indexing syntax looks as if were function based, it is in actuality field based.

A query done without an index will result in a full table scan. Of course, this is very inefficient. For example:

SQL> explain plan for select item_id from ImagesWithMetadata where
  2          extractValue(IPTCMeta, '/iptcMetadata/location/text()',
  3                'xmlns="http://xmlns.oracle.com/ord/meta/iptc"') = 'India';
SQL>
SQL> select plan_table_output from
table(dbms_xplan.display('plan_table',null,'serial'));
Plan hash value: 474266177

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time      |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |     1 |     9 |     3    (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| IMAGESWITHMETADATA |     1 |     9 |     3    (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("IMAGESWITHMETADATA"."SYS_NC00738$"='India')

This plan indicates that an operation of “TABLE ACCESS FULL” will occur, this indicates the entire table will be scanned to find a row where the location is “India.”

The following example creates an index on XML data for the location of the picture, which makes finding a picture from a specific location much more efficient.

CREATE INDEX iptc_location_idx
  ON ImagesWithMetadata(extractValue(IPTCMETA,
    '/iptcMetadata/location/text()',
     ' xmlns="http://xmlns.oracle.com/ord/meta/iptc" ' ));

Using “explain plan,” we can see that this index will be a normal, more efficient database index search.

SQL> explain plan for select item_id from ImagesWithMetadata where
  2          existsNode(IPTCMeta, '/iptcMetadata/location/text()',
  3                'xmlns="http://xmlns.oracle.com/ord/meta/iptc"') = 'India';
SQL> select plan_table_output from
table(dbms_xplan.display('plan_table',null,'serial'));
--------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name               | Rows  | Bytes | Cost(%CPU)| Time       |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                    |     1 |     9 |     2    (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| IMAGESWITHMETADATA |     1 |     9 |     2    (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IPTC_LOCATION_IDX  |     1 |       |     1    (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("IMAGESWITHMETADATA"."SYS_NC00738$"='India')

One of the issues in indexing XML data is if the data to be indexed is one value or multiple values. Many index types do not allow for multiple values for a single database row. For example, the IPTC-IIM XML for an image may contain multiple keyword fields as in the following example.

<iptcMetadata xmlns="http://xmlns.oracle.com/ord/meta/iptc"
xsi:schemaLocation="http://xmlns.oracle.com/ord/meta/iptc
http://xmlns.oracle.com/ord/meta/iptc"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <recordVersion>2</recordVersion>
 <objectName>ar21319</objectName>
 <keyword>land</keyword>
 <keyword>monument</keyword>
 <keyword>nature</keyword>
 <keyword>scenery</keyword>
 <keyword>architectural</keyword>
 <keyword>architecture</keyword>
 <keyword>building</keyword>
 <keyword>place of worship</keyword>
 <keyword>religious building</keyword>
 <keyword>structures</keyword>
 <keyword>temple</keyword>
 <keyword>sacred place</keyword>
 <keyword>sanctum</keyword>
 <keyword>Asia</keyword>
 <keyword>India</keyword>
 <keyword>Malibalipuram</keyword>
 <keyword>Tamil Nadu</keyword>
 <keyword>night</keyword>
 <keyword>moonlight</keyword>
 <keyword>moon</keyword>
 <keyword>skies</keyword>
 <keyword>sky</keyword>
 <keyword>blue</keyword>
 <instructions>Newsmagazines Out</instructions>
 <dateCreated>2005-01-01</dateCreated>
 <byline>
   <author>Julie Doe</author>
   <authorTitle>Mugwum contract photographer</authorTitle>
 </byline>
 <city>Malibalipuram</city>
 <provinceState>Tamil Nadu</provinceState>
 <location>India</location>
 <transmissionReference>Sacred India</transmissionReference>
 <headline>Shore Temple, Malibalipuram, India</headline>
 <credit>Mugwum Press</credit>
 <source>Julie Doe / Mugwum Press</source>
   <copyright>©2005 Julie Doe / Mugwum Press, all rights
 reserved</copyright>
  <caption>The Shore Temple of the Seven Pagodas was built
 under Narsimha II of the Pallava dynasty between 7th and 8th
 century AD and is dedicated to Lord Shiva. It resembles the
 structure of the Dharmaraja rath, but its tower rises much
 higher (approx. five stories or ~ 60 ft. high) and its stupa
 spire is small and slender.</caption>
  <captionWriter>Jacques Brown</captionWriter>
</iptcMetadata>

If you were to try and create a B*Tree index on the keyword field, you would get an error message as is shown below.

SQL> CREATE INDEX iptc_keyword_idx
  2    ON ImagesWithMetadata(extractValue(IPTCMETA,
  3      '/iptcMetadata/keyword',
  4       ' xmlns="http://xmlns.oracle.com/ord/meta/iptc" '
));
CREATE INDEX iptc_keyword_idx
              *
ERROR at line 1:
ORA-02070: database does not support correlation in this
context

In the interMedia XMP XML schema, the keyword element is not specified, as the XMP type is quite flexible. However, it is typical for the keyword field to be present. An attempt to create a B*Tree index on this field would also be an error because the schema does not specify a single value for this field.

However, since the XML schema definition indicates that there can only be one <location> element, the following index creation will succeed because it refers to an XML field that can occur only once.

CREATE INDEX iptc_location_idx
  ON ImagesWithMetadata(extractValue(IPTCMETA,
    '/iptcMetadata/location',
     ' xmlns="http://xmlns.oracle.com/ord/meta/iptc" ' ));

If the XML to be indexed can result in multiple values, the XML DB context or full Oracle Text index can be used.

Oracle Text Indexing

Oracle Text can be used to index XML data. In the following example, a text index is used to create an index on XML data.

 create index iwnd_context_index on
      ImagesWithMetadata(IPTCMeta)
      indextype is ctxsys.CONTEXT
     PARAMETERS ( 'section group ctxsys.PATH_SECTION_GROUP' )
;

This index can now be used to perform a search on IPTC keywords as indicated below.

select item_id from ImagesWithMetadata where
contains ( iptcmeta, 'sky INPATH (/iptcMetadata/keyword) ') > 0;

Note that when the table is updated, the Oracle Text index must also be updated in another operation. In other words, the change of the Oracle Text index is asynchronous to the update of the indexed table. Oracle Text index updates are often performed on multiple updates rather than individual updates for performance reasons. These performance issues include index fragmentation and processing time. One way to synchronize the index is to call a procedure:

begin
    ctx_ddl.sync_index( 'iwnd_path_iptc_idx' );
end;
/

Also note that the Oracle Text index should be maintained for peak performance. It is typically necessary to rebuild the index to defragment the index and optimize query performance.

A context index (ctxsys.ctxxpath) is used to find columns that contain a particular xpath. As such, it acts as a primary filter to return only the rows that contain the xpath.

create index iwnd_path_iptc_idx
  on imagesWithMetadata( IPTCMETA )
  indextype is ctxsys.ctxxpath;

This will allow for index searches such as:

select item_id from ImagesWithMetadata where existsNode ( iptcmeta,
           '/iptcMetadata/keyword[ora:contains(text(),"sky")>0]',
           'xmlns:ora="http://xmlns.oracle.com/xdb"
            xmlns="http://xmlns.oracle.com/ord/meta/iptc"')
      > 0;

Note that the ctxxpath index is of limited use in cases where the xpath exists in all the indexed columns so its use here is a rather bad example if all of your images have IPTC keyword elements. It is best used for particular xpaths that occur rarely. The ora:contains clause is not evaluated with the index.

DICOM Metadata

The DICOM standard was created to standardize digital medical information. The information stored in a DICOM file can be images, waveforms (including audio waveforms), video, 3-D volumes, 3-D slices of volumes, and time-variant 3-D volumes. The DICOM standard replaced proprietary binary formats used by various medical equipment manufacturers for most of their medical diagnostic equipment.

Metadata support offered by interMedia allows for extraction of metadata from DICOM files. This information is handled in a similar manner to other types of image metadata by interMedia.

DICOM images can also be processed using interMedia like any other image format. For example, DICOM images can be converted to more Web-friendly formats, like JPEG, for easier perusal on Web pages.

DICOM metadata is stored in a non-XML format within the DICOM image. For interMedia, the DICOM metadata is extracted in an XML document form for easier processing. The namespace for this XML format is “http://xmlns.oracle.com/ord/meta/dicomImage.”

Creating a table to contain a DICOM image is almost exactly like creating a table for a nonmedical image as is shown in the following example.

create table diagnosticImages(id number primary key,
                                 desc VARCHAR2(80),
                                 DicomMeta XMLType,
                                 image ORDSYS.ORDIMAGE,
                                 thumb ORDSYS.ORDIMAGE)
LOB (image.source.localdata) -- store images with 32K chunk
  STORE AS (chunk 32K)
LOB (thumb.source.localdata) -- but the thumbnails with only
16K
  STORE AS (chunk 16K)
-- and bind the XMLType columns to the interMedia metadata
columns
XMLType column metadata
  XMLSCHEMA "http://xmlns.oracle.com/ord/meta/dicomImage"
  ELEMENT "DICOM_IMAGE";

The diagnosticsImages table will hold an image, which will include a DICOM image, a description, the DICOM metadata for the image, and a thumbnail of the image. The thumbnail will be converted to JPEG format to view on the Web. We declare the DICOM metadata column as having the interMedia DICOM namespace “http://xmlns.oracle.com/ord/meta/dicomImage” with the root element being DICOM_IMAGE. A procedure to populate a row into this table is given below.

create or replace procedure load_dicom_image(id number,
file_name varchar)
as
    img       ORDSYS.ORDImage;
    thmb       ORDSYS.ORDImage;
    ctx       raw(1):=null;
    dicomMetadata XMLTYPE := null;
    thisrowid urowid;
BEGIN

   -- Insert new row into the database
   insert into diagnosticImages (id, DicomMeta, Image, Thumb)
values (
          id,
          null,
          ORDSYS.ORDImage.init(),
          ORDSYS.ORDImage.init())
      returning rowid, image, thumb  into thisrowid, img,
thmb;

    -- Load the image file into the DB
    img.setsource('FILE', 'DICOMIMAGEDIR', file_name);
    img.import(ctx);

    -- extract DICOM metadata,
    dicomMetadata := img.getDicomMetadata('imageGeneral'),
    IF (dicomMetadata IS NULL) THEN
      DBMS_OUTPUT.PUT_LINE('metadata is NULL'),
    END IF;

    -- Create thumbnail
    img.processCopy('fileFormat=JFIF maxScale=800 800',
thmb);

    -- Update the table with the images and metadata
    update diagnosticImages t
         set t.image=img, t.thumb=thmb,
dicomMeta=dicomMetadata
    where rowid = thisrowid;
    commit;
END;
/

Note that for now the only type of metadata that can be extracted is the type “imageGeneral.”

There is no corresponding putDicomMetadata() method. The reason for this is that the DICOM specification discourages modification of DICOM metadata once the file is created.

Summary

Image metadata can be used effectively with a database. By using interMedia to extract metadata, this information become available to the database. The database XML facilities can be used to manipulate metadata, which then can be inserted into the image.

Extracted metadata can be used in a database index, so that relevant images can be found efficiently. Using XML facilities of the database, fields of interest can be extracted and placed into traditional database columns, that can be indexed, or the XML metadata can be indexed directly.

The amount of metadata within DICOM medical images is extensive. interMedia provides support for extraction of this DICOM metadata. This extensive amount of metadata can best be managed with a database.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.94.190