11

SEARCHING

11.1 INTRODUCTION

We now consider different methods of searching large amounts of data to find one particular piece of information. This chapter focuses on four general types of searching, that is, sequential, binary, indexed search, and hashing schemes. The purpose here is not to present an exhaustive review of all possible search techniques, but to highlight how the main technique can be implemented.

We first define some common terms. A table or a file is a collection of elements, each of which is called a record. Each record consists of a set of fields. One of the field has a special meaning called key field. Such key is also called an internal key. Sometimes there is a separate table of keys that include pointers to the records. Such keys are called external keys. For each file, there is at least one unique key, that is, no two records have the same key value and it is called a primary key. However, keys need not always be unique. Such a key is called a secondary key.

A searching method is a process that accepts an argument x and tries to locate a record whose key value is x. The process may return the entire record or it may return the address of the record. If searching for a record is unsuccessful, then a null value will return. We start our discussion with the simplest searching technique which is known as sequential search.

11.2 SEQUENTIAL SEARCH

A search that looks through a list from the top to bottom while checking each item for a match is a sequential search or linear search. This search is applicable to a table organised either as an array or as a linked list. The item to be matched is called a key. Let us assume that x is an array of n keys, x[0] through x[n-1], and r an array or record, r [0] through r[n-1], such that x[i] is the key of r[i]. The search begins at array index 0 and ends either when a match is found or the end of the array is reached. We want to return the smallest integer i such that x[i] equals key if such an i exists and -1 otherwise. The program segment may look as follows:

      for(i=0; i<n ; i+1)
 
            if(key == x[i])
  
                   return(i);
  
            return(-1);

The program segment is simple enough. It examines each key in turn and its index is returned if it matches with the search argument, otherwise -1 is returned.

The sequential search operates with the fact that the entries in the list are in order. If the key is the last entry in the list or if the key is larger than all the entries, then the algorithm will do n comparisons for n elements.

The following algorithm illustrates the idea of sequential search.

Algorithm 11.1: Sequential search

Input: An array A with n elements (1,…, n) and x is the item to be searched.

Output: The location of x in array A; otherwise return 0 if x is not found.

Step 1: Set i=l
Step 2: While i<=n and A [i] ≠ x do
Step 3: Set i = i+1
  end while
Step 4: if i>n then set i =0
  end if
Step 5: stop

The sequential search algorithm terminates with index i equal to the index of the first occurrence of x in A, if x is present, and equal to 0 otherwise.

The sequential-search algorithm is implemented with a C program as follows:

Example 11.1

 # include<stdio.h>
 /* Program for Sequential search */
 #define N   7
 int A[ ] = {20,17,18,7,5,6,19};
 main( )
 {
 int x, y;
 printf(“

 Enter the item to be searched :
”);
 scanf(“%d”, &x );
 y=sequential(x);
 if (y == -1)
 printf(“
 Element is not found 
”);
 else
 printf (“
 Element is found and its position is %d
”, y);
 }
 /* Function sequential */
 sequential(key) 
 int key;
 {
      int i;
      for(i =0; i<N ; i++)
      if(key == A[i])
            return (i); 
      return(-1);
  
 }

A sample run is shown below:

Input:    17

Output: Element is found and its position is 1

Input:    16

Output: Element is not found

11.3 BINARY SEARCH

The most efficient method of searching a sequential table without the use of indices or tables is the binary search. This searching technique is a divide-and-conquer method applicable to sort data items. The data can be in array or a file. When the data is a structure, a key field is used. The primary limitation of the method is that the data must be in sorted order. Let us first explain why this method is better than the sequential method.

Consider an array of elements in which they have been placed in some order. For example, a dictionary or telephone book may be thought of as an array whose entries are in alphabetical order. Suppose we want to search a name for finding his/her telephone number in a telephone book or a word in a dictionary. In sequential search, each item of the array is examined in turn and compared with the item being searched for, until a match occurs. If the list is unordered, the sequential search may be the only way to find anything in it. Such a method would never be used in looking for a name in the telephone directory since it may complicate the searching process. Instead of this, the book is opened to a random page and the names on that page examined. Since the names are ordered alphabetically, such an examination would determine whether the search should proceed in the first half or second half of the book. This is the basic idea of the binary search.

In the binary search, an array is divided into two parts. Now, compare the item being searched for with the item at the middle of the array. If they are equal, the search has been completed successfully. If the middle element is greater than the item being searched for, the search process is repeated for the left half of the array since it may appear anywhere in the first half. On the other hand, if the middle item is less than the item being searched for, then the process is repeated in the second half. This method reduces the number of elements yet to be searched into half for each comparison. For large arrays, this method is superior to the sequential search in which each comparison to the sequential search reduces the number of elements yet to be searched by onlyone. Let us illustrate the method with the following example.

Suppose we want to locate the data associated with key value 1649. We begin the search in 1129 1203 1211 1519 1609 1649 2821 3575 9279 9289

the middle of the list. In our example, first we search the key found at position 5. The key value 1649 is greater than the key (i.e., 1609) at position 5, we conclude that the key we want to find should be in between 6 through 10 if it is to be found at all. We again divide by of sum 6 and 10 and find that the key at position 8 is greater than the actual key. This process continues until we find the key 1649 at position 6.

11.3.1 Algorithm Binary Search

Input: A is an ordered array with n entries and x is the item sought.

Output: If x is not in A, return o; otherwise return the position of x in A.

Step 1: Set 1=1 and r=n
  / * 1 and r indicate the indices of the first */
  /* and last entries of the array respectively */
Step 2: Set flag = 0
Step 3: While 1 <=r and flag = 0 do
Step 4: set i = (1+r)/2 / * Index of middle entry */
Step 5: of x = A [i] then flag =1
  else
  if x <A [i] then r = i-1
  else 1 = i+1
  end if
  end while
  if not flag then i=0
  end

The idea behind the binary search algorithm is to eliminate half the entries with each comparison. Initially, we compare the searching element, say x, to the middle element of the list. If x is larger than the middle element of the list, then it is in the second half. If x is smaller than the entry in the middle of the list, the second half of the list is eliminated from consideration. If x is equal to the middle entry of the list, the searching process terminates. In each comparison the size of the section of the list that may contain x is cut in half. The same process continues until x is found or it is determined that x is not in the list. The binary-search algorithm requires Log2n comparison whereas the sequential or linear search requires n comparisons in the worst case.

The following program implements the binary-search algorithm.

Example 11.2

      #include<stdio.h>
      /* Program for Binary search */
      int x[ ] = {1,5,7,8,10,12,14,17,18,20,22,23,24};
      int n =12; 
      main ( )
            {
 
                  int y, i, key;
                  printf(“
 Enter element to be searched 
”);
                  scanf(“ %d", &key ); 
                  clrscr( );
                  for( i = 0 ; i <= 12; i++ )
                      printf(“%3d”, x[i]);
                  printf(“


”);
                      y= binary search(key);
                      printf(“

 % d
”, y);
            }
            /* Binary search function */
            binarySearch(key)
            int key;
            {
                  int l, r, m;
                  l = 0;
                  r= n+1;
                  while( r!=(l+l))
                      {
                           m=(l+r)/2;
                           if (x[m]<= key )
                                 l = m;
                           else
                                 r= m;
                  }
                  if (x[1]==key )
                  { printf(“
 Element is found and its position is
                  = %d
” ) ;
                  }
                  else
                  {
                      printf(“
 Element is not found 
”);
                  }
            }

A sample run is given below:

Input:

Enter element to be searched

17

Output:

1   5   7   8   10   12   14   17   18   20   22   23   24

Element is found and its position is = 7

11.4 INDEXED SEQUENTIAL SEARCH

Indexed sequential search approach is especially applicable in searching direct access secondary storage devices. The idea behind the use of an index is similar to the way we use an address book to find a person. This search technique generates record addresses by means of a table, called an index, which stores the key values and relative addresses of the records in a file. Given a key value, its relative address is sequentially searched for in the index, and if it exists, its relative address is picked from the index and the corresponding record is directly accessed using this relative address. A considerable amount of time is saved, but extra space is needed to store the index. It is rather like the use of a card index in a library. The user looks up for the name of the book he wants in the card index, and the index gives the catalogue number, which is like a relative address of the position on the shelves.

The algorithm used for index search is straightforward. The general strategy of an indexed search is to use the key to search the index, find the relative record position of the related data, and from there make only one access into the data. Since the parallel lists of keys and relative record positions require much less storage than what the data do, frequently the entire index can be loaded and permanently held in the main memory so that only one disk is accessed for each record being sought. For large indexes, large blocks of keys and associated pointers are to be manipulated for better searching efficiency.

The strategy for conducting indexed sequential search method (ISAM) is based on the following.

* Scan the index table for a key that is greater than or equal to the given key that is to be searched.

* Next follow the corresponding pointer to search sequentially for the corresponding key.

The following program implements the indexed sequential search technique for finding the target key.

Example 11.3

--------------------------------------------------------
                  Indexed sequential search
--------------------------------------------------------
      #include <stdio.h> 
      struct INDEX
            {
                  int kindex;
                  int*pindex;
      };
      struct INDEX index[10];
      int datafile[25], n, size;
      void main ( )
      {
            int i, j=0, found, x, indxsize, key;
            int * low, * high, *temp;
            clrser ( );
            printf(“Enter how many elements in the file (Not more
            than 25) = = ==>”);
            scanf(“%d”, &n );
            printf(“
 Enter the data: In sorted order 
”);
            for(i=0; i<n, i++)
                  scanf (“%d”, &datafile[i]);
            printf(“Enter the size of the index table = = = >”);
            scanf(“%d”, &size);
            clrscr( );
            printf(“
 The input data file of integers is as
            follows:”);
            printf(“
 = = = == = = = = == = ====
”);
            for( i=0, i<n; i++)
                 {
                 printf(“
 At address”);
                 printf(“%u<===>Element %d” --->%d”,&datafile(i),
                 i + 1, datafile[i]);
      for( i=size-1; i<n; i+=size)
            {
            index [j].kindex = datafile[i];
            index [j].pindx = datafile[i];
            j++;
      }
      if (n%size!=0)
      {
      index[j].kindex=datafile[n-1];
      index[j].pindex=& datafile[n-1];
      j++ ;
      }
      indxsize=j;
      printf(“
 The Index Table Form the Input file as follows:”);
      printf(“
 ======================================

”);
      printf(“DATA FIELD ADDRESS FIELD (POINTER)
”)
      printf (“________________ ”);
      for(i=0; i<j; i++)
      {
      printf(“
%5d%17u”, index [i] , kindex, index [i].pindex);
      }
      printf(“
 Enter the number to be searched = = = >”);
      scanf(“%d”; &key);
      found=0 ; x=0;
      while(i<indxsize) &&(found == 0))
      {
            if(index[x].kindex<key)
                 found=1;
            else
                 x=x+1;
      }
      if(x== 0)
         low = &datafile[0];
      else
         low = index[x-1].pindex;
      if(found ==1)
            high = index[x].pindex-1;
      else
      high=&datafile[n-1];
      temp = low;
      found =0;
      while(( temp <= high) && (found == 0))
      {
      if (*temp== key)
         found = 1;
      else
         temp++;
      }
      if(found ==1)
      printf(“
 Found at address %u: Element is %d”, temp, *temp);
      else
      printf(“
 Data not found”);
      getch( );
      }

Enter how many elements in the file (Not more than 25) =======>10

Enter the data: In sorted order

12 19 23 35 42 46 55 67 70 85

Enter the size of the index table===>4

The input data file of integers is as follows:

=================================

At address 1810 <=====> Element1 ——>12

At address 1812 <=====> Element2 ——>19

At address 1814 <=====> Element3 ——>23

At address 1816 <=====> Element4 ——>35

At address 1818 <=====> Element5 ——>42

At address 1820 <=====> Element6 ——>46

At address 1822 <=====> Element7 ——>55

At address 1824 <=====> Element8 ——>67

At address 1826 <=====> Element9 ——>70

At address 1828 <=====> Element10 ——>85

The index table formed from the input file is as follows:

=========================================

DATA FIELD ADDRESS FIELD (POINTER)
35 1816
67 1824
85 1828

Enter the number to be searched ====> 55

Found at address 1822: Element is 55

11.5 HASHING SCHEMES

In the previous method we assumed that the record being sought is stored in a table and it is necessary to pass through some number of keys before finding the desired one. In hashing methods we would like to have a table organization and search technique in which no unnecessary comparisons are needed for finding the desired key. If each key is to be retrieved in a single scan, the location of the record within the table can depend only on the key.

When the number of keys actually stored is relatively small to the total number of possible keys, hash tables become an efficient alternative to directly addressing an array, since a hash table typically uses an array of proportional size to the number of keys actually stored. For example, in an inventory control application, product identification codes are numbered from 10,000 onwards. We need to locate its position at (key-9999) in the list. The method is known as key-to-address transformation or hash function.

With direct addressing, an element with key k is stored in slot k. With hashing, this element is stored in slot h(k), that is, a hash function h is used to compute the slot from the key k. Unfortunately, it is also possible that two keys may hash to the same slot— a collision. Fortunately, there are effective techniques for resolving the conflict created by collisions.

A good hash function satisfies the assumption of simple uniform hashing, that is, each key is equally likely to hash to any of the m slots. Most hash functions assume that the keys are natural numbers. Various techniques are available for generating the hash functions.

In the division method for creating hash functions, we map a key k into one of the m slots by taking the remainder of k divided by m, that is, h(k) =k mod m. For example, if the hash table has a size 20 and the key value is 119 then h(k)=5. It requires only division operation and naturally works fast. Note that for a key value of 118 the hash function h(k) is same as for the key value 119. This indicates a collision. To avoid collision, it may be better to select the value of m as prime number.

In some applications, the key value is not an integer. For example, in the case of an insurance number, folding technique for creating hash function works better than the other techniques. The insurance number

567-96-1505

In the shift-folding method, the above number is viewed as three separate numbers to be added

  567

    96

1505

2177

The above number can be treated as the hash position itself or further hashing technique such as division remainder to get a final hash position in the desired range.

The shift-folding method has a great advantage for its ability to transform non-integers keys into integers suitable for further hashing action.

We now present the simplest collision-resolution technique, called chaining. In chaining, we put all the elements that hash to the same slot in a linked list. In a hash table T with m slots that stores n elements, the average number of elements stored in a chain is n/m. In worst case, all n keys hash to the same slot, creating a list of length n.

The following program illustrates the idea of searching a hash table using chaining techniques.

Example 11.4

---------------------------------------------------------
       HASH-TABLE SEARCHING BY THE METHOD OF CHAINING         
---------------------------------------------------------
      #include<stdio.h>
      #include<stdlib.h>
      #incude<alloc.h>
      #define B 13
      struct nodetype
      {
      int data;
      struct node * link;
          } ;
      typedef struct nodetype node;
      node *arr[15];
      int num, count=0; 
      char ch;
      node *root, *1, *ptr;
      void display( );
      node *search(int);
      void main( )
      {
      int i, j, item, y;
      randomize*( ); clrscr( );
      for(i=0; i<B, i++) arr[i]=NULL;
      printf(“How many data to be inserted in the hash table=====>*);
      scanf(“%d”, &num);
      for(i=0; i<num ; i++)
      {
             item = random(100);
             y=hash(item);
             if(arr[y]==NULL)
                 {
                   root = (node*) malloc(sizeof(node));
                   root—>data = item;
                   root—>link =NULL;
                   arr[y]= root;
                        }
      else
                  {
                   root=(node*) malloc sizeof(node));
                   root—>data = item;
                   root—>link = NULL;
                   ptr = arr[y];
                   while (ptr—>link! =NULL)
                        {
                             ptr=ptr—>link;
                        }
                   ptr—>link = root;
                  }
      }
      printf(“The hash table is as follows:
“);
      printf(“=======================
”);
      display( );
      printf(“ 
 Enter the data to be searched ======>”);
      scanf(“%d, &item);
      prt= search(item);
      if(ptr! =NULL)
      printf(“
 Data found: At position %d in hash table [%2d]”,
      count, hash(item));
      else
      printf(“Data not found”);
      getch;
      }
      hash(int n)
      {
      int p;
      p=n%B;
      return(p);
      }
      void display( );
      {
      int i;
      for (i=o; i<B; i++)
      {
            root= arr[i];
            printf(“Hash table [%2d] ===>”, i);
            while (root! NULL)
                  {
                        printf(“%d —>”, root—>data );
                        root=root—>link;
                  }
                  printf(“NULL
”);
            }
      }
      node *search(int n)
      {
      int found, i;
      node *p, *q;
      found =0;
      i=hash(n);
      p=arr[i];
      while(( p!=NULL) &&(!found))
      {
             count++;
             if (p—>data ==n)
               found =1;
             else
             p=p—>link;
      }
      if(found==1)
        return(p);
      else
        return(NULL);
      }

How many data to be inserted in the hash table =====>20

The hash table is as follows:

=========================

Hash table [0]======>91——>39——>NULL

Hash table [1] =====>NULL

Hash table [2] =====>41——>2——>NULL

Hash table [3] =====> 3——>3——>NULL

Hash table [4]=====>43——>82——>56——>NULL

Hash table [5]=====> 70——96——>57——>18——.NULL

Hash table [6] =====> 19 ——>45——>NULL

Hash table [7] =====>59——> NULL

Hash table [8] =====> 47——>47——>NULL

Hash table [9] =====>NULL

Hash table [10] ====>NULL

Hash table [11] =====>11——>NULL

Hash table [12] =====>77——.NULL

Enter the data to be searched =====> 57

Data found: At position 3 in hash table [5]

How many data to be inserted in the hash table ======> 20

The hash table is as follows:

=========================

Hash table [0] =====> 52——>0——>NULL

Hash table [1] =====>53——>14——>92——>NULL

Hash table [2] =====> NULL

Hash table [3] =====> 94——>42——>NULL

Hash table [4] =====> 82——> NULL

Hash table [5] =====>NULL

Hash table [6] ====> 84——>45——>84——>45——>NULL

Hash table [7] ====> 98——>72——>NULL

Hash table [8] =====>NULL

Hash table [9] =====>48——>NULL

Hash table [10] ====> 49——>NULL

Hash table [11] ====>11—>37——>NULL

Hash table [12] ====>25——>25——>NULL

Enter the data to be searched =====>67

Data not found.

In the next chapter, another important topic, trees, of data structure will be discussed.

EimagesXimagesEimagesRimagesCimagesIimagesSimagesEimagesS

  1. What are the advantages and disadvantages of the Sequential search algorithm?
  2. Binary search, a technique that takes advantage of the stored order of the list, takes an amount of work 0(log2 n). What is the maximum number of probes made by a binary search in a list of 128 elements?
  3. Determine the list size for which binary search becomes more efficient than sequential search?
  4. What are the main advantages of indexed sequential search over sequential search?
  5. The division hash function H(k) = k mod m, is usually a good hash function if m has no small divisors. Explain why this restriction is placed on m.
  6. A perfect hash function is one that causes no collisions. How many probe(s) is / are needed to locate an element that has a given key value.
  7. When the perfect hashing functions are feasible?
  8. Open address hashing method attempts to place second and subsequent keys that hash to the open table location into some other position in the table that is unoccupied. What are the main drawbacks of this method??
  9. External chaining has a linked list associated with each hash table address. Each element is added to the linked list at its home address. What are the advantages of external chaining over open address hashing technique?
  10. With a table size 50000, after how many insertion operations does, hashing with open addressing display the same behavior as binary search?
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.172.115