19.5. Test Results

The benchmarking tests were run on an Intel Pentium III server with a 450MHz processor, 256MB RAM, and two mirrored UW-SCSI hard disks. The clients were Intel Pentium III machines with 350MHz processors and 128MB RAM. They were connected to the server by a 10-Mbit local area network. The storage strategies were implemented on a well-known relational DBMS, object-oriented DBMS, directory server, and native XML DBMS (for legal reasons the products cannot be named).

The documents used were automatically generated based on a DTD that defines the structure of project descriptions. The DTD contains 26 elements with a maximum nesting depth of 8, and 4 attributes. XML documents based on this DTD contain information on the project members, such as names and addresses, as well as information on the publications, such as articles. Varying the number of these entries allows us to easily produce documents of different sizes.
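As a rough illustration of how documents of different sizes might be generated from such a structure, the following sketch builds a project description with a configurable number of members and publications. The element and attribute names (`project`, `number`, `member`, `name`, `address`, `publication`, `article`, `heading`) are assumptions made for this example; the actual DTD has 26 elements and is not reproduced here.

```python
import xml.etree.ElementTree as ET


def make_project(project_no, n_members, n_publications):
    """Build a hypothetical project description of adjustable size."""
    project = ET.Element("project", {"number": str(project_no)})
    for i in range(n_members):
        member = ET.SubElement(project, "member")
        ET.SubElement(member, "name").text = f"Member {i}"
        ET.SubElement(member, "address").text = f"Street {i}"
    for j in range(n_publications):
        publication = ET.SubElement(project, "publication")
        article = ET.SubElement(publication, "article")
        ET.SubElement(article, "heading").text = f"Heading {j}"
    return project


def serialized_size(element):
    """Return the size in bytes of the serialized XML document."""
    return len(ET.tostring(element, encoding="utf-8"))


small = make_project(1, 10, 10)
large = make_project(1, 1000, 1000)
print(serialized_size(small), serialized_size(large))
```

Increasing the two counts scales the document size roughly linearly, which is one simple way to reach target sizes such as 125 KB or 8,000 KB.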

To compare the different database types, we ran a set of tests that do the following:

  • Store XML documents

  • Extract complete XML documents

  • Delete complete XML documents

  • Extract parts of documents identified by the position of elements in the document

  • Replace parts of documents

The benchmarks were specified and run by Fellhauer (see Fellhauer 2001 for the complete list of test results).

19.5.1. Evaluation of Performance

We ran the benchmarks five times with every document of every size, dropped the best and the worst results, and took the average of the remaining three values as the result value. Table 19.7 shows the results of storing XML documents.
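The aggregation rule described above (five runs, discard the best and the worst, average the remaining three) can be sketched as follows. The function name and the sample timings are hypothetical and serve only to illustrate the calculation.

```python
def trimmed_result(timings):
    """Drop the fastest and the slowest run, then average the rest."""
    if len(timings) < 3:
        raise ValueError("need at least three runs")
    ordered = sorted(timings)
    kept = ordered[1:-1]  # discard the minimum and the maximum
    return sum(kept) / len(kept)


# Hypothetical timings in seconds for five runs of one test
runs = [18.4, 17.9, 18.1, 25.0, 18.0]
print(f"{trimmed_result(runs):.2f} s")
```

Discarding the extremes keeps a single outlier, such as the 25.0 s run in the sample, from distorting the reported value.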

All figures in Table 19.7 measure the time for inserting the XML document into the database, including the time consumed by the DOM parser. The object-oriented database performs best. The native XML database shows a relatively high growth rate, which we could not confirm for larger documents because of the limited space of our test installation: an 11MB document took 35MB of the 50MB disk space available in the test version. The poor result of storing the 25MB document in the typed relational database is surprising: it took more than 12 hours. We would like to investigate this further to determine whether the number of tables involved is the reason. We were not able to store the 64MB document in the databases because of the 256MB main memory. The DOM tree could be built, but the traversal of the tree resulted in permanent swapping. Table 19.8 shows the results of extracting complete XML documents.
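To make clear what a measurement that includes DOM parsing covers, here is a minimal sketch of a timing wrapper. The `store` callable stands in for whichever database-specific storage routine is used; it is an assumption of this example, as is the use of Python's `xml.dom.minidom` as the DOM parser.

```python
import time
from xml.dom import minidom


def timed_store(xml_text, store):
    """Return the elapsed seconds for parsing plus storing a document."""
    start = time.perf_counter()
    dom = minidom.parseString(xml_text)  # DOM parsing is part of the measurement
    store(dom)                           # hypothetical storage routine
    return time.perf_counter() - start


# A list stands in for the database in this illustration
stored = []
elapsed = timed_store("<project><member/></project>", stored.append)
print(f"{elapsed:.6f} s")
```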

Table 19.7. Test Results for Storing XML Documents

Size of XML Documents | Directory Server | Non-Typed Relational Database | Typed Relational Database | Object-Oriented Database | Native XML Database
125 KB | 18.1 s | 19.8 s | 28.3 s | 4.3 s | 10.5 s
500 KB | 55.7 s | 42.2 s | 61.5 s | 9.5 s | 38.1 s
2,000 KB | 90.0 s | 74.4 s | 123.6 s | 20.7 s | 166.3 s
8,000 KB | 361.0 s | 251.4 s | 983.6 s | 107.6 s | 906.8 s
16,000 KB | 386.8 s | 713.4 s | 2,774.7 s | 213.9 s | [*]
32,000 KB | 1,512.7 s | > 12 hours | > 12 hours | 1,167.3 s | [*]
64,000 KB | > 12 hours | > 12 hours | > 12 hours | > 12 hours | [*]

[*] Failed because of license restrictions.

For extracting complete documents (Table 19.8), the native XML database shows the best results, better than the directory server, which is known for its fast read access. It is not surprising that the relational databases consume a lot of time for reconstructing the documents. We do not know whether the size of the unique element table in the nontyped relational database will produce a result worse than the typed relational database result. The number of tables could influence the results for the 8MB document, especially when we apply proprietary statements for selecting hierarchically connected entries.

To extract parts of the XML documents, we ran queries that locate elements in the XML document and return these elements or parts of their content. All databases show similar results independent of the size of the document; there is no sequential search. The relational databases are the fastest; the difference between the nontyped and the typed approaches is due to the runtime of consulting the DTD. Table 19.9 shows the test results for the query “Select the first heading of the third publication of a project determined by a project number.”
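A positional query of this kind can be illustrated with the limited XPath support in Python's `xml.etree.ElementTree`. This is only a sketch on an in-memory document, not the query mechanism of any of the tested databases, and the element and attribute names (`projects`, `project`, `number`, `publication`, `heading`) are assumptions.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<projects>
  <project number="42">
    <publication><heading>A1</heading></publication>
    <publication><heading>B1</heading></publication>
    <publication><heading>C1</heading><heading>C2</heading></publication>
  </project>
</projects>"""


def first_heading_of_third_publication(root, project_no):
    """Return the text of the requested heading, or None if it does not exist."""
    path = f"./project[@number='{project_no}']/publication[3]/heading[1]"
    node = root.find(path)
    return None if node is None else node.text


root = ET.fromstring(SAMPLE)
print(first_heading_of_third_publication(root, 42))  # prints "C1"
```

The path addresses elements purely by their position among same-named siblings, which is what "identified by the position of elements in the document" means for this test.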

The nontyped relational database shows surprisingly good results, especially for the selection of small document parts. It leaves the directory server, which is known for fast read access, far behind. The poor results of the object database are caused by the reconstruction and searching of the DOM tree.

Table 19.8. Test Results for Extracting Complete XML Documents

Size of XML Documents | Directory Server | Non-Typed Relational Database | Typed Relational Database | Object-Oriented Database | Native XML Database
125 KB | 11.8 s | 26.9 s | 31.0 s | 12.9 s | 8.7 s
500 KB | 22.2 s | 81.1 s | 75.9 s | 26.5 s | 9.5 s
2,000 KB | 39.1 s | 307.4 s | 275.6 s | 49.9 s | 12.7 s
8,000 KB | 153.4 s | 2,369.7 s | 1,620.3 s | 175.2 s | 28.7 s
16,000 KB | 206.7 s | - | - | 232.2 s | -
32,000 KB | 413.4 s | - | - | 904.2 s | -

Table 19.9. Test Results for Extracting Parts of XML Documents

Size of XML Documents | Directory Server | Non-Typed Relational Database | Typed Relational Database | Object-Oriented Database | Native XML Database
125 KB | 3.7 s | 0.2 s | 3.4 s | 12.9 s | 8.9 s
500 KB | 3.6 s | 0.2 s | 3.6 s | 24.9 s | 9.5 s
2,000 KB | 3.7 s | 0.2 s | 3.5 s | 45.4 s | 12.4 s
8,000 KB | 3.8 s | 0.2 s | 3.6 s | 154.4 s | 235.1 s
16,000 KB | 3.6 s | 0.2 s | 3.5 s | 199.7 s | -
32,000 KB | 3.6 s | - | - | 396.7 s | -

Finally, Table 19.10 shows the results of updating parts of an XML document. For example, the whole person element determined by the last name and the project number should be replaced by a new person element given as an XML document.
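The replace operation described here can be sketched on an in-memory tree with `xml.etree.ElementTree`. The element names (`projects`, `project`, `person`, `lastname`, `city`) and the `number` attribute are assumptions for illustration; the tested databases each implement the update in their own way.

```python
import xml.etree.ElementTree as ET


def replace_person(root, project_no, last_name, new_person_xml):
    """Replace the matching person element; return True if one was replaced."""
    project = root.find(f"./project[@number='{project_no}']")
    if project is None:
        return False
    new_person = ET.fromstring(new_person_xml)
    for index, child in enumerate(list(project)):
        if child.tag == "person" and child.findtext("lastname") == last_name:
            project[index] = new_person  # swap the whole element in place
            return True
    return False


root = ET.fromstring(
    "<projects><project number='42'>"
    "<person><lastname>Smith</lastname><city>Ulm</city></person>"
    "<person><lastname>Jones</lastname><city>Bonn</city></person>"
    "</project></projects>"
)
replaced = replace_person(
    root, 42, "Smith",
    "<person><lastname>Smith</lastname><city>Berlin</city></person>",
)
print(replaced, root.find("./project/person[1]/city").text)
```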

The native XML database shows dramatically decreasing performance. The poor results of the object database are due to the bad performance of the search functions applied to the persistent DOM tree.

Table 19.10. Test Results for Replacing Parts of an XML Document

Size of XML Documents | Directory Server | Non-Typed Relational Database | Typed Relational Database | Object-Oriented Database | Native XML Database
125 KB | 17.8 s | 7.3 s | 5.9 s | 809.0 s | 19.6 s
500 KB | 17.3 s | 7.2 s | 5.8 s | 798.9 s | 52.0 s
2,000 KB | 17.2 s | 7.3 s | 5.8 s | 798.7 s | 195.9 s
8,000 KB | 17.2 s | 7.2 s | 5.8 s | 794.4 s | 198.9 s
16,000 KB | 17.1 s | 7.4 s | 5.6 s | 796.7 s | 692.3 s
32,000 KB | 16.9 s | - | - | 795.8 s | -

19.5.2. Evaluation of Space

Table 19.11 shows the disk space each database management system uses to store the schema and the XML documents of different sizes. The typed relational database approach defines a table for each element type, which increases the space needed for the schema. However, it uses disk space very efficiently when storing the XML documents themselves.

The directory server and the native XML database produce a lot of overhead when compared to the original XML document.
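To put the overhead into numbers, the following sketch computes the ratio of stored size to original size for the 8,000 KB document, using the values from Table 19.11. The short system labels and the function name are chosen for this example, and the document size is taken as exactly 8,000 KB.

```python
ORIGINAL_KB = 8000

# Disk space in KB for the 8,000 KB document, taken from Table 19.11
STORED_KB = {
    "directory server": 40848,
    "non-typed relational": 15157,
    "typed relational": 10547,
    "object-oriented": 13722,
    "native XML": 28780,
}


def overhead_factor(stored_kb, original_kb):
    """Return how many times larger the stored form is than the original."""
    return stored_kb / original_kb


factors = {name: overhead_factor(kb, ORIGINAL_KB) for name, kb in STORED_KB.items()}
for name, factor in sorted(factors.items(), key=lambda item: item[1]):
    print(f"{name:22s} {factor:.2f}x")
```

For this document size the typed relational database needs about 1.3 times the original size, whereas the directory server needs about 5.1 times and the native XML database about 3.6 times.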

19.5.3. Conclusion

The benchmarks have shown that the nontyped relational database approach has advantages over all other solutions. The weak point is the reconstruction of complete XML documents, which should be improved. As long as a standardized XML query language does not support inserting and updating functionality, the reconstruction of XML documents will be an important operation.

We do not know whether the poor results of the search function of the object-oriented database system are representative of this database type. However, they support our belief that searching large object trees will cause long loading times because of the techniques the systems apply. To avoid this, special indexing techniques like B-trees have to be applied. Although content management systems based on object-oriented databases implement this improvement, we used the bare object-oriented database approach to show its capabilities.

Table 19.11. Disk Space Usage in Kilobytes

Size of XML Documents | Directory Server | Non-Typed Relational Database | Typed Relational Database | Object-Oriented Database | Native XML Database
Schema/0 KB | 440 | 672 | 3,484 | 512 | 5,178
125 KB | 2,064 | 117 | 118 | 615 | 760
500 KB | 4,952 | 820 | 546 | 1,536 | 2,088
2,000 KB | 11,240 | 3,164 | 2,182 | 3,738 | 7,060
8,000 KB | 40,848 | 15,157 | 10,547 | 13,722 | 28,780
16,000 KB | 53,280 | 26,562 | 18,476 | 22,630 | > 50,000[*]
32,000 KB | 120,024 | - | - | 43,991 | -
64,000 KB | - | - | - | - | -

[*] Space restriction of 50 MB.

The directory server shows disappointing results compared to the relational database. The expected very fast reading times could not be achieved. The mapping of the DOM tree into the directory information tree might also leave room for improvement. In addition, the space consumed by the directory server is critical. Additional experiments will be necessary to determine the reasons.
