INTRODUCTION

This is the fifth edition of a book that has proven popular with professional developers and academic institutions. It strives to impart knowledge on a subject that at first was seen by some as just another fad, but that instead has come to maturity and is now often just taken for granted. Almost six years have passed since the previous edition, a veritable lifetime in IT terms. In reviewing the fourth edition for what should be kept, what should be updated, and what new material was needed, the current authors found that about three-quarters of the material was substantially out of date. XML has far more uses than five years ago, and there is also much more reliance on it under the covers. It is now no longer essential to be able to handcraft esoteric configuration files to get a web service up and running. It has also been found that, in some places, XML is not always the best fit. These situations and others, along with a complete overhaul of the content, form the basis for this newer version.

So, what is XML? XML stands for eXtensible Markup Language, which is a language that can be used to describe data in a meaningful way. Virtually anywhere there is a need to store data, especially where it may need to be consumed by more than one application, XML is a good place to start. It has gained a reputation for being a candidate where interoperability is important, either between two applications in different businesses or simply those within a company. Hundreds of standardized XML formats now exist, known as schemas, which have been agreed on by businesses to represent different types of data, from medical records to financial transactions to GPS coordinates representing a journey.
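As a rough illustration of describing data in a meaningful way, the sketch below stores a short journey as XML and reads it back with Python's standard library. The element and attribute names (journey, waypoint, lat, lon) are invented for this example and are not taken from any of the standardized schemas mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names are made up for
# illustration and do not follow any agreed schema.
JOURNEY_XML = """<journey name="Morning commute">
  <waypoint lat="51.5074" lon="-0.1278">London</waypoint>
  <waypoint lat="51.7520" lon="-1.2577">Oxford</waypoint>
</journey>"""


def read_waypoints(xml_text):
    """Return a list of (place, latitude, longitude) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (wp.text, float(wp.get("lat")), float(wp.get("lon")))
        for wp in root.findall("waypoint")
    ]


if __name__ == "__main__":
    for place, lat, lon in read_waypoints(JOURNEY_XML):
        print(f"{place}: {lat}, {lon}")
```

Because the markup names each piece of data, a second application written in another language could read the same text without knowing anything about the program that produced it.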

WHO THIS BOOK IS FOR

This book aims to suit a fairly wide range of readers. Most developers have heard of XML but may have been a bit afraid of it. XML has a habit nowadays of being used behind the scenes, and it’s only when things don’t work as expected, or when they want to do something a little different, that developers start to realize that they must open the hood. To those people we say: fear no longer. It should also suit the developer experienced in other fields who has never had a formal grounding in the subject. Finally, it can be used as a reference when you need to try something out for the first time. Nearly all the technologies in the book have a Try It Out section associated with them that first gets you up and running with a simple example and then explains how to progress from there.

What you don’t need for this book is any knowledge of markup languages in general. This is all covered in the first few chapters. It is expected that most of the readership will have some knowledge of and experience with web programming, but we’ve tried to spread our examples so that knowledge could include using the Microsoft stack, Java, or one of the other open source frameworks, such as PHP or Python.

And just in case you are worried about the Beginning part of the title, that’s a Wrox conceit that applies more to the style of the book than to your level of experience. Many of the concepts covered, especially in later chapters, are from the real world and are far from the Hello World genre.

WHAT THIS BOOK COVERS

This book aims to teach you all you need to know about XML — what it is, how it works, what technologies accompany it, and how you can make it work for you, from simple data transfer to a way to provide multi-channeled content. The book sets out to answer these fundamental questions:

  • What is XML?
  • How do you use XML?
  • How does it work?
  • What can you use it for?

The basic concepts of XML have remained unchanged since their launch, but the surrounding technologies have changed dramatically. This book gives a basic overview of each technology and how it arose, but the majority of the examples use the latest version available. The examples are also drawn from more than one platform, with Java and .NET sharing most of the stage. XML products have also evolved. For example, at one time there were many free and commercial Extensible Stylesheet Language Transformations (XSLT) processors, but since version 2 appeared the number has reduced considerably as the work needed to develop and maintain the software has risen. (XSLT is used to manipulate XML, changing it from one structure to another, and is covered in Chapter 8.)

HOW THIS BOOK IS STRUCTURED

We’ve tried to arrange the subjects covered in this book to lead you along the path from novice to expert in as logical a manner as possible. The sections each cover a different area of expertise. Unless you’re fairly knowledgeable about the basics, we suggest you read the introductory chapters in Part I, although skimming through may well be enough for the savvier user. The other sections can then be read in order or can be targeted directly if they cover an area that you are particularly interested in. For example, when your boss suddenly tells you that your next release must offer an XQuery add-in, you can head straight to Chapter 9. A brief overview of the book is as follows:

  • You begin by learning exactly what XML is and why people felt it was needed.
  • We then take you through how to create XML and what rules need to be followed.
  • Once you’ve mastered that, you move on to what a valid XML document is and how you can be sure that yours is one of them.
  • Then you’ll look at how you can manipulate XML documents to extract data and to transform them into other formats.
  • Next you deal with storing XML in databases — the advantages and disadvantages and how to query them when they’re there.
  • You then look at other ways to extract data, especially those suitable to dealing with large documents.
  • We then cover some uses of XML, how to publish data in an XML format, and how to create and consume XML-based web services. We explain how AJAX came about and how it works, alongside some alternatives to XML and when you should consider them.
  • We follow up with a couple of chapters on how to use XML for web page and image display.
  • Finally, there’s a case study that ties a lot of the various XML-based technologies together into a real-world example.

We’ve tried to organize the book in a logical fashion, such that you are introduced to the basics and then led through the different technologies associated with XML. These technologies are grouped into six sections covering most of the topics that you’ll encounter with XML, from validation of the original data to processing, storage, and presentation.

Part I: Introduction
This is where most readers should start. The chapters in this part cover the goals of XML and the rules for constructing it. After reading this part you should understand the basic concepts and terminology. If you are already familiar with XML, you can probably just skim these chapters.
Chapter 1: What Is XML? — Chapter 1 covers the history of XML and why it is needed, as well as the basic rules for creating XML documents.
Chapter 2: Well-Formed XML — This chapter goes into more detail about what is and isn’t allowed if a document is to be called XML. It also covers the modern naming system that is used to describe the different constituent parts of an XML document.
Chapter 3: XML Namespaces — Everyone’s favorite, the dreaded topic of namespaces, is explained in a simple-to-understand fashion. After reading this chapter, you’ll be the expert while everyone else is scratching their heads.
Part II: Validation
This part covers different techniques that help you verify that the XML you’ve created, or received, is in the correct format.
Chapter 4: Document Type Definitions — DTDs are the original validation mechanism for XML. This chapter shows how they are used to both constrain the document and to supply additional content.
Chapter 5: XML Schemas — XML Schemas are the more modern way of describing an XML document’s format. This chapter examines how they work and discusses the advantages and disadvantages over DTDs.
Chapter 6: RELAX NG and Schematron — Sometimes neither DTDs nor schemas provide what you need. This chapter discusses two other methods by which you can check if your XML is valid, and also includes examples of mixing more than one validation technique.
Part III: Processing
This section covers retrieving data from an XML document and also transforming one format of XML to another. Included is a thorough grounding in XPath, one of the cornerstones of many XML technologies.
Chapter 7: Extracting Data from XML — This chapter covers the document object model (DOM), one of the earliest ways devised to extract data from XML. It then goes on to describe XPath, one of the cornerstone XML technologies that can be used to pinpoint one or many items of interest.
Chapter 8: XSLT — XSLT is a way to transform XML from one format to another, which is essential if you are receiving documents from external sources and need your own systems to be able to read them. It covers the basics of version 1, the more advanced features of the current version, and shows a little of what’s scheduled in the next release.
Part IV: Databases
For many years there has been a disparity between data held in a database and that stored as XML. This part brings the two together and shows how you can have the best of both worlds.
Chapter 9: XQuery — XQuery is a mechanism designed to query existing documents and create new XML documents. It works especially well with XML data that is stored in databases, and this chapter shows how that’s done.
Chapter 10: XML and Databases — Many database systems now have functionality designed especially for XML. This chapter examines three such products and shows how you can both query and update existing data as well as create new XML, should the need arise.
Part V: Programming
This part looks at two programming techniques for handling XML. Chapter 11 covers dealing with large documents, and Chapter 12 shows how Microsoft’s latest universal data access strategy, LINQ, can be used with XML.
Chapter 11: Event-Driven Programming — This chapter looks at two different ways of handling XML that are especially suited to processing large files. One is based on an open source API and the examples are implemented in Java. The second is a key part of Microsoft’s .NET Framework and shows examples in C#.
Chapter 12: LINQ to XML — This chapter shows Microsoft’s latest way of handling XML, from creation to querying and transformation. It contains a host of examples that use both C# and VB.NET, which, for once, currently has more features than its .NET cousin.
Part VI: Communication
This part has four chapters that deal with using XML as a means of communication. It covers presenting data in a way that many different systems can utilize and then shows how web services can make data available to a variety of different clients. It concludes with a discussion on how complex data can be described in a standard way that’s accessible to all.
Chapter 13: RSS, Atom, and Content Syndication — This chapter covers the two main ways in which content, such as news feeds, is presented in a platform-independent fashion. It also covers how the same XML format can be used to present structured data such as customer listings or sales results.
Chapter 14: Web Services — One of the biggest software success stories over the past ten years has been web services. This chapter examines how they work and where XML fits into the picture, which is essential knowledge, should things start to go wrong.
Chapter 15: SOAP and WSDL — This chapter burrows down further into web services and describes two major systems used within them: SOAP, which dictates how services are called, and Web Services Description Language (WSDL), which is used to describe what a web service has to offer.
Chapter 16: AJAX — The final chapter in this section deals with AJAX and how it can help your website provide up-to-the-minute information, yet remain responsive and use less bandwidth. Obviously XML is involved, but the chapter also examines the situations when you’d want to abandon XML and use an alternative technology.
Part VII: Display
This part shows two ways in which XML can help display information in a user-friendly form as well as in a format that can be read by a machine.
Chapter 17: XHTML and HTML 5 — This chapter covers how and where to use XHTML and why it is preferred over traditional HTML. It then goes on to show the newer features of HTML 5 and how it has removed some of these obstacles.
Chapter 18: Scalable Vector Graphics (SVG) — This chapter shows how images can be stored in an XML format and what the advantages are to this method. It then shows how this format can be combined with others, such as HTML, and why you would do this.
Part VIII: Case Study
This part contains a case study that ties in the many uses of XML and shows how they would interact in a real-world example.
Chapter 19: Case Study: XML in Publishing — The case study shows how a fictional publishing house goes from proprietary-based publishing software to an XML-based workflow and what benefits this brings to the business.
Appendices
The three appendices contain reference material and solutions to the end-of-chapter exercises.
Appendix A: Answers to Exercises — This appendix contains solutions and suggestions for the end-of-chapter exercises that have appeared throughout the book.
Appendix B: XPath Functions — This appendix contains information on the majority of XPath functions, their signatures, return values, and examples of how and where you would use them.
Appendix C: XML Schema Data Types — This appendix contains information on the numerous built-in data types defined by XML Schema. It shows how they are related and also how they can be constrained by different facets.

WHAT YOU NEED TO USE THIS BOOK

There’s no need to purchase anything to run the examples in this book; all the examples can be written with and run on freely available software. You’ll need a machine with a standard browser; Internet Explorer, Firefox, Chrome, or Safari should do as long as it’s one of the more recent versions. You’ll need a basic text editor, but even Notepad will do if you want to create the examples rather than just download them from the Wrox site. You’ll also need to run a web server for some of the code; either the free version of IIS for Windows or one of the many open source implementations, such as Apache, for other systems will do. For some of the coding examples you’ll need Visual Studio. You can either use a commercial version or the free one available for download from Microsoft.

If you want to use the free version, Visual Studio Express 2010, then head to www.microsoft.com/visualstudio/en-us/products/2010-editions/express. Each edition of Visual Studio concentrates on a specific area such as C# or web development, so to try all the examples you’ll need to download the C# edition, the VB.NET edition, and the Web edition. You should also install Service Pack 1 for Visual Studio 2010, which can be found at www.microsoft.com/download/en/details.aspx?id=23691. Once everything is installed you’ll be able to open the sample solutions or, failing that, one of the sample projects within the solutions by choosing File > Open > Project/Solution... and browsing to either the solution file or the specific project you want to run. As this book went to press, Microsoft was preparing to release a new version, Visual Studio 2011. The examples in this book should all work with this newer version, although the screenshots may differ slightly.

CONVENTIONS

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.


TRY IT OUT
The Try It Out is an exercise you should work through, following the text in the book.
1. They usually consist of a set of steps.
2. Each step has a number.
3. Follow the steps through with your copy of the code.
How It Works
After each Try It Out, the code you’ve typed will be explained in detail.


WARNING Boxes with a warning icon like this one hold important, not-to-be-forgotten information that is directly relevant to the surrounding text.


NOTE The pencil icon indicates notes, tips, hints, tricks, and asides to the current discussion.

As for styles in the text:

  • We highlight new terms and important words when we introduce them.
  • We show keyboard strokes like this: Ctrl+A.
  • We show filenames, URLs, and code within the text like so: persistence.properties.
  • We present code in two different ways:
We use a monofont type with no highlighting for most code examples.
We use bold to emphasize code that's particularly important in the present context.

SOURCE CODE

As you work through the examples in this book, you may choose either to type in all the code manually, or to use the source code files that accompany the book. All the source code used in this book is available for download at www.wrox.com. When at the site, simply locate the book’s title (use the Search box or one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book. Code that is included on the website is highlighted by the following icon:

[icon]

Listings include the filename in the title. If it is just a code snippet, you’ll find the filename in a code note such as this:

filename


NOTE Because many books have similar titles, you may find it easiest to search by ISBN; this book’s ISBN is 978-1-118-16213-2.

Once you download the code, just decompress it with your favorite compression tool. Alternatively, you can go to the main Wrox code download page at www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.

ERRATA

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information.

To find the errata page for this book, go to www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page you can view all errata that have been submitted for this book and posted by Wrox editors. A complete book list including links to each book’s errata is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.

P2P.WROX.COM

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com, you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join as well as any optional information you wish to provide and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

NOTE You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.

Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.
