3

Working with the web

Introduction

The structure of documents is important for any sort of digital output, but the rapid growth in digital products needed a catalyst. Database products had been available for some time, and CD-ROMs formed the first big wave of digital products in the early 1990s. However, it was the growth of the web that became the main driver of rapid development in digital products from publishers.

The web and HTML

The internet itself has its origins in a US government initiative of the 1960s and grew through the 1980s into a network of computer networks. However, internet sites most commonly talk to each other and link together via the World Wide Web, which developed in the early 1990s. The web exists because of the development of hypertext mark-up language (HTML), the formatting language of the web: it allows documents to be published as web pages and linked to one another, so that information can be accessed across the web by anyone with a web browser.

Essentially, HTML allows information to be displayed in a web browser. It is chiefly concerned with how something is presented on the web rather than what it is or how it is structured as data (XML is key to that). If you have an article in your data warehouse, HTML coding means that the article can be pulled into the web environment and displayed appropriately within the website (e.g. with headings as headings, sections of text as sections of text, etc.). It is a text-based coding system and does not contain pictures itself, although HTML can call in a picture to display. It is important to realise that HTML is not a style package: browsers apply a default style, and you use style sheets to lay your own style on top. What HTML does is describe the elements of the page so that the browser knows how to format the information for it to appear on the web.
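To make the data-warehouse example concrete, here is a minimal Python sketch, assuming a hypothetical article record with a title, sections and an optional image (the record shape, the function name `article_to_html` and the file name are all invented for illustration, not taken from any real system). It shows headings becoming headings and paragraphs becoming paragraphs, and that the picture is only referenced by the HTML, not contained in it.

```python
from html import escape


def article_to_html(article: dict) -> str:
    """Render a structured article record as an HTML fragment.

    The record shape is a hypothetical example:
    {"title": str,
     "sections": [{"heading": str, "paragraphs": [str, ...]}, ...],
     "image": {"src": str, "alt": str}}   # "image" is optional
    """
    parts = ["<article>", f"<h1>{escape(article['title'])}</h1>"]
    for section in article["sections"]:
        # Each section becomes a heading followed by its paragraphs.
        parts.append("<section>")
        parts.append(f"<h2>{escape(section['heading'])}</h2>")
        for paragraph in section["paragraphs"]:
            parts.append(f"<p>{escape(paragraph)}</p>")
        parts.append("</section>")
    image = article.get("image")
    if image:
        # The picture is not part of the HTML text; the <img> tag only
        # tells the browser where to fetch it from.
        parts.append(f'<img src="{escape(image["src"])}" alt="{escape(image["alt"])}">')
    parts.append("</article>")
    return "\n".join(parts)


record = {
    "title": "Working with the web",
    "sections": [
        {"heading": "HTML & CSS", "paragraphs": ["HTML marks up the content."]},
    ],
    "image": {"src": "figures/web-stack.png", "alt": "Layers of a web page"},
}
print(article_to_html(record))
```

Note that `escape` turns the ampersand in "HTML & CSS" into `&amp;`, so characters with a special meaning in HTML are displayed rather than misread as mark-up. How the result actually looks is left to the browser's defaults or to a style sheet.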

In order to define the style you want, a cascading style sheet (CSS) is necessary. CSS is the style sheet language used to describe the presentation of material held in a mark-up language such as HTML. It is most commonly used for websites, where the style sheet holds information about font, layout and colours (for instance) for the web page design, which is applied to the marked-up content to create the desired appearance. A style sheet can be adapted for different sorts of rendering (e.g. for different screen sizes, or for print). A marked-up document can have a style sheet linked to it, but a user may also have a style sheet on their own machine whose rules can override it; this can sometimes cause problems and may need checking where publishers need to be sure that their content is presented accurately.
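The "cascade" in CSS is the rule for deciding which of several competing style declarations wins. The Python sketch that follows models only one part of it, the ranking by origin (browser default, the reader's own style sheet, the publisher's style sheet) and by the `!important` flag; specificity, source order, cascade layers, animations and transitions are deliberately left out, and the class and function names are invented for illustration. It shows why a reader's style sheet usually loses to the publisher's, but wins when the reader marks a rule as important.

```python
from dataclasses import dataclass
from typing import List

# Precedence of declarations for one property, lowest to highest, by origin
# and !important flag, following the CSS cascade. Specificity, source order,
# cascade layers, animations and transitions are deliberately ignored.
PRECEDENCE = {
    ("user-agent", False): 0,  # browser defaults
    ("user", False): 1,        # reader's own style sheet
    ("author", False): 2,      # publisher's style sheet
    ("author", True): 3,
    ("user", True): 4,         # reader's !important rules beat the publisher's
    ("user-agent", True): 5,
}


@dataclass
class Declaration:
    origin: str          # "user-agent", "user" or "author"
    value: str
    important: bool = False


def winning_value(declarations: List[Declaration]) -> str:
    """Return the value that wins the cascade for a single property."""
    return max(declarations, key=lambda d: PRECEDENCE[(d.origin, d.important)]).value


browser = Declaration("user-agent", "Times New Roman, serif")
publisher = Declaration("author", "Georgia, serif")
reader = Declaration("user", "Verdana, sans-serif")
reader_forced = Declaration("user", "Verdana, sans-serif", important=True)

print(winning_value([browser, publisher, reader]))         # Georgia, serif
print(winning_value([browser, publisher, reader_forced]))  # Verdana, sans-serif
```

The second case is the one publishers may need to check for: a reader who forces their own font (often for accessibility reasons) will see the content differently from the design the publisher specified.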

Once the internet developed in this way, publishing companies could deliver products direct to customers, whether linking to a database of journals with a front end or opening up specially designed web pages for content. This made a significant change to the publishing value chain as physical products (even electronic ones such as CD-ROMs) did not need to be produced and distribution mechanisms changed as the product could be delivered to a computer over the internet. We will see how these products changed and developed in the subsequent chapters, but here we will look at the current web technologies relevant to the development of these products.

HTML5

HTML5 is the latest incarnation of HTML. HTML has been revised to cope with more complex websites and more easily encompass multimedia, which forms much more of the experience on the web now. It is not yet finalised but is usable.

The key benefits are:

  • it can cope more effectively with a wide variety of multimedia
  • audio-visual elements can be easily integrated, as can drawings, which can be created in the new canvas areas
  • it can be read across a variety of computers and devices
  • it can incorporate other applications more easily too, such as clocks, currency converters or geolocation.

The importance of this development for publishers is that it should be much easier to insert video or audio material without relying on other pieces of software (plug-ins) being brought into the website in order to play or make use of that material. A key case in point is that Flash, a common piece of software used to play video, is not supported on Apple's iOS devices, so users on iPads (for example) cannot see parts of a website that display video using Flash. HTML5 should overcome this problem. It also makes it easier to do more with a website, something to bear in mind when developing a new product.
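With HTML5 the video is described directly in the page with a `<video>` element, and the browser plays it itself. The Python sketch that follows builds such an element; the function name and the media file names are hypothetical, while `<video>`, `<source>`, `controls` and `poster` are standard HTML5. Listing more than one source lets the browser pick the first format it can play, and the text inside the element is shown only by browsers that do not support `<video>` at all.

```python
from html import escape
from typing import List, Optional, Tuple


def video_element(sources: List[Tuple[str, str]],
                  poster: Optional[str] = None,
                  fallback: str = "Your browser cannot play this video.") -> str:
    """Build an HTML5 <video> element that the browser plays natively.

    sources: (file URL, MIME type) pairs; the browser uses the first
    format it can play, so list the most widely supported first.
    """
    attributes = " controls"  # show the browser's own play/pause controls
    if poster:
        attributes += f' poster="{escape(poster)}"'  # still image shown before playback
    lines = [f"<video{attributes}>"]
    for url, mime_type in sources:
        lines.append(f'  <source src="{escape(url)}" type="{escape(mime_type)}">')
    # Only shown by browsers that do not support <video> at all.
    lines.append(f"  <p>{escape(fallback)}</p>")
    lines.append("</video>")
    return "\n".join(lines)


print(video_element(
    [("media/author-interview.mp4", "video/mp4"),
     ("media/author-interview.webm", "video/webm")],
    poster="media/author-interview.jpg",
))
```

No plug-in appears anywhere in the generated mark-up, which is the practical difference from the Flash approach described above.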

Other aspects of HTML5 that are benefits over the older systems are:

  • it is better at handling errors
  • greater attention is paid to device independence, given how much development has taken place in devices over the past few years
  • it offers better support for offline storage
  • it has much more flexibility in creating form controls, such as systems for automated form filling
  • it introduces new content elements (such as headers and footers) that allow for more subtle structuring of pages.
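Because the new structural elements name the parts of a page, software can read the layout of a page as well as display it. As a minimal illustration, the sketch that follows uses Python's standard `html.parser` module to pull out the outline of a small sample page; the page content and the class name are invented for the example, and the set of elements covers only the HTML5 sectioning and page-region tags mentioned or implied above.

```python
from html.parser import HTMLParser

# Structural elements introduced in HTML5.
STRUCTURAL = {"header", "nav", "article", "section", "aside", "footer"}


class OutlineParser(HTMLParser):
    """Collect a page's HTML5 structural elements, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.outline.append("  " * self.depth + tag)
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in STRUCTURAL:
            self.depth -= 1


page = """<body>
<header><nav>Home | Journals | Books</nav></header>
<article>
  <section><h2>Introduction</h2></section>
  <section><h2>Method</h2></section>
</article>
<footer>Publisher contact details</footer>
</body>"""

parser = OutlineParser()
parser.feed(page)
print("\n".join(parser.outline))
```

For this sample the outline lists the header with its navigation, the article with its two sections, and the footer. With older HTML the same regions would typically all be generic `<div>` elements, and a program would have to guess which one was the footer.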

All of this makes the development of web pages much more flexible: for some products, developing an app may be less necessary if web pages can be made just as powerful. And, crucially, HTML5 aims to preserve the user-friendliness of HTML so that it can be used by anyone.

Web 2.0 and social networking

HTML5 is the language used to create web pages. However, there have been more fundamental developments in the World Wide Web. Web 2.0 saw the development of user-centred design and interoperability, allowing individuals to interact much more directly, whether by creating and distributing their own material or by creating or participating in social media environments. It has changed the way we communicate and collaborate, and has empowered us to make more use of the web through user-generated content (as opposed to passively consuming websites and the products held on them). For publishers, the biggest opportunity here in the first instance has been to generate much more online marketing activity. Publishers have used sites such as Facebook, YouTube and Twitter to create conversations and develop relationships with certain customer groups, building their brand image as a whole as well as generating PR for particular titles. We shall explore the relevance of user-generated content within the new web in Chapter 13, when we look at self-publishing.

The semantic web

The current stage of web development is the progression towards the semantic web (sometimes called Web 3.0). This next stage in the evolution of the web aims to make it easier to connect different formats and data types by including more semantic information within data, so that machines can process information more effectively. The growing sophistication of metadata is important here. Some data is difficult for a machine to understand: context may need to be taken into account, or different sorts of data may have been combined to create a single new tool (such as an app). A more sophisticated way of understanding this data is therefore necessary. The idea is to move on from the current position, where data is held in silos by data type, towards much greater integration, creating a web of linked data.

With the semantic web, therefore, it should become easier to find and link very different sorts of data sources, because the relationships between them are described in a way that facilitates these links. Common formats help this integration of data types, and taxonomies and ontologies provide a basic structure. In many ways it should mean that keyword searches become less important, as there will be much more sophisticated, context-sensitive ways of searching across different sorts of data, so users are more likely to reach the precise information they require. Additional aspects include storing data in formats that will be future-proof, and the semantic web should allow users to integrate and reuse data more effectively rather than simply view it (something that may give rise to more copyright issues).
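To see how described relationships can do better than keywords, consider the toy Python sketch that follows. The data is written as (subject, predicate, object) statements, the form used by RDF, and borrows the RDF/RDFS predicate names `rdf:type` and `rdfs:subClassOf`; everything with the `ex:` prefix is hypothetical, and plain Python sets stand in for a real triple store and query language. A small ontology says that textbooks and journal articles are both publications, so a search for publications about thermodynamics finds both items, even though neither is labelled with the word "publication".

```python
# A toy knowledge graph as (subject, predicate, object) triples. The predicate
# names rdf:type and rdfs:subClassOf are borrowed from RDF/RDFS; the ex: items
# are hypothetical. Plain Python sets stand in for a real triple store.
TRIPLES = {
    # Vocabulary (a tiny ontology): how the kinds of thing relate.
    ("ex:Textbook", "rdfs:subClassOf", "ex:Publication"),
    ("ex:JournalArticle", "rdfs:subClassOf", "ex:Publication"),
    ("ex:LectureVideo", "rdfs:subClassOf", "ex:Recording"),
    # Data: individual items described with that vocabulary.
    ("ex:book42", "rdf:type", "ex:Textbook"),
    ("ex:book42", "ex:about", "ex:Thermodynamics"),
    ("ex:paper7", "rdf:type", "ex:JournalArticle"),
    ("ex:paper7", "ex:about", "ex:Thermodynamics"),
    ("ex:video3", "rdf:type", "ex:LectureVideo"),
    ("ex:video3", "ex:about", "ex:Thermodynamics"),
}


def broader_classes(cls, triples):
    """Return cls plus every class it is a subclass of, directly or indirectly."""
    found, to_visit = {cls}, [cls]
    while to_visit:
        current = to_visit.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                to_visit.append(o)
    return found


def find(kind, topic, triples):
    """Items of the given kind (or any narrower kind) described as about topic."""
    return {
        s for s, p, o in triples
        if p == "rdf:type"
        and kind in broader_classes(o, triples)
        and (s, "ex:about", topic) in triples
    }


print(sorted(find("ex:Publication", "ex:Thermodynamics", TRIPLES)))
# ['ex:book42', 'ex:paper7']
```

The lecture video is about the same topic but is not returned, because the ontology classes it as a recording rather than a publication; asking for `ex:Recording` instead would find it.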

Linked data

Linked data, a central aspect of these developments, allows structured data to be published and connected on the web. If the early web was based around documents, with data structured and fixed in databases, phrases such as the ‘web of data’ try to describe the way the semantic web links open data sets together. Structured data is essentially any data that is organised and searchable. Documents have, in general, been made available as masses of raw data, and even with XML and HTML mark-up much of their structure and semantics will have been lost. Where both documents and data are linked, the web of data enables new types of applications. You can browse one data source and then follow its links to other data sources, rather than moving in and out of fixed data sets; in this way linked data sits on top of an unlimited global data space consisting of all sorts of data types, from blogs to scientific trial data, from music to video footage. An example would be the ability to link something a professor said in a video to the marks (i.e. writing) made on a whiteboard, to a chapter in a textbook and to a few diagrams and formulae from yet another source. Linked data will be particularly important in the future.
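The professor example can be sketched in a few lines of Python. Here three separately published data sets (a video archive, whiteboard captures and a textbook) each describe their own resources under web addresses (URIs), and some property values are URIs belonging to another data set. All of the URIs, property names and functions are hypothetical; in real linked data, "dereferencing" a URI means fetching its description over the web, which is simulated here by a dictionary lookup.

```python
from collections import deque

# Three independently published, hypothetical data sets. Each maps a resource
# URI to its properties; some property values are URIs pointing into a
# different data set, which is what makes the data "linked".
VIDEO_ARCHIVE = {
    "https://video.example.org/lecture/12": {
        "label": "Professor explains entropy",
        "writesOn": "https://boards.example.org/lecture/12/board/3",
    },
}
WHITEBOARD_CAPTURES = {
    "https://boards.example.org/lecture/12/board/3": {
        "label": "Whiteboard: entropy formula",
        "explainedIn": "https://books.example.org/physics/chapter/7",
    },
}
TEXTBOOK = {
    "https://books.example.org/physics/chapter/7": {
        "label": "Chapter 7: The second law",
        "seeAlso": "https://figures.example.org/carnot-cycle",
    },
}
KNOWN_DATASETS = [VIDEO_ARCHIVE, WHITEBOARD_CAPTURES, TEXTBOOK]


def dereference(uri):
    """Look a URI up in whichever data set publishes it (None if unknown)."""
    for dataset in KNOWN_DATASETS:
        if uri in dataset:
            return dataset[uri]
    return None


def follow_links(start):
    """Breadth-first walk from one resource across every data set it links into."""
    seen, queue, trail = {start}, deque([start]), []
    while queue:
        uri = queue.popleft()
        description = dereference(uri)
        trail.append((uri, description["label"] if description else None))
        if description is None:
            continue  # a link into a data set we cannot read
        for value in description.values():
            if value.startswith("https://") and value not in seen:
                seen.add(value)
                queue.append(value)
    return trail


for uri, label in follow_links("https://video.example.org/lecture/12"):
    print(label or "(not available)", "<-", uri)
```

Starting from the video, the walk moves into the whiteboard data, then the textbook, and finally reaches a link to a diagrams source that is not among the known data sets. No single database holds all of this; the connections exist only because each publisher used the others' identifiers.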

Conclusion: the more flexible web

For publishers, web development of this kind can mean a more effective way of building products, with the added value of enriched content and much more sophisticated searches. But it can do more than that: it can change what a search delivers. The results of a search do not simply have to be listed back to the customer with a percentage indicating likely relevance. In the new semantic web, they could be presented to the user in a visual format, showing all the connections and so perhaps making it easier to reach the key information wanted.
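Showing "the connections" behind a result amounts to finding a path between items in a graph of links. The Python sketch that follows is one simple way to do that, a breadth-first search for the shortest chain of links, offered as an illustration rather than as how any particular search product works; the item identifiers and the link list are hypothetical. A visual interface could draw the returned chain to explain why a result is relevant.

```python
from collections import deque

# Hypothetical links between items in a publisher's content graph.
LINKS = [
    ("book:physics-ch7", "video:lecture-12"),
    ("video:lecture-12", "board:lecture-12-3"),
    ("book:physics-ch7", "journal:carnot-review"),
    ("journal:carnot-review", "dataset:engine-trials"),
]


def build_graph(links):
    """Adjacency sets; links are treated as two-way for browsing."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connection(start, goal, links):
    """Shortest chain of links from start to goal (breadth-first), or None."""
    graph = build_graph(links)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Why is a trials data set relevant to a search that found the textbook chapter?
print(" -> ".join(connection("book:physics-ch7", "dataset:engine-trials", LINKS)))
```

This prints the chain from the textbook chapter through the journal review to the trials data set. Breadth-first search is used because it returns the shortest chain, which is usually the easiest connection for a reader to follow.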

It should also allow publishers to design and reuse their content effectively so that, for instance, they can tap into the workflow of professionals who want certain sorts of data presented to them in different ways, with differing levels of interactivity, depending on their particular use at a particular time of day. This is possible now, but the semantic web could increase the opportunities here. Publishers can look at more ways to develop new products and build different businesses, for instance by creating tools out of existing data sets, or by building new commercial models after data mining or integrating other forms of data (from video to raw statistics). The reuse of data will, of course, have key implications for copyright, as we will see in Chapter 10, but for the moment it is important to recognise how fertile a development this can be for publishers.
