“Hello Laura, it is nice to meet you. Please (producing a business card) visit our website and find out all about me or my company.” Or, in an email, “Can you believe this? See it here.” Or, “Look at their menu; I am very much looking forward to our dinner together.” These days, every serious professional or organization already has, or plans to have, a site on the World Wide Web (Web).
A website gives you your own domain name (BigBadWolf.com, for example). Then you can have email addresses ending in @BigBadWolf.com. Don’t forget to add your website and email address to your business card. You’ll be able to upload webpages and other information to your site for everyone to use. And that’s just the beginning of the advantages.
The Web uses the client and server model, as explained in Section 5.9. Web servers and clients communicate with the HTTP (Hypertext Transfer Protocol) and HTTPS (Secure HTTP) application-level protocols (Figure 5.9), running on top of TCP/IP. The Hypertext Markup Language (HTML) is used to author webpages. A resource available on the Web is retrieved by its Web address, known as a URL (Uniform Resource Locator). For example, the CT website is at http://computize.org.
The Web’s global impact has brought us numerous advantages, large and small. And it is transforming society at all levels as untold numbers of people actively blog, microblog, and otherwise express themselves with online media. It goes without saying that understanding the Web and how it works is good for everyone.
A key factor for the Web’s great success is the low cost of putting information on it. You simply find a Web hosting service to upload files for your website. Any Internet host with a good Internet connection can provide Web hosting. A Web host provides space for websites to store webpages, pictures, audio and video files, and any programs that produce Web content on the fly. More importantly, the Web host also runs a Web server program through which the stored websites can be accessed on the Web.
There are a number of Web servers, but Apache™, the open source Web server from apache.org, is dominant. According to Web Technology Surveys (March 2015), market shares for major Web servers are Apache™ (58.4%), NGINX™ (23.3%), and Microsoft-IIS® (13.2%).
A Web server program listens to a specific networking port on the server host and follows the HTTP and HTTPS protocols to receive requests and send responses. The standard HTTP port is 80, but a server can be set up to use another designated port, such as 8080. The standard HTTPS port is 443.
In response to an incoming request, a server may return a static document from files stored on the server host, or it may return a document dynamically generated by a program, such as a PHP script, indicated by the request (Figure 6.1).
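To make the static/dynamic distinction concrete, here is a minimal sketch in JavaScript for Node.js. The in-memory staticFiles table and the /time route are hypothetical stand-ins for files in the document space and for a server-side program; a real server would read files from disk.

```javascript
const http = require("http");

// Hypothetical static documents, kept in memory for illustration.
const staticFiles = new Map([
  ["/index.html", "<h1>Welcome</h1>"],
  ["/about.html", "<p>About us</p>"],
]);

// Hypothetical dynamic "program": its output is computed on every request.
function timePage() {
  return `<p>Server time: ${new Date().toISOString()}</p>`;
}

// Decide how to answer a request path.
function handle(path) {
  if (staticFiles.has(path)) {
    return { status: 200, body: staticFiles.get(path) }; // static document
  }
  if (path === "/time") {
    return { status: 200, body: timePage() };            // dynamic document
  }
  return { status: 404, body: "<p>Not Found</p>" };
}

// Connect the handler to Node's built-in HTTP server (created, not started).
const server = http.createServer((req, res) => {
  const { status, body } = handle(new URL(req.url, "http://localhost").pathname);
  res.writeHead(status, { "Content-Type": "text/html; charset=utf-8" });
  res.end(body);
});
```

Calling server.listen(8080) would start it on port 8080; a production server would normally listen on port 80, or 443 for HTTPS.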
A single-threaded server handles one incoming request at a time, while a multithreaded server can handle multiple requests concurrently. A server host may also run multiple copies of a Web server to improve the handling of requests.
On a server host, webpages and other files to be accessed from the Web must be placed in the Web server’s document space before they become available on the Web. The top folder of the document space is known as the server root. To further control access, files and folders inside the document space must also be given correct access permissions before the Web server can deliver them onto the Web.
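Mapping a requested URL path onto a file inside the server root, while refusing paths (such as ones containing ..) that would reach outside it, can be sketched in JavaScript for Node.js. The server root /var/www/html is a hypothetical example, POSIX-style paths are assumed, and checking file permissions is left out.

```javascript
const path = require("path");

// Hypothetical server root (no trailing slash).
const SERVER_ROOT = "/var/www/html";

// Map a URL pathname (assumed to start with "/") to a file path inside
// the document space. Returns null for malformed or escaping requests.
function toFilePath(root, urlPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath); // so "%2e%2e" is caught as ".."
  } catch {
    return null; // malformed percent encoding
  }
  // Resolve "." and ".." segments relative to the root.
  const full = path.posix.resolve(root, "." + decoded);
  if (full !== root && !full.startsWith(root + "/")) {
    return null; // outside the document space: refuse
  }
  return full;
}

console.log(toFilePath(SERVER_ROOT, "/docs/a.html"));      // /var/www/html/docs/a.html
console.log(toFilePath(SERVER_ROOT, "/../../etc/passwd")); // null
```

Comparing against root + "/" rather than root alone keeps a sibling folder such as /var/www/htmlx from slipping through.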
Another critical factor for the Web’s success is easy-to-use clients for browsing the Web.
We access the Web from anywhere on the Internet with desktops, laptops, tablets, and smartphones. Figure 6.2 shows some popular Web browsers.
In terms of popularity, Google Chrome and Firefox lead with 56% and 30% of the market, respectively (see footnote 1).
Browsers make surfing the Web a great experience. They keep track of your browsing history, remember and organize your bookmarks, safely keep your user IDs and passwords for different sites, cache Web data for speed, supply multiple tabs (such a convenience), and allow a high degree of user customization, including browser home page (URL loaded automatically when the browser starts), fonts, colors, helper applications and more.
Accessing the Web on the go is important and fun. Android tablets and smartphones come with Google Chrome. Apple devices come with Safari. Windows phones come with IE Mobile. Businesses also provide dedicated mobile apps to access their information and services.
The Web got its start in the late 1980s. In 1989, Tim Berners-Lee at the European Laboratory for Particle Physics (CERN) started to develop a suite of technologies to make the Internet truly accessible and useful to people.
Berners-Lee also wrote the first Web browser and server.
The simplicity of HTML makes it easy to learn and publish webpages.
It caught on. In 1992-1993, a group at NCSA (National Center for Supercomputing Applications, US) developed the Mosaic visual/graphical browser (Figure 6.3). Mosaic added support for images, nested lists, and fill-in forms, and it fueled the explosive growth of the Web. Several people from the Mosaic project helped start Netscape (Figure 6.3) in 1994. At the same time, the W3 Consortium (W3C) was formed and housed at MIT as an industry-supported organization for the standardization and development of the Web.
The Web uses Uniform Resource Locators (URLs) to identify (locate) many kinds of resources available on the Internet. URLs are used by Web browsers to request and retrieve information. We know URLs can locate webpages. But they can also identify pictures/images and audio and video media, as well as Internet/Web services.
A URL usually has the form scheme://serverhost:port/pathname?query_string and consists of several parts. Let’s break the URL down.
Here are some examples:
http://www.kent.edu | (Kent State U. homepage) |
http://w3.org/ | (W3C site) |
https://chase.com | (Chase Bank secure site) |
http://computize.org/example.html | (CT site example page) |
https://amazon.com/.../home?ie=UTF8 | (amazon.com after login) |
ftp://webtong.com | (Public FTP webtong.com) |
file:///C:/Users/pwang/Desktop/a.jpg | (Picture on local Desktop) |
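The parts of a URL can also be pulled out programmatically. A minimal sketch in JavaScript using the standard WHATWG URL class, which is built into browsers and Node.js; the helper name urlParts and the sample URL with port 8080 and query lang=en are made up for illustration.

```javascript
// Break a URL into the parts named above.
function urlParts(address) {
  const u = new URL(address);
  return {
    scheme: u.protocol.replace(/:$/, ""),     // "http:" -> "http"
    serverhost: u.hostname,
    port: u.port,                             // "" when the default port is used
    pathname: u.pathname,
    query_string: u.search.replace(/^\?/, ""),
  };
}

console.log(urlParts("http://computize.org:8080/example.html?lang=en"));
console.log(urlParts("https://chase.com"));
```

Note that port comes back empty when the URL uses the scheme's default port (80 for http, 443 for https), and a bare host such as https://chase.com has the pathname "/".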
URLs are critical in Web operations. You can enter any valid URL into the Location box of any Web browser to reach the target resource. When a URL specifies a directory, a Web server usually returns an index file, typically named index.html, for that directory. If there is no index file, the server may instead return a list of the filenames in that directory. Thus, for example, the URL http://cnn.com normally refers to the same page as http://cnn.com/index.html.
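The directory rule can be sketched in JavaScript. The folder contents are a hypothetical in-memory list rather than a real disk, and the allowListing flag stands in for a server configuration setting; a real server would also escape filenames before putting them into HTML.

```javascript
// Decide what to return when a URL names a directory.
// `filenames` is a hypothetical listing of that directory.
function respondForDirectory(dirPath, filenames, allowListing = true) {
  if (filenames.includes("index.html")) {
    // Serve the index file; make sure the directory path ends in "/".
    return { status: 200, file: dirPath.replace(/\/?$/, "/") + "index.html" };
  }
  if (allowListing) {
    // No index file: generate a simple HTML list of the files.
    const items = filenames.map((f) => `<li><a href="${f}">${f}</a></li>`).join("");
    return { status: 200, html: `<ul>${items}</ul>` };
  }
  return { status: 403 }; // listing disabled: access forbidden
}

console.log(respondForDirectory("/news", ["index.html", "today.html"]));
console.log(respondForDirectory("/pics", ["a.jpg", "b.jpg"]));
```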
URLs are also used in webpages to link to other webpages and resources, inside or outside a particular website. The cross-links among webpages globally form a worldwide web structure. Because of its importance, many applications, including email readers, PDF readers, text/document editors, presentation tools, and shell windows, recognize http URLs and, when you click on one, launch the default Web browser.
Within an HTML document, you can link to another document served by the same Web server by giving only the pathname part of the URL. Such links are examples of relative URLs.
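A browser turns a relative URL into a full one by resolving it against the URL of the page that contains it. A short JavaScript sketch using the standard URL class; the page address and the helper name resolveLink are hypothetical.

```javascript
// URL of the page containing the links (hypothetical).
const pageUrl = "http://computize.org/demo/page.html";

// Resolve a link found in that page, as a browser would.
function resolveLink(link, base = pageUrl) {
  return new URL(link, base).href;
}

for (const link of ["other.html", "pics/a.jpg", "../index.html", "/example.html"]) {
  console.log(`${link} -> ${resolveLink(link)}`);
}
// other.html -> http://computize.org/demo/other.html
// pics/a.jpg -> http://computize.org/demo/pics/a.jpg
// ../index.html -> http://computize.org/index.html
// /example.html -> http://computize.org/example.html
```

Because only the base changes, the same relative links keep working if the whole site moves to another folder or host.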
Be aware of the implicit context. It can bring convenience and efficiency or confusion and misunderstanding.
Implicit contexts are everywhere and happen all the time. When dialing a local phone number we may skip the country code and sometimes even the area code. When addressing a domestic letter, we do not have to indicate the country. When mentioning an address to a friend, we save our breath on the country, state, or even city. On the Internet, inside the domain cs.kent.edu, we can refer to the host tiger.cs.kent.edu as simply tiger.
When building a website, it is advisable to code webpages using URLs relative to the current page as much as possible. This makes it easy to reorganize the file/folder structures of a website and to move the entire website to another location on the local file system or to a different server host.
When communicating, the parties must use the “same implicit context” or misunderstandings can happen. For example, “You should be honest” can be stating a principle (editorial “you”) or an accusation. A day of the week is relative to the week. Therefore, people must pay attention to the sent date and time when reading email or text messages (CT: PAY ATTENTION TO DETAILS, Section 4.6.1). But why depend on others being careful? Instead of terms such as “Saturday,” “yesterday,” “tomorrow,” or “next week”, we should always state a specific date and time in our messages.
In sly advertisements, clever manipulation of the implicit context is often used to mislead. No wonder legal documents are so lengthy, repetitive, and formal.
According to the URL specification (RFC 1738), only alphanumerics, the special characters $-_.+!*'(), and reserved characters used for their reserved purposes (such as ; / ? : @ = &) may be included directly within a URL.
Other characters (such as SPACE, NEWLINE, and the double quote), reserved characters not used for URL syntax, and non-ASCII (Section 2.6.1) UNICODE characters (Chinese characters, for example) are unsafe and may cause problems if used directly in a URL. To include such a character, percent encode it: write each byte of its UTF-8 representation as % followed by two hexadecimal digits. For example:
U+738B | 王 | %E7%8E%8B |
U+58EB | 士 | %E5%A3%AB |
U+5F18 | 弘 | %E5%BC%98 |
See Demo: PercentEncode on the CT website for an interactive tool.
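Percent encoding itself takes only a few lines. This JavaScript sketch converts text to UTF-8 bytes with the standard TextEncoder and encodes every byte outside a safe set. As an assumption, it uses RFC 3986's unreserved characters (letters, digits, -, ., _, ~) as the safe set, which differs slightly from the RFC 1738 list; in practice the built-in encodeURIComponent and decodeURIComponent functions do this job.

```javascript
// RFC 3986 unreserved characters pass through unchanged.
const UNRESERVED = /^[A-Za-z0-9\-_.~]$/;

function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (b < 0x80 && UNRESERVED.test(ch)) {
      out += ch;                                                    // safe ASCII
    } else {
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0"); // %XX
    }
  }
  return out;
}

console.log(percentEncode("王士弘")); // %E7%8E%8B%E5%A3%AB%E5%BC%98
console.log(percentEncode("a b%"));   // a%20b%25
```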
Realize and provide a clear indication when the same entity is to perform a different function.
It is not unusual for a person to play multiple roles. For example, a policeman can be on or off duty. A door can be an entrance or an exit. A road can be one-way or two-way. Often, the distinction is important to avoid confusion, and we use different hats, labels, signs, or uniforms as indications. In the case of a one-way street, the sign may mean life or death.
In computing, characters often must perform multiple duties. This is simply because there are not enough characters on the keyboard to satisfy all the varied needs in different situations. For example, in many programming languages, character strings are enclosed in double quotes ("). But that raises the question: “What if a double quote is part of a string?” The problem is caused by " performing double duty as a string delimiter and as an ordinary character. And the solution? Place a BACKSLASH (\) in front of " to escape it, so it is not treated as terminating the string. The JavaScript code
const str_a = "The double quote (\") character.";
is an example. Now that the BACKSLASH (\) doubles as an escape character, it must itself be escaped to appear in a string:
const str_b = "The backslash (\\) character.";
The escape character performs the same function as a hat or a label for the next character in a string. The URL percent encoding is basically the same story—the % character escapes a 2-character sequence that represents an arbitrary byte. And, again, the character % must itself be percent encoded (%25) to be part of a URL.
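The order of escaping matters when a program builds a string literal from raw text: backslashes must be escaped before double quotes, or the newly added backslashes would be doubled too. A small JavaScript sketch; the helper name toStringLiteral is hypothetical, and real code would normally rely on JSON.stringify, which also escapes control characters.

```javascript
// Turn raw text into a double-quoted JavaScript string literal.
function toStringLiteral(raw) {
  const escaped = raw
    .replace(/\\/g, "\\\\") // first: each \ becomes \\
    .replace(/"/g, '\\"');  // then:  each " becomes \"
  return '"' + escaped + '"';
}

const str_a = "The double quote (\") character.";
const str_b = "The backslash (\\) character.";

console.log(toStringLiteral(str_a)); // "The double quote (\") character."
console.log(toStringLiteral(str_b)); // "The backslash (\\) character."

// The same idea in URLs: % is the escape character, so a literal % becomes %25.
console.log(encodeURIComponent("100%")); // 100%25
```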
HTML (the Hypertext Markup Language) is used to structure webpage contents for easy handling by Web clients on the receiving end. From its simple start in 1989, HTML has been constantly evolving and maturing. Beginning with HTML 4.0, the language has become standardized under the auspices of the W3C (World Wide Web Consortium), the industry-wide open standards organization for the Web. Subsequently, by making HTML 4.0 compatible with XML (eXtensible Markup Language; Section 9.7.1), XHTML became a widely used new standard. Today, the Web is moving toward HTML5, the next-generation HTML standard, which brings many new features and APIs (application programming interfaces). HTML5 makes it easier to provide dynamic user interactions and promises to transform the Web into an even more useful and powerful tool.
A document written in HTML contains ordinary text interspersed with markup tags and uses the .html filename extension. The tags mark portions of the page as heading, section, paragraph, quotation, image, audio, video, link, and so on. Thus, an HTML file consists of two kinds of information: contents and HTML tags. The tags provide webpage organization and structure information that makes automatic processing of the contents easier. An HTML tag takes the form <tag>. A begin tag such as <h1> (level-one section header) is paired with an end tag, </h1> in this case, to mark content in between. Table 6.1 lists some frequently used tags.
TABLE 6.1 Some HTML Tags
The following is a sample HTML5 page (Demo: Sports):
Figure 6.4 shows the Big on Sports page displayed by Firefox.
Organize information in a document by identifying and delimiting its parts. Marked-up documents are easier to use, exchange, and process mechanically.
The idea is hardly new or surprising. Take this textbook, for example; we organized it into chapters, sections, and subsections. It has a table of contents, and an index, among other things. The organization is achieved through headers, page formats, and other visual conventions.
For textual documents, such as webpages, markup elements or tags are used to indicate the start and end of parts, such as headings, paragraphs, tables, images, quotations, links, and so on. A marked-up document can be easily transmitted and processed by applications on receiving host computers.
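Because begin and end tags delimit parts, a program can check a marked-up document mechanically. This JavaScript sketch uses a stack to verify that tags are properly paired and nested. It is a simplification: it is stricter than HTML itself, which allows some end tags (such as </p>) to be omitted, it ignores comments and scripts, and its list of elements without end tags is partial.

```javascript
// Elements that never take an end tag (a partial list, for illustration).
const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// Return true if every begin tag has a matching end tag in nested order.
function tagsBalanced(html) {
  const stack = [];                                  // currently open elements
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  for (const [, slash, rawName] of html.matchAll(tagPattern)) {
    const name = rawName.toLowerCase();
    if (VOID_TAGS.has(name)) continue;               // nothing to close
    if (slash === "") {
      stack.push(name);                              // begin tag: open it
    } else if (stack.pop() !== name) {
      return false;                                  // end tag closes the wrong element
    }
  }
  return stack.length === 0;                         // anything left was never closed
}

console.log(tagsBalanced("<h1>Big on Sports</h1><p>Scores<br>today</p>")); // true
console.log(tagsBalanced("<p><em>Hot news</p></em>"));                      // false
```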
While HTML takes care of page structure, the way information is actually presented (visually or otherwise) to the end user is controlled by the Web browser, user-defined styling preferences, and styling rules that come with the webpage.
Styling rules are coded in Cascading Style Sheets (CSS) and attached to different parts of a webpage. Style rules are usually placed in files separate from the webpage. Isolating page styling from page structure makes it easy for Web designers to reuse styling rules in different pages and to enforce consistent visual styling over an entire website.
For example, if we want to make all level-one headers dark blue, we can use this CSS rule:
h1 { color: darkblue }
Thus, HTML makes webpages easy to read by programs, while CSS makes them easy to read by humans.
CSS has also evolved through the years to provide more features and functions for various styling needs. The current standard is CSS3. Experiment with HTML and CSS with the Demo: CodeTester at the CT site.
Web hosting is a service for individuals and organizations to place their websites on the Web. Hence, publishing on the Web involves:
Colleges and universities host personal and educational sites for students and faculty without charge. Web hosting companies provide the service for a fee.
Commercial Web hosting can provide secure data centers (buildings), fast and reliable Internet connections, specially tuned Web hosting computers, server programs and utilities, network and system security, regularly scheduled backup, and technical support. Each hosting account provides an amount of disk space, a monthly network traffic allowance, email accounts, Web-based site management and maintenance tools, and other access methods, such as FTP and SSH/SFTP.
Put your website to serious use. Manage it appropriately. Update it diligently. Make it an integral part of your organization.
A website is far more than a static online advertisement. It is a window to the world. Take advantage of all that the Web can do to make your organization more effective and efficient. Integrate the access, modification, and management of your site into your business operations. Always make sure information on the site is up-to-date.
To host a site under a given domain name, a hosting service associates that domain name with an IP address assigned to the hosted site. The domain-to-IP association is made through DNS servers and Web server configurations managed by the hosting service.
Be critical; don’t believe everything online. Avoid spreading untruth.
Easy online sharing is a powerful and positive force in the digital age. With all kinds of information available on the Web, on the Internet, and by email, we must also be keenly aware of an unpleasant fact: not all such information is accurate or even true. The good news is that, by digging a little deeper (a few Web searches, for example), you can usually find out. Too many people have unwittingly forwarded rumors, falsehoods, or baseless claims to friends, sometimes even causing bad information to go viral. Let’s not participate in such silliness.
Webpages are usually prepared in advance to supply predetermined content. These fixed pages are static. A Web server can also deliver dynamic pages that are generated on the fly by programs on the server side. Dynamic pages bring many advantages, including:
Dynamic webpages are not magic. Instead of retrieving a fixed file, a Web server calls another program to compute the document to be returned and perhaps perform other functions. As you may have guessed, not every program can be used by a Web server in this manner.
A Web server invokes a server-side program by calling it, passing request data to it, and receiving the results it generates. Such a program must conform to the Common Gateway Interface (CGI) specification, which governs how the Web server and the invoked program interact (Figure 6.5).
A CGI program can be written in any programming language as long as it follows the CGI specification and can be invoked by the Web server. The Web server and a CGI program may run as independent processes and interact through interprocess communication. Or, the external program can be loaded into the server and run as a plug-in module.
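Under CGI, the Web server passes request information to the program mainly through environment variables such as REQUEST_METHOD and QUERY_STRING, and the program writes its reply to standard output as header lines, a blank line, and the document. A minimal sketch in JavaScript for Node.js, assuming a hypothetical query parameter name; cgiResponse takes the environment as an argument so it can be tried without a server.

```javascript
// Escape characters that are special in HTML so user input shows as plain text.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the complete CGI output for one request.
// `env` holds CGI variables such as REQUEST_METHOD and QUERY_STRING.
function cgiResponse(env) {
  const params = new URLSearchParams(env.QUERY_STRING || "");
  const name = params.get("name") || "visitor"; // hypothetical parameter
  const method = env.REQUEST_METHOD || "GET";
  const body = `<p>Hello, ${escapeHtml(name)}! (method: ${method})</p>`;
  // Header lines, then a blank line, then the document itself.
  return "Content-Type: text/html\r\n\r\n" + body;
}

// A Web server running this file as a CGI program sets GATEWAY_INTERFACE;
// in that case, write the response to standard output.
if (process.env.GATEWAY_INTERFACE) {
  process.stdout.write(cgiResponse(process.env));
}
```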
The dynamic generation of pages is made simpler and more integrated with webpage design and structure by allowing a webpage to contain active parts (Figure 6.6) that are treated by the Web server and transformed into desired content on the fly as the page is retrieved and returned to a client browser.
The dynamic (active) parts in a page are written in some kind of notation that distinguishes them from the fixed parts of a page. ASP (Active Server Pages), JSP (Java Server Pages), and the popular PHP (Hypertext Preprocessor) are examples. With PHP, the active parts are enclosed between <?php and ?> and embedded directly in an HTML page or other type of Web document. For example, inside an active page, code such as
<p>Today’s date is: <?php echo(date("l M. j, Y")); ?></p>
may appear. The date is dynamically computed and inserted in the HTML paragraph. Here is a result line the code would generate:
Today’s date is: Wednesday Sep. 2, 2015
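The same idea can be sketched in JavaScript, as server-side code under Node.js might do it. This is not PHP: the helper formatLikePhp is hypothetical and only imitates PHP's "l M. j, Y" format (full weekday, abbreviated month, day without a leading zero, four-digit year) in the server's local time.

```javascript
const DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Imitate PHP's date("l M. j, Y") for a JavaScript Date.
function formatLikePhp(d) {
  return `${DAYS[d.getDay()]} ${MONTHS[d.getMonth()]}. ${d.getDate()}, ${d.getFullYear()}`;
}

// Fill in the page's active part when the page is requested.
function renderPage(now = new Date()) {
  return `<p>Today's date is: ${formatLikePhp(now)}</p>`;
}

console.log(renderPage(new Date(2015, 8, 2))); // <p>Today's date is: Wednesday Sep. 2, 2015</p>
```

JavaScript months count from 0, so new Date(2015, 8, 2) is September 2, 2015.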
When active pages are processed by modules loaded into the Web server, the processing is faster and more efficient than with external CGI programs. PHP usually runs as an Apache module and provides excellent support for server-side programming.
Dynamic webpages are often generated from information stored in databases (Section 9.10). A database is an efficiently organized collection of data for a specific purpose. Database systems use the standard SQL (Structured Query Language; Section 9.10.2) for access and update of information in databases (Figure 6.7).
Examples of databases abound: employee database, membership database, customer account database, airline (hotel) reservation database, user feedback database, inventory database, supplier or subcontractor database, and so on.
Relational database systems (RDBMS; Section 9.10.1) support the management and concurrent access of relational databases. A relational database is one that uses tables to store, organize, and retrieve data.
In today’s fast-moving world, online access to databases is increasingly important for businesses and organizations. Using a Web interface to provide such online access has become the norm. In addition to providing access to databases, many websites also employ databases for their own purposes, such as user accounts, product inventory, blogging and forum support, just to name a few.
Modern browsers make the Web useful for everyone by providing a convenient user interface that usually supports keyboard, mouse, and touch screen interactions as well as video and audio presentations.
The actions of a Web browser can be defined and controlled by programming within a webpage. Such programming can supply customized user experiences and make webpages more responsive and useful for end users. The programs execute within the browser, which runs on the client host, the computer used to access the Web. For all major browsers, JavaScript is the standardized scripting language for client-side programming. Because the JavaScript language standard is developed and maintained by ECMA (originally the European Computer Manufacturers Association, now Ecma International), the standard language is also known as ECMAScript (ecma-international.org).
With JavaScript, a webpage can define reactions to user interface events (Section 4.6.2), verify correctness/completeness of user input, exchange information with the page’s server while displaying a page, change/update and otherwise manipulate the page display, and much more. Because JavaScript runs on the client, it takes advantage of the processing power of the client host and can potentially lessen the load on the Web server.
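As an example of checking input on the client, here is a JavaScript sketch for a hypothetical sign-up form with name, email, and age fields. The rules (a loose email pattern, ages 13 to 120) are assumptions for illustration; the server must still re-check whatever it receives, because client-side code can be bypassed.

```javascript
// Check a hypothetical sign-up form before it is sent to the server.
// Returns a list of problems; an empty list means the input looks complete.
function validateSignup({ name = "", email = "", age = "" }) {
  const problems = [];
  if (name.trim() === "") problems.push("Name is required.");
  // Deliberately loose check: something@something.something
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) problems.push("Email looks invalid.");
  const n = Number(age);
  if (!Number.isInteger(n) || n < 13 || n > 120) {
    problems.push("Age must be a whole number from 13 to 120.");
  }
  return problems;
}

// In a browser, one might call it from a form's submit handler and
// cancel submission (event.preventDefault()) when problems are found.
console.log(validateSignup({ name: "Laura", email: "laura@example", age: "abc" }));
```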
As we have mentioned, Web browsers and Web servers communicate following HTTP, the Hypertext Transfer Protocol. It does not matter which browser is contacting what server; as long as both sides use the same protocol, everything will work.
In the early 1990s, HTTP gave the Web its start. HTTP/1.0 was standardized in the first part of 1996. Important improvements and new features have been introduced in HTTP/1.1, and it is now the stable version.
HTTP is an application layer (Figure 5.9) protocol that sits on top of TCP/IP, which provides a reliable two-way connection between the Web client and Web server. We don’t need all the details to understand the basics of HTTP.
A query and a response form an HTTP transaction. Each transaction stands alone and has no protocol-provided means to be correlated with any other transaction. Figure 6.8 illustrates an HTTP transaction.
A simple HTTP transaction goes as follows:
HTTP governs the format of the query and response messages (Figure 6.9). Basically, each query (request) or response consists of an initial line, header lines, a blank line, and an optional body. The initial line and header lines are textual (ISO-8859-1). Each line should end in RETURN and NEWLINE, but it may end in just NEWLINE.
The initial line identifies the message as a query or a response.
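The message format can be sketched in JavaScript: one function builds the text of a GET request, and another splits a response into its initial line, headers, and body. The host and path are placeholders, and the parser treats the whole message as text, which is adequate only for textual bodies; a real client works with bytes and honors Content-Length.

```javascript
const CRLF = "\r\n";

// Build the text of a simple HTTP/1.1 GET request.
function buildGetRequest(host, path) {
  return [
    `GET ${path} HTTP/1.1`, // initial (request) line
    `Host: ${host}`,        // header lines
    "Connection: close",
    "",                     // blank line ends the headers
    "",                     // a GET request has no body
  ].join(CRLF);
}

// Split a raw response into status line parts, headers, and body.
// Accepts CRLF line endings or, leniently, a bare NEWLINE.
function parseResponse(raw) {
  const sep = raw.search(/\r?\n\r?\n/);
  const head = sep === -1 ? raw : raw.slice(0, sep);
  const body = sep === -1 ? "" : raw.slice(sep).replace(/^\r?\n\r?\n/, "");
  const [statusLine, ...headerLines] = head.split(/\r?\n/);
  const [version, code, ...reason] = statusLine.split(" ");
  const headers = {};
  for (const line of headerLines) {
    const i = line.indexOf(":");
    if (i === -1) continue; // skip malformed header lines
    headers[line.slice(0, i).trim().toLowerCase()] = line.slice(i + 1).trim();
  }
  return { version, status: Number(code), reason: reason.join(" "), headers, body };
}

console.log(JSON.stringify(buildGetRequest("computize.org", "/example.html")));
// "GET /example.html HTTP/1.1\r\nHost: computize.org\r\nConnection: close\r\n\r\n"
```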
The Web borrowed the content type designations from the Internet email system and uses the same MIME (Multipurpose Internet Mail Extensions) defined content types. Hundreds of standard MIME content types are listed at the IANA site (iana.org/assignments/media-types/media-types.xhtml).
The content type information allows browsers to decide how to process the incoming content. HTML, text, images, audio, and video may be handled by the browser directly. Other types, such as PDF and Flash, are usually handled by plug-ins or external helper programs.
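On the server, choosing a content type is often a table lookup on the filename extension; on the client, the type decides how the content is handled. A JavaScript sketch; the small table and the routing rules are simplified assumptions (real servers ship hundreds of mappings, and application/octet-stream is the conventional fallback for unknown binary data).

```javascript
const path = require("path");

// A small, hypothetical extension-to-MIME table.
const MIME_TYPES = {
  ".html": "text/html",
  ".css": "text/css",
  ".js": "text/javascript",
  ".json": "application/json",
  ".jpg": "image/jpeg",
  ".png": "image/png",
  ".pdf": "application/pdf",
  ".mp4": "video/mp4",
};

// Server side: pick the Content-Type for a file.
function contentTypeFor(filename) {
  const ext = path.extname(filename).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

// Client side (simplified): decide how to process incoming content.
function handlerFor(contentType) {
  const type = contentType.split(";")[0].trim().toLowerCase();
  if (type === "text/html") return "render as webpage";
  if (/^(image|audio|video)\//.test(type)) return "display or play in browser";
  if (type.startsWith("text/")) return "show as text";
  return "hand to plug-in or helper application";
}

console.log(contentTypeFor("index.HTML"), "|", handlerFor("text/html; charset=utf-8"));
```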
When using a browser to access the Web, the HTTP messages between it and the Web servers are kept behind the scenes. But it is possible to expose these messages and gain real experience with HTTP. See Demo: Http at the CT website.
While HTTP transmits information in the open, HTTPS (HTTP Secure) is a secure protocol that simply runs HTTP over Transport Layer Security (TLS 1.2), a secure transport layer protocol derived from the earlier Secure Sockets Layer (SSL). See Section 7.2 for more information on how this security feature works.
An important improvement of HTTP/1.1 over HTTP/1.0 is a much more complete mechanism for caching HTTP responses. On the Web, a great deal of content changes infrequently. This includes static webpages, images, graphics, styling code, scripts, and so on. Saving a copy of such data avoids a lot of unnecessary work requesting and retrieving the same data over and over again from origin servers. Browsers (user agents) and caching proxy servers are able to serve data from their caches when they know, or can verify, that the data are still current and unchanged on the servers where they originated.
A caching proxy server accelerates requests by serving content from its cache. Because caching proxies keep local copies of popular resources, large organizations can greatly reduce their Internet usage and costs while significantly enhancing performance. Most ISPs and large businesses employ caching proxies.
The HTTP caching scheme significantly cuts down round-trip Web traffic to origin servers and reduces response time for users. This explains why a website can be slow the first time you visit it and then lightning fast when you visit again.
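A browser-style cache can be sketched in JavaScript around two real HTTP/1.1 mechanisms: the Cache-Control max-age value says how many seconds a response stays fresh, and an ETag validator lets the client revalidate a stale copy by sending If-None-Match, to which the server may answer 304 Not Modified without resending the body. The class name and its simplified rules are assumptions; actual caches follow many more rules in the HTTP caching specification.

```javascript
// Sketch of a browser-style HTTP cache. Times are in milliseconds.
class HttpCache {
  constructor() {
    this.entries = new Map(); // url -> { body, etag, storedAt, maxAge }
  }

  // Remember a response together with its freshness information.
  store(url, response, now) {
    const m = /max-age=(\d+)/.exec(response.headers["cache-control"] || "");
    this.entries.set(url, {
      body: response.body,
      etag: response.headers["etag"],
      storedAt: now,
      maxAge: m ? Number(m[1]) : 0, // seconds; 0 means never considered fresh
    });
  }

  // Decide how to satisfy a new request for `url`.
  lookup(url, now) {
    const entry = this.entries.get(url);
    if (!entry) return { action: "fetch" };             // not cached: ask the origin
    const ageSeconds = (now - entry.storedAt) / 1000;
    if (ageSeconds < entry.maxAge) {
      return { action: "use-cache", body: entry.body }; // fresh: no network traffic
    }
    if (entry.etag) {
      // Stale but has a validator: ask the server whether it changed.
      return { action: "revalidate", headers: { "If-None-Match": entry.etag } };
    }
    return { action: "fetch" };
  }

  // Handle the server's reply to a revalidation request.
  onRevalidated(url, status, now) {
    const entry = this.entries.get(url);
    if (status === 304 && entry) { // 304 Not Modified: the copy is still good
      entry.storedAt = now;        // start a new freshness period
      return entry.body;
    }
    return null;                   // changed: the caller stores the new response
  }
}
```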
Use cache to increase efficiency and speed. In many situations, significant improvement may result from storing the right items in a cache.
Take phone numbers for example. You remember certain frequently used numbers in your head (the cache) but have to consult your phone list, or even the white pages for other numbers. You have your favorite pots and pans handy in the kitchen and many others stored in the basement. You have important items in your wallet/purse, and many other things you don’t carry with you. And we already know that computer memory is organized into on-chip cache, RAM, and hard disk, a multilevel caching scheme.
Toward the end of 2013, when a team of super coders was helping to rescue and fix the healthcare.gov site, one immediate technique they used was introducing a database cache, so that frequent queries could be answered from the cache instead of going to the huge database. The database cache reduced congestion, and they were able to lower the average page access time from 8 seconds to about 2 seconds. Later, with continued improvements, the access time was reduced to below 0.35 seconds. The rescue work may well have saved the Affordable Care Act from disaster.
The simplicity of HTML makes it seem deceptively easy to create websites. Well, that may be true for very simple webpages with basic information. But, an inviting, attractive, and effective website requires much more effort and expertise.
A website is often a combination of online advertisement, product and service information, sales, shipping, and customer service, as well as other business functions, such as recruiting and investor relations.
Walk in the shoes of users of your product. Let user-centered thinking guide product design and development.
Make sure your users will not say: “These people seem to have never used their own product!”
For Web development, in addition to carefully preparing the text, image, and multimedia contents, a well-designed website must:
To achieve these goals, it takes expertise in usability, visual communication design, site architecture and navigation, and copy editing, as well as programming techniques. And it takes time and effort to test, debug, and deploy. In other words, great websites need professional help and cooperation on the part of the site owner. Sometimes, even development professionals can mess up. The infamous healthcare.gov website launch debacle in October 2013 is an example.
The Web is so easy to access and contains so much information that the answer to almost any question is just a Web search away.
Many search engines are available, but Google remains predominant (Figure 6.10; source: netmarketshare.com).
A search engine makes finding information on the Web easy by working hard to gather data about what’s where online and organize the collected data into indexes for efficient search. Because data online change constantly, the job of a search engine is never done. It must roam the Web continuously to update its indexes. The equipment, algorithms, and the exact ways a search engine works are closely guarded secrets.
Generally, search engines use automated robots and/or manual submissions to collect indexing information. A robot or crawler is software that visits webpages and follows links in them to recursively visit connected pages. Meta information about visited pages, together with manually submitted data, is organized and deposited in databases. When a user enters a search request, the search is conducted in the databases, not on the open Web.
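A toy version of crawling and indexing can be sketched in JavaScript. The four-page WEB object is hypothetical. A breadth-first crawler follows links, visiting each page once, and builds an inverted index that maps each word to the pages containing it; searches then consult only that index, never the pages themselves. Real engines add ranking, politeness rules, and much more.

```javascript
// A hypothetical miniature Web: each URL maps to its text and outgoing links.
const WEB = {
  "a.html": { text: "Kent State sports news", links: ["b.html", "c.html"] },
  "b.html": { text: "Ohio weather and sports", links: ["a.html"] },
  "c.html": { text: "Computational thinking course", links: ["d.html"] },
  "d.html": { text: "Weather in Kent Ohio", links: [] },
};

const words = (s) => s.toLowerCase().split(/\W+/).filter(Boolean);

// Crawl breadth-first from `start` and build an inverted index.
function crawl(start) {
  const index = new Map();          // word -> Set of pages
  const visited = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const url = queue.shift();
    const page = WEB[url];
    if (!page) continue;            // broken link
    for (const w of words(page.text)) {
      if (!index.has(w)) index.set(w, new Set());
      index.get(w).add(url);
    }
    for (const link of page.links) {
      if (!visited.has(link)) {     // visit each page only once
        visited.add(link);
        queue.push(link);
      }
    }
  }
  return index;
}

// Answer a query from the index alone: pages containing every query word.
function search(index, query) {
  const [first, ...rest] = words(query);
  if (first === undefined) return [];
  let result = [...(index.get(first) || [])];
  for (const w of rest) {
    const pages = index.get(w) || new Set();
    result = result.filter((p) => pages.has(p));
  }
  return result.sort();
}

console.log(search(crawl("a.html"), "weather ohio")); // [ 'b.html', 'd.html' ]
```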
The answer is “Google.”
Doing research, curious about something, or just playing trivia? Don’t bother, just Google it first.
When doing a search, try to be precise about what you are looking for. Start with the information type such as “sports,” “science,” “finance,” “health,” “entertainment,” “politics,” and so on. If you are looking for something local, be sure to add the location such as “Ohio” or “Kent, Ohio.” Follow that with specific keywords for the search. Often, you’ll get what you want in the first try. If not, you can refine your search accordingly. Getting desired information immediately online and determining the reliability of such information is a skill everyone needs to develop.
The Internet and Web are open. Therefore, not all information is correct or accurate.
So, don’t believe everything you see or read on the net. Use common sense. Double check. You’ll soon get the correct information.
Most people think of the Web as a vast collection of webpages ready to be visited. That is certainly true. But the Web is more than that. It is also widely used to make computing power available to remote clients via HTTP. Such computational services are known as Web services.
A Web service is a resource or program on the Web that can be invoked with an HTTP request. A Web service usually computes a result based on the request input and sends back a result in a well-defined format. The most widely used Web service result formats are XML and JSON (JavaScript Object Notation). A Web service is like a remote procedure: it runs on a remote computer and provides specific results useful for clients. But, unlike general remote procedures, Web services always use HTTP for request and response (Figure 6.11).
In the early days of Web services, SOAP (Simple Object Access Protocol) was widely used. Contemporary Web service and client developers prefer REST (REpresentational State Transfer; see footnote 2), which is an academic term for URLs, query strings, and request/response bodies used in HTTP. In other words, REST-based Web services receive HTTP POST/GET requests and send back results, often in XML or JSON, in an HTTP response body.
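A REST-style Web service can be sketched in JavaScript as a request handler that reads a query string and answers with JSON. The temperature-conversion service, its /convert path, and the parameter c are hypothetical; the handler could be plugged into a Node.js http.createServer callback.

```javascript
// Hypothetical REST-style service: GET /convert?c=<celsius>
// answers with JSON such as {"celsius":100,"fahrenheit":212}.
function handleServiceRequest(method, requestUrl) {
  // The base URL is needed only to parse a path-plus-query string.
  const url = new URL(requestUrl, "http://localhost");
  const json = (status, data) => ({
    status,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  });

  if (url.pathname !== "/convert") return json(404, { error: "unknown resource" });
  if (method !== "GET") return json(405, { error: "use GET" });

  const raw = url.searchParams.get("c");
  const celsius = Number(raw);
  if (raw === null || raw.trim() === "" || !Number.isFinite(celsius)) {
    return json(400, { error: "query parameter c must be a number" });
  }
  return json(200, { celsius, fahrenheit: (celsius * 9) / 5 + 32 });
}

// A client reads the result by parsing the JSON body.
const reply = handleServiceRequest("GET", "/convert?c=100");
console.log(reply.status, JSON.parse(reply.body).fahrenheit); // 200 212
```

Over the network, a client would send an HTTP GET for a URL such as http://<host>/convert?c=100 and parse the JSON response body the same way.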
Application programs running on a computer connected to the Internet can easily make requests to Web services and obtain results to be used in the application. For example, a news reader application can obtain news feeds (the simplest kind of Web service) via HTTP requests and then use the results (RSS documents in XML; see footnote 3) for displaying and retrieving news articles. On a Web host, we can use a server-side script, in PHP for example, to dynamically obtain information from Web services. Such server-side scripts can also be invoked via AJAX by client-side code in webpages originating from the same server (see footnote 4). This way, a webpage can indirectly access any available Web service.
There are many Web services throughout the world. Here are some examples:
Online listings and directories for Web services are available. The Web Services Directory at programmableweb.com is an example.
In summary, a number of technologies enable the Web to work as it does. These include networking protocols, data encoding formats, clients (browsers), servers, webpage markup and styling languages, and client-side and server-side programming.
The Web can deliver text, images, animation, audio, video, and other multimedia content. Standard and proprietary media formats, tools, and players are also part of the Web. The World Wide Web Consortium (W3C) is a nonprofit organization leading the way in developing open Web standards.
Core Web technologies recommended by the W3C include:
The items listed here represent open/industry standards and best practices that are not private or proprietary.
With the modern Web, you can get up-to-the-moment news, find answers to any questions, satisfy any curiosity, and air your views on forums, blogs, and social media. So, fire up your browser, get online, and enjoy the Web. Life is sweet!
1 2014 statistics by w3cschool.
2 See CT: CAPTURE THE STATE, Section 4.9.
3 Rich Site Summary
4 For security reasons, browsers only allow JavaScript to network back to its originating host.