Search Engines

Some search engines work very differently from subject directories. Whereas humans typically organize and catalog subject directories by topic, these search engines rely on computer programs called spiders or robots (bots, in jargon) to “crawl” the Web and automatically log the words on each page. This is very different from relying on a directory structure to locate information.

With this type of search engine, you type keywords related to a topic into a search “box.” The search engine then typically scans its database and returns a page of links to Web sites containing the word or words specified. Because these databases are very large, search engines often return thousands of results. Without search strategies or techniques, finding what you need is like finding a needle in a haystack.

A search engine is a searchable database of Internet files collected by a computer program, typically called a wanderer, crawler, robot, bot, worm, or spider. An index is built from the collected files, recording fields such as title, full text, size, and URL. No selection criteria govern which files are collected, although evaluation can be applied through the ranking schemes that order the results of a query.

A search engine may also be called a search engine service or a search service. This dynamic type of search engine consists of three main components:

  • Spider: Program that traverses the Web from link to link, identifying and reading pages

  • Index: Database containing a copy of each Web page gathered by the spider

  • Search and retrieval mechanism: Technology that enables users to query the index and that returns the results in an ordered list
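To make the division of labor among these three components concrete, here is a minimal Python sketch. It assumes the spider has already fetched a few pages (the `PAGES` dictionary and its example.com URLs are hypothetical), builds an inverted index that maps each word to the pages containing it, and answers single-word queries. Real engines also keep copies of the pages and rank results by relevance; this sketch simply returns matching URLs in alphabetical order.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider has already fetched.
# A real spider would download these over HTTP and follow their links.
PAGES = {
    "http://example.com/a": "Television commercial blocking devices",
    "http://example.com/b": "Blocking pop-up ads in your browser",
    "http://example.com/c": "A history of the television set",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Index: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, word):
    """Retrieval: return matching URLs in alphabetical order (no ranking)."""
    return sorted(index.get(word.lower(), set()))

index = build_index(PAGES)
print(search(index, "television"))
```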

There are two major types of commercial search engines:

  • Individual: An engine that uses a spider to collect its own searchable index. This category also includes directory search engines, whose directories and subdirectories are assembled by the search engine company's staff.

  • Meta: A meta-search engine that simultaneously searches multiple individual search engines. A meta-search engine does not have its own index but uses the indexes collected by the spiders of other search engines. This type of engine is covered later in this chapter under the section “Meta-Search Engines.”

To use search engines efficiently, apply techniques that narrow the search results and push the most relevant pages to the top of the results list. The following sections present a number of strategies for finding what you need quickly and getting better results from any search engine.

Identify Keywords

When conducting an Internet search, decompose the search topic into key linguistic concepts (i.e., words and/or phrases). A standard language for performing searches and connecting words and phrases in a search command is known as the Boolean operator language. The Boolean operators AND, OR, NOT (or AND NOT), and NEAR tell search engines which keywords you want your results to include or exclude and whether your keywords must appear close to each other.

They are named after George Boole, an Englishman who devised these operators as part of a system of logic in the mid-1800s. (If only he could see what his system of logic is being used for today.)

Because the Boolean operators are simple English words, they are intuitive, easy to use, and easy to remember, at least for English speakers. For example, to find information on what is known about blocking television commercials in the communications industry, a set of primary search keywords might be

television commercial blocking

Boolean AND

Connecting search terms with the AND Boolean operator tells the search engine to retrieve Web pages containing ALL the keywords:

television AND commercial AND blocking

The search engine will not return pages that contain only the word television. Nor will it return pages that contain commercial and blocking but not television. It returns only pages on which television, commercial, and blocking all appear somewhere. Thus, the AND operator narrows the search results by limiting them to pages where all the keywords appear.
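One way to picture AND is as set intersection over an index. The Python sketch below assumes a tiny hypothetical index in which each keyword maps to the set of page IDs containing it; a page survives only if it appears in the set for every keyword.

```python
# Hypothetical toy index: each keyword maps to the set of page IDs containing it.
INDEX = {
    "television": {1, 2, 3, 4},
    "commercial": {2, 3, 5},
    "blocking": {3, 5, 6},
}

def boolean_and(index, *terms):
    """Return pages containing ALL the terms (set intersection)."""
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

print(boolean_and(INDEX, "television", "commercial", "blocking"))  # {3}
```

Pages 1 and 4 contain only television, and page 5 contains commercial and blocking without television, so only page 3 is returned.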

Boolean OR

Linking search terms with the OR operator tells the search engine to retrieve Web pages containing ANY of the keywords:

television OR commercial OR blocking

When the OR operator is used, the search engine returns pages containing a single keyword, several keywords, or all of the keywords. Thus, OR expands your search results. Use OR when a keyword has common synonyms, and surround each OR expression with parentheses for best results. To narrow the results as much as possible, combine OR statements with AND statements. For example, the following search statement locates information on television commercial blocking capabilities:

television AND (commercial OR advertisement) AND (blocking OR filter)
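In set terms, OR is a union, and the parenthesized OR groups are intersected with AND. The sketch below evaluates the query above against a hypothetical toy index (page IDs are invented for illustration); note how page 2 matches through commercial and blocking while page 3 matches through the synonyms advertisement and filter.

```python
# Hypothetical toy index: each keyword maps to the set of page IDs containing it.
INDEX = {
    "television": {1, 2, 3, 4},
    "commercial": {2, 5},
    "advertisement": {3, 6},
    "blocking": {2, 7},
    "filter": {3, 8},
}

def boolean_or(index, *terms):
    """Return pages containing ANY of the terms (set union)."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result

# television AND (commercial OR advertisement) AND (blocking OR filter)
hits = (
    INDEX["television"]
    & boolean_or(INDEX, "commercial", "advertisement")
    & boolean_or(INDEX, "blocking", "filter")
)
print(sorted(hits))  # [2, 3]
```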

Boolean AND NOT

The AND NOT operator tells the search engine to retrieve Web pages containing one keyword but not the other:

television AND commercial AND blocker AND NOT (vchip OR “v-chip”)

This example instructs the search engine to return Web pages containing the first three terms but neither vchip nor v-chip. Use AND NOT when a keyword is ambiguous. The need for AND NOT often becomes apparent only after you perform an initial search and realize that ambiguity is an issue. For an unrelated example, consider searching for the term “Venus.” If your results contain irrelevant pages about Venus the planet rather than Venus, the goddess of love, use AND NOT to prune them. You can also use AND NOT to filter out undesired Web sites.
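AND NOT corresponds to set difference: first collect the pages that satisfy the AND terms, then remove any page that contains an excluded term. The sketch below applies the vchip example to a hypothetical toy index.

```python
# Hypothetical toy index: each keyword maps to the set of page IDs containing it.
INDEX = {
    "television": {1, 2, 3, 4},
    "commercial": {1, 2, 3},
    "blocker": {1, 2, 3},
    "vchip": {2},
    "v-chip": {3},
}

# television AND commercial AND blocker
base = INDEX["television"] & INDEX["commercial"] & INDEX["blocker"]

# ... AND NOT (vchip OR "v-chip"): subtract every page with either spelling
hits = base - (INDEX["vchip"] | INDEX["v-chip"])
print(hits)  # {1}
```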

Implied Boolean: Plus and Minus Signs

In many search engines, the plus and minus symbols can be used as alternatives to full Boolean AND and AND NOT operators. The plus sign “+” is the equivalent of AND, and the minus sign “-” is the equivalent of AND NOT. Also note that there is no space between the plus or minus sign and the keyword.

In a simple search, use plus and minus rather than AND and AND NOT:

+television+commercial+blocker-vchip

+venus-planet

Use the search engine's simple search capabilities for implied Boolean (+/-) searches, and use the search engine's advanced search capabilities for full Boolean (AND, OR, AND NOT) searches.
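The mapping from implied Boolean to full Boolean can be sketched in a few lines of Python: every “+” term is required (AND) and every “-” term is excluded (AND NOT). The parser below assumes the no-space style shown above; it does not handle quoted phrases or hyphenated terms such as v-chip, and the `PAGES` data is hypothetical.

```python
import re

def parse_implied(query):
    """Split an implied Boolean query into (required, excluded) term lists.

    Assumes every term is prefixed by '+' or '-' with no space, as in the
    examples above. Quoted phrases and hyphenated terms are not handled.
    """
    required, excluded = [], []
    for sign, term in re.findall(r"([+-])([^+\-\s]+)", query):
        (required if sign == "+" else excluded).append(term.lower())
    return required, excluded

def matches(page_words, required, excluded):
    """'+' terms act like AND; '-' terms act like AND NOT."""
    return all(t in page_words for t in required) and not any(
        t in page_words for t in excluded
    )

# Hypothetical pages, each reduced to its set of lowercase words.
PAGES = {
    "page1": {"venus", "goddess", "love", "mythology"},
    "page2": {"venus", "planet", "orbit", "sun"},
}

required, excluded = parse_implied("+venus-planet")
hits = [url for url, words in PAGES.items() if matches(words, required, excluded)]
print(required, excluded, hits)  # ['venus'] ['planet'] ['page1']
```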

Exercise in Implied Boolean Search: Plus and Minus Signs

Go to http://ixquick.com.

Consider the question: Does violence on television have an effect on children?

Enter the query: +violence+television+children

Note that Ixquick first attempts a Boolean AND search. If this is not successful, it attempts to find documents containing any of your search terms. Including the plus signs in the query should ensure that all of the terms appear in the search results.


Phrase Searching

Surrounding a group of words with double quotes tells the search engine to retrieve only documents in which those words appear side by side, in the order given. Phrase searching is a powerful technique for significantly narrowing your search results, and you should use it as often as possible:

“television commercial blocker”

“tv commercial blocker”
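The difference between a phrase search and a plain AND search can be shown with a short Python sketch. It assumes a page is reduced to a list of lowercase words and checks whether the phrase's words occur consecutively; the two sample page texts are invented for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the words of `phrase` appear side by side, in order, in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

page_a = "Review: the best television commercial blocker of the year"
page_b = "This commercial for a television stand includes a cable blocker"
print(contains_phrase(page_a, "television commercial blocker"))  # True
print(contains_phrase(page_b, "television commercial blocker"))  # False
```

Page B contains all three words, so an AND search would return it, but a phrase search rejects it because the words are not adjacent.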

For best results, combine phrase searching with implied Boolean (+/-) or full Boolean (AND, OR, and AND NOT) logic. The following shows the same search expressed in both forms of Boolean:

+“television commercial”+blocker+filter-vchip

“television commercial” AND blocker AND filter AND NOT vchip

These two queries are equivalent: the first is written in implied Boolean and the second in full Boolean. In both, the quotation marks keep “television commercial” together as a phrase.

Plural Forms, Capital Letters, and Alternate Spellings

Most search engines treat lowercase keywords as matching both upper- and lowercase characters. Thus, if you want both upper- and lowercase occurrences returned, type your keywords in lowercase.

Like capitalization, most search engines interpret singular keywords as singular or plural. If you want plural forms only, be sure to make your keywords plural.

A few search engines support truncation or wildcard features that allow variations in spelling or word forms. The asterisk symbol “*” tells the search engine to match any characters from the point at which the asterisk appears. For example, the term advertis* returns Web pages with advertisement, advertising, advertiser, and advertise.
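Truncation amounts to prefix matching: everything before the asterisk must match, and anything may follow. The Python sketch below assumes only a single trailing “*” (real engines differ in what they support) and compares case-insensitively, mirroring how most engines treat lowercase keywords. The vocabulary list is hypothetical.

```python
def truncation_match(pattern, words):
    """Return the words matched by a term such as 'advertis*'.

    Matching is case-insensitive. Only a single trailing '*' is
    supported in this sketch; without it, the whole word must match.
    """
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return [w for w in words if w.lower().startswith(stem)]
    return [w for w in words if w.lower() == pattern]

vocabulary = ["Advertisement", "advertising", "advertiser", "advertise", "adverse", "television"]
print(truncation_match("advertis*", vocabulary))
# ['Advertisement', 'advertising', 'advertiser', 'advertise']
```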

Field Search

Field searching is one of the most effective techniques for narrowing results and getting the most relevant Web sites listed at the top of the results page. A Web page is composed of a number of fields, such as title, domain, host, URL, and link.

The effectiveness of your search increases as you combine field searches with phrase searches and Boolean logic. For example, if you want to find information about television commercial blocking, try the following:

+title:“television commercial”+blocker-vchip+device

title:“Television Commercial” AND Blocker AND Filter

The second example, a full Boolean TITLE SEARCH, instructs the search engine to return Web pages where the phrase Television Commercial appears in the title and the words Blocker and Filter appear somewhere on the page. In the implied Boolean example, note that there is no space between a + or - sign and the term that follows it.
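A field search restricts part of the query to one field of the page record. The Python sketch below models the full Boolean title search: the phrase must appear in the title, while the other terms may appear anywhere on the page (title or body). The page records, their fields, and their URLs are hypothetical, and real engines store many more fields.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(words, phrase_words):
    """True if phrase_words occur consecutively within words."""
    n = len(phrase_words)
    return any(words[i:i + n] == phrase_words for i in range(len(words) - n + 1))

def field_search(pages, title_phrase, terms):
    """title:"<phrase>" AND term AND term ... over hypothetical page records."""
    phrase = tokenize(title_phrase)
    hits = []
    for page in pages:
        title_words = tokenize(page["title"])
        all_words = set(title_words) | set(tokenize(page["body"]))
        if has_phrase(title_words, phrase) and all(t.lower() in all_words for t in terms):
            hits.append(page["url"])
    return hits

PAGES = [
    {"url": "http://example.com/1", "title": "Television Commercial Blockers Compared",
     "body": "Each blocker relies on a filter that spots ad breaks."},
    {"url": "http://example.com/2", "title": "Blocking ads on TV",
     "body": "A television commercial blocker with a built-in filter."},
    {"url": "http://example.com/3", "title": "Television Commercial Trends",
     "body": "Spending on advertising keeps growing."},
]

# title:"Television Commercial" AND Blocker AND Filter
print(field_search(PAGES, "Television Commercial", ["Blocker", "Filter"]))
```

Page 2 mentions all the words in its body but not the phrase in its title, and page 3 has the title phrase but lacks Blocker and Filter, so only page 1 is returned.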

Domain Search

In addition to the title search, other helpful field searching strategies include the domain search, the host search, the link search, and the URL search.

DOMAIN SEARCH allows you to limit results to certain domains such as Web sites from the United Kingdom (.uk), educational institutions (.edu), or government sites (.gov).

+domain:uk+“Tivo Wireless”+Internet

+domain:com+“Tivo Wireless”+Internet

+domain:org+“Tivo Wireless”+Internet
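Conceptually, a domain search filters results by the last part of each page's host name. The Python sketch below assumes that simple rule (individual engines may interpret domain: differently) and uses the standard library's `urllib.parse.urlparse` to extract the host. The URLs are hypothetical.

```python
from urllib.parse import urlparse

def top_level_domain(url):
    """Return the last label of the host name, e.g. 'uk' for www.example.co.uk."""
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    return host.rsplit(".", 1)[-1]

def domain_filter(urls, domain):
    """Keep only the URLs whose top-level domain equals `domain`."""
    return [u for u in urls if top_level_domain(u) == domain.lower()]

URLS = [
    "http://www.example.co.uk/news",
    "https://www.example.com/tivo",
    "http://www.example.org/page",
    "http://www.example.edu/",
]
print(domain_filter(URLS, "uk"))  # ['http://www.example.co.uk/news']
```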

The most widely accepted U.S. domains are the following:

.com— A commercial business

.edu— An educational institution

.gov— A governmental institution

.org— A non-profit organization

.mil— A military site

.net— A network site

Most Web sites originating outside of the U.S. have a country domain indicating the country of origin. For a current list of all country domains, visit the Internet country-codes site.[3]

[3] For more information, please see http://ftp.ics.uci.edu/pub/websoft/wwstat/country-codes.txt.

Link Search

Use LINK SEARCH when you want to know which Web sites link to a particular site of interest. For example, if you have a home page and wonder whether anyone has linked to it from his or her own Web site, use the LINK SEARCH command. Researchers use link searches to trace which pages cite (link to) a given site:

link:www.ibm.com
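A link search answers the reverse of the question a spider records: the spider sees which pages a page links out to, while link: asks which pages link in. The Python sketch below inverts a hypothetical set of crawled outgoing links into a backlink table; the page addresses other than www.ibm.com are invented for illustration.

```python
from collections import defaultdict

# Hypothetical crawl results: each page mapped to the URLs it links out to.
OUTLINKS = {
    "www.example.com/home": ["www.ibm.com", "www.example.org"],
    "www.example.org/blog": ["www.ibm.com"],
    "www.example.net/news": ["www.example.com/home"],
}

def build_backlinks(outlinks):
    """Invert the link graph: for each target, collect the pages linking to it."""
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return backlinks

backlinks = build_backlinks(OUTLINKS)
print(sorted(backlinks["www.ibm.com"]))
# ['www.example.com/home', 'www.example.org/blog']
```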
