Other Engines

This book is not large enough for a discussion of every general search engine on the Web; there are literally hundreds of them. But I do want to take one more section to discuss some search engines that don't need their own chapters but are still worth a look.

Open Directory Project—http://www.dmoz.org

The Open Directory Project, also known as the ODP, is a searchable subject index of over four million sites. In that respect it's like Yahoo. Unlike Yahoo, it contains no other properties: no news search, no mailing lists, no nothin'. Furthermore, it's a volunteer effort, maintained by an army of over 60,000 editors managing over half a million categories.

It's an amazing effort, and for the most part it's a very good directory. But because volunteers maintain the site, quality can be uneven; some categories are more active than others. You search with plain keywords, and the Boolean default is AND. (You can also specify NOT with - and phrases with quotes.) At first glance the advanced search options look fairly limited. The advanced search allows you to limit your results by top-level category, or to limit your search to site listings or to categories (so you might search for “Dallas” and find the 42 categories that include the word “Dallas” but not the 4,000+ site listings that contain it). You can also limit those search results to kid- and teen-appropriate sites.
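
To make those three rules concrete (words are ANDed, a leading - excludes, quotes make a phrase), here's a toy matcher in Python. It's a minimal sketch, assuming simple case-insensitive word matching; it is not the ODP's own search code, and the names parse_query and matches are made up for illustration.

```python
import re

# Hypothetical toy model of the rules described above (not the ODP's code):
# words are ANDed together, a leading "-" excludes a word or phrase,
# and double quotes group words into a phrase.
TOKEN = re.compile(r'(-?)"([^"]*)"|(-?)(\S+)')
WORD = re.compile(r"[a-z0-9']+")


def words(text):
    """Lowercase a string and split it into simple word tokens."""
    return WORD.findall(text.lower())


def contains(doc_words, term_words):
    """True if term_words appears as a contiguous run inside doc_words."""
    n = len(term_words)
    return any(doc_words[i:i + n] == term_words
               for i in range(len(doc_words) - n + 1))


def parse_query(query):
    """Return (required, excluded) lists; each entry is a list of words."""
    required, excluded = [], []
    for m in TOKEN.finditer(query):
        negated = bool(m.group(1) or m.group(3))
        raw = m.group(2) if m.group(2) is not None else m.group(4)
        term = words(raw)
        if term:  # skip empty quotes or bare punctuation
            (excluded if negated else required).append(term)
    return required, excluded


def matches(query, text):
    """Default AND: every required term must appear, no excluded term may."""
    required, excluded = parse_query(query)
    doc = words(text)
    return (all(contains(doc, t) for t in required)
            and not any(contains(doc, t) for t in excluded))


print(matches('dallas -cowboys "state fair"',
              "The State Fair of Texas is held in Dallas every fall."))  # True
print(matches('dallas -cowboys', "Dallas Cowboys schedule"))           # False
```

Notice that a quoted phrase is stricter than ANDing its words: in this sketch "state fair" in quotes won't match a page that only says “a fair state.”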

If you look beyond the advanced search page, though, you'll see that you do have some search syntax options:

t: Search site titles only

u: Search site URLs only

d: Search site descriptions only

Of these three, the u: syntax is the least useful, unless you want a sense of how many pages in a particular domain the Open Directory Project indexes. You can also list all the pages from a certain domain by simply searching for the domain name itself. For example, searching the Open Directory Project for AskJeeves.com finds one category and one listing for a page at AskJeeves.com. (Beware of sites listed under multiple domain names; the Ask Jeeves search engine itself is actually listed under ask.com.)
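
Here's a rough idea of what the t:, u:, and d: prefixes do: they narrow the search from the whole listing to one field of it. This is a minimal sketch, assuming each listing has a title, URL, and description and that matching is a plain case-insensitive substring test; the function name and the two sample listings are made up for illustration.

```python
# Hypothetical sketch: how t:, u:, and d: narrow a search to one field of a
# directory listing. Field names and the sample listings are made up.
FIELDS = {"t": "title", "u": "url", "d": "description"}


def search_listings(listings, query):
    """Case-insensitive substring search, optionally limited by a field prefix."""
    prefix, sep, term = query.partition(":")
    if sep and prefix.lower() in FIELDS:
        fields = [FIELDS[prefix.lower()]]
    else:
        # No recognized prefix: search every field with the whole query.
        fields = list(FIELDS.values())
        term = query
    term = term.strip().lower()
    return [item["title"] for item in listings
            if any(term in item[field].lower() for field in fields)]


listings = [
    {"title": "Ask Jeeves", "url": "http://www.ask.com/",
     "description": "Natural language search engine."},
    {"title": "Dallas Morning News", "url": "http://www.dallasnews.com/",
     "description": "Daily newspaper for Dallas, Texas."},
]

print(search_listings(listings, "t:dallas"))        # ['Dallas Morning News']
print(search_listings(listings, "u:ask.com"))       # ['Ask Jeeves']
print(search_listings(listings, "u:askjeeves.com")) # []
```

The last line shows why the multiple-domain warning matters: if a site is listed under ask.com, a URL search for askjeeves.com comes up empty even though the site is in the directory.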

The ODP also supports a form of stemming: add a wildcard (*) to the end of a word stem and you'll get all the words that begin with that stem. The search term mili* will find military, militia, and so on.
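
A trailing wildcard is really a prefix match, which a few lines of Python can illustrate. This is a minimal sketch, assuming the wildcard is only allowed at the end of a word; the function name and the small vocabulary are made up for illustration.

```python
def expand_wildcard(pattern, vocabulary):
    """Return vocabulary words matched by pattern; a trailing * means 'any ending'.

    Hypothetical sketch of trailing-wildcard truncation, not the ODP's code.
    """
    pattern = pattern.lower()
    if pattern.endswith("*"):
        stem = pattern[:-1]
        return sorted({w for w in vocabulary if w.lower().startswith(stem)})
    # No wildcard: only an exact (case-insensitive) match counts.
    return sorted({w for w in vocabulary if w.lower() == pattern})


vocabulary = ["military", "militia", "milieu", "mile", "mill"]
print(expand_wildcard("mili*", vocabulary))  # ['milieu', 'military', 'militia']
```

Because it's a pure prefix match, mili* also picks up milieu, which probably isn't what you meant; short stems cast a wider net than you might expect.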

The ODP doesn't get all the press that Google or Yahoo do, but it's important because of how often its data is used. The ODP makes its data freely available to other sites that want to use it, so you'll see ODP categories included all over the place. Knowing how to search that data will help you get around far more sites than just the ODP itself.

Gigablast—http://www.gigablast.com

Have you ever heard of Gigablast? Probably not. Gigablast is not a corporation but a guy named Matt Wells who happens to have put together quite a good search engine. Unlike most search engines, Gigablast defaults to the Boolean OR. It warns you about this in the search results, putting the results that contain all your search words at the top and separating them from the rest with a blue bar. Gigablast also supports AND NOT as well as OR, so

Skiing AND NOT "cross-country" AND NOT snowboarding

is a legitimate search in Gigablast.
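
The OR default with the all-words results pulled to the top can be modeled roughly like this. It's a minimal sketch, assuming a page "contains" a word only if that exact word appears in it and that partial matches are ordered by how many query words they contain; Gigablast's real ranking is surely more involved, and the function name and sample pages are made up. It covers only the default OR, not AND NOT.

```python
def split_or_results(query, pages):
    """Model an OR-default engine that puts all-terms matches first.

    Hypothetical sketch: pages is a list of (name, text) pairs, and a word
    "matches" only if it appears as a whitespace-separated token.
    """
    terms = set(query.lower().split())
    complete, partial = [], []
    for name, text in pages:
        found = terms & set(text.lower().split())
        if found == terms:
            complete.append(name)
        elif found:  # OR: any single term is enough to be a result
            partial.append((len(found), name))
    # More matched terms rank higher; sort() is stable, so ties keep input order.
    partial.sort(key=lambda pair: -pair[0])
    return complete, [name for _, name in partial]


pages = [
    ("a", "skiing in utah"),
    ("b", "utah skiing and snowboarding"),
    ("c", "snowboarding gear"),
    ("d", "cooking with herbs"),
]
complete, partial = split_or_results("skiing snowboarding utah", pages)
print(complete)  # ['b']
print("-" * 20)  # stands in for the divider bar
print(partial)   # ['a', 'c']
```

Page d never shows up because it contains none of the words; under an AND default, only page b would have appeared at all.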

Gigablast, though it's a full-text search engine, indexes far fewer pages than Google, so don't expect nearly as many results. You can use the following special syntaxes:

suburl: searches for a keyword in the URL

site: limits results to the specified site (you must specify an entire domain, like unc.edu, and not just a top-level domain like edu)

url: searches for an entire URL

title: searches for a keyword in Web page titles

ip: searches for results from a specified IP address

link: searches for pages that link to a specified URL

type: searches for Web pages of a specified type. Types include PDF (type:pdf), Microsoft Word (type:doc), Microsoft Excel (type:xls), Microsoft PowerPoint (type:ppt), PostScript (type:ps), and plain text documents (type:txt).
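
If you build a lot of these queries, a small helper can catch the easy mistakes, such as a bare top-level domain after site:. This is a minimal sketch: the syntax names and file types come from the list above, but the helper names, the validation rules, and the assumption that several syntaxes can be combined in one query are mine. The search URL follows the q= parameter visible in the Gigablast figure's URL (the other parameter there isn't explained, so it's left out), and the service may no longer answer at that address.

```python
from urllib.parse import urlencode

# Hypothetical helper: the syntax names come from the list above, but the
# validation rules and function names are illustrative only.
SYNTAXES = {"suburl", "site", "url", "title", "ip", "link", "type"}
FILE_TYPES = {"pdf", "doc", "xls", "ppt", "ps", "txt"}


def build_query(keywords="", **syntaxes):
    """Append name:value pairs to the keywords, checking a few documented rules."""
    parts = [keywords] if keywords else []
    for name, value in syntaxes.items():
        if name not in SYNTAXES:
            raise ValueError(f"unknown syntax: {name}:")
        if name == "site" and "." not in value:
            raise ValueError("site: needs a full domain like unc.edu, not just edu")
        if name == "type" and value not in FILE_TYPES:
            raise ValueError(f"unsupported type: {value}")
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def search_url(query):
    """Encode a query into the q= parameter seen in the figure's URL."""
    return "http://www.gigablast.com/search?" + urlencode({"q": query})


q = build_query("financial aid", site="unc.edu", type="pdf")
print(q)                           # financial aid site:unc.edu type:pdf
print(search_url('"web search"'))  # http://www.gigablast.com/search?q=%22web+search%22
```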

At first glance, Gigablast's search results look like Google's. There's a link to a cached copy of each page (though here it's called an archived copy), along with the page's title and URL. But there's also information about when the page was indexed (which Google usually does not provide) and an “older copies” link that leads to the Internet Archive, a vast repository of Web content archived over a long period of time.

02-01. Gigablast is not as well known as Google, but it has many interesting search result features. (Image from http://www.gigablast.com/search?k7h=552375&q=%22web+search%22.)


Gigablast has gotten some attention thus far, but not as much as I feel it deserves. This search engine is constantly being improved; if it indexes a lot more pages, it could become a legitimate (and very independent) Google contender.

LookSmart/WiseNut—http://www.looksmart.com and http://www.wisenut.com

LookSmart and WiseNut are owned by the same company, but they're different search engines. LookSmart is a searchable subject index, while WiseNut is a full-text index.

LookSmart defaults to the Boolean AND; it is a basic keyword search with not much in the way of special syntaxes. Descriptions of sites are pretty good, but since a listing in LookSmart nowadays costs money, your search results will be slanted toward businesses and others who can afford to pay for a search engine listing.

WiseNut defaults to Boolean AND, though you can't be sure how many results to expect, since it gives no count of how many pages are in its index. Its advanced search (called WiseSearch) uses a series of query boxes where you specify allowed words, disallowed words, and phrases. A preferences page lets you tweak how the search results display, but there's not much else in the way of advanced searching.

Because WiseNut doesn't indicate how many pages it indexes, doesn't seem to offer much in the way of advanced searching, and timed out on me when I tried to find out how to submit a page (to see whether submission was free; engines that charge for submission have very different indexes from those that accept free submissions), I tend to put this one at the bottom of my searching heap. As a searcher, you want to know how many pages are available in an index and how the pages get there (Are they paid for? Are the submissions free? Is the index purchased from somewhere else?). If that information isn't available, treat it as a warning sign.

Ask Jeeves/Teoma—http://www.askjeeves.com and http://www.teoma.com

Back during the Internet boom days, Ask Jeeves was truly dynamic and innovative. And they proved it by doing things like advertising their search engine on fruit labels (this was back during the boom time, remember). What was innovative about their offerings was that they allowed a searcher to do natural language searching: enter a question instead of a search query (“Why is the sky blue?”) and Ask Jeeves would attempt to answer the question.

However, that was then and this is now. While Ask Jeeves does still attempt to answer natural language questions, it also gives you sponsored (paid-for) Web results in addition to regular Web results. Ask it why the sky is blue and it'll tell you at the top of the page. It'll then give you a sponsored Web result (at this writing, for an eBay affiliate), and from there it gives you results that mostly seem to come from pages that contain the words “blue” and “sky.” So while you can get good answers to natural language questions (if you ask one it knows; don't ask it “Why does an elephant have a long nose?”), the relevance of its Web search results, in my opinion, leaves a bit to be desired. If you really want to search an Ask Jeeves engine and you don't have a natural language question, use Teoma.

TIP

Is it always better to use the search engine that indexes the most pages? In other words, is bigger always better? In a word, no. No search engine indexes the entire Internet, and not every page indexed by a search engine is going to be relevant or useful. Your focus should be on getting the most useful results possible, not just getting the most results.


Teoma to me is much more interesting than Ask Jeeves, though also much newer. It's a full-text search engine that indexes about a billion pages.

Teoma defaults to Boolean AND, with a much more robust advanced search than WiseNut's. With the advanced search you can limit your search to the page title or URL, and limit results by domain or geographic region. You can also limit results by date. Surprisingly enough, the search Why is the sky blue? (no quotes) returns a reasonable result on the first page, though you also get a lot of irrelevant pages containing the phrase “blue sky.”
