7. Trapping Topic-Based Information

In the last chapter, we looked at trapping internal information—information such as who’s linking to your site, or who’s reading your RSS feed.

While that kind of trapping has its place, especially if you’re trying to make a living from your Web site, a broader use for information trapping lies in trapping external, topic-based information. You might want to use the external information you trap to create content for your site, pursue a degree, educate yourself, and so on. The possibilities for the kinds of topic searching you can do are almost endless.

Unfortunately, so are the resources on the Internet! So now it’s time to look at some large categories of information that you can monitor, from really broad, like general search engines, to more narrow, like government sites. Along the way, I’ll provide some hints for searching particular kinds of sites and some things to keep in mind when you’re building queries. In later chapters, we look at other trapping options, including multimedia sources.

For the most part, your largest data pools will come from general search engines, and your smallest pools will derive from very focused sites like city and state resources. Because of the different kinds of information offered at different types of sites, you have to build your traps a bit differently for each resource. But the underlying theory is the same: complete but manageable.

Monitoring General Search Engines

General search engines, the original foundation for information search on the Internet, are starting to show their age. They’re large, unstructured, chaotic pools of information that can be really tough to trap. On the other hand, because they came first, people who have no idea what an RSS feed is, or how to get listed in a specialty search engine, will still make sure that they’re listed in a general search engine. And that, in a nutshell, is why you should consider trapping in them, even though getting the perfect query together and monitoring results can be a bit frustrating.

Google

General search engines are by definition going to have very large data pools. And Google’s no exception. Google (google.com) has stopped providing a page count for its search engine, but the last time it did, Google had over 8 billion documents in its index, and I’m confident that index hasn’t gotten any smaller! The advantage to that is that Google indexes astonishing amounts of material, from every possible corner of the Web.

You can monitor general search engines not only for links to your sites, as we discussed in the last chapter, but also for long queries focused on unusual, narrow topics. When applying the theory of onions to full-text search engines, make sure your topics are as narrow as possible. Google’s query limit is 32 words—you want to push that limit!

Building your queries

Start building your Google queries using as many words as possible. Remember, you can always remove words later. Use as many of the special syntax options as you can think of. The title and site syntax will probably work best. Avoid inurl—it’s difficult to guess which words might be included in the URL of a Web site.
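To make the "start long, trim later" approach concrete, here is a minimal Python sketch of a query builder. The helper names (build_query, word_count) and the sample terms are made up for illustration, and counting whitespace-separated words is only an assumption about how the 32-word limit is measured; Google may count differently.

```python
# Hypothetical helpers for illustration; not part of any Google tool.
GOOGLE_WORD_LIMIT = 32  # the limit quoted in this chapter


def build_query(words=(), phrases=(), exclude=(), intitle=(), site=None):
    """Assemble a query from plain words, quoted phrases, excluded
    words, intitle: terms, and an optional site: restriction."""
    parts = list(words)
    parts += ['"{}"'.format(p) for p in phrases]
    parts += ["-{}".format(w) for w in exclude]
    parts += ["intitle:{}".format(w) for w in intitle]
    if site:
        parts.append("site:{}".format(site))
    return " ".join(parts)


def word_count(query):
    """Count whitespace-separated tokens (an assumption, see above)."""
    return len(query.split())


query = build_query(
    words=["restoration"],
    phrases=["antique hand planes"],
    exclude=["birds"],
    intitle=["woodworking"],
    site="edu",
)
print(query)
print(word_count(query), "of", GOOGLE_WORD_LIMIT, "words used")
```

Keeping each kind of term in its own list makes it easy to drop one word or one syntax at a time when a query turns out to be too narrow.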


Hints

• Explore the Advanced Search page to see if it can help you. For example, say you’re searching mostly in English and your query has a few French words. Limit your search to pages in English and see how that changes your results. You might also want to limit the country your results are coming from, if applicable, though in my experience that limits your results too much.

• If you put a tilde (~) in front of a word for which you’re searching, Google will find synonyms for the word in addition to the word itself. For example, ~television would find news, TV, network, etc. You can tell which words are matching your tilded query because they’re bolded in the search result snippets. Try experimenting with that. For extra fun, search for a word with a tilde and then exclude the word itself, like this: ~television -television.


Trapping

For e-mail alerts of new pages added to Google’s index, use the Google Alerts feature (google.com/alerts) we talked about earlier in Chapter 4. If you want your results in RSS format, use Google Alert (googlealert.com), which we also talked about in Chapter 4.

Possibilities

• Try medical terms using the site:edu syntax.

• Try academic terms with the same syntax.

• Try any topical keyword combined with the intitle syntax.

• Use quotes to group your words into phrases as much as possible.

• Is there any word that will help your search if you make sure that it doesn’t appear in your queries? Try that by putting a minus in front of the word (-birds).

Yahoo

When it comes to search engine wars, the two biggies duking it out are Yahoo and Google. Because of that, Yahoo (yahoo.com) is being very proactive in developing new syntax, updating its index of Web sites, and making its Web search easier and easier to use. Yahoo started as a search engine directory, and while the directory still exists, Yahoo is much more of a full-text search engine now.

Building your queries

Like Google, Yahoo has a variety of syntax available and a large query limit. Maximize both as much as you can. Yahoo’s also got a couple of interesting ways to narrow your results, such as searching for results only in Creative Commons (a “some rights reserved” alternative to full copyright, “all rights reserved” protection) content. There’s another option for searching Yahoo’s subscription-based content. You can get to Yahoo’s Advanced Web Search page directly at http://search.yahoo.com/search/options.


Hints

• Use quotes for phrases.

• Exclude words from your search results when appropriate.

• Use the special content searches with more general queries to narrow your results.

• Note that Yahoo’s results show when a site has an RSS feed. This could be a useful RSS feed discovery tool for you.


Trapping

Yahoo does not offer alerts for page search results, so set Yahoo’s preferences to show 100 results per page. Then run your search, and afterward monitor that page of results. That monitored page will look for changes in only the top 100 results, which is another reason why you need to make sure your query is narrow enough.
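If you are curious what a page monitor does with a saved results page, the sketch below shows one possible approach in Python: collect the links from two snapshots of the same page and report the ones that are new. This is only an illustration, not how any particular page-monitoring product works, and the two snapshots are made-up stand-ins for real result pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html):
    """Return the set of link targets found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def new_links(old_html, new_html):
    """Return links present in the new snapshot but not in the old one."""
    return sorted(extract_links(new_html) - extract_links(old_html))


# Two made-up snapshots of the same results page.
yesterday = '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>'
today = '<a href="http://example.com/b">B</a> <a href="http://example.com/c">C</a>'
print(new_links(yesterday, today))
```

Comparing links rather than the raw page text is one way to cut down on false alarms from things like changing ads or timestamps, and it shows why a narrow query matters: anything that never reaches the monitored page can never show up as new.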

Yahoo also offers RSS feeds for search results, though you have to jump through a couple of hoops to create them—namely, you need to edit a URL. Here’s how:

The base URL for getting an RSS feed of Yahoo’s search results looks like this:

http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml?appid=yahoosearchwebrss&query=keyword

You need to change keyword to the appropriate query word. If you want to use quotes or several words, you need to make sure they’re encoded properly (encoded just means writing characters in a certain way so that the browser can read them in the URL).

For instance, you want to combine words using the plus sign (+). To search for “three blind mice,” use three+blind+mice. To remove words, use the minus sign (-). To remove “mice” from the equation, for example, use three+blind-mice. To establish phrases, use %22 at the beginning and at the end of your query. For example, to find the phrase “three blind mice,” use %22three+blind+mice%22.
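If you would rather not encode queries by hand, a few lines of Python can do it for you. This is a minimal sketch that assumes the base URL behaves as described above; the function name yahoo_rss_url is made up for illustration. The standard library's quote_plus turns spaces into plus signs and quotation marks into %22, which matches the encoding described here.

```python
from urllib.parse import quote_plus

# Base URL as given in the text above.
YAHOO_RSS_BASE = (
    "http://api.search.yahoo.com/WebSearchService/rss/webSearch.xml"
    "?appid=yahoosearchwebrss&query="
)


def yahoo_rss_url(query):
    """Encode the query and append it to the base feed URL."""
    return YAHOO_RSS_BASE + quote_plus(query)


print(yahoo_rss_url("three blind mice"))
print(yahoo_rss_url('"three blind mice"'))
```

The first call produces a URL ending in three+blind+mice, and the second, with the quoted phrase, ends in %22three+blind+mice%22.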

Possibilities

Try to see what you can pull out of Yahoo’s specialty searches:

• Is there potential for articles related to your topic?

• Should you be searching Creative Commons content to see if there’s material you can license and use on your site or quote from on your site?

In addition to the Web search, Yahoo has a directory, which you can also monitor. Monitoring directories, as we’ve discussed, is a little different from monitoring full-text search engines. Your queries have to be more general. In the case of Yahoo, it’s also easier because of some of the RSS feed offerings.

Yahoo Directory

Yahoo’s Directory (dir.yahoo.com) is just what it sounds like: a searchable subject index of sites and descriptions. In my experience, it is not as dynamic as Yahoo’s search engine, but is still useful to monitor. And Yahoo makes it a little easier too, depending on what you’re looking at.

Building your queries

For searching most directories, I don’t bother to build queries. Instead I peruse the directory structure and find the subcategories that most accurately reflect what I’m interested in.

Say I’m interested in lions. I browse the directory, from Science to Animals to Mammals to Lions. The subject “lions” has its own page, but there’s also a category for Ligers and Tigons (lion/tiger hybrids). As of this writing, there are only five links on this page, so it would be easy to monitor this URL using a page monitor.

In general, the deeper you go into the subcategories, the shorter the list of sites for each subcategory will be. But if you must monitor a general category, I recommend checking to see whether Yahoo Directory has an RSS feed for the latest update. The directory doesn’t have feeds for every category and subcategory, but it does have an extensive number of feeds for higher-level categories. You can see the list at http://dir.yahoo.com/rss/dir/index.php. For the most part, these feeds don’t go more than one or two levels deep.


Hints

• If you can’t find, or immediately think of, a category that’s relevant to your topic, search for a couple of topic keywords. Related directory categories will appear at the top of the search results and might give you some hitherto-unconsidered ideas of other directory pages to monitor.

• Remember that quotes in phrases work in a directory as well, as does the ability to exclude keywords.


Trapping

If you want to monitor for new sites in one of the more general categories, check to see if Yahoo has an RSS feed for that category. The sub-sub-subcategories usually have few enough links that you can use a page monitor to check them for changes (Figure 7.1).

Figure 7.1. This page is small enough that you can monitor it for changes.

Possibilities

• A directory is your chance to use more general keywords than you would with a full-text search engine. So experiment with all those two-and-three-word queries that were getting you too many results in the Google searches. What directory categories are they matching up to? Are they categories that would be worth monitoring?

• What happens when you search for as general a topic as you can? Are you coming up with useful directories? Would it be worth it to dig down into some subdirectories and see what you find?

Ask

Several years ago, Ask Jeeves (the company has retired the butler and changed its name to Ask) was the serious also-ran of the search engine world—a definite third-tier search engine. Now, more and more, Ask (ask.com) is a definite player (Figure 7.2).

Figure 7.2. Ask has a very basic home page. Tools on the right point you to additional search options.

Ask used to be known as a “natural language search engine,” that is, you could enter your search as a question and Ask would interpret your search and (hopefully) find what you were looking for, either on the Web or in its own index of questions. Now it’s more of a full-text engine, having integrated technology from Teoma, a search engine company that Ask purchased a few years ago.

Building your queries

Ask has a pretty spartan home page. There is an advanced search available at ask.com/webadvanced, however. Options on the Advanced page include the ability to narrow searches by the last time pages were updated or by their geographic location. You can also specify what words must or must not appear in a search, as well as words that should appear in a search. The “should search” makes your search a little fuzzier: it ranks results that have the “should words” higher, but doesn’t completely eliminate results that don’t have them. If you want to use words that could really focus your topic but also have the potential to torch relevant results, take advantage of Ask’s “should search.”


Hints

• Take a look at the options to narrow your search on the right side of the results page. They vary a lot in how useful they are. Some of them are actually interesting and potentially useful, while others are downright ridiculous. Ditto for the related names search in the same place. Use these additional results as a spur for inspiration, but don’t expect them to be useful all of the time.

• Notice that each of Ask’s results has a little pair of binoculars next to it. Hold your mouse over that pair of binoculars and a little pop-up window displays, showing you what that page looks like (Figure 7.3). This can come in handy when you’re trying to get a quick idea of a site but don’t want to visit it and wait for the page to load.


Figure 7.3. Hold your mouse over the binoculars and you can quickly get an idea of whether a page is useful or not.

Trapping

Ask doesn’t offer RSS feeds or e-mail alerts. As a workaround, visit the Preferences page (ask.com/webprefs) and change your result number to 100. Then do the searches you want and save the results in a page monitor. To narrow your search further, you can also use the Advanced Search page to restrict your results to more recently updated pages.

I am less excited by Ask’s current status (though it’s good) than by its future possibilities. In the past couple of years, the company has made huge strides in the online search engine world, mostly by doing things a little bit differently than the established search engines. Watch Ask for tools that will be useful to you, the information trapper.

Possibilities

Take advantage of Ask’s search suggestions and quick preview tools:

• Can Ask’s search suggestions help you refine your queries at all?

• Ask’s Preview tool is the place to quickly review large numbers of sites, so that you don’t have to visit each one. Take advantage of the binoculars feature.

Microsoft Live Search

Yahoo and Google are the major search engines right now. But of course things can change very quickly. And when a five-ton gorilla like Microsoft enters the scene, you can almost count on things changing. Microsoft entered the search engine arena with MSN Search, but it has rebranded its site as Microsoft Live Search (live.com/?searchonly=true). This new search engine should be on your list of resources to trap, partially because Microsoft is working incredibly hard to become a major player, and partially because it’s put together a few interesting innovations (Figure 7.4).

Figure 7.4. Microsoft Live Search.

Building your queries

You might think at first glance that Live Search doesn’t have any special syntax. Microsoft certainly doesn’t go out of its way to highlight any on the front page. There are some available, though. Try the following syntax:

Contains. Looks for pages containing the file types you specify. For example, contains:mp3 would find pages that link to MP3 files. This is different from Google’s filetype operator, which would find the MP3 files themselves.

Intitle. Looks for words in the title.

Inbody. Looks only for words in the page body, not in the page title. Useful when you’re looking for words common to an HTML infrastructure, such as “title.”

Link. Finds pages that link to the specified URL.

Linkfromdomain. This is an unusual syntax! Linkfromdomain finds those links that are coming from a specified domain. Wondering who your competitors are linking to? Use this syntax to find out.

Prefer. Sort of a midway point between “or” and “and.” Add prefer: to a query keyword if you would like to have it in your results, but it isn’t absolutely necessary. (Got any query words you’re not absolutely certain about? This is the syntax for you.)


Hint

Live Search does have that single query box on the front page that searches several sets of data, including news and images. But Live Search doesn’t have an advanced search page. Instead it offers advanced search options on the search result pages. One of the options is a set of sliders that lets you change some of your search options without having to specify syntax or query words.


Trapping

Unlike Google, Live Search offers its search engine results in RSS format, and getting a feed takes less work than it does with Yahoo. To get an RSS feed for your search results, just add &format=rss to the end of the URL of any Live Search results page, like this:

http://search.live.com/results.aspx?q=woodworking&format=rss
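If you plan to set up many of these feeds, you can generate the URLs instead of editing them by hand. The Python sketch below assumes the results URL has exactly the shape shown above, with the query in the q parameter; the function name live_search_rss_url is made up for illustration.

```python
from urllib.parse import urlencode


def live_search_rss_url(query):
    """Build a results URL for the query with format=rss appended."""
    params = urlencode({"q": query, "format": "rss"})
    return "http://search.live.com/results.aspx?" + params


print(live_search_rss_url("woodworking"))
```

For the query woodworking, this prints the same URL shown above. Queries containing spaces or special syntax are encoded automatically.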

Possibilities

Because Live Search offers RSS feeds, you can monitor a lot of searches without worrying about using page monitors or getting false positives. Take advantage of this feature and stock up with queries.
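Once you have stocked up on feeds, the real work is spotting what’s new in them. A feed reader does this for you, but the idea is simple enough to sketch in Python. The sample feed below is made up, and the sketch assumes a plain RSS 2.0 layout with a title and a link for each item; real feeds vary.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed standing in for a real search feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Search results: woodworking</title>
    <item><title>Old tools</title><link>http://example.com/old-tools</link></item>
    <item><title>New lathe</title><link>http://example.com/new-lathe</link></item>
  </channel>
</rss>"""


def unseen_items(feed_xml, seen_links):
    """Return (title, link) pairs whose link is not yet in seen_links,
    adding each new link to seen_links along the way."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


seen = {"http://example.com/old-tools"}
print(unseen_items(SAMPLE_FEED, seen))  # only the item not seen before
print(unseen_items(SAMPLE_FEED, seen))  # nothing new the second time
```

Because each item carries its own link, a feed tells you exactly which results are new, which is why feeds don’t produce the false positives that page monitors can.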

Open Directory Project

The Open Directory Project (dmoz.org) is one of the reasons my hair is rapidly turning gray. It offers so much and yet is so frustrating at the same time. On the one hand, Open Directory Project comprises a group of editors working together to build a searchable subject index, and I agree that the world needs more searchable subject indexes. On the other hand, it’s sometimes a bear to search, and you never know when the categories are going to be updated. Despite the drawbacks, in some ways I feel it’s the best directory out there, better even than Yahoo Directory.


Tip

Open Directory Project is the directory on which the Google Directory is based. But in my experience, the data in the Open Directory Project is updated much more often than the data in the Google Directory.


Building your queries

You can browse the Open Directory Project (ODP) just like you browse Yahoo’s Directory. However, it may not be as fruitful. Earlier in the chapter, we used an example of browsing for “lions” in Yahoo’s Directory, where we browsed from Science to Animals to Mammals to Lions. Using Open Directory Project, the pathway looks like this:

Top: Science: Biology: Flora and Fauna: Animalia: Chordata: Mammalia: Carnivora: Felidae: Panthera: Lion

I could get as far as Animalia if I were browsing, but after that I’d be lost. I thought a “chordata” was something you play on a guitar. I recommend browsing through the categories until you’re stumped. Then use the search engine at the top of the page to search only within that category. From there you’re directed to the right places. Browsing and then searching is worth it: as of this writing, the Open Directory Project has more lion pages than the Yahoo Directory does.


Hint

Let your queries get a little more general, but drill down as much as you can into the directory structure before you start using the search box. It will save you a lot of time.


Trapping

The ODP doesn’t offer RSS, nor does it offer e-mail alerts. You have to choose the categories you want and then feed them to page monitors.

Possibilities

The ODP is a little different from the Yahoo Directory. In addition to indexing Web sites, it also indexes RSS feeds relevant to a category, as well as articles relevant to a category. So when you’re monitoring here, you’ll not only be able to keep an eye out for relevant and often very credible content, but you’ll also learn about RSS feeds that contain content relevant to the category. For these reasons, and because of the fact that I find this directory in some instances to be better populated than the Yahoo Directory, I recommend trying a few extra categories here. The extra monitoring time will be worth it.

We’ve covered five different directories and search engines here. But are those all the possible ones? Heavens no; there are hundreds out there! To cover them all would take up far too much space, but here are a couple more that I think you should at least consider when you’re preparing to trap the general Web. I don’t go as in-depth with these as I did with the first-tier search engines, but there’s enough information here that you should be able to do some experimentation on your own.

A9

Brought to you by Amazon, A9 (a9.com) offers several different types of searches, including Web, Images, Reference, Movies, and Books. The Web search results are provided by Google, massaged a bit by Amazon and Alexa (a service that provides traffic and popularity information about Web sites). Check out the book search with narrow queries. You’ll get “inside the book” results featuring book passages containing your search query (Figure 7.6).

Figure 7.6. Using A9 you can do a keyword search and get results from within books.

No RSS feeds or e-mail alerts, unfortunately, so you have to monitor result pages.

One thing you’ll notice when you visit A9 is a series of checkboxes that lets you apply other resources to the search results: resources like reference, news, yellow pages, and more (Figure 7.7).

Figure 7.7. A9 offers the ability to search a huge number of resources.

You can get a giant overview of these extra searchable resources—over 450 of them—at opensearch.a9.com/searches.jsp.

There are so many resources here that there’s another search engine available just to find them. If you’re a registered user of A9, you can add these resources to A9 as extra columns on your search. If you’re not an A9 user, you can add these sources to your search temporarily. It’s easy to get overwhelmed here. On the other hand, it’s a simple way to dip your toe in the wealth of available specialty search engines.

Gigablast

Gigablast (gigablast.com) is a small company compared to Yahoo or Google, and contains a Web index that is a bit smaller as well. I’m including it here because there’s constant work going on to update it and add more features.

Gigablast is a full-text search engine that offers a number of search options, including searches for blogs, travel information, and government information. Search results provide related phrases and query words, and sometimes even related sites (Figure 7.8).

Figure 7.8. Gigablast shows related phrases at the top of its search results.

Note that the Advanced Search page has an option to search within a list of sites you specify, which is handy if you want to search and monitor the results from a group of sites instead of just one. Gigablast includes “Giga Bits” in its search results, which may help give a few ideas for additional query words.

Monitoring News Search Engines

General search engines have the advantage of trying to index large quantities of the Web. They have the disadvantages of dealing with huge sets of information, dealing with unstructured Web pages, and indexing Web pages that haven’t been updated in months or years.

News search engines counterbalance these disadvantages. The sets of information they index are much smaller, they are pretty much all news, and most of the news search engines don’t keep their index for more than 30 days. If the topic you want to trap is in any way related to current events, news search engines are the way to go. Even if it isn’t (say you’re into those old woodworking tools), it’s worth it to set some traps at news search engines. You never know when a newspaper’s Lifestyle or Hobby section is going to interview some woodworker with the world’s largest old tools collection.

As you might imagine, several of the search engines offer their own news searches, but there are independent ones as well. We look at both in this section.

Yahoo News

Yahoo News (news.yahoo.com) looks somewhat overwhelming when you first get to the front page. There are lots of news links, multiple tabs to various types of news, and even pointers to several other sites on Yahoo. Don’t worry about any of that. What you want to do is search, and searching will cut through all this extraneous stuff. The Advanced Search page is located at search.news.yahoo.com/usns/ynsearch/categories/advanced/index.html.

Building your queries

When you’re using the Advanced Search page, set the results to sort by date instead of by relevance (Figure 7.9).

Figure 7.9. The Advanced Search form has many options. Make sure you’re using the one to sort by date.

The rest is up to you. Note that Yahoo divides the news into categories, which is one way to narrow your searches without having to figure out query words, and might be useful to those of you doing tricky searches. Say you’re trying to track news about certain pharmaceutical businesses. Using Yahoo News’ advanced search, you can search for the word pharmaceutical in the business category and get a start on a good query. From there you’d probably want to add the names of drugs, companies, and so on that you were trying to track. But the business category would ensure that you were getting more business-oriented results, and not science or medical results.


Hints

• Take advantage of the category offerings of Yahoo News.

• Use the location syntax. If the topic in which you’re interested has a geographical area, the location syntax can be a blessing. The Advanced Search page notes that you can search by state and by country, but in many cases you can search by city as well. Try entering a search term and then entering Austin in the location box. Notice that your search results are restricted to media in Austin, including television stations and newspapers. Nifty, huh? This is great for localized information trapping.


Trapping

For some reason news search engines have better trap mechanisms overall than Web search engines. Yahoo offers news alerts via the Yahoo Alerts system, which we talked about earlier, or via RSS feeds. RSS feeds are available on the right side of the front news page. Yahoo News also has topical RSS feeds, which you may find useful depending on how general your topic is. You can get a full list of those feeds at news.yahoo.com/rss.

Possibilities

• Yahoo News’ search indexes a lot of general news sources but puts them into categories. If possible, try developing a query that relies on the categories to do the heavy work and lets you search using more general keywords.

• Sometimes the location search lets you search for media sources in a single city. See if it’s possible to work that into your search.

• Don’t forget to take advantage of Yahoo News’ wonderful RSS feeds.

Google News

For a long time Google News (news.google.com) was my favorite news search engine. And while other competitors have entered the scene, it still has a lot to offer, with over 4,500 sources and both e-mail alerts and RSS feeds available.

Building your queries

Google News has an excellent advanced search available at news.google.com/advanced_news_search. Here you can narrow your search by location (sources available in a particular country, or state in the United States). You also can narrow your query word search to the headline of the story (great if you’ve got a keyword that just can’t be narrowed down).


Hint

Google News, as you’ve seen, has some great options for advanced searching. Take advantage of them. The title search is immeasurably useful when you’re having trouble narrowing down your query. In addition, I find the ability to narrow my search to sources in a particular state or country also useful. Later in the chapter we’re going to look at monitoring the news and issues of a country. Google News’ location syntax (as well as Yahoo News’ advanced search) is a big part of that.


Trapping

Google News offers e-mail updates via its Google Alerts product, as well as RSS and Atom feeds. At the end of every results page, check out the left side of the page. There, under the page navigation, are text links to RSS and Atom versions of your search.

Possibilities

Google claims to have 4,500 news sources in its index, and has claimed this for years. My guess would be that it uses a lot more. Try some of your more obscure queries on the wide range of content indexed here.

MSNBC

Unfortunately, the MSN news search (newsbot.msnbc.msn.com) does not offer an advanced search form or even much searching help. However, it does offer quick searching, RSS feeds, and a source list at newsbot.msnbc.msn.com/s/publishers.aspx so that you can determine what you’re actually searching through.

Building your queries

You’re not going to have a lot of choices with MSNBC and there’s not a lot of guidance for special syntax. So just use as many words as you can to narrow your query.


Hint

It’s challenging to really narrow your query without special syntax, but try. Go over the list of sources to make sure this is an engine you want to search.


Trapping

Despite the fact that the home page for MSN News Search is on MSNBC, the results page is on Live Search. As with Live Search, just add &format=rss to the end of your search results URL to get an RSS feed.

Possibilities

News search engines are quickly becoming an expected feature of Web search engines, but there are other specific engines too—ones that have indexes that go back further than 30 days and ones that index things that might be tough to find on the Web.

FindArticles

FindArticles (findarticles.com) is affiliated with the LookSmart search engine. But as a search engine, LookSmart isn’t all that popular nowadays, whereas FindArticles is. It indexes over 10 million articles, many of which FindArticles claims can’t be found on the regular Web. Some of the articles require payment, but FindArticles also has extensive free information.

Building your queries

Use the Advanced Search form. There are a couple of unusual options here. The most useful one for you is the ability to browse the publications list and select which publications you’re going to search, from 1UP to the St. James Encyclopedia of Pop Culture. You can search one periodical at a time or many (Figure 7.10).

Figure 7.10. This compact Advanced Search form lets you list a huge number of periodicals in your search.

Another useful feature is the Show Premium Content option at the bottom of the page. Premium content is content that will cost money to look at. I usually turn that off, although for academic, legal, and medical searches, you might want to consider leaving it on.


Hints

• It will take some time to scroll through all the available publications and pick out which ones you want, but this time investment is worth being able to search a very specific set of publications. If you don’t have the patience or can’t get enough information from the publication names to know if they are appropriate for your topic, see if you can use the category listings below the publication listing.

• You can restrict your results by how many pages the article has. If you’re trapping a topic area that might get you a lot of false positives—say, if you’re searching for some aspect of business and you keep finding announcements about promotions and business moves and such—you might want to set this to get results of over one page. That way you can avoid all the roundup articles of Joe Smith moving to this company and Jane Doe getting promoted to that CEO job.

• Be sure to set your results so that they sort from newest to oldest!


Trapping

FindArticles offers RSS feeds. At the bottom of the search results is the ubiquitous orange button and a link to an RSS feed.

Possibilities

• FindArticles makes it very easy to search at a detailed level by source and even by article size. Make the most of these options. Try doing general searches among a particular set of sources.

• Occasionally try searching for premium (paid) content and see what kind of information you find.

• Try a few very specific searches—FindArticles indexes lots of vertical-market publications.

HighBeam Library

HighBeam Library (highbeam.com/library/) is a paid service. You can search for free—and I recommend you do—so that you get a good sense of whether HighBeam is useful to you or not. If you want to access full articles, it is going to cost you. But there is enough here, it is inexpensive enough, and the trapping options are sufficiently broad that I think it’s worth it.

Building your queries

Again, go straight to the Advanced Search page. As with FindArticles, you can pick publications to search. Unfortunately, you can include or exclude only five publications at a time. Note that you can also include or exclude certain types of information, such as newspapers, images, books, transcripts, maps, and so on, rather than choosing from categories of information, such as health, business, sports, and the like (Figure 7.11).

Figure 7.11. HighBeam lets you search not only by specific articles, but also by publication type.

Image

Unless you’re researching a very unusual topic, you can safely leave out the maps. The almanacs and dictionaries are questionable as well.


Hint

Check out the Web search. HighBeam offers a Research Group option, which lets you set up a specific set of resources that you want to search. At first glance, the available resources don’t look like a lot, until you realize that you have the opportunity to choose aggregate sources like Google News, and very large sources like PR Newswire. Check the sources here and see if they’d be useful to you.


Trapping

HighBeam offers several opportunities for trapping searches. You can set up an RSS feed based on keyword searching (and actually you can set up RSS feeds for publications, which will cover new articles released in the publications you specify). You may also set up e-mail alerts based on keyword searches. And you can save your searches for later use and review them if you’ve got something that isn’t quite worth an alert, but you want to save it for later.

All this convenience isn’t free (although a free trial is available). If you wish to subscribe to HighBeam, it will cost you $19.95 for a monthly subscription or $99.95 a year.

Possibilities

• HighBeam’s advanced search is a wonderful thing. Take advantage, if you can, of extremely narrow queries like the author search: is there a noted journalist in your field that you could be monitoring?

• Try restricting your searches by media type. Running a more general search and restricting the results to, say, just images and transcripts could yield some very interesting results.

Hoovers

Hoovers (hoovers.com) has a well-deserved reputation as a premium service that provides business information, but what’s less known is that it has both a nice news search engine and a business search engine—and it offers e-mail alerts!

Hoovers is not cheap. Subscriptions start at $599 a year and go up to over $10,000 a year (discounts are offered to nonprofits, and periodically special offers are available). However, if you’re a business searcher you’ll find a lot to love about Hoovers.

Hoovers actually gives you the opportunity to do three types of information trapping: Saved Searches, Watch Lists, and e-mail alerts. Together, these tools give you the opportunity to get general news alerts and specific company information at the same time. Let’s take them one at a time.

Saved Searches

The Saved Search page lets you build several different types of saved searches, including company searches, such as companies within a certain radius of a certain place with a certain number of employees, and so on (Figure 7.12).

Figure 7.12. Want to do business monitoring within a specified area? The Saved Search is a “must try.”

Image

The nuts and bolts of all the different types of Saved Searches are a little outside the scope of this book, so I encourage you to investigate Hoovers’ offerings. Meanwhile, let’s take a look at the more prosaic trapping features.

Hoovers’ Saved Searches can build a custom news search that lets you search both by keyword and company stock ticker. You won’t have the option to choose between publications, but you can choose whether you want to search for news stories, press releases, or both.

You can also search for companies that plan to file, or are in the process of filing, for an IPO. You can search by underwriter, location (state or metro area), and industry. And the stock screener lets you specify a series of data points about publicly traded companies and receive listings of stocks that match those data points (again, this is a little outside the scope of this book, but worth investigating because this type of information isn’t easy to find on the Web).

Watch Lists

Hoovers’ Watch List feature is much simpler than Saved Searches. Enter a company name or a stock ticker, and you get the option to sign up for significant developments (major news concerning the company, provided daily), news alerts, and press releases (either delivered daily or as they’re available) (Figure 7.13).

Figure 7.13. Hoovers lets you combine stock symbols with keywords for e-mail alerts.

Image

In addition, there’s also the option to add a keyword to the watch list. Be sure to take advantage of this if your interest in a company is specific—for example, if you want to monitor Coke to see when it starts using Splenda in its diet drinks, enter Splenda in the keyword box.

E-mail alerts

The Watch Lists and Saved Searches have e-mail components, but they’re also saved to a My Hoovers page that aggregates all that information. On the other hand, e-mail alerts are just what they sound like: major e-mail. The e-mail alerts screen lets you choose e-mail alerts for stock ticker symbols (or even multiple symbols within the same story) or specified keywords. You can have news alerts delivered to you as they occur or in a daily digest (you specify the hour). As you might expect, the available news search slants toward the business and the general, but it’s fairly extensive.


Warning!

If you get Hoovers e-mail alerts sent to you “as they occur,” you could get a lot of e-mail!


Building your queries

Hoovers does not offer much in the line of fancy syntax, so stick with general keywords and phrases.


Hints

• Hoovers gives you an opportunity to do monitoring for stock tickers combined with keywords. Take advantage of this feature.

• If you’re interested in business, try some of Hoovers’ more abstract searching using the IPO targeting and company builder features.


Trapping

Hoovers does offer a lot of e-mail alerts, but some of its searches are relegated to the My Hoovers page. I would make as much use of the e-mail alerts as possible and remind myself to periodically visit the My Hoovers page.

Possibilities

Hoovers costs a lot of money, so if you don’t take advantage of as many features as possible, you’re wasting some serious cash. If you do any kind of business monitoring, you’ve got to make the most of this site. How could you best monitor using a combination of stock symbols and keywords? Are there any ways you could make really unusual searches, such as the ability to find a company within a specified radius, applicable to your search?

Northern Light

Northern Light (nlresearch.com) used to be both a Web search engine and a news search engine that offered free news alerts. The site has undergone a couple of incarnations since that time and has now become a paid service, but it’s worth the $9.95 a month, especially if you’re looking for news in periodicals that cater to a particular industry. There is a 30-day trial period available, and searching the available libraries is free. Accessing full-text articles will cost you, however.

Building your queries

Visit Northern Light’s Search Help page at nlresearch.com/help.php. Northern Light offers a wide variety of advanced searching, including the ability to use stemming in your searches. Stemming involves searching for a string of letters with a special character at the end—usually an asterisk. A search engine that allows stemming will find all versions of that string of letters. For example, searching for moon* would find moon, moons, moonlight, moonbeam, and so on.
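To make the trailing-asterisk idea concrete, here is a minimal Python sketch of the matching behavior described above. It is an illustration only, not Northern Light’s actual implementation; the function name and the word list are invented for the example.

```python
def matches_stem(query: str, word: str) -> bool:
    """Return True if `word` matches `query`; a trailing * means 'any ending'."""
    query = query.lower()
    word = word.lower()
    if query.endswith("*"):
        # Drop the asterisk and accept any word that begins with the rest.
        return word.startswith(query[:-1])
    # Without the asterisk, only an exact match counts.
    return word == query


# Hypothetical word list, just to show which entries a moon* search would catch.
words = ["moon", "moons", "moonlight", "moonbeam", "monsoon", "noon"]
print([w for w in words if matches_stem("moon*", w)])
# prints ['moon', 'moons', 'moonlight', 'moonbeam']
```

Note that the same openness that makes the asterisk useful can also pull in words you did not intend, so it is worth previewing results before saving an alert.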


Hints

• Take advantage of Northern Light’s ability to narrow the searches by industry (see the trapping notes that follow).

• If you’re interested in monitoring a particular industry or business within an industry but you’re having a hard time generating the perfect query, Northern Light’s search alerts menu is a blessing.


Trapping

Once you’re registered and logged in to Northern Light, you’ll see several tabs. Click My Alerts, and a page displays that has a section labeled My Search Alerts. From there, you’ll see a Create New Alert button. When you click that button, a form displays for creating a narrowly focused alert (Figure 7.14).

Figure 7.14. A lot to fill out, but you can really narrow your queries.

Image

You can limit your search by business function (sales and marketing, human resource management, and so on). You can also limit your search by Websites, News, and News Archives, but I recommend keeping the search focused on Publications, which finds everything. Use the Preview Results button to check and make sure you’re getting a good number of results—not too many, and not zero. (Use the radio button on the query page to ensure that you’re getting results sorted by date and not relevance, so you can easily see how many results for your search have been generated in the last few months.)

When you’ve adjusted the query and the publications so you’re getting a useful number of results, click the Save Alert button. You’ll be kicked back to the original Search Alerts page, only this time there is a saved alert available. You can review any results since you last looked at the alert, as well as edit the alert (change the search parameters) or delete it.

One caveat: when Northern Light sends you e-mail alerts, it gives you only a link back to the content, not a summary of the content in the articles found. Furthermore, the information is sent in an HTML e-mail. If you use an e-mail account that is not HTML-capable, then you won’t be able to link back to Northern Light’s alerts.

Possibilities

We’ve looked at search engines that break out searches by publication, and another engine that breaks out searches by kind of content. Northern Light breaks down its sources by kind of research. See if you can use specific types of research when building your research queries.

News searching is the cornerstone of information trapping. When you’re trapping at news search engines, you’re referring to a constantly updating pool of information that is, on the whole, far more credible than the entire Web. But the downside is that you’re missing things like personal commentary and information and commentary on topics that have not attracted the notice of the media. For these types of searches, you need to be monitoring blogs.

Searching Blogs

Blogs don’t have the credibility of established media, but as a resource for commentary, discussion, unusual perspectives on current events, and coverage of topics that might not have hit the radar of the media, they are invaluable. True, you will have to turn your—ahem—bullpuckey detector way up. On the other hand, you’ll find resources and perspectives that you won’t find in major media in a month of Sundays.

Feedster

We looked at Feedster (feedster.com) earlier in the book. It’s great for finding RSS feeds, and blog commentary as well!

Building your queries

Feedster indexes only RSS feeds, which means that its advanced search should be hideously complicated and let you do all kinds of strange and very detailed searching. But it doesn’t. It’s a very basic keyword search with the option to find links, entries from a particular feed, etc.

That’s not to say that there aren’t a couple of special syntaxes available. You can search for words in a title using a title syntax like this: title:keyword. You can search for one word or another using or. Remember that you’re searching blog commentary, so you might want to use more casual language or even try introducing a few misspellings into your queries if your search would seem to warrant it. For example, if you’re doing searches for layman commentary on medical issues, you might use less formal terms.


Hints

• Don’t be afraid to do a lot of different queries. Searching commentary isn’t like searching news stories.

• Try using slang and other informal language when doing your search.


Trapping

Feedster offers two ways to keep up with new search results. The first one is via RSS feed, which is the one I use. You’re also supposed to be able to get new results via e-mail alerts, but I’ve never had good luck with that. I set up the alerts, but I’ve never actually received any e-mails. I recommend sticking to the RSS feeds.

Possibilities

Unfortunately, Feedster and other search engines tend to get overwhelmed with results from splogs, or spam blogs. To avoid this, be sure to make your queries as specific as possible.

IceRocket blogs

IceRocket (icerocket.com) offers a variety of different searches that I encourage you to experiment with. However, we’ll focus on blog searching.

Building your queries

IceRocket doesn’t offer a bunch of special syntax, so focus on keywords. Take a look at the Advanced Search page to see if you can take advantage of searching by post title or tag (which I talk about in future chapters).


Hint

When you’re running some experimental queries, pay attention to IceRocket’s Search Results page. It shows you what “tags” are used in the posts that you’re finding, and those tags may in turn help you build better and more detailed searches.


Trapping

Check out the right side of the page for RSS feeds of your search.

Possibilities

Although IceRocket’s search offerings are rather thin, I like what it offers after the search results. Click the Trend It button at the top of the search results to see how popular your query term has been in blog posts (Figure 7.15).

Figure 7.15. IceRocket lets you view the popularity of your query term.

Image

This is a great way to find more popular query words that you can use with your topic. You may end up using this more than IceRocket’s blog search itself!

Blogdigger

Blogdigger (blogdigger.com) isn’t the best-known blog search engine in the world, but it’s got a solid pool of blogs that it searches and offers a couple of interesting features.

Building your queries

Like IceRocket, Blogdigger doesn’t offer much in the way of special syntax. Focus on keyword searches.


Hints

• In addition to a general blog search, Blogdigger offers Blogdigger Local, which allows you to narrow your search to a city and state. I find this works better when you search for a brand name or a proper name instead of a general search (if searching Colorado, using “Denver Post” works better than using “newspaper”). If you’re running a local business, this is a terrific way to get blogging feedback on your industry in your area.

• Blogdigger also offers Blogdigger Groups, which allows you to combine several blogs together (as long as they have RSS feeds). This is a handy way to review several blogs at once via an RSS feed. My only recommendation is that you make sure the blogs are not overly active, or you’ll have a hard time keeping up with them.


Trapping

Like IceRocket, the RSS feed for keyword searches is on the right side of the Search Results page. There’s a link for an Atom feed there, too.

Possibilities

• Blogdigger has low volumes of results, but also very low volumes of blog spam.

• Feel free to use more general queries here.

• Is there any way that your searching could take advantage of the Blogdigger Local feature?

Google Blog Search

With blogs being so hot, it’s no surprise that Google has picked up on the interest and made available its own blog search (google.com/blogsearch). And though in some respects it has a ways to go before it catches up with established blog search engines like Technorati, it’s still a useful addition to your monitoring toolkit.

Building your queries

You can build your queries using Google’s advanced blog search at google.com/blogsearch/advanced_blog_search. If you’re a frequent user of Google, this should look rather familiar; a lot of Google’s Web search technology transfers to the blog search. However, this doesn’t mean that you should use the blog search the same way you use a Web search. URL searches, for example, aren’t going to be very useful. Instead, try narrowing your query by limiting words to a blog post’s title, or by the language of the blog.


Hints

• Google’s blog search, for some reason, defaults to providing results in order of relevance. Use the link on the right side of the page to get the results by date.

• Use the “References” link with each search result to see other blog posts that link to the one in the search result.


Trapping

Google’s blog search makes trapping easy. At the bottom of each search result page are four links. Two are for getting the results as an Atom feed (10 or 100 results) and two of them are for getting the results as an RSS feed (again, 10 or 100 results). I recommend you stick with the ten results. If you’ve generated a search that provides 100 results at a time, your query is probably not narrow enough!

Possibilities

If you’ve used Google for any length of time, the syntax of Google’s blog search will be very familiar to you. Take advantage of that and go for building more complex queries.

Sphere

Sphere (sphere.com) is a newer blog search that’s got a very nice set of search results.

Building your queries

At first blush, Sphere doesn’t seem to have much in the line of special syntax, but if you check out the Tips page at sphere.com/tips, you’ll see a variety of available syntax, including page title, blog names, and domain name (Figure 7.16).

Figure 7.16. Sphere does not offer a lot of special syntax, but has an excellent Tips page.

Image


Hint

If you’re hunting more for blogs than for individual entries, try the Featured Blogs offering, which suggests blogs based on your keyword search. Sphere also offers blog profiles, which provide information about the activity level and content of each blog in the search result.


Trapping

Look for the orange RSS button at the top of the search results page.

Possibilities

• If you’re having a hard time finding blogs that match your topic, use Featured Blogs in conjunction with a couple of general keyword searches.

• If you are thinking about tracking a single blog and don’t know if it’s worthwhile, use the Blog Profiles feature to see how active the blog is and how much data you could potentially get from it.

Managing Keyword-Searchable RSS Feeds

I spend a lot of time in this book talking about RSS feeds, and hopefully at this point you’re as enthusiastic about them as I am. Okay, that may be a little too much to ask. But hopefully you find them as interesting as I do.

RSS feeds are one thing. But the next level beyond that—the next idea—is keyword-based feeds. Keyword-based feeds, as you might remember, are feeds based on searches on the query words that you specify. So instead of very general feeds—all the national news from CNN, for example—you can get feeds for just the town or the mayor of the town, or even the mayor’s favorite hobby.

Having RSS feeds that are very specifically focused on just the topics in which you’re interested will, as you might imagine, save you a lot of time when you’re checking your information traps. On the other hand, when you decide that you want to generate and use many keyword-based RSS feeds, you run into another problem: how to efficiently generate keyword-based RSS feeds. The problem is that you end up having to do so many of them.

Take Google News, for example. You might go to Google News and decide that you want to get three of its general RSS feeds: one for technology, one for business, and one for science. Compare that to getting keyword-based RSS feeds. You might decide that there are fifteen keyword-based RSS feeds that would be relevant and useful to your topic. So you’d have to run each of those fifteen searches and save them as an RSS feed. Now multiply those fifteen searches by every resource that offers keyword-based RSS feeds, and you’d spend an astonishing amount of time just setting up the feeds. I have attempted to solve that problem with a tool called Kebberfegg.
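The multiplication described above is easy to see in a short Python sketch. The URL templates below are hypothetical placeholders on example domains, not the patterns any real service uses; the point is only that every query has to be paired with every source.

```python
from itertools import product
from urllib.parse import quote_plus

# Hypothetical feed URL templates; each real service uses its own pattern.
SOURCE_TEMPLATES = {
    "news": "https://news.example.com/rss?q={query}",
    "blogs": "https://blogs.example.org/feed?search={query}",
    "web": "https://search.example.net/results.rss?terms={query}",
}


def build_feed_urls(queries, templates=SOURCE_TEMPLATES):
    """Return one (source, query, url) triple per query/source combination."""
    feeds = []
    for query, (source, template) in product(queries, templates.items()):
        # quote_plus makes the query safe to place in a URL.
        feeds.append((source, query, template.format(query=quote_plus(query))))
    return feeds


queries = ["information trapping", "rss feeds"]
for source, query, url in build_feed_urls(queries):
    print(source, url)
print(len(build_feed_urls(queries)), "feeds to set up")
```

Two queries across three sources already means six feeds; fifteen queries across a few dozen sources runs into the hundreds, which is why doing this by hand gets old quickly.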

Using Kebberfegg

Kebberfegg, which you can find at kebberfegg.com, is a free service that attempts to make it very easy to set up keyword-based RSS feeds across many resources—over 3 dozen as of this writing (Figure 7.17). You can view your generated feeds as a plain HTML list, or you may get them as an OPML file that you can import into an RSS feed reader. (Think of an OPML file as a bookmark file for RSS feed readers.)

Figure 7.17. Kebberfegg can quickly generate lots and lots of RSS feeds.

Image

Let’s take a look at how it works. It’s really easy:

1. Enter in the query box the words you want to search. Then notice that beneath the query box is the category list.

2. Generate a keyword feed list for one category, or select multiple categories by holding down the Ctrl key while you click. (You could also select all categories by pressing Ctrl + A, but I don’t recommend it; I’ll tell you why in a minute.)

3. Notice that beneath the category selection, the keyword feed sources from each category are displayed. You also have the option to receive the results in HTML or OPML. Choose HTML for now.

When the query is finished, you get a result that looks like Figure 7.18.

Figure 7.18. Kebberfegg gives you a list of keyword-based feeds in several different formats.

Image

4. For each result, you can add the feed to My Yahoo, forward it to your e-mail via RSSFwd, or choose from a couple of other options. You can even just look at the plain RSS feed!

Using Kebberfegg generates a lot of RSS feeds. You may decide you don’t want as many, or you may decide that one RSS feed provides better results than the other ones you’ve generated. Because of that I recommend you not just dump the feeds into your RSS feed reader and prune them later. Instead, use the Add to My Yahoo button on each feed to get a preview of what the feed will look like. (You don’t have to have a Yahoo account to do this.) This will give you a page with the feeds’ last five headlines. A quick glance at those is often enough to let you know whether you want to monitor that feed or not.
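If you would rather preview feeds with a script, the same "glance at the last five headlines" idea can be sketched in Python. This is only a sketch under stated assumptions: it handles plain RSS 2.0 (Atom and RSS 1.0 feeds use XML namespaces and would need extra handling), and the sample feed is made up so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET


def latest_headlines(rss_text: str, limit: int = 5) -> list:
    """Return the titles of the first `limit` items in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    titles = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        if title:
            titles.append(title)
        if len(titles) == limit:
            break
    return titles


# Made-up feed content standing in for a real keyword-based feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample keyword feed</title>
    <item><title>First headline</title></item>
    <item><title>Second headline</title></item>
  </channel>
</rss>"""

print(latest_headlines(SAMPLE_FEED))
```

An empty list here is the scripted equivalent of a preview with no results: a sign that the query may be too narrow for that source.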


Tip

Sometimes the My Yahoo preview doesn’t work, especially when there are no results for your keyword search. In that case, I would set the ones aside that look interesting but don’t work in Yahoo, and preview them one-by-one in an RSS-capable browser like Firefox.


Using OPML results

Instead of generating HTML files, advanced users may want to check out the OPML results, which generate a set of RSS feeds that you can import into an RSS feed reader. Getting your results this way will generate a list that you can save to your computer and then import into your feed reader. But the same warning applies: import these but look them over quickly before you integrate them into your monitoring routine. You’ll save yourself a lot of time in the long run by taking a few minutes now.
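To show what an OPML file amounts to, here is a minimal Python sketch that writes one from a list of feeds. The feed names and URLs are placeholders, and the attribute choices (type, text, xmlUrl) follow common OPML subscription-list practice rather than anything specific to Kebberfegg’s own output.

```python
import xml.etree.ElementTree as ET


def feeds_to_opml(title, feeds):
    """Build an OPML document (as a string) from (name, feed_url) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        # One outline element per feed, much like one bookmark per site.
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


# Placeholder feeds on example domains.
feeds = [
    ("Example news search", "https://news.example.com/rss?q=woodworking"),
    ("Example blog search", "https://blogs.example.org/feed?search=woodworking"),
]
print(feeds_to_opml("My keyword feeds", feeds))
```

Because the file is just a list of outlines, it is easy to open in a text editor and delete the feeds you decided against before importing the rest.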

Breaking up your searches

Kebberfegg makes it really easy to generate keyword-based feeds: if you run three searches generating feeds from all categories, you’ve generated 100 feeds! Because of the very different sources from which you’re generating RSS feeds, you may want to break up your searches. For the tag site searches, you may want to get more general, since you’re searching only keywords. In fact, you may want to generate your keyword-based RSS feeds for tag sites separately from the other available keyword-based feeds, since you won’t feel the need to compromise between the narrow feeds made for news search engines, general search engines, and blog search, and the general feeds for tags and other few-keyword searches.

Kebberfegg can generate a lot of feeds for you, and save you a lot of time, but you’ve got to remember the rules: generate queries that are as narrow as possible considering what you’re searching, and make sure you review the feeds before you dump them all into your feed reader.

A Few More Sources?

So far we’ve looked at trapping on general search engines, news search engines, and blog search engines, and we’ve taken a look at a tool that allows you to generate several keyword-based feeds at one time. It’s good to have Web search engines in reserve to filter for and find the minutiae on your topics of interest. News search engines will keep you updated on recent happenings, and the blog search engines will point you toward commentary and that under-the-media-radar material.

Yet you’ve barely scratched the surface. If you use these three types of searches as the foundation of your trapping, you’ll be off to a good start. But if you use them to the exclusion of anything else, you’ll have a problem. There are many more types of search engines that could be monitored, but in this chapter we look at two more: commercial and governmental. We then look at finding other search engines that match your interests.


Tip

Because the rules for searching them are so different, we look at two other kinds of trapping—conversations and tags—in Chapter 9. In Chapter 8, I discuss the idea of trapping multimedia.


Trapping Commercial Information

Why would you want to find commercial information? Maybe there’s something you want to buy. Maybe you want to monitor Amazon for a certain book. Maybe you want to see how well your industry’s items sell on eBay. Maybe you want to know when prices are going up or down. I know some researchers disdain monitoring retail and commercial sites for information, saying, “Those sites are only about prices.” But thanks to supply and demand and capitalism, prices can give you an idea of how popular something is (or isn’t), and if demand for things is trending up or down. And besides, don’t you really want to know when the next Terry Pratchett book is coming out?

Amazon

It might seem weird to start with Amazon (amazon.com) since it’s an actual store, not a price aggregation or comparison site. But Amazon is huge, it sells practically everything, and it’s popular enough that its customers can in some ways reflect the tastes of the Internet as a whole, or at least the Internet in whichever country a given Amazon site serves (Amazon has several versions for several different countries).

Searching Amazon

Amazon has the usual single query box at the top of its pages, but that isn’t all Amazon offers by a long shot. Many of its categories of information have their own advanced search engines. Let’s look at books, since that’s what Amazon is known for. Take a look at the Advanced Search page (the direct URL is amazon.com/exec/obidos/ats-query-page/) and you can see that you can search for a huge variety of features, including ISBN, keyword, format, reader age, and so on. Other categories offer feature searches more relevant to their formats. The advanced search for DVDs is amazing (Figure 7.19)!

Figure 7.19. Amazon’s Advanced DVD Search offers a huge number of options.

Image

You won’t get these kinds of search options when you’re actually building your searches, but I recommend you use them when you’re making your test queries, if any of them are at all relevant to your search. They’ll bring to light books (or DVDs or electronics or CDs or whatever) that you might not have thought of, and might help you generate more query words. Asking you to use the wonderful advanced search options that Amazon gives and then telling you to go back and use the more limited alert services sounds counterintuitive, I know. But I want you to at least see what the possibilities are for your topic in the category you’re searching before you have to limit yourself with the trapping tools.

Third-party Amazon RSS

YayWasTaken.com offers a generator for Amazon keyword-based feeds at yaywastaken.com/amazon/. Because the feeds are keyword-based, they’re more extensive than the ones offered by Amazon. It also offers images, descriptions, and more information than Amazon’s feeds. Using the service is simple. Start by entering this URL:

yaywastaken.com/amazon/amazon-rss.asp?keywords=keyword

Replace keyword at the end of the URL with whatever keywords you’re interested in. Aside from that, the usual rules apply: get really specific and run some test queries (on Amazon) first.
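If you plan to generate several of these feed URLs, a few lines of Python can fill in the keyword for you. This is a sketch built on the URL pattern shown above; the http:// prefix and the use of plus signs for spaces are assumptions on my part, so check one generated URL by hand before relying on it.

```python
from urllib.parse import quote_plus

# URL pattern from the text; the http:// scheme is an assumption.
BASE_URL = "http://yaywastaken.com/amazon/amazon-rss.asp"


def amazon_feed_url(keywords: str) -> str:
    """Fill the keywords parameter of the feed URL pattern."""
    # quote_plus encodes spaces as + (assumed to be acceptable to the service).
    return f"{BASE_URL}?keywords={quote_plus(keywords)}"


print(amazon_feed_url("woodworking jigs"))
# prints http://yaywastaken.com/amazon/amazon-rss.asp?keywords=woodworking+jigs
```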

If you don’t want to hack URLs but instead want to fill out a form and get a feed, try Paul Bausch’s Amazon RSS feed generator at onfocus.com/amafeed/. Specify the keyword and department in which you’re interested, and how you want to sort, and the generator will create a feed for you. You can also generate a feed based on Amazon’s power search syntax, if you’re familiar with it.

Page monitoring at Amazon

Maybe you’re really interested in a keyword-based search, or maybe you’re interested in monitoring an entire category. I feel for you! You have one more option, though it may not work that well for you. You have to use a page monitor. You can also use a page monitor if you don’t want to use RSS feeds. It’s just a bit tricky.

Run the keyword search of your choice. To the left of the list of results, you should see a list of categories showing which ones have results that match your search. Pick the category of your choice.

Say, for example, I want to monitor for software related to woodworking. I run a keyword search and pick “software.” There are only a few results and they all fit on one page, so I can just put this URL into a page monitor and I’ll be set (Figure 7.20).

Figure 7.20. An unusual keyword plus a smaller category equals easier monitoring.

Image

But say I wanted to monitor Home and Garden, which has well over 100 results. The first thing I would do is take a quick look-see at the results and see if there’s any keyword I can use to remove results en masse. I notice that there are several parts available from Woodstock International, and they’re really not what I’m looking for.

I change my search to woodworking -woodstock. That instantly reduces my search results to a manageable one page. Beyond that, if I chose, I could narrow my results further by choosing a subcategory (kitchen, home and decor), eliminating more query words, etc.

So combine narrow query words with exclusions for brand names and other words that describe what you don’t want, and try to get your results down to one page. Then put that page in a page monitor. It isn’t perfect, but it’s a possibility. Use Amazon’s alert offerings whenever you can, or carefully put together an RSS feed via a third-party offering.
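Page monitors differ in how they detect changes. One simple approach, sketched here in Python as an illustration rather than a description of any particular product, is to store a fingerprint of the page text and compare it on the next visit. The sample page text is invented, and the fetching step is left out so the example runs on its own.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, page_text: str) -> bool:
    """Compare the stored fingerprint with the page as it looks now."""
    return fingerprint(page_text) != previous_fingerprint


# Invented page text standing in for two visits to the same results page.
yesterday = "Results for woodworking -woodstock: Item A, Item B"
today = "Results for woodworking -woodstock: Item A, Item B, Item C"

saved = fingerprint(yesterday)
print(has_changed(saved, yesterday))  # False
print(has_changed(saved, today))      # True
```

This also hints at why a single page of results matters: anything on the page that changes between visits, such as rotating recommendations, may register as a change even when your results have not moved.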

eBay

I know I said that Amazon offers everything, but it’s nothing like eBay. eBay, an online auction site, not only offers many commercially available items, but items you may not have been aware were for sale, like advertising space on someone’s forehead, ghosts, religious icons found in slices of toast, and more. There are several ways to monitor eBay, from services that the site itself offers to third-party services. To use the services that eBay offers, it’s best that you have an account.

Searching eBay

Just like Amazon, you start with eBay’s advanced search. There are two major differences, however: eBay’s advanced search covers its entire site, and you can use the advanced search to build your traps.

I love eBay’s advanced search. It’s very different from any advanced search I’ve ever seen. You have some amazing search options here, including searching for items within a certain distance from a specified zip code or major city, searching for maximum and minimum prices, and searching for items that are listed with specific currencies (Figure 7.21). There are also more prosaic ways to search, like by category.

Figure 7.21. eBay’s advanced search even offers the ability to search by location.

Image

How you should use this search depends on what your goals are. First go through and narrow the search as much as you can. Start with the categories, then move to where the item is located. If it’s suitable, use the feature that lets you narrow your search to a specified radius. It can narrow your results tremendously. Run a few test queries and see what kind of results you’re getting.

Then you have a decision to make. Are you monitoring these items with the intention of possibly purchasing something in the future? Do you just want to get an idea of what ends up listed online? Then you’re all set.


Tip

If you’re a retailer looking for wholesale lots on eBay, you may have noticed that sometimes the wholesale bidding gets very fast and very furious. I find that setting a trap for lots that have a “Buy it Now” price can give me a good heads-up on sets of items that can be bought immediately instead of having to go through a bidding war.


Maybe you’re a seller instead. Maybe you’re not interested in when items get listed, but instead want to know how much items are selling for, so that you know how to price your own. Or maybe you’re planning to buy something expensive and want a sense of what a fair bid would be.

In that case, go to the top of the Advanced Search form and tick the Completed Listings Only box, which is right under the query box. You receive an alert only when a completed listing for your keywords shows up in your search results. Then you know whether an item sold or not, what it sold for, how many bids it received, and so on.

Once you’ve generated a query, review the results. If they look good (useful to your topic, not too many of them) choose Add to Favorite Searches at the top of the page. You’re asked to name your new search. When you’ve chosen all that, you’re set. eBay will e-mail you whenever a new auction (or a completed auction, depending on your settings) matches your specified queries, for as long as you have indicated.

I love eBay’s ability to turn the advanced search into an e-mail alert. But there are other ways to track eBay as well. If you’re interested in RSS feeds, you’ll be pleased to know there’s a third-party service that generates eBay RSS feeds for you, in addition to eBay’s RSS feed offerings.

eBay RSS feed offerings

For a while, eBay has offered RSS feeds of general information about its site (announcements and so forth). But now it’s offering RSS feeds of its search results. Do a query, and then look at the bottom of the results page for a line that’s marked Tools. You’ll see a little orange RSS feed icon on that line. That’s it! You’re all set.
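Your feed reader does the checking for you, but the idea behind it is simple enough to sketch. The Python below is only an illustration under a few assumptions: the sample feed is made up (it is not eBay’s actual output), it assumes a plain RSS 2.0 layout, and the name new_items is my own invention for the example.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document standing in for a search-results feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example search feed</title>
    <item><title>Vintage camera lot</title><link>http://example.com/1</link></item>
    <item><title>Brass telescope</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, link) pairs for items whose link is not in seen_links."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            fresh.append((title, link))
    return fresh


# Pretend the first listing was already seen on an earlier check.
seen = {"http://example.com/1"}
for title, link in new_items(SAMPLE_FEED, seen):
    print(title, link)
```

Keeping a set of links you have already seen is what turns a feed into a trap: only listings that are new since the last check get through.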

eBay third-party services

RSS Auction, at www.rssauction.com, actually offers two levels of RSS feed generation. The first one looks an awful lot like eBay’s advanced search page! Fill that one out and specify how long you want the feed to be active (from 1 to 12 months) and RSS Auction will generate a feed for you. The second one will generate a feed of items being offered by one particular seller. With both searches you can have the RSS feed e-mailed to you (though I don’t know why you’d do that; eBay’s e-mail alerts are just fine). RSS Auction also sends out alerts for searches of Buy.com and Half.com.

Shopping and price researching can be fun, but let’s shift gears. Let’s shift gears a lot. Let’s go from things you might want to pay for to things you’ve already paid for via your taxes: government services. While government sites are not as advanced as some sections of the Internet in offering monitoring services and RSS feeds, they’re getting better all the time.

Trapping Government Information

One thing I like about tracking government information is that it’s so practical; by setting a few information traps you can become a more informed citizen and learn things that help you take advantage of government services in your community. Let’s start with the smallest common unit of government sites, the city site, and then expand to state sites. We’ll then look at national services and do a quick overview of international government sites.

City sites

Maybe you know the Web site for your city. In case you don’t, let me teach you a couple of tricks. Take the name of your city and plug it into a full-text search engine as part of the phrase "city of". Every time I try this trick it pops the official city site up to the top of the results. If you’re not sure your results represent the official site, look for one of three things: a URL ending in .us, a URL ending in .gov, or a description along the lines of “official site of the city of.” If you’re getting official sites, but not in the state for which you’re searching, add the state name as a keyword ("city of Springfield" Missouri).
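If you find yourself doing this for a lot of cities, the trick is easy to write down. Here is a minimal Python sketch of it; the function names are hypothetical, and I’m assuming that “a URL ending in .us or .gov” means the host name ends that way. Treat the check as a rough hint, not proof that a site is official.

```python
from urllib.parse import urlparse


def city_query(city, state=None):
    """Build the "city of ..." phrase query, optionally with a state keyword."""
    query = f'"city of {city}"'
    if state:
        query += f" {state}"
    return query


def looks_official(url, description=""):
    """Apply the three rough checks from the text.

    The URL needs a scheme (http://...) for the host name to be found.
    """
    host = urlparse(url).hostname or ""
    return (
        host.endswith(".us")
        or host.endswith(".gov")
        or "official site of the city of" in description.lower()
    )


print(city_query("Springfield", "Missouri"))
print(looks_official("http://www.example.gov/"))
```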


Tip

Sometimes you’ll be interested in a city that’s fairly small. It may be so small it really doesn’t have a Web site, or there’s nothing much there. In this case, you may want to visit the state’s Web site. Often states have sections that contain information on their various municipalities.


What are you looking for?

Offerings at various city Web sites vary a lot. Generally the larger the city is, the greater the offerings are, but any city site that’s active at all generally has something. It’s important to be very specific about what you’re looking for.

Depending on your topic, you might look for events calendars, press releases, city council meeting notes, adverse weather announcements, warnings of scams and other crimes, or announcements of changes to city services. That’s not to say there isn’t much, much more at the average city site, such as online services to provide permits and such, searchable databases of city codes, budgets, contact lists, and so on.

Trapping the information

How deeply you want to trap on a city site depends on what you want to use it for. Want to be a better-informed, more active citizen? Then look for the What’s New page and put a page monitor on it if there’s no option for e-mail alerts or RSS feeds. Want to keep close tabs on what the city council is doing and the business of the city? Watch the press release pages and monitor the page that has the city council meeting notes.

City sites vary on how many built-in trapping mechanisms they offer. Sometimes they offer a newsletter that covers the entire site. Sometimes they offer e-mail alerts of press releases and alert type information (weather issues, etc). It’s rarer that RSS feeds are offered, but more and more feeds are being added to city sites all the time. For the most part, though, I suspect you’ll have to rely on your own page monitors. State Web sites have far more information, and from what I can tell are better prepared with alert services.
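If you’re curious what a page monitor is doing behind the scenes, the core idea fits in a few lines. This Python sketch is my own simplification, not how any particular monitoring tool works: it boils a page down to a fingerprint and compares fingerprints between visits. The sample pages and function names are made up for the example.

```python
import hashlib


def fingerprint(page_text):
    """Reduce a page to a short digest, ignoring differences in whitespace."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(old_fingerprint, page_text):
    """True when the page no longer matches the stored fingerprint."""
    return fingerprint(page_text) != old_fingerprint


# Two made-up snapshots of a city's What's New page.
yesterday = "<h1>What's New</h1><p>Council meets Tuesday.</p>"
today = "<h1>What's New</h1><p>Council meets Tuesday.</p><p>Road closure on Main St.</p>"

stored = fingerprint(yesterday)
print(has_changed(stored, yesterday))  # False: nothing new
print(has_changed(stored, today))      # True: the page was updated
```

A real monitor also has to fetch the page on a schedule and tell you what changed, which is why I’d still lean on an existing tool for day-to-day trapping.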

State sites

Finding a state Web site is simple. Just enter the URL state.xx.us in your browser, in which xx is the two-letter postal abbreviation of a state (NY, PA, SC, FL, and so on). Note that this does not work for DC (the address for the District of Columbia is www.dc.gov) or Puerto Rico. Sometimes this URL will redirect to a new one (if you start at state.hi.us, you’re taken to Hawaii.gov), but either way you end up in the right place. As you would imagine, the state sites offer a whole lot more in the way of services than city sites. They’ve got lots of services, lots of database lookups, and lots of communications from election officials.
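The state.xx.us pattern is regular enough to generate. The following Python sketch simply encodes the rule and the two exceptions described above; the function name and the SPECIAL_CASES table are hypothetical, and it makes no attempt to confirm that a given address still resolves.

```python
# The one alternate address named in the text; everything else follows the pattern.
SPECIAL_CASES = {"DC": "www.dc.gov"}


def state_site(postal_abbreviation):
    """Return the state.xx.us address for a two-letter postal abbreviation."""
    code = postal_abbreviation.strip().upper()
    if code in SPECIAL_CASES:
        return SPECIAL_CASES[code]
    if code == "PR":
        # The pattern does not work for Puerto Rico, and no alternate is given.
        raise ValueError("the state.xx.us pattern does not work for Puerto Rico")
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"not a two-letter postal abbreviation: {postal_abbreviation!r}")
    return f"state.{code.lower()}.us"


for abbreviation in ("NY", "hi", "DC"):
    print(abbreviation, state_site(abbreviation))
```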

Let’s use as an example the state of Wisconsin. It’s located at wisconsin.gov.


Tip

If you’re getting stuck finding state information, try govengine.com. This site divides out state government data into easy-to-parse lists of information. There are also some countrywide, but state-focused, resources here as well.


As you might imagine, a state’s Web site can be overwhelming. Not only does it have to encompass a state’s government, but also tourism issues, business considerations, and non-governmental services and opportunities for citizens (like job banks). Because of this, it’s easy to go to a government Web site and have your brain lock up!

Instead, let your eye wander over the front page. Some government sites have their information broken out by audience—businesses, visitors, residents, and so on. Others delineate their site by services. And still others go for recent news and pointers to featured offerings.

In the case of the state of Wisconsin, it’s a mix. The front page has news and updates in the middle, services and information divided into several broad categories on the left, and quick links to presumably popular products and services on the right. For you, the trapper, there are a couple of specific things to look for.


Tip

Sometimes it’s easier to find these things in a text-only site instead of a page that is heavy with graphics. Because of the laws regarding access for the disabled, most government sites have text-only versions. Look for a prominent “text only” link toward the top of the page.


The first thing to look for is some kind of press release page. Many times, press releases for all state agencies and offerings are jammed together. Generally less useful is a What’s New page, which in my experience focuses more on the state’s Web site than on offline services. Sometimes, and more and more frequently, state sites will offer RSS feeds for their Web sites (some even offer e-mail alerts; try to take advantage of that).

The second thing to look for is some kind of membership offering. Some government Web sites offer free registration so that you can create your own My State page, which allows you to specify what you’re interested in and get everything on a customized page. Sometimes that’s it—no e-mail alerts or RSS feeds—but a customized page is easier to monitor and provides more information.

In a few cases, depending on what state you’re looking at and what you’re looking for, you may find that nothing really meets your needs for monitoring. In that case, you can either monitor the agency home page that most closely matches your interests or you can monitor the home page of the site itself. Neither is ideal, but either one is better than not looking at the site at all. You can also try searching the state Web site for your query words, and monitoring the query results, but search engine quality across state Web sites varies a lot. Be careful.

Federal sites

As you can see, the state Web sites are a leap in complexity compared to the city Web sites. So you might imagine that the national sites are another leap of complexity. And that is the case, but I think there’s been a longer, more consistent effort on the part of the U.S. government to develop its Web presence than there has been for the states as a whole.

There are many places from which you can monitor U.S. government information, but there’s one place in particular where I like to start looking.

FirstGov

FirstGov (firstgov.gov/) is an effort by the U.S. government to create a portal of easily accessible government information. And it’s done a pretty good job! Of course, it’s a tough task. There are tons of government agencies and what seems like infinite programs and departments in which you might be interested. Don’t despair! Start with the RSS Feed page, move to the A-Z Agency List and then try the search engine. Note that the search engine is not searching every agency Web page. It’s just searching an overview list of the types of agencies available. So keep your queries more general.

The RSS Feed page. Yes, the federal government does offer RSS feeds! And as you might expect it offers a lot of RSS feeds! This aggregation feed (firstgov.gov/Topics/Reference_Shelf/Libraries/Podcast_RSS.shtml) points you to the RSS feed pages of various agencies. If you don’t have very detailed, esoteric needs, you might be able to stop here, get a couple of government agency feeds, and be all set. But if that’s not the case, well, proceed to the agency list.

The A-Z Agency List (firstgov.gov/Agencies/Federal/All_Agencies/index.shtml). This page includes a comprehensive list of government agencies, from the 9/11 Commission to the Animal and Plant Health Inspection Service, and from the Office of Thrift Supervision to the Wyoming state, county, and city Web sites. It does include pointers to state resources, so this is another place to look if the state homepage is confusing you. If you know the name of the agency, you can go straight to it, but otherwise you can do a little browsing. Sometimes there won’t be an agency that covers exactly what you’re looking for. In this case, head to the full-text search engine for a little general searching.

FirstGov’s full-text search engine is available from a query box on the upper-right part of each page (Figure 7.22).

Figure 7.22. The A-Z Agency List with FirstGov’s ever-present search box on the upper right.

Image

After spending several chapters encouraging you to get more detailed in your searches, I’m going to back up and ask you to get more general.

Describe in a word or two what you want. If you’re looking for census data, type census. If you’re looking for information on Alaskans, type Alaskans. If you’re looking for information from a particular state, include the state name.

I’ve found that the underlying descriptions of the agencies are adequate, so just doing a general search usually locates the information I want. Even though FirstGov offers a huge amount of data, it’s still a very small data pool compared to the entire Web. You may have to do a little experimenting to find what you want, but I don’t think finding the agency or department relevant to your interests will take long.

Trapping department-level information

As you move between different department homepages, you’ll discover two things: the departments do not have a common design and they vary a lot in their trap-worthiness. Let’s look at the Department of Agriculture, for example (Figure 7.23).

Figure 7.23. The USDA Web site. Notice the news is located in the middle of the page.

Image

Visiting this front page you see no indication of RSS feeds or newsletters to which you may subscribe. What fresh information do you see? You see a pointer to current food recalls. This is a page you could monitor (or get the available alerts by e-mail). Returning to the front page, notice that there’s an Announcements and Events box on the front page, which would also benefit from a page monitor. Notice too that there’s a Newsroom link that provides you with updates on department events—another candidate for a page monitor.

On the other hand, the U.S. Maritime Administration site puts headlines, announcements, and program listings all on one page—the front page—which makes it a very rich source for a page monitor.

In general, you want to keep an eye out for virtual “press rooms,” events announcements, calendars, and anything that begins with the phrase “see our latest.” Some places also offer newsletters—those are generally for consumer-level information. If you’re looking for something higher-level or more esoteric, you’re going to be doing a lot of page monitoring. You will not be able to rely on the newsletters.
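When a department’s front page is crowded, it can help to pull out just the links whose text hints at fresh information. Here is a rough Python sketch of that idea. The hint list, the sample page, and the names LinkCollector and trap_candidates are all my own inventions for illustration, and simple substring matching like this will miss some links and flag some gunk.

```python
from html.parser import HTMLParser

# Phrases that tend to mark pages worth monitoring (an illustrative list).
TRAP_HINTS = ("press room", "press release", "news", "events",
              "calendar", "announcement", "see our latest")


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def trap_candidates(html):
    """Return links whose visible text contains one of the hint phrases."""
    parser = LinkCollector()
    parser.feed(html)
    return [(href, text) for href, text in parser.links
            if any(hint in text.lower() for hint in TRAP_HINTS)]


# A made-up fragment of a department front page.
SAMPLE_PAGE = (
    '<a href="/recalls">Food Recalls</a> '
    '<a href="/newsroom">Newsroom</a> '
    '<a href="/about">About Us</a> '
    '<a href="/events">Announcements and Events</a>'
)

for href, text in trap_candidates(SAMPLE_PAGE):
    print(href, text)
```

Notice that the made-up Food Recalls link slips past the hint list. That is the sort of page you only find by reading the site yourself, so use a scan like this as a starting point and not a replacement for looking.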

The next level of trapping government information is international sites. Most countries have Web sites, but the quality and scope vary even more than they do among city, state, and federal sites.

You have a secret weapon, though. I’ll show you what it is in a minute.

Monitoring International Sites

Alas, you can’t go to Google and search for "the country of X" and get the official country site. Often you get gunk. So you have to rely on a third-party site that aggregates official country sites. Try the CIA Factbook, at cia.gov/cia/publications/factbook/index.html. It provides extensive information about each of the countries of the world, including a map, demographic information, and governmental information (Figure 7.24).

Figure 7.24. The CIA Factbook looks at Aruba.

Image

But the important thing it provides is the “official” name of the country, which you can then use on a search engine.

For example, the conventional long name for Argentina is Argentine Republic. If you do a search for Argentina on Google, you will get over 519 million results. However, if you do a search for "Argentine Republic", you will get only 305,000, and the first result at this writing is a page about Argentina from Embassy.org (a very good resource for US embassy information) that points to an official Argentina embassy site, and from there you can continue to the country.

Keep it simple

You might remember my observation earlier in this chapter that city Web sites can be fairly complex, state Web sites can be more complex, and U.S. government Web sites often are even more complex. So you may be assuming that foreign government sites will be the most confusing of all, due to your possible unfamiliarity with that country’s government or language. That assumption is partly true and partly false.

Any official government Web site has the potential to be mind-bogglingly complex. But because your interests are probably not going to be in minutiae that might be buried on such a site, it may not be that difficult to find what you’re looking for. Except in extremely unusual situations, you will probably not be interested in the day-to-day issues of a country (and if you are, you most likely are familiar with the government or language of your country of interest). Your interest more likely will be broader—a country’s environmental record (but not how it handles recycling in one province), or its industries (but not the activities of one bakery in one town).

If you do have such focused interest, then I would presume that you’re familiar enough with the country to be able to navigate the offerings of its Web site without too much difficulty. If you aren’t, then try to get the information you need from news search engines. Confine your queries to the media of that country, and try to do keyword searches. Sometimes you can get information from keyword searches of the news that it’s tough to get from a country’s Web site.

Type of content

What will you find on the average country’s Web site? It varies a lot. For some countries, you’ll barely find a Web presence. For other countries, you’ll find very complex offerings that span tens of millions of pages of content.

Let’s take for example the Republic of Iceland, which has a Web site at government.is/. Information on the front page includes pointers to various ministries, a link page, and even a pointer to Iceland’s constitution. The middle of the front page is kind of a What’s New for Iceland (Figure 7.25).

Figure 7.25. The Republic of Iceland Web site has all the news located on the center of the front page.

Image

This is a well-designed, extensive Web site, especially for a country of fewer than 300,000 people. On the other hand, a country’s site may be not much more than a page of information about embassies and general demographics about the country, with pointers to third-party links. It will vary a lot.

Where to look and what to monitor

What you end up monitoring on a country’s Web site depends on what interests you. Often you can narrow your interest down to a certain ministry or easily described topic (agriculture, business, demographics, for instance). If that’s the case, try to find the ministry or department in that country’s governmental structure that addresses that topic. For example, I’m interested in digital information collections provided by countries, so I tend to monitor the pages for a country’s national library. RSS feeds tend to be scarce in my experience, so you need to use page monitors.

If you have more of a general interest in a country, you’re going to have quite a task. If the front page has the latest news on the country’s workings, like Iceland, monitor that. If it doesn’t, try to find the What’s New page (in my experience there’s usually some permutation of a What’s New page). But don’t rely just on the country’s site. Find the major media for that country and monitor its headlines—the front page of that media’s Web site or RSS feeds. You might also want to find that country’s embassy in your country and monitor it too, but really, a general interest in a country is a huge thing to try to encompass. If there’s any way at all to do it, narrow it down.

When You Want What Isn’t There

My space is limited in this book—there are only so many trees in the world. And you may find that the specific topic for which you want to do information trapping is not presented in this chapter. In this case, you can start your own hunt for search engines and other large-index resources to monitor. I can suggest two techniques for doing that: the easy, incomplete way, or the much more difficult, but more thorough way.

The easy, incomplete way

The easy, incomplete way is to visit some general searchable subject indexes like Yahoo Directory or Dmoz, and browse through the directory to the more general categories for your topic. From there, look for a subtopic called Web Directories, or Directories, or Search Engines (Figure 7.26).

Figure 7.26. Getting photography pointers to information collections from the Yahoo Directory.

Image

This list of Web directories will introduce you to deep sources of information about your topic that will include What’s New pages to monitor, pointers to news sources and search engines, and other data-rich places at which you can trap information. The one downside is that searchable subject indexes usually do not include everything, and you may miss some things.

The more difficult, but thorough way

With the more difficult way you may find more sites, but it will be, well, more difficult. Start with a full-text search engine and run a query about your topic, but use modifier keywords to slant your results toward search engines and information collections. What kind of modifier keywords? Terms like search, what's new, category, or Advanced Search are good ones to use when you want results that are geared toward finding search engines.

What you’re trying to add to your topic words are words that would normally appear on a search engine or directory’s home page, or on a search engine or directory’s search page.

When you’re narrowing a query by using a lot of additional words, try to use more general words. The combination of tightly focused topic phrases and more general additional phrases should yield results that contain directories of interest, sometimes blogs, sometimes link lists, and sometimes even pages from various Yahoo properties. Figure 7.27 shows an example of this search run in Google.

Figure 7.27. The difficult way will find you both good stuff and gunk.

Image

You will find many resources this way, but at the same time you’ll find a lot of gunk not relevant to your search. You may also find that you have to rerun the searches several times, with different combinations of words, to catch various resources. You may end up finding much more material than if you searched only in a directory, but it will take you more time, and it will be more difficult.
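Because this approach means running many variations of the same query, it helps to generate them in one go. The Python sketch below pairs a quoted topic phrase with small groups of general modifier words. The modifier list, the sample topic, and the function name are only examples; swap in whatever words fit your subject.

```python
from itertools import combinations

# General words that tend to appear on search engine and directory pages.
MODIFIERS = ["search", "what's new", "category", "advanced search"]


def finder_queries(topic, modifiers=MODIFIERS, per_query=2):
    """Pair a quoted topic phrase with small groups of modifier words."""
    queries = []
    for group in combinations(modifiers, per_query):
        # Quote multiword modifiers so they are searched as phrases.
        extras = " ".join(f'"{m}"' if " " in m else m for m in group)
        queries.append(f'"{topic}" {extras}')
    return queries


for query in finder_queries("nature photography"):
    print(query)
```

Two modifiers per query is an arbitrary starting point. Pile on too many and you narrow the results to almost nothing; use too few and you’re back to wading through gunk.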

Moving Beyond Text-Based Information

So far in this chapter we’ve looked at various categories of specific information that you might want to monitor, and explored a few ways that you can find even more specific search engines than the ones we’ve looked at thus far.

But up to this point it’s all been text-based. As I’m sure you’ve noticed, there’s far more than text posted online nowadays. There are images, audio, and even video. Thankfully, even as all these different types of information are being added, more search engines and resources are becoming available to trap them. In the next chapter we’re going to look at something loud, colorful, and fast-moving: multimedia trapping.
