PC World

Engines and Directories: The Basics

Despite the profusion of methods for searching the Net, the most common tools remain search engines and directories, each of which has its own merits. The main difference between the two is human intervention. Search engines use automated software "spiders" that crawl through the Web to collect and index the full text of pages that they find.
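To make the idea of indexing full text concrete, here is a minimal Python sketch of the kind of word-to-page lookup table an engine might build from the pages its spiders collect. It is an illustration only, not any engine's actual software; the function names (build_index, search) and the example URLs are hypothetical, and the pages are passed in as ready-made text rather than fetched from the Web.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip('.,!?";:()')   # drop surrounding punctuation
            if word:
                index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query."""
    hits = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*hits) if hits else set()

# Hypothetical pages standing in for what a spider might collect
pages = {
    "a.example": "Ham radio basics.",
    "b.example": "Christmas ham recipes",
}
index = build_index(pages)
print(search(index, "ham radio"))
```

Because the index is built from words alone, a search for ham matches both pages, which is exactly the kind of mix-up a human-edited directory is meant to avoid.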

Directories, by contrast, rely on human editors to sift through pages, winnowing out inappropriate ones and categorizing sites by subject. Nothing goes into the directory unless an editor approves it, so you're unlikely to find Christmas recipes in a Yahoo category about ham radios. But since directories are crafted by hand, they are far less comprehensive than search engines. Conservative estimates put the current size of the Web at a billion pages (and gauge its growth at a million pages per day). Most engines don't come close to indexing the whole Net, but at press time Google claimed to have indexed a billion pages either partially or completely. The largest directory, called the Open Directory Project, is tiny in comparison, with a current index of only about 2 million sites.


Hybrid Search Sites

Directories and search engines have different virtues, so most major search sites meld aspects of each. AltaVista, for example, supplements its search engine with a directory that uses listings from the LookSmart and Open Directory indexes. Similarly, if you search at LookSmart or Yahoo, the site first provides results from its own directory, and then passes your query to a search engine.

Of course, many search sites do a lot more than provide an engine and a directory--Excite, Go.com, Lycos, Yahoo, and others have morphed into portals offering an array of services, including stock quotes, news, e-mail, shopping, and anything else to dissuade you from going elsewhere on the Web. That scattershot strategy keeps visitors glued to pages long enough to notice the banner ads, helping the sites make a buck. But while much of the stuff portals offer is useful or fun, these sites can become so cluttered with peripheral features that the search tools seem an afterthought.

Some search sites are returning to the basics. Google started the trend last year: Its sleek home page consists of little more than the Google logo, the field where you type in your queries, and a couple of buttons. More recently, AltaVista launched Raging Search, which offers the AltaVista search engine without the clutter of the AltaVista site. AltaVista says that searches on the two sites can yield different results, but our tests produced identical listings each time. In any event, pure search sites are excellent news for Web surfers who want to avoid the distractions of a full-service portal.

Search Engines: The Race for Relevancy

Logically, you'd expect the search engine that indexes the most pages to have the best chance of finding what you need. But if an engine doesn't properly organize the sites it finds, the one you want may be buried beneath thousands of irrelevant links. What good does it do for an engine to deliver 6000 environmental links when you really want a site for the rock group Green Day?

At Lycos, for example, we searched for Ford Motor Company: The first three links consisted of one to Lycos's own motorcycle section, one to the Nature Company, and a third to Living.com, an online furniture store. At Ask Jeeves, which lets you pose queries in the form of a full statement or question, we asked Who is John Kerry? in hopes of finding a biography of the Massachusetts senator. But Jeeves' first link led to geographical information on Kerry, Ireland.

Of the sites we tested, Google provided the most consistently pertinent results--one reason it's our favorite engine overall. The site uses "PageRank" technology to track the number of pages that link to a site. If a lot of pages link to a particular site on a specific topic, the reasoning goes, that site must be relevant to that subject. Consequently, Google gives it higher placement in the results.
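The link-counting idea can be sketched in a few lines of Python. This is a simplified illustration, not Google's actual algorithm: the function name rank_by_inbound_links and the link graph are made up for the example, and Google's real system also weighs each link by the importance of the page it comes from, which this sketch ignores.

```python
from collections import Counter

def rank_by_inbound_links(links, candidates):
    """Order candidate pages by how many distinct pages link to them."""
    inbound = Counter()
    for source, target in set(links):   # set() drops duplicate links
        if source != target:            # ignore pages linking to themselves
            inbound[target] += 1
    # Most-linked pages first; ties broken alphabetically for stable output
    return sorted(candidates, key=lambda page: (-inbound[page], page))

# Hypothetical link graph: three pages link to the official site
links = [
    ("fan1.example", "official.example"),
    ("fan2.example", "official.example"),
    ("news.example", "official.example"),
    ("fan1.example", "fan2.example"),
]
ranking = rank_by_inbound_links(
    links, ["fan2.example", "official.example", "news.example"]
)
print(ranking)
```

Here the official site rises to the top because three other pages point to it, while the page nobody links to lands last.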

Google has such confidence in this theory that it offers an "I'm feeling lucky" button you can click to go directly to the site Google thinks is most relevant to your search. But it might more appropriately be called "We're feeling lucky," since Google is gambling that the site it selects is the one you want. In our testing, Google's optimism was justified in some cases. We searched for Al Gore's official campaign site using the keywords Al Gore campaign site. Google listed algore2000.com as the first item, whereas GoTo.com buried it beneath nearly 70 other links.

Google was less fortunate when we searched with the keywords world beat, hoping for information on music; the engine sent us to the Web site of travel publisher Rough Guides. However, when we put quotation marks around the query ("world beat"), Google's luck (and ours) improved: It sent us to the highly relevant world beat page at the Internet Underground Music Archive--proof that subtle differences in how you enter a query can have a big impact on the results you receive.
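One plausible reason for the difference, sketched below in Python, is that a quoted query is treated as an exact phrase while an unquoted one only requires each word to appear somewhere on the page. This is an assumption about how such matching might work, not a description of Google's internals; the function name matches and both sample texts are hypothetical.

```python
def matches(query, text):
    """Return True if the text satisfies the query.

    A query wrapped in double quotes must appear as an exact phrase;
    otherwise every query word must appear somewhere in the text.
    """
    words = text.lower().split()
    q = query.strip().lower()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        phrase = q[1:-1].split()
        n = len(phrase)
        # Slide a window over the text looking for the words in order
        return n > 0 and any(
            words[i:i + n] == phrase for i in range(len(words) - n + 1)
        )
    return all(term in words for term in q.split())

# Hypothetical page texts: one about travel, one about music
travel_page = "guides to beat the crowds on your world trip"
music_page = "the best world beat albums"

print(matches('world beat', travel_page))    # both words appear, so it matches
print(matches('"world beat"', travel_page))  # the phrase itself never appears
print(matches('"world beat"', music_page))
```

Without the quotation marks, the travel page qualifies because it happens to contain both words; with them, only the music page does.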

Another Google plus: Below each link it finds, Google provides a snippet of text with the word or words you searched for highlighted in bold. That helps you eyeball the results and gauge the relevance of each link quickly. (Most engines simply display the first line or two of text from each linked page, whether the text contains your search terms or not.)
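A snippet of this kind could be produced roughly as follows. The Python sketch below is an assumption about the general approach, not Google's code: the function name make_snippet, the radius parameter, and the use of HTML bold tags are all illustrative choices.

```python
import re

def make_snippet(text, terms, radius=30):
    """Return a short excerpt around the first matching term, with matches in <b> tags."""
    if not terms:
        return text[: 2 * radius]
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    match = pattern.search(text)
    if match is None:
        # No search term on the page: fall back to the opening text
        return text[: 2 * radius]
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    excerpt = text[start:end]
    # Wrap every search term inside the excerpt in bold tags
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", excerpt)

# Hypothetical page text
page = "Welcome to our archive. Listen to world beat artists from around the globe."
print(make_snippet(page, ["world", "beat"]))
```

The fallback branch mirrors what the other engines do: when no search term is found, the function simply returns the start of the page.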

Direct Hit takes a different approach to maximizing the relevance of results by organizing them based on their popularity with previous searchers. For instance, if few searchers clicked the first link for a search about computer chips (say, because it is actually for a site about potato chips), that link will sink lower on the list. Several engines, including HotBot, Lycos, and MSN Search, have adopted Direct Hit's technology. Our tests of Direct Hit itself, however, produced mixed results. When we searched for Pokémon and Queen, the engine served up the official site first in each case. But of ten results for world beat, only four were relevant. Most of the others weren't related to music at all.
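The popularity idea can be sketched as a simple reordering step. The Python below is a hedged illustration, not Direct Hit's actual method: the function name rerank_by_clicks, the click and impression counts, and the choice to rank by click rate (clicks divided by times shown) are all assumptions made for the example.

```python
def rerank_by_clicks(results, clicks, impressions):
    """Reorder results so links that past searchers clicked more often rise."""
    def click_rate(url):
        shown = impressions.get(url, 0)
        return clicks.get(url, 0) / shown if shown else 0.0
    # sorted() is stable, so links with equal click rates keep their original order
    return sorted(results, key=click_rate, reverse=True)

# Hypothetical results for a search about computer chips
results = ["potatochips.example", "chipmaker.example", "cpu-reviews.example"]
clicks = {
    "potatochips.example": 2,
    "chipmaker.example": 60,
    "cpu-reviews.example": 30,
}
impressions = {url: 100 for url in results}

reranked = rerank_by_clicks(results, clicks, impressions)
print(reranked)
```

The potato chip site starts in first place but drops to last because almost no one who saw it clicked it.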
