San Mateo County Times

Web crashes put reliability in question
Thursday, July 8, 1999

By Liz Garone
STAFF WRITER

Imagine driving to the grocery store in the middle of the day, only to find chained doors and a sign reading: Closed. Please try again later.

"It would be totally unacceptable," said Francine Schlaks, who sells kids' backpacks on EBay, the Internet's largest auction site. "People wouldn't stand for it. They'd go somewhere else."

But recently, closed doors have been the experience at a number of prominent Web sites. Last month, online auction site EBay, based in San Jose, was down for 22 hours. Afterward, its stock plummeted 30 points. Over the first six months of the year, online investment sites ETrade of Menlo Park and Schwab.com of San Francisco have had numerous crashes. While the outages were shorter, they still undermined customers' confidence.

In a system with as many pieces as the Internet, there are dozens of things that can go wrong, experts say. And while engineers are coming up with bandages for specific problems, they say no one can guarantee a Web site will be up and running all the time.

"You have to continue to assume that, at some point, you're going to crash," said Cormac Foster of Jupiter Communications, an online research firm. "No application is ever 100 percent bug-free."

Until recently, occasional outages and downtimes were expected and grudgingly accepted. But the tide is turning. Online customers are demanding more reliability -- especially when their own dollars are at stake.

"It's no longer acceptable to go down and just give an excuse," said Allan Mohess of Computer Associates, which provides software-based solutions for Web crashes.

For online companies, a significant crash can have serious consequences.

Not only did EBay's stock price plunge overnight after its crash, the company also lost $3 million to $5 million in revenue.

Companies also run the risk of losing customers for good.

"It's only a mouse click away to go to the competition," said Daniel Todd, director of strategic marketing for San Mateo-based Keynote Systems, which tracks the reliability of e-commerce sites.

In the case of auction and stock sites, visitors have a growing number of sites from which to choose when their favorite sites go down.

"I don't know what will happen if EBay has another one of those outages," said Schlaks, who sells her backpacks on the site. "I pity them if they do. People aren't going to be so forgiving the next time."

Sites stall and crash for a number of reasons, including having too many visitors, inadequate hardware, buggy software and human error.

Many of the problems can be traced to the Web's explosive growth, according to Richard Fichera, vice president of research for Giga Information Group's technical division. There are just too many people wanting too much information too quickly, he said.

And too many visitors means an overloaded Web server -- the computer a company operates to house its Web site. The Web server is like a company's reception area -- build it too small and you're not going to accommodate crowds of people.

When the Mormon Church decided to open its genealogy Web site (www.familysearch.org) to the public in May, it had no foolproof way of predicting how many visitors it would get.

"We knew there was going to be interest, and we were prepared for a significant number of hits," said Dan Rascon, a church spokesman. "But it just snowballed."

The site never officially crashed. But many of the tens of thousands of people who wanted to research their family names were unable to get on. Those who did found the site excruciatingly slow.

LavaStorm, the Boston-based company that built and maintains the Family Search site, found a temporary fix: timed visits. A limited number of people could get on to the site for a fixed amount of time. Then, another batch would be allowed on.

This went on for a number of days -- until LavaStorm engineers devised a more permanent solution, according to Alex Dunn, a company spokesman. They added more machines to handle the thousands of waiting people.
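The timed-visit approach can be pictured as a waiting line that is let in a batch at a time. The following is a minimal sketch of that idea only, not LavaStorm's actual system; the function name, the visitor labels and the batch size of three are all assumptions made for illustration.

```python
from collections import deque


def admit_in_batches(waiting, batch_size):
    """Yield successive batches of visitors, each at most batch_size long.

    Hypothetical helper: each batch stands for the group allowed on the
    site during one fixed time slot.
    """
    queue = deque(waiting)
    while queue:
        # Take up to batch_size visitors from the front of the line.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        yield batch


# Illustrative data: seven people waiting, three admitted per time slot.
visitors = [f"visitor-{i}" for i in range(1, 8)]
batches = list(admit_in_batches(visitors, batch_size=3))
for round_number, batch in enumerate(batches, start=1):
    print(f"Round {round_number}: {', '.join(batch)}")
```

In this sketch the seven waiting visitors are admitted in three rounds, with the last round only partly full. A real site would also have to end each batch's session when its time ran out.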

In the case of EBay, the exact cause of its crash is still unknown a month later. EBay has isolated it to the Sun Microsystems software used to run the site, according to Kevin Pursglove, an EBay spokesman.

A site redesign was being tested when the crash took place.

Upgrades and new interfaces have been known to wreak havoc with sites, causing many of them to crash, according to Andrew Bartels, a senior analyst with Giga Information Group, another online research firm.

But EBay is not willing to shove all the blame onto Sun.

"The bottom line is it's our house," said Pursglove. "The guests that we invite into our house are our responsibility."

Company engineers -- along with engineers from Sun and Oracle -- have been working frantically to build a reliable soft backup system to replace the main server if -- and when -- it goes down again.

With a soft backup, the switchover isn't automatic, so a failure can still mean one to one-and-a-half hours of downtime.

"Even that isn't desirable, but it's much better than 22 hours," said Pursglove, who could give no timetable for when that backup system will be in place.

There are three main ways that companies are trying to improve their reliability.

One is scalability: designing a system to which computers can easily be added or removed to deal with fluctuations in demand.

"Companies have to scale up rapidly to meet the increased demand and to stay competitive," Bartels said.
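The arithmetic behind scaling up or down is simple in principle. The sketch below is a rough illustration and not any company's actual method; the function name and the traffic and capacity figures are assumptions invented for the example.

```python
import math


def machines_needed(requests_per_second, capacity_per_machine):
    """Return how many machines are needed to cover the given demand.

    Hypothetical helper: assumes every machine handles the same load and
    that at least one machine always stays in service.
    """
    if capacity_per_machine <= 0:
        raise ValueError("capacity_per_machine must be positive")
    return max(1, math.ceil(requests_per_second / capacity_per_machine))


# Illustrative figures only: each machine is assumed to handle 100 requests per second.
for demand in (120, 950, 40):
    print(f"{demand} requests/sec -> {machines_needed(demand, 100)} machine(s)")
```

As demand rises from 120 to 950 requests per second, the count grows from two machines to ten, and it drops back to one when traffic falls to 40.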

Another is improving backup systems. Incredibly enough, some major companies still have no backup system at all.

Earlier this month, San Francisco-based Charles Schwab Corp., which runs Schwab.com, announced that it had brought in computer networking equipment from IBM to prevent crashes -- or, at least, prevent the appearance of them.

With the new technology, trading on Schwab's site will be more evenly split among the company's 16 mainframes housed in Phoenix, according to IBM. If there's a problem on one computer, trading should smoothly switch to another.
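Splitting work among machines and switching away from a failed one can be sketched in a few lines. This is a simplified illustration, assuming a plain round-robin rotation; it is not a description of IBM's equipment, and the class, method and machine names are hypothetical.

```python
from itertools import cycle


class Dispatcher:
    """Hand out requests to machines in turn, skipping any marked as down."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.healthy = set(self.machines)
        self._rotation = cycle(self.machines)

    def mark_down(self, machine):
        # Take a machine out of service; later requests will skip it.
        self.healthy.discard(machine)

    def route(self):
        # Return the next healthy machine in the rotation.
        if not self.healthy:
            raise RuntimeError("no healthy machines available")
        for machine in self._rotation:
            if machine in self.healthy:
                return machine


# Sixteen machines, as in the article; the names are made up.
dispatcher = Dispatcher([f"mainframe-{i}" for i in range(1, 17)])
dispatcher.mark_down("mainframe-2")
routed = [dispatcher.route() for _ in range(3)]
print(routed)
```

Here the second machine is marked as down, so the first three requests go to machines 1, 3 and 4. A production system would also need a way to detect failures on its own and to return repaired machines to service.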

A third way is testing sites before opening them to the public.

"All the major computer software that is produced goes through extensive testing before it is shipped to the customer," said Jupiter's Foster. "The Web shouldn't be any different."

Even with companies adopting these new methods to improve the reliability of their Web sites, users should still expect some downtime, Foster said.

Still, with the number of opportunities for failure and crashes as great as it is, the Internet does a fairly good job, according to Keynote's Todd.

"In general, the Internet is fairly robust," he said. "You can almost always get to the Web site you want."
