Today’s feature is a guest post by Branden Long. Branden is the President and CEO of Web Doodle, LLC, a web application development company that specializes in complex database solutions.

A little background on Search Engines

Search engines have evolved drastically since they were first introduced. Early search engines only indexed content for sites that had been “submitted” to them. Essentially, if you didn’t tell them about your web site, it didn’t exist.

The next innovation in search engines was a spidering system that would find web sites by following the links from other sites. The sites were then simply ranked by the number of links pointing to them. As you can imagine, link farms sprouted up to manipulate the search rankings. This quickly led to the demise of the original MSN, Lycos and AltaVista search engines, since their searches almost always returned too many irrelevant results.

Enter Google

Google immediately became a hit because it used the same spidering system that most of the other search engines had adopted, but instead of counting raw links it devised a dynamic method of assigning a value to each link. They called this marvel PageRank.

In very basic terms, PageRank works by calculating a web page’s ranking based on the relevance of the pages that link to it. When a web site about Cars and Trucks receives a link from a web site about Home Loans, the two aren’t relevant to each other, so the resulting PageRank is significantly lower than if the link had come from a Car Loans web site.
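To make the idea of valuing a link by where it comes from more concrete, here is a minimal sketch. It is not Google’s production system. It is the textbook power-iteration form of PageRank, which weights each link by the rank of the page it comes from and does not model topical relevance at all. The site names, the damping factor of 0.85 and the iteration count are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-site web, used only for illustration.
toy_links = {
    "car-loans": ["cars-and-trucks"],
    "home-loans": ["cars-and-trucks"],
    "cars-and-trucks": ["car-loans"],
}

ranks = pagerank(toy_links)
for page, value in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {value:.3f}")
```

In this toy web, the home-loans site has no incoming links and ends up with the lowest rank, so its link passes along far less value than the link from the car-loans site. The point is that the value of a link depends on the page it comes from, not just on the raw link count.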

Google’s spidering program, called Googlebot, constantly spiders new web sites and continues to monitor existing ones. As a result, the index is in a constant state of change, with the PageRank of every web page in its database updated regularly. This is good because it keeps the search results as relevant as possible.

Google moved into town and took over Search so quickly that Yahoo, MSN and several others simply couldn’t change direction quickly enough. According to SearchEngineWatch, in December 2007 Google accounted for nearly 60% of all searches conducted on the Internet. Its next closest competitor was Yahoo, at just over 20%.

Google Gets Gamed

Google hasn’t been without its own problems. It took some time, but Search Engine Optimizers (a fancy name for people who get web sites listed on search engines) learned how to manipulate even the highly relevant results of Google. Using a technique similar to the one that ruined search for the other engines, the “black hat” SEOs started selling relevant links to paying customers, artificially inflating the target web site’s PageRank and increasing its prominence in the search results.

Up until this point, Google had been a fully automated system, calculating PageRank on the fly. Because Googlebot can’t tell the difference between a free link and a paid link, Google had to create an entire department whose sole job is to track down and penalize web sites that sell links.

With hundreds of millions of new web pages being created each day, this is a battle that Google simply cannot win unless they figure out how to automatically determine which links are paid and which aren’t. This is a large problem, but it isn’t the only one…

PageRank is based on web links, not people

Because PageRank is based entirely on links to and from web sites, you can essentially say that a web site’s relevance is based on the relevance of the web sites that link to it. So the people who hold the “power” to make a web site relevant are the webmasters who manage those web sites. What about the users? Shouldn’t they have a say?

With over 1.3 billion people searching and using the Internet regularly, it seems only fair that they should have a say in which web sites are relevant and which are not. Google will tell you that they do customize the search experience for their users, but without actually asking them “do you like this site?” how can they expect their system to be accurate?

Social Fabric of Internet Usage

Social Networking, Social Bookmarking, and Blogging in general have taken the Internet by storm, exploding to challenge traditional mainstream news for viewers and advertisers. One of the most interesting facets of all these forms of communication is that they openly share information and encourage comments and debate, something mainstream news still resists.

During the early days of the Internet, many people were afraid of corporations data mining their browser data, cookies, and web log files. Web 2.0 has gone in the opposite direction, with users frantically posting all sorts of private and personal data once considered taboo.

This dramatic shift in thinking is what allows sites like Digg, Delicious and StumbleUpon to not only survive, but thrive. By tracking the likes and dislikes of their users, they can provide a more enjoyable overall Web experience. So far they have all focused on news, articles, web pages and movies, and most don’t have a search system that is dynamic enough to make them effective and efficient. They may also be worried about incurring the wrath of Google.

StumbleUpon Search Beta

StumbleUpon’s innovative bookmarking system allows users to vote for or against web pages and to explain their votes, which makes it easy to share opinions with others. Because the system is based on who you are a “Fan” of, you should only receive pages from people you’re a fan of, or from people who vote similarly to you. This, in turn, is based upon the types of information in which you have said you are interested.

The only drawback is that it has only worked in a browse/stumble sort of way. You couldn’t search for, say, “Movie Reviews” and get anything helpful. Recently, however, that all changed with the announcement (which I received by stumbling) of the StumbleUpon Search Beta.

Unlike other search systems, StumbleUpon Search appears to be customized to the user’s preferences and voting history. A friend and I tried several identical searches, and were quite happy to find that the results we each received were occasionally different. In a lot of cases the same sites came up, but were ordered differently.
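StumbleUpon hasn’t said how this personalization works, so the following is only a guess at the general shape of such a system, not a description of theirs. The sketch re-orders the same set of results for two users by adding a bonus based on each user’s past votes. The field names (`base_score`, `tags`), the `weight` parameter, the example URLs and the vote tallies are all hypothetical.

```python
def personalized_order(results, tag_votes, weight=0.1):
    """Re-order search results using a user's voting history.

    results   -- list of dicts with "url", "base_score" and "tags" (hypothetical fields)
    tag_votes -- dict of tag -> net votes by this user (thumbs up minus thumbs down)
    weight    -- how strongly the voting history shifts the base score (assumed value)
    """
    def score(result):
        # Sum the user's net votes across the result's tags; unknown tags count as 0.
        affinity = sum(tag_votes.get(tag, 0) for tag in result["tags"])
        return result["base_score"] + weight * affinity

    return sorted(results, key=score, reverse=True)


# The same hypothetical results for a "Movie Reviews" search.
results = [
    {"url": "blockbuster-reviews.example", "base_score": 1.0, "tags": ["blockbusters"]},
    {"url": "indie-reviews.example", "base_score": 0.9, "tags": ["indie-film"]},
]

# Two users with different (made-up) voting histories.
my_votes = {"indie-film": 5}
friend_votes = {"blockbusters": 3, "indie-film": -2}

print([r["url"] for r in personalized_order(results, my_votes)])
print([r["url"] for r in personalized_order(results, friend_votes)])
```

Both users get the same two sites, but in a different order, which matches what my friend and I saw.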

Similar to the tagging system, the search system also shows users who regularly submit or review web pages dealing with the topic you searched for. This ends up being really handy if you’re trying to find an authority on “environmental law”. It also displays any relevant videos, though sometimes the videos aren’t that relevant.

Conclusions

The StumbleUpon Search system is still in its infancy, but it has a lot of potential to dramatically change the way people search for information. As more pages are put into the system and more reviews and votes take place, the search experience should only get better.

A few things I’d like to see added to StumbleUpon Search:

  • Integration with the toolbar, so I can search directly from a search box in the toolbar
  • Right-click integration, so that I can search for selected text on a web page (similar to the “Search Google For” functionality in Firefox)
  • Show more than a page of search results (this may be because of limited data, but I doubt it)

I’m looking forward to the future of search. What excites me is that in the near future, search results will be relevant to me.