..udayaAnthaya..

Tuesday, September 8, 2009

How Google search works


How many times a day do you visit Google? And can you imagine how many internet users visit Google per minute, or per day? Statistics show that 34.7% of global internet users visit Google, while 25.72% visit Yahoo, which is second on the list. The secret (actually, it's not one) behind so many users relying on Google is its simplicity and the search engine's extraordinarily quick results. Have you ever thought about the technology behind this rapid and accurate search engine? If you are interested in finding out the answer, continue reading.

Google has a well-indexed database of documents that it uses for query processing. That is simply it. But the hardest part is creating this database, considering all the websites available in the world. Worse, the World Wide Web is very dynamic at present. How Google overcomes this challenge is really interesting. Google uses parallel processing for its data processing, a method of computation in which many calculations are performed simultaneously, significantly speeding up data processing.

Google's search engine consists of 3 distinct parts:

  • Googlebot
  • Indexer
  • The Query Processor


Googlebot

Googlebot is easiest to explain as a spider that crawls through the World Wide Web, retrieves pages and hands them off to the indexer. Googlebot sends a request to a web server for a web page, downloads the entire page and hands it off to Google’s indexer. Googlebot can request thousands of different pages simultaneously. To avoid overloading web servers and crowding out human users, Googlebot deliberately makes requests of each individual web server more slowly than it is capable of doing. Googlebot also visits frequently updated sites more often than sites that are seldom updated. For example, Googlebot might visit CNN.com or slstockexchange.com 10 times per hour, whereas it might visit kln.ac.lk once a month. This helps users retrieve results that are as up to date as possible.
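
The two behaviours described above (slowing down requests to any one server, and coming back to busy sites more often) can be sketched with a small scheduler. This is only an illustration, not how Googlebot is actually built: the class name `CrawlScheduler`, the `per_host_delay` and `revisit_interval` parameters, and the example URLs are all made up for this post, and no real network requests are made.

```python
import heapq
from urllib.parse import urlparse


class CrawlScheduler:
    """Toy scheduler that decides when each URL may next be fetched."""

    def __init__(self, per_host_delay):
        self.per_host_delay = per_host_delay  # min seconds between hits on one host
        self.queue = []                       # heap of (due_time, url)
        self.revisit = {}                     # url -> seconds between visits
        self.host_ready = {}                  # host -> earliest next request time

    def add(self, url, revisit_interval, first_due=0.0):
        self.revisit[url] = revisit_interval
        heapq.heappush(self.queue, (first_due, url))

    def next_fetch(self):
        """Return (time, url) for the next fetch and reschedule that URL."""
        while True:
            due, url = heapq.heappop(self.queue)
            host = urlparse(url).netloc
            ready = self.host_ready.get(host, 0.0)
            if ready > due:
                # Politeness: this host was hit too recently, so postpone.
                heapq.heappush(self.queue, (ready, url))
                continue
            self.host_ready[host] = due + self.per_host_delay
            # A short revisit interval brings the page back sooner.
            heapq.heappush(self.queue, (due + self.revisit[url], url))
            return due, url


scheduler = CrawlScheduler(per_host_delay=5.0)
scheduler.add("http://news.example/", revisit_interval=360.0)          # about 10 times per hour
scheduler.add("http://news.example/sport", revisit_interval=360.0)
scheduler.add("http://static.example/", revisit_interval=2_592_000.0)  # about once a month

for _ in range(5):
    print(scheduler.next_fetch())
```

In this sketch the two pages on `news.example` are never fetched less than 5 seconds apart, while the page on the other host does not have to wait for them.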

Indexer

After receiving the full text of the pages handed over by Googlebot, the indexer sorts the pages and stores them in the index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms. To improve search performance, Google ignores (doesn’t index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). Stop words are so common that they do little to narrow a search, and therefore they can safely be discarded. The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase, to improve Google’s performance.
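
The data structure described here is usually called an inverted index. Below is a minimal sketch of one, assuming a tiny in-memory collection of documents. The function names `tokenize` and `build_index` and the sample documents are made up for illustration, the stop-word list is just the examples given above, and dropping every single-character token is a simplification of "certain single digits and single letters".

```python
import re
from collections import defaultdict

# Only the example stop words from the text, not a real list.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: [word positions]}, skipping stop words."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            if term in STOP_WORDS or len(term) == 1:
                continue  # simplification: drop every single-character token
            index[term].setdefault(doc_id, []).append(position)
    # Keep the terms in alphabetical order, as described above.
    return dict(sorted(index.items()))


docs = {
    1: "How the Google search engine works",
    2: "The search engine index is sorted by term.",
}
index = build_index(docs)
print(index["search"])  # {1: [3], 2: [1]}
print(list(index))
```

Positions are counted before stop words are removed, so the distance between two indexed words still matches their distance in the original text.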


The Query Processor

The query processor has several parts, including the user interface (search box), the “engine” that evaluates queries and matches them to relevant documents, and the results formatter. Google considers over a hundred factors in computing a page's rank and determining which pages are most relevant to a query, including the popularity of the page, the position and size of the search terms within the page, and the proximity of the search terms to one another on the page. A patent application discusses other factors that Google considers when ranking a page. Indexing the full text of the web allows Google to go beyond simply matching single search terms. Google gives more priority to pages that have search terms near each other and in the same order as the query. Google can also match multi-word phrases and sentences.
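
To make the idea of "near each other and in the same order" concrete, here is a toy scoring function. Google's real ranking formula is not public, so the weighting below is purely an assumption for illustration, and the names `tokenize` and `proximity_score` and the sample pages are made up. It covers only one of the many factors mentioned above.

```python
import re


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_score(query, text):
    """Score a page higher when query terms appear close together and in
    query order. Returns 0.0 if any query term is missing from the page."""
    words = tokenize(text)
    terms = tokenize(query)
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if not terms or any(not p for p in positions):
        return 0.0
    score = 1.0  # base credit: every query term is present
    for left, right in zip(positions, positions[1:]):
        # Smallest gap where the later query term follows the earlier one.
        gaps = [r - l for l in left for r in right if r > l]
        if gaps:
            score += 1.0 / min(gaps)  # adjacent terms earn the full bonus
    return score


pages = [
    "Google is a search engine",
    "search the web with a fast engine",
    "An engine that can search the web",
    "cooking recipes",
]
for page in sorted(pages, key=lambda p: proximity_score("search engine", p), reverse=True):
    print(round(proximity_score("search engine", page), 3), page)
```

The page containing the exact phrase comes first, the page with the terms in order but far apart comes next, and the page with the terms in reverse order gets only the base credit.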

This simple diagram shows how Google processes a query.
