How Does Google Work?
Apr 1st, 2008 | By Shamim | Category: Google, How To's
The question novice webmasters ask most often is how to rank well in Google. But before you can rank well in Google, or any other search engine, you have to understand how it works, and not just its interface or what its FAQ pages say. It is in the interest of all bloggers, webmasters and even Google itself that everyone gets a clearer picture of how Google works behind the scenes. There are three parts to how Google delivers quality results.
Part 1: Crawling The Web
Crawling is the first step in Google’s search process. According to Netcraft there were about 30 billion web pages in total as of February 2007, but the exact figure is not worth dwelling on. The more important question is how Google goes about finding all these pages. In simple terms, Google starts from a single page; think of any random page on the internet. Once it has a starting point, Googlebot follows the links from that page, then the links on the pages it reaches, and so on. As it follows links, Googlebot stores a copy of each page on Google’s servers. Some auxiliary work is also performed at this stage, such as removing duplicate content and taking care not to overload the target web servers.
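To make the link-following idea concrete, here is a minimal sketch in Python. It is not how Googlebot is actually built. The "web" is a hypothetical in-memory dictionary (`TOY_WEB`) so the example runs without a network, duplicate detection is a plain hash of the page text, and the politeness delay a real crawler would apply per web server is left out.

```python
from collections import deque
import hashlib

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/": ("welcome page", ["a.example/about", "b.example/"]),
    "a.example/about": ("about us", ["a.example/"]),
    "b.example/": ("welcome page", ["c.example/"]),  # same text as a.example/
    "c.example/": ("contact details", []),
}

def crawl(seed, fetch):
    """Breadth-first crawl from `seed`; returns {url: text} of stored pages."""
    stored = {}            # copies of pages kept "on our servers"
    seen_urls = {seed}     # URLs already queued, so loops do not repeat
    seen_hashes = set()    # content hashes, for duplicate removal
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:   # unknown or unreachable URL
            continue
        text, links = page
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            # Store only the first page seen with this exact content.
            seen_hashes.add(digest)
            stored[url] = text
        # Links are followed even from duplicate pages.
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return stored

pages = crawl("a.example/", TOY_WEB.get)
```

Starting from `a.example/`, the sketch reaches all four pages but stores only three, because `b.example/` carries the same text as the seed page.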
Part 2: Building The Google Index
After the pages have been crawled and copied onto Google’s servers across multiple datacenters, Google goes about building the renowned Index. This is one of the most complicated tasks that the search engine performs. Google uses the cached versions of all previously crawled pages on its servers to compute PageRank. Basically, PageRank is a measure of a page’s importance, based on how many other pages point to it. Ranking also takes into consideration hundreds of other factors, many of which are secrets that Google does not publicize. The known ones include inbound links, where what counts is not just the number of links but also the importance of the referring pages and the extension of the referring domains.
For example, .edu and .gov sites are generally believed to pass far more link juice than ordinary .com, .org, .info etc. PageRank and position in the Google index also depend on the number of outgoing links: the more links a page has going out, the less PageRank each of those links passes on. Backward links and loops also get taken into account. Thus if you link to example.com in exchange for a link from example.com, Google’s indexing algorithm compensates for the link loop that was created. In effect this forces the algorithm to repeat the PageRank formula several times until the values settle. In the words of Google engineers, calculating PageRank is like computing the eigenvector of an 8-billion-by-8-billion matrix.
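The "repeat the formula several times" part is easier to see in code. The sketch below uses the originally published PageRank formulation with a damping factor of 0.85. It is an illustration only, not Google's production ranking. The link graph `toy_links` is made up, and spreading the rank of pages without outgoing links evenly over all pages is my own assumption to keep the numbers summing to 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over `links`: {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among its outgoing links,
                # so more outgoing links means less passed along per link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical four-page site; "home" and "about" link to each other (a loop).
toy_links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(toy_links)
```

Because "home" and "about" point at each other, neither value can be computed in one pass; each depends on the other, which is why the loop runs until the numbers stop changing much. On this toy graph "home" ends up with the highest rank and "orphan", which nobody links to, with the lowest.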
During the indexing process every word crawled on the web gets indexed. Notice that I said every word on every page. This creates a huge list of words and, for each word, all the pages on which it occurs. To search this immense index Google uses its own secret algorithm, which makes searches faster than any off-the-shelf product. Similar technology is used by the Google Mini, which businesses can buy to make use of Google’s algorithm on their corporate LAN.
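That "list of words and the pages they occur on" is usually called an inverted index. Here is a minimal sketch of one in Python. The pages in `toy_pages` are invented, and splitting text into lowercase runs of letters and digits is a simplifying assumption; Google's real index is far more elaborate.

```python
from collections import defaultdict
import re

def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Lowercase the text and split it into runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

# Hypothetical crawled pages.
toy_pages = {
    "a.example/": "Google crawls the web",
    "b.example/": "The web is big",
    "c.example/": "Google builds an index",
}
index = build_index(toy_pages)
```

Looking up `index["web"]` now returns both pages that mention the word, without rereading any page text.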
Part 3: Serving Up Results To Users
This is the most commonly misinterpreted part of Google Search. When a user types in a query, Google does not go out and crawl web pages. The index from which results are served is computed beforehand. When the user presses the “Search” button, Google only looks up and serves the results; no new crawling occurs when “serving up” the answers to the query.
Let me explain in a bit more detail. When a query comes in via the Google web server, it gets transferred to Google’s index servers. The index being queried is often sliced into multiple pieces spread over multiple servers, and several copies of the index exist. So the query goes out to the index servers. When the query consists of multiple terms, the pages on which all the terms occur (the intersection of the terms in the index) are then ranked using PageRank, how often the terms appear, how close together they occur in the page, whether they appear in the title or in the anchor text of links pointing to the page, whether they are in bold, and hundreds of other factors that Google does not make public.
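A rough sketch of that lookup in Python follows. Everything here is an assumption made for illustration: the two index slices (`shard_1`, `shard_2`) each cover a different set of pages, the `toy_rank` values are invented, and the scoring formula (PageRank multiplied by one plus the term count) is a stand-in for the hundreds of factors Google really uses.

```python
def search(query, shards, pagerank, top_k=10):
    """Return up to `top_k` URLs that contain every term of `query`."""
    terms = query.lower().split()
    hits = []
    for shard in shards:
        # Each shard maps term -> {url: number of occurrences on that page}.
        postings = [shard.get(term, {}) for term in terms]
        if not postings:
            continue  # empty query
        # Intersection: keep only pages in this shard that have every term.
        matching = set(postings[0])
        for posting in postings[1:]:
            matching &= set(posting)
        for url in matching:
            term_count = sum(posting[url] for posting in postings)
            # Made-up score: page importance boosted by term frequency.
            score = pagerank.get(url, 0.0) * (1 + term_count)
            hits.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [url for _, url in hits[:top_k]]

# Hypothetical index slices and precomputed PageRank values.
shard_1 = {
    "google": {"a.example/": 2, "b.example/": 1},
    "index": {"a.example/": 1},
}
shard_2 = {
    "google": {"c.example/": 1},
    "index": {"c.example/": 3},
    "web": {"d.example/": 1},
}
toy_rank = {"a.example/": 0.5, "b.example/": 0.2,
            "c.example/": 0.3, "d.example/": 0.1}

results = search("google index", [shard_1, shard_2], toy_rank)
```

Note that nothing is crawled here. The query only touches data that was prepared in advance, and `b.example/` is dropped because it contains "google" but not "index".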
Finally comes the time when Google delivers the results. The top 10 results get formatted on the page that is about to be served to the user, and that data is passed on to Google’s document servers, which hold the original pages. Note that the document servers have copies of, or point to, the live pages on the internet, and these are used to generate snippets. A snippet is the three-line summary shown under each link on the results page. Snippets are generated automatically from the part of the page that best answers the user’s query.
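Snippet generation can be sketched as sliding a window over the page text and keeping the window that contains the most query terms. This is only my guess at the general idea, not Google's actual method. The function `make_snippet`, its `width` parameter and the sample text are all hypothetical.

```python
def make_snippet(text, query, width=12):
    """Return the `width`-word window of `text` with the most query terms."""
    words = text.split()
    terms = {term.lower() for term in query.split()}
    best_start, best_hits = 0, -1
    # Try every window position; for short texts there is just one window.
    for start in range(max(1, len(words) - width + 1)):
        window = words[start:start + width]
        # Count window words that match a query term, ignoring case
        # and trailing punctuation.
        hits = sum(1 for w in window if w.lower().strip(".,!?") in terms)
        if hits > best_hits:  # on ties the earliest window is kept
            best_start, best_hits = start, hits
    return " ".join(words[best_start:best_start + width])

sample = ("Our shop opened in 1999. We sell garden tools and seeds. "
          "Garden tools ship free on weekends.")
snippet = make_snippet(sample, "garden tools", width=5)
```

With a window of five words, the sketch picks "garden tools and seeds. Garden", the earliest stretch of the sample that contains three query-term matches.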
I hope this gives you a better understanding of the process at work behind Google’s façade and colorful logo. Use this information to understand how your website can do better on the internet. I used this knowledge to outrank Google, as I discussed in one of my previous articles.














I am not sure that I can completely understand your comments. Would you be so kind as to expand on your reasoning a little more before I comment.
Very good article.
Especially loved the words, eigen vector of a 8billion by 8 billion matrix